May 7, 2019

Paper Group AWR 30

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

Title Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics
Authors Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey F. Naughton
Abstract While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues for this disconnect between research and practice: (1) low model accuracy due to added noise to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address both issues in an integrated manner. In contrast to the white-box approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel “bolt-on” approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge by providing a novel analysis of the $L_2$-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and most importantly, yields substantially better (up to 4X) test accuracy than the state-of-the-art algorithms on many real datasets.
Tasks
Published 2016-06-15
URL http://arxiv.org/abs/1606.04722v3
PDF http://arxiv.org/pdf/1606.04722v3.pdf
PWC https://paperswithcode.com/paper/bolt-on-differential-privacy-for-scalable
Repo https://github.com/sunblaze-ucb/dpml-benchmark
Framework tf
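
The output-perturbation idea in the abstract can be illustrated with a minimal sketch: train a model with ordinary SGD, then add noise to the final weight vector only. The sketch below assumes an L2-sensitivity bound is already available (the paper's analysis derives such a bound; it is not reproduced here) and uses the standard noise distribution with density proportional to exp(-epsilon * ||v|| / sensitivity). The function names `sample_l2_noise` and `bolt_on_private_model` are hypothetical and are not taken from the paper or from Bismarck.

```python
import math
import random


def sample_l2_noise(dim, sensitivity, epsilon, rng):
    """Draw a vector with density proportional to exp(-epsilon * ||v||_2 / sensitivity)."""
    # Direction: a normalized standard Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape=dim and scale=sensitivity/epsilon.
    magnitude = rng.gammavariate(dim, sensitivity / epsilon)
    return [magnitude * x / norm for x in direction]


def bolt_on_private_model(weights, sensitivity, epsilon, rng=None):
    """Perturb already-trained weights; the SGD training loop itself is left untouched."""
    rng = rng or random.Random()
    noise = sample_l2_noise(len(weights), sensitivity, epsilon, rng)
    return [w + n for w, n in zip(weights, noise)]


# Usage: `trained` stands in for the output of any non-private SGD run (assumed values).
trained = [0.2, -1.3, 0.7]
private = bolt_on_private_model(trained, sensitivity=0.05, epsilon=1.0,
                                rng=random.Random(0))
```

Because the noise is added once after training, the training code needs no changes, which is the sense in which the approach is "bolt-on". The accuracy of the released model then depends on how tight the sensitivity bound is.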

Overcoming catastrophic forgetting in neural networks

Title Overcoming catastrophic forgetting in neural networks
Authors James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell
Abstract The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
Tasks Atari Games
Published 2016-12-02
URL http://arxiv.org/abs/1612.00796v2
PDF http://arxiv.org/pdf/1612.00796v2.pdf
PWC https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-in-neural
Repo https://github.com/daskol/paper-reviews
Framework none
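
"Selectively slowing down learning on the weights important for those tasks" is commonly formalized as a quadratic penalty that anchors each weight to its old value, scaled by a per-weight importance. The sketch below shows that penalty and its gradient. The importance values are assumed inputs (the paper estimates them from the old task; that estimation is not shown), and the names `ewc_penalty` and `ewc_penalty_grad` are hypothetical.

```python
def ewc_penalty(params, old_params, importance, strength):
    """Quadratic anchor: 0.5 * strength * sum_i importance_i * (p_i - old_i)^2."""
    return 0.5 * strength * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, importance)
    )


def ewc_penalty_grad(params, old_params, importance, strength):
    """Gradient of the penalty; large importance means a strong pull back to the old value."""
    return [strength * f * (p - q)
            for p, q, f in zip(params, old_params, importance)]


# Usage: the total loss on a new task would be new_task_loss + ewc_penalty(...).
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], strength=1.0)
grad = ewc_penalty_grad([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], strength=1.0)
```

Weights with importance near zero remain free to move for the new task, while important weights are held close to the values learned earlier.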

Synthesized Classifiers for Zero-Shot Learning

Title Synthesized Classifiers for Zero-Shot Learning
Authors Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha
Abstract Given semantic descriptions of object classes, zero-shot learning aims to accurately recognize objects of the unseen classes, from which no examples are available at the training stage, by associating them to the seen classes, from which labeled examples are provided. We propose to tackle this problem from the perspective of manifold learning. Our main idea is to align the semantic space that is derived from external information to the model space that concerns itself with recognizing visual features. To this end, we introduce a set of “phantom” object classes whose coordinates live in both the semantic space and the model space. Serving as bases in a dictionary, they can be optimized from labeled data such that the synthesized real object classifiers achieve optimal discriminative performance. We demonstrate superior accuracy of our approach over the state of the art on four benchmark datasets for zero-shot learning, including the full ImageNet Fall 2011 dataset with more than 20,000 unseen classes.
Tasks Zero-Shot Learning
Published 2016-03-02
URL http://arxiv.org/abs/1603.00550v3
PDF http://arxiv.org/pdf/1603.00550v3.pdf
PWC https://paperswithcode.com/paper/synthesized-classifiers-for-zero-shot
Repo https://github.com/JudyYe/zero-shot-gcn
Framework tf

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

Title XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Authors Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
Abstract We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32x memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58x faster convolutional operations and 32x memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is only 2.9% less than the full-precision AlexNet (in top-1 measure). We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy.
Tasks
Published 2016-03-16
URL http://arxiv.org/abs/1603.05279v4
PDF http://arxiv.org/pdf/1603.05279v4.pdf
PWC https://paperswithcode.com/paper/xnor-net-imagenet-classification-using-binary
Repo https://github.com/hpi-xnor/BMXNet-v2
Framework mxnet
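
The two approximations in the abstract can be sketched in a few lines. A real-valued filter is replaced by a scale factor times its sign pattern (storing 1 bit instead of a 32-bit float per weight accounts for the 32x memory figure), and a dot product between two sign vectors reduces to XNOR plus a bit count. This is an illustrative sketch using Python integers as bit vectors; the helper names are hypothetical and this is not the authors' implementation.

```python
def binarize_filter(weights):
    """Approximate weights by alpha * sign(weights), with alpha the mean absolute value."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def pack_signs(signs):
    """Pack a +1/-1 vector into an integer: bit i is set when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_dot(bits_a, bits_b, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(bits_a ^ bits_b) & mask  # 1 wherever the two signs match
    matches = bin(agree).count("1")
    return 2 * matches - n             # matches contribute +1, mismatches -1


# Usage with small assumed vectors.
alpha, signs = binarize_filter([0.5, -1.5, 1.0, -1.0])
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
dot = xnor_dot(pack_signs(a), pack_signs(b), len(a))
```

The speed-up reported in the abstract comes from doing this bitwise arithmetic on machine words instead of floating-point multiply-adds.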

Hierarchical Question-Image Co-Attention for Visual Question Answering

Title Hierarchical Question-Image Co-Attention for Visual Question Answering
Authors Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh
Abstract A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling “where to look” or visual attention, it is equally important to model “what words to listen to” or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via novel 1-dimensional convolutional neural networks (CNN). Our model improves the state-of-the-art on the VQA dataset from 60.3% to 60.5%, and from 61.6% to 63.3% on the COCO-QA dataset. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA.
Tasks Visual Question Answering
Published 2016-05-31
URL http://arxiv.org/abs/1606.00061v5
PDF http://arxiv.org/pdf/1606.00061v5.pdf
PWC https://paperswithcode.com/paper/hierarchical-question-image-co-attention-for
Repo https://github.com/ShivaliGoel/Coattention_VQA_tf2
Framework tf

Spatially Adaptive Computation Time for Residual Networks

Title Spatially Adaptive Computation Time for Residual Networks
Authors Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov
Abstract This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset cat2000 and find that they correlate surprisingly well with human eye fixation positions.
Tasks Image Classification, Object Detection, Semantic Segmentation
Published 2016-12-07
URL http://arxiv.org/abs/1612.02297v2
PDF http://arxiv.org/pdf/1612.02297v2.pdf
PWC https://paperswithcode.com/paper/spatially-adaptive-computation-time-for
Repo https://github.com/mfigurnov/sact
Framework tf

A Study of Vision based Human Motion Recognition and Analysis

Title A Study of Vision based Human Motion Recognition and Analysis
Authors Geetanjali Vinayak Kale, Varsha Hemant Patil
Abstract Vision based human motion recognition has fascinated many researchers due to its critical challenges and a variety of applications. The applications range from simple gesture recognition to complicated behaviour understanding in surveillance systems. This has led to major developments in the techniques related to human motion representation and recognition. This paper discusses applications, the general framework of human motion recognition, and the details of each of its components. The paper emphasizes human motion representation and the recognition methods along with their advantages and disadvantages. This study also discusses the selected literature, popular datasets, and concludes with the challenges in the domain along with a future direction. The human motion recognition domain has been active for more than two decades, and has provided a large amount of literature. A bird’s eye view for new researchers in the domain is presented in the paper.
Tasks Gesture Recognition
Published 2016-08-24
URL http://arxiv.org/abs/1608.06761v1
PDF http://arxiv.org/pdf/1608.06761v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-vision-based-human-motion
Repo https://github.com/ndesale/BabyMonitoring
Framework none

Interactive Learning from Multiple Noisy Labels

Title Interactive Learning from Multiple Noisy Labels
Authors Shankar Vembu, Sandra Zilles
Abstract Interactive learning is a process in which a machine learning algorithm is provided with meaningful, well-chosen examples as opposed to randomly chosen examples typical in standard supervised learning. In this paper, we propose a new method for interactive learning from multiple noisy labels where we exploit the disagreement among annotators to quantify the easiness (or meaningfulness) of an example. We demonstrate the usefulness of this method in estimating the parameters of a latent variable classification model, and conduct experimental analyses on a range of synthetic and benchmark datasets. Furthermore, we theoretically analyze the performance of perceptron in this interactive learning framework.
Tasks
Published 2016-07-24
URL http://arxiv.org/abs/1607.06988v1
PDF http://arxiv.org/pdf/1607.06988v1.pdf
PWC https://paperswithcode.com/paper/interactive-learning-from-multiple-noisy
Repo https://github.com/svembu/ilearn
Framework none

Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning

Title Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning
Authors Heriberto Cuayáhuitl, Guillaume Couly, Clément Olalainty
Abstract Training robots to perceive, act and communicate using multiple modalities still represents a challenging problem, particularly if robots are expected to learn efficiently from small sets of example interactions. We describe a learning approach as a step in this direction, where we teach a humanoid robot how to play the game of noughts and crosses. Given that multiple multimodal skills can be trained to play this game, we focus our attention on training the robot to perceive the game, and to interact in this game. Our multimodal deep reinforcement learning agent perceives multimodal features and exhibits verbal and non-verbal actions while playing. Experimental results using simulations show that the robot can learn to win or draw up to 98% of the games. A pilot test of the proposed multimodal system for the targeted game (integrating speech, vision and gestures) reports that reasonable and fluent interactions can be achieved using the proposed approach.
Tasks
Published 2016-11-26
URL http://arxiv.org/abs/1611.08666v1
PDF http://arxiv.org/pdf/1611.08666v1.pdf
PWC https://paperswithcode.com/paper/training-an-interactive-humanoid-robot-using
Repo https://github.com/cuayahuitl/SimpleDS
Framework none

Iterative Inversion of Deformation Vector Fields with Feedback Control

Title Iterative Inversion of Deformation Vector Fields with Feedback Control
Authors Abhishek Kumar Dubey, Alexandros-Stavros Iliopoulos, Xiaobai Sun, Fang-Fang Yin, Lei Ren
Abstract Purpose: Often, the inverse deformation vector field (DVF) is needed together with the corresponding forward DVF in 4D reconstruction and dose calculation, adaptive radiation therapy, and simultaneous deformable registration. This study aims at improving both accuracy and efficiency of iterative algorithms for DVF inversion, and advancing our understanding of divergence and latency conditions. Method: We introduce a framework of fixed-point iteration algorithms with active feedback control for DVF inversion. Based on rigorous convergence analysis, we design control mechanisms for modulating the inverse consistency (IC) residual of the current iterate, to be used as feedback into the next iterate. The control is designed adaptively to the input DVF with the objective to enlarge the convergence area and expedite convergence. Three particular settings of feedback control are introduced: constant value over the domain throughout the iteration; alternating values between iteration steps; and spatially variant values. We also introduce three spectral measures of the displacement Jacobian for characterizing a DVF. These measures reveal the critical role of what we term the non-translational displacement component (NTDC) of the DVF. We carry out inversion experiments with an analytical DVF pair, and with DVFs associated with thoracic CT images of 6 patients at end of expiration and end of inspiration. Results: NTDC-adaptive iterations are shown to attain a larger convergence region at a faster pace compared to previous non-adaptive DVF inversion iteration algorithms. By our numerical experiments, alternating control yields smaller IC residuals and inversion errors than constant control. Spatially variant control renders smaller residuals and errors by at least an order of magnitude, compared to other schemes, in no more than 10 steps. Inversion results also show remarkable quantitative agreement with analysis-based predictions. 
Conclusion: Our analysis captures properties of DVF data associated with clinical CT images, and provides new understanding of iterative DVF inversion algorithms with a simple residual feedback control. Adaptive control is necessary and highly effective in the presence of non-small NTDCs. The adaptive iterations or the spectral measures, or both, may potentially be incorporated into deformable image registration methods.
Tasks Image Registration
Published 2016-10-27
URL http://arxiv.org/abs/1610.08589v4
PDF http://arxiv.org/pdf/1610.08589v4.pdf
PWC https://paperswithcode.com/paper/iterative-inversion-of-deformation-vector
Repo https://github.com/ailiop/idvf
Framework none
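
The fixed-point iteration with residual feedback described in the abstract can be sketched in one dimension. Given a forward displacement u, the inverse displacement v at a point y should satisfy v + u(y + v) = 0; the left-hand side is the inverse consistency (IC) residual, and each step feeds a scaled residual back into the estimate. The constant control value `mu` below corresponds to the "constant value" setting only; the affine test field and the function name `invert_displacement` are assumptions chosen so the exact inverse is known.

```python
def invert_displacement(u, y, mu=1.0, steps=50):
    """Estimate v(y) such that v + u(y + v) = 0 by fixed-point iteration with feedback."""
    v = 0.0
    for _ in range(steps):
        residual = v + u(y + v)  # inverse consistency residual of the current iterate
        v -= mu * residual       # feed the scaled residual back into the next iterate
    return v


# Assumed affine forward field u(x) = a*x + c, whose exact inverse displacement
# is v(y) = (y - c) / (1 + a) - y.
a, c, y = 0.5, 0.1, 2.0
u = lambda x: a * x + c
exact = (y - c) / (1.0 + a) - y

plain = invert_displacement(u, y, mu=1.0, steps=50)             # uncontrolled iteration
tuned = invert_displacement(u, y, mu=1.0 / (1.0 + a), steps=1)  # control matched to the field
```

For this field the error shrinks by a factor of |1 - mu * (1 + a)| per step, so `mu = 1` converges geometrically while a control value matched to the field converges in a single step. This is a toy illustration of why adapting the control to the input field can enlarge the convergence region and speed up convergence.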

Using Neural Network Formalism to Solve Multiple-Instance Problems

Title Using Neural Network Formalism to Solve Multiple-Instance Problems
Authors Tomas Pevny, Petr Somol
Abstract Many objects in the real world are difficult to describe by a single numerical vector of a fixed length, whereas describing them by a set of vectors is more natural. Therefore, multiple instance learning (MIL) techniques have been steadily gaining importance in recent years. The MIL formalism represents each object (sample) by a set (bag) of feature vectors (instances) of fixed length, where knowledge about objects (e.g., class label) is available at the bag level but not necessarily at the instance level. Many standard tools, including supervised classifiers, have already been adapted to the MIL setting since the problem was formalized in the late nineties. In this work we propose a neural network (NN) based formalism that intuitively bridges the gap between the MIL problem definition and the vast existing knowledge-base of standard models and classifiers. We show that the proposed NN formalism is effectively optimizable by a modified back-propagation algorithm and can reveal unknown patterns inside bags. Comparison to eight types of classifiers from the prior art on a set of 14 publicly available benchmark datasets confirms the advantages and accuracy of the proposed solution.
Tasks Multiple Instance Learning
Published 2016-09-23
URL http://arxiv.org/abs/1609.07257v3
PDF http://arxiv.org/pdf/1609.07257v3.pdf
PWC https://paperswithcode.com/paper/using-neural-network-formalism-to-solve
Repo https://github.com/pevnak/Mill.jl
Framework none
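
The bag-of-instances setting from the abstract can be sketched as a small forward pass: every instance in a bag goes through the same layer, the results are pooled into one fixed-length vector, and a bag-level layer produces the score. The choice of mean or max pooling and all names below are assumptions for illustration; this is not the authors' code or the Mill.jl API.

```python
def relu(x):
    return max(0.0, x)


def embed_instance(instance, weights, bias):
    """Apply one shared dense layer with ReLU to a single instance vector."""
    return [relu(sum(w * x for w, x in zip(row, instance)) + b)
            for row, b in zip(weights, bias)]


def bag_score(bag, weights, bias, out_weights, pool="mean"):
    """Score a bag (list of instance vectors) of any size with one label-level output."""
    embedded = [embed_instance(inst, weights, bias) for inst in bag]
    columns = list(zip(*embedded))  # one column per hidden unit
    if pool == "mean":
        pooled = [sum(col) / len(col) for col in columns]
    else:
        pooled = [max(col) for col in columns]
    return sum(w * h for w, h in zip(out_weights, pooled))


# Usage with assumed parameters: bags may differ in size, instances may not.
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.0]
out = [1.0, 1.0]
bag = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
score = bag_score(bag, W, b, out, pool="max")
```

Because pooling ignores instance order and count, the score depends only on the set of instances, which matches the setting where labels exist at the bag level but not at the instance level.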

Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation

Title Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation
Authors Kevin H. Wilson, Yan Karklin, Bojian Han, Chaitanya Ekanadham
Abstract Estimating student proficiency is an important task for computer based learning systems. We compare a family of IRT-based proficiency estimation methods to Deep Knowledge Tracing (DKT), a recently proposed recurrent neural network model with promising initial results. We evaluate how well each model predicts a student’s future response given previous responses using two publicly available and one proprietary data set. We find that IRT-based methods consistently matched or outperformed DKT across all data sets at the finest level of content granularity that was tractable for them to be trained on. A hierarchical extension of IRT that captured item grouping structure performed best overall. When data sets included non-trivial autocorrelations in student response patterns, a temporal extension of IRT improved performance over standard IRT while the RNN-based method did not. We conclude that IRT-based models provide a simpler, better-performing alternative to existing RNN-based models of student interaction data while also affording more interpretability and guarantees due to their formulation as Bayesian probabilistic models.
Tasks Knowledge Tracing
Published 2016-04-08
URL http://arxiv.org/abs/1604.02336v2
PDF http://arxiv.org/pdf/1604.02336v2.pdf
PWC https://paperswithcode.com/paper/back-to-the-basics-bayesian-extensions-of-irt
Repo https://github.com/Knewton/edm2016
Framework none
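
For readers unfamiliar with IRT, a minimal sketch of the simplest variant may help. In the one-parameter model, the probability of a correct response is a logistic function of student ability minus item difficulty. Placing a Gaussian prior on ability gives a basic Bayesian estimate. The hierarchical and temporal extensions evaluated in the paper are not shown, and the function names and the gradient-ascent fitting procedure are assumptions for illustration.

```python
import math


def irt_prob(ability, difficulty):
    """One-parameter IRT: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def map_ability(responses, difficulties, prior_var=1.0, lr=0.1, steps=500):
    """MAP estimate of ability under a zero-mean Gaussian prior, by gradient ascent."""
    theta = 0.0
    for _ in range(steps):
        grad = -theta / prior_var              # gradient of the log prior
        for y, b in zip(responses, difficulties):
            grad += y - irt_prob(theta, b)     # gradient of the Bernoulli log-likelihood
        theta += lr * grad
    return theta


# Usage: responses are 1 (correct) or 0 (incorrect); difficulties are assumed known.
strong = map_ability([1, 1, 1], [0.0, 0.0, 0.0])
weak = map_ability([0, 0, 0], [0.0, 0.0, 0.0])
```

The prior keeps the estimate finite even when a student answers every item correctly, which a plain maximum-likelihood fit would not.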

HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition

Title HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
Authors Rajeev Ranjan, Vishal M. Patel, Rama Chellappa
Abstract We present an algorithm for simultaneous face detection, landmark localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performance. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace that uses a high recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms for each of these four tasks.
Tasks Face Detection, Multi-Task Learning, Pose Estimation
Published 2016-03-03
URL http://arxiv.org/abs/1603.01249v3
PDF http://arxiv.org/pdf/1603.01249v3.pdf
PWC https://paperswithcode.com/paper/hyperface-a-deep-multi-task-learning
Repo https://github.com/pasrichashivam/hyperface_deep_multi-task-learning_keras_implementation
Framework none

Optical Flow Estimation using a Spatial Pyramid Network

Title Optical Flow Estimation using a Spatial Pyramid Network
Authors Anurag Ranjan, Michael J. Black
Abstract We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.
Tasks Dense Pixel Correspondence Estimation, Optical Flow Estimation
Published 2016-11-03
URL http://arxiv.org/abs/1611.00850v2
PDF http://arxiv.org/pdf/1611.00850v2.pdf
PWC https://paperswithcode.com/paper/optical-flow-estimation-using-a-spatial
Repo https://github.com/rickyHong/tfoptflow-repl
Framework tf
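
The coarse-to-fine loop described in the abstract can be sketched structurally as follows. At each pyramid level the current flow is upsampled, the second image is warped by it, and a per-level estimator adds a small update. The sketch uses 1-D signals and plain lists to stay self-contained; `estimate_update` and `warp` are placeholders for the per-level networks and the warping operator, and every name here is hypothetical.

```python
def upsample_flow_1d(flow, new_len):
    """Nearest-neighbour upsampling; values are doubled because the grid is twice as fine."""
    return [2.0 * flow[min(i // 2, len(flow) - 1)] for i in range(new_len)]


def coarse_to_fine_flow(pyramid_a, pyramid_b, estimate_update, warp):
    """Run the pyramid from coarsest to finest level, accumulating flow updates."""
    flow = None
    for img_a, img_b in zip(pyramid_a, pyramid_b):  # coarsest level first
        if flow is None:
            flow = [0.0] * len(img_a)               # start from zero flow
        else:
            flow = upsample_flow_1d(flow, len(img_a))
        warped_b = warp(img_b, flow)                # remove the motion found so far
        update = estimate_update(img_a, warped_b, flow)
        flow = [f + u for f, u in zip(flow, update)]
    return flow


# Usage with stand-in components: identity warp and a constant update per level.
levels = [[0.0] * 2, [0.0] * 4, [0.0] * 8]
flow = coarse_to_fine_flow(
    levels, levels,
    estimate_update=lambda a, wb, f: [0.25] * len(a),
    warp=lambda img, f: img,
)
```

Each level only has to account for the residual motion left after warping, which is why small updates per level can add up to a large displacement at full resolution.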

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

Title Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings
Authors Spandana Gella, Mirella Lapata, Frank Keller
Abstract We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce VerSe, a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels. We propose an unsupervised algorithm based on Lesk which performs visual sense disambiguation using textual, visual, or multimodal embeddings. We find that textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. We also verify our findings by using the textual and multimodal embeddings as features in a supervised setting and analyse the performance of visual sense disambiguation task. VerSe is made publicly available and can be downloaded at: https://github.com/spandanagella/verse.
Tasks Image Retrieval, Word Sense Disambiguation
Published 2016-03-30
URL http://arxiv.org/abs/1603.09188v1
PDF http://arxiv.org/pdf/1603.09188v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-visual-sense-disambiguation-for
Repo https://github.com/spandanagella/verse
Framework none