May 7, 2019

3400 words 16 mins read

Paper Group AWR 99

Attention-over-Attention Neural Networks for Reading Comprehension. Beyond CCA: Moment Matching for Multi-View Models. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. Making the V in VQA Matter: Elevating the Role of Image Understanding in …

Attention-over-Attention Neural Networks for Reading Comprehension

Title Attention-over-Attention Neural Networks for Reading Comprehension
Authors Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, Guoping Hu
Abstract Cloze-style queries are representative problems in reading comprehension. Over the past few months, we have seen much progress in utilizing neural network approaches to solve Cloze-style questions. In this paper, we present a novel model called attention-over-attention reader for the Cloze-style reading comprehension task. Our model aims to place another attention mechanism over the document-level attention and induces “attended attention” for final predictions. Unlike previous works, our neural network model requires fewer pre-defined hyper-parameters and uses an elegant architecture for modeling. Experimental results show that the proposed attention-over-attention model significantly outperforms various state-of-the-art systems on public datasets such as the CNN and Children’s Book Test datasets.
Tasks Question Answering, Reading Comprehension
Published 2016-07-15
URL http://arxiv.org/abs/1607.04423v4
PDF http://arxiv.org/pdf/1607.04423v4.pdf
PWC https://paperswithcode.com/paper/attention-over-attention-neural-networks-for
Repo https://github.com/shuxiaobo/QA-Experiment
Framework tf
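
As a rough illustration of the mechanism the abstract describes, here is a NumPy sketch of the attention-over-attention computation (variable names and shapes are illustrative, not the authors’ code):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_over_attention(h_doc, h_query):
    """h_doc: (|D|, d) contextual document embeddings; h_query: (|Q|, d)."""
    M = h_doc @ h_query.T            # pairwise matching scores, shape (|D|, |Q|)
    alpha = softmax(M, axis=0)       # document-level attention, one column per query word
    beta = softmax(M, axis=1)        # query-level attention, one row per document word
    beta_avg = beta.mean(axis=0)     # averaged query-level attention, shape (|Q|,)
    return alpha @ beta_avg          # "attended attention" over the document, shape (|D|,)
```

Summing the resulting scores over the positions of each candidate answer then yields the final prediction.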

Beyond CCA: Moment Matching for Multi-View Models

Title Beyond CCA: Moment Matching for Multi-View Models
Authors Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien
Abstract We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework, and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate the performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets.
Tasks
Published 2016-02-29
URL http://arxiv.org/abs/1602.09013v2
PDF http://arxiv.org/pdf/1602.09013v2.pdf
PWC https://paperswithcode.com/paper/beyond-cca-moment-matching-for-multi-view
Repo https://github.com/anastasia-podosinnikova/cca
Framework none
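
The paper’s models extend probabilistic CCA, which itself admits a moment-matching estimator. For orientation only, a minimal sketch of classical CCA recovered from sample covariances (not the paper’s generalized-covariance estimators):

```python
import numpy as np

def inv_sqrt(S, eps=1e-12):
    """Inverse matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def cca_via_moments(X, Y, k):
    """Top-k canonical correlations from second moments: whiten each view,
    then SVD the cross-covariance."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    _, s, _ = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))
    return s[:k]
```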

Recurrent Neural Networks for Multivariate Time Series with Missing Values

Title Recurrent Neural Networks for Multivariate Time Series with Missing Values
Authors Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu
Abstract Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a. informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
Tasks Imputation, Multivariate Time Series Forecasting, Multivariate Time Series Imputation, Time Series, Time Series Analysis, Time Series Classification, Time Series Prediction
Published 2016-06-06
URL http://arxiv.org/abs/1606.01865v2
PDF http://arxiv.org/pdf/1606.01865v2.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-networks-for-multivariate
Repo https://github.com/Han-JD/GRU-D
Framework pytorch
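
The abstract’s two missingness representations (masking and time interval) enter GRU-D through a trainable decay on the inputs. A simplified per-step NumPy sketch of that input decay (parameter names are illustrative):

```python
import numpy as np

def grud_input_decay(x, m, delta, x_last, x_mean, w_gamma, b_gamma):
    """x: current (possibly stale) inputs; m: observation mask (1 = observed);
    delta: time since each feature was last observed; x_last / x_mean: last
    observation and empirical mean per feature."""
    gamma = np.exp(-np.maximum(0.0, w_gamma * delta + b_gamma))  # decay rate in (0, 1]
    x_imputed = gamma * x_last + (1.0 - gamma) * x_mean          # decay toward the mean
    return m * x + (1.0 - m) * x_imputed                         # keep real values where observed
```

A similar decay is applied to the GRU hidden state, so both the mask and the elapsed time shape the recurrence.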

Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering

Title Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering
Authors Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou
Abstract Clustering is among the most fundamental tasks in computer vision and machine learning. In this paper, we propose Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of Variational Auto-Encoder (VAE). Specifically, VaDE models the data generative procedure with a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMM picks a cluster; 2) from which a latent embedding is generated; 3) then the DNN decodes the latent embedding into observables. Inference in VaDE is done in a variational way: a different DNN is used to encode observables to latent embeddings, so that the evidence lower bound (ELBO) can be optimized using the Stochastic Gradient Variational Bayes (SGVB) estimator and the reparameterization trick. Quantitative comparisons with strong baselines are included in this paper, and experimental results show that VaDE significantly outperforms the state-of-the-art clustering methods on 4 benchmarks from various modalities. Moreover, owing to VaDE’s generative nature, we show its capability of generating highly realistic samples for any specified cluster, without using supervised information during training. Lastly, VaDE is a flexible and extensible framework for unsupervised generative clustering; mixture models more general than the GMM can easily be plugged in.
Tasks
Published 2016-11-16
URL http://arxiv.org/abs/1611.05148v3
PDF http://arxiv.org/pdf/1611.05148v3.pdf
PWC https://paperswithcode.com/paper/variational-deep-embedding-an-unsupervised
Repo https://github.com/mori97/VaDE
Framework pytorch
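
The three-step generative story in the abstract is easy to make concrete. A minimal sketch of sampling from a trained VaDE (the decoder is assumed given; shapes are illustrative):

```python
import numpy as np

def vade_generate(pi, mu, logvar, decoder, rng=np.random.default_rng(0)):
    """pi: (K,) mixture weights; mu, logvar: (K, d) per-cluster Gaussian
    parameters; decoder: any trained callable mapping latents to observables."""
    c = rng.choice(len(pi), p=pi)                                          # 1) GMM picks a cluster
    z = mu[c] + np.exp(0.5 * logvar[c]) * rng.standard_normal(len(mu[c]))  # 2) latent draw
    return decoder(z)                                                      # 3) DNN decodes to data space
```

Fixing c instead of sampling it is what lets VaDE generate samples for a specified cluster.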

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Title Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Authors Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
Abstract Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset is publicly available at www.visualqa.org as part of the 2nd iteration of the Visual Question Answering Dataset and Challenge (VQA v2.0). We further benchmark a number of state-of-the-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners. Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model, which in addition to providing an answer to the given (image, question) pair, also provides a counter-example based explanation. Specifically, it identifies an image that is similar to the original image, but that it believes has a different answer to the same question. This can help in building trust for machines among their users.
Tasks Visual Question Answering
Published 2016-12-02
URL http://arxiv.org/abs/1612.00837v3
PDF http://arxiv.org/pdf/1612.00837v3.pdf
PWC https://paperswithcode.com/paper/making-the-v-in-vqa-matter-elevating-the-role
Repo https://github.com/SatyamGaba/visual_question_answering
Framework pytorch

Bags of Local Convolutional Features for Scalable Instance Search

Title Bags of Local Convolutional Features for Scalable Instance Search
Authors Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marques, Noel E. O’Connor, Xavier Giro-i-Nieto
Abstract This work proposes a simple instance retrieval pipeline based on encoding the convolutional features of CNN using the bag of words aggregation scheme (BoW). Assigning each local array of activations in a convolutional layer to a visual word produces an \textit{assignment map}, a compact representation that relates regions of an image with a visual word. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We demonstrate the suitability of the BoW representation based on local CNN features for instance retrieval, achieving competitive performance on the Oxford and Paris buildings benchmarks. We show that our proposed system for CNN feature aggregation with BoW outperforms state-of-the-art techniques using sum pooling at a subset of the challenging TRECVid INS benchmark.
Tasks Instance Search
Published 2016-04-15
URL http://arxiv.org/abs/1604.04653v1
PDF http://arxiv.org/pdf/1604.04653v1.pdf
PWC https://paperswithcode.com/paper/bags-of-local-convolutional-features-for
Repo https://github.com/hbwang1427/image_retrieval
Framework none
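
A minimal NumPy sketch of the assignment map and BoW encoding the abstract describes (the codebook would come from k-means on local CNN features; names are illustrative):

```python
import numpy as np

def bow_encode(feats, codebook):
    """feats: (H, W, C) activations of one conv layer; codebook: (K, C) visual words."""
    H, W, C = feats.shape
    flat = feats.reshape(-1, C)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # squared distances to words
    assign = d2.argmin(1).reshape(H, W)                            # the assignment map
    hist = np.bincount(assign.ravel(), minlength=len(codebook)).astype(float)
    return assign, hist / hist.sum()       # map for spatial reranking, normalized BoW for search
```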

OctNet: Learning Deep 3D Representations at High Resolutions

Title OctNet: Learning Deep 3D Representations at High Resolutions
Authors Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger
Abstract We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows us to focus memory allocation and computation on the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.
Tasks 3D Object Classification, Object Classification
Published 2016-11-15
URL http://arxiv.org/abs/1611.05009v4
PDF http://arxiv.org/pdf/1611.05009v4.pdf
PWC https://paperswithcode.com/paper/octnet-learning-deep-3d-representations-at
Repo https://github.com/griegler/octnet
Framework torch
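
A toy illustration of unbalanced octrees with pooled leaf features (OctNet additionally packs shallow octrees into a grid and defines convolutions on them; this only shows the memory-saving subdivision):

```python
import numpy as np

def build_octree(vol, depth):
    """vol: cubic array with side a power of two. Subdivide only non-uniform
    regions; store a pooled feature (here the mean) at each leaf."""
    if depth == 0 or vol.min() == vol.max():
        return {"leaf": True, "feat": float(vol.mean()), "size": vol.shape[0]}
    h = vol.shape[0] // 2
    children = [build_octree(vol[x:x + h, y:y + h, z:z + h], depth - 1)
                for x in (0, h) for y in (0, h) for z in (0, h)]
    return {"leaf": False, "children": children}

vol = np.zeros((8, 8, 8)); vol[:2, :2, :2] = 1.0   # sparse occupancy, as in typical 3D data
tree = build_octree(vol, depth=3)                  # deep subdivision only where the data is
```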

Stochastic Bouncy Particle Sampler

Title Stochastic Bouncy Particle Sampler
Authors Ari Pakman, Dar Gilboa, David Carlson, Liam Paninski
Abstract We introduce a novel stochastic version of the non-reversible, rejection-free Bouncy Particle Sampler (BPS), a Markov process whose sample trajectories are piecewise linear. The algorithm is based on simulating first arrival times in a doubly stochastic Poisson process using the thinning method, and allows efficient sampling of Bayesian posteriors on big datasets. We prove that in the BPS no bias is introduced by noisy evaluations of the log-likelihood gradient. On the other hand, we argue that efficiency considerations favor a small, controllable bias in the construction of the thinning proposals, in exchange for faster mixing. We introduce a simple regression-based proposal intensity for the thinning method that controls this trade-off. We illustrate the algorithm in several examples in which it outperforms both unbiased but slowly mixing stochastic versions of BPS and biased stochastic gradient-based samplers.
Tasks
Published 2016-09-03
URL http://arxiv.org/abs/1609.00770v3
PDF http://arxiv.org/pdf/1609.00770v3.pdf
PWC https://paperswithcode.com/paper/stochastic-bouncy-particle-sampler
Repo https://github.com/dargilboa/SBPS-public
Framework tf
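
The thinning step the abstract mentions is the classical Lewis-Shedler recipe for simulating the first arrival of an inhomogeneous Poisson process; in the sampler, that arrival triggers a velocity bounce. A minimal sketch (the bound must dominate the intensity):

```python
import numpy as np

def first_arrival_by_thinning(intensity, bound, t_max=100.0, rng=np.random.default_rng(0)):
    """intensity: callable with intensity(t) <= bound for all t in [0, t_max]."""
    t = 0.0
    while t < t_max:
        t += rng.exponential(1.0 / bound)        # candidate event from the bounding process
        if rng.random() < intensity(t) / bound:  # accept with probability intensity / bound
            return t
    return None

print(first_arrival_by_thinning(lambda t: 0.5 + 0.5 * np.sin(t), bound=1.0))
```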

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

Title Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee
Authors Alireza Aghasi, Afshin Abdi, Nam Nguyen, Justin Romberg
Abstract We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.
Tasks
Published 2016-11-16
URL http://arxiv.org/abs/1611.05162v4
PDF http://arxiv.org/pdf/1611.05162v4.pdf
PWC https://paperswithcode.com/paper/net-trim-convex-pruning-of-deep-neural
Repo https://github.com/DNNToolBox/Net-Trim-v1
Framework tf
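
The paper’s per-layer convex program keeps layer responses consistent while sparsifying the weights; its actual formulation handles the ReLU through a convex split. As a loose, simplified stand-in, an ISTA loop for the l1-regularized linear refit conveys the flavor:

```python
import numpy as np

def sparse_layer_refit(X, Y, lam=0.1, iters=2000):
    """Minimize 0.5 * ||XW - Y||_F^2 + lam * ||W||_1 by ISTA. X: (n, p) layer
    inputs, Y: (n, q) original pre-activations to reproduce. Not the paper's
    actual convex program, which accounts for the ReLU explicitly."""
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2)   # step size from the Lipschitz constant
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        W -= lr * (X.T @ (X @ W - Y))                             # gradient step on the fit
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)    # soft-thresholding
    return W
```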

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Title TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Authors Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
Abstract TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November 2015 and are available at www.tensorflow.org.
Tasks Dimensionality Reduction
Published 2016-03-14
URL http://arxiv.org/abs/1603.04467v2
PDF http://arxiv.org/pdf/1603.04467v2.pdf
PWC https://paperswithcode.com/paper/tensorflow-large-scale-machine-learning-on
Repo https://github.com/tensorflow/tensorflow
Framework tf
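
For flavor, a minimal dataflow computation in the modern TF 2.x API (the paper describes the original TF 1.x graph interface, whose surface syntax differs):

```python
import tensorflow as tf

@tf.function  # trace the Python function into a graph that runs on CPU, GPU, or TPU
def layer(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([4, 3])
w = tf.random.normal([3, 2])
b = tf.zeros([2])
print(layer(x, w, b))
```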

A Framework for Fast Image Deconvolution with Incomplete Observations

Title A Framework for Fast Image Deconvolution with Incomplete Observations
Authors Miguel Simões, Luis B. Almeida, José Bioucas-Dias, Jocelyn Chanussot
Abstract In image deconvolution problems, the diagonalization of the underlying operators by means of the FFT usually yields very large speedups. When there are incomplete observations (e.g., in the case of unknown boundaries), standard deconvolution techniques normally involve non-diagonalizable operators, resulting in rather slow methods, or, otherwise, use inexact convolution models, resulting in the occurrence of artifacts in the enhanced images. In this paper, we propose a new deconvolution framework for images with incomplete observations that allows us to work with diagonalized convolution operators, and therefore is very fast. We iteratively alternate the estimation of the unknown pixels and of the deconvolved image, using, e.g., an FFT-based deconvolution method. This framework is an efficient, high-quality alternative to existing methods of dealing with the image boundaries, such as edge tapering. It can be used with any fast deconvolution method. We give an example in which a state-of-the-art method that assumes periodic boundary conditions is extended, through the use of this framework, to unknown boundary conditions. Furthermore, we propose a specific implementation of this framework, based on the alternating direction method of multipliers (ADMM). We provide a proof of convergence for the resulting algorithm, which can be seen as a “partial” ADMM, in which not all variables are dualized. We report experimental comparisons with other primal-dual methods, where the proposed one performed at the level of the state of the art. Four different kinds of applications were tested in the experiments: deconvolution, deconvolution with inpainting, superresolution, and demosaicing, all with unknown boundaries.
Tasks Demosaicking, Image Deconvolution
Published 2016-02-03
URL http://arxiv.org/abs/1602.01410v2
PDF http://arxiv.org/pdf/1602.01410v2.pdf
PWC https://paperswithcode.com/paper/a-framework-for-fast-image-deconvolution-with
Repo https://github.com/alfaiate/DeconvolutionIncompleteObs
Framework none
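
What makes the proposed alternation fast is that, with periodic boundaries, the blur operator is diagonalized by the FFT, so each deconvolution step reduces to a pointwise division. A minimal sketch of one such step (the paper alternates this with estimating the unobserved pixels):

```python
import numpy as np

def fft_deconvolve(y, psf, eps=1e-2):
    """Regularized inverse filter under circular boundary conditions.
    y: blurred image; psf: blur kernel; eps: regularization weight."""
    K = np.fft.fft2(psf, s=y.shape)                           # diagonalized blur operator
    X = np.conj(K) * np.fft.fft2(y) / (np.abs(K) ** 2 + eps)  # pointwise Wiener-style division
    return np.real(np.fft.ifft2(X))
```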

Understanding intermediate layers using linear classifier probes

Title Understanding intermediate layers using linear classifier probes
Authors Guillaume Alain, Yoshua Bengio
Abstract Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of a model and measure how suitable they are for classification. We use linear classifiers, which we refer to as “probes”, trained entirely independently of the model itself. This helps us better understand the roles and dynamics of the intermediate layers. We demonstrate how this can be used to develop a better intuition about models and to diagnose potential problems. We apply this technique to the popular models Inception v3 and ResNet-50. Among other things, we observe experimentally that the linear separability of features increases monotonically along the depth of the model.
Tasks
Published 2016-10-05
URL http://arxiv.org/abs/1610.01644v4
PDF http://arxiv.org/pdf/1610.01644v4.pdf
PWC https://paperswithcode.com/paper/understanding-intermediate-layers-using
Repo https://github.com/ceegeechow/ECE471
Framework tf
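
The probing recipe itself is simple: freeze the model, extract features at a layer, and fit an independent linear classifier. A minimal sketch with scikit-learn (the random features here are placeholders for real layer activations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(features, labels):
    """Fit a linear probe on frozen intermediate features; its accuracy is a
    proxy for how linearly separable the classes are at that layer."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf.score(features, labels)  # a held-out split would be used in practice

rng = np.random.default_rng(0)
print(probe_accuracy(rng.normal(size=(200, 64)), rng.integers(0, 10, 200)))
```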

Bayesian Estimation of Bipartite Matchings for Record Linkage

Title Bayesian Estimation of Bipartite Matchings for Record Linkage
Authors Mauricio Sadinle
Abstract The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers, and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods by merging two datafiles on casualties from the civil war of El Salvador.
Tasks
Published 2016-01-25
URL http://arxiv.org/abs/1601.06630v1
PDF http://arxiv.org/pdf/1601.06630v1.pdf
PWC https://paperswithcode.com/paper/bayesian-estimation-of-bipartite-matchings
Repo https://github.com/msadinle/BRL
Framework none
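
One way to see the “bipartite matching as the parameter” view: given posterior pairwise match probabilities, a point estimate must respect one-to-one constraints. A crude sketch using linear assignment (a stand-in for the paper’s Bayes estimators, which are derived from explicit loss functions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_point_estimate(match_prob, threshold=0.5):
    """match_prob: (n1, n2) posterior probabilities that record i matches record j.
    Maximize total matched probability one-to-one, then drop low-confidence pairs."""
    rows, cols = linear_sum_assignment(-match_prob)  # negate to maximize
    keep = match_prob[rows, cols] > threshold        # leave uncertain records unlinked
    return list(zip(rows[keep], cols[keep]))
```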

Tracking the World State with Recurrent Entity Networks

Title Tracking the World State with Recurrent Entity Networks
Authors Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun
Abstract We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed size memory and can learn to perform location and content-based read and write operations. However, unlike those models it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large scale datasets such as Children’s Book Test, where it obtains competitive performance, reading the story in a single pass.
Tasks Question Answering
Published 2016-12-12
URL http://arxiv.org/abs/1612.03969v3
PDF http://arxiv.org/pdf/1612.03969v3.pdf
PWC https://paperswithcode.com/paper/tracking-the-world-state-with-recurrent
Repo https://github.com/hsakas/Recurrent-Entity-Network-EntNet
Framework tf
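
The parallel write the abstract highlights is a gated update applied to every memory slot at once. A per-sentence NumPy sketch (tanh stands in for the paper’s PReLU; shapes are illustrative):

```python
import numpy as np

def entnet_step(H, keys, s, U, V, W):
    """H: (J, d) memory slots; keys: (J, d) slot keys; s: (d,) encoded sentence;
    U, V, W: (d, d) shared weight matrices."""
    g = 1.0 / (1.0 + np.exp(-(H @ s + keys @ s)))        # gate per slot: content + location match
    H_tilde = np.tanh(H @ U.T + keys @ V.T + s @ W.T)    # candidate new content
    H = H + g[:, None] * H_tilde                         # all slots updated in parallel
    return H / np.linalg.norm(H, axis=1, keepdims=True)  # normalization acts as forgetting
```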

Image-based localization using LSTMs for structured feature correlation

Title Image-based localization using LSTMs for structured feature correlation
Authors Florian Walch, Caner Hazirbas, Laura Leal-Taixé, Torsten Sattler, Sebastian Hilsenbeck, Daniel Cremers
Abstract In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes. CNNs allow us to learn suitable feature representations for localization that are robust against motion blur and illumination changes. We make use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance. We provide an extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the weaknesses and strengths of each. Furthermore, we present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results on both indoor and outdoor public datasets show that our method outperforms existing deep architectures, and can localize images in hard conditions, e.g., in the presence of mostly textureless surfaces, where classic SIFT-based methods fail.
Tasks Dimensionality Reduction, Image-Based Localization
Published 2016-11-23
URL http://arxiv.org/abs/1611.07890v4
PDF http://arxiv.org/pdf/1611.07890v4.pdf
PWC https://paperswithcode.com/paper/image-based-localization-using-lstms-for
Repo https://github.com/NavVisResearch/NavVis-Indoor-Dataset
Framework none
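
A simplified PyTorch sketch of the CNN+LSTM idea: run an LSTM over the CNN feature map as a structured dimensionality reduction, then regress position and an orientation quaternion (the paper runs four LSTMs in different directions; the backbone here is a stand-in):

```python
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 7, stride=4), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(8))        # stand-in CNN
        self.lstm = nn.LSTM(input_size=8 * 8, hidden_size=hidden, batch_first=True)
        self.fc_xyz = nn.Linear(hidden, 3)  # position
        self.fc_q = nn.Linear(hidden, 4)    # orientation quaternion

    def forward(self, img):
        f = self.backbone(img)              # (B, 32, 8, 8)
        seq = f.flatten(2)                  # (B, 32, 64): channels as a sequence
        _, (h, _) = self.lstm(seq)          # structured reduction of the feature vector
        return self.fc_xyz(h[-1]), self.fc_q(h[-1])

xyz, q = PoseLSTM()(torch.randn(1, 3, 224, 224))
```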