Paper Group AWR 101
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models. How to train your MAML. A Corpus for Reasoning About Natural Language Grounded in Photographs. Collective Entity Disambiguation with Structured Gradient Tree Boosting. Sampling Theory for Graph Signals on Product Graphs. Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study. …
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models
Title | Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models |
Authors | Mohammad Emtiyaz Khan, Didrik Nielsen |
Abstract | Bayesian inference plays an important role in advancing machine learning, but faces computational challenges when applied to complex models such as deep neural networks. Variational inference circumvents these challenges by formulating Bayesian inference as an optimization problem and solving it using gradient-based optimization. In this paper, we argue in favor of natural-gradient approaches which, unlike their gradient-based counterparts, can improve convergence by exploiting the information geometry of the solutions. We show how to derive fast yet simple natural-gradient updates by using a duality associated with exponential-family distributions. An attractive feature of these methods is that, by using natural-gradients, they are able to extract accurate local approximations for individual model components. We summarize recent results for Bayesian deep learning showing the superiority of natural-gradient approaches over their gradient counterparts. |
Tasks | Bayesian Inference |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04489v2 |
PDF | http://arxiv.org/pdf/1807.04489v2.pdf |
PWC | https://paperswithcode.com/paper/fast-yet-simple-natural-gradient-descent-for |
Repo | https://github.com/ssggreg/active_learning |
Framework | pytorch |
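The key computational trick is the exponential-family duality the abstract mentions: the natural gradient of an objective with respect to the natural parameters equals the ordinary gradient with respect to the expectation (mean) parameters. Below is a minimal, self-contained sketch of that update for a 1-D Gaussian fitted to a fixed Gaussian target by minimizing KL divergence; it is our illustration, not the authors' code, and all names and step sizes are ours.

```python
# Exponential-family duality in 1-D: the natural gradient w.r.t. the natural
# parameters (lam1, lam2) equals the plain gradient w.r.t. the mean parameters
# (mu1, mu2) = (E[z], E[z^2]). Toy objective: KL(q || p) with both Gaussian.
import numpy as np

target_m, target_v = 2.0, 0.5   # target p = N(target_m, target_v)

def kl_to_target(m, v):
    """KL(q || p) for q = N(m, v)."""
    return 0.5 * (np.log(target_v / v) + (v + (m - target_m) ** 2) / target_v - 1.0)

m, v = 0.0, 1.0                 # initial q
lr = 0.1
for step in range(100):
    # Gradient of KL w.r.t. (m, v):
    dKL_dm = (m - target_m) / target_v
    dKL_dv = 0.5 * (1.0 / target_v - 1.0 / v)
    # Chain rule to mean parameters, using m = mu1 and v = mu2 - mu1^2:
    g_mu1 = dKL_dm - 2.0 * m * dKL_dv
    g_mu2 = dKL_dv
    # Natural-gradient step: move the *natural* parameters lam1 = m/v,
    # lam2 = -1/(2v) along the *mean*-parameter gradient.
    lam1, lam2 = m / v, -0.5 / v
    lam1 -= lr * g_mu1
    lam2 -= lr * g_mu2
    v = -0.5 / lam2
    m = lam1 * v
print(f"fitted m={m:.3f}, v={v:.3f}  (target {target_m}, {target_v})")
```

In this toy problem each natural-gradient step turns out to be a convex combination in natural-parameter space, so the variance stays positive for any step size in (0, 1], which is one flavor of the "fast yet simple" behavior the paper argues for.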
How to train your MAML
Title | How to train your MAML |
Authors | Antreas Antoniou, Harrison Edwards, Amos Storkey |
Abstract | The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model-Agnostic Meta-Learning (MAML) is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant, and very powerful; however, it has a variety of issues: it is very sensitive to neural network architectures, often unstable during training, requires arduous hyperparameter searches to stabilize training and achieve high generalization, and is very computationally expensive at both training and inference time. In this paper, we propose various modifications to MAML that not only stabilize the system but also substantially improve its generalization performance and convergence speed while reducing its computational overhead; we call the resulting method MAML++. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Meta-Learning |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09502v3 |
PDF | http://arxiv.org/pdf/1810.09502v3.pdf |
PWC | https://paperswithcode.com/paper/how-to-train-your-maml |
Repo | https://github.com/AntreasAntoniou/HowToTrainYourMAMLPytorch |
Framework | pytorch |
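One of the stabilizing modifications proposed in the paper is a multi-step loss (MSL) that weights the query-set loss after every inner-loop step rather than only the last one. The sketch below illustrates that idea on a toy 1-D linear-regression meta-learning problem; the model, task distribution, and hyperparameters are our own choices, not the paper's.

```python
# Toy MAML with a multi-step loss: every inner-loop step contributes a weighted
# query loss, instead of only the final adapted parameters being scored.
import torch

def forward(x, w, b):
    return x * w + b

def msl_task_loss(x_s, y_s, x_q, y_q, w, b, inner_lr, step_weights):
    """Inner-loop adaptation with a per-step weighted query loss (MSL)."""
    w_t, b_t = w, b
    total = 0.0
    for weight in step_weights:
        support = ((forward(x_s, w_t, b_t) - y_s) ** 2).mean()
        gw, gb = torch.autograd.grad(support, (w_t, b_t), create_graph=True)
        w_t, b_t = w_t - inner_lr * gw, b_t - inner_lr * gb
        total = total + weight * ((forward(x_q, w_t, b_t) - y_q) ** 2).mean()
    return total

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=1e-2)
step_weights = [0.1, 0.2, 0.7]            # anneal weight toward the final step
for it in range(200):
    slope = torch.rand(1) * 4 - 2          # sample a task: y = slope * x
    x_s, x_q = torch.randn(10), torch.randn(10)
    loss = msl_task_loss(x_s, slope * x_s, x_q, slope * x_q,
                         w, b, inner_lr=0.05, step_weights=step_weights)
    opt.zero_grad(); loss.backward(); opt.step()
```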
A Corpus for Reasoning About Natural Language Grounded in Photographs
Title | A Corpus for Reasoning About Natural Language Grounded in Photographs |
Authors | Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi |
Abstract | We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge. |
Tasks | Visual Reasoning |
Published | 2018-11-01 |
URL | https://arxiv.org/abs/1811.00491v3 |
PDF | https://arxiv.org/pdf/1811.00491v3.pdf |
PWC | https://paperswithcode.com/paper/a-corpus-for-reasoning-about-natural-language |
Repo | https://github.com/vortexJCH/nlvr |
Framework | none |
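For orientation, a tiny loader and majority-class baseline over the released examples might look like the following. We assume a JSON-lines file whose records carry `sentence` and `label` fields; that format is our guess for illustration, not something confirmed from the repository.

```python
# Load the dataset (assumed JSONL) and score a trivial majority-class baseline,
# a useful floor for the binary "is this caption true of the image pair?" task.
import json
from collections import Counter

def load_examples(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

examples = load_examples("dev.json")                 # hypothetical split file
labels = [ex["label"] for ex in examples]            # assumed field name
majority, count = Counter(labels).most_common(1)[0]
print(f"majority-class baseline ('{majority}'): {count / len(labels):.3f} accuracy")
```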
Collective Entity Disambiguation with Structured Gradient Tree Boosting
Title | Collective Entity Disambiguation with Structured Gradient Tree Boosting |
Authors | Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman |
Abstract | We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most prior work limits gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, our work is the first to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system is able to produce globally optimized entity assignments for mentions in a document. Exact inference is prohibitively expensive for our globally normalized model. To solve this problem, we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of the standard beam search algorithm. BiBSG makes use of global information from both past and future to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results. Specifically, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset. |
Tasks | Entity Disambiguation |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10229v2 |
PDF | http://arxiv.org/pdf/1802.10229v2.pdf |
PWC | https://paperswithcode.com/paper/collective-entity-disambiguation-with |
Repo | https://github.com/bloomberg/sgtb |
Framework | none |
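To make the inference step concrete, here is a schematic (unidirectional) beam search over candidate entities per mention. The paper's BiBSG additionally runs bidirectionally and keeps the gold path during training; `score_fn` stands in for the SGTB-learned local-plus-global scorer.

```python
# Schematic beam search for collective entity disambiguation: keep the top-k
# partial entity assignments, extending each with every candidate in turn.
def beam_search(mentions, candidates, score_fn, beam_size=4):
    beams = [([], 0.0)]                     # (entity sequence, cumulative score)
    for i, mention in enumerate(mentions):
        expanded = [
            (seq + [cand], score + score_fn(seq, mention, cand))
            for seq, score in beams
            for cand in candidates[i]
        ]
        beams = sorted(expanded, key=lambda t: t[1], reverse=True)[:beam_size]
    return beams[0][0]                      # best-scoring entity assignment

# Toy usage: a prior-based scorer over hypothetical candidate entities.
mentions = ["Paris", "France"]
candidates = [["Paris_(city)", "Paris_Hilton"], ["France", "France_(ship)"]]
prior = {"Paris_(city)": 0.9, "Paris_Hilton": 0.1, "France": 0.95, "France_(ship)": 0.05}
score_fn = lambda seq, m, c: prior[c]
print(beam_search(mentions, candidates, score_fn))  # ['Paris_(city)', 'France']
```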
Sampling Theory for Graph Signals on Product Graphs
Title | Sampling Theory for Graph Signals on Product Graphs |
Authors | Rohan Varma, Jelena Kovačević |
Abstract | In this paper, we extend the sampling theory on graphs by constructing a framework that exploits the structure in product graphs for efficient sampling and recovery of bandlimited graph signals that lie on them. Product graphs are composed from smaller graph atoms; we argue that this is a flexible and useful way to model richer classes of data that can be multi-modal in nature. Previous works have established a sampling theory on graphs for bandlimited signals. Importantly, our framework achieves significant savings in both sample complexity and computational complexity. |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10049v1 |
PDF | http://arxiv.org/pdf/1809.10049v1.pdf |
PWC | https://paperswithcode.com/paper/sampling-theory-for-graph-signals-on-product |
Repo | https://github.com/CrowdArt/node-chat-app |
Framework | none |
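The structure being exploited is easy to verify numerically: for a Cartesian product graph, the Laplacian is the Kronecker sum of the factor Laplacians, so the graph Fourier basis used to define bandlimited signals factors into the much smaller bases of the atoms. A short NumPy check (ours):

```python
# The Laplacian of a Cartesian product graph is the Kronecker sum of the factor
# Laplacians; its eigenvalues are all pairwise sums of the factor eigenvalues.
import numpy as np

def path_laplacian(n):
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(1)) - A

L1, L2 = path_laplacian(4), path_laplacian(3)
I1, I2 = np.eye(4), np.eye(3)
L_prod = np.kron(L1, I2) + np.kron(I1, L2)        # Kronecker sum

w1, _ = np.linalg.eigh(L1)
w2, _ = np.linalg.eigh(L2)
w_prod = np.sort((w1[:, None] + w2[None, :]).ravel())
assert np.allclose(w_prod, np.sort(np.linalg.eigh(L_prod)[0]))
print("product-graph spectrum recovered from the factor spectra")
```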
Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study
Title | Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study |
Authors | Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara |
Abstract | Graph embeddings have become a key and widely used technique within the field of graph mining, proving successful across a broad range of domains including social, citation, transportation, and biological networks. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned during the embedding process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex-level graph features. If such a relationship can be found, it could be used to provide a theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present extensive experimental evaluation of five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated by the embedding space, allowing key insight into how graph embeddings create good representations. |
Tasks | Graph Embedding |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07464v1 |
PDF | http://arxiv.org/pdf/1806.07464v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-the-semantic-content-of |
Repo | https://github.com/sbonner0/unsupervised-graph-embeddings |
Framework | tf |
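The probing methodology reduces to a simple recipe: regress a known topological feature on the embedding vectors and measure how well it is recovered. A compact stand-in version (ours, with a random projection of the adjacency matrix in place of a trained embedding, which any real embedding would replace) looks like this:

```python
# Linear probe: predict a vertex-level topological feature (degree) directly
# from node embedding vectors, and report R^2 as the recovery score.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n, d = 200, 16
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random undirected graph
degree = A.sum(1)
emb = A @ rng.standard_normal((n, d))              # stand-in "embedding"

X = np.hstack([emb, np.ones((n, 1))])              # linear probe with bias
coef, *_ = lstsq(X, degree, rcond=None)
pred = X @ coef
r2 = 1 - ((degree - pred) ** 2).sum() / ((degree - degree.mean()) ** 2).sum()
print(f"R^2 of degree predicted from the embedding: {r2:.3f}")
```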
Learning Private Neural Language Modeling with Attentive Aggregation
Title | Learning Private Neural Language Modeling with Attentive Aggregation |
Authors | Shaoxiong Ji, Shirui Pan, Guodong Long, Xue Li, Jing Jiang, Zi Huang |
Abstract | Mobile keyboard suggestion is typically regarded as a word-level language modeling problem. Centralized machine learning techniques require massive amounts of user data for training, which may raise privacy concerns over users' sensitive personal typing data. Federated learning (FL) provides a promising approach to learning private language models for intelligent personalized keyboard suggestion by training models on distributed clients rather than on a central server. To obtain a global model for prediction, existing FL algorithms simply average the client models, ignoring the importance of each client during model aggregation. Furthermore, there is no optimization for learning a well-generalized global model on the central server. To solve these problems, we propose a novel model aggregation method that uses an attention mechanism to weigh the contribution of each client model to the global model, together with an optimization technique applied during server-side aggregation. Our proposed attentive aggregation method minimizes the weighted distance between the server model and the client models through iterative parameter updates, attending to the distance between the server model and each client model. Through experiments on two popular language modeling datasets and a social media dataset, our proposed method outperforms its counterparts in terms of perplexity and communication cost in most settings. |
Tasks | Language Modelling |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.07108v2 |
PDF | http://arxiv.org/pdf/1812.07108v2.pdf |
PWC | https://paperswithcode.com/paper/learning-private-neural-language-modeling |
Repo | https://github.com/shaoxiongji/fed-att |
Framework | pytorch |
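A condensed sketch of the aggregation rule described in the abstract follows: the server scores each client by its parameter-space distance to the current server model, turns the scores into attention weights with a softmax, and steps toward the attention-weighted combination. Whether nearer or farther clients should receive more weight, and the step size `epsilon`, are our reading and our choices, not details confirmed from the paper.

```python
# Attentive server-side aggregation: softmax over (negative) parameter-space
# distances, then a step toward the attention-weighted client update.
import numpy as np

def attentive_aggregate(server_w, client_ws, epsilon=1.0):
    dists = np.array([np.linalg.norm(server_w - w) for w in client_ws])
    att = np.exp(-dists) / np.exp(-dists).sum()    # closer clients attend more
    update = sum(a * (w - server_w) for a, w in zip(att, client_ws))
    return server_w + epsilon * update

server = np.zeros(4)
clients = [np.array([1.0, 0, 0, 0]), np.array([0.9, 0.1, 0, 0]),
           np.array([5.0, 5, 5, 5])]               # last client is an outlier
print(attentive_aggregate(server, clients))        # outlier is down-weighted
```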
Meta-Learning Probabilistic Inference For Prediction
Title | Meta-Learning Probabilistic Inference For Prediction |
Authors | Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, Richard E. Turner |
Abstract | This paper introduces a new framework for data efficient and versatile learning. Specifically: 1) We develop ML-PIP, a general framework for Meta-Learning approximate Probabilistic Inference for Prediction. ML-PIP extends existing probabilistic interpretations of meta-learning to cover a broad class of methods. 2) We introduce VERSA, an instance of the framework employing a flexible and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of shots, and outputs a distribution over task-specific parameters in a single forward pass. VERSA substitutes optimization at test time with forward passes through inference networks, amortizing the cost of inference and relieving the need for second derivatives during training. 3) We evaluate VERSA on benchmark datasets where the method sets new state-of-the-art results, handles arbitrary numbers of shots, and for classification, arbitrary numbers of classes at train and test time. The power of the approach is then demonstrated through a challenging few-shot ShapeNet view reconstruction task. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2018-05-24 |
URL | https://arxiv.org/abs/1805.09921v4 |
PDF | https://arxiv.org/pdf/1805.09921v4.pdf |
PWC | https://paperswithcode.com/paper/meta-learning-probabilistic-inference-for |
Repo | https://github.com/Gordonjo/versa |
Framework | tf |
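The amortization idea can be shown in miniature: a network pools a class's support features and emits a distribution over that class's linear classifier weights in a single forward pass, so no test-time optimization is needed. The toy head below is our sketch, not the authors' architecture.

```python
# Amortized task inference in the spirit of VERSA: pool the support set, emit a
# mean and log-variance over per-class classifier weights, sample once.
import torch
import torch.nn as nn

class AmortizedHead(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * feat_dim))  # mean, logvar

    def forward(self, support_feats):              # (shots, feat_dim), one class
        stats = self.net(support_feats).mean(0)    # pool over shots
        mu, logvar = stats.chunk(2)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sampled weights

feat_dim, shots = 8, 5
head = AmortizedHead(feat_dim)
support = torch.randn(shots, feat_dim)             # support features, one class
w_class = head(support)                            # class weight vector
query = torch.randn(3, feat_dim)
logits = query @ w_class                           # class scores for 3 queries
print(logits.shape)                                # torch.Size([3])
```

Because adaptation is a single forward pass, handling an arbitrary number of shots falls out of the pooling step, which is the property the abstract highlights.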
Quaternion Recurrent Neural Networks
Title | Quaternion Recurrent Neural Networks |
Authors | Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio |
Abstract | Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), along with a quaternion long short-term memory network (QLSTM), that take into account both the external relations and these internal structural dependencies via the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to encode internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performance than RNN and LSTM in a realistic automatic speech recognition application. Finally, we show that QRNN and QLSTM reduce the number of free parameters needed to reach better results by a factor of up to 3.3x compared to real-valued RNNs and LSTMs, leading to a more compact representation of the relevant information. |
Tasks | Speech Recognition |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04418v3 |
PDF | http://arxiv.org/pdf/1806.04418v3.pdf |
PWC | https://paperswithcode.com/paper/quaternion-recurrent-neural-networks |
Repo | https://github.com/Riccardo-Vecchi/Pytorch-Quaternion-Neural-Networks |
Framework | pytorch |
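The algebraic ingredient that replaces real-valued matrix-vector products in a QRNN is the (non-commutative) Hamilton product of quaternions. A minimal NumPy implementation with a sanity check:

```python
# Hamilton product of two quaternions q = (r, x, y, z); this is the operation a
# QRNN applies between quaternion weights and quaternion-valued activations.
import numpy as np

def hamilton(p, q):
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 - x1*z2 + y1*r2 + z1*x2,
        r1*z2 + x1*y2 - y1*x2 + z1*r2,
    ])

i, j, k = np.eye(4)[1], np.eye(4)[2], np.eye(4)[3]
assert np.allclose(hamilton(i, j), k)      # i * j = k
assert np.allclose(hamilton(j, i), -k)     # non-commutative: j * i = -k
```

Because one Hamilton product mixes all four components with a single shared set of coefficients, a quaternion layer needs roughly a quarter of the parameters of the equivalent real-valued layer, which is the source of the parameter savings the abstract reports.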
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Title | Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates |
Authors | Taku Kudo |
Abstract | Subword units are an effective way to alleviate the open-vocabulary problems in neural machine translation (NMT). While sentences are usually converted into unique subword sequences, subword segmentation is potentially ambiguous, and multiple segmentations are possible even with the same vocabulary. The question addressed in this paper is whether it is possible to harness this segmentation ambiguity as noise to improve the robustness of NMT. We present a simple regularization method, subword regularization, which trains the model with multiple subword segmentations probabilistically sampled during training. In addition, for better subword sampling, we propose a new subword segmentation algorithm based on a unigram language model. We experiment with multiple corpora and report consistent improvements, especially in low-resource and out-of-domain settings. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10959v1 |
PDF | http://arxiv.org/pdf/1804.10959v1.pdf |
PWC | https://paperswithcode.com/paper/subword-regularization-improving-neural |
Repo | https://github.com/Waino/OpenNMT-py |
Framework | pytorch |
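Subword regularization is available in the author's SentencePiece library: with a trained unigram model, each call can sample a different segmentation of the same sentence. A short sketch follows; the model path `m.model` is an assumption, and `alpha` is the sampling-smoothing temperature.

```python
# Sample multiple subword segmentations of the same sentence from a trained
# unigram SentencePiece model (the on-disk model file is assumed to exist).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")
text = "subword regularization improves robustness"
for _ in range(3):  # the same sentence, segmented differently each time
    pieces = sp.encode(text, out_type=str, enable_sampling=True,
                       nbest_size=-1, alpha=0.1)
    print(pieces)
```

During NMT training, each epoch re-tokenizes every sentence this way, so the model sees many segmentations of the same data, which is the regularization effect.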
Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding
Title | Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding |
Authors | Ga Wu, Justin Domke, Scott Sanner |
Abstract | Variational Autoencoders (VAEs) are a popular generative model, but one in which conditional inference can be challenging. If the decomposition into query and evidence variables is fixed, conditional VAEs provide an attractive solution. To support arbitrary queries, one is generally reduced to Markov Chain Monte Carlo sampling methods that can suffer from long mixing times. In this paper, we propose an idea we term cross-coding to approximate the distribution over the latent variables after conditioning on an evidence assignment to some subset of the variables. This allows generating query samples without retraining the full VAE. We experimentally evaluate three variations of cross-coding showing that (i) they can be quickly optimized for different decompositions of evidence and query and (ii) they quantitatively and qualitatively outperform Hamiltonian Monte Carlo. |
Tasks | |
Published | 2018-05-20 |
URL | http://arxiv.org/abs/1805.07785v2 |
PDF | http://arxiv.org/pdf/1805.07785v2.pdf |
PWC | https://paperswithcode.com/paper/conditional-inference-in-pre-trained |
Repo | https://github.com/wuga214/XCoder_VAE_Conditional_Inference |
Framework | tf |
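A toy rendering of the cross-coding idea (ours, not the released code): freeze a pre-trained decoder, then fit a small "cross-coder" from fresh noise into the latent space so that the decoded evidence dimensions match their observed values; sampling the noise afterwards yields conditional samples for the query dimensions. The real method also accounts for the latent prior, which this sketch omits.

```python
# Cross-coding sketch: only the noise-to-latent map is trained; the pre-trained
# VAE decoder stays fixed, so no retraining of the full model is needed.
import torch
import torch.nn as nn

latent, obs = 4, 6
decoder = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, obs))
for p in decoder.parameters():
    p.requires_grad_(False)                # stands in for a pre-trained decoder

xcoder = nn.Linear(2, latent)              # noise -> latent cross-coder
evidence_idx = torch.tensor([0, 1])        # observed dimensions of x
evidence_val = torch.tensor([0.5, -0.3])   # their observed values

opt = torch.optim.Adam(xcoder.parameters(), lr=1e-2)
for step in range(500):
    eps = torch.randn(64, 2)
    x = decoder(xcoder(eps))
    loss = ((x[:, evidence_idx] - evidence_val) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

samples = decoder(xcoder(torch.randn(5, 2)))  # evidence matched, queries sampled
print(samples.shape)                          # torch.Size([5, 6])
```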
What made you do this? Understanding black-box decisions with sufficient input subsets
Title | What made you do this? Understanding black-box decisions with sufficient input subsets |
Authors | Brandon Carter, Jonas Mueller, Siddhartha Jain, David Gifford |
Abstract | Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. Existing techniques are often restricted to a specific type of predictor or based on input saliency, which may be undesirably sensitive to factors unrelated to the model’s decision making process. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. General principles that globally govern a model’s decision-making can also be revealed by searching for clusters of such input patterns across many data points. Our approach is conceptually straightforward, entirely model-agnostic, simply implemented using instance-wise backward selection, and able to produce more concise rationales than existing techniques. We demonstrate the utility of our interpretation method on various neural network models trained on text, image, and genomic data. |
Tasks | Decision Making |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.03805v2 |
PDF | http://arxiv.org/pdf/1810.03805v2.pdf |
PWC | https://paperswithcode.com/paper/what-made-you-do-this-understanding-black-box |
Repo | https://github.com/b-carter/SufficientInputSubsets |
Framework | tf |
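The abstract's "instance-wise backward selection" admits a very direct implementation: repeatedly mask whichever remaining feature hurts the model's score least, and stop when any further masking would push the score below the decision threshold; the features still unmasked form a sufficient input subset. Our simplified version (the paper's procedure differs in details):

```python
# Greedy backward selection for a sufficient input subset: masking anything
# beyond the returned features would break the model's decision.
import numpy as np

def sufficient_subset(f, x, mask_value, threshold):
    alive = set(range(len(x)))
    x = x.copy()
    while True:
        best, best_score = None, -np.inf
        for i in alive:                     # try masking each remaining feature
            x_try = x.copy(); x_try[i] = mask_value
            s = f(x_try)
            if s > best_score:
                best, best_score = i, s
        if best is None or best_score < threshold:
            return sorted(alive)            # any further masking breaks the decision
        x[best] = mask_value
        alive.remove(best)

f = lambda x: x[0] + x[2]                   # toy "model": only dims 0 and 2 matter
print(sufficient_subset(f, np.array([1.0, 9.0, 1.0, 9.0]), 0.0, threshold=1.0))
```

Note the procedure is entirely model-agnostic: `f` is queried as a black box, exactly as the abstract claims.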
EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images
Title | EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images |
Authors | Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, Seon Joo Kim |
Abstract | Light field cameras capture both the spatial and the angular properties of light rays in space. Owing to this property, one can compute depth from light fields in uncontrolled lighting environments, which is a big advantage over active sensing devices. Depth computed from light fields can be used for many applications, including 3D modelling and refocusing. However, light field images from hand-held cameras have very narrow baselines and noise, making depth estimation difficult. Many approaches have been proposed to overcome these limitations for light field depth estimation, but there is a clear trade-off between accuracy and speed in these methods. In this paper, we introduce a fast and accurate light field depth estimation method based on a fully-convolutional neural network. Our network is designed by considering the light field geometry, and we also overcome the lack of training data by proposing light-field-specific data augmentation methods. We achieved the top rank in the HCI 4D Light Field Benchmark on most metrics, and we also demonstrate the effectiveness of the proposed method on real-world light field images. |
Tasks | Data Augmentation, Depth Estimation |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02379v1 |
PDF | http://arxiv.org/pdf/1804.02379v1.pdf |
PWC | https://paperswithcode.com/paper/epinet-a-fully-convolutional-neural-network |
Repo | https://github.com/chshin10/epinet |
Framework | tf |
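EPINET itself is a CNN, but the epipolar cue it is built around is simple to demonstrate: a scene point at a given disparity lines up across the light field views once each view is shifted in proportion to its baseline offset, so the best disparity minimizes a photo-consistency cost across the shifted views. A tiny horizontal-baseline NumPy sketch (ours, not the network):

```python
# Classic shift-and-compare depth from a 1-D camera array: for each candidate
# disparity, shift every view by (disparity x offset) and measure the variance
# across views; the minimum-variance disparity wins per pixel.
import numpy as np

def depth_from_views(views, offsets, disparities):
    costs = []
    for d in disparities:
        shifted = [np.roll(v, int(round(d * o)), axis=1)
                   for v, o in zip(views, offsets)]
        costs.append(np.var(np.stack(shifted), axis=0))  # photo-consistency
    return np.array(disparities)[np.argmin(np.stack(costs), axis=0)]

# Synthetic check: a vertical stripe displaced per view by true disparity 2.
offsets = np.array([-1, 0, 1])
base = np.zeros((16, 32)); base[:, 10:12] = 1.0
views = np.stack([np.roll(base, -2 * o, axis=1) for o in offsets])
disp = depth_from_views(views, offsets, disparities=range(-3, 4))
print(disp[8, 10])   # -> 2 at the stripe
```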
Teaching Machines to Code: Neural Markup Generation with Visual Attention
Title | Teaching Machines to Code: Neural Markup Generation with Visual Attention |
Authors | Sumeet S. Singh |
Abstract | We present a neural transducer model with visual attention that learns to generate LaTeX markup for a real-world math formula given its image. Applying sequence modeling and transduction techniques that have been very successful across modalities such as natural language, image, handwriting, speech, and audio, we construct an image-to-markup model that learns to produce syntactically and semantically correct LaTeX markup code over 150 words long and achieves a BLEU score of 89%, improving upon the previous state-of-the-art for the Im2Latex problem. We also demonstrate with heat-map visualization how attention helps in interpreting the model and can pinpoint (detect and localize) symbols on the image accurately, despite having been trained without any bounding box data. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05415v2 |
PDF | http://arxiv.org/pdf/1802.05415v2.pdf |
PWC | https://paperswithcode.com/paper/teaching-machines-to-code-neural-markup |
Repo | https://github.com/untrix/im2latex |
Framework | tf |
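At the core of such image-to-markup models is a content-based visual attention step: the decoder state queries a grid of CNN image features and receives a weighted "glimpse" used to emit the next LaTeX token. A generic sketch of that step (our simplification, not the paper's exact architecture):

```python
# One content-based attention step over a flattened CNN feature grid: score
# every image location against the decoder state, softmax to an attention map,
# and pool the features into a context vector ("glimpse") for the next token.
import torch
import torch.nn.functional as F

H = W = 8; feat_dim = 64; dec_dim = 32
features = torch.randn(H * W, feat_dim)        # flattened CNN feature grid
state = torch.randn(dec_dim)                   # current decoder hidden state

W_q = torch.randn(dec_dim, feat_dim)           # learned query projection
scores = features @ (state @ W_q)              # one score per image location
alpha = F.softmax(scores, dim=0)               # attention map over the image
glimpse = alpha @ features                     # context vector for next token
print(alpha.view(H, W).max(), glimpse.shape)   # peak attention, (feat_dim,)
```

Visualizing `alpha.view(H, W)` over the input image is exactly the heat-map interpretation the abstract describes.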
Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
Title | Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery |
Authors | Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon |
Abstract | Recent automotive vision work has focused almost exclusively on processing forward-facing cameras. However, future autonomous vehicles will not be viable without more comprehensive surround sensing, akin to a human driver, as can be provided by 360° panoramic cameras. We present an approach to adapt contemporary deep network architectures developed on conventional rectilinear imagery to work on equirectangular 360° panoramic imagery. To address the lack of annotated panoramic automotive datasets, we adapt a contemporary automotive dataset, via style and projection transformations, to facilitate the cross-domain retraining of contemporary algorithms for panoramic imagery. Following this approach, we retrain and adapt existing architectures to recover scene depth and the 3D pose of vehicles from monocular panoramic imagery, without any panoramic training labels or calibration parameters. Our approach is evaluated qualitatively on crowd-sourced panoramic images and quantitatively using an automotive environment simulator to provide the first benchmark for such techniques within panoramic imagery. |
Tasks | 3D Object Detection, Autonomous Vehicles, Calibration, Depth Estimation, Monocular Depth Estimation, Object Detection |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06253v1 |
PDF | http://arxiv.org/pdf/1808.06253v1.pdf |
PWC | https://paperswithcode.com/paper/eliminating-the-blind-spot-adapting-3d-object-1 |
Repo | https://github.com/gdlg/panoramic-depth-estimation |
Framework | tf |
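The projection transformations such an adaptation pipeline relies on boil down to spherical geometry: equirectangular pixels map to latitude/longitude on the viewing sphere, so a rectilinear virtual camera can be cropped out of (or mapped back into) a panorama. A standalone sketch (ours) of extracting a rectilinear crop:

```python
# Crop a rectilinear virtual camera out of an equirectangular panorama:
# build unit rays for the pinhole crop, convert to lat/long, and sample.
import numpy as np

def equirect_to_rectilinear(pano, out_hw, fov_deg, yaw_deg=0.0):
    Hp, Wp = pano.shape[:2]
    H, W = out_hw
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)        # pinhole focal length
    u, v = np.meshgrid(np.arange(W) - W / 2, np.arange(H) - H / 2)
    dirs = np.stack([u, v, np.full_like(u, f, dtype=float)], -1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True) # unit rays of the crop
    lon = np.arctan2(dirs[..., 0], dirs[..., 2]) + np.radians(yaw_deg)
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))
    x = ((lon / (2 * np.pi) + 0.5) * Wp).astype(int) % Wp
    y = ((lat / np.pi + 0.5) * Hp).astype(int).clip(0, Hp - 1)
    return pano[y, x]                                    # nearest-neighbor sample

pano = np.random.rand(256, 512, 3)                       # stand-in panorama
crop = equirect_to_rectilinear(pano, (128, 128), fov_deg=90)
print(crop.shape)                                        # (128, 128, 3)
```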