Paper Group AWR 101
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models. How to train your MAML. A Corpus for Reasoning About Natural Language Grounded in Photographs. Collective Entity Disambiguation with Structured Gradient Tree Boosting. Sampling Theory for Graph Signals on Product Graphs. Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study. …
Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models
Title | Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models |
Authors | Mohammad Emtiyaz Khan, Didrik Nielsen |
Abstract | Bayesian inference plays an important role in advancing machine learning, but faces computational challenges when applied to complex models such as deep neural networks. Variational inference circumvents these challenges by formulating Bayesian inference as an optimization problem and solving it using gradient-based optimization. In this paper, we argue in favor of natural-gradient approaches which, unlike their gradient-based counterparts, can improve convergence by exploiting the information geometry of the solutions. We show how to derive fast yet simple natural-gradient updates by using a duality associated with exponential-family distributions. An attractive feature of these methods is that, by using natural-gradients, they are able to extract accurate local approximations for individual model components. We summarize recent results for Bayesian deep learning showing the superiority of natural-gradient approaches over their gradient counterparts. |
Tasks | Bayesian Inference |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04489v2 |
PDF | http://arxiv.org/pdf/1807.04489v2.pdf |
PWC | https://paperswithcode.com/paper/fast-yet-simple-natural-gradient-descent-for |
Repo | https://github.com/ssggreg/active_learning |
Framework | pytorch |
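The key computational trick is the exponential-family duality the abstract mentions: the natural gradient of an objective with respect to the natural parameters equals the ordinary gradient with respect to the expectation (mean) parameters. Below is a minimal, self-contained sketch of that update for a 1-D Gaussian fitted to a fixed Gaussian target by minimizing KL divergence; it is our illustration, not the authors' code, and all names and step sizes are ours.

```python
# Exponential-family duality in 1-D: the natural gradient w.r.t. the natural
# parameters (lam1, lam2) equals the plain gradient w.r.t. the mean parameters
# (mu1, mu2) = (E[z], E[z^2]). Toy objective: KL(q || p) with both Gaussian.
import numpy as np

target_m, target_v = 2.0, 0.5   # target p = N(target_m, target_v)

def kl_to_target(m, v):
    """KL(q || p) for q = N(m, v)."""
    return 0.5 * (np.log(target_v / v) + (v + (m - target_m) ** 2) / target_v - 1.0)

m, v = 0.0, 1.0                 # initial q
lr = 0.1
for step in range(100):
    # Gradient of KL w.r.t. (m, v):
    dKL_dm = (m - target_m) / target_v
    dKL_dv = 0.5 * (1.0 / target_v - 1.0 / v)
    # Chain rule to mean parameters, using m = mu1 and v = mu2 - mu1^2:
    g_mu1 = dKL_dm - 2.0 * m * dKL_dv
    g_mu2 = dKL_dv
    # Natural-gradient step: move the *natural* parameters lam1 = m/v,
    # lam2 = -1/(2v) along the *mean*-parameter gradient.
    lam1, lam2 = m / v, -0.5 / v
    lam1 -= lr * g_mu1
    lam2 -= lr * g_mu2
    v = -0.5 / lam2
    m = lam1 * v
print(f"fitted m={m:.3f}, v={v:.3f}  (target {target_m}, {target_v})")
```

In this toy problem each natural-gradient step turns out to be a convex combination in natural-parameter space, so the variance stays positive for any step size in (0, 1], which is one flavor of the "fast yet simple" behavior the paper argues for.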
How to train your MAML
Title | How to train your MAML |
Authors | Antreas Antoniou, Harrison Edwards, Amos Storkey |
Abstract | The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model-Agnostic Meta-Learning (MAML) is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant, and very powerful; however, it has a variety of issues: it is very sensitive to neural network architectures, often unstable during training, requires arduous hyperparameter searches to stabilize training and achieve high generalization, and is very computationally expensive at both training and inference time. In this paper, we propose various modifications to MAML that not only stabilize the system but also substantially improve its generalization performance and convergence speed while reducing its computational overhead; we call the resulting method MAML++. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Meta-Learning |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09502v3 |
PDF | http://arxiv.org/pdf/1810.09502v3.pdf |
PWC | https://paperswithcode.com/paper/how-to-train-your-maml |
Repo | https://github.com/AntreasAntoniou/HowToTrainYourMAMLPytorch |
Framework | pytorch |
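One of the stabilizing modifications proposed in the paper is a multi-step loss (MSL) that weights the query-set loss after every inner-loop step rather than only the last one. The sketch below illustrates that idea on a toy 1-D linear-regression meta-learning problem; the model, task distribution, and hyperparameters are our own choices, not the paper's.

```python
# Toy MAML with a multi-step loss: every inner-loop step contributes a weighted
# query loss, instead of only the final adapted parameters being scored.
import torch

def forward(x, w, b):
    return x * w + b

def msl_task_loss(x_s, y_s, x_q, y_q, w, b, inner_lr, step_weights):
    """Inner-loop adaptation with a per-step weighted query loss (MSL)."""
    w_t, b_t = w, b
    total = 0.0
    for weight in step_weights:
        support = ((forward(x_s, w_t, b_t) - y_s) ** 2).mean()
        gw, gb = torch.autograd.grad(support, (w_t, b_t), create_graph=True)
        w_t, b_t = w_t - inner_lr * gw, b_t - inner_lr * gb
        total = total + weight * ((forward(x_q, w_t, b_t) - y_q) ** 2).mean()
    return total

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=1e-2)
step_weights = [0.1, 0.2, 0.7]            # anneal weight toward the final step
for it in range(200):
    slope = torch.rand(1) * 4 - 2          # sample a task: y = slope * x
    x_s, x_q = torch.randn(10), torch.randn(10)
    loss = msl_task_loss(x_s, slope * x_s, x_q, slope * x_q,
                         w, b, inner_lr=0.05, step_weights=step_weights)
    opt.zero_grad(); loss.backward(); opt.step()
```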
A Corpus for Reasoning About Natural Language Grounded in Photographs
Title | A Corpus for Reasoning About Natural Language Grounded in Photographs |
Authors | Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi |
Abstract | We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge. |
Tasks | Visual Reasoning |
Published | 2018-11-01 |
URL | https://arxiv.org/abs/1811.00491v3 |
PDF | https://arxiv.org/pdf/1811.00491v3.pdf |
PWC | https://paperswithcode.com/paper/a-corpus-for-reasoning-about-natural-language |
Repo | https://github.com/vortexJCH/nlvr |
Framework | none |
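For orientation, a tiny loader and majority-class baseline over the released examples might look like the following. We assume a JSON-lines file whose records carry `sentence` and `label` fields; that format is our guess for illustration, not something confirmed from the repository.

```python
# Load the dataset (assumed JSONL) and score a trivial majority-class baseline,
# a useful floor for the binary "is this caption true of the image pair?" task.
import json
from collections import Counter

def load_examples(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

examples = load_examples("dev.json")                 # hypothetical split file
labels = [ex["label"] for ex in examples]            # assumed field name
majority, count = Counter(labels).most_common(1)[0]
print(f"majority-class baseline ('{majority}'): {count / len(labels):.3f} accuracy")
```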
Collective Entity Disambiguation with Structured Gradient Tree Boosting
Title | Collective Entity Disambiguation with Structured Gradient Tree Boosting |
Authors | Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman |
Abstract | We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most prior work limits gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, our work is the first to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system is able to produce globally optimized entity assignments for mentions in a document. Exact inference is prohibitively expensive for our globally normalized model. To solve this problem, we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of the standard beam search algorithm. BiBSG makes use of global information from both past and future to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results. Specifically, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset. |
Tasks | Entity Disambiguation |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10229v2 |
PDF | http://arxiv.org/pdf/1802.10229v2.pdf |
PWC | https://paperswithcode.com/paper/collective-entity-disambiguation-with |
Repo | https://github.com/bloomberg/sgtb |
Framework | none |
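To make the inference step concrete, here is a schematic (unidirectional) beam search over candidate entities per mention. The paper's BiBSG additionally runs bidirectionally and keeps the gold path during training; `score_fn` stands in for the SGTB-learned local-plus-global scorer.

```python
# Schematic beam search for collective entity disambiguation: keep the top-k
# partial entity assignments, extending each with every candidate in turn.
def beam_search(mentions, candidates, score_fn, beam_size=4):
    beams = [([], 0.0)]                     # (entity sequence, cumulative score)
    for i, mention in enumerate(mentions):
        expanded = [
            (seq + [cand], score + score_fn(seq, mention, cand))
            for seq, score in beams
            for cand in candidates[i]
        ]
        beams = sorted(expanded, key=lambda t: t[1], reverse=True)[:beam_size]
    return beams[0][0]                      # best-scoring entity assignment

# Toy usage: a prior-based scorer over hypothetical candidate entities.
mentions = ["Paris", "France"]
candidates = [["Paris_(city)", "Paris_Hilton"], ["France", "France_(ship)"]]
prior = {"Paris_(city)": 0.9, "Paris_Hilton": 0.1, "France": 0.95, "France_(ship)": 0.05}
score_fn = lambda seq, m, c: prior[c]
print(beam_search(mentions, candidates, score_fn))  # ['Paris_(city)', 'France']
```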
Sampling Theory for Graph Signals on Product Graphs
Title | Sampling Theory for Graph Signals on Product Graphs |
Authors | Rohan Varma, Jelena Kovačević |
Abstract | In this paper, we extend the sampling theory on graphs by constructing a framework that exploits the structure in product graphs for efficient sampling and recovery of bandlimited graph signals that lie on them. Product graphs are composed from smaller graph atoms; we argue that this is a flexible and useful way to model richer classes of data that can be multi-modal in nature. Previous works have established a sampling theory on graphs for bandlimited signals. Importantly, our framework achieves significant savings in both sample complexity and computational complexity. |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10049v1 |
PDF | http://arxiv.org/pdf/1809.10049v1.pdf |
PWC | https://paperswithcode.com/paper/sampling-theory-for-graph-signals-on-product |
Repo | https://github.com/CrowdArt/node-chat-app |
Framework | none |
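The structure being exploited is easy to verify numerically: for a Cartesian product graph, the Laplacian is the Kronecker sum of the factor Laplacians, so the graph Fourier basis used to define bandlimited signals factors into the much smaller bases of the atoms. A short NumPy check (ours):

```python
# The Laplacian of a Cartesian product graph is the Kronecker sum of the factor
# Laplacians; its eigenvalues are all pairwise sums of the factor eigenvalues.
import numpy as np

def path_laplacian(n):
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(1)) - A

L1, L2 = path_laplacian(4), path_laplacian(3)
I1, I2 = np.eye(4), np.eye(3)
L_prod = np.kron(L1, I2) + np.kron(I1, L2)        # Kronecker sum

w1, _ = np.linalg.eigh(L1)
w2, _ = np.linalg.eigh(L2)
w_prod = np.sort((w1[:, None] + w2[None, :]).ravel())
assert np.allclose(w_prod, np.sort(np.linalg.eigh(L_prod)[0]))
print("product-graph spectrum recovered from the factor spectra")
```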
Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study
Title | Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study |
Authors | Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara |
Abstract | Graph embeddings have become a key and widely used technique within the field of graph mining, proving successful across a broad range of domains including social, citation, transportation, and biological networks. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned during the embedding process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex-level graph features. If such a relationship can be found, it could be used to provide a theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present extensive experimental evaluation of five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated by the embedding space, allowing key insight into how graph embeddings create good representations. |
Tasks | Graph Embedding |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07464v1 |
PDF | http://arxiv.org/pdf/1806.07464v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-the-semantic-content-of |
Repo | https://github.com/sbonner0/unsupervised-graph-embeddings |
Framework | tf |
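The probing methodology reduces to a simple recipe: regress a known topological feature on the embedding vectors and measure how well it is recovered. A compact stand-in version (ours, with a random projection of the adjacency matrix in place of a trained embedding, which any real embedding would replace) looks like this:

```python
# Linear probe: predict a vertex-level topological feature (degree) directly
# from node embedding vectors, and report R^2 as the recovery score.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n, d = 200, 16
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random undirected graph
degree = A.sum(1)
emb = A @ rng.standard_normal((n, d))              # stand-in "embedding"

X = np.hstack([emb, np.ones((n, 1))])              # linear probe with bias
coef, *_ = lstsq(X, degree, rcond=None)
pred = X @ coef
r2 = 1 - ((degree - pred) ** 2).sum() / ((degree - degree.mean()) ** 2).sum()
print(f"R^2 of degree predicted from the embedding: {r2:.3f}")
```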
Learning Private Neural Language Modeling with Attentive Aggregation
Title | Learning Private Neural Language Modeling with Attentive Aggregation |
Authors | Shaoxiong Ji, Shirui Pan, Guodong Long, Xue Li, Jing Jiang, Zi Huang |
Abstract | Mobile keyboard suggestion is typically regarded as a word-level language modeling problem. Centralized machine learning techniques require massive amounts of user data for training, which may raise privacy concerns over users' sensitive personal typing data. Federated learning (FL) provides a promising approach to learning private language models for intelligent personalized keyboard suggestion by training models on distributed clients rather than on a central server. To obtain a global model for prediction, existing FL algorithms simply average the client models, ignoring the importance of each client during model aggregation. Furthermore, there is no optimization for learning a well-generalized global model on the central server. To solve these problems, we propose a novel model aggregation method that uses an attention mechanism to weigh the contribution of each client model to the global model, together with an optimization technique applied during server-side aggregation. Our proposed attentive aggregation method minimizes the weighted distance between the server model and the client models through iterative parameter updates, attending to the distance between the server model and each client model. Through experiments on two popular language modeling datasets and a social media dataset, our proposed method outperforms its counterparts in terms of perplexity and communication cost in most settings. |
Tasks | Language Modelling |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.07108v2 |
PDF | http://arxiv.org/pdf/1812.07108v2.pdf |
PWC | https://paperswithcode.com/paper/learning-private-neural-language-modeling |
Repo | https://github.com/shaoxiongji/fed-att |
Framework | pytorch |
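A condensed sketch of the aggregation rule described in the abstract follows: the server scores each client by its parameter-space distance to the current server model, turns the scores into attention weights with a softmax, and steps toward the attention-weighted combination. Whether nearer or farther clients should receive more weight, and the step size `epsilon`, are our reading and our choices, not details confirmed from the paper.

```python
# Attentive server-side aggregation: softmax over (negative) parameter-space
# distances, then a step toward the attention-weighted client update.
import numpy as np

def attentive_aggregate(server_w, client_ws, epsilon=1.0):
    dists = np.array([np.linalg.norm(server_w - w) for w in client_ws])
    att = np.exp(-dists) / np.exp(-dists).sum()    # closer clients attend more
    update = sum(a * (w - server_w) for a, w in zip(att, client_ws))
    return server_w + epsilon * update

server = np.zeros(4)
clients = [np.array([1.0, 0, 0, 0]), np.array([0.9, 0.1, 0, 0]),
           np.array([5.0, 5, 5, 5])]               # last client is an outlier
print(attentive_aggregate(server, clients))        # outlier is down-weighted
```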
Meta-Learning Probabilistic Inference For Prediction
Title | Meta-Learning Probabilistic Inference For Prediction |
Authors | Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, Richard E. Turner |
Abstract | This paper introduces a new framework for data efficient and versatile learning. Specifically: 1) We develop ML-PIP, a general framework for Meta-Learning approximate Probabilistic Inference for Prediction. ML-PIP extends existing probabilistic interpretations of meta-learning to cover a broad class of methods. 2) We introduce VERSA, an instance of the framework employing a flexible and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of shots, and outputs a distribution over task-specific parameters in a single forward pass. VERSA substitutes optimization at test time with forward passes through inference networks, amortizing the cost of inference and relieving the need for second derivatives during training. 3) We evaluate VERSA on benchmark datasets where the method sets new state-of-the-art results, handles arbitrary numbers of shots, and for classification, arbitrary numbers of classes at train and test time. The power of the approach is then demonstrated through a challenging few-shot ShapeNet view reconstruction task. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2018-05-24 |
URL | https://arxiv.org/abs/1805.09921v4 |
PDF | https://arxiv.org/pdf/1805.09921v4.pdf |
PWC | https://paperswithcode.com/paper/meta-learning-probabilistic-inference-for |
Repo | https://github.com/Gordonjo/versa |
Framework | tf |
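The amortization idea can be shown in miniature: a network pools a class's support features and emits a distribution over that class's linear classifier weights in a single forward pass, so no test-time optimization is needed. The toy head below is our sketch, not the authors' architecture.

```python
# Amortized task inference in the spirit of VERSA: pool the support set, emit a
# mean and log-variance over per-class classifier weights, sample once.
import torch
import torch.nn as nn

class AmortizedHead(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * feat_dim))  # mean, logvar

    def forward(self, support_feats):              # (shots, feat_dim), one class
        stats = self.net(support_feats).mean(0)    # pool over shots
        mu, logvar = stats.chunk(2)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sampled weights

feat_dim, shots = 8, 5
head = AmortizedHead(feat_dim)
support = torch.randn(shots, feat_dim)             # support features, one class
w_class = head(support)                            # class weight vector
query = torch.randn(3, feat_dim)
logits = query @ w_class                           # class scores for 3 queries
print(logits.shape)                                # torch.Size([3])
```

Because adaptation is a single forward pass, handling an arbitrary number of shots falls out of the pooling step, which is the property the abstract highlights.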
Quaternion Recurrent Neural Networks
Title | Quaternion Recurrent Neural Networks |
Authors | Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio |
Abstract | Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), along with a quaternion long short-term memory network (QLSTM), that take into account both the external relations and these internal structural dependencies via the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to encode internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performance than RNN and LSTM in a realistic automatic speech recognition application. Finally, we show that QRNN and QLSTM reduce the number of free parameters needed to reach better results by a factor of up to 3.3x compared to real-valued RNNs and LSTMs, leading to a more compact representation of the relevant information. |
Tasks | Speech Recognition |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04418v3 |
PDF | http://arxiv.org/pdf/1806.04418v3.pdf |
PWC | https://paperswithcode.com/paper/quaternion-recurrent-neural-networks |
Repo | https://github.com/Riccardo-Vecchi/Pytorch-Quaternion-Neural-Networks |
Framework | pytorch |
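The algebraic ingredient that replaces real-valued matrix-vector products in a QRNN is the (non-commutative) Hamilton product of quaternions. A minimal NumPy implementation with a sanity check:

```python
# Hamilton product of two quaternions q = (r, x, y, z); this is the operation a
# QRNN applies between quaternion weights and quaternion-valued activations.
import numpy as np

def hamilton(p, q):
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 - x1*z2 + y1*r2 + z1*x2,
        r1*z2 + x1*y2 - y1*x2 + z1*r2,
    ])

i, j, k = np.eye(4)[1], np.eye(4)[2], np.eye(4)[3]
assert np.allclose(hamilton(i, j), k)      # i * j = k
assert np.allclose(hamilton(j, i), -k)     # non-commutative: j * i = -k
```

Because one Hamilton product mixes all four components with a single shared set of coefficients, a quaternion layer needs roughly a quarter of the parameters of the equivalent real-valued layer, which is the source of the parameter savings the abstract reports.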
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Title | Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates |
Authors | Taku Kudo |
Abstract | Subword units are an effective way to alleviate the open-vocabulary problems in neural machine translation (NMT). While sentences are usually converted into unique subword sequences, subword segmentation is potentially ambiguous, and multiple segmentations are possible even with the same vocabulary. The question addressed in this paper is whether it is possible to harness this segmentation ambiguity as noise to improve the robustness of NMT. We present a simple regularization method, subword regularization, which trains the model with multiple subword segmentations probabilistically sampled during training. In addition, for better subword sampling, we propose a new subword segmentation algorithm based on a unigram language model. We experiment with multiple corpora and report consistent improvements, especially in low-resource and out-of-domain settings. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10959v1 |
PDF | http://arxiv.org/pdf/1804.10959v1.pdf |
PWC | https://paperswithcode.com/paper/subword-regularization-improving-neural |
Repo | https://github.com/Waino/OpenNMT-py |
Framework | pytorch |
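Subword regularization is available in the author's SentencePiece library: with a trained unigram model, each call can sample a different segmentation of the same sentence. A short sketch follows; the model path `m.model` is an assumption, and `alpha` is the sampling-smoothing temperature.

```python
# Sample multiple subword segmentations of the same sentence from a trained
# unigram SentencePiece model (the on-disk model file is assumed to exist).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")
text = "subword regularization improves robustness"
for _ in range(3):  # the same sentence, segmented differently each time
    pieces = sp.encode(text, out_type=str, enable_sampling=True,
                       nbest_size=-1, alpha=0.1)
    print(pieces)
```

During NMT training, each epoch re-tokenizes every sentence this way, so the model sees many segmentations of the same data, which is the regularization effect.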
Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding
Title | Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding |
Authors | Ga Wu, Justin Domke, Scott Sanner |
Abstract | Variational Autoencoders (VAEs) are a popular generative model, but one in which conditional inference can be challenging. If the decomposition into query and evidence variables is fixed, conditional VAEs provide an attractive solution. To support arbitrary queries, one is generally reduced to Markov Chain Monte Carlo sampling methods that can suffer from long mixing times. In this paper, we propose an idea we term cross-coding to approximate the distribution over the latent variables after conditioning on an evidence assignment to some subset of the variables. This allows generating query samples without retraining the full VAE. We experimentally evaluate three variations of cross-coding showing that (i) they can be quickly optimized for different decompositions of evidence and query and (ii) they quantitatively and qualitatively outperform Hamiltonian Monte Carlo. |
Tasks | |
Published | 2018-05-20 |
URL | http://arxiv.org/abs/1805.07785v2 |
PDF | http://arxiv.org/pdf/1805.07785v2.pdf |
PWC | https://paperswithcode.com/paper/conditional-inference-in-pre-trained |
Repo | https://github.com/wuga214/XCoder_VAE_Conditional_Inference |
Framework | tf |
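A toy rendering of the cross-coding idea (ours, not the released code): freeze a pre-trained decoder, then fit a small "cross-coder" from fresh noise into the latent space so that the decoded evidence dimensions match their observed values; sampling the noise afterwards yields conditional samples for the query dimensions. The real method also accounts for the latent prior, which this sketch omits.

```python
# Cross-coding sketch: only the noise-to-latent map is trained; the pre-trained
# VAE decoder stays fixed, so no retraining of the full model is needed.
import torch
import torch.nn as nn

latent, obs = 4, 6
decoder = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, obs))
for p in decoder.parameters():
    p.requires_grad_(False)                # stands in for a pre-trained decoder

xcoder = nn.Linear(2, latent)              # noise -> latent cross-coder
evidence_idx = torch.tensor([0, 1])        # observed dimensions of x
evidence_val = torch.tensor([0.5, -0.3])   # their observed values

opt = torch.optim.Adam(xcoder.parameters(), lr=1e-2)
for step in range(500):
    eps = torch.randn(64, 2)
    x = decoder(xcoder(eps))
    loss = ((x[:, evidence_idx] - evidence_val) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

samples = decoder(xcoder(torch.randn(5, 2)))  # evidence matched, queries sampled
print(samples.shape)                          # torch.Size([5, 6])
```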
What made you do this? Understanding black-box decisions with sufficient input subsets
Title | What made you do this? Understanding black-box decisions with sufficient input subsets |
Authors | Brandon Carter, Jonas Mueller, Siddhartha Jain, David Gifford |
Abstract | Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. Existing techniques are often restricted to a specific type of predictor or based on input saliency, which may be undesirably sensitive to factors unrelated to the model’s decision making process. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. General principles that globally govern a model’s decision-making can also be revealed by searching for clusters of such input patterns across many data points. Our approach is conceptually straightforward, entirely model-agnostic, simply implemented using instance-wise backward selection, and able to produce more concise rationales than existing techniques. We demonstrate the utility of our interpretation method on various neural network models trained on text, image, and genomic data. |
Tasks | Decision Making |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.03805v2 |
PDF | http://arxiv.org/pdf/1810.03805v2.pdf |
PWC | https://paperswithcode.com/paper/what-made-you-do-this-understanding-black-box |
Repo | https://github.com/b-carter/SufficientInputSubsets |
Framework | tf |
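The abstract's "instance-wise backward selection" admits a very direct implementation: repeatedly mask whichever remaining feature hurts the model's score least, and stop when any further masking would push the score below the decision threshold; the features still unmasked form a sufficient input subset. Our simplified version (the paper's procedure differs in details):

```python
# Greedy backward selection for a sufficient input subset: masking anything
# beyond the returned features would break the model's decision.
import numpy as np

def sufficient_subset(f, x, mask_value, threshold):
    alive = set(range(len(x)))
    x = x.copy()
    while True:
        best, best_score = None, -np.inf
        for i in alive:                     # try masking each remaining feature
            x_try = x.copy(); x_try[i] = mask_value
            s = f(x_try)
            if s > best_score:
                best, best_score = i, s
        if best is None or best_score < threshold:
            return sorted(alive)            # any further masking breaks the decision
        x[best] = mask_value
        alive.remove(best)

f = lambda x: x[0] + x[2]                   # toy "model": only dims 0 and 2 matter
print(sufficient_subset(f, np.array([1.0, 9.0, 1.0, 9.0]), 0.0, threshold=1.0))
```

Note the procedure is entirely model-agnostic: `f` is queried as a black box, exactly as the abstract claims.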
EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images
Title | EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images |
Authors | Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, Seon Joo Kim |
Abstract | Light field cameras capture both the spatial and the angular properties of light rays in space. Owing to this property, one can compute depth from light fields in uncontrolled lighting environments, which is a big advantage over active sensing devices. Depth computed from light fields can be used for many applications, including 3D modelling and refocusing. However, light field images from hand-held cameras have very narrow baselines and noise, making depth estimation difficult. Many approaches have been proposed to overcome these limitations for light field depth estimation, but there is a clear trade-off between accuracy and speed in these methods. In this paper, we introduce a fast and accurate light field depth estimation method based on a fully-convolutional neural network. Our network is designed by considering the light field geometry, and we also overcome the lack of training data by proposing light-field-specific data augmentation methods. We achieved the top rank in the HCI 4D Light Field Benchmark on most metrics, and we also demonstrate the effectiveness of the proposed method on real-world light field images. |
Tasks | Data Augmentation, Depth Estimation |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02379v1 |
PDF | http://arxiv.org/pdf/1804.02379v1.pdf |
PWC | https://paperswithcode.com/paper/epinet-a-fully-convolutional-neural-network |
Repo | https://github.com/chshin10/epinet |
Framework | tf |
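EPINET itself is a CNN, but the epipolar cue it is built around is simple to demonstrate: a scene point at a given disparity lines up across the light field views once each view is shifted in proportion to its baseline offset, so the best disparity minimizes a photo-consistency cost across the shifted views. A tiny horizontal-baseline NumPy sketch (ours, not the network):

```python
# Classic shift-and-compare depth from a 1-D camera array: for each candidate
# disparity, shift every view by (disparity x offset) and measure the variance
# across views; the minimum-variance disparity wins per pixel.
import numpy as np

def depth_from_views(views, offsets, disparities):
    costs = []
    for d in disparities:
        shifted = [np.roll(v, int(round(d * o)), axis=1)
                   for v, o in zip(views, offsets)]
        costs.append(np.var(np.stack(shifted), axis=0))  # photo-consistency
    return np.array(disparities)[np.argmin(np.stack(costs), axis=0)]

# Synthetic check: a vertical stripe displaced per view by true disparity 2.
offsets = np.array([-1, 0, 1])
base = np.zeros((16, 32)); base[:, 10:12] = 1.0
views = np.stack([np.roll(base, -2 * o, axis=1) for o in offsets])
disp = depth_from_views(views, offsets, disparities=range(-3, 4))
print(disp[8, 10])   # -> 2 at the stripe
```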
Teaching Machines to Code: Neural Markup Generation with Visual Attention
Title | Teaching Machines to Code: Neural Markup Generation with Visual Attention |
Authors | Sumeet S. Singh |
Abstract | We present a neural transducer model with visual attention that learns to generate LaTeX markup for a real-world math formula given its image. Applying sequence modeling and transduction techniques that have been very successful across modalities such as natural language, image, handwriting, speech, and audio, we construct an image-to-markup model that learns to produce syntactically and semantically correct LaTeX markup code over 150 words long and achieves a BLEU score of 89%, improving upon the previous state-of-the-art for the Im2Latex problem. We also demonstrate with heat-map visualization how attention helps in interpreting the model and can pinpoint (detect and localize) symbols on the image accurately, despite having been trained without any bounding box data. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05415v2 |
PDF | http://arxiv.org/pdf/1802.05415v2.pdf |
PWC | https://paperswithcode.com/paper/teaching-machines-to-code-neural-markup |
Repo | https://github.com/untrix/im2latex |
Framework | tf |
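At the core of such image-to-markup models is a content-based visual attention step: the decoder state queries a grid of CNN image features and receives a weighted "glimpse" used to emit the next LaTeX token. A generic sketch of that step (our simplification, not the paper's exact architecture):

```python
# One content-based attention step over a flattened CNN feature grid: score
# every image location against the decoder state, softmax to an attention map,
# and pool the features into a context vector ("glimpse") for the next token.
import torch
import torch.nn.functional as F

H = W = 8; feat_dim = 64; dec_dim = 32
features = torch.randn(H * W, feat_dim)        # flattened CNN feature grid
state = torch.randn(dec_dim)                   # current decoder hidden state

W_q = torch.randn(dec_dim, feat_dim)           # learned query projection
scores = features @ (state @ W_q)              # one score per image location
alpha = F.softmax(scores, dim=0)               # attention map over the image
glimpse = alpha @ features                     # context vector for next token
print(alpha.view(H, W).max(), glimpse.shape)   # peak attention, (feat_dim,)
```

Visualizing `alpha.view(H, W)` over the input image is exactly the heat-map interpretation the abstract describes.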
Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
Title | Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery |
Authors | Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon |
Abstract | Recent automotive vision work has focused almost exclusively on processing forward-facing cameras. However, future autonomous vehicles will not be viable without more comprehensive surround sensing, akin to a human driver, as can be provided by 360° panoramic cameras. We present an approach to adapt contemporary deep network architectures developed on conventional rectilinear imagery to work on equirectangular 360° panoramic imagery. To address the lack of annotated panoramic automotive datasets, we adapt a contemporary automotive dataset, via style and projection transformations, to facilitate the cross-domain retraining of contemporary algorithms for panoramic imagery. Following this approach, we retrain and adapt existing architectures to recover scene depth and the 3D pose of vehicles from monocular panoramic imagery, without any panoramic training labels or calibration parameters. Our approach is evaluated qualitatively on crowd-sourced panoramic images and quantitatively using an automotive environment simulator to provide the first benchmark for such techniques within panoramic imagery. |
Tasks | 3D Object Detection, Autonomous Vehicles, Calibration, Depth Estimation, Monocular Depth Estimation, Object Detection |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06253v1 |
PDF | http://arxiv.org/pdf/1808.06253v1.pdf |
PWC | https://paperswithcode.com/paper/eliminating-the-blind-spot-adapting-3d-object-1 |
Repo | https://github.com/gdlg/panoramic-depth-estimation |
Framework | tf |
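The projection transformations such an adaptation pipeline relies on boil down to spherical geometry: equirectangular pixels map to latitude/longitude on the viewing sphere, so a rectilinear virtual camera can be cropped out of (or mapped back into) a panorama. A standalone sketch (ours) of extracting a rectilinear crop:

```python
# Crop a rectilinear virtual camera out of an equirectangular panorama:
# build unit rays for the pinhole crop, convert to lat/long, and sample.
import numpy as np

def equirect_to_rectilinear(pano, out_hw, fov_deg, yaw_deg=0.0):
    Hp, Wp = pano.shape[:2]
    H, W = out_hw
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)        # pinhole focal length
    u, v = np.meshgrid(np.arange(W) - W / 2, np.arange(H) - H / 2)
    dirs = np.stack([u, v, np.full_like(u, f, dtype=float)], -1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True) # unit rays of the crop
    lon = np.arctan2(dirs[..., 0], dirs[..., 2]) + np.radians(yaw_deg)
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))
    x = ((lon / (2 * np.pi) + 0.5) * Wp).astype(int) % Wp
    y = ((lat / np.pi + 0.5) * Hp).astype(int).clip(0, Hp - 1)
    return pano[y, x]                                    # nearest-neighbor sample

pano = np.random.rand(256, 512, 3)                       # stand-in panorama
crop = equirect_to_rectilinear(pano, (128, 128), fov_deg=90)
print(crop.shape)                                        # (128, 128, 3)
```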