October 20, 2019

Paper Group AWR 260

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Title Read + Verify: Machine Reading Comprehension with Unanswerable Questions
Authors Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li
Abstract Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred. In addition to extracting answers, previous works usually predict an additional “no-answer” probability to detect unanswerable cases. However, they fail to validate the answerability of the question by verifying the legitimacy of the predicted answer. To address this problem, we propose a novel read-then-verify system, which not only utilizes a neural reader to extract candidate answers and produce no-answer probabilities, but also leverages an answer verifier to decide whether the predicted answer is entailed by the input snippets. Moreover, we introduce two auxiliary losses to help the reader better handle answer extraction as well as no-answer detection, and investigate three different architectures for the answer verifier. Our experiments on the SQuAD 2.0 dataset show that our system obtains 74.2 F1 on the test set, a state-of-the-art result at the time of submission (Aug. 28th, 2018).
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2018-08-17
URL http://arxiv.org/abs/1808.05759v5
PDF http://arxiv.org/pdf/1808.05759v5.pdf
PWC https://paperswithcode.com/paper/read-verify-machine-reading-comprehension
Repo https://github.com/yujiakimoto/mnemonic-reader
Framework pytorch
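
The abstract describes two signals for abstaining: the reader's no-answer probability and the verifier's judgment of whether the candidate answer is entailed by the passage. The sketch below shows one way such signals could be combined into a final decision. The averaging rule, the 0.5 threshold, and the function name `decide` are assumptions for illustration, not the paper's confirmed method.

```python
def decide(answer_span, reader_no_answer_prob, verifier_not_entailed_prob, threshold=0.5):
    """Return the candidate answer, or None to abstain.

    Hypothetical combination rule: average the reader's no-answer probability
    with the verifier's probability that the answer is NOT entailed by the passage.
    """
    for p in (reader_no_answer_prob, verifier_not_entailed_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    combined = 0.5 * (reader_no_answer_prob + verifier_not_entailed_prob)
    # Abstain only when the combined evidence for "unanswerable" exceeds the threshold.
    return None if combined > threshold else answer_span
```

For example, `decide("Normandy", 0.6, 0.9)` abstains, while `decide("Normandy", 0.7, 0.2)` keeps the answer because a confident verifier pulls the combined score below the threshold. In practice such a threshold would presumably be tuned on a development set.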

Unsupervised Learning of Shape and Pose with Differentiable Point Clouds

Title Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
Authors Eldar Insafutdinov, Alexey Dosovitskiy
Abstract We address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images. We train a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error: given several views of an object, the projections of the predicted shapes to the predicted camera poses should match the provided views. To deal with pose ambiguity, we introduce an ensemble of pose predictors which we then distill to a single “student” model. To allow for efficient learning of high-fidelity shapes, we represent the shapes by point clouds and devise a formulation that allows them to be projected differentiably. Our experiments show that the distilled ensemble of pose predictors learns to estimate the pose accurately, while the point cloud representation makes it possible to predict detailed shape models. The supplementary video can be found at https://www.youtube.com/watch?v=LuIGovKeo60
Tasks 3D Pose Estimation
Published 2018-10-22
URL http://arxiv.org/abs/1810.09381v1
PDF http://arxiv.org/pdf/1810.09381v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-shape-and-pose-with
Repo https://github.com/eldar/differentiable-point-clouds
Framework tf
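
The reprojection objective in the abstract needs a rendering step through which gradients can flow back to the point coordinates. The sketch below swaps in plain 2D Gaussian splatting after a pinhole projection; it is not the paper's own projection operator, and the function names and parameters (`focal`, `sigma`, `extent`) are hypothetical. Because every pixel is a smooth function of the point positions, a loss against a target view is differentiable in those positions.

```python
import math


def project_points(points, focal=1.0):
    """Pinhole projection of camera-frame points (x, y, z), z > 0, to image-plane (u, v)."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]


def splat_silhouette(uv, size=16, extent=1.0, sigma=0.1):
    """Render a soft silhouette by placing a Gaussian blob at each projected point.

    Each pixel is 1 - prod_k (1 - g_k), a smooth union of the blobs, so the image
    is a differentiable function of the projected coordinates.
    """
    image = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Pixel center in image-plane coordinates spanning [-extent, extent].
            px = -extent + (col + 0.5) * 2 * extent / size
            py = -extent + (row + 0.5) * 2 * extent / size
            empty = 1.0
            for u, v in uv:
                g = math.exp(-((px - u) ** 2 + (py - v) ** 2) / (2 * sigma ** 2))
                empty *= 1.0 - g
            image[row][col] = 1.0 - empty
    return image


def reprojection_loss(rendered, target):
    """Mean squared pixel difference between a rendered view and a provided view."""
    n = len(rendered) * len(rendered[0])
    return sum((r - t) ** 2 for rr, tr in zip(rendered, target) for r, t in zip(rr, tr)) / n
```

A single point at `(0, 0, 2)` projects to the image center and lights up the central pixels while leaving the corners near zero.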

Experiential Robot Learning with Accelerated Neuroevolution

Title Experiential Robot Learning with Accelerated Neuroevolution
Authors Ahmed Aly, Joanne B. Dugan
Abstract Derivative-based optimization techniques such as Stochastic Gradient Descent have been wildly successful in training deep neural networks. However, they impose constraints such as end-to-end network differentiability. As an alternative, we present the Accelerated Neuroevolution algorithm. The new algorithm is aimed at physical robotic learning tasks following the Experiential Robot Learning method. We test our algorithm first on a simulated task of playing the game Flappy Bird, then on a physical NAO robot in a static Object Centering task. The agents successfully navigate the given tasks in a relatively small number of generations. Based on our results, we propose to use the algorithm in more complex tasks.
Tasks
Published 2018-08-16
URL http://arxiv.org/abs/1808.05525v1
PDF http://arxiv.org/pdf/1808.05525v1.pdf
PWC https://paperswithcode.com/paper/experiential-robot-learning-with-accelerated
Repo https://github.com/AroMorin/DNNOP
Framework pytorch
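
The abstract's point is that neuroevolution avoids the differentiability requirement of gradient descent: candidates are only evaluated, never differentiated. The loop below is a generic elitist evolution strategy over a flat weight vector, offered as a sketch of that idea; it is not the Accelerated Neuroevolution algorithm itself, and the function name and hyperparameters are hypothetical.

```python
import random


def evolve(fitness, dim, population=20, elite=4, sigma=0.1, generations=50, seed=0):
    """Minimal elitist neuroevolution loop over a flat weight vector (maximizes fitness)."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population)]
    for _ in range(generations):
        # Rank candidates by fitness only; no gradient of fitness is ever computed.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        # Keep the elites unchanged and fill the rest with mutated copies of them.
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
            for _ in range(population - elite)
        ]
        pop = parents + children
    return max(pop, key=fitness)
```

Because the elites survive each generation unchanged, the best fitness never decreases. The fitness function could just as well be a non-differentiable game score or a reward measured on a physical robot.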

A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model

Title A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model
Authors Md Mofijul Islam, Amar Debnath, Tahsin Al Sayeed, Jyotirmay Nag Setu, Md Mahmudur Rahman, Md Sadman Sakib, Md Abdur Razzaque, Md. Mosaddek Khan, Swakkhar Shatabda
Abstract Deep learning algorithms are often used as black boxes and are too complex to understand. The widespread use of deep learning algorithms to solve various machine learning problems demands a deep and transparent understanding of their internal representations as well as their decision making. Moreover, learning models trained on sequential data, such as audio and video, have intricate internal reasoning processes due to the complex distribution of their features. Thus, a visual simulator might be helpful to trace the internal decision-making mechanisms in response to adversarial input data, and it would help to debug and design appropriate deep learning models. However, interpreting the internal reasoning of deep learning models is not well studied in the literature. In this work, we have developed a visual interactive web application, namely d-DeVIS, which helps to visualize the internal reasoning of a learning model trained on audio data. The proposed system allows users to perceive the behavior of the model as well as to debug it by interactively generating adversarial audio data points. The web application of d-DeVIS is available at ddevis.herokuapp.com.
Tasks Decision Making
Published 2018-11-20
URL http://arxiv.org/abs/1811.08374v1
PDF http://arxiv.org/pdf/1811.08374v1.pdf
PWC https://paperswithcode.com/paper/a-gray-box-interpretable-visual-debugging
Repo https://github.com/anon-conf/d-DeVIS
Framework none

Fisher Efficient Inference of Intractable Models

Title Fisher Efficient Inference of Intractable Models
Authors Song Liu, Takafumi Kanamori, Wittawat Jitkrittum, Yu Chen
Abstract Maximum Likelihood Estimators (MLE) have many good properties. For example, the asymptotic variance of the MLE solution attains the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining the MLE solution requires calculating the likelihood function, which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator. We study the problem of model inference using DLE. We prove its consistency and show that the asymptotic variance of its solution can attain the equality of the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems, and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network.
Tasks
Published 2018-05-18
URL https://arxiv.org/abs/1805.07454v5
PDF https://arxiv.org/pdf/1805.07454v5.pdf
PWC https://paperswithcode.com/paper/fisher-efficient-inference-of-intractable
Repo https://github.com/anewgithubname/Stein-Density-Ratio-Estimation
Framework none
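
The abstract attributes the intractability to the density model's normalization term and says DLE uses a Stein operator. It does not state which operator; a common choice in this literature, assumed here for illustration, is the Langevin Stein operator. It touches the model only through the score $\nabla_x \log p$, in which the normalizer $Z(\theta)$ cancels because it does not depend on $x$:

$$
p(x;\theta) = \frac{\bar p(x;\theta)}{Z(\theta)}
\quad\Longrightarrow\quad
\nabla_x \log p(x;\theta) = \nabla_x \log \bar p(x;\theta),
$$

$$
(\mathcal{T}_{p} f)(x) = f(x)^\top \nabla_x \log \bar p(x;\theta) + \nabla_x \cdot f(x),
\qquad
\mathbb{E}_{x \sim p(\cdot;\theta)}\big[(\mathcal{T}_{p} f)(x)\big] = 0 .
$$

Here $\bar p$ is the unnormalized density and $Z(\theta) = \int \bar p(x;\theta)\,dx$. The zero-mean identity holds for sufficiently smooth test functions $f$ whose product with $p$ vanishes at the boundary of the support, so quantities built from $\mathcal{T}_p f$ can be evaluated without ever computing $Z(\theta)$.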

The Natural Language Decathlon: Multitask Learning as Question Answering

Title The Natural Language Decathlon: Multitask Learning as Question Answering
Authors Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Abstract Deep learning has improved performance on many natural language processing (NLP) tasks individually. However, general NLP models cannot emerge within a paradigm that focuses on the particularities of a single metric, dataset, and task. We introduce the Natural Language Decathlon (decaNLP), a challenge that spans ten tasks: question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and commonsense pronoun resolution. We cast all tasks as question answering over a context. Furthermore, we present a new Multitask Question Answering Network (MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. MQAN shows improvements in transfer learning for machine translation and named entity recognition, domain adaptation for sentiment analysis and natural language inference, and zero-shot capabilities for text classification. We demonstrate that MQAN’s multi-pointer-generator decoder is key to this success and that performance further improves with an anti-curriculum training strategy. Though designed for decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic parsing task in the single-task setting. We also release code for procuring and processing data, training and evaluating models, and reproducing all experiments for decaNLP.
Tasks Domain Adaptation, Machine Translation, Named Entity Recognition, Natural Language Inference, Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Sentiment Analysis, Text Classification, Transfer Learning
Published 2018-06-20
URL http://arxiv.org/abs/1806.08730v1
PDF http://arxiv.org/pdf/1806.08730v1.pdf
PWC https://paperswithcode.com/paper/the-natural-language-decathlon-multitask
Repo https://github.com/cheng-Ye/daguan-competition-master
Framework none
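
Casting every task as question answering over a context means each training example becomes a uniform (question, context, answer) triple, so one model can consume all of them. The sketch below illustrates that data shape. The task keys and question phrasings in `TASK_QUESTIONS` are hypothetical stand-ins, not necessarily the wording decaNLP uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    question: str
    context: str
    answer: str


# Hypothetical task-to-question templates; the task description becomes the question.
TASK_QUESTIONS = {
    "sentiment": "Is this review negative or positive?",
    "translation_en_de": "What is the translation from English to German?",
    "summarization": "What is the summary?",
}


def to_qa(task: str, context: str, answer: str) -> QAExample:
    """Wrap a task-specific example as a question-answering triple (KeyError for unknown tasks)."""
    return QAExample(TASK_QUESTIONS[task], context, answer)
```

In the sentiment case the answer word ("positive") appears in the question rather than the context, which illustrates why a decoder able to copy from the question as well as the context, in addition to generating from a vocabulary, is useful in this framing.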

Bayesian Sparsification of Gated Recurrent Neural Networks

Title Bayesian Sparsification of Gated Recurrent Neural Networks
Authors Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov
Abstract Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structural units, e.g., neurons, from the networks. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. This makes some gates and information flow components constant, speeds up the forward pass, and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task. Code is available on GitHub: https://github.com/tipt0p/SparseBayesianRNN
Tasks
Published 2018-12-12
URL http://arxiv.org/abs/1812.05692v1
PDF http://arxiv.org/pdf/1812.05692v1.pdf
PWC https://paperswithcode.com/paper/bayesian-sparsification-of-gated-recurrent
Repo https://github.com/tipt0p/SparseBayesianRNN
Framework none
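
A widely used Bayesian sparsification scheme, sparse variational dropout, gives each weight a Gaussian posterior with mean $\theta$ and variance $\sigma^2$ and prunes weights whose noise-to-signal ratio $\log\alpha = \log(\sigma^2/\theta^2)$ is large. The sketch below assumes that scheme and the commonly used threshold of 3; whether the paper uses exactly this rule is an assumption, and the function names are hypothetical.

```python
import math


def log_alpha(theta, log_sigma2, eps=1e-8):
    """Per-weight log noise-to-signal ratio, log(sigma^2 / theta^2)."""
    return log_sigma2 - math.log(theta * theta + eps)


def sparsify(thetas, log_sigma2s, threshold=3.0):
    """Zero out weights whose posterior is dominated by noise (log alpha above threshold)."""
    return [0.0 if log_alpha(t, ls) > threshold else t for t, ls in zip(thetas, log_sigma2s)]
```

If every weight feeding a gate's preactivation is pruned this way, the gate depends only on its bias and becomes constant, which is the kind of structure the abstract describes.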

Deep Private-Feature Extraction

Title Deep Private-Feature Extraction
Authors Seyed Ali Osia, Ali Taheri, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Hamid R. Rabiee
Abstract We present and evaluate Deep Private-Feature Extractor (DPFE), a deep model which is trained and evaluated based on information-theoretic constraints. Using the selective exchange of information between a user’s device and a service provider, DPFE enables the user to prevent certain sensitive information from being shared with a service provider, while allowing them to extract approved information using their model. We introduce and utilize the log-rank privacy, a novel measure to assess the effectiveness of DPFE in removing sensitive information, and compare different models based on their accuracy-privacy tradeoff. We then implement and evaluate the performance of DPFE on smartphones to understand its complexity, resource demands, and efficiency tradeoffs. Our results on benchmark image datasets demonstrate that under moderate resource utilization, DPFE can achieve high accuracy for primary tasks while preserving the privacy of sensitive features.
Tasks
Published 2018-02-09
URL http://arxiv.org/abs/1802.03151v2
PDF http://arxiv.org/pdf/1802.03151v2.pdf
PWC https://paperswithcode.com/paper/deep-private-feature-extraction
Repo https://github.com/aliosia/DPFE
Framework caffe2

“Bilingual Expert” Can Find Translation Errors

Title “Bilingual Expert” Can Find Translation Errors
Authors Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si
Abstract Recent advances in statistical machine translation via the adoption of neural sequence-to-sequence models have enabled end-to-end systems to achieve state-of-the-art results on many WMT benchmarks. The performance of such machine translation (MT) systems is usually evaluated by the automatic metric BLEU when gold references are available for validation. However, for model inference or production deployment, gold references are usually unavailable or require expensive human annotation with bilingual expertise. In order to address the issue of quality evaluation (QE) without references, we propose a general framework for automatic evaluation of translation output for most WMT quality evaluation tasks. We first build a conditional target language model with a novel bidirectional transformer, named neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possibly erroneous tokens based on the prior knowledge learned from parallel data. These features are then fed into a simple Bi-LSTM predictive model for quality evaluation. The experimental results show that our approach achieves state-of-the-art performance in the quality estimation track of WMT 2017/2018.
Tasks Language Modelling, Machine Translation
Published 2018-07-25
URL http://arxiv.org/abs/1807.09433v3
PDF http://arxiv.org/pdf/1807.09433v3.pdf
PWC https://paperswithcode.com/paper/bilingual-expert-can-find-translation-errors
Repo https://github.com/lovecambi/qebrain
Framework tf

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Title Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Authors Shariq Iqbal, Fei Sha
Abstract Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also to individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.
Tasks Multi-agent Reinforcement Learning
Published 2018-10-05
URL https://arxiv.org/abs/1810.02912v2
PDF https://arxiv.org/pdf/1810.02912v2.pdf
PWC https://paperswithcode.com/paper/actor-attention-critic-for-multi-agent
Repo https://github.com/shariqiqbal2810/MAAC
Framework pytorch
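
The critic's attention step can be pictured as agent $i$ querying the encodings of all other agents and receiving a weighted summary of them. The sketch below is generic scaled dot-product attention written with the standard library; the projection matrices `wq`, `wk`, `wv` and the function names are hypothetical, and it omits the rest of the paper's critic (for example, how the attended value is combined with agent $i$'s own encoding to produce a Q-value).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def attend(i, embeddings, wq, wk, wv):
    """Scaled dot-product attention of agent i over the other agents' embeddings.

    The same wq/wk/wv are used for every agent, mirroring a shared attention mechanism.
    """
    q = matvec(wq, embeddings[i])
    others = [e for j, e in enumerate(embeddings) if j != i]
    keys = [matvec(wk, e) for e in others]
    values = [matvec(wv, e) for e in others]
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    # Numerically stable softmax over the other agents.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return out, weights
```

With identity projections, an agent attends more strongly to an agent whose encoding resembles its own, which is the "selects relevant information" behavior in miniature.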

A Less Biased Evaluation of Out-of-distribution Sample Detectors

Title A Less Biased Evaluation of Out-of-distribution Sample Detectors
Authors Alireza Shafaei, Mark Schmidt, James J. Little
Abstract In the real world, a learning system could receive an input that is unlike anything it has seen during training. Unfortunately, out-of-distribution samples can lead to unpredictable behaviour. We need to know whether any given input belongs to the population distribution of the training/evaluation data to prevent unpredictable behaviour in deployed systems. A recent surge of interest in this problem has led to the development of sophisticated techniques in the deep learning literature. However, due to the absence of a standard problem definition or an exhaustive evaluation, it is not evident whether we can rely on these methods. What makes this problem different from a typical supervised learning setting is that the distribution of outliers used in training may not be the same as the distribution of outliers encountered in the application. Classical approaches that learn inliers vs. outliers with only two datasets can yield optimistic results. We introduce OD-test, a three-dataset evaluation scheme as a more reliable strategy to assess progress on this problem. We present an exhaustive evaluation of a broad set of methods from related areas on image classification tasks. Contrary to the existing results, we show that for realistic applications of high-dimensional images the previous techniques have low accuracy and are not reliable in practice.
Tasks Image Classification
Published 2018-09-13
URL https://arxiv.org/abs/1809.04729v2
PDF https://arxiv.org/pdf/1809.04729v2.pdf
PWC https://paperswithcode.com/paper/does-your-model-know-the-digit-6-is-not-a-cat
Repo https://github.com/ashafaei/OD-test
Framework pytorch
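
The key idea of a three-dataset evaluation is that the outliers used to tune a detector must differ from the outliers used to test it. The sketch below tunes a score threshold on inliers versus validation outliers, then measures accuracy on inliers versus a different set of test outliers. The scores are made-up numbers (higher means more anomalous), and the functions are hypothetical, not the OD-test code.

```python
def accuracy(t, in_scores, out_scores):
    """Fraction classified correctly when samples scoring above t are flagged as outliers."""
    correct = sum(s <= t for s in in_scores) + sum(s > t for s in out_scores)
    return correct / (len(in_scores) + len(out_scores))


def fit_threshold(in_scores, out_scores):
    """Pick the candidate threshold that best separates inliers from the given outliers."""
    candidates = sorted(set(in_scores) | set(out_scores))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy(t, in_scores, out_scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


inliers = [0.1, 0.2, 0.3]
validation_outliers = [0.8, 0.9]   # outlier set used for tuning
test_outliers = [0.25, 0.95]       # a different outlier set, seen only at test time

t = fit_threshold(inliers, validation_outliers)
tuned_acc = accuracy(t, inliers, validation_outliers)
test_acc = accuracy(t, inliers, test_outliers)
```

In this toy case the tuned threshold separates the validation outliers perfectly but misses one of the unseen test outliers, the kind of optimism a two-dataset evaluation would hide.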

What can I do here? Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection

Title What can I do here? Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection
Authors Eduardo Ruiz, Walterio Mayol-Cuevas
Abstract This paper develops and evaluates a novel method that allows for the detection of affordances in a scalable and multiple-instance manner on visually recovered point clouds. Our approach has many advantages over alternative methods, as it is based on highly parallelizable, one-shot learning that is fast on commodity hardware. The approach is hybrid in that it uses a geometric representation together with a state-of-the-art deep learning method capable of identifying 3D scene saliency. The geometric component allows for a compact and efficient representation, boosting the performance of the deep network architecture, which proved insufficient on its own. Moreover, our approach predicts not only whether an input scene affords the interactions, but also the pose of the objects that allow these interactions to take place. Our predictions align well with crowd-sourced human judgment, being preferred with 87% probability; they show almost four times (4x) better performance than a deep-learning-only baseline and are seven times (7x) faster than previous art.
Tasks Multiple Affordance Detection, One-Shot Learning
Published 2018-12-03
URL http://arxiv.org/abs/1812.00889v1
PDF http://arxiv.org/pdf/1812.00889v1.pdf
PWC https://paperswithcode.com/paper/what-can-i-do-here-leveraging-deep-3d
Repo https://github.com/eduard626/deep-interaction-tensor
Framework none

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Title A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
Authors Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, Douglas Eck
Abstract The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the “posterior collapse” problem, which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a “flat” baseline model. An implementation of our “MusicVAE” is available online at http://g.co/magenta/musicvae-code.
Tasks
Published 2018-03-13
URL https://arxiv.org/abs/1803.05428v5
PDF https://arxiv.org/pdf/1803.05428v5.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-latent-vector-model-for
Repo https://github.com/dkoh0207/CS221-Project
Framework pytorch
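
The hierarchical decoder first maps the latent code to one embedding per subsequence and then decodes each subsequence from its own embedding only. The sketch below captures just that two-level control flow, with the top-level and bottom-level networks passed in as callables. The names `conductor_step` and `decode_subsequence` are hypothetical, and treating the top level as a function of the latent and the subsequence index is a simplification of a recurrent network.

```python
from typing import Callable, List

Vector = List[float]


def hierarchical_decode(
    z: Vector,
    num_subsequences: int,
    conductor_step: Callable[[Vector, int], Vector],
    decode_subsequence: Callable[[Vector], List[int]],
) -> List[int]:
    """Two-level decoding: latent -> per-subsequence embeddings -> output tokens."""
    embeddings = [conductor_step(z, u) for u in range(num_subsequences)]
    output: List[int] = []
    for emb in embeddings:
        # Each subsequence is generated from its own embedding alone, so long-range
        # structure has to be carried through z rather than through earlier outputs.
        output.extend(decode_subsequence(emb))
    return output
```

With toy stand-ins, `hierarchical_decode([60.0], 3, lambda z, u: [z[0] + u], lambda e: [int(e[0])] * 2)` yields `[60, 60, 61, 61, 62, 62]`: each "bar" is decoded independently from its embedding.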

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

Title Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora
Authors Stephen Roller, Douwe Kiela, Maximilian Nickel
Abstract Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods. In this paper, we study the performance of both approaches on several hypernymy tasks and find that simple pattern-based methods consistently outperform distributional methods on common benchmark datasets. Our results show that pattern-based models provide important contextual constraints which are not yet captured in distributional methods.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.03191v1
PDF http://arxiv.org/pdf/1806.03191v1.pdf
PWC https://paperswithcode.com/paper/hearst-patterns-revisited-automatic-hypernym
Repo https://github.com/facebookresearch/hypernymysuite
Framework none
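
Pattern-based hypernym detection starts from lexical templates such as "X such as Y" and "Y and other X". The sketch below extracts (hyponym, hypernym) pairs with three such templates over single words. The pattern list is a small illustrative subset chosen here, and real extraction would match noun phrases over a large corpus and aggregate counts before scoring pairs.

```python
import re

# (compiled pattern, which capture group is the hypernym); single-word matches only.
PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), "hyper_first"),     # "animals such as dogs"
    (re.compile(r"(\w+) and other (\w+)"), "hypo_first"),    # "cats and other pets"
    (re.compile(r"(\w+),? including (\w+)"), "hyper_first"), # "fruits, including apples"
]


def extract_pairs(text):
    """Return (hyponym, hypernym) pairs found by simple Hearst patterns."""
    pairs = []
    for pattern, order in PATTERNS:
        for a, b in pattern.findall(text.lower()):
            pairs.append((b, a) if order == "hyper_first" else (a, b))
    return pairs
```

On "Animals such as dogs are common. Cats and other pets sleep." this returns `[("dogs", "animals"), ("cats", "pets")]`.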

Stochastic Gradient MCMC for State Space Models

Title Stochastic Gradient MCMC for State Space Models
Authors Christopher Aicher, Yi-An Ma, Nicholas J. Foti, Emily B. Fox
Abstract State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately, when applied to dependent data, such as in SSMs, SGMCMC’s stochastic gradient estimates are biased as they break crucial temporal dependencies. To alleviate this, we propose stochastic gradient estimators that control this bias by performing additional computation in a ‘buffer’ to reduce breaking dependencies. Furthermore, we derive error bounds for this bias and show a geometric decay under mild conditions. Using these estimators, we develop novel SGMCMC samplers for discrete, continuous and mixed-type SSMs with analytic message passing. Our experiments on real and synthetic data demonstrate the effectiveness of our SGMCMC algorithms compared to batch MCMC, allowing us to scale inference to long time series with millions of time points.
Tasks Bayesian Inference, Time Series
Published 2018-10-22
URL https://arxiv.org/abs/1810.09098v2
PDF https://arxiv.org/pdf/1810.09098v2.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-mcmc-for-state-space
Repo https://github.com/aicherc/sgmcmc_ssm_code
Framework none
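
The buffer idea can be made concrete as index bookkeeping: message passing runs over a subsequence padded on both sides, while only the core subsequence contributes gradient terms. The sketch below shows that windowing together with one standard stochastic gradient Langevin dynamics (SGLD) update, a basic SGMCMC sampler. The function names and the scalar-parameter setting are hypothetical, and the paper's samplers and gradient estimators are more involved.

```python
import math
import random


def buffered_window(start, length, buffer, T):
    """Core range [start, start+length) and the same range padded by `buffer` steps per side.

    Message passing would run over `padded`; gradient terms would be taken from `core` only.
    Both ranges are clipped to the series bounds [0, T).
    """
    core = range(start, min(start + length, T))
    padded = range(max(0, start - buffer), min(start + length + buffer, T))
    return core, padded


def sgld_step(theta: float, grad_log_post: float, step_size: float, rng: random.Random) -> float:
    """One SGLD update: theta + (eps / 2) * grad log posterior + N(0, eps) noise."""
    return theta + 0.5 * step_size * grad_log_post + rng.gauss(0.0, math.sqrt(step_size))
```

Larger buffers cost more computation per step but let the messages entering the core window better approximate those from the full series, which is the bias-versus-cost trade-off the abstract describes.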