October 20, 2019

Paper Group AWR 260

Read + Verify: Machine Reading Comprehension with Unanswerable Questions. Unsupervised Learning of Shape and Pose with Differentiable Point Clouds. Experiential Robot Learning with Accelerated Neuroevolution. A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model. Fisher Efficient Inference of Intractable Models. The Na …

Read + Verify: Machine Reading Comprehension with Unanswerable Questions


Title	Read + Verify: Machine Reading Comprehension with Unanswerable Questions
Authors	Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li
Abstract	Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred. In addition to extracting answers, previous works usually predict an additional “no-answer” probability to detect unanswerable cases. However, they fail to validate the answerability of the question by verifying the legitimacy of the predicted answer. To address this problem, we propose a novel read-then-verify system, which not only utilizes a neural reader to extract candidate answers and produce no-answer probabilities, but also leverages an answer verifier to decide whether the predicted answer is entailed by the input snippets. Moreover, we introduce two auxiliary losses to help the reader better handle answer extraction as well as no-answer detection, and investigate three different architectures for the answer verifier. Our experiments on the SQuAD 2.0 dataset show that our system achieves a score of 74.2 F1 on the test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018).
Tasks	Machine Reading Comprehension, Question Answering, Reading Comprehension
Published	2018-08-17
URL	http://arxiv.org/abs/1808.05759v5
PDF	http://arxiv.org/pdf/1808.05759v5.pdf
PWC	https://paperswithcode.com/paper/read-verify-machine-reading-comprehension
Repo	https://github.com/yujiakimoto/mnemonic-reader
Framework	pytorch
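
The read-then-verify decision described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the averaging rule, and the 0.5 threshold are all assumptions, since the abstract only states that a reader produces no-answer probabilities and a verifier checks whether the predicted answer is entailed.

```python
def should_abstain(reader_no_answer_prob, verifier_no_answer_prob, threshold=0.5):
    """Return True when the system should refuse to answer.

    Assumption: the reader and verifier probabilities are combined by a
    plain average and compared against a fixed threshold.
    """
    combined = 0.5 * (reader_no_answer_prob + verifier_no_answer_prob)
    return combined > threshold


def answer_or_abstain(candidate, reader_no_answer_prob, verifier_no_answer_prob,
                      threshold=0.5):
    """Return the candidate answer, or None when the question looks unanswerable."""
    if should_abstain(reader_no_answer_prob, verifier_no_answer_prob, threshold):
        return None
    return candidate
```

Calling `answer_or_abstain("Paris", 0.1, 0.2)` keeps the candidate, while high no-answer probabilities from both components make the function return `None`.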

Unsupervised Learning of Shape and Pose with Differentiable Point Clouds


Title	Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
Authors	Eldar Insafutdinov, Alexey Dosovitskiy
Abstract	We address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images. We train a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error: given several views of an object, the projections of the predicted shapes to the predicted camera poses should match the provided views. To deal with pose ambiguity, we introduce an ensemble of pose predictors which we then distill to a single “student” model. To allow for efficient learning of high-fidelity shapes, we represent the shapes by point clouds and devise a formulation allowing for differentiable projection of these. Our experiments show that the distilled ensemble of pose predictors learns to estimate the pose accurately, while the point cloud representation allows detailed shape models to be predicted. The supplementary video can be found at https://www.youtube.com/watch?v=LuIGovKeo60
Tasks	3D Pose Estimation
Published	2018-10-22
URL	http://arxiv.org/abs/1810.09381v1
PDF	http://arxiv.org/pdf/1810.09381v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-learning-of-shape-and-pose-with
Repo	https://github.com/eldar/differentiable-point-clouds
Framework	tf

Experiential Robot Learning with Accelerated Neuroevolution


Title	Experiential Robot Learning with Accelerated Neuroevolution
Authors	Ahmed Aly, Joanne B. Dugan
Abstract	Derivative-based optimization techniques such as Stochastic Gradient Descent have been wildly successful in training deep neural networks. However, they have constraints such as end-to-end network differentiability. As an alternative, we present the Accelerated Neuroevolution algorithm. The new algorithm is aimed towards physical robotic learning tasks following the Experiential Robot Learning method. We test our algorithm first on a simulated task of playing the game Flappy Bird, then on a physical NAO robot in a static Object Centering task. The agents successfully navigate the given tasks in a relatively low number of generations. Based on our results, we propose to use the algorithm in more complex tasks.
Tasks
Published	2018-08-16
URL	http://arxiv.org/abs/1808.05525v1
PDF	http://arxiv.org/pdf/1808.05525v1.pdf
PWC	https://paperswithcode.com/paper/experiential-robot-learning-with-accelerated
Repo	https://github.com/AroMorin/DNNOP
Framework	pytorch
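
To make the contrast with gradient-based training concrete, here is a generic mutation-and-selection loop that optimizes a weight vector without any derivatives. It is not the Accelerated Neuroevolution algorithm itself, whose details the abstract does not give; the function name `evolve`, the Gaussian mutation, and every default parameter are assumptions chosen for illustration.

```python
import random


def evolve(fitness, dim, generations=50, population=20, sigma=0.1, seed=0):
    """Gradient-free search: mutate the current parent, keep the best child.

    `fitness` maps a list of weights to a score (higher is better) and is
    never differentiated, so it may be any black-box function.
    """
    rng = random.Random(seed)
    parent = [0.0] * dim
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Each child is the parent plus Gaussian noise on every weight.
        children = [[w + rng.gauss(0.0, sigma) for w in parent]
                    for _ in range(population)]
        scored = [(fitness(child), child) for child in children]
        best_fit, best_child = max(scored, key=lambda pair: pair[0])
        # Elitism: only replace the parent when a child is strictly better.
        if best_fit > parent_fit:
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit
```

Because the parent is replaced only by a strictly better child, the returned fitness can never be worse than that of the all-zero starting point.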

A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model


Title	A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model
Authors	Md Mofijul Islam, Amar Debnath, Tahsin Al Sayeed, Jyotirmay Nag Setu, Md Mahmudur Rahman, Md Sadman Sakib, Md Abdur Razzaque, Md. Mosaddek Khan, Swakkhar Shatabda
Abstract	Deep Learning algorithms are often used as black boxes, and they are too complex to understand. The widespread use of Deep Learning algorithms to solve various machine learning problems demands a deep and transparent understanding of their internal representations as well as their decision making. Moreover, learning models trained on sequential data, such as audio and video data, have intricate internal reasoning processes due to their complex distribution of features. Thus, a visual simulator might be helpful to trace the internal decision making mechanisms in response to adversarial input data, and it would help to debug and design appropriate deep learning models. However, interpreting the internal reasoning of deep learning models is not well studied in the literature. In this work, we have developed a visual interactive web application, namely d-DeVIS, which helps to visualize the internal reasoning of a learning model trained on audio data. The proposed system allows users to perceive the behavior of the model as well as to debug it by interactively generating adversarial audio data points. The web application of d-DeVIS is available at ddevis.herokuapp.com.
Tasks	Decision Making
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08374v1
PDF	http://arxiv.org/pdf/1811.08374v1.pdf
PWC	https://paperswithcode.com/paper/a-gray-box-interpretable-visual-debugging
Repo	https://github.com/anon-conf/d-DeVIS
Framework	none

Fisher Efficient Inference of Intractable Models


Title	Fisher Efficient Inference of Intractable Models
Authors	Song Liu, Takafumi Kanamori, Wittawat Jitkrittum, Yu Chen
Abstract	Maximum Likelihood Estimators (MLE) have many good properties. For example, the asymptotic variance of the MLE solution attains the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such an MLE solution requires calculating the likelihood function, which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator. We study the problem of model inference using DLE. We prove its consistency and show that the asymptotic variance of its solution can attain the equality of the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network.
Tasks
Published	2018-05-18
URL	https://arxiv.org/abs/1805.07454v5
PDF	https://arxiv.org/pdf/1805.07454v5.pdf
PWC	https://paperswithcode.com/paper/fisher-efficient-inference-of-intractable
Repo	https://github.com/anewgithubname/Stein-Density-Ratio-Estimation
Framework	none
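
For reference, the efficiency bound mentioned in the abstract above is the standard Cramér-Rao inequality. The statement below is the textbook scalar form, assuming n i.i.d. samples, an unbiased estimator, and the usual regularity conditions; the notation is ours rather than the paper's.

```latex
\operatorname{Var}_{\theta}\!\left(\hat{\theta}\right) \;\ge\; \frac{1}{n\, I(\theta)},
\qquad
I(\theta) = \mathbb{E}_{x \sim p(\cdot\,;\theta)}\!\left[
  \left( \frac{\partial}{\partial \theta} \log p(x;\theta) \right)^{2}
\right]
```

An estimator is called efficient when its variance reaches this lower bound, which is the property the abstract claims asymptotically for both MLE and DLE.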

The Natural Language Decathlon: Multitask Learning as Question Answering


Title	The Natural Language Decathlon: Multitask Learning as Question Answering
Authors	Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Abstract	Deep learning has improved performance on many natural language processing (NLP) tasks individually. However, general NLP models cannot emerge within a paradigm that focuses on the particularities of a single metric, dataset, and task. We introduce the Natural Language Decathlon (decaNLP), a challenge that spans ten tasks: question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and commonsense pronoun resolution. We cast all tasks as question answering over a context. Furthermore, we present a new Multitask Question Answering Network (MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. MQAN shows improvements in transfer learning for machine translation and named entity recognition, domain adaptation for sentiment analysis and natural language inference, and zero-shot capabilities for text classification. We demonstrate that the MQAN’s multi-pointer-generator decoder is key to this success and that performance further improves with an anti-curriculum training strategy. Though designed for decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic parsing task in the single-task setting. We also release code for procuring and processing data, training and evaluating models, and reproducing all experiments for decaNLP.
Tasks	Domain Adaptation, Machine Translation, Named Entity Recognition, Natural Language Inference, Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Sentiment Analysis, Text Classification, Transfer Learning
Published	2018-06-20
URL	http://arxiv.org/abs/1806.08730v1
PDF	http://arxiv.org/pdf/1806.08730v1.pdf
PWC	https://paperswithcode.com/paper/the-natural-language-decathlon-multitask
Repo	https://github.com/cheng-Ye/daguan-competition-master
Framework	none
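
The idea of casting every task as question answering over a context can be illustrated with a single record type. This is a sketch only: the class `QAExample`, the helper names, and the exact question wording are assumptions made for illustration and are not taken from the decaNLP code.

```python
from typing import NamedTuple


class QAExample(NamedTuple):
    """One training example in a unified (question, context, answer) format."""
    question: str
    context: str
    answer: str


def as_translation(source_sentence, target_sentence):
    # Translation: the task itself is named in the question, the input is the context.
    return QAExample("What is the translation from English to German?",
                     source_sentence, target_sentence)


def as_sentiment(review, label):
    # Sentiment analysis: the answer is simply the class label as text.
    return QAExample("Is this review positive or negative?", review, label)
```

With this layout a single model can consume examples from different tasks in the same batch, because the task identity is carried by the question text rather than by a task-specific output head.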

Bayesian Sparsification of Gated Recurrent Neural Networks


Title	Bayesian Sparsification of Gated Recurrent Neural Networks
Authors	Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov
Abstract	Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structural units from the networks, e.g., neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. This makes some gates and information flow components constant, speeds up the forward pass, and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task. Code is available on github: https://github.com/tipt0p/SparseBayesianRNN
Tasks
Published	2018-12-12
URL	http://arxiv.org/abs/1812.05692v1
PDF	http://arxiv.org/pdf/1812.05692v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-sparsification-of-gated-recurrent
Repo	https://github.com/tipt0p/SparseBayesianRNN
Framework	none

Deep Private-Feature Extraction


Title	Deep Private-Feature Extraction
Authors	Seyed Ali Osia, Ali Taheri, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Hamid R. Rabiee
Abstract	We present and evaluate Deep Private-Feature Extractor (DPFE), a deep model which is trained and evaluated based on information theoretic constraints. Using the selective exchange of information between a user’s device and a service provider, DPFE enables the user to prevent certain sensitive information from being shared with a service provider, while allowing them to extract approved information using their model. We introduce and utilize the log-rank privacy, a novel measure to assess the effectiveness of DPFE in removing sensitive information and compare different models based on their accuracy-privacy tradeoff. We then implement and evaluate the performance of DPFE on smartphones to understand its complexity, resource demands, and efficiency tradeoffs. Our results on benchmark image datasets demonstrate that under moderate resource utilization, DPFE can achieve high accuracy for primary tasks while preserving the privacy of sensitive features.
Tasks
Published	2018-02-09
URL	http://arxiv.org/abs/1802.03151v2
PDF	http://arxiv.org/pdf/1802.03151v2.pdf
PWC	https://paperswithcode.com/paper/deep-private-feature-extraction
Repo	https://github.com/aliosia/DPFE
Framework	caffe2

“Bilingual Expert” Can Find Translation Errors


Title	“Bilingual Expert” Can Find Translation Errors
Authors	Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si
Abstract	Recent advances in statistical machine translation via the adoption of neural sequence-to-sequence models empower the end-to-end system to achieve state-of-the-art results on many WMT benchmarks. The performance of such a machine translation (MT) system is usually evaluated by the automatic metric BLEU when golden references are provided for validation. However, for model inference or production deployment, golden references are rarely available or require expensive human annotation with bilingual expertise. In order to address the issue of quality evaluation (QE) without reference, we propose a general framework for automatic evaluation of translation output for most WMT quality evaluation tasks. We first build a conditional target language model with a novel bidirectional transformer, named neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possible erroneous tokens based on the prior knowledge learned from parallel data. Subsequently, the features are fed into a simple Bi-LSTM predictive model for quality evaluation. The experimental results show that our approach achieves state-of-the-art performance in the quality estimation track of WMT 2017/2018.
Tasks	Language Modelling, Machine Translation
Published	2018-07-25
URL	http://arxiv.org/abs/1807.09433v3
PDF	http://arxiv.org/pdf/1807.09433v3.pdf
PWC	https://paperswithcode.com/paper/bilingual-expert-can-find-translation-errors
Repo	https://github.com/lovecambi/qebrain
Framework	tf

Actor-Attention-Critic for Multi-Agent Reinforcement Learning


Title	Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Authors	Shariq Iqbal, Fei Sha
Abstract	Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.
Tasks	Multi-agent Reinforcement Learning
Published	2018-10-05
URL	https://arxiv.org/abs/1810.02912v2
PDF	https://arxiv.org/pdf/1810.02912v2.pdf
PWC	https://paperswithcode.com/paper/actor-attention-critic-for-multi-agent
Repo	https://github.com/shariqiqbal2810/MAAC
Framework	pytorch
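
The attention step that lets each agent's critic select relevant information from the other agents can be sketched as follows. The abstract does not specify the form of the attention, so scaled dot-product attention is used here as an assumption, and the learned projections a real critic would apply are omitted; `attend` and its arguments are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one agent over the other agents.

    query:  embedding of the agent whose critic is being evaluated
    keys:   one embedding per other agent, same length as query
    values: one vector per other agent, all of equal length
    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

The returned context vector would then be fed, together with the agent's own observation and action, into that agent's value estimate.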

A Less Biased Evaluation of Out-of-distribution Sample Detectors


Title	A Less Biased Evaluation of Out-of-distribution Sample Detectors
Authors	Alireza Shafaei, Mark Schmidt, James J. Little
Abstract	In the real world, a learning system could receive an input that is unlike anything it has seen during training. Unfortunately, out-of-distribution samples can lead to unpredictable behaviour. We need to know whether any given input belongs to the population distribution of the training/evaluation data to prevent unpredictable behaviour in deployed systems. A recent surge of interest in this problem has led to the development of sophisticated techniques in the deep learning literature. However, due to the absence of a standard problem definition or an exhaustive evaluation, it is not evident if we can rely on these methods. What makes this problem different from a typical supervised learning setting is that the distribution of outliers used in training may not be the same as the distribution of outliers encountered in the application. Classical approaches that learn inliers vs. outliers with only two datasets can yield optimistic results. We introduce OD-test, a three-dataset evaluation scheme as a more reliable strategy to assess progress on this problem. We present an exhaustive evaluation of a broad set of methods from related areas on image classification tasks. Contrary to the existing results, we show that for realistic applications of high-dimensional images the previous techniques have low accuracy and are not reliable in practice.
Tasks	Image Classification
Published	2018-09-13
URL	https://arxiv.org/abs/1809.04729v2
PDF	https://arxiv.org/pdf/1809.04729v2.pdf
PWC	https://paperswithcode.com/paper/does-your-model-know-the-digit-6-is-not-a-cat
Repo	https://github.com/ashafaei/OD-test
Framework	pytorch
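
The three-dataset idea behind OD-test can be sketched with plain score thresholds: a detector is tuned against one set of outliers and then scored against a different, unseen set. This is a simplified illustration under assumed names (`fit_threshold`, `accuracy`) and assumes each sample already has a scalar outlier score where higher means more anomalous; it is not the evaluation code from the paper's repository.

```python
def accuracy(threshold, inlier_scores, outlier_scores):
    """Fraction of samples classified correctly by 'score > threshold means outlier'."""
    correct = sum(s <= threshold for s in inlier_scores)
    correct += sum(s > threshold for s in outlier_scores)
    return correct / (len(inlier_scores) + len(outlier_scores))


def fit_threshold(inlier_scores, validation_outlier_scores):
    """Choose the threshold that best separates inliers from validation outliers."""
    candidates = sorted(set(inlier_scores) | set(validation_outlier_scores))
    return max(candidates,
               key=lambda t: accuracy(t, inlier_scores, validation_outlier_scores))
```

In use, `fit_threshold` sees only the in-distribution data and the validation outliers, and `accuracy` is then reported on held-out inliers against a third dataset of target outliers. A two-dataset protocol would instead report the accuracy on the same kind of outliers used for tuning, which is the source of the optimistic results the abstract warns about.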

What can I do here? Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection


Title	What can I do here? Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection
Authors	Eduardo Ruiz, Walterio Mayol-Cuevas
Abstract	This paper develops and evaluates a novel method that allows for the detection of affordances in a scalable and multiple-instance manner on visually recovered pointclouds. Our approach has many advantages over alternative methods, as it is based on highly parallelizable, one-shot learning that is fast on commodity hardware. The approach is hybrid in that it uses a geometric representation together with a state-of-the-art deep learning method capable of identifying 3D scene saliency. The geometric component allows for a compact and efficient representation, boosting the performance of the deep network architecture, which proved insufficient on its own. Moreover, our approach predicts not only whether an input scene affords the interactions, but also the pose of the objects that allow these interactions to take place. Our predictions align well with crowd-sourced human judgment as they are preferred with 87% probability, show high rates of improvement with almost four times (4x) better performance over a deep learning-only baseline, and are seven times (7x) faster than previous art.
Tasks	Multiple Affordance Detection, One-Shot Learning
Published	2018-12-03
URL	http://arxiv.org/abs/1812.00889v1
PDF	http://arxiv.org/pdf/1812.00889v1.pdf
PWC	https://paperswithcode.com/paper/what-can-i-do-here-leveraging-deep-3d
Repo	https://github.com/eduard626/deep-interaction-tensor
Framework	none

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music


Title	A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
Authors	Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, Douglas Eck
Abstract	The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the “posterior collapse” problem, which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a “flat” baseline model. An implementation of our “MusicVAE” is available online at http://g.co/magenta/musicvae-code.
Tasks
Published	2018-03-13
URL	https://arxiv.org/abs/1803.05428v5
PDF	https://arxiv.org/pdf/1803.05428v5.pdf
PWC	https://paperswithcode.com/paper/a-hierarchical-latent-vector-model-for
Repo	https://github.com/dkoh0207/CS221-Project
Framework	pytorch

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora


Title	Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora
Authors	Stephen Roller, Douwe Kiela, Maximilian Nickel
Abstract	Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods. In this paper, we study the performance of both approaches on several hypernymy tasks and find that simple pattern-based methods consistently outperform distributional methods on common benchmark datasets. Our results show that pattern-based models provide important contextual constraints which are not yet captured in distributional methods.
Tasks
Published	2018-06-08
URL	http://arxiv.org/abs/1806.03191v1
PDF	http://arxiv.org/pdf/1806.03191v1.pdf
PWC	https://paperswithcode.com/paper/hearst-patterns-revisited-automatic-hypernym
Repo	https://github.com/facebookresearch/hypernymysuite
Framework	none
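
A pattern-based hypernym detector of the kind the abstract refers to can be as simple as a regular expression. The sketch below handles only the single pattern "X such as Y" on single-word terms, with no lemmatization or noun-phrase chunking; the function name and the choice of pattern are assumptions for illustration, not the pattern set used in the paper.

```python
import re

# "animals such as cats" -> hypernym "animals", hyponym "cats"
SUCH_AS = re.compile(r"(\w+) such as (\w+)")


def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the 'X such as Y' pattern."""
    return [(hyponym, hypernym)
            for hypernym, hyponym in SUCH_AS.findall(text.lower())]
```

Counting how often each pair is extracted across a large corpus gives a simple score for how likely the hypernymy relation is to hold.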

Stochastic Gradient MCMC for State Space Models


Title	Stochastic Gradient MCMC for State Space Models
Authors	Christopher Aicher, Yi-An Ma, Nicholas J. Foti, Emily B. Fox
Abstract	State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately, when applied to dependent data, such as in SSMs, SGMCMC’s stochastic gradient estimates are biased as they break crucial temporal dependencies. To alleviate this, we propose stochastic gradient estimators that control this bias by performing additional computation in a ‘buffer’ to reduce breaking dependencies. Furthermore, we derive error bounds for this bias and show a geometric decay under mild conditions. Using these estimators, we develop novel SGMCMC samplers for discrete, continuous and mixed-type SSMs with analytic message passing. Our experiments on real and synthetic data demonstrate the effectiveness of our SGMCMC algorithms compared to batch MCMC, allowing us to scale inference to long time series with millions of time points.
Tasks	Bayesian Inference, Time Series
Published	2018-10-22
URL	https://arxiv.org/abs/1810.09098v2
PDF	https://arxiv.org/pdf/1810.09098v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-gradient-mcmc-for-state-space
Repo	https://github.com/aicherc/sgmcmc_ssm_code
Framework	none