February 1, 2020


Paper Group AWR 282

Functional Tensors for Probabilistic Programming. Joint Source-Target Self Attention with Locality Constraints. A Game Theoretic Approach to Class-wise Selective Rationalization. ProPublica’s COMPAS Data Revisited. Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning. Investigating Evaluation of Open-Domain Dialogue Systems …

Functional Tensors for Probabilistic Programming


Title	Functional Tensors for Probabilistic Programming
Authors	Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Du Phan, Jonathan P. Chen
Abstract	It is a significant challenge to design probabilistic programming systems that can accommodate a wide variety of inference strategies within a unified framework. Noting that the versatility of modern automatic differentiation frameworks is based in large part on the unifying concept of tensors, we describe a software abstraction for integration, functional tensors, that captures many of the benefits of tensors, while also being able to describe continuous probability distributions. Moreover, functional tensors are a natural candidate for generalized variable elimination and parallel-scan filtering algorithms that enable parallel exact inference for a large family of tractable modeling motifs. We demonstrate the versatility of functional tensors by integrating them into the modeling frontend and inference backend of the Pyro programming language. In experiments we show that the resulting framework enables a large variety of inference strategies, including those that mix exact and approximate inference.
Tasks	Probabilistic Programming
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10775v2
PDF	https://arxiv.org/pdf/1910.10775v2.pdf
PWC	https://paperswithcode.com/paper/functional-tensors-for-probabilistic
Repo	https://github.com/pyro-ppl/funsor
Framework	pytorch

Joint Source-Target Self Attention with Locality Constraints


Title	Joint Source-Target Self Attention with Locality Constraints
Authors	José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà
Abstract	The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. As input for training, both source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively starting with the source sequence as previous tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT’14 German-English and matches the best reported results in the literature on the WMT’14 English-German and WMT’14 English-French translation benchmarks.
Tasks	Language Modelling, Machine Translation
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06596v1
PDF	https://arxiv.org/pdf/1905.06596v1.pdf
PWC	https://paperswithcode.com/paper/190506596
Repo	https://github.com/jarfo/joint
Framework	pytorch
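
The locality-constrained attention over a joint source-target sequence described in the abstract can be pictured as a boolean mask. The sketch below is only an illustration under stated assumptions: the function name `local_causal_mask`, the single fixed `window` size, and the choice to apply causal masking to every position are hypothetical and are not taken from the paper or its repository.

```python
def local_causal_mask(src_len, tgt_len, window):
    """Return a matrix where mask[i][j] is True if position i may attend to j.

    Positions index the concatenation [source tokens, target tokens].
    Assumptions (not from the paper): attention is causal everywhere and
    each position sees at most `window` positions, itself included.
    """
    n = src_len + tgt_len
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]


mask = local_causal_mask(src_len=3, tgt_len=2, window=2)
for row in mask:
    # 'x' marks an allowed attention link, '.' a masked one
    print("".join("x" if allowed else "." for allowed in row))
```

Each row shows a narrow band ending at the diagonal, which is the sense in which the receptive field is constrained rather than spanning the whole sequence.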

A Game Theoretic Approach to Class-wise Selective Rationalization


Title	A Game Theoretic Approach to Class-wise Selective Rationalization
Authors	Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola
Abstract	Selection of input features such as relevant pieces of text has become a common technique of highlighting how complex neural predictors operate. The selection can be optimized post-hoc for trained models or incorporated directly into the method itself (self-explaining). However, an overall selection does not properly capture the multi-faceted nature of useful rationales such as pros and cons for decisions. To this end, we propose a new game theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players set up competitively to find evidence for factual and counterfactual scenarios. We show theoretically in a simplified scenario how the game drives the solution towards meaningful class-dependent rationales. We evaluate the method in single- and multi-aspect sentiment classification tasks and demonstrate that the proposed method is able to identify both factual (justifying the ground truth label) and counterfactual (countering the ground truth label) rationales consistent with human rationalization. The code for our method is publicly available.
Tasks	Sentiment Analysis
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12853v1
PDF	https://arxiv.org/pdf/1910.12853v1.pdf
PWC	https://paperswithcode.com/paper/a-game-theoretic-approach-to-class-wise
Repo	https://github.com/code-terminator/classwise_rationale
Framework	tf

ProPublica’s COMPAS Data Revisited


Title	ProPublica’s COMPAS Data Revisited
Authors	Matias Barenstein
Abstract	I examine the COMPAS recidivism risk score and criminal history data collected by ProPublica in 2016 that fueled intense debate and research in the nascent field of ‘algorithmic fairness’. ProPublica’s COMPAS data is used in an increasing number of studies to test various definitions of algorithmic fairness. This paper takes a closer look at the actual datasets put together by ProPublica, in particular the sub-datasets built to study the likelihood of recidivism within two years of a defendant’s original COMPAS survey screening date. I take a new yet simple approach to visualize these data, by analyzing the distribution of defendants across COMPAS screening dates. I find that ProPublica made an important data processing error when it created these datasets, failing to implement a two-year sample cutoff rule for recidivists in such datasets (whereas it implemented a two-year sample cutoff rule for non-recidivists). When I implement a simple two-year COMPAS screen date cutoff rule for recidivists, I estimate that in the two-year general recidivism dataset ProPublica kept over 40% more recidivists than it should have. This fundamental problem in dataset construction affects some statistics more than others. It has a substantial impact on the recidivism rate, artificially inflating it. For the two-year general recidivism dataset created by ProPublica, the two-year recidivism rate is 45.1%, whereas, with the simple COMPAS screen date cutoff correction I implement, it is 36.2%. Thus, the two-year recidivism rate in ProPublica’s dataset is inflated by over 24%. This also affects the positive and negative predictive values. On the other hand, this data processing error has little impact on some of the other key statistical measures, which are less susceptible to changes in the relative share of recidivists, such as the false positive and false negative rates, and the overall accuracy.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04711v3
PDF	https://arxiv.org/pdf/1906.04711v3.pdf
PWC	https://paperswithcode.com/paper/propublicas-compas-data-revisited
Repo	https://github.com/mbarenstein/ProPublica_COMPAS_Data_Revisited
Framework	none
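
The "inflated by over 24%" figure in the abstract is a relative difference, not a gap in percentage points. The short calculation below reproduces it from the two rates quoted in the abstract; the variable names are illustrative only.

```python
reported_rate = 45.1   # two-year recidivism rate in ProPublica's dataset (%)
corrected_rate = 36.2  # rate after the screen-date cutoff correction (%)

# Gap in percentage points between the two rates
absolute_gap = reported_rate - corrected_rate

# Gap expressed relative to the corrected rate, in percent
relative_inflation = absolute_gap / corrected_rate * 100

print(f"absolute gap: {absolute_gap:.1f} percentage points")
print(f"relative inflation: {relative_inflation:.1f}%")
```

The absolute gap is 8.9 percentage points, while the relative inflation is about 24.6%, consistent with the "over 24%" stated in the abstract.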

Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning


Title	Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning
Authors	Alexandros Papangelis, Yi-Chia Wang, Piero Molino, Gokhan Tur
Abstract	We present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language. Using DSTC2 as seed data, we trained natural language understanding (NLU) and generation (NLG) networks for each agent and let the agents interact online. We model the interaction as a stochastic collaborative game where each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and their own objectives, and can only interact via natural language they generate. Each agent, therefore, needs to learn to operate optimally in an environment with multiple sources of uncertainty (its own NLU and NLG, the other agent’s NLU, Policy, and NLG). In our evaluation, we show that the stochastic-game agents outperform deep learning based supervised baselines.
Tasks
Published	2019-07-11
URL	https://arxiv.org/abs/1907.05507v2
PDF	https://arxiv.org/pdf/1907.05507v2.pdf
PWC	https://paperswithcode.com/paper/collaborative-multi-agent-dialogue-model
Repo	https://github.com/uber-research/plato-research-dialogue-system
Framework	tf

Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References


Title	Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
Authors	Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey P. Bigham
Abstract	The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in open-domain dialog. One alternative is to collect human annotations for evaluation, which can be expensive and time consuming. To demonstrate the effectiveness of multi-reference evaluation, we augment the test set of DailyDialog with multiple references. A series of experiments show that the use of multiple references results in improved correlation between several automatic metrics and human judgement for both the quality and the diversity of system output.
Tasks
Published	2019-07-24
URL	https://arxiv.org/abs/1907.10568v2
PDF	https://arxiv.org/pdf/1907.10568v2.pdf
PWC	https://paperswithcode.com/paper/investigating-evaluation-of-open-domain
Repo	https://github.com/prakharguptaz/multirefeval
Framework	none
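
As a rough illustration of multi-reference evaluation, the sketch below scores one system response against several human references and keeps the best match. The metric (a simple unigram F1), the max-over-references aggregation, and the names `unigram_f1` and `multi_reference_score` are assumptions chosen for brevity; the paper evaluates several established metrics and may aggregate differently.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Unigram overlap F1 between two whitespace-tokenized strings."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis, references):
    """Score against every reference and keep the best (assumed aggregation)."""
    return max(unigram_f1(hypothesis, ref) for ref in references)


references = ["i am fine thanks", "doing great how about you"]
single = unigram_f1("doing great thanks", references[0])
multi = multi_reference_score("doing great thanks", references)
print(f"single-reference: {single:.3f}, multi-reference: {multi:.3f}")
```

With only the first reference the response is penalized for a valid but differently worded reply; adding the second reference raises the score, which is the effect multi-reference evaluation is meant to capture.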

Is Multilingual BERT Fluent in Language Generation?


Title	Is Multilingual BERT Fluent in Language Generation?
Authors	Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter
Abstract	The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing for the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to the monolingual counterparts, and cannot in many cases serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular, for Nordic languages.
Tasks	Language Modelling, Text Generation
Published	2019-10-09
URL	https://arxiv.org/abs/1910.03806v1
PDF	https://arxiv.org/pdf/1910.03806v1.pdf
PWC	https://paperswithcode.com/paper/is-multilingual-bert-fluent-in-language
Repo	https://github.com/TurkuNLP/bert-eval
Framework	pytorch

Biologically Plausible Sequence Learning with Spiking Neural Networks


Title	Biologically Plausible Sequence Learning with Spiking Neural Networks
Authors	Zuozhu Liu, Thiparat Chotibut, Christopher Hillar, Shaowei Lin
Abstract	Motivated by the celebrated discrete-time model of nervous activity outlined by McCulloch and Pitts in 1943, we propose a novel continuous-time model, the McCulloch-Pitts network (MPN), for sequence learning in spiking neural networks. Our model has a local learning rule, such that the synaptic weight updates depend only on the information directly accessible by the synapse. By exploiting asymmetry in the connections between binary neurons, we show that MPN can be trained to robustly memorize multiple spatiotemporal patterns of binary vectors, generalizing the ability of the symmetric Hopfield network to memorize static spatial patterns. In addition, we demonstrate that the model can efficiently learn sequences of binary pictures as well as generative models for experimental neural spike-train data. Our learning rule is consistent with spike-timing-dependent plasticity (STDP), thus providing a theoretical ground for the systematic design of biologically inspired networks with large and robust long-range sequence storage capacity.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10943v1
PDF	https://arxiv.org/pdf/1911.10943v1.pdf
PWC	https://paperswithcode.com/paper/biologically-plausible-sequence-learning-with
Repo	https://github.com/owen94/MPNets
Framework	none
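
To make the role of asymmetric connections concrete, the sketch below stores a cyclic sequence of binary patterns with a classical asymmetric Hebbian rule in discrete time. This is a textbook construction swapped in for illustration, not the continuous-time MPN or its learning rule from the paper; the function names and the tiny orthogonal patterns are hypothetical.

```python
def sign(value):
    """Map a real number to a +1/-1 neuron state."""
    return 1 if value >= 0 else -1


def train_sequence_weights(patterns):
    """Asymmetric Hebbian weights: w[i][j] links pattern t (pre) to t+1 (post)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    successors = patterns[1:] + patterns[:1]  # cyclic: last pattern maps to first
    for current, nxt in zip(patterns, successors):
        for i in range(n):
            for j in range(n):
                weights[i][j] += nxt[i] * current[j] / n
    return weights


def step(weights, state):
    """One synchronous update of all neurons."""
    return [sign(sum(w * s for w, s in zip(row, state))) for row in weights]


# Three mutually orthogonal +1/-1 patterns forming a cycle
patterns = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
weights = train_sequence_weights(patterns)

state = patterns[0]
for _ in range(3):
    state = step(weights, state)
    print(state)
```

Because the weight matrix is not symmetric, the network moves from one stored pattern to the next instead of settling into a fixed point, in contrast to a symmetric Hopfield network that memorizes static patterns.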

Searching for Ambiguous Objects in Videos using Relational Referring Expressions


Title	Searching for Ambiguous Objects in Videos using Relational Referring Expressions
Authors	Hazan Anayurt, Sezai Artun Ozyegin, Ulfet Cetin, Utku Aktas, Sinan Kalkan
Abstract	Humans frequently use referring (identifying) expressions to refer to objects. Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object. Unlike studies on video object search using referring expressions, in this paper, our focus is on (i) relational referring expressions in highly ambiguous settings, and (ii) methods that can both generate and comprehend a referring expression. For this goal, we first introduce a new dataset for video object search with referring expressions that includes numerous copies of the objects, making it difficult to use non-relational expressions. Moreover, we train two baseline deep networks on this dataset, which show promising results. Finally, we propose a deep attention network that significantly outperforms the baselines on our dataset. The dataset and the code are available at https://github.com/hazananayurt/viref.
Tasks	Deep Attention, Natural Language Visual Grounding
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01189v2
PDF	https://arxiv.org/pdf/1908.01189v2.pdf
PWC	https://paperswithcode.com/paper/searching-for-ambiguous-objects-in-videos
Repo	https://github.com/hazananayurt/viref
Framework	pytorch

PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation


Title	PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation
Authors	Jie Zhao, Lei Dai, Mo Zhang, Fei Yu, Meng Li, Hongfeng Li, Wenjia Wang, Li Zhang
Abstract	Automated cervical nucleus segmentation based on deep learning can effectively improve the quantitative analysis of cervical cancer. However, accurate nuclei segmentation is still challenging. The classic U-net has not achieved satisfactory results on this task, because it mixes information from different scales that affect each other, which limits the segmentation accuracy of the model. To solve this problem, we propose a progressive growing U-net (PGU-net+) model, which uses two paradigms to extract image features at different scales in a more independent way. First, we add residual modules between different scales of U-net, which forces the model to learn the approximate shape of the annotation at the coarser scale, and to learn the residual between the annotation and the approximate shape at the finer scale. Second, we start by training the model with the coarsest part and then progressively add finer parts to the training until the full model is included. When we train a finer part, we reduce the learning rate of the previous coarser part, which further ensures that the model independently extracts information from different scales. We conduct several comparative experiments on the Herlev dataset. The experimental results show that PGU-net+ achieves higher accuracy than the previous state-of-the-art methods on cervical nuclei segmentation.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01062v3
PDF	https://arxiv.org/pdf/1911.01062v3.pdf
PWC	https://paperswithcode.com/paper/pgu-net-progressive-growing-of-u-net-for
Repo	https://github.com/Minerva-jiezhao/PGU-net-Model
Framework	pytorch

Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM


Title	Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM
Authors	Ashesh Chattopadhyay, Pedram Hassanzadeh, Devika Subramanian
Abstract	In this paper, the performance of three deep learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: echo state network (a type of reservoir computing, RC-ESN), deep feed-forward artificial neural network (ANN), and recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training or testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of the numerical solver’s time steps, equivalent to several Lyapunov timescales. The RNN-LSTM and ANN show some prediction skill as well; RNN-LSTM bests ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed.
Tasks
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08829v3
PDF	https://arxiv.org/pdf/1906.08829v3.pdf
PWC	https://paperswithcode.com/paper/data-driven-prediction-of-a-multi-scale
Repo	https://github.com/ashesh6810/RCESN_spatio_temporal
Framework	tf
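
For readers unfamiliar with Lorenz 96, the sketch below implements the standard single-tier form, dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, with periodic indexing. This is a deliberate simplification: the paper studies a three-tier system in which $X$ is coupled to $Y$ and $Z$, and those coupling terms are omitted here. The function names, the explicit Euler integrator, and the parameter values are assumptions for illustration, not the paper's setup.

```python
def lorenz96_rhs(x, forcing):
    """Right-hand side of the single-tier Lorenz 96 system (periodic in k)."""
    n = len(x)
    # Negative indices wrap around in Python, which gives the periodic boundary
    return [
        (x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing
        for k in range(n)
    ]


def euler_step(x, forcing, dt):
    """One explicit Euler step (a crude integrator, used here only for brevity)."""
    dx = lorenz96_rhs(x, forcing)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]


forcing = 8.0
state = [forcing] * 8          # uniform state X_k = F is a fixed point
state[0] += 0.01               # small perturbation to leave the fixed point
for _ in range(100):
    state = euler_step(state, forcing, dt=0.005)
print([round(v, 3) for v in state])
```

A data-driven model such as those compared in the paper would be trained on trajectories of $X$ like this one and then asked to forecast future states.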

Cerberus: A Multi-headed Derenderer


Title	Cerberus: A Multi-headed Derenderer
Authors	Boyang Deng, Simon Kornblith, Geoffrey Hinton
Abstract	To generalize to novel visual scenes with new viewpoints and new object poses, a visual system needs representations of the shapes of the parts of an object that are invariant to changes in viewpoint or pose. 3D graphics representations disentangle visual factors such as viewpoints and lighting from object structure in a natural way. It is possible to learn to invert the process that converts 3D graphics representations into 2D images, provided the 3D graphics representations are available as labels. When only the unlabeled images are available, however, learning to derender is much harder. We consider a simple model which is just a set of free-floating parts. Each part has its own relation to the camera and its own triangular mesh which can be deformed to model the shape of the part. At test time, a neural network looks at a single image and extracts the shapes of the parts and their relations to the camera. Each part can be viewed as one head of a multi-headed derenderer. During training, the extracted parts are used as input to a differentiable 3D renderer and the reconstruction error is backpropagated to train the neural net. We make the learning task easier by encouraging the deformations of the part meshes to be invariant to changes in viewpoint and invariant to the changes in the relative positions of the parts that occur when the pose of an articulated body changes. Cerberus, our multi-headed derenderer, outperforms previous methods for extracting 3D parts from single images without part annotations, and it does quite well at extracting natural parts of human figures.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11940v1
PDF	https://arxiv.org/pdf/1905.11940v1.pdf
PWC	https://paperswithcode.com/paper/cerberus-a-multi-headed-derenderer
Repo	https://github.com/Chrisackerman1/Cerberus-A-Multi-headed-Derenderer
Framework	none

Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking


Title	Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking
Authors	Timo Schick, Hinrich Schütze
Abstract	Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Exemplified by BERT, a recently proposed such architecture, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT does indeed substantially improve its understanding of rare words.
Tasks	Language Modelling, Tokenization
Published	2019-04-14
URL	https://arxiv.org/abs/1904.06707v4
PDF	https://arxiv.org/pdf/1904.06707v4.pdf
PWC	https://paperswithcode.com/paper/rare-words-a-major-problem-for-contextualized
Repo	https://github.com/timoschick/am-for-bert
Framework	none

Out-of-distribution Detection in Classifiers via Generation


Title	Out-of-distribution Detection in Classifiers via Generation
Authors	Sachin Vernekar, Ashish Gaurav, Vahdat Abdelzad, Taylor Denouden, Rick Salay, Krzysztof Czarnecki
Abstract	By design, discriminatively trained neural network classifiers produce reliable predictions only for in-distribution samples. For their real-world deployments, detecting out-of-distribution (OOD) samples is essential. Assuming OOD to be outside the closed boundary of in-distribution, typical neural classifiers do not contain the knowledge of this boundary for OOD detection during inference. There have been recent approaches to instill this knowledge in classifiers by explicitly training the classifier with OOD samples close to the in-distribution boundary. However, these generated samples fail to cover the entire in-distribution boundary effectively, thereby resulting in a sub-optimal OOD detector. In this paper, we analyze the feasibility of such approaches by investigating the complexity of producing such “effective” OOD samples. We also propose a novel algorithm to generate such samples using a manifold learning network (e.g., variational autoencoder) and then train an $(n+1)$-class classifier for OOD detection, where the $(n+1)^{th}$ class represents the OOD samples. We compare our approach against several recent classifier-based OOD detectors on MNIST and Fashion-MNIST datasets. Overall the proposed approach consistently performs better than the others.
Tasks	Out-of-Distribution Detection
Published	2019-10-09
URL	https://arxiv.org/abs/1910.04241v1
PDF	https://arxiv.org/pdf/1910.04241v1.pdf
PWC	https://paperswithcode.com/paper/out-of-distribution-detection-in-classifiers
Repo	https://github.com/sverneka/OODGen
Framework	none
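
The extra-class setup described in the abstract can be sketched in a few lines: generated OOD samples receive one additional label, and at inference an input is flagged as OOD when that extra class wins. The helper names below are hypothetical, the sample generation step (e.g., with a variational autoencoder) is not shown, and the argmax decision rule is an assumption rather than the paper's stated procedure.

```python
def build_n_plus_1_labels(in_dist_labels, num_generated_ood, num_classes):
    """Append the extra OOD label (index n) for each generated OOD sample."""
    ood_class = num_classes  # classes 0..n-1 are in-distribution, n is OOD
    return list(in_dist_labels) + [ood_class] * num_generated_ood


def is_ood(class_scores):
    """Flag an input as OOD if the last (extra) class has the highest score."""
    ood_class = len(class_scores) - 1
    predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
    return predicted == ood_class


# Three in-distribution classes plus two generated OOD samples
labels = build_n_plus_1_labels([0, 2, 1], num_generated_ood=2, num_classes=3)
print(labels)
print(is_ood([0.1, 0.2, 0.1, 0.6]), is_ood([0.7, 0.1, 0.1, 0.1]))
```

Any standard classifier with n+1 outputs could then be trained on the combined labels; the quality of the detector depends on how well the generated samples cover the in-distribution boundary, which is the question the paper analyzes.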

Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning


Title	Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning
Authors	Shushman Choudhury, Jacob P. Knickerbocker, Mykel J. Kochenderfer
Abstract	We introduce the problem of Dynamic Real-time Multimodal Routing (DREAMR), which requires planning and executing routes under uncertainty for an autonomous agent. The agent has access to a time-varying transit vehicle network in which it can use multiple modes of transportation. For instance, a drone can either fly or ride on terrain vehicles for segments of their routes. DREAMR is a difficult problem of sequential decision making under uncertainty with both discrete and continuous variables. We design a novel hierarchical hybrid planning framework to solve the DREAMR problem that exploits its structural decomposability. Our framework consists of a global open-loop planning layer that invokes and monitors a local closed-loop execution layer. Additional abstractions allow efficient and seamless interleaving of planning and execution. We create a large-scale simulation for DREAMR problems, with each scenario having hundreds of transportation routes and thousands of connection points. Our algorithmic framework significantly outperforms a receding horizon control baseline, in terms of elapsed time to reach the destination and energy expended by the agent.
Tasks	Decision Making, Decision Making Under Uncertainty
Published	2019-02-05
URL	https://arxiv.org/abs/1902.01560v2
PDF	https://arxiv.org/pdf/1902.01560v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-real-time-multimodal-routing-with
Repo	https://github.com/sisl/DreamrHHP
Framework	none