Paper Group AWR 282
Functional Tensors for Probabilistic Programming
Title | Functional Tensors for Probabilistic Programming |
Authors | Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Du Phan, Jonathan P. Chen |
Abstract | It is a significant challenge to design probabilistic programming systems that can accommodate a wide variety of inference strategies within a unified framework. Noting that the versatility of modern automatic differentiation frameworks is based in large part on the unifying concept of tensors, we describe a software abstraction for integration, functional tensors, that captures many of the benefits of tensors, while also being able to describe continuous probability distributions. Moreover, functional tensors are a natural candidate for generalized variable elimination and parallel-scan filtering algorithms that enable parallel exact inference for a large family of tractable modeling motifs. We demonstrate the versatility of functional tensors by integrating them into the modeling frontend and inference backend of the Pyro programming language. In experiments we show that the resulting framework enables a large variety of inference strategies, including those that mix exact and approximate inference. |
Tasks | Probabilistic Programming |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10775v2, https://arxiv.org/pdf/1910.10775v2.pdf |
PWC | https://paperswithcode.com/paper/functional-tensors-for-probabilistic |
Repo | https://github.com/pyro-ppl/funsor |
Framework | pytorch |
Joint Source-Target Self Attention with Locality Constraints
Title | Joint Source-Target Self Attention with Locality Constraints |
Authors | José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà |
Abstract | The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. As input for training, both source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively starting with the source sequence as previous tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT’14 German-English and matches the best reported results in the literature on the WMT’14 English-German and WMT’14 English-French translation benchmarks. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06596v1, https://arxiv.org/pdf/1905.06596v1.pdf |
PWC | https://paperswithcode.com/paper/190506596 |
Repo | https://github.com/jarfo/joint |
Framework | pytorch |
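The abstract above describes decoder-only self-attention with locality constraints on the receptive field. As a rough illustration only, the sketch below builds a causal attention mask restricted to a fixed-width window. The fixed window, the function name `local_causal_mask`, and the parameter `window` are assumptions made for this example; the abstract does not specify the form of the constraint.

```python
def local_causal_mask(seq_len, window):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed rule (not taken from the paper): a position sees itself and
    at most `window - 1` immediately preceding positions.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Source and target tokens would be concatenated into one sequence,
# so a single mask covers both.
mask = local_causal_mask(seq_len=5, window=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row marks with `x` the positions that one token may attend to; an unconstrained causal mask would fill the whole lower triangle instead.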
A Game Theoretic Approach to Class-wise Selective Rationalization
Title | A Game Theoretic Approach to Class-wise Selective Rationalization |
Authors | Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola |
Abstract | Selection of input features such as relevant pieces of text has become a common technique of highlighting how complex neural predictors operate. The selection can be optimized post-hoc for trained models or incorporated directly into the method itself (self-explaining). However, an overall selection does not properly capture the multi-faceted nature of useful rationales such as pros and cons for decisions. To this end, we propose a new game theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players set up competitively to find evidence for factual and counterfactual scenarios. We show theoretically in a simplified scenario how the game drives the solution towards meaningful class-dependent rationales. We evaluate the method in single- and multi-aspect sentiment classification tasks and demonstrate that the proposed method is able to identify both factual (justifying the ground truth label) and counterfactual (countering the ground truth label) rationales consistent with human rationalization. The code for our method is publicly available. |
Tasks | Sentiment Analysis |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12853v1, https://arxiv.org/pdf/1910.12853v1.pdf |
PWC | https://paperswithcode.com/paper/a-game-theoretic-approach-to-class-wise |
Repo | https://github.com/code-terminator/classwise_rationale |
Framework | tf |
ProPublica’s COMPAS Data Revisited
Title | ProPublica’s COMPAS Data Revisited |
Authors | Matias Barenstein |
Abstract | I examine the COMPAS recidivism risk score and criminal history data collected by ProPublica in 2016 that fueled intense debate and research in the nascent field of ‘algorithmic fairness’. ProPublica’s COMPAS data is used in an increasing number of studies to test various definitions of algorithmic fairness. This paper takes a closer look at the actual datasets put together by ProPublica, in particular the sub-datasets built to study the likelihood of recidivism within two years of a defendant’s original COMPAS survey screening date. I take a new yet simple approach to visualize these data, by analyzing the distribution of defendants across COMPAS screening dates. I find that ProPublica made an important data processing error when it created these datasets, failing to implement a two-year sample cutoff rule for recidivists in such datasets (whereas it implemented a two-year sample cutoff rule for non-recidivists). When I implement a simple two-year COMPAS screen date cutoff rule for recidivists, I estimate that in the two-year general recidivism dataset ProPublica kept over 40% more recidivists than it should have. This fundamental problem in dataset construction affects some statistics more than others. It obviously has a substantial impact on the recidivism rate, artificially inflating it. For the two-year general recidivism dataset created by ProPublica, the two-year recidivism rate is 45.1%, whereas, with the simple COMPAS screen date cutoff correction I implement, it is 36.2%. Thus, the two-year recidivism rate in ProPublica’s dataset is inflated by over 24%. This also affects the positive and negative predictive values. On the other hand, this data processing error has little impact on some of the other key statistical measures, which are less susceptible to changes in the relative share of recidivists, such as the false positive and false negative rates, and the overall accuracy. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04711v3, https://arxiv.org/pdf/1906.04711v3.pdf |
PWC | https://paperswithcode.com/paper/propublicas-compas-data-revisited |
Repo | https://github.com/mbarenstein/ProPublica_COMPAS_Data_Revisited |
Framework | none |
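The "inflated by over 24%" figure in the abstract above is a relative difference between the two reported recidivism rates, not a difference in percentage points. The short check below reproduces that arithmetic from the two rates quoted in the abstract; the variable names are illustrative.

```python
reported_rate = 45.1   # two-year recidivism rate in ProPublica's dataset (%)
corrected_rate = 36.2  # rate after the screen-date cutoff correction (%)

# Difference in percentage points between the two rates.
absolute_gap = reported_rate - corrected_rate

# Relative inflation of the reported rate over the corrected rate.
relative_inflation = (reported_rate / corrected_rate - 1) * 100

print(f"absolute gap: {absolute_gap:.1f} percentage points")
print(f"relative inflation: {relative_inflation:.1f}%")
```

The relative figure is the one that matches the abstract's "over 24%"; the absolute gap is just under 9 percentage points.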
Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning
Title | Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning |
Authors | Alexandros Papangelis, Yi-Chia Wang, Piero Molino, Gokhan Tur |
Abstract | We present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language. Using DSTC2 as seed data, we trained natural language understanding (NLU) and generation (NLG) networks for each agent and let the agents interact online. We model the interaction as a stochastic collaborative game where each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and their own objectives, and can only interact via natural language they generate. Each agent, therefore, needs to learn to operate optimally in an environment with multiple sources of uncertainty (its own NLU and NLG, the other agent’s NLU, Policy, and NLG). In our evaluation, we show that the stochastic-game agents outperform deep learning based supervised baselines. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05507v2, https://arxiv.org/pdf/1907.05507v2.pdf |
PWC | https://paperswithcode.com/paper/collaborative-multi-agent-dialogue-model |
Repo | https://github.com/uber-research/plato-research-dialogue-system |
Framework | tf |
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
Title | Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References |
Authors | Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey P. Bigham |
Abstract | The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in open-domain dialog. One alternative is to collect human annotations for evaluation, which can be expensive and time-consuming. To demonstrate the effectiveness of multi-reference evaluation, we augment the test set of DailyDialog with multiple references. A series of experiments show that the use of multiple references results in improved correlation between several automatic metrics and human judgement for both the quality and the diversity of system output. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10568v2, https://arxiv.org/pdf/1907.10568v2.pdf |
PWC | https://paperswithcode.com/paper/investigating-evaluation-of-open-domain |
Repo | https://github.com/prakharguptaz/multirefeval |
Framework | none |
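To make the idea of multi-reference evaluation in the abstract above concrete, the sketch below scores one system response against several human references and keeps the best match. Both choices are assumptions made for illustration: the toy unigram F1 stands in for the automatic metrics studied in the paper, and taking the maximum over references is one common aggregation that the abstract does not confirm.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Toy word-overlap F1 between two whitespace-tokenized strings."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis, references):
    """Score against every reference and keep the best (assumed aggregation)."""
    return max(unigram_f1(hypothesis, ref) for ref in references)


references = ["i am fine thanks", "doing well how about you"]
hypothesis = "doing well thanks"

single = unigram_f1(hypothesis, references[0])
multi = multi_reference_score(hypothesis, references)
print(f"single-reference: {single:.3f}, multi-reference: {multi:.3f}")
```

A valid response that happens to differ from the first reference is penalized less once a second, closer reference is available.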
Is Multilingual BERT Fluent in Language Generation?
Title | Is Multilingual BERT Fluent in Language Generation? |
Authors | Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter |
Abstract | The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing for the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to the monolingual counterparts, and cannot in many cases serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular, for Nordic languages. |
Tasks | Language Modelling, Text Generation |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03806v1, https://arxiv.org/pdf/1910.03806v1.pdf |
PWC | https://paperswithcode.com/paper/is-multilingual-bert-fluent-in-language |
Repo | https://github.com/TurkuNLP/bert-eval |
Framework | pytorch |
Biologically Plausible Sequence Learning with Spiking Neural Networks
Title | Biologically Plausible Sequence Learning with Spiking Neural Networks |
Authors | Zuozhu Liu, Thiparat Chotibut, Christopher Hillar, Shaowei Lin |
Abstract | Motivated by the celebrated discrete-time model of nervous activity outlined by McCulloch and Pitts in 1943, we propose a novel continuous-time model, the McCulloch-Pitts network (MPN), for sequence learning in spiking neural networks. Our model has a local learning rule, such that the synaptic weight updates depend only on the information directly accessible by the synapse. By exploiting asymmetry in the connections between binary neurons, we show that MPN can be trained to robustly memorize multiple spatiotemporal patterns of binary vectors, generalizing the ability of the symmetric Hopfield network to memorize static spatial patterns. In addition, we demonstrate that the model can efficiently learn sequences of binary pictures as well as generative models for experimental neural spike-train data. Our learning rule is consistent with spike-timing-dependent plasticity (STDP), thus providing a theoretical ground for the systematic design of biologically inspired networks with large and robust long-range sequence storage capacity. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10943v1, https://arxiv.org/pdf/1911.10943v1.pdf |
PWC | https://paperswithcode.com/paper/biologically-plausible-sequence-learning-with |
Repo | https://github.com/owen94/MPNets |
Framework | none |
Searching for Ambiguous Objects in Videos using Relational Referring Expressions
Title | Searching for Ambiguous Objects in Videos using Relational Referring Expressions |
Authors | Hazan Anayurt, Sezai Artun Ozyegin, Ulfet Cetin, Utku Aktas, Sinan Kalkan |
Abstract | Humans frequently use referring (identifying) expressions to refer to objects. Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object. Unlike studies on video object search using referring expressions, in this paper, our focus is on (i) relational referring expressions in highly ambiguous settings, and (ii) methods that can both generate and comprehend a referring expression. For this goal, we first introduce a new dataset for video object search with referring expressions that includes numerous copies of the objects, making it difficult to use non-relational expressions. Moreover, we train two baseline deep networks on this dataset, which show promising results. Finally, we propose a deep attention network that significantly outperforms the baselines on our dataset. The dataset and the code are available at https://github.com/hazananayurt/viref. |
Tasks | Deep Attention, Natural Language Visual Grounding |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01189v2, https://arxiv.org/pdf/1908.01189v2.pdf |
PWC | https://paperswithcode.com/paper/searching-for-ambiguous-objects-in-videos |
Repo | https://github.com/hazananayurt/viref |
Framework | pytorch |
PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation
Title | PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation |
Authors | Jie Zhao, Lei Dai, Mo Zhang, Fei Yu, Meng Li, Hongfeng Li, Wenjia Wang, Li Zhang |
Abstract | Automated cervical nucleus segmentation based on deep learning can effectively improve the quantitative analysis of cervical cancer. However, accurate nuclei segmentation is still challenging. The classic U-net has not achieved satisfactory results on this task, because it mixes information from different scales that affect each other, which limits the segmentation accuracy of the model. To solve this problem, we propose a progressive growing U-net (PGU-net+) model, which uses two paradigms to extract image features at different scales in a more independent way. First, we add residual modules between different scales of U-net, which forces the model to learn the approximate shape of the annotation at the coarser scale, and to learn the residual between the annotation and the approximate shape at the finer scale. Second, we start training the model with the coarsest part and then progressively add finer parts to the training until the full model is included. When we train a finer part, we reduce the learning rate of the previous coarser part, which further ensures that the model independently extracts information from different scales. We conduct several comparative experiments on the Herlev dataset. The experimental results show that PGU-net+ achieves higher accuracy than the previous state-of-the-art methods on cervical nuclei segmentation. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01062v3, https://arxiv.org/pdf/1911.01062v3.pdf |
PWC | https://paperswithcode.com/paper/pgu-net-progressive-growing-of-u-net-for |
Repo | https://github.com/Minerva-jiezhao/PGU-net-Model |
Framework | pytorch |
Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM
Title | Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM |
Authors | Ashesh Chattopadhyay, Pedram Hassanzadeh, Devika Subramanian |
Abstract | In this paper, the performance of three deep learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: echo state network (a type of reservoir computing, RC-ESN), deep feed-forward artificial neural network (ANN), and recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training or testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of numerical solver’s time steps, equivalent to several Lyapunov timescales. The RNN-LSTM and ANN show some prediction skills as well; RNN-LSTM bests ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08829v3, https://arxiv.org/pdf/1906.08829v3.pdf |
PWC | https://paperswithcode.com/paper/data-driven-prediction-of-a-multi-scale |
Repo | https://github.com/ashesh6810/RCESN_spatio_temporal |
Framework | tf |
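For readers unfamiliar with the Lorenz 96 system named in the abstract above, the sketch below implements the classic single-tier form, dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F, on a periodic ring. This is a deliberate simplification: the paper studies a three-tier version with coupled $X$, $Y$, and $Z$ variables, which is not reproduced here. The forcing value, step size, and forward-Euler integrator are illustrative assumptions chosen for brevity.

```python
def lorenz96_tendency(x, forcing):
    """Time derivative of the single-tier Lorenz 96 system on a periodic ring."""
    n = len(x)
    # Negative indices wrap around in Python, which handles k-1 and k-2.
    return [
        x[k - 1] * (x[(k + 1) % n] - x[k - 2]) - x[k] + forcing
        for k in range(n)
    ]


def euler_step(x, forcing, dt):
    """One forward-Euler step (an assumption; higher-order schemes are usual)."""
    dx = lorenz96_tendency(x, forcing)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]


# Start near the uniform fixed point X_k = F and add a small perturbation.
state = [8.0] * 8
state[0] += 0.01
for _ in range(100):
    state = euler_step(state, forcing=8.0, dt=0.005)
print(state)
```

Trajectories generated this way are the kind of time series on which the data-driven predictors in the abstract are trained, with only the slow variables observed.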
Cerberus: A Multi-headed Derenderer
Title | Cerberus: A Multi-headed Derenderer |
Authors | Boyang Deng, Simon Kornblith, Geoffrey Hinton |
Abstract | To generalize to novel visual scenes with new viewpoints and new object poses, a visual system needs representations of the shapes of the parts of an object that are invariant to changes in viewpoint or pose. 3D graphics representations disentangle visual factors such as viewpoints and lighting from object structure in a natural way. It is possible to learn to invert the process that converts 3D graphics representations into 2D images, provided the 3D graphics representations are available as labels. When only the unlabeled images are available, however, learning to derender is much harder. We consider a simple model which is just a set of free floating parts. Each part has its own relation to the camera and its own triangular mesh which can be deformed to model the shape of the part. At test time, a neural network looks at a single image and extracts the shapes of the parts and their relations to the camera. Each part can be viewed as one head of a multi-headed derenderer. During training, the extracted parts are used as input to a differentiable 3D renderer and the reconstruction error is backpropagated to train the neural net. We make the learning task easier by encouraging the deformations of the part meshes to be invariant to changes in viewpoint and invariant to the changes in the relative positions of the parts that occur when the pose of an articulated body changes. Cerberus, our multi-headed derenderer, outperforms previous methods for extracting 3D parts from single images without part annotations, and it does quite well at extracting natural parts of human figures. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11940v1, https://arxiv.org/pdf/1905.11940v1.pdf |
PWC | https://paperswithcode.com/paper/cerberus-a-multi-headed-derenderer |
Repo | https://github.com/Chrisackerman1/Cerberus-A-Multi-headed-Derenderer |
Framework | none |
Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking
Title | Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking |
Authors | Timo Schick, Hinrich Schütze |
Abstract | Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Using BERT, a recently proposed architecture of this kind, as an example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT does indeed substantially improve its understanding of rare words. |
Tasks | Language Modelling, Tokenization |
Published | 2019-04-14 |
URL | https://arxiv.org/abs/1904.06707v4, https://arxiv.org/pdf/1904.06707v4.pdf |
PWC | https://paperswithcode.com/paper/rare-words-a-major-problem-for-contextualized |
Repo | https://github.com/timoschick/am-for-bert |
Framework | none |
Out-of-distribution Detection in Classifiers via Generation
Title | Out-of-distribution Detection in Classifiers via Generation |
Authors | Sachin Vernekar, Ashish Gaurav, Vahdat Abdelzad, Taylor Denouden, Rick Salay, Krzysztof Czarnecki |
Abstract | By design, discriminatively trained neural network classifiers produce reliable predictions only for in-distribution samples. For their real-world deployments, detecting out-of-distribution (OOD) samples is essential. Assuming OOD to be outside the closed boundary of in-distribution, typical neural classifiers do not contain the knowledge of this boundary for OOD detection during inference. There have been recent approaches to instill this knowledge in classifiers by explicitly training the classifier with OOD samples close to the in-distribution boundary. However, these generated samples fail to cover the entire in-distribution boundary effectively, thereby resulting in a sub-optimal OOD detector. In this paper, we analyze the feasibility of such approaches by investigating the complexity of producing such “effective” OOD samples. We also propose a novel algorithm to generate such samples using a manifold learning network (e.g., variational autoencoder) and then train an $(n+1)$-class classifier for OOD detection, where the $(n+1)^{\text{th}}$ class represents the OOD samples. We compare our approach against several recent classifier-based OOD detectors on MNIST and Fashion-MNIST datasets. Overall the proposed approach consistently performs better than the others. |
Tasks | Out-of-Distribution Detection |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04241v1, https://arxiv.org/pdf/1910.04241v1.pdf |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-in-classifiers |
Repo | https://github.com/sverneka/OODGen |
Framework | none |
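The abstract above trains a classifier with one extra class that represents OOD samples. The sketch below shows only the labelling and decision bookkeeping this implies: generated OOD samples receive the extra label, and an input is flagged as OOD when that class wins. How the samples are generated is not shown, and the helper names and the argmax decision rule are assumptions made for this example rather than details from the paper.

```python
def build_training_set(in_dist, generated_ood, num_classes):
    """Append generated OOD samples labelled with the extra class index.

    in_dist: list of (sample, label) pairs with labels in 0..num_classes-1.
    generated_ood: list of unlabelled samples (generation is out of scope here).
    """
    ood_label = num_classes  # zero-based index of the extra (n+1)-th class
    return list(in_dist) + [(x, ood_label) for x in generated_ood]


def is_ood(class_scores, num_classes):
    """Flag an input as OOD when the extra class has the top score (assumed rule)."""
    predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
    return predicted == num_classes


train = build_training_set([("a", 0), ("b", 1)], ["g1", "g2"], num_classes=2)
print(train)
print(is_ood([0.1, 0.2, 0.7], num_classes=2))
print(is_ood([0.6, 0.3, 0.1], num_classes=2))
```

With this layout the classifier's output layer has `num_classes + 1` units, and no separate confidence threshold is needed to make the OOD decision.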
Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning
Title | Dynamic Real-time Multimodal Routing with Hierarchical Hybrid Planning |
Authors | Shushman Choudhury, Jacob P. Knickerbocker, Mykel J. Kochenderfer |
Abstract | We introduce the problem of Dynamic Real-time Multimodal Routing (DREAMR), which requires planning and executing routes under uncertainty for an autonomous agent. The agent has access to a time-varying transit vehicle network in which it can use multiple modes of transportation. For instance, a drone can either fly or ride on terrain vehicles for segments of their routes. DREAMR is a difficult problem of sequential decision making under uncertainty with both discrete and continuous variables. We design a novel hierarchical hybrid planning framework to solve the DREAMR problem that exploits its structural decomposability. Our framework consists of a global open-loop planning layer that invokes and monitors a local closed-loop execution layer. Additional abstractions allow efficient and seamless interleaving of planning and execution. We create a large-scale simulation for DREAMR problems, with each scenario having hundreds of transportation routes and thousands of connection points. Our algorithmic framework significantly outperforms a receding horizon control baseline, in terms of elapsed time to reach the destination and energy expended by the agent. |
Tasks | Decision Making, Decision Making Under Uncertainty |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01560v2, https://arxiv.org/pdf/1902.01560v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-real-time-multimodal-routing-with |
Repo | https://github.com/sisl/DreamrHHP |
Framework | none |