Paper Group NANR 231
Lifelong Inverse Reinforcement Learning
Title | Lifelong Inverse Reinforcement Learning |
Authors | Jorge Armando Mendez Mendez, Shashank Shivkumar, Eric Eaton |
Abstract | Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. However, even for a single task, LfD may require numerous demonstrations. For versatile agents that must learn many tasks via demonstration, this process would substantially burden the user if each task were learned in isolation. To address this challenge, we introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the number of demonstrations required. As one solution to this problem, we propose the first lifelong learning approach to inverse reinforcement learning, which learns consecutive tasks via demonstration, continually transferring knowledge between tasks to improve performance. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7702-lifelong-inverse-reinforcement-learning |
http://papers.nips.cc/paper/7702-lifelong-inverse-reinforcement-learning.pdf | |
PWC | https://paperswithcode.com/paper/lifelong-inverse-reinforcement-learning |
Repo | |
Framework | |
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Title | Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text |
Authors | |
Abstract | |
Tasks | |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6100/ |
https://www.aclweb.org/anthology/W18-6100 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2018-emnlp-workshop-w-nut |
Repo | |
Framework | |
Understanding Degeneracies and Ambiguities in Attribute Transfer
Title | Understanding Degeneracies and Ambiguities in Attribute Transfer |
Authors | Attila Szabo, Qiyang Hu, Tiziano Portenier, Matthias Zwicker, Paolo Favaro |
Abstract | We study the problem of building models that can transfer selected attributes from one image to another without affecting the other attributes. Towards this goal, we develop an analysis and a training methodology for autoencoding models, whose encoded features aim to disentangle attributes. These features are explicitly split into two components: one that should represent attributes in common between pairs of images, and another that should represent attributes that change between pairs of images. We show that achieving this objective faces two main challenges: One is that the model may learn degenerate mappings, which we call the shortcut problem, and the other is that the attribute representation for an image is not guaranteed to follow the same interpretation on another image, which we call the reference ambiguity. To address the shortcut problem, we introduce novel constraints on image pairs and triplets and show their effectiveness both analytically and experimentally. In the case of the reference ambiguity, we formally prove that a model that guarantees an ideal feature separation cannot be built. We validate our findings on several datasets and show that, surprisingly, trained neural networks often do not exhibit the reference ambiguity. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Attila_Szabo_Understanding_Degeneracies_and_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Attila_Szabo_Understanding_Degeneracies_and_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/understanding-degeneracies-and-ambiguities-in |
Repo | |
Framework | |
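
The abstract above splits the encoded features into a component shared within an image pair and a component that varies across the pair. Below is a minimal PyTorch sketch of that split-and-swap idea; the layer sizes, the 64x64 image shape, and the swap-based reconstruction loss are illustrative assumptions, not the authors' exact model or constraints.

```python
# Minimal sketch of a split-code autoencoder for attribute transfer.
# Illustrative only: architecture sizes and the swap-based loss are
# assumptions, not the paper's exact model.
import torch
import torch.nn as nn

class SplitAutoencoder(nn.Module):
    def __init__(self, common_dim=32, varying_dim=32):
        super().__init__()
        self.common_dim = common_dim
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
            nn.Linear(256, common_dim + varying_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(common_dim + varying_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
        )

    def encode(self, x):
        z = self.encoder(x)
        return z[:, :self.common_dim], z[:, self.common_dim:]  # (common, varying)

    def decode(self, c, v):
        return self.decoder(torch.cat([c, v], dim=1)).view(-1, 3, 64, 64)

def pair_loss(model, x1, x2):
    """x1 and x2 share the 'common' attributes but differ in the 'varying' ones.
    Reconstructing x1 from (common part of x2, varying part of x1) discourages
    the degenerate 'shortcut' solution that packs everything into one component."""
    c1, v1 = model.encode(x1)
    c2, v2 = model.encode(x2)
    recon = model.decode(c2, v1)          # swap the common part across the pair
    return ((recon - x1) ** 2).mean()

model = SplitAutoencoder()
x1, x2 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
print(pair_loss(model, x1, x2).item())
```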
Keynote: Unveiling the Linguistic Weaknesses of Neural MT
Title | Keynote: Unveiling the Linguistic Weaknesses of Neural MT |
Authors | Arianna Bisazza |
Abstract | |
Tasks | Machine Translation |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-1801/ |
https://www.aclweb.org/anthology/W18-1801 | |
PWC | https://paperswithcode.com/paper/keynote-unveiling-the-linguistic-weaknesses |
Repo | |
Framework | |
Task-oriented Dialogue System for Automatic Diagnosis
Title | Task-oriented Dialogue System for Automatic Diagnosis |
Authors | Zhongyu Wei, Qianlong Liu, Baolin Peng, Huaixiao Tou, Ting Chen, Xuanjing Huang, Kam-fai Wong, Xiangying Dai |
Abstract | In this paper, we make a move to build a dialogue system for automatic diagnosis. We first build a dataset collected from an online medical forum by extracting symptoms from both patients' self-reports and conversational data between patients and doctors. Then we propose a task-oriented dialogue system framework to make diagnosis for patients automatically, which can converse with patients to collect additional symptoms beyond their self-reports. Experimental results on our dataset show that additional symptoms extracted from conversation can greatly improve the accuracy for disease identification and our dialogue system is able to collect these symptoms automatically and make a better diagnosis. |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2033/ |
https://www.aclweb.org/anthology/P18-2033 | |
PWC | https://paperswithcode.com/paper/task-oriented-dialogue-system-for-automatic |
Repo | |
Framework | |
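
The abstract above describes a system that starts from symptoms extracted from a self-report, asks about further symptoms in conversation, and then makes a diagnosis. The toy sketch below illustrates only that inquire-then-diagnose loop; the greedy information-gain policy and the tiny disease/symptom table are hypothetical and are not the paper's dialogue framework.

```python
# Toy sketch: start from self-reported symptoms, ask about further symptoms,
# then diagnose. The inquiry policy and the probability table are illustrative
# assumptions, not the paper's method.
import math

# Hypothetical P(symptom present | disease) table.
profiles = {
    "flu":      {"fever": 0.9, "cough": 0.8, "rash": 0.05, "headache": 0.6},
    "measles":  {"fever": 0.9, "cough": 0.5, "rash": 0.95, "headache": 0.3},
    "migraine": {"fever": 0.1, "cough": 0.05, "rash": 0.02, "headache": 0.95},
}

def posterior(evidence):
    """P(disease | observed symptoms), uniform prior, naive-Bayes likelihood."""
    scores = {}
    for d, prof in profiles.items():
        p = 1.0
        for s, present in evidence.items():
            p *= prof[s] if present else (1 - prof[s])
        scores[d] = p
    z = sum(scores.values())
    return {d: p / z for d, p in scores.items()}

def entropy(dist):
    return -sum(p * math.log(p + 1e-12) for p in dist.values())

def next_question(evidence):
    """Greedily ask about the unobserved symptom with the largest expected entropy drop."""
    best, best_gain = None, -1.0
    post = posterior(evidence)
    for s in next(iter(profiles.values())):
        if s in evidence:
            continue
        p_yes = sum(post[d] * profiles[d][s] for d in profiles)
        gain = entropy(post)
        gain -= p_yes * entropy(posterior({**evidence, s: True}))
        gain -= (1 - p_yes) * entropy(posterior({**evidence, s: False}))
        if gain > best_gain:
            best, best_gain = s, gain
    return best

evidence = {"fever": True}               # symptom extracted from the self-report
while (q := next_question(evidence)) is not None:
    evidence[q] = q in {"rash"}          # pretend the patient confirms only "rash"
print(posterior(evidence))
```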
Multicalibration: Calibration for the (Computationally-Identifiable) Masses
Title | Multicalibration: Calibration for the (Computationally-Identifiable) Masses |
Authors | Ursula Hebert-Johnson, Michael Kim, Omer Reingold, Guy Rothblum |
Abstract | We develop and study multicalibration as a new measure of fairness in machine learning that aims to mitigate inadvertent or malicious discrimination that is introduced at training time (even from ground truth data). Multicalibration guarantees meaningful (calibrated) predictions for every subpopulation that can be identified within a specified class of computations. The specified class can be quite rich; in particular, it can contain many overlapping subgroups of a protected group. We demonstrate that in many settings this strong notion of protection from discrimination is provably attainable and aligned with the goal of obtaining accurate predictions. Along the way, we present algorithms for learning a multicalibrated predictor, study the computational complexity of this task, and illustrate tight connections to the agnostic learning model. |
Tasks | Calibration |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2448 |
http://proceedings.mlr.press/v80/hebert-johnson18a/hebert-johnson18a.pdf | |
PWC | https://paperswithcode.com/paper/multicalibration-calibration-for-the |
Repo | |
Framework | |
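
Multicalibration, as defined in the abstract above, asks for calibrated predictions on every subpopulation identifiable within a specified class of computations. A minimal auditing sketch of that definition follows; the subgroup class, bucketing, and tolerance are illustrative assumptions rather than the algorithms from the paper.

```python
# Sketch of a multicalibration audit: for every subgroup in a specified class
# and every prediction bucket, the average outcome should be close to the
# predicted value. Subgroups, bucketing, and tolerance are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=(n, 3))                      # features x0, x1, x2
p_true = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))
y = rng.binomial(1, p_true)                      # outcomes
pred = 1 / (1 + np.exp(-x[:, 0]))                # a predictor that ignores x1

# A (tiny) class of computationally identifiable subgroups.
subgroups = {
    "x1 > 0": x[:, 1] > 0,
    "x2 > 0": x[:, 2] > 0,
    "x1 > 0 and x2 > 0": (x[:, 1] > 0) & (x[:, 2] > 0),
}

def calibration_violations(pred, y, mask, n_buckets=10, tol=0.05):
    """Return buckets where |E[y] - E[pred]| exceeds tol within the subgroup."""
    out = []
    for b in range(n_buckets):
        lo, hi = b / n_buckets, (b + 1) / n_buckets
        sel = mask & (pred >= lo) & (pred < hi)
        if sel.sum() < 50:                       # ignore tiny buckets
            continue
        gap = abs(y[sel].mean() - pred[sel].mean())
        if gap > tol:
            out.append((lo, hi, round(gap, 3)))
    return out

for name, mask in subgroups.items():
    print(name, calibration_violations(pred, y, mask))
```

Because the toy predictor ignores x1 while the outcomes depend on it, the subgroups defined by x1 should show calibration gaps that a multicalibrated predictor would be required to eliminate.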
SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin
Title | SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin |
Authors | Guanyi Chen, Kees van Deemter, Chenghua Lin |
Abstract | We introduce SimpleNLG-ZH, a realisation engine for Mandarin that follows the software design paradigm of SimpleNLG (Gatt and Reiter, 2009). We explain the core grammar (morphology and syntax) and the lexicon of SimpleNLG-ZH, which is very different from English and other languages for which SimpleNLG engines have been built. The system was evaluated by regenerating expressions from a body of test sentences and a corpus of human-authored expressions. Human evaluation was conducted to estimate the quality of regenerated sentences. |
Tasks | Morphological Inflection, Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6506/ |
https://www.aclweb.org/anthology/W18-6506 | |
PWC | https://paperswithcode.com/paper/simplenlg-zh-a-linguistic-realisation-engine |
Repo | |
Framework | |
How Many Samples are Needed to Estimate a Convolutional Neural Network?
Title | How Many Samples are Needed to Estimate a Convolutional Neural Network? |
Authors | Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan R. Salakhutdinov, Aarti Singh |
Abstract | A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample complexity of estimating CNNs. We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples. Since in typical settings $m \ll d$, this result demonstrates the advantage of using a CNN. We further consider the sample complexity of estimating a one-hidden-layer CNN with linear activation where both the $m$-dimensional convolutional filter and the $r$-dimensional output weights are unknown. For this model, we show that the sample complexity is $\widetilde{O}\left((m+r)/\epsilon^2\right)$ when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are a localized empirical process analysis and a new lemma characterizing the convolutional structure. We believe that these tools may inspire further developments in understanding CNNs. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7320-how-many-samples-are-needed-to-estimate-a-convolutional-neural-network |
http://papers.nips.cc/paper/7320-how-many-samples-are-needed-to-estimate-a-convolutional-neural-network.pdf | |
PWC | https://paperswithcode.com/paper/how-many-samples-are-needed-to-estimate-a-1 |
Repo | |
Framework | |
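
Restating the abstract's bounds side by side (with the garbled O-notation repaired), suppressing constants and logarithmic factors:

```latex
% Sample-complexity bounds stated in the abstract (constants and log factors suppressed).
\begin{align*}
  \text{linear CNN, $m$-dimensional filter:} \quad
    & n \;=\; \widetilde{O}\!\left(\frac{m}{\epsilon^{2}}\right), \\
  \text{linear FNN counterpart:} \quad
    & n \;=\; \Omega\!\left(\frac{d}{\epsilon^{2}}\right), \\
  \text{one-hidden-layer CNN ($m$-dim.\ filter, $r$-dim.\ output weights):} \quad
    & n \;=\; \widetilde{O}\!\left(\frac{m+r}{\epsilon^{2}}\right).
\end{align*}
```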
A Brief Introduction to Natural Language Generation within Computational Creativity
Title | A Brief Introduction to Natural Language Generation within Computational Creativity |
Authors | Ben Burtenshaw |
Abstract | |
Tasks | Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6601/ |
https://www.aclweb.org/anthology/W18-6601 | |
PWC | https://paperswithcode.com/paper/a-brief-introduction-to-natural-language |
Repo | |
Framework | |
A probabilistic population code based on neural samples
Title | A probabilistic population code based on neural samples |
Authors | Sabyasachi Shivkumar, Richard Lange, Ankani Chattoraj, Ralf Haefner |
Abstract | Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs. How these beliefs are computed and represented by neural responses is much-debated (Fiser et al. 2010, Pouget et al. 2013). A central debate concerns the question of whether neural responses represent samples of latent variables (Hoyer & Hyvarinnen 2003) or parameters of their distributions (Ma et al. 2006) with efforts being made to distinguish between them (Grabska-Barwinska et al. 2013). A separate debate addresses the question of whether neural responses are proportionally related to the encoded probabilities (Barlow 1969), or proportional to the logarithm of those probabilities (Jazayeri & Movshon 2006, Ma et al. 2006, Beck et al. 2012). Here, we show that these alternatives – contrary to common assumptions – are not mutually exclusive and that the very same system can be compatible with all of them. As a central analytical result, we show that modeling neural responses in area V1 as samples from a posterior distribution over latents in a linear Gaussian model of the image implies that those neural responses form a linear Probabilistic Population Code (PPC, Ma et al. 2006). In particular, the posterior distribution over some experimenter-defined variable like “orientation” is part of the exponential family with sufficient statistics that are linear in the neural sampling-based firing rates. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7938-a-probabilistic-population-code-based-on-neural-samples |
http://papers.nips.cc/paper/7938-a-probabilistic-population-code-based-on-neural-samples.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-population-code-based-on |
Repo | |
Framework | |
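
A hedged restatement of the central claim in equations: if responses are samples from the posterior over the latents of a linear Gaussian image model, the implied posterior over an experimenter-defined variable such as orientation takes the linear-PPC exponential-family form, with sufficient statistics linear in the sampling-based firing rates. The notation below follows the abstract and the standard linear PPC of Ma et al. (2006); it is not necessarily the paper's own notation.

```latex
% Notation is illustrative, following the abstract and the linear-PPC form
% of Ma et al. (2006), not necessarily the paper's symbols.
\begin{align*}
  \text{linear Gaussian image model:} \quad
    & x \mid z \;\sim\; \mathcal{N}(A z,\ \sigma^{2} I), \\
  \text{neural responses as posterior samples:} \quad
    & r_{1}, \dots, r_{T} \;\sim\; p(z \mid x), \\
  \text{implied linear PPC over a stimulus variable } s: \quad
    & p(s \mid r) \;\propto\; g(r)\,\exp\!\big(h(s)^{\top} \bar r\big),
      \qquad \bar r = \tfrac{1}{T}\textstyle\sum_{t} r_{t}.
\end{align*}
```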
Learning Distributional Token Representations from Visual Features
Title | Learning Distributional Token Representations from Visual Features |
Authors | Samuel Broscheit |
Abstract | In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight about the challenges of encoding a text representation from low-level features, e.g. from characters or pixels. We focus on Chinese, which, as a logographic language, has properties that make a representation via visual features challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features can achieve competitive results to lookup embeddings. However, we also show different strengths and weaknesses in the models' performance in a part-of-speech tagging task and also a semantic similarity task. In summary, we show that it is possible to achieve a text representation only from pixels. We hope that this is a useful stepping stone for future studies that exclusively rely on visual input, or aim at exploiting visual features of written language. |
Tasks | Machine Translation, Part-Of-Speech Tagging, Representation Learning, Semantic Similarity, Semantic Textual Similarity, Tokenization |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3025/ |
https://www.aclweb.org/anthology/W18-3025 | |
PWC | https://paperswithcode.com/paper/learning-distributional-token-representations |
Repo | |
Framework | |
Learning Deep Generative Models With Discrete Latent Variables
Title | Learning Deep Generative Models With Discrete Latent Variables |
Authors | Hengyuan Hu, Ruslan Salakhutdinov |
Abstract | There have been numerous recent advances in learning deep generative models with latent variables, thanks to the reparameterization trick, which allows deep directed models to be trained effectively. However, since the reparameterization trick only works on continuous variables, deep generative models with discrete latent variables still remain hard to train and perform considerably worse than their continuous counterparts. In this paper, we attempt to shrink this gap by introducing a new architecture and its learning procedure. We develop a hybrid generative model with binary latent variables that consists of an undirected graphical model and a deep neural network. We propose an efficient two-stage pretraining and training procedure that is crucial for learning these models. Experiments on binarized digits and images of natural scenes demonstrate that our model achieves close to state-of-the-art performance in terms of density estimation and is capable of generating coherent images of natural scenes. |
Tasks | Density Estimation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=SkZ-BnyCW |
https://openreview.net/pdf?id=SkZ-BnyCW | |
PWC | https://paperswithcode.com/paper/learning-deep-generative-models-with-discrete |
Repo | |
Framework | |
Efficient Neural Architecture Search via Parameters Sharing
Title | Efficient Neural Architecture Search via Parameters Sharing |
Authors | Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, Jeff Dean |
Abstract | We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. ENAS constructs a large computational graph, where each subgraph represents a neural network architecture, hence forcing all architectures to share their parameters. A controller is trained with policy gradient to search for a subgraph that maximizes the expected reward on a validation set. Meanwhile, a model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Sharing parameters among child models allows ENAS to deliver strong empirical performance while using far fewer GPU-hours than existing automatic model design approaches; notably, it is 1000x less expensive than standard Neural Architecture Search. On Penn Treebank, ENAS discovers a novel architecture that achieves a test perplexity of 56.3, on par with the existing state-of-the-art among all methods without post-training processing. On CIFAR-10, ENAS finds a novel architecture that achieves 2.89% test error, which is on par with the 2.65% test error of NASNet (Zoph et al., 2018). |
Tasks | Neural Architecture Search |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2247 |
http://proceedings.mlr.press/v80/pham18a/pham18a.pdf | |
PWC | https://paperswithcode.com/paper/efficient-neural-architecture-search-via |
Repo | |
Framework | |
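
A toy sketch of the alternating optimization the abstract describes: the controller samples a subgraph (here, simply the hidden activation of a tiny network), the shared weights take a training-loss gradient step for that sample, and the controller is updated with REINFORCE using validation performance as reward. The search space, task, and hyperparameters are illustrative assumptions, not the ENAS implementation.

```python
# Toy sketch of ENAS-style weight sharing + policy-gradient architecture search.
# Search space, task, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression task.
X = rng.normal(size=(256, 4))
y = np.tanh(X @ rng.normal(size=(4, 1)))
X_tr, y_tr, X_va, y_va = X[:192], y[:192], X[192:], y[192:]

# Shared parameters, reused by every sampled architecture.
W1 = rng.normal(scale=0.5, size=(4, 16))
W2 = rng.normal(scale=0.5, size=(16, 1))

# Search space: which activation the hidden layer uses.
ops = {0: np.tanh, 1: lambda h: np.maximum(h, 0.0), 2: lambda h: h}
logits = np.zeros(len(ops))                      # controller parameters

def forward(X, op):
    return ops[op](X @ W1) @ W2

def mse(pred, target):
    return float(((pred - target) ** 2).mean())

def train_shared(op, lr=0.01):
    """One gradient step on the shared weights for the sampled architecture."""
    global W1, W2
    h_pre = X_tr @ W1
    h = ops[op](h_pre)
    pred = h @ W2
    err = 2 * (pred - y_tr) / len(y_tr)          # d(mse)/d(pred)
    gW2 = h.T @ err
    dh = err @ W2.T
    if op == 0:                                  # tanh derivative
        dpre = dh * (1 - np.tanh(h_pre) ** 2)
    elif op == 1:                                # relu derivative
        dpre = dh * (h_pre > 0)
    else:                                        # identity derivative
        dpre = dh
    W1 -= lr * (X_tr.T @ dpre)
    W2 -= lr * gW2

baseline = 0.0
for step in range(300):
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    op = rng.choice(len(ops), p=probs)           # controller samples a subgraph
    train_shared(op)                             # shared weights: train-loss step
    reward = -mse(forward(X_va, op), y_va)       # validation reward
    baseline = 0.9 * baseline + 0.1 * reward
    logits += 0.1 * (reward - baseline) * (np.eye(len(ops))[op] - probs)  # REINFORCE
print("controller probabilities:", np.round(probs, 3))
```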
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
Title | Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning |
Authors | |
Abstract | |
Tasks | |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-7100/ |
https://www.aclweb.org/anthology/W18-7100 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-7th-workshop-on-nlp-for |
Repo | |
Framework | |
A Mathematical Model For Optimal Decisions In A Representative Democracy
Title | A Mathematical Model For Optimal Decisions In A Representative Democracy |
Authors | Malik Magdon-Ismail, Lirong Xia |
Abstract | Direct democracy, where each voter casts one vote, fails when the average voter competence falls below 50%. This happens in noisy settings when voters have limited information. Representative democracy, where voters choose representatives to vote, can be an elixir in both these situations. We introduce a mathematical model for studying representative democracy, in particular for understanding the parameters of a representative democracy that give maximum decision-making capability. Our main result states that under general and natural conditions, 1. for fixed voting cost, the optimal number of representatives is linear; 2. for polynomial cost, the optimal number of representatives is logarithmic. |
Tasks | Decision Making |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7720-a-mathematical-model-for-optimal-decisions-in-a-representative-democracy |
http://papers.nips.cc/paper/7720-a-mathematical-model-for-optimal-decisions-in-a-representative-democracy.pdf | |
PWC | https://paperswithcode.com/paper/a-mathematical-model-for-optimal-decisions-in |
Repo | |
Framework | |
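
The opening claim of the abstract, that direct one-person-one-vote democracy fails once average voter competence drops below 50%, can be checked with a small Condorcet-style Monte Carlo simulation; the population size, competence values, and trial count below are illustrative assumptions.

```python
# Monte Carlo sketch of the abstract's opening claim: majority vote among
# independent voters succeeds with high probability when competence p > 0.5
# and fails when p < 0.5. Population size and trial count are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def majority_correct_rate(n_voters, competence, trials=5000):
    """Fraction of trials in which a simple majority picks the correct outcome."""
    votes = rng.random((trials, n_voters)) < competence   # True = correct vote
    return float((votes.sum(axis=1) > n_voters / 2).mean())

for p in (0.45, 0.49, 0.51, 0.55):
    print(f"competence {p:.2f}: majority correct in "
          f"{majority_correct_rate(1001, p):.3f} of trials")
```

As the electorate grows, the probability of a correct majority decision rises sharply through the 50% competence threshold, which is the failure mode the representative-democracy model is meant to address.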