February 1, 2020

2987 words 15 mins read

Paper Group AWR 324

Personalizing Dialogue Agents via Meta-Learning. Consistent Dialogue Generation with Self-supervised Feature Learning. Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks. Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations. Learning the Difference that Makes a Difference with …

Personalizing Dialogue Agents via Meta-Learning

Title Personalizing Dialogue Agents via Meta-Learning
Authors Zhaojiang Lin, Andrea Madotto, Chien-Sheng Wu, Pascale Fung
Abstract Existing personalized dialogue models use human-designed persona descriptions to improve dialogue consistency. Collecting such descriptions from existing dialogues is expensive and requires hand-crafted feature designs. In this paper, we propose to extend Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) to personalized dialogue learning without using any persona descriptions. Our model learns to quickly adapt to new personas by leveraging only a few dialogue samples collected from the same user, which is fundamentally different from conditioning the response on persona descriptions. Empirical results on the Persona-chat dataset (Zhang et al., 2018) indicate that our solution outperforms non-meta-learning baselines on automatic evaluation metrics as well as in human-evaluated fluency and consistency.
Tasks Dialogue Generation, Meta-Learning
Published 2019-05-24
URL https://arxiv.org/abs/1905.10033v1
PDF https://arxiv.org/pdf/1905.10033v1.pdf
PWC https://paperswithcode.com/paper/personalizing-dialogue-agents-via-meta
Repo https://github.com/HLTCHKUST/PAML
Framework pytorch
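
As a concrete illustration of the meta-learning loop described above, here is a minimal first-order (FOMAML-style) sketch in PyTorch, where one persona's dialogues form one task. The `model`, `loss_fn`, and batch tuples are placeholders, and the first-order approximation is our simplification; the linked PAML repo implements the full MAML objective on a Transformer seq2seq model.

```python
import copy

import torch


def fomaml_step(model, loss_fn, support, query, inner_lr=1e-2, meta_lr=1e-3):
    # `support`/`query` are hypothetical (inputs, targets) dialogue batches
    # from a single user; one persona = one meta-learning task.
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: adapt on a few dialogues from the user (support set).
    inner_opt.zero_grad()
    loss_fn(adapted(support[0]), support[1]).backward()
    inner_opt.step()

    # Outer update (first-order approximation): gradients of the query loss
    # w.r.t. the adapted weights are applied directly to the meta-weights.
    query_loss = loss_fn(adapted(query[0]), query[1])
    grads = torch.autograd.grad(query_loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= meta_lr * g
    return query_loss.item()
```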

Consistent Dialogue Generation with Self-supervised Feature Learning

Title Consistent Dialogue Generation with Self-supervised Feature Learning
Authors Yizhe Zhang, Xiang Gao, Sungjin Lee, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan
Abstract Generating responses that are consistent with the dialogue context is one of the central challenges in building engaging conversational agents. In this paper, we propose a neural conversation model that generates consistent responses by maintaining certain features related to topics and personas throughout the conversation. Unlike past work that requires external supervision such as user identities, which are often unavailable or classified as sensitive information, our approach trains topic and persona feature extractors in a self-supervised way by utilizing the natural structure of dialogue data. Moreover, we adopt a binary feature representation and introduce a feature disentangling loss which, paired with controllable response generation techniques, allows us to promote or demote certain learned topic and persona features. The evaluation results demonstrate the model’s capability of capturing meaningful topic and persona features, and the incorporation of the learned features brings significant improvement in terms of the quality of generated responses on two datasets, even when compared with a model given explicit persona information.
Tasks Dialogue Generation
Published 2019-03-13
URL http://arxiv.org/abs/1903.05759v2
PDF http://arxiv.org/pdf/1903.05759v2.pdf
PWC https://paperswithcode.com/paper/consistent-dialogue-generation-with-self
Repo https://github.com/dreasysnail/CoCon
Framework tf
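
The feature-disentangling idea can be sketched as a penalty on the cross-covariance between topic and persona codes, paired with a near-binary feature representation. This is an illustrative choice of loss, not necessarily the paper's exact formulation; see the linked CoCon repo for the real implementation.

```python
import torch


def disentangle_loss(topic_feat, persona_feat):
    """Penalize cross-correlation between the two feature spaces so topic
    and persona information stay in separate codes. An illustrative loss,
    not the paper's exact one. Inputs: (batch, d_t) and (batch, d_p)."""
    t = topic_feat - topic_feat.mean(dim=0)
    p = persona_feat - persona_feat.mean(dim=0)
    cross_cov = t.t() @ p / (t.size(0) - 1)  # (d_t, d_p) cross-covariance
    return cross_cov.pow(2).sum()


def binarize(logits, temperature=0.1):
    """Near-binary features via a steep sigmoid on encoder outputs."""
    return torch.sigmoid(logits / temperature)
```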

Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks

Title Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks
Authors Yanzhao Wu, Ling Liu, Juhyun Bae, Ka-Ho Chow, Arun Iyengar, Calton Pu, Wenqi Wei, Lei Yu, Qi Zhang
Abstract Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is non-trivial to choose a good constant value for training a DNN. Dynamic learning rates involve multi-step tuning of LR values at various stages of the training process and offer high accuracy and fast convergence. However, they are much harder to tune. In this paper, we present a comprehensive study of 13 learning rate functions and their associated LR policies by examining their range parameters, step parameters, and value update parameters. We propose a set of metrics for evaluating and selecting LR policies, including classification confidence, variance, cost, and robustness, and implement them in LRBench, an LR benchmarking system. LRBench can assist end-users and DNN developers in selecting good LR policies and avoiding bad ones when training their DNNs. We tested LRBench on Caffe, an open-source deep learning framework, to showcase the tuning and optimization of LR policies. Through extensive experiments, we attempt to demystify the tuning of LR policies by identifying good LR policies with effective LR value ranges and step sizes for LR update schedules.
Tasks
Published 2019-08-18
URL https://arxiv.org/abs/1908.06477v2
PDF https://arxiv.org/pdf/1908.06477v2.pdf
PWC https://paperswithcode.com/paper/demystifying-learning-rate-polices-for-high
Repo https://github.com/git-disl/LRBench
Framework none
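
For intuition, here are three LR policies of the kind the paper studies, written as plain functions of the training step; the function names and parameters below are our notation, not LRBench's actual API.

```python
import math


def fixed_lr(t, k0=0.01):
    """Constant-LR baseline: the value k0 at every step."""
    return k0


def step_decay(t, k0=0.1, gamma=0.1, step=30):
    """Multiply the LR by gamma every `step` epochs."""
    return k0 * gamma ** (t // step)


def triangular(t, base=1e-4, peak=0.1, step_size=2000):
    """Cyclic LR (Smith, 2017): linear ramp up and down between base and peak."""
    cycle = math.floor(1 + t / (2 * step_size))
    x = abs(t / step_size - 2 * cycle + 1)
    return base + (peak - base) * max(0.0, 1 - x)
```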

Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations

Title Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
Authors Christian Hadiwinoto, Hwee Tou Ng, Wee Chung Gan
Abstract Contextualized word representations are able to give different representations for the same word in different contexts, and they have been shown to be effective in downstream natural language processing tasks, such as question answering, named entity recognition, and sentiment analysis. However, evaluation on word sense disambiguation (WSD) in prior work shows that using contextualized word representations does not outperform the state-of-the-art approach that makes use of non-contextualized word embeddings. In this paper, we explore different strategies of integrating pre-trained contextualized word representations and our best strategy achieves accuracies exceeding the best prior published accuracies by significant margins on multiple benchmark WSD datasets. We make the source code available at https://github.com/nusnlp/contextemb-wsd.
Tasks Named Entity Recognition, Question Answering, Sentiment Analysis, Word Embeddings, Word Sense Disambiguation
Published 2019-10-01
URL https://arxiv.org/abs/1910.00194v2
PDF https://arxiv.org/pdf/1910.00194v2.pdf
PWC https://paperswithcode.com/paper/improved-word-sense-disambiguation-using-pre
Repo https://github.com/nusnlp/contextemb-wsd
Framework pytorch
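
One simple integration strategy, nearest sense centroid over contextualized vectors, can be sketched with the `transformers` library as follows. The paper compares several strategies; this sketch assumes `sense_centroids` has been precomputed by averaging contextualized vectors of sense-annotated training examples, and is only an illustration, not the paper's method.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
enc = AutoModel.from_pretrained("bert-base-cased")


def word_vector(sentence, word_idx):
    """Contextualized vector of the word at `word_idx` (whitespace tokens),
    averaging its sub-word pieces from the last hidden layer."""
    batch = tok(sentence.split(), is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state[0]
    pieces = [i for i, w in enumerate(batch.word_ids()) if w == word_idx]
    return hidden[pieces].mean(dim=0)


def disambiguate(sentence, word_idx, sense_centroids):
    """Pick the sense whose centroid is most cosine-similar to the target."""
    v = word_vector(sentence, word_idx)
    return max(sense_centroids,
               key=lambda s: torch.cosine_similarity(v, sense_centroids[s], dim=0).item())
```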

Learning the Difference that Makes a Difference with Counterfactually-Augmented Data

Title Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
Authors Divyansh Kaushik, Eduard Hovy, Zachary C. Lipton
Abstract Despite alarm over the reliance of machine learning systems on so-called spurious patterns, the term lacks coherent meaning in standard statistical frameworks. However, the language of causality offers clarity: spurious associations are due to confounding (e.g., a common cause), but not direct or indirect causal effects. In this paper, we focus on natural language processing, introducing methods and resources for training models less sensitive to spurious patterns. Given documents and their initial labels, we task humans with revising each document so that it (i) accords with a counterfactual target label; (ii) retains internal coherence; and (iii) avoids unnecessary changes. Interestingly, on sentiment analysis and natural language inference tasks, classifiers trained on original data fail on their counterfactually-revised counterparts and vice versa. Classifiers trained on combined datasets perform remarkably well, just shy of those specialized to either domain. While classifiers trained on either original or manipulated data alone are sensitive to spurious features (e.g., mentions of genre), models trained on the combined data are less sensitive to this signal. Both datasets are publicly available.
Tasks Data Augmentation, Natural Language Inference, Sentiment Analysis
Published 2019-09-26
URL https://arxiv.org/abs/1909.12434v2
PDF https://arxiv.org/pdf/1909.12434v2.pdf
PWC https://paperswithcode.com/paper/learning-the-difference-that-makes-a
Repo https://github.com/dkaushik96/bizarro-data
Framework none

Syntax-Aware Aspect-Level Sentiment Classification with Proximity-Weighted Convolution Network

Title Syntax-Aware Aspect-Level Sentiment Classification with Proximity-Weighted Convolution Network
Authors Chen Zhang, Qiuchi Li, Dawei Song
Abstract It has been widely accepted that the Long Short-Term Memory (LSTM) network, coupled with an attention mechanism and a memory module, is useful for aspect-level sentiment classification. However, existing approaches largely rely on modelling the semantic relatedness of an aspect with its context words, while to some extent ignoring their syntactic dependencies within sentences. Consequently, this may lead to the undesirable result that the aspect attends to contextual words that are descriptive of other aspects. In this paper, we propose a proximity-weighted convolution network to offer an aspect-specific, syntax-aware representation of contexts. In particular, two ways of determining proximity weight are explored, namely position proximity and dependency proximity. The representation is primarily abstracted by a bidirectional LSTM architecture and further enhanced by a proximity-weighted convolution. Experiments conducted on the SemEval 2014 benchmark demonstrate the effectiveness of our proposed approach compared with a range of state-of-the-art models.
Tasks Sentiment Analysis
Published 2019-09-23
URL https://arxiv.org/abs/1909.10171v1
PDF https://arxiv.org/pdf/1909.10171v1.pdf
PWC https://paperswithcode.com/paper/190910171
Repo https://github.com/GeneZC/PWCN
Framework pytorch
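
The position-proximity variant can be sketched as follows; treat the normalization below as our assumption, since the paper's exact formula (and its dependency-distance variant) lives in the linked PWCN repo.

```python
import torch


def position_proximity(seq_len, aspect_start, aspect_len):
    """Position-based proximity weights: context words closer to the aspect
    get weights nearer 1, decaying linearly with distance. A sketch of one
    of the paper's two weighting schemes, with our normalization."""
    w = torch.ones(seq_len)
    for i in range(seq_len):
        if i < aspect_start:
            w[i] = 1 - (aspect_start - i) / seq_len
        elif i >= aspect_start + aspect_len:
            w[i] = 1 - (i - aspect_start - aspect_len + 1) / seq_len
    return w


# The weights rescale the BiLSTM hidden states before the convolution:
#   h_weighted = position_proximity(L, s, n).unsqueeze(-1) * h   # (L, hidden)
```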

Text Length Adaptation in Sentiment Classification

Title Text Length Adaptation in Sentiment Classification
Authors Reinald Kim Amplayo, Seonjae Lim, Seung-won Hwang
Abstract Can a text classifier generalize well for datasets where the text length is different? For example, when short reviews are sentiment-labeled, can these transfer to predict the sentiment of long reviews (i.e., short to long transfer), or vice versa? While unsupervised transfer learning has been well-studied for cross domain/lingual transfer tasks, Cross Length Transfer (CLT) has not yet been explored. One reason is the assumption that length difference is trivially transferable in classification. We show that it is not, because short/long texts differ in context richness and word intensity. We devise new benchmark datasets from diverse domains and languages, and show that existing models from similar tasks cannot deal with the unique challenge of transferring across text lengths. We introduce a strong baseline model called BaggedCNN that treats long texts as bags containing short texts. We propose a state-of-the-art CLT model called Length Transfer Networks (LeTraNets) that introduces a two-way encoding scheme for short and long texts using multiple training mechanisms. We test our models and find that existing models perform worse than the BaggedCNN baseline, while LeTraNets outperforms all models.
Tasks Sentiment Analysis, Transfer Learning
Published 2019-09-18
URL https://arxiv.org/abs/1909.08306v1
PDF https://arxiv.org/pdf/1909.08306v1.pdf
PWC https://paperswithcode.com/paper/text-length-adaptation-in-sentiment
Repo https://github.com/rktamplayo/LeTraNets
Framework tf
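
The BaggedCNN baseline can be sketched in a few lines of PyTorch: a long text is split into short chunks, each chunk is encoded by a shared CNN, and the chunk encodings are averaged into one bag representation. Dimensions and the chunking rule below are our assumptions, not the paper's exact setup (which the linked repo implements in TensorFlow).

```python
import torch
import torch.nn as nn


class BaggedCNN(nn.Module):
    """A long text as a bag of short chunks, encoded by one shared CNN."""

    def __init__(self, vocab, emb=100, filters=100, width=3, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, filters, width, padding=1)
        self.out = nn.Linear(filters, classes)

    def forward(self, chunks):                        # (n_chunks, chunk_len) ids
        x = self.embed(chunks).transpose(1, 2)        # (n, emb, len)
        h = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over time
        return self.out(h.mean(dim=0))                # average over the bag
```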

Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks

Title Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks
Authors Chen Zhang, Qiuchi Li, Dawei Song
Abstract Due to their inherent capability in semantic alignment of aspects and their context words, attention mechanisms and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. On this basis, a novel aspect-specific sentiment classification framework is proposed. Experiments on three benchmarking collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.
Tasks Sentiment Analysis
Published 2019-09-08
URL https://arxiv.org/abs/1909.03477v2
PDF https://arxiv.org/pdf/1909.03477v2.pdf
PWC https://paperswithcode.com/paper/aspect-based-sentiment-classification-with
Repo https://github.com/GeneZC/ASGCN
Framework pytorch
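
The core operation, graph convolution over the dependency-tree adjacency matrix, can be sketched as a single layer; the degree normalization below is a common GCN choice and may differ from the ASGCN code in the linked repo.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph convolution over a dependency tree: each word aggregates
    the hidden states of its syntactic neighbours."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (seq, dim) word states; adj: (seq, seq) float 0/1 dependency
        # adjacency with self-loops added.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # degree normalization
        return torch.relu(self.W(adj @ h) / deg)
```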

Transfer Learning Between Related Tasks Using Expected Label Proportions

Title Transfer Learning Between Related Tasks Using Expected Label Proportions
Authors Matan Ben Noach, Yoav Goldberg
Abstract Deep learning systems thrive on an abundance of labeled training data, but such data is not always available, calling for alternative methods of supervision. One such method is expectation regularization (XR) (Mann and McCallum, 2007), where models are trained based on expected label proportions. We propose a novel application of the XR framework for transfer learning between related tasks, where knowing the labels of task A provides an estimation of the label proportions of task B. We then use a model trained for A to label a large corpus, and use this corpus with an XR loss to train a model for task B. To make the XR framework applicable to large-scale deep-learning setups, we propose a stochastic batched approximation procedure. We demonstrate the approach on the task of aspect-based sentiment classification, where we effectively use a sentence-level sentiment predictor to train an accurate aspect-based predictor. The method improves upon a fully supervised neural system trained on aspect-level data, and is also cumulative with LM-based pretraining, as we demonstrate by improving a BERT-based aspect-based sentiment model.
Tasks Sentiment Analysis, Transfer Learning
Published 2019-09-01
URL https://arxiv.org/abs/1909.00430v1
PDF https://arxiv.org/pdf/1909.00430v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-between-related-tasks-using
Repo https://github.com/MatanBN/XRTransfer
Framework none
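
The stochastic batched XR loss can be sketched directly from the description above: compare the expected label proportions (estimated from task A) against the batch-averaged predicted distribution with a KL term. A minimal sketch, assuming `expected_props` sums to 1; the paper's exact estimator is in the linked repo.

```python
import torch


def xr_loss(logits, expected_props):
    """KL(expected label proportions || batch-average predicted distribution).
    logits: (batch, classes); expected_props: (classes,) from task A's labels,
    e.g. sentence-level sentiment counts mapped onto aspect labels."""
    q = torch.softmax(logits, dim=-1).mean(dim=0)  # average prediction over batch
    p = expected_props
    return (p * (p.clamp(min=1e-8) / q.clamp(min=1e-8)).log()).sum()
```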

Adversarial Training and Robustness for Multiple Perturbations

Title Adversarial Training and Robustness for Multiple Perturbations
Authors Florian Tramèr, Dan Boneh
Abstract Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise). For other perturbations, these defenses offer no guarantees and, at times, even increase the model’s vulnerability. Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types. We prove that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting. We corroborate our formal analysis by demonstrating similar robustness trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation adversarial training schemes, and a novel efficient attack for finding $\ell_1$-bounded adversarial examples, we show that no model trained against multiple attacks achieves robustness competitive with that of models trained on each attack individually. In particular, we uncover a pernicious gradient-masking phenomenon on MNIST, which causes adversarial training with first-order $\ell_\infty, \ell_1$ and $\ell_2$ adversaries to achieve merely 50% accuracy. Our results question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.
Tasks
Published 2019-04-30
URL https://arxiv.org/abs/1904.13000v2
PDF https://arxiv.org/pdf/1904.13000v2.pdf
PWC https://paperswithcode.com/paper/adversarial-training-and-robustness-for
Repo https://github.com/ftramer/MultiRobustness
Framework tf
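
The paper's "max" multi-perturbation training strategy can be sketched as follows, shown at batch granularity for brevity (the paper applies the max per example); the attack callables are placeholders for, e.g., $\ell_\infty$, $\ell_1$, and $\ell_2$ PGD.

```python
import torch


def max_strategy_step(model, loss_fn, attacks, x, y, opt):
    """Craft one adversarial batch per perturbation type and backprop
    through the worst-case loss. `attacks` is a list of callables with
    signature attack(model, x, y) -> x_adv (our assumed interface)."""
    losses = [loss_fn(model(attack(model, x, y)), y) for attack in attacks]
    worst = torch.stack(losses).max()  # train only on the strongest attack
    opt.zero_grad()
    worst.backward()
    opt.step()
    return worst.item()
```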

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Title Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Authors Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Abstract Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependencies beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependencies, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependencies that are 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwik8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both TensorFlow and PyTorch.
Tasks Language Modelling
Published 2019-01-09
URL https://arxiv.org/abs/1901.02860v3
PDF https://arxiv.org/pdf/1901.02860v3.pdf
PWC https://paperswithcode.com/paper/transformer-xl-attentive-language-models
Repo https://github.com/benkrause/dynamiceval-transformer
Framework tf
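
The segment-level recurrence can be sketched as a cache of detached hidden states passed between segments; the `layer(h, mem)` signature is our stand-in for an attention block that prepends the memory to its keys and values, not the paper's actual API.

```python
import torch


def segment_forward(layers, x, mems):
    """One segment of Transformer-XL-style recurrence: each layer attends
    over the cached (detached) states of the previous segment, so context
    extends beyond one segment without backprop through the past.

    layers: list of blocks with assumed signature layer(h, mem) -> h
    x:      (seg_len, batch, dim) current segment input
    mems:   list of cached (mem_len, batch, dim) tensors, one per layer
    """
    new_mems, h = [], x
    for layer, mem in zip(layers, mems):
        new_mems.append(h.detach())  # cache this layer's input for next segment
        h = layer(h, mem)            # attend over the concatenation [mem; h]
    return h, new_mems
```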

How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages

Title How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages
Authors Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Abstract For language documentation initiatives, transcription is an expensive resource: one minute of audio is estimated to take a linguist an hour and a half of work on average (Austin and Sallabank, 2013). Recently, collecting aligned translations in well-resourced languages has become a popular solution for ensuring posterior interpretability of the recordings (Adda et al., 2016). In this paper we investigate the language-related impact in automatic approaches to computational language documentation. We translate the bilingual Mboshi-French parallel corpus (Godard et al., 2017) into four other languages and perform bilingual-rooted unsupervised word discovery. Our results hint at an impact of the well-resourced language on the quality of the output. However, by combining the information learned by different bilingual models, we are only able to marginally increase the quality of the segmentation.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.05154v1
PDF https://arxiv.org/pdf/1910.05154v1.pdf
PWC https://paperswithcode.com/paper/how-does-language-influence-documentation
Repo https://github.com/mzboito/mmboshi
Framework none

Deep Learning with Gaussian Differential Privacy

Title Deep Learning with Gaussian Differential Privacy
Authors Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie J. Su
Abstract Deep learning models are often trained on datasets that contain sensitive information such as individuals’ shopping transactions, personal contacts, and medical records. An increasingly important line of work therefore has sought to train neural networks subject to privacy constraints that are specified by differential privacy or its divergence-based relaxations. These privacy definitions, however, have weaknesses in handling certain important primitives (composition and subsampling), thereby giving loose or complicated privacy analyses of training neural networks. In this paper, we consider a recently proposed privacy definition termed f-differential privacy [17] for a refined privacy analysis of training neural networks. Leveraging the appealing properties of f-differential privacy in handling composition and subsampling, this paper derives analytically tractable expressions for the privacy guarantees of both stochastic gradient descent and Adam used in training deep neural networks, without the need of developing sophisticated techniques as [3] did. Our results demonstrate that the f-differential privacy framework allows for a new privacy analysis that improves on the prior analysis [3], which in turn suggests tuning certain parameters of neural networks for a better prediction accuracy without violating the privacy budget. These theoretically derived improvements are confirmed by our experiments in a range of tasks in image classification, text classification, and recommender systems.
Tasks Image Classification, Recommendation Systems, Text Classification
Published 2019-11-26
URL https://arxiv.org/abs/1911.11607v2
PDF https://arxiv.org/pdf/1911.11607v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-gaussian-differential
Repo https://github.com/woodyx218/Deep-Learning-with-GDP
Framework tf
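
The training primitive the paper analyzes under f-DP is noisy SGD with per-example gradient clipping. A naive sketch (a Python loop over examples, for clarity rather than speed); the hyperparameter defaults are placeholders, and the privacy accounting itself is of course the paper's contribution, not shown here.

```python
import torch


def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip=1.0, sigma=1.0):
    """One noisy-SGD step: clip each per-example gradient to norm `clip`,
    sum the clipped gradients, then add Gaussian noise before updating."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = (clip / (norm + 1e-12)).clamp(max=1.0)  # per-example clipping
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            noise = sigma * clip * torch.randn_like(s)  # Gaussian mechanism
            p -= lr * (s + noise) / len(xs)
```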

Detecting Adversarial Examples through Nonlinear Dimensionality Reduction

Title Detecting Adversarial Examples through Nonlinear Dimensionality Reduction
Authors Francesco Crecchi, Davide Bacciu, Battista Biggio
Abstract Deep neural networks are vulnerable to adversarial examples, i.e., carefully-perturbed inputs aimed to mislead classification. This work proposes a detection method based on combining non-linear dimensionality reduction and density estimation techniques. Our empirical findings show that the proposed approach is able to effectively detect adversarial examples crafted by non-adaptive attackers, i.e., not specifically tuned to bypass the detection method. Given our promising results, we plan to extend our analysis to adaptive attackers in future work.
Tasks Density Estimation, Dimensionality Reduction
Published 2019-04-30
URL http://arxiv.org/abs/1904.13094v2
PDF http://arxiv.org/pdf/1904.13094v2.pdf
PWC https://paperswithcode.com/paper/detecting-adversarial-examples-through
Repo https://github.com/FrancescoCrecchi/AE_Detector
Framework none
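
The detection recipe can be sketched with scikit-learn, with one substitution flagged up front: the paper works with t-SNE, but Isomap is used below because it provides an out-of-sample `transform` for new inputs.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import KernelDensity


def fit_detector(clean_features, dim=2, bandwidth=0.5):
    """Embed deep-feature vectors of clean training data in a low-dimensional
    nonlinear space, then model their density; low-density inputs get flagged."""
    reducer = Isomap(n_components=dim).fit(clean_features)
    kde = KernelDensity(bandwidth=bandwidth).fit(reducer.transform(clean_features))
    return reducer, kde


def flag_adversarial(reducer, kde, feature, threshold):
    """True if the input's log-density falls below the chosen cutoff."""
    z = reducer.transform(np.asarray(feature).reshape(1, -1))
    return kde.score_samples(z)[0] < threshold
```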

Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization

Title Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization
Authors Aaron Klein, Frank Hutter
Abstract Due to the high computational demands, executing a rigorous comparison between hyperparameter optimization (HPO) methods is often cumbersome. The goal of this paper is to facilitate a better empirical evaluation of HPO methods by providing benchmarks that are cheap to evaluate but still represent realistic use cases. We believe these benchmarks provide an easy and efficient way to conduct reproducible experiments for neural hyperparameter search. Our benchmarks consist of a large grid of configurations of a feed-forward neural network on four different regression datasets, including architectural hyperparameters and hyperparameters concerning the training pipeline. Based on this data, we performed an in-depth analysis to gain a better understanding of the properties of the optimization problem, as well as of the importance of different types of hyperparameters. We then exhaustively compared various state-of-the-art methods from the hyperparameter optimization literature on these benchmarks in terms of performance and robustness.
Tasks Hyperparameter Optimization
Published 2019-05-13
URL https://arxiv.org/abs/1905.04970v1
PDF https://arxiv.org/pdf/1905.04970v1.pdf
PWC https://paperswithcode.com/paper/tabular-benchmarks-for-joint-architecture-and
Repo https://github.com/automl/nas_benchmarks
Framework none
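
The idea of a tabular benchmark fits in a few lines: every configuration in a fixed grid has been trained offline, so an HPO method "evaluates" a configuration by table lookup at essentially zero cost. The grid fields and fake MSE values below are synthetic stand-ins, not the released benchmark data.

```python
import itertools
import random

# Tiny synthetic stand-in for the released lookup tables.
GRID = {"n_units": [16, 32, 64], "lr": [1e-3, 1e-2], "batch": [8, 32]}
TABLE = {cfg: random.random()  # fake validation MSE per configuration
         for cfg in itertools.product(*GRID.values())}


def evaluate(cfg):
    """A 'training run' against a tabular benchmark is just a lookup."""
    return TABLE[cfg]


def random_search(n_trials=20):
    """Baseline HPO method: sample configs and keep the best lookup result."""
    tried = random.choices(list(TABLE), k=n_trials)
    best = min(tried, key=evaluate)
    return best, evaluate(best)
```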