February 1, 2020

2994 words 15 mins read

Paper Group AWR 148

Look Who’s Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog. Continual and Multi-Task Architecture Search. Regularization Matters in Policy Optimization. PAWS: Paraphrase Adversaries from Word Scrambling. Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild. Learning Reward Functions by Integrating Human Demonst …

Look Who’s Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog


Title	Look Who’s Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog
Authors	Charles Welch, Verónica Pérez-Rosas, Jonathan K. Kummerfeld, Rada Mihalcea
Abstract	We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.
Tasks
Published	2019-04-25
URL	http://arxiv.org/abs/1904.11610v1
PDF	http://arxiv.org/pdf/1904.11610v1.pdf
PWC	https://paperswithcode.com/paper/look-whos-talking-inferring-speaker
Repo	https://github.com/cfwelch/longitudinal_dialog
Framework	pytorch

Continual and Multi-Task Architecture Search


Title	Continual and Multi-Task Architecture Search
Authors	Ramakanth Pasunuru, Mohit Bansal
Abstract	Architecture search is the process of automatically learning the neural model or cell structure that best suits the given task. Recently, this approach has shown promising performance improvements (on language modeling and image classification) with reasonable training speed, using a weight sharing strategy called Efficient Neural Architecture Search (ENAS). In our work, we first introduce a novel continual architecture search (CAS) approach, so as to continually evolve the model parameters during the sequential training of several tasks, without losing performance on previously learned tasks (via block-sparsity and orthogonality constraints), thus enabling life-long learning. Next, we explore a multi-task architecture search (MAS) approach over ENAS for finding a unified, single cell structure that performs well across multiple tasks (via joint controller rewards), and hence allows more generalizable transfer of the cell structure knowledge to an unseen new task. We empirically show the effectiveness of our sequential continual learning and parallel multi-task learning based architecture search approaches on diverse sentence-pair classification tasks (GLUE) and multimodal-generation based video captioning tasks. Further, we present several ablations and analyses on the learned cell structures.
Tasks	Continual Learning, Image Classification, Language Modelling, Multi-Task Learning, Neural Architecture Search, Video Captioning
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05226v1
PDF	https://arxiv.org/pdf/1906.05226v1.pdf
PWC	https://paperswithcode.com/paper/continual-and-multi-task-architecture-search
Repo	https://github.com/ramakanth-pasunuru/CAS-MAS
Framework	pytorch
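The abstract credits orthogonality constraints with keeping parameters learned for a new task from overwriting those of earlier tasks. Below is a minimal sketch of one common form of such a penalty. It assumes weight matrices stored as lists of rows and uses the squared Frobenius norm of $W_{old}^\top W_{new}$. The function name and the exact form of the penalty are illustrative assumptions, not code from the CAS-MAS repository.

```python
def orthogonality_penalty(w_old, w_new):
    """Squared Frobenius norm of w_old^T @ w_new.

    Both matrices are lists of rows with the same number of rows. The
    penalty is zero exactly when every column of w_old is orthogonal to
    every column of w_new.
    """
    if len(w_old) != len(w_new):
        raise ValueError("matrices must have the same number of rows")
    n_old, n_new = len(w_old[0]), len(w_new[0])
    total = 0.0
    for i in range(n_old):
        for j in range(n_new):
            # Entry (i, j) of w_old^T w_new: dot product of column i and column j.
            dot = sum(row_old[i] * row_new[j] for row_old, row_new in zip(w_old, w_new))
            total += dot * dot
    return total


# Columns (1, 0) and (0, 1) are orthogonal, so no penalty is incurred.
print(orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]]))  # 0.0
```

During training on a new task, a term such as `lam * orthogonality_penalty(w_old, w_new)` would be added to that task's loss. Here `lam` is a hypothetical weighting coefficient.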

Regularization Matters in Policy Optimization


Title	Regularization Matters in Policy Optimization
Authors	Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
Abstract	Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment, and because the deep RL community focuses more on high-level algorithm designs. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find conventional regularization techniques on the policy networks can often bring large improvement, especially on harder tasks. We also compare with the widely used entropy regularization and find $L_2$ regularization is generally better. Our findings are further shown to be robust against variations in training hyperparameters. We further study regularizing different components and find that only regularizing the policy network is typically the best. We hope our study provides guidance for future practices in regularizing policy optimization algorithms.
Tasks	Continuous Control
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09191v2
PDF	https://arxiv.org/pdf/1910.09191v2.pdf
PWC	https://paperswithcode.com/paper/regularization-matters-in-policy-optimization-1
Repo	https://github.com/xuanlinli17/po-rl-regularization
Framework	tf
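The abstract compares two ways of regularizing policy optimization. One adds an $L_2$ penalty on the policy network's weights; the other adds the usual entropy bonus. Below is a minimal sketch of both terms applied on top of an already-computed policy loss. It assumes flat lists of parameters and a discrete action distribution for simplicity, whereas the paper's continuous control tasks use continuous actions. The coefficients and function names are hypothetical. The value network is deliberately left unregularized, following the finding that regularizing only the policy network is typically best.

```python
import math


def l2_penalty(params, coeff):
    """coeff * sum of squared weights (applied to policy parameters only)."""
    return coeff * sum(w * w for w in params)


def entropy(probs):
    """Shannon entropy of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_policy_loss(policy_loss, policy_params, l2_coeff=0.0,
                            action_probs=None, entropy_coeff=0.0):
    """Policy loss plus an optional L2 penalty minus an optional entropy bonus."""
    loss = policy_loss + l2_penalty(policy_params, l2_coeff)
    if action_probs is not None:
        # Subtracting entropy rewards more exploratory (higher-entropy) policies.
        loss -= entropy_coeff * entropy(action_probs)
    return loss


print(regularized_policy_loss(1.0, [1.0, 2.0], l2_coeff=0.1))  # 1.5
```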

PAWS: Paraphrase Adversaries from Word Scrambling


Title	PAWS: Paraphrase Adversaries from Word Scrambling
Authors	Yuan Zhang, Jason Baldridge, Luheng He
Abstract	Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like “flights from New York to Florida” and “flights from Florida to New York”. This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.
Tasks	Paraphrase Identification
Published	2019-04-01
URL	http://arxiv.org/abs/1904.01130v1
PDF	http://arxiv.org/pdf/1904.01130v1.pdf
PWC	https://paperswithcode.com/paper/paws-paraphrase-adversaries-from-word
Repo	https://github.com/google-research-datasets/paws
Framework	none
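The word-swapping idea behind PAWS can be illustrated with the abstract's own flight example. Below is a minimal sketch that assumes whitespace tokenization and hand-picked token spans. The function name `swap_spans` is hypothetical. The real pipeline's selection of swaps, its back-translation step, and the human judgments are not modeled here.

```python
def swap_spans(tokens, span_a, span_b):
    """Return a copy of tokens with two non-overlapping [start, end) spans exchanged."""
    (a_start, a_end), (b_start, b_end) = sorted([span_a, span_b])
    if a_end > b_start:
        raise ValueError("spans overlap")
    return (tokens[:a_start] + tokens[b_start:b_end] + tokens[a_end:b_start]
            + tokens[a_start:a_end] + tokens[b_end:])


source = "flights from New York to Florida".split()
swapped = swap_spans(source, (2, 4), (5, 6))
print(" ".join(swapped))  # flights from Florida to New York
# Same bag of words, different meaning: a high-overlap non-paraphrase pair.
print(sorted(source) == sorted(swapped))  # True
```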

Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild


Title	Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild
Authors	Kibok Lee, Kimin Lee, Jinwoo Shin, Honglak Lee
Abstract	Lifelong learning with deep neural networks is well-known to suffer from catastrophic forgetting: the performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a large stream of unlabeled data easily obtainable in the wild. In particular, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a confidence-based sampling method to effectively leverage unlabeled external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: our method shows up to 15.8% higher accuracy and 46.5% less forgetting compared to the state-of-the-art method. The code is available at https://github.com/kibok90/iccv2019-inc.
Tasks
Published	2019-03-29
URL	https://arxiv.org/abs/1903.12648v3
PDF	https://arxiv.org/pdf/1903.12648v3.pdf
PWC	https://paperswithcode.com/paper/incremental-learning-with-unlabeled-data-in
Repo	https://github.com/kibok90/iccv2019-inc
Framework	pytorch
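The global distillation loss named in the abstract is a variant of knowledge distillation, in which the model being updated is trained to match the softened predictions of a frozen earlier model. Below is a minimal sketch of plain temperature-scaled distillation, not the paper's global distillation. It assumes raw logit lists and a hand-chosen temperature. In the setting the abstract describes, such a term would be evaluated on inputs that include the unlabeled external data.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy from softened teacher probabilities to softened student probabilities."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher = [2.0, 1.0, 0.1]
# The loss is smallest when the student reproduces the teacher's distribution.
print(distillation_loss(teacher, teacher) < distillation_loss([0.1, 1.0, 2.0], teacher))  # True
```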

Learning Reward Functions by Integrating Human Demonstrations and Preferences


Title	Learning Reward Functions by Integrating Human Demonstrations and Preferences
Authors	Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
Abstract	Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics, however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1906.08928v1
PDF	https://arxiv.org/pdf/1906.08928v1.pdf
PWC	https://paperswithcode.com/paper/learning-reward-functions-by-integrating
Repo	https://github.com/malayandi/DemPrefCode
Framework	none
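The abstract describes two ingredients: a coarse prior derived from demonstrations, and updates driven by binary preference answers. Both can be sketched over a finite set of candidate reward weights. The sketch assumes linear rewards over hand-made trajectory features, a Boltzmann-style demonstration prior, and a logistic choice model with rationality parameter `beta`. These modeling choices and all function names are assumptions, and DemPref's active query generation is omitted.

```python
import math


def reward(w, features):
    """Linear reward of a trajectory summarized by a feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def demo_prior(candidates, demo_features, beta=1.0):
    """Weight each candidate w by exp(beta * reward of the demonstration)."""
    return normalize([math.exp(beta * reward(w, demo_features)) for w in candidates])


def preference_update(candidates, probs, feat_a, feat_b, prefers_a, beta=1.0):
    """Bayesian update after the user answers 'A or B?' under a logistic choice model."""
    posterior = []
    for w, p in zip(candidates, probs):
        p_a = 1.0 / (1.0 + math.exp(-beta * (reward(w, feat_a) - reward(w, feat_b))))
        posterior.append(p * (p_a if prefers_a else 1.0 - p_a))
    return normalize(posterior)


candidates = [[1.0, 0.0], [0.0, 1.0]]
prior = demo_prior(candidates, demo_features=[1.0, 0.0])
post = preference_update(candidates, prior, feat_a=[0.0, 1.0], feat_b=[1.0, 0.0], prefers_a=True)
print([round(p, 3) for p in prior], [round(p, 3) for p in post])
```

In this toy run, the demonstration favors the first candidate and the single preference answer favors the second. Starting from the demonstration-informed prior, the answer shifts probability mass toward the second candidate.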

Learning Multi-Human Optical Flow


Title	Learning Multi-Human Optical Flow
Authors	Anurag Ranjan, David T. Hoffmann, Dimitrios Tzionas, Siyu Tang, Javier Romero, Michael J. Black
Abstract	The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.
Tasks	Motion Capture, Optical Flow Estimation
Published	2019-10-24
URL	https://arxiv.org/abs/1910.11667v2
PDF	https://arxiv.org/pdf/1910.11667v2.pdf
PWC	https://paperswithcode.com/paper/learning-multi-human-optical-flow
Repo	https://github.com/anuragranj/humanflow2
Framework	pytorch

t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams


Title	t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams
Authors	Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez
Abstract	A recently introduced classifier, called SS3, has been shown to be well suited to dealing with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF’s eRisk open tasks. SS3 was created to naturally deal with ERD problems since it supports incremental training and classification over text streams and it can visually explain its rationale. However, SS3 processes the input using a bag-of-words model lacking the ability to recognize important word sequences. This could negatively affect the classification performance and also reduces the descriptiveness of visual explanations. In the standard document classification field, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial since the system must learn and recognize which n-grams are important “on the fly”. This paper introduces t-SS3, a variation of SS3 which expands the model to dynamically recognize useful patterns over text streams. We evaluated our model on the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results show that t-SS3 is able to improve both existing results and the richness of visual explanations.
Tasks	Document Classification, Multi-Label Text Classification, Sentence Classification, Text Categorization, Text Classification
Published	2019-11-11
URL	https://arxiv.org/abs/1911.06147v1
PDF	https://arxiv.org/pdf/1911.06147v1.pdf
PWC	https://paperswithcode.com/paper/t-ss3-a-text-classifier-with-dynamic-n-grams
Repo	https://github.com/sergioburdisso/pyss3
Framework	none

Guided Image-to-Image Translation with Bi-Directional Feature Transformation


Title	Guided Image-to-Image Translation with Bi-Directional Feature Transformation
Authors	Badour AlBahar, Jia-Bin Huang
Abstract	We address the problem of guided image-to-image translation where we translate an input image into another while respecting the constraints provided by an external, user-provided guidance image. Various conditioning methods for leveraging the given guidance image have been explored, including input concatenation, feature concatenation, and conditional affine transformation of feature activations. All these conditioning mechanisms, however, are uni-directional, i.e., no information flows from the input image back to the guidance. To better utilize the constraints of the guidance image, we present a bi-directional feature transformation (bFT) scheme. We show that our bFT scheme outperforms other conditioning schemes and has comparable results to state-of-the-art methods on different tasks.
Tasks	Image-to-Image Translation, Pose Transfer
Published	2019-10-24
URL	https://arxiv.org/abs/1910.11328v1
PDF	https://arxiv.org/pdf/1910.11328v1.pdf
PWC	https://paperswithcode.com/paper/guided-image-to-image-translation-with-bi-1
Repo	https://github.com/vt-vl-lab/Guided-pix2pix
Framework	pytorch
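The difference between uni-directional conditioning and bFT lies in where the affine parameters come from. Below is a toy sketch with per-channel scalar features. It assumes each stream's scale and shift are predicted from the other stream by a hypothetical linear map with made-up parameter names. The actual bFT layers predict these parameters with learned layers over feature maps, so this only illustrates the direction of information flow.

```python
def affine_params(features, scale_weight, shift_weight):
    """Hypothetical per-channel scale (gamma) and shift (beta) predicted from features."""
    gamma = [scale_weight * f for f in features]
    beta = [shift_weight * f for f in features]
    return gamma, beta


def modulate(features, gamma, beta):
    """Feature-wise affine transformation: gamma * x + beta."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def bft_layer(input_feats, guide_feats, params):
    """Bi-directional: the guide modulates the input AND the input modulates the guide."""
    # Both directions use the features from before this layer's update.
    g_in, b_in = affine_params(guide_feats, params["guide_to_input_scale"],
                               params["guide_to_input_shift"])
    g_gd, b_gd = affine_params(input_feats, params["input_to_guide_scale"],
                               params["input_to_guide_shift"])
    return modulate(input_feats, g_in, b_in), modulate(guide_feats, g_gd, b_gd)


params = {"guide_to_input_scale": 1.0, "guide_to_input_shift": 0.0,
          "input_to_guide_scale": 0.0, "input_to_guide_shift": 1.0}
print(bft_layer([1.0, 2.0], [3.0, 4.0], params))  # ([3.0, 8.0], [1.0, 2.0])
```

A uni-directional scheme would compute only the first modulation and pass `guide_feats` through unchanged.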

Hierarchical Attentional Hybrid Neural Networks for Document Classification


Title	Hierarchical Attentional Hybrid Neural Networks for Document Classification
Authors	Jader Abreu, Luis Fred, David Macêdo, Cleber Zanchettin
Abstract	Document classification is a challenging task with important applications. Deep learning approaches to the problem have gained much attention recently. Despite the progress, the proposed models do not efficiently incorporate knowledge of the document structure into the architecture, and they do not take into account the contextual importance of words and sentences. In this paper, we propose a new approach based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms for document classification tasks. The main contribution of this work is the use of convolution layers to extract more meaningful, generalizable and abstract features by the hierarchical representation. The proposed method in this paper improves the results of the current attention-based approaches for document classification.
Tasks	Document Classification
Published	2019-01-20
URL	https://arxiv.org/abs/1901.06610v2
PDF	https://arxiv.org/pdf/1901.06610v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-attentional-hybrid-neural
Repo	https://github.com/luisfredgs/cnn-hierarchical-network-for-document-classification
Framework	tf
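The attention mechanisms mentioned in the abstract pool a sequence of vectors into a single vector by weighting each element. Below is a minimal sketch of hierarchical attention pooling: word vectors are pooled into sentence vectors, and sentence vectors are pooled into a document vector. It uses a simplified dot-product score against a context vector. In the actual model, the vectors would come from the convolutional and GRU layers and the scoring would be learned. The function names are hypothetical.

```python
import math


def attention_pool(vectors, context):
    """Softmax-weighted average of vectors, scored by dot product with a context vector."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
    return pooled, weights


def document_vector(document, word_context, sentence_context):
    """document: list of sentences, each a list of word vectors."""
    sentence_vecs = [attention_pool(sentence, word_context)[0] for sentence in document]
    return attention_pool(sentence_vecs, sentence_context)[0]


doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
# With all-zero context vectors every element gets equal weight.
print(document_vector(doc, [0.0, 0.0], [0.0, 0.0]))  # [0.75, 0.75]
```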

Learning Distributions Generated by One-Layer ReLU Networks


Title	Learning Distributions Generated by One-Layer ReLU Networks
Authors	Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi
Abstract	We consider the problem of estimating the parameters of a $d$-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error $\epsilon\|W\|_F$ using $\tilde{O}(1/\epsilon^2)$ samples and $\tilde{O}(d^2/\epsilon^2)$ time (log factors are ignored for simplicity). This implies that we can estimate the distribution up to $\epsilon$ in total variation distance using $\tilde{O}(\kappa^2d^2/\epsilon^2)$ samples, where $\kappa$ is the condition number of the covariance matrix. Our only assumption is that the bias vector is non-negative. Without this non-negativity assumption, we show that estimating the bias vector within any error requires a number of samples at least exponential in the infinity norm of the bias vector. Our algorithm is based on the key observation that vector norms and pairwise angles can be estimated separately. We use a recent result on learning from truncated samples. We also prove two sample complexity lower bounds: $\Omega(1/\epsilon^2)$ samples are required to estimate the parameters up to error $\epsilon$, while $\Omega(d/\epsilon^2)$ samples are necessary to estimate the distribution up to $\epsilon$ in total variation distance. The first lower bound implies that our algorithm is optimal for parameter estimation. Finally, we show an interesting connection between learning a two-layer generative model and non-negative matrix factorization. Experimental results are provided to support our analysis.
Tasks
Published	2019-09-04
URL	https://arxiv.org/abs/1909.01812v2
PDF	https://arxiv.org/pdf/1909.01812v2.pdf
PWC	https://paperswithcode.com/paper/learning-distributions-generated-by-one-layer
Repo	https://github.com/wushanshan/densityEstimation
Framework	none
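The generative model in the abstract is straightforward to simulate: draw a standard Gaussian vector $z$, compute $Wz + b$, then apply a ReLU. Below is a minimal sampler that stores the weight matrix as a list of rows and uses a fixed seed. The function name is hypothetical, and the code only generates data; it does not implement the paper's estimator. It also gives a quick sanity check on the non-negative bias assumption. Coordinate $i$ is exactly zero with probability $\Phi(-b_i/\|w_i\|)$, where $\Phi$ is the standard normal CDF. That probability equals $1/2$ when $b_i = 0$ and is below $1/2$ when $b_i > 0$.

```python
import random


def sample_rectified_gaussian(weights, bias, n, seed=0):
    """Draw n samples of x = ReLU(W z + b) with z ~ N(0, I)."""
    rng = random.Random(seed)
    latent_dim = len(weights[0])
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        x = [max(0.0, sum(w * zj for w, zj in zip(row, z)) + b)
             for row, b in zip(weights, bias)]
        samples.append(x)
    return samples


samples = sample_rectified_gaussian([[1.0, 0.0], [0.5, 0.5]], [0.0, 0.0], n=5000)
zero_fraction = sum(1 for x in samples if x[0] == 0.0) / len(samples)
print(round(zero_fraction, 2))  # approximately 0.5, since b_0 = 0
```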

An Evalutation of Programming Language Models’ performance on Software Defect Detection


Title	An Evalutation of Programming Language Models’ performance on Software Defect Detection
Authors	Kailun Wang
Abstract	This dissertation presents an evaluation of several language models on software defect datasets. A language model (LM) “can provide word representation and probability indication of word sequences as the core component of an NLP system.” Language models for source code are specialized for tasks in the software engineering field. While some models are taken directly from NLP, others contain structural information that is unique to source code. Software defects are defects in the source code that lead to unexpected behaviours and malfunctions at all levels. This study provides an original attempt to detect these defects at three different levels (syntactical, algorithmic and general). We also provide a tool chain that researchers can use to reproduce the experiments. We have tested the different models against different datasets, and performed an analysis over the results. Our original attempt to deploy BERT, the state-of-the-art model for multiple tasks, matched or outscored all other models compared.
Tasks	Language Modelling
Published	2019-09-10
URL	https://arxiv.org/abs/1909.10309v1
PDF	https://arxiv.org/pdf/1909.10309v1.pdf
PWC	https://paperswithcode.com/paper/an-evalutation-of-programming-language-models
Repo	https://github.com/hiroto-takatoshi/XLM
Framework	pytorch

Enhancing temporal segmentation by nonlocal self-similarity


Title	Enhancing temporal segmentation by nonlocal self-similarity
Authors	Mariella Dimiccoli, Herwig Wendt
Abstract	Temporal segmentation of untrimmed videos and photo-streams is currently an active area of research in computer vision and image processing. This paper proposes a new approach to improve the temporal segmentation of photo-streams. The method consists of enhancing image representations by encoding long-range temporal dependencies. Our key contribution is to take advantage of the temporal stationarity assumption of photo-streams for modeling each frame by its nonlocal self-similarity function. The proposed approach is put to test on the EDUB-Seg dataset, a standard benchmark for egocentric photo-stream temporal segmentation. Starting from seven different (CNN-based) image features, the method yields consistent improvements in event segmentation quality, leading to an average increase of F-measure of 3.71% with respect to the state of the art.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.11335v1
PDF	https://arxiv.org/pdf/1906.11335v1.pdf
PWC	https://paperswithcode.com/paper/enhancing-temporal-segmentation-by-nonlocal
Repo	https://github.com/mdimiccoli/Nonlocal-self-similarity-1D
Framework	pytorch
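The idea of representing each frame by its similarity to the rest of the stream can be sketched with cosine similarity. The sketch below is a simplified stand-in that assumes per-frame CNN feature vectors are already available; the function names are hypothetical. The paper's nonlocal self-similarity function is built on a temporal stationarity assumption and may differ from this plain similarity profile.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def self_similarity_profiles(frames):
    """Row t describes frame t by its similarity to every frame in the stream."""
    return [[cosine(f, g) for g in frames] for f in frames]


frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in self_similarity_profiles(frames):
    print([round(s, 2) for s in row])
```

In this toy stream, the first two frames have similar profiles and the third stands apart. A segmentation step operating on these rows, rather than on the raw features, would therefore be inclined to place an event boundary before the third frame.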

Transformers without Tears: Improving the Normalization of Self-Attention


Title	Transformers without Tears: Improving the Normalization of Self-Attention
Authors	Toan Q. Nguyen, Julian Salazar
Abstract	We evaluate three simple, normalization-centric changes to improve Transformer training. First, we show that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates. Second, we propose $\ell_2$ normalization with a single scale parameter (ScaleNorm) for faster training and better performance. Finally, we reaffirm the effectiveness of normalizing word embeddings to a fixed length (FixNorm). On five low-resource translation pairs from TED Talks-based corpora, these changes always converge, giving an average +1.1 BLEU over state-of-the-art bilingual baselines and a new 32.8 BLEU on IWSLT’15 English-Vietnamese. We observe sharper performance curves, more consistent gradient norms, and a linear relationship between activation scaling and decoder depth. Surprisingly, in the high-resource setting (WMT’14 English-German), ScaleNorm and FixNorm remain competitive but PreNorm degrades performance.
Tasks	Machine Translation, Word Embeddings
Published	2019-10-14
URL	https://arxiv.org/abs/1910.05895v2
PDF	https://arxiv.org/pdf/1910.05895v2.pdf
PWC	https://paperswithcode.com/paper/transformers-without-tears-improving-the
Repo	https://github.com/tnq177/transformers_without_tears
Framework	pytorch
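ScaleNorm replaces LayerNorm's per-dimension gain and bias with a single learned scale $g$ applied after $\ell_2$ normalization: $\mathrm{ScaleNorm}(x; g) = g\,x/\|x\|$. PreNorm applies the normalization to a sublayer's input rather than after the residual addition. Below is a minimal sketch on plain Python lists. It assumes an `eps` floor to avoid division by zero and an arbitrary `sublayer` function; both choices are made for illustration.

```python
import math


def scale_norm(x, g, eps=1e-5):
    """ScaleNorm: rescale x to have l2 norm g (eps guards against a zero vector)."""
    norm = max(math.sqrt(sum(v * v for v in x)), eps)
    return [g * v / norm for v in x]


def pre_norm_residual(x, sublayer, g):
    """PreNorm residual block: x + sublayer(ScaleNorm(x))."""
    y = sublayer(scale_norm(x, g))
    return [a + b for a, b in zip(x, y)]


print(scale_norm([3.0, 4.0], g=2.0))                      # [1.2, 1.6]
print(pre_norm_residual([3.0, 4.0], lambda h: h, g=2.0))  # [4.2, 5.6]
```

FixNorm, as described in the abstract, applies the same kind of rescaling to word embeddings, normalizing them to a fixed length.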

Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?


Title	Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?
Authors	Sorami Hisamoto, Matt Post, Kevin Duh
Abstract	Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model’s API, determine whether the sample existed in the model’s training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information against several kinds of membership inference attacks.
Tasks	Machine Translation, Video Captioning
Published	2019-04-11
URL	https://arxiv.org/abs/1904.05506v2
PDF	https://arxiv.org/pdf/1904.05506v2.pdf
PWC	https://paperswithcode.com/paper/membership-inference-attacks-on-sequence-to
Repo	https://github.com/sorami/Membership-Inference-Attacks-on-Sequence-to-Sequence-Models
Framework	none
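A simple baseline for the black-box setting in the abstract is a threshold attack. Each (source, output) pair is scored with a quality measure, for example sentence-level BLEU against the reference, and the attack predicts "member" when the score is high, because models often fit their training data better than unseen data. The sketch below assumes the scores are already computed and that a labeled shadow set is available for choosing the threshold; the function names are hypothetical. This is a generic baseline, not necessarily one of the attacks evaluated in the paper.

```python
def threshold_attack(scores, threshold):
    """Predict membership (True) for samples whose score reaches the threshold."""
    return [s >= threshold for s in scores]


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def fit_threshold(shadow_scores, shadow_labels):
    """Pick the candidate threshold with the best accuracy on labeled shadow data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(shadow_scores)):
        acc = accuracy(threshold_attack(shadow_scores, t), shadow_labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


shadow_scores = [0.9, 0.8, 0.3, 0.2]        # e.g. sentence-level BLEU / 100
shadow_labels = [True, True, False, False]  # True = in the shadow model's training data
t = fit_threshold(shadow_scores, shadow_labels)
print(t, threshold_attack([0.85, 0.1], t))  # 0.8 [True, False]
```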