Paper Group NANR 153
Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization
Title | Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization |
Authors | Guanlin Li, Lemao Liu, Guoping Huang, Conghui Zhu, Tiejun Zhao |
Abstract | Many Data Augmentation (DA) methods have been proposed for neural machine translation. Existing works measure the superiority of DA methods in terms of their performance on a specific test set, but we find that some DA methods do not exhibit consistent improvements across translation tasks. Based on this observation, this paper makes an initial attempt to answer a fundamental question: what benefits, which are consistent across different methods and tasks, does DA in general obtain? Inspired by recent theoretical advances in deep learning, the paper understands DA from two perspectives towards the generalization ability of a model: input sensitivity and prediction margin, which are defined independently of any specific test set and thereby may lead to findings with relatively low variance. Extensive experiments show that relatively consistent benefits across five DA methods and four translation tasks are achieved regarding both perspectives. |
Tasks | Data Augmentation, Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1570/ |
https://www.aclweb.org/anthology/D19-1570 | |
PWC | https://paperswithcode.com/paper/understanding-data-augmentation-in-neural |
Repo | |
Framework | |
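The two generalization proxies named in the abstract above can be illustrated with a toy model. The sketch below uses simplified definitions (perturb the input and measure the output change for input sensitivity; gold-class probability minus the best competitor for prediction margin); the paper's exact formulations may differ, and all names here are illustrative.

```python
# Hedged sketch of two generalization proxies: input sensitivity and
# prediction margin, computed for a toy softmax model (not the paper's NMT model).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_model(x, W):
    """Stand-in for one decoder step: a probability distribution from an input embedding."""
    return softmax(x @ W)

def input_sensitivity(x, W, n_samples=100, sigma=0.01, rng=None):
    """Average change of the output distribution under small input perturbations."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = toy_model(x, W)
    deltas = []
    for _ in range(n_samples):
        noise = rng.normal(scale=sigma, size=x.shape)
        deltas.append(np.linalg.norm(toy_model(x + noise, W) - base))
    return float(np.mean(deltas))

def prediction_margin(x, W, gold):
    """Gold-token probability minus the highest competing probability."""
    p = toy_model(x, W)
    competitors = np.delete(p, gold)
    return float(p[gold] - competitors.max())

rng = np.random.default_rng(0)
x, W = rng.normal(size=16), rng.normal(size=(16, 8))
print(input_sensitivity(x, W), prediction_margin(x, W, gold=3))
```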
Composing Complex Skills by Learning Transition Policies with Proximity Reward Induction
Title | Composing Complex Skills by Learning Transition Policies with Proximity Reward Induction |
Authors | Youngwoon Lee*, Shao-Hua Sun*, Sriram Somasundaram, Edward Hu, Joseph J. Lim |
Abstract | Intelligent creatures acquire complex skills by exploiting previously learned skills and learning to transition between them. To empower machines with this ability, we propose transition policies which effectively connect primitive skills to perform sequential tasks without handcrafted rewards. To effectively train our transition policies, we introduce proximity predictors which induce rewards gauging proximity to suitable initial states for the next skill. The proposed method is evaluated on a diverse set of experiments for continuous control in both bipedal locomotion and robotic arm manipulation tasks in MuJoCo. We demonstrate that transition policies enable us to effectively learn complex tasks and the induced proximity reward computed using the proximity predictor improves training efficiency. Videos of policies learned by our algorithm and baselines can be found at https://sites.google.com/view/transitions-iclr2019 . |
Tasks | Continuous Control |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rygrBhC5tQ |
https://openreview.net/pdf?id=rygrBhC5tQ | |
PWC | https://paperswithcode.com/paper/composing-complex-skills-by-learning |
Repo | |
Framework | |
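A minimal sketch, not the authors' implementation, of how a learned proximity predictor can induce a dense reward for a transition policy: the reward is the increase in predicted proximity to a state from which the next primitive skill is known to succeed. The network shape and names are assumptions.

```python
# Illustrative proximity-predictor reward for training a transition policy.
import torch
import torch.nn as nn

class ProximityPredictor(nn.Module):
    """Scores a state in [0, 1]: how close it is to a good initial state
    for the next primitive skill."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

def transition_reward(predictor, state, next_state):
    """Dense reward: change in predicted proximity after one environment step."""
    with torch.no_grad():
        return predictor(next_state) - predictor(state)

predictor = ProximityPredictor(state_dim=10)
s, s_next = torch.randn(10), torch.randn(10)
print(transition_reward(predictor, s, s_next).item())
```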
Improving Abstractive Document Summarization with Salient Information Modeling
Title | Improving Abstractive Document Summarization with Salient Information Modeling |
Authors | Yongjian You, Weijia Jia, Tianyi Liu, Wenmian Yang |
Abstract | Comprehensive document encoding and salient information selection are two major difficulties for generating summaries with adequate salient information. To tackle the above difficulties, we propose a Transformer-based encoder-decoder framework with two novel extensions for abstractive document summarization. Specifically, (1) to encode the documents comprehensively, we design a focus-attention mechanism and incorporate it into the encoder. This mechanism models a Gaussian focal bias on attention scores to enhance the perception of local context, which contributes to producing salient and informative summaries. (2) To distinguish salient information precisely, we design an independent saliency-selection network which manages the information flow from encoder to decoder. This network effectively reduces the influences of secondary information on the generated summaries. Experimental results on the popular CNN/Daily Mail benchmark demonstrate that our model outperforms other state-of-the-art baselines on the ROUGE metrics. |
Tasks | Document Summarization |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1205/ |
https://www.aclweb.org/anthology/P19-1205 | |
PWC | https://paperswithcode.com/paper/improving-abstractive-document-summarization |
Repo | |
Framework | |
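The focus-attention mechanism described in the abstract above adds a Gaussian focal bias to attention scores so that nearby positions receive more weight. The sketch below shows the general idea with an assumed parameterization; how the paper predicts the focal center and width may differ.

```python
# Hedged sketch: Gaussian focal bias added to attention logits before softmax.
import torch

def gaussian_focal_attention(scores, center, sigma):
    """scores: (query_len, key_len) attention logits;
    center: (query_len,) assumed focal position per query; sigma: width."""
    positions = torch.arange(scores.size(-1), dtype=torch.float32)
    bias = -((positions.unsqueeze(0) - center.unsqueeze(1)) ** 2) / (2 * sigma ** 2)
    return torch.softmax(scores + bias, dim=-1)

scores = torch.randn(4, 10)                  # toy attention logits
center = torch.tensor([2.0, 3.0, 5.0, 8.0])  # illustrative focal positions
weights = gaussian_focal_attention(scores, center, sigma=2.0)
print(weights.sum(dim=-1))                   # each row sums to 1
```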
Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network
Title | Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network |
Authors | Hitesh Golchha, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | In this paper, we propose an effective deep learning framework for inducing courteous behavior in customer care responses. The interaction between a customer and the customer care representative contributes substantially to the overall customer experience. Thus it is imperative for customer care agents and chatbots engaging with humans to be personal, cordial and empathetic to ensure customer satisfaction and retention. Our system aims at automatically transforming neutral customer care responses into courteous replies. Along with stylistic transfer (of courtesy), our system ensures that responses are coherent with the conversation history, and generates courteous expressions consistent with the emotional state of the customer. Our technique is based on a reinforced pointer-generator model for the sequence-to-sequence task. The model is also conditioned on a hierarchically encoded and emotionally aware conversational context. We use real interactions on Twitter between customer care professionals and aggrieved customers to create a large conversational dataset having both forms of agent responses: 'generic' and 'courteous'. We perform quantitative and qualitative analyses on established and task-specific metrics, both automatic and human evaluation based. Our evaluation shows that the proposed models can generate emotionally-appropriate courteous expressions while preserving the content. Experimental results also prove that our proposed approach performs better than the baseline models. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1091/ |
https://www.aclweb.org/anthology/N19-1091 | |
PWC | https://paperswithcode.com/paper/courteously-yours-inducing-courteous-behavior |
Repo | |
Framework | |
Creative Flow+ Dataset
Title | Creative Flow+ Dataset |
Authors | Maria Shugrina, Ziheng Liang, Amlan Kar, Jiaman Li, Angad Singh, Karan Singh, Sanja Fidler |
Abstract | We present the Creative Flow+ Dataset, the first diverse multi-style artistic video dataset richly labeled with per-pixel optical flow, occlusions, correspondences, segmentation labels, normals, and depth. Our dataset includes 3000 animated sequences rendered using styles randomly selected from 40 textured line styles and 38 shading styles, spanning the range between flat cartoon fill and wildly sketchy shading. Our dataset includes 124K+ train set frames and 10K test set frames rendered at 1500x1500 resolution, far surpassing the largest available optical flow datasets in size. While modern techniques for tasks such as optical flow estimation achieve impressive performance on realistic images and video, today there is no way to gauge their performance on non-photorealistic images. Creative Flow+ poses a new challenge to generalize real-world Computer Vision to messy stylized content. We show that learning-based optical flow methods fail to generalize to this data and struggle to compete with classical approaches, and invite new research in this area. Our dataset and a new optical flow benchmark will be publicly available at: www.cs.toronto.edu/creativeflow/. We further release the complete dataset creation pipeline, allowing the community to generate and stylize their own data on demand. |
Tasks | Optical Flow Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Shugrina_Creative_Flow_Dataset_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Shugrina_Creative_Flow_Dataset_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/creative-flow-dataset |
Repo | |
Framework | |
A Structural Probe for Finding Syntax in Word Representations
Title | A Structural Probe for Finding Syntax in Word Representations |
Authors | John Hewitt, Christopher D. Manning |
Abstract | Recent work has improved our ability to detect linguistic knowledge in word representations. However, current methods for detecting syntactic knowledge do not test whether syntax trees are represented in their entirety. In this work, we propose a structural probe, which evaluates whether syntax trees are embedded in a linear transformation of a neural network's word representation space. The probe identifies a linear transformation under which squared L2 distance encodes the distance between words in the parse tree, and one in which squared L2 norm encodes depth in the parse tree. Using our probe, we show that such transformations exist for both ELMo and BERT but not in baselines, providing evidence that entire syntax trees are embedded implicitly in deep models' vector geometry. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1419/ |
https://www.aclweb.org/anthology/N19-1419 | |
PWC | https://paperswithcode.com/paper/a-structural-probe-for-finding-syntax-in-word |
Repo | |
Framework | |
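The structural probe described above can be summarized in a few lines: learn a linear map B so that squared L2 distances between transformed word vectors approximate parse-tree distances. The sketch below trains such a probe on random data purely for illustration; the real probe is fit on ELMo/BERT representations and treebank distances, and the loss details are assumptions.

```python
# Hedged sketch of a distance structural probe: ||B(h_i - h_j)||^2 ~ d_tree(i, j).
import torch

def probe_distances(h, B):
    """h: (seq_len, dim) word vectors; returns pairwise squared distances
    under the linear transformation B of rank `rank`."""
    t = h @ B.T                                # (seq_len, rank)
    diff = t.unsqueeze(0) - t.unsqueeze(1)     # (seq, seq, rank)
    return (diff ** 2).sum(-1)                 # (seq, seq)

seq_len, dim, rank = 6, 32, 16
h = torch.randn(seq_len, dim)                  # toy "contextual" vectors
d = torch.randint(1, 5, (seq_len, seq_len)).float()
tree_dist = (d + d.t()) / 2                    # toy symmetric gold tree distances
tree_dist.fill_diagonal_(0)

B = torch.randn(rank, dim, requires_grad=True)
opt = torch.optim.Adam([B], lr=0.01)
for _ in range(200):
    loss = (probe_distances(h, B) - tree_dist).abs().mean()  # L1 probe loss
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```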
Unsupervised Dialogue Spectrum Generation for Log Dialogue Ranking
Title | Unsupervised Dialogue Spectrum Generation for Log Dialogue Ranking |
Authors | Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee |
Abstract | Although the data-driven approaches of some recent bot building platforms make it possible for a wide range of users to easily create dialogue systems, those platforms don't offer tools for quickly identifying which log dialogues contain problems. This is important since corrections to log dialogues provide a means to improve performance after deployment. A log dialogue ranker, which ranks problematic dialogues higher, is an essential tool due to the sheer volume of log dialogues that could be generated. However, training a ranker typically requires labelling a substantial amount of data, which is not feasible for most users. In this paper, we present a novel unsupervised approach for dialogue ranking using GANs and release a corpus of labelled dialogues for evaluation and comparison with supervised methods. The evaluation result shows that our method compares favorably to supervised methods without any labelled data. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/W19-5919/ |
https://www.aclweb.org/anthology/W19-5919 | |
PWC | https://paperswithcode.com/paper/unsupervised-dialogue-spectrum-generation-for |
Repo | |
Framework | |
AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation
Title | AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation |
Authors | Mohammad Tavakolian, Hamed R. Tavakoli, Abdenour Hadid |
Abstract | We propose an Adaptive Weighted Spatiotemporal Distillation (AWSD) technique for video representation by encoding the appearance and dynamics of the videos into a single RGB image map. This is obtained by adaptively dividing the videos into small segments and comparing two consecutive segments. This allows using pre-trained models on still images for video classification while successfully capturing the spatiotemporal variations in the videos. The adaptive segment selection enables effective encoding of the essential discriminative information of untrimmed videos. Based on Gaussian Scale Mixture, we compute the weights by extracting the mutual information between two consecutive segments. Unlike pooling-based methods, our AWSD gives more importance to the frames that characterize actions or events thanks to its adaptive segment length selection. We conducted extensive experimental analysis to evaluate the effectiveness of our proposed method and compared our results against those of recent state-of-the-art methods on four benchmark datasets, including UCF101, HMDB51, ActivityNet v1.3, and Maryland. The obtained results on these benchmark datasets showed that our method significantly outperforms earlier works and sets the new state-of-the-art performance in video classification. Code is available at the project webpage: https://mohammadt68.github.io/AWSD/ |
Tasks | Video Classification |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Tavakolian_AWSD_Adaptive_Weighted_Spatiotemporal_Distillation_for_Video_Representation_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Tavakolian_AWSD_Adaptive_Weighted_Spatiotemporal_Distillation_for_Video_Representation_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/awsd-adaptive-weighted-spatiotemporal |
Repo | |
Framework | |
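A heavily simplified sketch of the AWSD idea above: split a video into segments, weight each segment by how much it differs from the previous one (a crude stand-in for the paper's Gaussian-Scale-Mixture mutual-information weights), and average into a single image-like map that a still-image CNN can consume. The weighting rule here is an assumption for illustration only.

```python
# Illustrative weighted spatiotemporal summary of a video into one RGB map.
import numpy as np

def awsd_like_map(video, n_segments=4):
    """video: (frames, H, W, 3) float array -> (H, W, 3) weighted summary."""
    segments = np.array_split(video, n_segments)            # adaptive split assumed uniform here
    means = [seg.mean(axis=0) for seg in segments]
    weights = [1.0] + [np.abs(means[i] - means[i - 1]).mean() + 1e-8
                       for i in range(1, n_segments)]        # crude change-based weights
    weights = np.array(weights) / np.sum(weights)
    return sum(w * m for w, m in zip(weights, means))

video = np.random.rand(40, 32, 32, 3)
print(awsd_like_map(video).shape)   # (32, 32, 3)
```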
Towards Automatic Variant Analysis of Ancient Devotional Texts
Title | Towards Automatic Variant Analysis of Ancient Devotional Texts |
Authors | Amir Hazem, Béatrice Daille, Dominique Stutzmann, Jacob Currie, Christine Jacquin |
Abstract | We address in this paper the issue of text reuse in liturgical manuscripts of the Middle Ages. More specifically, we study variant readings of the Obsecro Te prayer, part of the devotional Books of Hours often used by Christians as guidance for their daily prayers. We aim at automatically extracting and categorising pairs of words and expressions that exhibit variant relations. For this purpose, we adopt a linguistic classification that allows a better characterization of the variants than edit operations. Then, we study the evolution of Obsecro Te texts along temporal and geographical axes. Finally, we contrast several unsupervised state-of-the-art approaches for the automatic extraction of Obsecro Te variants. Based on the manual observation of 772 Obsecro Te copies which show more than 21,000 variants, we show that the proposed methodology is helpful for an automatic study of variants and may serve as a basis for analyzing and extracting useful information from devotional texts. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4730/ |
https://www.aclweb.org/anthology/W19-4730 | |
PWC | https://paperswithcode.com/paper/towards-automatic-variant-analysis-of-ancient |
Repo | |
Framework | |
nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection
Title | nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection |
Authors | Jelena Mitrović, Bastian Birkeneder, Michael Granitzer |
Abstract | This paper presents our submission for the SemEval shared task 6, sub-task A on the identification of offensive language. Our proposed model, C-BiGRU, combines a Convolutional Neural Network (CNN) with a bidirectional Recurrent Neural Network (RNN). We utilize word2vec to capture the semantic similarities between words. This composition allows us to extract long-term dependencies in tweets and distinguish between offensive and non-offensive tweets. In addition, we evaluate our approach on a different dataset and show that our model is capable of detecting online aggressiveness in both English and German tweets. Our model achieved a macro F1-score of 79.40% on the SemEval dataset. |
Tasks | Language Modelling |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2127/ |
https://www.aclweb.org/anthology/S19-2127 | |
PWC | https://paperswithcode.com/paper/nlpup-at-semeval-2019-task-6-a-deep-neural |
Repo | |
Framework | |
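A rough sketch of a C-BiGRU-style classifier as described in the abstract above: word embeddings (initialized from word2vec in practice) feed a 1-D convolution, the convolutional features go through a bidirectional GRU, and the final hidden states are classified. Layer sizes and details are assumptions, not taken from the paper.

```python
# Illustrative CNN + bidirectional GRU classifier for offensive-language detection.
import torch
import torch.nn as nn

class CBiGRU(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, conv_channels=100,
                 hidden=64, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # load word2vec weights in practice
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel_size=3, padding=1)
        self.gru = nn.GRU(conv_channels, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)            # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)    # (batch, seq_len, channels)
        _, h = self.gru(x)                              # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)             # concat both directions
        return self.fc(h)

model = CBiGRU(vocab_size=10000)
print(model(torch.randint(0, 10000, (4, 20))).shape)    # (4, 2) class logits
```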
Dependency Parsing as Sequence Labeling with Head-Based Encoding and Multi-Task Learning
Title | Dependency Parsing as Sequence Labeling with Head-Based Encoding and Multi-Task Learning |
Authors | Oph{'e}lie Lacroix |
Abstract | |
Tasks | Dependency Parsing, Multi-Task Learning |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7716/ |
https://www.aclweb.org/anthology/W19-7716 | |
PWC | https://paperswithcode.com/paper/dependency-parsing-as-sequence-labeling-with |
Repo | |
Framework | |
Adaptive Convolution for Multi-Relational Learning
Title | Adaptive Convolution for Multi-Relational Learning |
Authors | Xiaotian Jiang, Quan Wang, Bin Wang |
Abstract | We consider the problem of learning distributed representations for entities and relations of multi-relational data so as to predict missing links therein. Convolutional neural networks have recently shown their superiority for this problem, bringing increased model expressiveness while remaining parameter efficient. Despite the success, previous convolution designs fail to model full interactions between input entities and relations, which potentially limits the performance of link prediction. In this work we introduce ConvR, an adaptive convolutional network designed to maximize entity-relation interactions in a convolutional fashion. ConvR adaptively constructs convolution filters from relation representations, and applies these filters across entity representations to generate convolutional features. As such, ConvR enables rich interactions between entity and relation representations at diverse regions, and all the convolutional features generated will be able to capture such interactions. We evaluate ConvR on multiple benchmark datasets. Experimental results show that: (1) ConvR performs substantially better than competitive baselines in almost all the metrics and on all the datasets; (2) Compared with state-of-the-art convolutional models, ConvR is not only more effective but also more efficient. It offers a 7% increase in MRR and a 6% increase in Hits@10, while saving 12% in parameter storage. |
Tasks | Link Prediction, Relational Reasoning |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1103/ |
https://www.aclweb.org/anthology/N19-1103 | |
PWC | https://paperswithcode.com/paper/adaptive-convolution-for-multi-relational |
Repo | |
Framework | |
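The core ConvR operation can be sketched compactly: the relation embedding is reshaped into convolution filters, which are then applied to the reshaped subject-entity embedding, so every convolutional feature mixes entity and relation information. Dimensions and helper names below are illustrative, not the paper's.

```python
# Hedged sketch of adaptive (relation-conditioned) convolution over an entity embedding.
import torch
import torch.nn.functional as F

def convr_features(entity_emb, relation_emb, ent_shape=(10, 20),
                   n_filters=8, k=5):
    """entity_emb: (ent_shape[0] * ent_shape[1],); relation_emb: (n_filters * k * k,)"""
    ent_2d = entity_emb.view(1, 1, *ent_shape)        # entity embedding reshaped to an "image"
    filters = relation_emb.view(n_filters, 1, k, k)   # filters built from the relation embedding
    return F.conv2d(ent_2d, filters).flatten()        # relation-specific convolutional features

entity = torch.randn(200)        # 10 x 20 entity embedding
relation = torch.randn(8 * 25)   # 8 filters of size 5 x 5
print(convr_features(entity, relation).shape)
```

In the full model these features would be passed through a projection and scored against candidate object entities; that part is omitted here.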
Can You Unpack That? Learning to Rewrite Questions-in-Context
Title | Can You Unpack That? Learning to Rewrite Questions-in-Context |
Authors | Ahmed Elgohary, Denis Peskov, Jordan Boyd-Graber |
Abstract | Question answering is an AI-complete problem, but existing datasets lack key elements of language understanding such as coreference and ellipsis resolution. We consider sequential question answering: multiple questions are asked one-by-one in a conversation between a questioner and an answerer. Answering these questions is only possible through understanding the conversation history. We introduce the task of question-in-context rewriting: given the context of a conversation's history, rewrite a context-dependent question into a self-contained question with the same answer. We construct CANARD, a dataset of 40,527 questions based on QuAC (Choi et al., 2018) and train Seq2Seq models for incorporating context into standalone questions. |
Tasks | Question Answering |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1605/ |
https://www.aclweb.org/anthology/D19-1605 | |
PWC | https://paperswithcode.com/paper/can-you-unpack-that-learning-to-rewrite |
Repo | |
Framework | |
Data-efficient Neural Text Compression with Interactive Learning
Title | Data-efficient Neural Text Compression with Interactive Learning |
Authors | Avinesh P.V.S, Christian M. Meyer |
Abstract | Neural sequence-to-sequence models have been successfully applied to text compression. However, these models were trained on huge automatically induced parallel corpora, which are only available for a few domains and tasks. In this paper, we propose a novel interactive setup for neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. This is achieved by employing active learning, which intelligently samples from a large pool of unlabeled data. Using this setup, we can successfully adapt a model trained on a small dataset of 40k samples for a headline generation task to a general text compression dataset at an acceptable compression quality with just 500 sampled instances annotated by a human. |
Tasks | Active Learning |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1262/ |
https://www.aclweb.org/anthology/N19-1262 | |
PWC | https://paperswithcode.com/paper/data-efficient-neural-text-compression-with |
Repo | |
Framework | |
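The active-learning component described above can be illustrated with a generic uncertainty-sampling loop: score the unlabeled pool with the current model, send the least confident items to a human, then retrain. The paper's sampling strategy and model are more elaborate; this is only a schematic sketch with made-up predictions.

```python
# Generic uncertainty-sampling step for an interactive/active-learning setup.
import numpy as np

def uncertainty_sampling(model_probs, k):
    """Pick the k pool items whose predictions are least confident."""
    confidence = model_probs.max(axis=1)
    return np.argsort(confidence)[:k]

rng = np.random.default_rng(0)
pool_probs = rng.dirichlet(np.ones(3), size=1000)   # stand-in model predictions over the pool
to_annotate = uncertainty_sampling(pool_probs, k=10)
print(to_annotate)   # indices to send to the human annotator before retraining
```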
Large-scale optimal transport map estimation using projection pursuit
Title | Large-scale optimal transport map estimation using projection pursuit |
Authors | Cheng Meng, Yuan Ke, Jingyi Zhang, Mengrui Zhang, Wenxuan Zhong, Ping Ma |
Abstract | This paper studies the estimation of large-scale optimal transport maps (OTM), which is a well-known challenging problem owing to the curse of dimensionality. Existing literature approximates the large-scale OTM by a series of one-dimensional OTM problems through iterative random projection. Such methods, however, suffer from slow or no convergence in practice due to the nature of randomly selected projection directions. Instead, we propose an estimation method of large-scale OTM by combining the idea of projection pursuit regression and sufficient dimension reduction. The proposed method, named projection pursuit Monge map (PPMM), adaptively selects the "most informative" projection direction in each iteration. We theoretically show the proposed dimension reduction method can consistently estimate the "most informative" projection direction in each iteration. Furthermore, the PPMM algorithm weakly converges to the target large-scale OTM in a reasonable number of steps. Empirically, PPMM is computationally easy and converges fast. We assess its finite sample performance through the applications of Wasserstein distance estimation and generative models. |
Tasks | Dimensionality Reduction |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9023-large-scale-optimal-transport-map-estimation-using-projection-pursuit |
http://papers.nips.cc/paper/9023-large-scale-optimal-transport-map-estimation-using-projection-pursuit.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-optimal-transport-map-estimation |
Repo | |
Framework | |
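The projection-based scheme described above can be sketched in a few lines: in each iteration, project source and target samples onto a direction, match them by sorting (the one-dimensional optimal transport map), and move the source along that direction. PPMM chooses "informative" directions via sufficient dimension reduction; the sketch below uses random directions purely for illustration.

```python
# Hedged sketch of iterative 1-D optimal transport along projection directions.
import numpy as np

def one_step_projection_ot(source, target, direction):
    d = direction / np.linalg.norm(direction)
    s_proj, t_proj = source @ d, target @ d
    s_order, t_order = np.argsort(s_proj), np.argsort(t_proj)
    shift = np.zeros(len(source))
    shift[s_order] = t_proj[t_order] - s_proj[s_order]   # 1-D OT by matching sorted projections
    return source + np.outer(shift, d)                   # move source along the direction

rng = np.random.default_rng(0)
source = rng.normal(size=(500, 5))
target = rng.normal(loc=2.0, size=(500, 5))
for _ in range(50):                                       # random directions, for illustration only
    source = one_step_projection_ot(source, target, rng.normal(size=5))
print(np.abs(source.mean(0) - target.mean(0)).max())      # source mean moves toward the target mean
```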