January 25, 2020

2415 words 12 mins read

Paper Group NANR 55

Benefits of Data Augmentation for NMT-based Text Normalization of User-Generated Content

Title Benefits of Data Augmentation for NMT-based Text Normalization of User-Generated Content
Authors Claudia Matos Veliz, Orphee De Clercq, Veronique Hoste
Abstract One of the most persistent characteristics of written user-generated content (UGC) is the use of non-standard words. This characteristic makes UGC considerably harder to process and analyze automatically. Text normalization is the task of transforming lexical variants to their canonical forms and is often used as a pre-processing step for conventional NLP tasks in order to overcome the performance drop that NLP systems experience when applied to UGC. In this work, we follow a Neural Machine Translation approach to text normalization. To train such an encoder-decoder model, large parallel training corpora of sentence pairs are required. However, obtaining large data sets of UGC and their normalized versions is not trivial, especially for languages other than English. In this paper, we explore how to overcome this data bottleneck for Dutch, a low-resource language. We start off with a small publicly available parallel Dutch data set comprising three UGC genres and compare two different approaches. The first is to manually normalize and add training data, a time- and money-consuming task. The second is a set of data augmentation techniques that increase data size by converting existing resources into synthesized non-standard forms. Our results reveal that, while the different approaches yield similar results regarding the normalization issues in the test set, they also introduce a large number of over-normalizations.
Tasks Data Augmentation, Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5536/
PDF https://www.aclweb.org/anthology/D19-5536
PWC https://paperswithcode.com/paper/benefits-of-data-augmentation-for-nmt-based
Repo
Framework
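
As a rough illustration of the second approach, the sketch below synthesizes non-standard variants from clean sentences with a few toy corruption rules; the rules and parameters are assumptions for illustration, not the paper's actual augmentation techniques for Dutch.

```python
import random

# Toy corruption rules that mimic common UGC spelling variants.
def drop_vowels(word):
    return "".join(c for i, c in enumerate(word) if i == 0 or c not in "aeiou")

def drop_final_char(word):
    return word[:-1] if len(word) > 3 else word

def corrupt_sentence(sentence, p=0.3, seed=None):
    """Turn a clean sentence into a synthetic 'non-standard' variant."""
    rng = random.Random(seed)
    rules = [drop_vowels, drop_final_char, str.lower]
    out = []
    for word in sentence.split():
        out.append(rng.choice(rules)(word) if rng.random() < p else word)
    return " ".join(out)

def augment(clean_sentences, copies=3):
    """Build synthetic (noisy, clean) pairs to enlarge an NMT normalization corpus."""
    pairs = []
    for s in clean_sentences:
        for k in range(copies):
            pairs.append((corrupt_sentence(s, seed=k), s))
    return pairs

print(augment(["morgen gaan we naar het strand"], copies=2))
```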

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Title Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Authors
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4400/
PDF https://www.aclweb.org/anthology/W19-4400
PWC https://paperswithcode.com/paper/proceedings-of-the-fourteenth-workshop-on
Repo
Framework

VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation

Title VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation
Authors Ruiyun Yu, Xiaoqi Wang, Xiaohui Xie
Abstract Image-based virtual try-on systems with the goal of transferring a desired clothing item onto the corresponding region of a person have made great strides recently, but challenges remain in generating realistic looking images that preserve both body and clothing details. Here we present a new virtual try-on network, called VTNFP, to synthesize photo-realistic images given the images of a clothed person and a target clothing item. In order to better preserve clothing and body features, VTNFP follows a three-stage design strategy. First, it transforms the target clothing into a warped form compatible with the pose of the given person. Next, it predicts a body segmentation map of the person wearing the target clothing, delineating body parts as well as clothing regions. Finally, the warped clothing, body segmentation map and given person image are fused together for fine-scale image synthesis. A key innovation of VTNFP is the body segmentation map prediction module, which provides critical information to guide image synthesis in regions where body parts and clothing intersect, and is very beneficial for preventing blurry pictures and preserving clothing and body part details. Experiments on a fashion dataset demonstrate that VTNFP generates substantially better results than state-of-the-art methods.
Tasks Image Generation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Yu_VTNFP_An_Image-Based_Virtual_Try-On_Network_With_Body_and_Clothing_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_VTNFP_An_Image-Based_Virtual_Try-On_Network_With_Body_and_Clothing_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/vtnfp-an-image-based-virtual-try-on-network
Repo
Framework
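
The following PyTorch sketch mirrors the three-stage idea (warp, segment, fuse) at toy scale; the layer sizes, flow-based warping, and module names are assumptions for illustration, not the actual VTNFP architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class VTNFPSketch(nn.Module):
    """Illustrative three-stage pipeline: warp clothing, predict segmentation, fuse."""
    def __init__(self, n_seg=7):
        super().__init__()
        self.warp_net = nn.Sequential(conv_block(6, 32), nn.Conv2d(32, 2, 3, padding=1))  # per-pixel flow
        self.seg_net = nn.Sequential(conv_block(6, 32), nn.Conv2d(32, n_seg, 3, padding=1))
        self.fuse_net = nn.Sequential(conv_block(6 + n_seg, 64), nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, person, clothing):
        n, _, h, w = person.shape
        # Stage 1: warp the target clothing to the person's pose via a predicted flow field.
        flow = self.warp_net(torch.cat([person, clothing], dim=1)).permute(0, 2, 3, 1)
        theta = torch.eye(2, 3).unsqueeze(0).repeat(n, 1, 1)
        base_grid = F.affine_grid(theta, person.shape, align_corners=False)
        warped = F.grid_sample(clothing, base_grid + flow, align_corners=False)
        # Stage 2: predict a segmentation map of the person wearing the target clothing.
        seg = torch.softmax(self.seg_net(torch.cat([person, warped], dim=1)), dim=1)
        # Stage 3: fuse person, warped clothing and segmentation map for final synthesis.
        return torch.sigmoid(self.fuse_net(torch.cat([person, warped, seg], dim=1)))

out = VTNFPSketch()(torch.rand(1, 3, 64, 48), torch.rand(1, 3, 64, 48))  # -> (1, 3, 64, 48)
```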

Sign Clustering and Topic Extraction in Proto-Elamite

Title Sign Clustering and Topic Extraction in Proto-Elamite
Authors Logan Born, Kate Kelley, Nishant Kambhatla, Carolyn Chen, Anoop Sarkar
Abstract We describe a first attempt at using techniques from computational linguistics to analyze the undeciphered proto-Elamite script. Using hierarchical clustering, n-gram frequencies, and LDA topic models, we both replicate results obtained by manual decipherment and reveal previously-unobserved relationships between signs. This demonstrates the utility of these techniques as an aid to manual decipherment.
Tasks Topic Models
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2516/
PDF https://www.aclweb.org/anthology/W19-2516
PWC https://paperswithcode.com/paper/sign-clustering-and-topic-extraction-in-proto
Repo
Framework
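
A minimal scikit-learn sketch of the same pipeline idea, clustering signs by their co-occurrence profiles and extracting LDA topics over tablets; the sign codes and parameters are made up for illustration (assumes scikit-learn >= 1.0).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import AgglomerativeClustering

# Each "document" is one tablet, written as a space-separated sequence of sign codes.
tablets = [
    "M056 M288 M054 M388",
    "M056 M054 M096 M388",
    "M218 M157 M066 M296",
    "M218 M066 M296 M157",
]

# Bag-of-signs matrix (tablets x signs).
vec = CountVectorizer(token_pattern=r"\S+", lowercase=False)
X = vec.fit_transform(tablets)

# LDA topics over tablets, analogous to grouping tablets by subject matter.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print("topic mixture per tablet:\n", lda.transform(X).round(2))

# Hierarchical clustering of signs using their tablet co-occurrence profiles.
sign_profiles = X.T.toarray()  # signs x tablets
labels = AgglomerativeClustering(n_clusters=2).fit_predict(sign_profiles)
print(dict(zip(vec.get_feature_names_out(), labels)))
```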

BioReddit: Word Embeddings for User-Generated Biomedical NLP

Title BioReddit: Word Embeddings for User-Generated Biomedical NLP
Authors Marco Basaldella, Nigel Collier
Abstract Word embeddings, in their different shapes and iterations, have changed the natural language processing research landscape in recent years. The biomedical text processing field is no stranger to this revolution; however, scholars in the field have largely trained their embeddings on scientific documents only, even when working on user-generated data. In this paper we show how training embeddings on a corpus of user-generated text from medical forums heavily influences performance on downstream tasks, outperforming embeddings trained on either general-purpose data or scientific papers when applied to user-generated content.
Tasks Word Embeddings
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6205/
PDF https://www.aclweb.org/anthology/D19-6205
PWC https://paperswithcode.com/paper/bioreddit-word-embeddings-for-user-generated
Repo
Framework
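
A minimal gensim sketch of the core idea, training skip-gram embeddings directly on tokenized forum text instead of scientific articles; the toy sentences and hyperparameters are assumptions (gensim >= 4 API).

```python
from gensim.models import Word2Vec

# Tokenized sentences from a user-generated medical forum corpus (toy examples here).
forum_sentences = [
    ["my", "doc", "upped", "my", "metformin", "dose"],
    ["anyone", "get", "headaches", "from", "metformin", "?"],
    ["switched", "to", "insulin", "after", "the", "side", "effects"],
]

# Skip-gram embeddings trained on forum text; earlier gensim versions use size= instead of vector_size=.
model = Word2Vec(sentences=forum_sentences, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=10, workers=2)

print(model.wv["metformin"].shape)            # (100,)
print(model.wv.most_similar("metformin", topn=2))
```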

Unsupervised Aspect-Based Multi-Document Abstractive Summarization

Title Unsupervised Aspect-Based Multi-Document Abstractive Summarization
Authors Maximin Coavoux, Hady Elsahar, Matthias Gallé
Abstract User-generated reviews of products or services provide valuable information to customers. However, it is often impossible to read each of the potentially thousands of reviews; providing short summaries of their contents would therefore save valuable time. We address opinion summarization, a multi-document summarization task, with an unsupervised abstractive neural summarization system. Our system is based on (i) a language model that encodes reviews into a vector space and generates fluent sentences from that same space, and (ii) a clustering step that groups together reviews about the same aspects and allows the system to generate summary sentences focused on these aspects. Our experiments on the Oposum dataset empirically show the importance of the clustering step.
Tasks Abstractive Text Summarization, Document Summarization, Language Modelling, Multi-Document Summarization
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5405/
PDF https://www.aclweb.org/anthology/D19-5405
PWC https://paperswithcode.com/paper/unsupervised-aspect-based-multi-document
Repo
Framework
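
A simplified sketch of the clustering step: the paper encodes reviews with a language model, whereas this stand-in uses TF-IDF vectors, groups reviews by aspect with k-means, and picks the most central review per cluster as a crude extractive proxy for the aspect-focused summary sentence.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "battery lasts two full days",
    "the battery barely survives an afternoon",
    "screen is bright and sharp",
    "display colours look washed out",
]

# The paper encodes reviews with a language model; TF-IDF is a simple stand-in here.
X = TfidfVectorizer().fit_transform(reviews)

# Cluster reviews into aspect groups, then pick the most central review per aspect.
k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
for c in range(k):
    idx = np.where(km.labels_ == c)[0]
    sims = cosine_similarity(X[idx], km.cluster_centers_[c].reshape(1, -1)).ravel()
    print(f"aspect {c}: {reviews[idx[sims.argmax()]]}")
```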

Enhancing the Measurement of Social Effects by Capturing Morality

Title Enhancing the Measurement of Social Effects by Capturing Morality
Authors Rezvaneh Rezapour, Saumil H. Shah, Jana Diesner
Abstract We investigate the relationship between basic principles of human morality and the expression of opinions in user-generated text data. We assume that people's backgrounds, culture, and values are associated with their perceptions and expressions of everyday topics, and that people's language use reflects these perceptions. While personal values and social effects are abstract and complex concepts, they have practical implications and are relevant for a wide range of NLP applications. To extract human values (in this paper, morality) and measure social effects (morality and stance), we empirically evaluate the usage of a morality lexicon that we expanded via a quality-controlled, human-in-the-loop process. As a result, we enhanced the Moral Foundations Dictionary in size (from 324 to 4,636 syntactically disambiguated entries) and scope. We used both lexica for feature-based and deep learning classification (SVM, RF, and LSTM) to test their usefulness for measuring social effects. We find that the enhancement of the original lexicon led to measurable improvements in prediction accuracy for the selected NLP tasks.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-1305/
PDF https://www.aclweb.org/anthology/W19-1305
PWC https://paperswithcode.com/paper/enhancing-the-measurement-of-social-effects
Repo
Framework
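
A minimal sketch of the feature-based setup: count lexicon hits per moral foundation and feed them to a linear SVM. The tiny lexicon, foundation names, and labels below are placeholders, not the expanded Moral Foundations Dictionary.

```python
from sklearn.svm import LinearSVC

# Stand-in lexicon mapping words to moral foundations; the paper's expanded
# Moral Foundations Dictionary has 4,636 disambiguated entries.
LEXICON = {
    "protect": "care", "harm": "harm", "fair": "fairness",
    "cheat": "cheating", "loyal": "loyalty", "betray": "betrayal",
}
FOUNDATIONS = sorted(set(LEXICON.values()))

def lexicon_features(text):
    """Count lexicon hits per moral foundation, normalized by text length."""
    tokens = text.lower().split()
    counts = {f: 0 for f in FOUNDATIONS}
    for tok in tokens:
        if tok in LEXICON:
            counts[LEXICON[tok]] += 1
    return [counts[f] / max(len(tokens), 1) for f in FOUNDATIONS]

texts = ["we must protect the weak", "they cheat and betray their allies",
         "a fair deal for everyone", "this policy will harm families"]
stances = [1, 0, 1, 0]  # toy stance labels

clf = LinearSVC().fit([lexicon_features(t) for t in texts], stances)
print(clf.predict([lexicon_features("protect families with a fair plan")]))
```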

Multilingual Grammar Induction with Continuous Language Identification

Title Multilingual Grammar Induction with Continuous Language Identification
Authors Wenjuan Han, Ge Wang, Yong Jiang, Kewei Tu
Abstract The key to multilingual grammar induction is to couple grammar parameters of different languages together by exploiting the similarity between languages. Previous work relies on linguistic phylogenetic knowledge to specify similarity between languages. In this work, we propose a novel universal grammar induction approach that represents language identities with continuous vectors and employs a neural network to predict grammar parameters based on the representation. Without any prior linguistic phylogenetic knowledge, we automatically capture similarity between languages with the vector representations and softly tie the grammar parameters of different languages. In our experiments, we apply our approach to 15 languages across 8 language families and subfamilies in the Universal Dependency Treebank dataset, and we observe substantial performance gain on average over monolingual and multilingual baselines.
Tasks Language Identification
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1576/
PDF https://www.aclweb.org/anthology/D19-1576
PWC https://paperswithcode.com/paper/multilingual-grammar-induction-with
Repo
Framework
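
A small PyTorch sketch of the central idea: each language gets a learned continuous identity vector, and an MLP maps that vector to grammar parameters, so similar language vectors yield softly tied parameters. The dimensions and the flat parameter vector are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LanguageConditionedGrammar(nn.Module):
    """Continuous language vectors feed an MLP that emits grammar parameters,
    so typologically similar languages end up with softly tied parameters."""
    def __init__(self, n_languages, n_grammar_params, lang_dim=8, hidden=64):
        super().__init__()
        self.lang_vec = nn.Embedding(n_languages, lang_dim)   # learned language identity
        self.mlp = nn.Sequential(
            nn.Linear(lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_grammar_params),
        )

    def forward(self, lang_id):
        # Returns one parameter vector per language (e.g. scores over grammar rules).
        return self.mlp(self.lang_vec(lang_id))

model = LanguageConditionedGrammar(n_languages=15, n_grammar_params=200)
params = model(torch.tensor([3]))   # grammar parameters for language id 3
print(params.shape)                 # torch.Size([1, 200])
```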

Using Human Attention to Extract Keyphrase from Microblog Post

Title Using Human Attention to Extract Keyphrase from Microblog Post
Authors Yingyi Zhang, Chengzhi Zhang
Abstract This paper studies automatic keyphrase extraction on social media. Previous works have achieved promising results, but they neglect human reading behavior during keyphrase annotation. Human attention is a crucial element of human reading behavior; it reveals the relevance of words to the main topics of the target text. Thus, this paper aims to integrate human attention into keyphrase extraction models. First, human attention is represented by the reading duration estimated from an eye-tracking corpus. Then, we merge human attention with neural network models via an attention mechanism. In addition, we also integrate human attention into unsupervised models. To the best of our knowledge, we are the first to utilize human attention for keyphrase extraction tasks. The experimental results show that our models achieve significant improvements on two Twitter datasets.
Tasks Eye Tracking
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1588/
PDF https://www.aclweb.org/anthology/P19-1588
PWC https://paperswithcode.com/paper/using-human-attention-to-extract-keyphrase
Repo
Framework
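
A minimal sketch of merging human attention with model attention, assuming per-token reading durations are available; the convex combination and the lambda weight are illustrative choices, not the paper's exact mechanism.

```python
import torch

def mix_attention(model_scores, reading_durations, lam=0.5):
    """Blend model attention with human attention estimated from reading durations.

    model_scores: (seq_len,) unnormalized attention logits from the network
    reading_durations: (seq_len,) per-token reading times from an eye-tracking corpus
    """
    human = reading_durations / reading_durations.sum()   # normalize durations to a distribution
    model = torch.softmax(model_scores, dim=-1)
    mixed = (1 - lam) * model + lam * human               # convex combination of the two
    return mixed / mixed.sum()

scores = torch.tensor([0.2, 1.5, -0.3, 0.9])
durations = torch.tensor([120.0, 410.0, 80.0, 350.0])     # milliseconds per token
print(mix_attention(scores, durations))
```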

The Lexical Gap: An Improved Measure of Automated Image Description Quality

Title The Lexical Gap: An Improved Measure of Automated Image Description Quality
Authors Austin Kershaw, Miroslaw Bober
Abstract The challenge of automatically describing images and videos has stimulated much research in Computer Vision and Natural Language Processing. In order to test the semantic abilities of new algorithms, we need reliable and objective ways of measuring progress. We show that standard evaluation measures do not take into account the semantic richness of a description, and give the impression that sparse machine descriptions outperform rich human descriptions. We introduce and test a new measure of semantic ability based on relative lexical diversity. We show how our measure can work alongside existing measures to achieve state-of-the-art correlation with human judgement of quality. We also introduce a new dataset, Rich-Sparse Descriptions, which provides 2K human and machine descriptions to stimulate interest in the semantic evaluation of machine descriptions.
Tasks
Published 2019-05-01
URL https://www.aclweb.org/anthology/W19-0603/
PDF https://www.aclweb.org/anthology/W19-0603
PWC https://paperswithcode.com/paper/the-lexical-gap-an-improved-measure-of
Repo
Framework
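
The abstract does not define the measure precisely, so the sketch below is only a guess at the flavor of a lexical-diversity comparison: it contrasts the distinct-word rate of machine versus human descriptions.

```python
def lexical_diversity(descriptions):
    """Distinct words per token, a crude diversity proxy; the paper's exact
    'relative lexical diversity' definition is not reproduced here."""
    vocab = set()
    tokens = 0
    for d in descriptions:
        words = d.lower().split()
        vocab.update(words)
        tokens += len(words)
    return len(vocab) / max(tokens, 1)

def lexical_gap(machine_descriptions, human_descriptions):
    """Ratio of machine to human diversity: values below 1 indicate sparser machine output."""
    return lexical_diversity(machine_descriptions) / lexical_diversity(human_descriptions)

machine = ["a man riding a horse", "a man riding a horse on a field"]
human = ["a jockey gallops a chestnut mare across a misty paddock at dawn"]
print(round(lexical_gap(machine, human), 2))
```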

Discourse Representation Structure Parsing with Recurrent Neural Networks and the Transformer Model

Title Discourse Representation Structure Parsing with Recurrent Neural Networks and the Transformer Model
Authors Jiangming Liu, Shay B. Cohen, Mirella Lapata
Abstract We describe the systems we developed for Discourse Representation Structure (DRS) parsing as part of the IWCS-2019 Shared Task of DRS Parsing. Our systems are based on sequence-to-sequence modeling. To implement our model, we use the open-source neural machine translation system implemented in PyTorch, OpenNMT-py. We experimented with a variety of encoder-decoder models based on recurrent neural networks and the Transformer model. We conduct experiments on the standard benchmark of the Parallel Meaning Bank (PMB 2.2). Our best system achieves a score of 84.8% F1 in the DRS parsing shared task.
Tasks Machine Translation
Published 2019-05-01
URL https://www.aclweb.org/anthology/W19-1203/
PDF https://www.aclweb.org/anthology/W19-1203
PWC https://paperswithcode.com/paper/discourse-representation-structure-parsing-1
Repo
Framework
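
A bare-bones stand-in for the sequence-to-sequence setup, treating DRS parsing as translation from a sentence to a linearized clause sequence; it uses PyTorch's nn.Transformer (recent PyTorch, batch_first API) rather than OpenNMT-py, and omits positional encodings and beam search for brevity.

```python
import torch
import torch.nn as nn

class Seq2SeqDRSParser(nn.Module):
    """Minimal encoder-decoder over token ids: sentence -> linearized DRS clauses."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4, layers=3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=layers,
                                          num_decoder_layers=layers,
                                          batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so the decoder only attends to already-generated DRS tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids), tgt_mask=tgt_mask)
        return self.out(h)

model = Seq2SeqDRSParser(src_vocab=5000, tgt_vocab=3000)
logits = model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 3000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 3000])
```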

Learning to See Moving Objects in the Dark

Title Learning to See Moving Objects in the Dark
Authors Haiyang Jiang, Yinqiang Zheng
Abstract Video surveillance systems have a wide range of uses, yet they suffer severe quality degradation in dim lighting. Industrial solutions mainly use extra near-infrared illumination, even though it does not preserve color and texture information. A variety of studies have enhanced low-light videos shot by visible-light cameras, but they either relied on task-specific preconditions or trained on synthetic datasets. We propose a novel optical system to capture bright and dark videos of the exact same scenes, generating training and ground-truth pairs for an authentic low-light video dataset. A fully convolutional network with mixed 3D and 2D operations is utilized to learn an enhancement mapping, with proper spatial-temporal transformation, from raw camera sensor data to bright RGB videos. Experiments show promising results for our method, and it outperforms state-of-the-art low-light image/video enhancement algorithms.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Jiang_Learning_to_See_Moving_Objects_in_the_Dark_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Jiang_Learning_to_See_Moving_Objects_in_the_Dark_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/learning-to-see-moving-objects-in-the-dark
Repo
Framework
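
A toy fully convolutional mapping mixing 3D (temporal) and 2D (spatial) convolutions in the spirit of the described network; the channel counts, single-channel raw input, and frame-wise 2D stage are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LowLightVideoNet(nn.Module):
    """Toy fully convolutional mapping from dark raw frames to bright RGB frames."""
    def __init__(self, in_ch=1, mid=16):
        super().__init__()
        self.temporal = nn.Sequential(                       # operates on (N, C, T, H, W)
            nn.Conv3d(in_ch, mid, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(mid, mid, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.spatial = nn.Sequential(                        # applied per frame
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 3, 3, padding=1),
        )

    def forward(self, dark_video):
        n, c, t, h, w = dark_video.shape
        feats = self.temporal(dark_video)                    # (N, mid, T, H, W)
        frames = feats.permute(0, 2, 1, 3, 4).reshape(n * t, -1, h, w)
        bright = torch.sigmoid(self.spatial(frames))         # (N*T, 3, H, W)
        return bright.reshape(n, t, 3, h, w)

out = LowLightVideoNet()(torch.rand(1, 1, 4, 32, 32))
print(out.shape)  # torch.Size([1, 4, 3, 32, 32])
```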

Isolating the Effects of Modeling Recursive Structures: A Case Study in Pronunciation Prediction of Chinese Characters

Title Isolating the Effects of Modeling Recursive Structures: A Case Study in Pronunciation Prediction of Chinese Characters
Authors Minh Nguyen, Gia H Ngo, Nancy Chen
Abstract Finding that explicitly modeling structures leads to better generalization, we consider the task of predicting Cantonese pronunciations of logographs (Chinese characters) using the logographs' recursive structures. This task is a suitable case study for two reasons. First, logographs' pronunciations depend on their structures (i.e. the hierarchies of sub-units in logographs). Second, the quality of logographic structures is consistent, since the structures are constructed automatically using a set of rules. Thus, this task is less affected by confounds such as varying quality between annotators. Empirical results show that modeling structures explicitly using a treeLSTM outperforms an LSTM baseline, reducing prediction error by 6.0% relative.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/papers/W/W19/W19-3631/
PDF https://www.aclweb.org/anthology/W19-3631
PWC https://paperswithcode.com/paper/isolating-the-effects-of-modeling-recursive
Repo
Framework
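
A simplified recursive encoder that composes a logograph's decomposition tree bottom-up before classifying its pronunciation; it uses a plain tanh composition rather than the paper's treeLSTM, and the vocabulary sizes are placeholders.

```python
import torch
import torch.nn as nn

class RecursiveEncoder(nn.Module):
    """Composes a logograph's binary decomposition tree bottom-up (simplified stand-in
    for a treeLSTM), then classifies the pronunciation from the root representation."""
    def __init__(self, n_components, dim=64, n_pron=500):
        super().__init__()
        self.emb = nn.Embedding(n_components, dim)
        self.compose = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(dim, n_pron)

    def encode(self, node):
        # node is either an int (sub-unit id) or a (left, right) pair of sub-trees
        if isinstance(node, int):
            return self.emb(torch.tensor([node]))
        left, right = node
        children = torch.cat([self.encode(left), self.encode(right)], dim=-1)
        return torch.tanh(self.compose(children))

    def forward(self, tree):
        return self.classify(self.encode(tree))

model = RecursiveEncoder(n_components=1000)
logits = model(((0, 1), 2))   # ((component, component), component)
print(logits.shape)           # torch.Size([1, 500])
```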

Maximum a Posteriori on a Submanifold: a General Image Restoration Method with GAN

Title Maximum a Posteriori on a Submanifold: a General Image Restoration Method with GAN
Authors Fangzhou Luo, Xiaolin Wu
Abstract We propose a general method for various image restoration problems, such as denoising, deblurring, super-resolution and inpainting. The problem is formulated as a constrained optimization problem. Its objective is to maximize a posteriori probability of latent variables, and its constraint is that the image generated by these latent variables must be the same as the degraded image. We use a Generative Adversarial Network (GAN) as our density estimation model. Convincing results are obtained on MNIST dataset.
Tasks Deblurring, Denoising, Density Estimation, Image Restoration, Super-Resolution
Published 2019-05-01
URL https://openreview.net/forum?id=S1xLZ2R5KQ
PDF https://openreview.net/pdf?id=S1xLZ2R5KQ
PWC https://paperswithcode.com/paper/maximum-a-posteriori-on-a-submanifold-a
Repo
Framework
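
A compact sketch of the optimization view: search for a latent code z that maximizes a Gaussian prior while the degraded generator output matches the observation, relaxing the paper's hard constraint into a quadratic penalty. The generator, mask-based degradation, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Stand-in generator; in the paper this is a GAN generator trained on MNIST.
G = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())

def degrade(x, mask):
    return x * mask                      # example degradation: inpainting mask

def map_restore(y, mask, steps=500, lr=0.05, lam=100.0):
    """Find z maximizing the Gaussian prior log p(z) while keeping degrade(G(z)) close to y."""
    z = torch.zeros(1, 32, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = G(z)
        # 0.5*||z||^2 is the negative Gaussian log-prior; the penalty enforces the constraint softly.
        loss = 0.5 * (z ** 2).sum() + lam * ((degrade(x_hat, mask) - y) ** 2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()

mask = (torch.rand(1, 784) > 0.5).float()     # keep roughly half the pixels
y = degrade(torch.rand(1, 784), mask)         # degraded observation
restored = map_restore(y, mask)
print(restored.shape)                          # torch.Size([1, 784])
```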

A Soft Label Strategy for Target-Level Sentiment Classification

Title A Soft Label Strategy for Target-Level Sentiment Classification
Authors Da Yin, Xiao Liu, Xiuyu Wu, Baobao Chang
Abstract In this paper, we propose a soft label approach to the target-level sentiment classification task, in which a history-based soft labeling model is proposed to measure the possibility that a context word is an opinion word. We also apply a convolution layer to extract local active features, and introduce positional weights to take relative distance information into consideration. In addition, we obtain a more informative target representation by training with context tokens together, enabling deeper interaction between target and context tokens. We conduct experiments on the SemEval 2014 datasets, and the results show that our approach significantly outperforms previous models and gives state-of-the-art results on these datasets.
Tasks Sentiment Analysis
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-1302/
PDF https://www.aclweb.org/anthology/W19-1302
PWC https://paperswithcode.com/paper/a-soft-label-strategy-for-target-level
Repo
Framework
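
A tiny sketch of one way to realize the positional-weighting idea, down-weighting context tokens by their distance to the target span; the linear decay is an assumption, not the paper's exact formula.

```python
import torch

def positional_weights(seq_len, target_start, target_end):
    """Weight of 1.0 inside the target span, decaying linearly with token distance to it."""
    idx = torch.arange(seq_len, dtype=torch.float)
    dist = torch.clamp(target_start - idx, min=0) + torch.clamp(idx - target_end, min=0)
    return 1.0 - dist / seq_len

# "the battery life is great but the screen is dim", target = "battery life" (tokens 1-2)
w = positional_weights(seq_len=10, target_start=1, target_end=2)
print(w)
```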