January 25, 2020

Paper Group NANR 14

The relation between dependency distance and frequency


Title	The relation between dependency distance and frequency
Authors	Xinying Chen, Kim Gerdes
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-7909/
PDF	https://www.aclweb.org/anthology/W19-7909
PWC	https://paperswithcode.com/paper/the-relation-between-dependency-distance-and
Repo
Framework

Learning-Based Sampling for Natural Image Matting


Title	Learning-Based Sampling for Natural Image Matting
Authors	Jingwei Tang, Yagiz Aksoy, Cengiz Oztireli, Markus Gross, Tunc Ozan Aydin
Abstract	The goal of natural image matting is the estimation of opacities of a user-defined foreground object that is essential in creating realistic composite imagery. Natural matting is a challenging process due to the high number of unknowns in the mathematical modeling of the problem, namely the opacities as well as the foreground and background layer colors, while the original image serves as the single observation. In this paper, we propose the estimation of the layer colors through the use of deep neural networks prior to the opacity estimation. The layer color estimation is a better match for the capabilities of neural networks, and the availability of these colors substantially increases the performance of opacity estimation due to the reduced number of unknowns in the compositing equation. A prominent approach to matting in parallel to ours is called sampling-based matting, which involves gathering color samples from known-opacity regions to predict the layer colors. Our approach outperforms not only the previous hand-crafted sampling algorithms, but also current data-driven methods. We hence classify our method as a hybrid sampling- and learning-based approach to matting, and demonstrate the effectiveness of our approach through detailed ablation studies using alternative network architectures.
Tasks	Image Matting
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Tang_Learning-Based_Sampling_for_Natural_Image_Matting_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Tang_Learning-Based_Sampling_for_Natural_Image_Matting_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/learning-based-sampling-for-natural-image
Repo
Framework
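
For readers unfamiliar with the compositing equation the abstract refers to, it is commonly written per pixel as follows. The notation is the standard one from the matting literature and is an assumption here, since the listing itself does not give it: I_p is the observed color, F_p and B_p are the foreground and background layer colors, and \alpha_p is the opacity.

```latex
I_p = \alpha_p F_p + (1 - \alpha_p) B_p, \qquad \alpha_p \in [0, 1]
```

For an RGB image this gives three equations per pixel in seven unknowns (one opacity, three foreground channels, three background channels). If F_p and B_p are estimated first, as the abstract proposes, only \alpha_p remains unknown, which is the reduction in unknowns the authors mention.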


A Multi-modal one-class generative adversarial network for anomaly detection in manufacturing


Title	A Multi-modal one-class generative adversarial network for anomaly detection in manufacturing
Authors	Shuhui Qu, Janghwan Lee, Wei Xiong, Wonhyouk Jang, Jie Wang
Abstract	One-class anomaly detection on high-dimensional data is one of the critical issues in both fundamental machine learning research and manufacturing applications. A good anomaly detector should accurately discriminate anomalies from normal data. Although most previous anomaly detection methods achieve good performance, they do not perform well on high-dimensional imbalanced datasets with 1) a limited amount of data; 2) a multi-modal distribution; 3) few anomaly data. In this paper, we develop a multi-modal one-class generative adversarial network based detector (MMOC-GAN) to distinguish anomalies from normal data (products). Apart from a domain-specific feature extractor, our model leverages a generative adversarial network (GAN). The generator takes in a modified noise vector using a pseudo latent prior and generates samples in the low-density area of the given normal data to simulate the anomalies. The discriminator is then trained to distinguish the generated samples from the normal samples. Since the generated samples simulate the low-density area for each mode, the discriminator can directly detect anomalies from normal data. Experiments demonstrate that our model outperforms the state-of-the-art one-class classification models and other anomaly detection methods in accuracy on both normal data and anomalies, as well as in F1 score. Also, the generated samples can fully capture the low-density area of different types of products.
Tasks	Anomaly Detection
Published	2019-05-01
URL	https://openreview.net/forum?id=HJl1ujCct7
PDF	https://openreview.net/pdf?id=HJl1ujCct7
PWC	https://paperswithcode.com/paper/a-multi-modal-one-class-generative
Repo
Framework

Syntax is clearer on the other side - Using parallel corpus to extract monolingual data


Title	Syntax is clearer on the other side - Using parallel corpus to extract monolingual data
Authors	Andrea Dömötör
Abstract
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-7813/
PDF	https://www.aclweb.org/anthology/W19-7813
PWC	https://paperswithcode.com/paper/syntax-is-clearer-on-the-other-side-using
Repo
Framework

Neural Text Simplification in Low-Resource Conditions Using Weak Supervision


Title	Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Authors	Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi
Abstract	Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages are conditional either on intensive manual work to create training data, or on the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
Tasks	Machine Translation, Text Simplification, Word Embeddings
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2305/
PDF	https://www.aclweb.org/anthology/W19-2305
PWC	https://paperswithcode.com/paper/neural-text-simplification-in-low-resource
Repo
Framework

Visual Detection with Context for Document Layout Analysis


Title	Visual Detection with Context for Document Layout Analysis
Authors	Carlos Soto, Shinjae Yoo
Abstract	We present 1) a work-in-progress method to visually segment key regions of scientific articles using an object detection technique augmented with contextual features, and 2) a novel dataset of region-labeled articles. A continuing challenge in scientific literature mining is the difficulty of consistently extracting high-quality text from formatted PDFs. To address this, we adapt the object-detection technique Faster R-CNN for document layout detection, incorporating contextual information that leverages the inherently localized nature of article contents to improve the region detection performance. Due to the limited availability of high-quality region-labels for scientific articles, we also contribute a novel dataset of region annotations, the first version of which covers 9 region classes and 822 article pages. Initial experimental results demonstrate a 23.9% absolute improvement in mean average precision over the baseline model by incorporating contextual features, and a processing speed 14x faster than a text-based technique. Ongoing work on further improvements is also discussed.
Tasks	Document Layout Analysis, Object Detection
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-1348/
PDF	https://www.aclweb.org/anthology/D19-1348
PWC	https://paperswithcode.com/paper/visual-detection-with-context-for-document
Repo
Framework

Neural Versus Non-Neural Text Simplification: A Case Study


Title	Neural Versus Non-Neural Text Simplification: A Case Study
Authors	Islam Nassar, Michelle Ananda-Rajah, Gholamreza Haffari
Abstract
Tasks	Text Simplification
Published	2019-04-01
URL	https://www.aclweb.org/anthology/U19-1023/
PDF	https://www.aclweb.org/anthology/U19-1023
PWC	https://paperswithcode.com/paper/neural-versus-non-neural-text-simplification
Repo
Framework

EMOMINER at SemEval-2019 Task 3: A Stacked BiLSTM Architecture for Contextual Emotion Detection in Text


Title	EMOMINER at SemEval-2019 Task 3: A Stacked BiLSTM Architecture for Contextual Emotion Detection in Text
Authors	Nikhil Chakravartula, Vijayasaradhi Indurthi
Abstract	This paper describes our participation in the SemEval 2019 Task 3 - Contextual Emotion Detection in Text. This task aims to identify emotions, viz. happiness, anger, sadness in the context of a text conversation. Our system is a stacked Bidirectional LSTM, equipped with attention on top of word embeddings pre-trained on a large collection of Twitter data. In this paper, apart from describing our official submission, we elucidate how different deep learning models respond to this task.
Tasks	Word Embeddings
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2033/
PDF	https://www.aclweb.org/anthology/S19-2033
PWC	https://paperswithcode.com/paper/emominer-at-semeval-2019-task-3-a-stacked
Repo
Framework

Scalable graph-based method for individual named entity identification


Title	Scalable graph-based method for individual named entity identification
Authors	Sammy Khalife, Michalis Vazirgiannis
Abstract	In this paper, we consider the named entity linking (NEL) problem. We assume a set of queries, named entities, that have to be identified within a knowledge base. This knowledge base is represented by a text database paired with a semantic graph, endowed with a classification of entities (ontology). We present state-of-the-art methods in NEL, and propose a new method for individual identification requiring few annotated data samples. We demonstrate its scalability and performance over standard datasets, for several ontology configurations. Our approach is well-motivated for integration in real systems. Indeed, recent deep learning methods, despite their capacity to improve experimental precision, require extensive parameter tuning along with a large volume of annotated data.
Tasks	Entity Linking
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5303/
PDF	https://www.aclweb.org/anthology/D19-5303
PWC	https://paperswithcode.com/paper/scalable-graph-based-method-for-individual
Repo
Framework

SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval


Title	SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval
Authors	Fabian David Schmidt, Markus Dietsche, Simone Paolo Ponzetto, Goran Glavaš
Abstract	We introduce Seagle, a platform for comparative evaluation of semantic text encoding models on information retrieval (IR) tasks. Seagle implements (1) word embedding aggregators, which represent texts as algebraic aggregations of pretrained word embeddings and (2) pretrained semantic encoders, and allows for their comparative evaluation on arbitrary (monolingual and cross-lingual) IR collections. We benchmark Seagle's models on monolingual document retrieval and cross-lingual sentence retrieval. Seagle functionality can be exploited via an easy-to-use web interface and its modular backend (micro-service architecture) can easily be extended with additional semantic search models.
Tasks	Information Retrieval, Word Embeddings
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-3034/
PDF	https://www.aclweb.org/anthology/D19-3034
PWC	https://paperswithcode.com/paper/seagle-a-platform-for-comparative-evaluation
Repo
Framework
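
To make the idea of a "word embedding aggregator" concrete, here is a minimal sketch that represents a text as the mean of its word vectors and ranks documents by cosine similarity to a query. It is not Seagle's code: the toy two-dimensional embeddings, the function names, and the choice of mean pooling as the aggregation are all assumptions made for illustration.

```python
import math

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.1, 0.9],
}


def mean_embedding(tokens, embeddings):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_documents(query, documents, embeddings):
    """Return documents sorted by descending similarity to the query."""
    q = mean_embedding(query.split(), embeddings)
    scored = []
    for doc in documents:
        d = mean_embedding(doc.split(), embeddings)
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((score, doc))
    return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)]
```

Calling rank_documents("cat", ["car truck", "dog cat"], TOY_EMBEDDINGS) places "dog cat" first, because its averaged vector points in nearly the same direction as the query vector.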

Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon


Title	Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon
Authors	Endang Wahyu Pamungkas, Viviana Patti
Abstract	The development of computational methods to detect abusive language in social media within variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora for different languages developed in recent years. However, abusive language behaviour is multifaceted and available datasets are characterized by different topical focuses. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content a first challenge. Moreover, most resources are available for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. A hybrid approach with deep learning and a multilingual lexicon to cross-domain and cross-lingual detection of abusive content is proposed and compared with other simpler models. We show that training a system on general abusive language datasets will produce a cross-domain robust system, which can be used to detect other more specific types of abusive content. We also found that using the domain-independent lexicon HurtLex is useful to transfer knowledge between domains and languages. In the cross-lingual experiment, we demonstrate the effectiveness of our joint-learning model also in out-domain scenarios.
Tasks
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-2051/
PDF	https://www.aclweb.org/anthology/P19-2051
PWC	https://paperswithcode.com/paper/cross-domain-and-cross-lingual-abusive
Repo
Framework

Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models


Title	Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models
Authors	Florian Kunneman, Thiago Castro Ferreira, Emiel Krahmer, Antal van den Bosch
Abstract	Community Question Answering forums are popular among Internet users, and a basic problem they encounter is trying to find out if their question has already been posed before. To address this issue, NLP researchers have developed methods to automatically detect question-similarity, which was one of the shared tasks in SemEval. The best performing systems for this task made use of Syntactic Tree Kernels or the SoftCosine metric. However, it remains unclear why these methods seem to work, whether their performance can be improved by better preprocessing methods and what kinds of errors they (and other methods) make. In this paper, we therefore systematically combine and compare these two approaches with the more traditional BM25 and translation-based models. Moreover, we analyze the impact of preprocessing steps (lowercasing, suppression of punctuation and stop words removal) and word meaning similarity based on different distributions (word translation probability, Word2Vec, fastText and ELMo) on the performance of the task. We conduct an error analysis to gain insight into the differences in performance between the system set-ups. The implementation is made publicly available from https://github.com/fkunneman/DiscoSumo/tree/master/ranlp.
Tasks	Community Question Answering, Question Answering, Question Similarity
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-1070/
PDF	https://www.aclweb.org/anthology/R19-1070
PWC	https://paperswithcode.com/paper/question-similarity-in-community-question
Repo
Framework
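
The SoftCosine metric named in the abstract generalizes cosine similarity by giving partial credit to pairs of different but related words. The sketch below follows the usual textbook definition over bag-of-words counts. It is not taken from the authors' repository, and the word-similarity table and function names are hypothetical stand-ins for similarities that would in practice come from Word2Vec, fastText, ELMo, or translation probabilities.

```python
import math
from collections import Counter


def soft_cosine(tokens_a, tokens_b, word_sim):
    """Soft cosine similarity between two token lists.

    word_sim(u, v) must return a similarity in [0, 1] with word_sim(u, u) == 1.
    """
    a, b = Counter(tokens_a), Counter(tokens_b)

    def soft_dot(x, y):
        # Sum over all word pairs, weighted by their counts and similarity.
        return sum(x[u] * y[v] * word_sim(u, v) for u in x for v in y)

    denom = math.sqrt(soft_dot(a, a)) * math.sqrt(soft_dot(b, b))
    return soft_dot(a, b) / denom if denom else 0.0


# Hypothetical similarity table used only for this example.
TOY_SIM = {frozenset(("laptop", "notebook")): 0.8}


def toy_word_sim(u, v):
    if u == v:
        return 1.0
    return TOY_SIM.get(frozenset((u, v)), 0.0)


def exact_match(u, v):
    """Reduces soft cosine to the ordinary cosine over word counts."""
    return 1.0 if u == v else 0.0
```

With these toy values, the questions "buy laptop" and "buy notebook" score 0.9 under toy_word_sim but only 0.5 under exact_match, which illustrates why the metric helps on paraphrased questions.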

Sieg at MEDIQA 2019: Multi-task Neural Ensemble for Biomedical Inference and Entailment


Title	Sieg at MEDIQA 2019: Multi-task Neural Ensemble for Biomedical Inference and Entailment
Authors	Sai Abishek Bhaskar, Rashi Rungta, James Route, Eric Nyberg, Teruko Mitamura
Abstract	This paper presents a multi-task learning approach to natural language inference (NLI) and question entailment (RQE) in the biomedical domain. Recognizing textual inference relations and question similarity can address the issue of answering new consumer health questions by mapping them to Frequently Asked Questions on reputed websites like the NIH. We show that leveraging information from parallel tasks across domains along with medical knowledge integration allows our model to learn better biomedical feature representations. Our final models for the NLI and RQE tasks achieve the 4th and 2nd rank on the shared-task leaderboard respectively.
Tasks	Multi-Task Learning, Natural Language Inference, Question Similarity
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5049/
PDF	https://www.aclweb.org/anthology/W19-5049
PWC	https://paperswithcode.com/paper/sieg-at-mediqa-2019-multi-task-neural
Repo
Framework

Controlling Grammatical Error Correction Using Word Edit Rate


Title	Controlling Grammatical Error Correction Using Word Edit Rate
Authors	Kengo Hotate, Masahiro Kaneko, Satoru Katsumata, Mamoru Komachi
Abstract	When professional English teachers correct grammatically erroneous sentences written by English learners, they use various methods. The correction method depends on how many corrections a learner requires. In this paper, we propose a method for neural grammatical error correction (GEC) that can control the degree of correction. We show that it is possible to actually control the degree of GEC by using new training data annotated with word edit rate. Thereby, diverse corrected sentences are obtained from a single erroneous sentence. Moreover, compared to a GEC model that does not use information on the degree of correction, the proposed method improves correction accuracy.
Tasks	Grammatical Error Correction
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-2020/
PDF	https://www.aclweb.org/anthology/P19-2020
PWC	https://paperswithcode.com/paper/controlling-grammatical-error-correction
Repo
Framework
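
As a rough illustration of the quantity used to annotate the training data, the sketch below computes a word edit rate as the word-level Levenshtein distance between a learner sentence and its correction, divided by the length of the learner sentence. Both the normalization and the whitespace tokenization are assumptions; the abstract does not give the exact definition, so the paper's formula may differ.

```python
def word_edit_distance(source, target):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(target) + 1))
    for i, s_tok in enumerate(source, start=1):
        curr = [i]
        for j, t_tok in enumerate(target, start=1):
            cost = 0 if s_tok == t_tok else 1
            curr.append(min(
                prev[j] + 1,          # delete s_tok
                curr[j - 1] + 1,      # insert t_tok
                prev[j - 1] + cost,   # keep or substitute
            ))
        prev = curr
    return prev[-1]


def word_edit_rate(source_sentence, corrected_sentence):
    """Edit distance normalized by source length (assumed normalization)."""
    source = source_sentence.split()
    target = corrected_sentence.split()
    if not source:
        return 0.0 if not target else float("inf")
    return word_edit_distance(source, target) / len(source)
```

For example, correcting "He go to school yesterday" to "He went to school yesterday" changes one of five words, so the rate is 0.2. A more heavily rewritten correction of the same sentence would receive a higher rate, which is the signal a model could be conditioned on to control how much it edits.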

YNU-HPCC at SemEval-2019 Task 6: Identifying and Categorising Offensive Language on Twitter


Title	YNU-HPCC at SemEval-2019 Task 6: Identifying and Categorising Offensive Language on Twitter
Authors	Chengjin Zhou, Jin Wang, Xuejie Zhang
Abstract	This document describes the submission of team YNU-HPCC to SemEval-2019 for three Sub-tasks of Task 6: Sub-task A, Sub-task B, and Sub-task C. We have submitted four systems to identify and categorise offensive language. The first subsystem is an attention-based 2-layer bidirectional long short-term memory (BiLSTM). The second subsystem is a voting ensemble of four different deep learning architectures. The third subsystem is a stacking ensemble of four different deep learning architectures. Finally, the fourth subsystem is a bidirectional encoder representations from transformers (BERT) model. Among our models, in Sub-task A, our first subsystem performed the best, ranking 16th among 103 teams; in Sub-task B, the second subsystem performed the best, ranking 12th among 75 teams; in Sub-task C, the fourth subsystem performed best, ranking 4th among 65 teams.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2142/
PDF	https://www.aclweb.org/anthology/S19-2142
PWC	https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2019-task-6-identifying
Repo
Framework
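
The abstract does not say how the voting ensemble combines its four models. A common choice is hard majority voting over predicted labels, sketched below under that assumption. The labels "OFF" and "NOT" and the tie-breaking rule (prefer the label predicted by the earliest-listed model) are illustrative choices, not details from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by hard majority voting.

    predictions: list of label lists, one per model, all of equal length.
    Ties go to the tied label that appears first in model order.
    """
    ensemble = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        ensemble.append(next(lab for lab in labels if counts[lab] == top))
    return ensemble
```

Given four models that predict ["OFF", "NOT", "OFF"], ["OFF", "OFF", "NOT"], ["NOT", "NOT", "NOT"] and ["OFF", "NOT", "OFF"], the function returns ["OFF", "NOT", "OFF"]; the third item is a two-to-two tie resolved in favour of the first model's label.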