Paper Group NANR 14
The relation between dependency distance and frequency. Learning-Based Sampling for Natural Image Matting. A Multi-modal one-class generative adversarial network for anomaly detection in manufacturing. Syntax is clearer on the other side - Using parallel corpus to extract monolingual data. Neural Text Simplification in Low-Resource Conditions Using …
The relation between dependency distance and frequency
Title | The relation between dependency distance and frequency |
Authors | Xinying Chen, Kim Gerdes |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7909/ |
https://www.aclweb.org/anthology/W19-7909 | |
PWC | https://paperswithcode.com/paper/the-relation-between-dependency-distance-and |
Repo | |
Framework | |
Learning-Based Sampling for Natural Image Matting
Title | Learning-Based Sampling for Natural Image Matting |
Authors | Jingwei Tang, Yagiz Aksoy, Cengiz Oztireli, Markus Gross, Tunc Ozan Aydin |
Abstract | The goal of natural image matting is the estimation of opacities of a user-defined foreground object that is essential in creating realistic composite imagery. Natural matting is a challenging process due to the high number of unknowns in the mathematical modeling of the problem, namely the opacities as well as the foreground and background layer colors, while the original image serves as the single observation. In this paper, we propose the estimation of the layer colors through the use of deep neural networks prior to the opacity estimation. The layer color estimation is a better match for the capabilities of neural networks, and the availability of these colors substantially increases the performance of opacity estimation due to the reduced number of unknowns in the compositing equation. A prominent approach to matting in parallel to ours is called sampling-based matting, which involves gathering color samples from known-opacity regions to predict the layer colors. Our approach outperforms not only the previous hand-crafted sampling algorithms, but also current data-driven methods. We hence classify our method as a hybrid sampling- and learning-based approach to matting, and demonstrate the effectiveness of our approach through detailed ablation studies using alternative network architectures. |
Tasks | Image Matting |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Tang_Learning-Based_Sampling_for_Natural_Image_Matting_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Tang_Learning-Based_Sampling_for_Natural_Image_Matting_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-sampling-for-natural-image |
Repo | |
Framework | |
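For reference, the compositing equation mentioned in the abstract above is the standard alpha-matting model: each observed pixel is a mixture of unknown foreground and background colors weighted by an unknown opacity. Estimating the layer colors first, as the paper proposes, leaves only the opacity unknown per pixel.

```latex
% Standard alpha-compositing model underlying natural image matting:
% the observed pixel I_p mixes an unknown foreground color F_p and an
% unknown background color B_p, weighted by the unknown opacity alpha_p.
I_p = \alpha_p F_p + (1 - \alpha_p) B_p, \qquad \alpha_p \in [0, 1]
```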
A Multi-modal one-class generative adversarial network for anomaly detection in manufacturing
Title | A Multi-modal one-class generative adversarial network for anomaly detection in manufacturing |
Authors | Shuhui Qu, Janghwan Lee, Wei Xiong, Wonhyouk Jang, Jie Wang |
Abstract | One-class anomaly detection on high-dimensional data is one of the critical issues in both fundamental machine learning research and manufacturing applications. A good anomaly detector should accurately discriminate anomalies from normal data. Although most previous anomaly detection methods achieve good performance, they do not perform well on high-dimensional imbalanced datasets with 1) a limited amount of data; 2) a multi-modal distribution; 3) few anomaly examples. In this paper, we develop a multi-modal one-class generative adversarial network based detector (MMOC-GAN) to distinguish anomalies from normal data (products). Apart from a domain-specific feature extractor, our model leverages a generative adversarial network (GAN). The generator takes in a noise vector modified with a pseudo latent prior and generates samples in the low-density area of the given normal data to simulate the anomalies. The discriminator is then trained to distinguish the generated samples from the normal samples. Since the generated samples simulate the low-density area of each mode, the discriminator can directly detect anomalies from normal data. Experiments demonstrate that our model outperforms state-of-the-art one-class classification models and other anomaly detection methods in accuracy on both normal data and anomalies, as well as in F1 score. Also, the generated samples can fully capture the low-density area of different types of products. |
Tasks | Anomaly Detection |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HJl1ujCct7 |
https://openreview.net/pdf?id=HJl1ujCct7 | |
PWC | https://paperswithcode.com/paper/a-multi-modal-one-class-generative |
Repo | |
Framework | |
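A minimal sketch of the general idea in the entry above, assuming a PyTorch setup: a GAN is trained only on features of normal products, and the discriminator's output is then reused as an anomaly score. The pseudo latent prior, multi-modal handling, and domain-specific feature extractor from the paper are not reproduced; all layer sizes and names are illustrative.

```python
# Minimal GAN-as-anomaly-detector sketch (PyTorch). Illustrative only:
# the paper's pseudo latent prior, multi-modal handling and feature
# extractor are not reproduced; sizes and names are made up.
import torch
import torch.nn as nn

FEAT_DIM, NOISE_DIM = 64, 16  # assumed dimensionalities

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, FEAT_DIM),
)
discriminator = nn.Sequential(
    nn.Linear(FEAT_DIM, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(normal_feats):
    """One adversarial step on a batch of features from normal products."""
    batch = normal_feats.size(0)
    noise = torch.randn(batch, NOISE_DIM)

    # Discriminator: normal features -> 1, generated features -> 0.
    fake = generator(noise).detach()
    d_loss = bce(discriminator(normal_feats), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator tries to fool the discriminator.
    fake = generator(noise)
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

def anomaly_score(feats):
    """Low discriminator output = likely anomaly (far from normal data)."""
    with torch.no_grad():
        return 1.0 - discriminator(feats).squeeze(1)

# Usage: call train_step(...) on batches of normal-product features,
# then score unseen items with anomaly_score(...).
```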
Syntax is clearer on the other side - Using parallel corpus to extract monolingual data
Title | Syntax is clearer on the other side - Using parallel corpus to extract monolingual data |
Authors | Andrea Dömötör |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7813/ |
https://www.aclweb.org/anthology/W19-7813 | |
PWC | https://paperswithcode.com/paper/syntax-is-clearer-on-the-other-side-using |
Repo | |
Framework | |
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Title | Neural Text Simplification in Low-Resource Conditions Using Weak Supervision |
Authors | Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi |
Abstract | Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages are conditioned either on intensive manual work to create training data, or on the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration. |
Tasks | Machine Translation, Text Simplification, Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-2305/ |
https://www.aclweb.org/anthology/W19-2305 | |
PWC | https://paperswithcode.com/paper/neural-text-simplification-in-low-resource |
Repo | |
Framework | |
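The abstract above does not specify how simple sentences are selected from the heterogeneous data, so the sketch below only illustrates the general idea with hypothetical surface heuristics (sentence length and average word frequency); these are not the paper's actual selection criteria.

```python
# Hypothetical illustration of selecting "simple" sentences from a large
# monolingual corpus by surface heuristics. The actual selection criteria
# used in the paper are not specified in the abstract.
from collections import Counter

def build_freq(corpus):
    """Relative word frequencies over a tokenized corpus (list of token lists)."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def select_simple(corpus, max_len=15, min_avg_freq=1e-4):
    """Keep short sentences made of frequent words (a crude proxy for 'simple')."""
    freq = build_freq(corpus)
    selected = []
    for sent in corpus:
        if not sent or len(sent) > max_len:
            continue
        avg_freq = sum(freq.get(tok, 0.0) for tok in sent) / len(sent)
        if avg_freq >= min_avg_freq:
            selected.append(sent)
    return selected
```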
Visual Detection with Context for Document Layout Analysis
Title | Visual Detection with Context for Document Layout Analysis |
Authors | Carlos Soto, Shinjae Yoo |
Abstract | We present 1) a work-in-progress method to visually segment key regions of scientific articles using an object detection technique augmented with contextual features, and 2) a novel dataset of region-labeled articles. A continuing challenge in scientific literature mining is the difficulty of consistently extracting high-quality text from formatted PDFs. To address this, we adapt the object-detection technique Faster R-CNN for document layout detection, incorporating contextual information that leverages the inherently localized nature of article contents to improve the region detection performance. Due to the limited availability of high-quality region labels for scientific articles, we also contribute a novel dataset of region annotations, the first version of which covers 9 region classes and 822 article pages. Initial experimental results demonstrate a 23.9% absolute improvement in mean average precision over the baseline model by incorporating contextual features, and a processing speed 14x faster than a text-based technique. Ongoing work on further improvements is also discussed. |
Tasks | Document Layout Analysis, Object Detection |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1348/ |
https://www.aclweb.org/anthology/D19-1348 | |
PWC | https://paperswithcode.com/paper/visual-detection-with-context-for-document |
Repo | |
Framework | |
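A minimal sketch of adapting an off-the-shelf Faster R-CNN to document regions with torchvision (assuming torchvision >= 0.13): the box predictor is replaced to output the 9 region classes plus background. The contextual features that are the paper's main contribution are a custom extension and are not shown here.

```python
# Sketch: fine-tuning a stock torchvision Faster R-CNN for 9 document
# region classes (+1 background). The paper's contextual features are a
# custom extension and are NOT included here. Requires torchvision >= 0.13.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 9 + 1  # 9 region classes + background

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Training expects a list of page images and per-page target dicts with
# 'boxes' (N x 4, xyxy) and 'labels' (N,) tensors.
model.train()
images = [torch.rand(3, 800, 600)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 200.0, 50.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)   # dict of classification/box-regression losses
loss = sum(loss_dict.values())
```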
Neural Versus Non-Neural Text Simplification: A Case Study
Title | Neural Versus Non-Neural Text Simplification: A Case Study |
Authors | Islam Nassar, Michelle Ananda-Rajah, Gholamreza Haffari |
Abstract | |
Tasks | Text Simplification |
Published | 2019-04-01 |
URL | https://www.aclweb.org/anthology/U19-1023/ |
https://www.aclweb.org/anthology/U19-1023 | |
PWC | https://paperswithcode.com/paper/neural-versus-non-neural-text-simplification |
Repo | |
Framework | |
EMOMINER at SemEval-2019 Task 3: A Stacked BiLSTM Architecture for Contextual Emotion Detection in Text
Title | EMOMINER at SemEval-2019 Task 3: A Stacked BiLSTM Architecture for Contextual Emotion Detection in Text |
Authors | Nikhil Chakravartula, Vijayasaradhi Indurthi |
Abstract | This paper describes our participation in the SemEval 2019 Task 3 - Contextual Emotion Detection in Text. This task aims to identify emotions, viz. happiness, anger, sadness in the context of a text conversation. Our system is a stacked Bidirectional LSTM, equipped with attention on top of word embeddings pre-trained on a large collection of Twitter data. In this paper, apart from describing our official submission, we elucidate how different deep learning models respond to this task. |
Tasks | Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2033/ |
https://www.aclweb.org/anthology/S19-2033 | |
PWC | https://paperswithcode.com/paper/emominer-at-semeval-2019-task-3-a-stacked |
Repo | |
Framework | |
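A small sketch, assuming PyTorch, of the kind of architecture the abstract above describes: pre-trained word embeddings feeding a 2-layer bidirectional LSTM with additive attention and a softmax classifier. The dimensions, the attention formulation, and the number of emotion classes are illustrative, not the team's exact configuration.

```python
# Sketch of a 2-layer bidirectional LSTM with attention over pre-trained
# word embeddings. All sizes and the 4 output classes are illustrative.
import torch
import torch.nn as nn

class StackedBiLSTMAttention(nn.Module):
    def __init__(self, pretrained_emb, hidden=128, num_classes=4):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=False)
        self.lstm = nn.LSTM(pretrained_emb.size(1), hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each timestep
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))         # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        context = (weights * h).sum(dim=1)            # (B, 2*hidden)
        return self.out(context)                      # emotion logits

# Usage with a dummy embedding matrix (vocab of 1000, 300-d vectors):
model = StackedBiLSTMAttention(torch.randn(1000, 300))
logits = model(torch.randint(0, 1000, (8, 40)))  # batch of 8, sequence length 40
```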
Scalable graph-based method for individual named entity identification
Title | Scalable graph-based method for individual named entity identification |
Authors | Sammy Khalife, Michalis Vazirgiannis |
Abstract | In this paper, we consider the named entity linking (NEL) problem. We assume a set of queries, named entities, that have to be identified within a knowledge base. This knowledge base is represented by a text database paired with a semantic graph, endowed with a classification of entities (ontology). We present state-of-the-art methods in NEL, and propose a new method for individual identification requiring few annotated data samples. We demonstrate its scalability and performance over standard datasets, for several ontology configurations. Our approach is well-motivated for integration in real systems. Indeed, recent deep learning methods, despite their capacity to improve experimental precision, require extensive parameter tuning along with a large volume of annotated data. |
Tasks | Entity Linking |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5303/ |
https://www.aclweb.org/anthology/D19-5303 | |
PWC | https://paperswithcode.com/paper/scalable-graph-based-method-for-individual |
Repo | |
Framework | |
SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval
Title | SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval |
Authors | Fabian David Schmidt, Markus Dietsche, Simone Paolo Ponzetto, Goran Glavaš |
Abstract | We introduce Seagle, a platform for comparative evaluation of semantic text encoding models on information retrieval (IR) tasks. Seagle implements (1) word embedding aggregators, which represent texts as algebraic aggregations of pretrained word embeddings and (2) pretrained semantic encoders, and allows for their comparative evaluation on arbitrary (monolingual and cross-lingual) IR collections. We benchmark Seagle's models on monolingual document retrieval and cross-lingual sentence retrieval. Seagle functionality can be exploited via an easy-to-use web interface and its modular backend (micro-service architecture) can easily be extended with additional semantic search models. |
Tasks | Information Retrieval, Word Embeddings |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-3034/ |
https://www.aclweb.org/anthology/D19-3034 | |
PWC | https://paperswithcode.com/paper/seagle-a-platform-for-comparative-evaluation |
Repo | |
Framework | |
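A minimal sketch of what a word-embedding aggregator for retrieval looks like: mean-pool pretrained word vectors per text and rank documents by cosine similarity to the query. This illustrates the concept only; Seagle's actual interface and models are not reproduced.

```python
# Minimal word-embedding aggregator for retrieval: mean-pool pretrained
# word vectors per text, rank documents by cosine similarity to a query.
# The embeddings dict is a stand-in for any set of pretrained vectors.
import numpy as np

def aggregate(tokens, embeddings, dim=300):
    """Mean of the available word vectors; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def rank(query_tokens, docs_tokens, embeddings):
    """Return document indices sorted by cosine similarity to the query."""
    q = aggregate(query_tokens, embeddings)
    scores = []
    for d in docs_tokens:
        v = aggregate(d, embeddings)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scores.append(float(q @ v / denom) if denom > 0 else 0.0)
    return sorted(range(len(docs_tokens)), key=lambda i: -scores[i])
```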
Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon
Title | Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon |
Authors | Endang Wahyu Pamungkas, Viviana Patti |
Abstract | The development of computational methods to detect abusive language in social media within variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora for different languages developed in recent years. However, abusive language behaviour is multifaceted and available datasets are characterized by different topical focuses. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content a first challenge. Moreover, most resources are available for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. A hybrid approach with deep learning and a multilingual lexicon for cross-domain and cross-lingual detection of abusive content is proposed and compared with other simpler models. We show that training a system on general abusive language datasets will produce a cross-domain robust system, which can be used to detect other more specific types of abusive content. We also found that using the domain-independent lexicon HurtLex is useful to transfer knowledge between domains and languages. In the cross-lingual experiment, we demonstrate the effectiveness of our joint-learning model also in out-of-domain scenarios. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2051/ |
https://www.aclweb.org/anthology/P19-2051 | |
PWC | https://paperswithcode.com/paper/cross-domain-and-cross-lingual-abusive |
Repo | |
Framework | |
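A compact sketch of the "hybrid" idea of combining lexicon features with text features, using a linear classifier and a placeholder word list purely to keep the example short; the paper itself combines a multilingual lexicon (HurtLex) with deep learning models.

```python
# Sketch of the hybrid idea: combine a lexicon-hit feature (e.g. counts of
# HurtLex entries) with standard text features. The lexicon below is a
# placeholder, not real HurtLex data, and the linear classifier stands in
# for the neural models used in the paper.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abusive_lexicon = {"idiot", "stupid"}  # placeholder entries

def lexicon_hits(texts):
    """Column vector: number of lexicon entries occurring in each text."""
    counts = [sum(tok in abusive_lexicon for tok in t.lower().split())
              for t in texts]
    return csr_matrix(np.array(counts, dtype=float).reshape(-1, 1))

texts = ["you are an idiot", "have a nice day"]
labels = [1, 0]

vectorizer = TfidfVectorizer()
X = hstack([vectorizer.fit_transform(texts), lexicon_hits(texts)])
clf = LogisticRegression().fit(X, labels)
```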
Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models
Title | Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models |
Authors | Florian Kunneman, Thiago Castro Ferreira, Emiel Krahmer, Antal van den Bosch |
Abstract | Community Question Answering forums are popular among Internet users, and a basic problem they encounter is trying to find out if their question has already been posed before. To address this issue, NLP researchers have developed methods to automatically detect question-similarity, which was one of the shared tasks in SemEval. The best performing systems for this task made use of Syntactic Tree Kernels or the SoftCosine metric. However, it remains unclear why these methods seem to work, whether their performance can be improved by better preprocessing methods and what kinds of errors they (and other methods) make. In this paper, we therefore systematically combine and compare these two approaches with the more traditional BM25 and translation-based models. Moreover, we analyze the impact of preprocessing steps (lowercasing, suppression of punctuation and stop words removal) and word meaning similarity based on different distributions (word translation probability, Word2Vec, fastText and ELMo) on the performance of the task. We conduct an error analysis to gain insight into the differences in performance between the system set-ups. The implementation is made publicly available from https://github.com/fkunneman/DiscoSumo/tree/master/ranlp. |
Tasks | Community Question Answering, Question Answering, Question Similarity |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1070/ |
https://www.aclweb.org/anthology/R19-1070 | |
PWC | https://paperswithcode.com/paper/question-similarity-in-community-question |
Repo | |
Framework | |
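The SoftCosine metric mentioned above generalizes cosine similarity with a term-to-term similarity matrix, so related but non-identical words still contribute to the score. A small from-scratch sketch follows (gensim also ships an implementation, not shown here).

```python
# From-scratch soft cosine: cosine similarity generalized with a term-term
# similarity matrix S (e.g. built from word-embedding similarities).
import numpy as np

def soft_cosine(a, b, S):
    """a, b: bag-of-words vectors over the same vocabulary;
    S: |V| x |V| term similarity matrix with ones on the diagonal."""
    num = a @ S @ b
    denom = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return float(num / denom) if denom > 0 else 0.0

# Tiny example over the vocabulary ["cheap", "affordable", "flight"]:
S = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q1 = np.array([1.0, 0.0, 1.0])   # "cheap flight"
q2 = np.array([0.0, 1.0, 1.0])   # "affordable flight"
print(soft_cosine(q1, q2, S))    # ~0.9, whereas plain cosine gives 0.5
```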
Sieg at MEDIQA 2019: Multi-task Neural Ensemble for Biomedical Inference and Entailment
Title | Sieg at MEDIQA 2019: Multi-task Neural Ensemble for Biomedical Inference and Entailment |
Authors | Sai Abishek Bhaskar, Rashi Rungta, James Route, Eric Nyberg, Teruko Mitamura |
Abstract | This paper presents a multi-task learning approach to natural language inference (NLI) and question entailment (RQE) in the biomedical domain. Recognizing textual inference relations and question similarity can address the issue of answering new consumer health questions by mapping them to Frequently Asked Questions on reputed websites like the NIH. We show that leveraging information from parallel tasks across domains along with medical knowledge integration allows our model to learn better biomedical feature representations. Our final models for the NLI and RQE tasks achieve the 4th and 2nd rank on the shared-task leaderboard respectively. |
Tasks | Multi-Task Learning, Natural Language Inference, Question Similarity |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5049/ |
https://www.aclweb.org/anthology/W19-5049 | |
PWC | https://paperswithcode.com/paper/sieg-at-mediqa-2019-multi-task-neural |
Repo | |
Framework | |
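A minimal sketch of the multi-task pattern described above, assuming PyTorch: a shared encoder with separate NLI and RQE classification heads trained jointly. The paper's actual encoders, medical knowledge integration, and ensembling are not reproduced; sizes and names are illustrative.

```python
# Minimal multi-task setup: one shared sentence-pair encoder, two task
# heads (NLI: 3 labels, RQE: 2 labels). Illustrative, not the paper's models.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, input_dim=768, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.nli_head = nn.Linear(hidden, 3)   # entailment / neutral / contradiction
        self.rqe_head = nn.Linear(hidden, 2)   # entails / does not entail

    def forward(self, pair_features, task):
        h = self.encoder(pair_features)
        return self.nli_head(h) if task == "nli" else self.rqe_head(h)

model = SharedEncoderMTL()
loss_fn = nn.CrossEntropyLoss()
# Alternate batches from the two tasks so the shared encoder sees both:
nli_logits = model(torch.randn(4, 768), task="nli")
rqe_logits = model(torch.randn(4, 768), task="rqe")
loss = loss_fn(nli_logits, torch.randint(0, 3, (4,))) + \
       loss_fn(rqe_logits, torch.randint(0, 2, (4,)))
```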
Controlling Grammatical Error Correction Using Word Edit Rate
Title | Controlling Grammatical Error Correction Using Word Edit Rate |
Authors | Kengo Hotate, Masahiro Kaneko, Satoru Katsumata, Mamoru Komachi |
Abstract | When professional English teachers correct grammatically erroneous sentences written by English learners, they use various methods. The correction method depends on how many corrections a learner requires. In this paper, we propose a method for neural grammatical error correction (GEC) that can control the degree of correction. We show that it is possible to control the degree of GEC by using new training data annotated with word edit rate. Thereby, diverse corrected sentences are obtained from a single erroneous sentence. Moreover, compared to a GEC model that does not use information on the degree of correction, the proposed method improves correction accuracy. |
Tasks | Grammatical Error Correction |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2020/ |
https://www.aclweb.org/anthology/P19-2020 | |
PWC | https://paperswithcode.com/paper/controlling-grammatical-error-correction |
Repo | |
Framework | |
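A short sketch of the control signal the abstract above describes: word edit rate computed as word-level Levenshtein distance normalized by source length, here mapped to a coarse control token that could be attached to training pairs. The bucket boundaries and token names are illustrative, not those used in the paper.

```python
# Word edit rate: word-level Levenshtein distance between the erroneous
# sentence and its correction, normalized by source length. The bucketing
# into control tokens is illustrative; the paper's exact scheme may differ.
def word_edit_distance(src, tgt):
    """Standard dynamic-programming Levenshtein distance over word lists."""
    m, n = len(src), len(tgt)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def edit_rate_token(src, tgt):
    """Map the word edit rate to a coarse (hypothetical) control token."""
    rate = word_edit_distance(src, tgt) / max(len(src), 1)
    if rate < 0.1:
        return "<wer_low>"
    if rate < 0.3:
        return "<wer_mid>"
    return "<wer_high>"

src = "He go to school yesterday".split()
tgt = "He went to school yesterday".split()
print(edit_rate_token(src, tgt))  # one substitution over 5 words -> rate 0.2 -> <wer_mid>
```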
YNU-HPCC at SemEval-2019 Task 6: Identifying and Categorising Offensive Language on Twitter
Title | YNU-HPCC at SemEval-2019 Task 6: Identifying and Categorising Offensive Language on Twitter |
Authors | Chengjin Zhou, Jin Wang, Xuejie Zhang |
Abstract | This document describes the submission of team YNU-HPCC to SemEval-2019 for three Sub-tasks of Task 6: Sub-task A, Sub-task B, and Sub-task C. We have submitted four systems to identify and categorise offensive language. The first subsystem is an attention-based 2-layer bidirectional long short-term memory (BiLSTM). The second subsystem is a voting ensemble of four different deep learning architectures. The third subsystem is a stacking ensemble of four different deep learning architectures. Finally, the fourth subsystem is a bidirectional encoder representations from transformers (BERT) model. Among our models, in Sub-task A, our first subsystem performed the best, ranking 16th among 103 teams; in Sub-task B, the second subsystem performed the best, ranking 12th among 75 teams; in Sub-task C, the fourth subsystem performed best, ranking 4th among 65 teams. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2142/ |
https://www.aclweb.org/anthology/S19-2142 | |
PWC | https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2019-task-6-identifying |
Repo | |
Framework | |
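For readers unfamiliar with the two ensembling styles mentioned above, the sketch below contrasts a voting ensemble with a stacking ensemble in scikit-learn, using simple stand-in base classifiers rather than the four deep learning architectures ensembled in the paper.

```python
# Voting vs. stacking ensembles in scikit-learn. The base estimators here
# are simple stand-ins; the paper ensembles four deep learning models.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0))]

# Voting: combine the base models' predictions directly (averaged probabilities).
voting = VotingClassifier(estimators=base, voting="soft").fit(X, y)

# Stacking: a meta-classifier learns how to combine the base models' outputs.
stacking = StackingClassifier(estimators=base,
                              final_estimator=LogisticRegression()).fit(X, y)
```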