July 26, 2019


Paper Group NANR 11


CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups


Title	CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups
Authors	Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann
Abstract	Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (News, WikiNews, and Wikipedia) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences in native language and genre types.
Tasks	Complex Word Identification, Lexical Simplification, Reading Comprehension
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2068/
PDF	https://www.aclweb.org/anthology/I17-2068
PWC	https://paperswithcode.com/paper/cwig3g2-complex-word-identification-task
Repo
Framework

Using Rhetorical Structure Theory for Detection of Fake Online Reviews


Title	Using Rhetorical Structure Theory for Detection of Fake Online Reviews
Authors	Olu Popoola
Abstract
Tasks	Deception Detection
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-3608/
PDF	https://www.aclweb.org/anthology/W17-3608
PWC	https://paperswithcode.com/paper/using-rhetorical-structure-theory-for
Repo
Framework

Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries


Title	Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries
Authors	Rejitha K. S, Rajesha N.
Abstract
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-7521/
PDF	https://www.aclweb.org/anthology/W17-7521
PWC	https://paperswithcode.com/paper/known-strangers-cross-linguistic-patterns-in
Repo
Framework

TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news


Title	TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news
Authors	Leon Rotim, Martin Tutek, Jan Šnajder
Abstract	This paper describes our system for fine-grained sentiment scoring of news headlines submitted to SemEval 2017 task 5–subtask 2. Our system uses a feature-light method that consists of a Support Vector Regression (SVR) with various kernels and word vectors as features. Our best-performing submission scored 3rd on the task out of 29 teams and 4th out of 45 submissions with a cosine score of 0.733.
Tasks	Feature Engineering, Sentiment Analysis, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2148/
PDF	https://www.aclweb.org/anthology/S17-2148
PWC	https://paperswithcode.com/paper/takelab-at-semeval-2017-task-5-linear
Repo
Framework
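
The "linear aggregation of word embeddings" in the title and the "cosine score" in the abstract can be illustrated with a minimal sketch. It assumes that the aggregation is a plain average of word vectors and that the cosine score is ordinary cosine similarity between the vectors of predicted and gold sentiment scores; neither detail is confirmed by the abstract. The toy vectors and the helper names `average_embedding` and `cosine_score` are hypothetical, and the SVR step itself is left out.

```python
import math


def average_embedding(tokens, vectors, dim):
    """Average the vectors of known tokens (one simple linear aggregation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known token: fall back to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_score(predicted, gold):
    """Cosine similarity between predicted and gold score vectors."""
    dot = sum(p * g for p, g in zip(predicted, gold))
    norm = math.sqrt(sum(p * p for p in predicted)) * math.sqrt(
        sum(g * g for g in gold)
    )
    return dot / norm if norm else 0.0


# Hypothetical 2-dimensional toy embeddings, for illustration only.
toy_vectors = {
    "profits": [0.8, 0.1],
    "rise": [0.6, 0.3],
    "losses": [-0.7, 0.2],
}

# "sharply" is out of vocabulary and is skipped by the average.
headline_features = average_embedding(["profits", "rise", "sharply"], toy_vectors, 2)
print(headline_features)

# Compare hypothetical predicted sentiment scores with gold scores.
print(cosine_score([0.5, -0.2, 0.1], [0.6, -0.1, 0.0]))
```

In a full system, the averaged vector would be the input to a regressor such as the SVR mentioned in the abstract.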

BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs


Title	BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Authors	Mathieu Cliche
Abstract	In this paper we describe our attempt at producing a state-of-the-art Twitter sentiment classifier using Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Our system leverages a large amount of unlabeled data to pre-train word embeddings. We then use a subset of the unlabeled data to fine-tune the embeddings using distant supervision. The final CNNs and LSTMs are trained on the SemEval-2017 Twitter dataset, where the embeddings are fine-tuned again. To boost performance, we ensemble several CNNs and LSTMs together. Our approach achieved first rank on all of the five English subtasks amongst 40 teams.
Tasks	Sentiment Analysis, Twitter Sentiment Analysis, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2094/
PDF	https://www.aclweb.org/anthology/S17-2094
PWC	https://paperswithcode.com/paper/bb_twtr-at-semeval-2017-task-4-twitter-1
Repo
Framework

An Entity Resolution Approach to Isolate Instances of Human Trafficking Online


Title	An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Authors	Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski
Abstract	Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as 'escort advertisements' on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities.
Tasks	Entity Resolution
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4411/
PDF	https://www.aclweb.org/anthology/W17-4411
PWC	https://paperswithcode.com/paper/an-entity-resolution-approach-to-isolate
Repo
Framework

Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters


Title	Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters
Authors	Georg Rehm, Julian Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Jan Nehring, Armin Berger, Luca König, Sören Räuchle, Jens Gerth
Abstract	We present an approach to identifying a specific class of events, movement action events (MAEs), in a data set that consists of ca. 2,800 personal letters exchanged by the German architect Erich Mendelsohn and his wife, Luise. A backend system uses these and other semantic analysis results as input for an authoring environment that digital curators can use to produce new pieces of digital content. In our example case, the human expert will receive recommendations from the system with the goal of putting together a travelogue, i.e., a description of the trips and journeys undertaken by the couple. We describe the components and architecture and also apply the system to news data.
Tasks	Entity Linking, Machine Translation, Named Entity Recognition
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2707/
PDF	https://www.aclweb.org/anthology/W17-2707
PWC	https://paperswithcode.com/paper/event-detection-and-semantic-storytelling
Repo
Framework

Computational analysis of Gondi dialects


Title	Computational analysis of Gondi dialects
Authors	Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev
Abstract	This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.
Tasks	Word Alignment
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1203/
PDF	https://www.aclweb.org/anthology/W17-1203
PWC	https://paperswithcode.com/paper/computational-analysis-of-gondi-dialects
Repo
Framework

Experiments in Non-Coherent Post-editing


Title	Experiments in Non-Coherent Post-editing
Authors	Cristina Toledo Báez, Moritz Schaeffer, Michael Carl
Abstract	Market pressure on translation productivity joined with technological innovation is likely to fragment and decontextualise translation jobs even more than is currently the case. Many different translators increasingly work on one document at different places, collaboratively working in the cloud. This paper investigates the effect of decontextualised source texts on behaviour by comparing post-editing of sequentially ordered sentences with shuffled sentences from two different texts. The findings suggest that there is little or no effect of the decontextualised source texts on behaviour.
Tasks	Active Learning, Machine Translation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7902/
PDF	https://doi.org/10.26615/978-954-452-042-7_002
PWC	https://paperswithcode.com/paper/experiments-in-non-coherent-post-editing
Repo
Framework

多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]


Title	多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]
Authors	Shih-Kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung
Abstract
Tasks	Denoising, Speech Enhancement
Published	2017-11-01
URL	https://www.aclweb.org/anthology/O17-1009/
PDF	https://www.aclweb.org/anthology/O17-1009
PWC	https://paperswithcode.com/paper/a-e-e-a1e-c-ea14eaaeaac-c14a
Repo
Framework

Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus


Title	Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
Authors	Ronja Laarmann-Quante, Katrin Ortmann, Anna Ehlert, Maurice Vogel, Stefanie Dipper
Abstract	NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and is thereby tailored to research and tool development for orthographic issues in primary school. While for most corpora, transcription and target hypothesis are not evaluated, we conducted a detailed inter-annotator agreement study for both tasks. Although we achieved high agreement, our discussion of cases of disagreement shows that even with detailed guidelines, annotators differ here and there for different reasons, which should also be considered when working with transcriptions and target hypotheses of other corpora, especially if no explicit guidelines for their construction are known.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5051/
PDF	https://www.aclweb.org/anthology/W17-5051
PWC	https://paperswithcode.com/paper/annotating-orthographic-target-hypotheses-in
Repo
Framework

Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking


Title	Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking
Authors	David M. Howcroft, Vera Demberg
Abstract	While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need of sentence-level readability measures. Much of psycholinguistics has focused for many years on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures: idea density, surprisal, integration cost, and embedding depth to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline.
Tasks	Information Retrieval, Text Generation, Text Simplification
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1090/
PDF	https://www.aclweb.org/anthology/E17-1090
PWC	https://paperswithcode.com/paper/psycholinguistic-models-of-sentence
Repo
Framework

Indexicals as Weak Descriptors


Title	Indexicals as Weak Descriptors
Authors	Andy Lücking
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7102/
PDF	https://www.aclweb.org/anthology/W17-7102
PWC	https://paperswithcode.com/paper/indexicals-as-weak-descriptors
Repo
Framework

Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels


Title	Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels
Authors	Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Niklas Elmqvist, Leah Findlater
Abstract	Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against automatically generated labels. Our basis of comparison is participant ratings of how well labels describe documents from the topic. Our study has two phases: a labeling phase where participants label visualized topics and a validation phase where different participants select which labels best describe the topics' documents. Although all visualizations produce similar quality labels, simple visualizations such as word lists allow participants to quickly understand topics, while complex visualizations take longer but expose multi-word expressions that simpler visualizations obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled topics highlights linguistic patterns (e.g., hypernyms, phrases) that can be used to improve automatic topic labeling algorithms.
Tasks	Topic Models
Published	2017-01-01
URL	https://www.aclweb.org/anthology/Q17-1001/
PDF	https://www.aclweb.org/anthology/Q17-1001
PWC	https://paperswithcode.com/paper/evaluating-visual-representations-for-topic
Repo
Framework

"PageRank" for Argument Relevance


Title	"PageRank" for Argument Relevance
Authors	Henning Wachsmuth, Benno Stein, Yamen Ajjour
Abstract	Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now in the focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument's conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with human average judgments. We outline what natural language challenges must be faced at web scale in order to stepwise bring argument relevance to web search engines.
Tasks	Argument Mining
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1105/
PDF	https://www.aclweb.org/anthology/E17-1105
PWC	https://paperswithcode.com/paper/pagerank-for-argument-relevance
Repo
Framework
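
The idea in the abstract above, that a conclusion becomes more relevant the more other arguments reuse it as a premise, can be sketched with a standard PageRank power iteration. This is textbook PageRank on a toy graph, not the authors' adapted weighting scheme; the argument names, the damping factor of 0.85, and the function `pagerank` are illustrative assumptions. An edge from X to Y here means "argument X uses the conclusion of argument Y as a premise".

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration.

    graph maps each node to the list of nodes it points to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical argument graph: arg_b and arg_c both build on the
# conclusion of arg_a, and arg_d builds on the conclusion of arg_b.
reuse_graph = {
    "arg_a": [],
    "arg_b": ["arg_a"],
    "arg_c": ["arg_a"],
    "arg_d": ["arg_b"],
}

scores = pagerank(reuse_graph)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy graph, `arg_a` receives the highest score because two other arguments reuse its conclusion, one of which is itself reused.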