July 26, 2019

Paper Group NANR 11

CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups

Title CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups
Authors Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann
Abstract Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (News, WikiNews, and Wikipedia) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences in native language and genre types.
Tasks Complex Word Identification, Lexical Simplification, Reading Comprehension
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2068/
PDF https://www.aclweb.org/anthology/I17-2068
PWC https://paperswithcode.com/paper/cwig3g2-complex-word-identification-task
Repo
Framework

Using Rhetorical Structure Theory for Detection of Fake Online Reviews

Title Using Rhetorical Structure Theory for Detection of Fake Online Reviews
Authors Olu Popoola
Abstract
Tasks Deception Detection
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3608/
PDF https://www.aclweb.org/anthology/W17-3608
PWC https://paperswithcode.com/paper/using-rhetorical-structure-theory-for
Repo
Framework

Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries

Title Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries
Authors Rejitha K. S, Rajesha N.
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-7521/
PDF https://www.aclweb.org/anthology/W17-7521
PWC https://paperswithcode.com/paper/known-strangers-cross-linguistic-patterns-in
Repo
Framework

TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news

Title TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news
Authors Leon Rotim, Martin Tutek, Jan Šnajder
Abstract This paper describes our system for fine-grained sentiment scoring of news headlines submitted to SemEval 2017 task 5–subtask 2. Our system uses a feature-light method that consists of a Support Vector Regression (SVR) with various kernels and word vectors as features. Our best-performing submission scored 3rd on the task out of 29 teams and 4th out of 45 submissions with a cosine score of 0.733.
Tasks Feature Engineering, Sentiment Analysis, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2148/
PDF https://www.aclweb.org/anthology/S17-2148
PWC https://paperswithcode.com/paper/takelab-at-semeval-2017-task-5-linear
Repo
Framework
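
The TakeLab abstract above mentions word vectors as features and a cosine score as the evaluation measure. A minimal sketch of both ideas follows, assuming toy 3-dimensional embeddings and plain averaging as the linear aggregation. The embedding table and function names are hypothetical, and the SVR stage, its kernels, and the real pretrained vectors are not reproduced here.

```python
import math

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS = {
    "profits": [0.9, 0.1, 0.0],
    "rise": [0.7, 0.3, 0.1],
    "shares": [0.2, 0.5, 0.1],
    "fall": [-0.8, 0.2, 0.1],
}


def average_embedding(tokens, embeddings, dim=3):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_score(predicted, gold):
    """Cosine similarity between two equal-length vectors of sentiment scores."""
    dot = sum(p * g for p, g in zip(predicted, gold))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_g = math.sqrt(sum(g * g for g in gold))
    if norm_p == 0.0 or norm_g == 0.0:
        return 0.0
    return dot / (norm_p * norm_g)


# Feature vector for one headline; this would be the input to a regressor.
headline_features = average_embedding("profits rise".split(), TOY_EMBEDDINGS)
```

In this sketch, `headline_features` would be passed to a regression model, and `cosine_score` would compare the vector of predicted scores with the vector of gold scores across all headlines.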

BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs

Title BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Authors Mathieu Cliche
Abstract In this paper we describe our attempt at producing a state-of-the-art Twitter sentiment classifier using Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs). Our system leverages a large amount of unlabeled data to pre-train word embeddings. We then use a subset of the unlabeled data to fine-tune the embeddings using distant supervision. The final CNNs and LSTMs are trained on the SemEval-2017 Twitter dataset, where the embeddings are fine-tuned again. To boost performance we ensemble several CNNs and LSTMs together. Our approach achieved first rank on all of the five English subtasks amongst 40 teams.
Tasks Sentiment Analysis, Twitter Sentiment Analysis, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2094/
PDF https://www.aclweb.org/anthology/S17-2094
PWC https://paperswithcode.com/paper/bb_twtr-at-semeval-2017-task-4-twitter-1
Repo
Framework

An Entity Resolution Approach to Isolate Instances of Human Trafficking Online

Title An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Authors Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski
Abstract Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as 'escort advertisements' on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities.
Tasks Entity Resolution
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4411/
PDF https://www.aclweb.org/anthology/W17-4411
PWC https://paperswithcode.com/paper/an-entity-resolution-approach-to-isolate
Repo
Framework

Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters

Title Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters
Authors Georg Rehm, Julian Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Jan Nehring, Armin Berger, Luca König, Sören Räuchle, Jens Gerth
Abstract We present an approach to identifying a specific class of events, movement action events (MAEs), in a data set that consists of ca. 2,800 personal letters exchanged by the German architect Erich Mendelsohn and his wife, Luise. A backend system uses these and other semantic analysis results as input for an authoring environment that digital curators can use to produce new pieces of digital content. In our example case, the human expert will receive recommendations from the system with the goal of putting together a travelogue, i.e., a description of the trips and journeys undertaken by the couple. We describe the components and architecture and also apply the system to news data.
Tasks Entity Linking, Machine Translation, Named Entity Recognition
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2707/
PDF https://www.aclweb.org/anthology/W17-2707
PWC https://paperswithcode.com/paper/event-detection-and-semantic-storytelling
Repo
Framework

Computational analysis of Gondi dialects

Title Computational analysis of Gondi dialects
Authors Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev
Abstract This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.
Tasks Word Alignment
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1203/
PDF https://www.aclweb.org/anthology/W17-1203
PWC https://paperswithcode.com/paper/computational-analysis-of-gondi-dialects
Repo
Framework

Experiments in Non-Coherent Post-editing

Title Experiments in Non-Coherent Post-editing
Authors Cristina Toledo Báez, Moritz Schaeffer, Michael Carl
Abstract Market pressure on translation productivity joined with technological innovation is likely to fragment and decontextualise translation jobs even more than is currently the case. Many different translators increasingly work on one document at different places, collaboratively working in the cloud. This paper investigates the effect of decontextualised source texts on behaviour by comparing post-editing of sequentially ordered sentences with shuffled sentences from two different texts. The findings suggest that there is little or no effect of the decontextualised source texts on behaviour.
Tasks Active Learning, Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7902/
PDF https://doi.org/10.26615/978-954-452-042-7_002
PWC https://paperswithcode.com/paper/experiments-in-non-coherent-post-editing
Repo
Framework

多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]

Title 多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]
Authors Shih-Kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung
Abstract
Tasks Denoising, Speech Enhancement
Published 2017-11-01
URL https://www.aclweb.org/anthology/O17-1009/
PDF https://www.aclweb.org/anthology/O17-1009
PWC https://paperswithcode.com/paper/a-e-e-a1e-c-ea14eaaeaac-c14a
Repo
Framework

Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus

Title Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
Authors Ronja Laarmann-Quante, Katrin Ortmann, Anna Ehlert, Maurice Vogel, Stefanie Dipper
Abstract NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and is thereby tailored to research and tool development for orthographic issues in primary school. While for most corpora, transcription and target hypothesis are not evaluated, we conducted a detailed inter-annotator agreement study for both tasks. Although we achieved high agreement, our discussion of cases of disagreement shows that even with detailed guidelines, annotators differ here and there for different reasons, which should also be considered when working with transcriptions and target hypotheses of other corpora, especially if no explicit guidelines for their construction are known.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5051/
PDF https://www.aclweb.org/anthology/W17-5051
PWC https://paperswithcode.com/paper/annotating-orthographic-target-hypotheses-in
Repo
Framework

Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking

Title Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking
Authors David M. Howcroft, Vera Demberg
Abstract While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need for sentence-level readability measures. Much of psycholinguistics has focused for many years on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures (idea density, surprisal, integration cost, and embedding depth) to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline.
Tasks Information Retrieval, Text Generation, Text Simplification
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1090/
PDF https://www.aclweb.org/anthology/E17-1090
PWC https://paperswithcode.com/paper/psycholinguistic-models-of-sentence
Repo
Framework

Indexicals as Weak Descriptors

Title Indexicals as Weak Descriptors
Authors Andy Lücking
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7102/
PDF https://www.aclweb.org/anthology/W17-7102
PWC https://paperswithcode.com/paper/indexicals-as-weak-descriptors
Repo
Framework

Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels

Title Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels
Authors Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Niklas Elmqvist, Leah Findlater
Abstract Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against automatically generated labels. Our basis of comparison is participant ratings of how well labels describe documents from the topic. Our study has two phases: a labeling phase where participants label visualized topics and a validation phase where different participants select which labels best describe the topics' documents. Although all visualizations produce similar quality labels, simple visualizations such as word lists allow participants to quickly understand topics, while complex visualizations take longer but expose multi-word expressions that simpler visualizations obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled topics highlights linguistic patterns (e.g., hypernyms, phrases) that can be used to improve automatic topic labeling algorithms.
Tasks Topic Models
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1001/
PDF https://www.aclweb.org/anthology/Q17-1001
PWC https://paperswithcode.com/paper/evaluating-visual-representations-for-topic
Repo
Framework

"PageRank" for Argument Relevance

Title "PageRank" for Argument Relevance
Authors Henning Wachsmuth, Benno Stein, Yamen Ajjour
Abstract Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now in the focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument's conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with human average judgments. We outline what natural language challenges must be faced at web scale in order to stepwise bring argument relevance to web search engines.
Tasks Argument Mining
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1105/
PDF https://www.aclweb.org/anthology/E17-1105
PWC https://paperswithcode.com/paper/pagerank-for-argument-relevance
Repo
Framework
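
The abstract above describes scoring an argument by how often other arguments reuse its conclusion as a premise, via a recursive weighting scheme. As a rough illustration only, the following sketch runs standard PageRank by power iteration on a toy graph, where an edge from A to B is assumed to mean "argument A uses the conclusion of argument B as a premise". The graph, the node names, and the damping factor of 0.85 are assumptions, and this is plain PageRank rather than the adapted scheme of the paper.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the uniform "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical argument graph: "a1" and "a2" both reuse the conclusion of "a3".
toy_graph = {"a1": ["a3"], "a2": ["a3"], "a3": []}
scores = pagerank(toy_graph)
```

In this toy graph, `a3` ends up with the highest score because two other arguments build on it, while `a1` and `a2` receive equal scores.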