Paper Group NANR 11
CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups
Title | CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups |
Authors | Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann |
Abstract | Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (News, WikiNews, and Wikipedia) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences in native language and genre types. |
Tasks | Complex Word Identification, Lexical Simplification, Reading Comprehension |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2068/ |
https://www.aclweb.org/anthology/I17-2068 | |
PWC | https://paperswithcode.com/paper/cwig3g2-complex-word-identification-task |
Repo | |
Framework | |
Using Rhetorical Structure Theory for Detection of Fake Online Reviews
Title | Using Rhetorical Structure Theory for Detection of Fake Online Reviews |
Authors | Olu Popoola |
Abstract | |
Tasks | Deception Detection |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3608/ |
https://www.aclweb.org/anthology/W17-3608 | |
PWC | https://paperswithcode.com/paper/using-rhetorical-structure-theory-for |
Repo | |
Framework | |
Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries
Title | Known Strangers: Cross Linguistic Patterns in Multilingual Multidirectional Dictionaries |
Authors | Rejitha K. S, Rajesha N. |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/W17-7521/ |
https://www.aclweb.org/anthology/W17-7521 | |
PWC | https://paperswithcode.com/paper/known-strangers-cross-linguistic-patterns-in |
Repo | |
Framework | |
TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news
Title | TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news |
Authors | Leon Rotim, Martin Tutek, Jan Šnajder |
Abstract | This paper describes our system for fine-grained sentiment scoring of news headlines submitted to SemEval 2017 task 5–subtask 2. Our system uses a feature-light method that consists of a Support Vector Regression (SVR) with various kernels and word vectors as features. Our best-performing submission scored 3rd on the task out of 29 teams and 4th out of 45 submissions with a cosine score of 0.733. |
Tasks | Feature Engineering, Sentiment Analysis, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2148/ |
https://www.aclweb.org/anthology/S17-2148 | |
PWC | https://paperswithcode.com/paper/takelab-at-semeval-2017-task-5-linear |
Repo | |
Framework | |
BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Title | BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs |
Authors | Mathieu Cliche |
Abstract | In this paper we describe our attempt at producing a state-of-the-art Twitter sentiment classifier using Convolutional Neural Networks (CNNs) and Long Short Term Memory (LSTMs) networks. Our system leverages a large amount of unlabeled data to pre-train word embeddings. We then use a subset of the unlabeled data to fine-tune the embeddings using distant supervision. The final CNNs and LSTMs are trained on the SemEval-2017 Twitter dataset where the embeddings are fine-tuned again. To boost performance we ensemble several CNNs and LSTMs together. Our approach achieved first rank on all of the five English subtasks amongst 40 teams. |
Tasks | Sentiment Analysis, Twitter Sentiment Analysis, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2094/ |
https://www.aclweb.org/anthology/S17-2094 | |
PWC | https://paperswithcode.com/paper/bb_twtr-at-semeval-2017-task-4-twitter-1 |
Repo | |
Framework | |
An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Title | An Entity Resolution Approach to Isolate Instances of Human Trafficking Online |
Authors | Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski |
Abstract | Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as 'escort advertisements' on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities. |
Tasks | Entity Resolution |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4411/ |
https://www.aclweb.org/anthology/W17-4411 | |
PWC | https://paperswithcode.com/paper/an-entity-resolution-approach-to-isolate |
Repo | |
Framework | |
Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters
Title | Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters |
Authors | Georg Rehm, Julian Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Jan Nehring, Armin Berger, Luca König, Sören Räuchle, Jens Gerth |
Abstract | We present an approach to identifying a specific class of events, movement action events (MAEs), in a data set that consists of ca. 2,800 personal letters exchanged by the German architect Erich Mendelsohn and his wife, Luise. A backend system uses these and other semantic analysis results as input for an authoring environment that digital curators can use to produce new pieces of digital content. In our example case, the human expert will receive recommendations from the system with the goal of putting together a travelogue, i.e., a description of the trips and journeys undertaken by the couple. We describe the components and architecture and also apply the system to news data. |
Tasks | Entity Linking, Machine Translation, Named Entity Recognition |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2707/ |
https://www.aclweb.org/anthology/W17-2707 | |
PWC | https://paperswithcode.com/paper/event-detection-and-semantic-storytelling |
Repo | |
Framework | |
Computational analysis of Gondi dialects
Title | Computational analysis of Gondi dialects |
Authors | Taraka Rama, Çağrı Çöltekin, Pavel Sofroniev |
Abstract | This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group. |
Tasks | Word Alignment |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1203/ |
https://www.aclweb.org/anthology/W17-1203 | |
PWC | https://paperswithcode.com/paper/computational-analysis-of-gondi-dialects |
Repo | |
Framework | |
Experiments in Non-Coherent Post-editing
Title | Experiments in Non-Coherent Post-editing |
Authors | Cristina Toledo Báez, Moritz Schaeffer, Michael Carl |
Abstract | Market pressure on translation productivity joined with technological innovation is likely to fragment and decontextualise translation jobs even more than is currently the case. Many different translators increasingly work on one document at different places, collaboratively working in the cloud. This paper investigates the effect of decontextualised source texts on behaviour by comparing post-editing of sequentially ordered sentences with shuffled sentences from two different texts. The findings suggest that there is little or no effect of the decontextualised source texts on behaviour. |
Tasks | Active Learning, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7902/ |
https://doi.org/10.26615/978-954-452-042-7_002 | |
PWC | https://paperswithcode.com/paper/experiments-in-non-coherent-post-editing |
Repo | |
Framework | |
多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]
Title | 多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese] |
Authors | Shih-Kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung |
Abstract | |
Tasks | Denoising, Speech Enhancement |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1009/ |
https://www.aclweb.org/anthology/O17-1009 | |
PWC | https://paperswithcode.com/paper/a-e-e-a1e-c-ea14eaaeaac-c14a |
Repo | |
Framework | |
Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
Title | Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus |
Authors | Ronja Laarmann-Quante, Katrin Ortmann, Anna Ehlert, Maurice Vogel, Stefanie Dipper |
Abstract | NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and is thereby tailored to research and tool development for orthographic issues in primary school. While for most corpora, transcription and target hypothesis are not evaluated, we conducted a detailed inter-annotator agreement study for both tasks. Although we achieved high agreement, our discussion of cases of disagreement shows that even with detailed guidelines, annotators differ here and there for different reasons, which should also be considered when working with transcriptions and target hypotheses of other corpora, especially if no explicit guidelines for their construction are known. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5051/ |
https://www.aclweb.org/anthology/W17-5051 | |
PWC | https://paperswithcode.com/paper/annotating-orthographic-target-hypotheses-in |
Repo | |
Framework | |
Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking
Title | Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking |
Authors | David M. Howcroft, Vera Demberg |
Abstract | While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need of sentence-level readability measures. Much of psycholinguistics has focused for many years on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures: idea density, surprisal, integration cost, and embedding depth to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance by up to 3 percentage points over a standard document-level readability metric baseline. |
Tasks | Information Retrieval, Text Generation, Text Simplification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1090/ |
https://www.aclweb.org/anthology/E17-1090 | |
PWC | https://paperswithcode.com/paper/psycholinguistic-models-of-sentence |
Repo | |
Framework | |
Indexicals as Weak Descriptors
Title | Indexicals as Weak Descriptors |
Authors | Andy Lücking |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7102/ |
https://www.aclweb.org/anthology/W17-7102 | |
PWC | https://paperswithcode.com/paper/indexicals-as-weak-descriptors |
Repo | |
Framework | |
Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels
Title | Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels |
Authors | Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Niklas Elmqvist, Leah Findlater |
Abstract | Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against automatically generated labels. Our basis of comparison is participant ratings of how well labels describe documents from the topic. Our study has two phases: a labeling phase where participants label visualized topics and a validation phase where different participants select which labels best describe the topics' documents. Although all visualizations produce similar quality labels, simple visualizations such as word lists allow participants to quickly understand topics, while complex visualizations take longer but expose multi-word expressions that simpler visualizations obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled topics highlights linguistic patterns (e.g., hypernyms, phrases) that can be used to improve automatic topic labeling algorithms. |
Tasks | Topic Models |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1001/ |
https://www.aclweb.org/anthology/Q17-1001 | |
PWC | https://paperswithcode.com/paper/evaluating-visual-representations-for-topic |
Repo | |
Framework | |
"PageRank" for Argument Relevance
Title | "PageRank" for Argument Relevance |
Authors | Henning Wachsmuth, Benno Stein, Yamen Ajjour |
Abstract | Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now in the focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument's conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with human average judgments. We outline what natural language challenges must be faced at web scale in order to stepwise bring argument relevance to web search engines. |
Tasks | Argument Mining |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1105/ |
https://www.aclweb.org/anthology/E17-1105 | |
PWC | https://paperswithcode.com/paper/pagerank-for-argument-relevance |
Repo | |
Framework | |