July 26, 2019


Paper Group NANR 4

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style


Title	Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style
Authors	Ben Verhoeven, Iza Škrjanec, Senja Pollak
Abstract	We present the results of what are, to our knowledge, the first gender classification experiments on Slovene text. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text, comparing a token-based and a lemma-based approach. We find that the token-based approach (92.6% accuracy), which retains gender markings related to the author, outperforms the lemma-based approach by about 5%. Especially in the lemmatized version, we also observe stylistic and content-based differences in writing between men (e.g. more profane language, numerals and beer mentions) and women (e.g. more pronouns, emoticons and character flooding). Many of our findings corroborate previous research on other languages.
Tasks	Lemmatization
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1418/
PDF	https://www.aclweb.org/anthology/W17-1418
PWC	https://paperswithcode.com/paper/gender-profiling-for-slovene-twitter
Repo
Framework
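
The gap between the token-based and lemma-based approaches can be illustrated with a minimal sketch. Slovene past participles and adjectives agree with the subject's gender (for example "bil" vs. "bila"), so a first-person tweet can reveal the author's gender at the token level, while lemmatization maps both forms to the same lemma. The tiny lemma table and the two example sentences below are hypothetical and are not taken from the Janes corpus or from the authors' code.

```python
# Hypothetical toy lemma table; a real system would use a Slovene lemmatizer.
TOY_LEMMAS = {
    "sem": "biti",
    "bil": "biti",
    "bila": "biti",
    "vesel": "vesel",
    "vesela": "vesel",
}


def tokens(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def lemmas(text):
    """Map each token to its lemma, falling back to the token itself."""
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens(text)]


masculine = "Bil sem vesel"    # "I was happy", male speaker
feminine = "Bila sem vesela"   # "I was happy", female speaker

# Token features keep the gender-marked endings, lemma features do not.
token_features_differ = set(tokens(masculine)) != set(tokens(feminine))
lemma_features_differ = set(lemmas(masculine)) != set(lemmas(feminine))

print(token_features_differ)
print(lemma_features_differ)
```

In this toy case the token sets differ while the lemma sets coincide, which is consistent with the abstract's observation that the token-based features carry gender markings related to the author.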

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin


Title	Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin
Authors	Géraldine Walther, Benoît Sagot
Abstract	In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are meant to improve the speed and reliability of corpus annotations for noisy data involving large amounts of code-switching, occurrences of child-speech and orthographic noise. Being able to increase the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.
Tasks	Language Acquisition, Spelling Correction
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2212/
PDF	https://www.aclweb.org/anthology/W17-2212
PWC	https://paperswithcode.com/paper/speeding-up-corpus-development-for-linguistic
Repo
Framework

HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods


Title	HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods
Authors	Tobias Cabanski, Julia Romberg, Stefan Conrad
Abstract	In this paper, a system for solving SemEval-2017 Task 5 is presented. The task is divided into two tracks, in which the sentiment of microblog messages and of news headlines has to be predicted. Since two submissions were allowed, two different machine learning methods were developed to solve this task: a support vector machine approach and a recurrent neural network approach. To feed data into these approaches, different feature extraction methods are used, mainly word representations and lexica. The best submissions for both tracks are provided by the recurrent neural network, which achieves an F1-score of 0.729 in track 1 and 0.702 in track 2.
Tasks	Feature Selection, Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2141/
PDF	https://www.aclweb.org/anthology/S17-2141
PWC	https://paperswithcode.com/paper/hhu-at-semeval-2017-task-5-fine-grained
Repo
Framework

INF-UFRGS at SemEval-2017 Task 5: A Supervised Identification of Sentiment Score in Tweets and Headlines


Title	INF-UFRGS at SemEval-2017 Task 5: A Supervised Identification of Sentiment Score in Tweets and Headlines
Authors	Tiago Zini, Karin Becker, Marcelo Dias
Abstract	This paper describes a supervised solution for detecting the polarity scores of tweets or news headlines in the financial domain, submitted to the SemEval 2017 Fine-Grained Sentiment Analysis on Financial Microblogs and News Task. The premise is that it is possible to understand market reaction to a company's stock by measuring the positive/negative sentiment contained in financial tweets and news headlines, where polarity is measured on a continuous scale ranging from -1.0 (very bearish) to 1.0 (very bullish). Our system receives as input the textual content of tweets or news headlines, together with their ids, the stock cashtag or name of the target company, and the gold-standard polarity score for the training dataset. Our solution extracts features from these text instances using n-grams, hashtags, sentiment scores calculated by external APIs, and other features to train a regression model capable of predicting the continuous sentiment score with precision.
Tasks	Opinion Mining, Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2142/
PDF	https://www.aclweb.org/anthology/S17-2142
PWC	https://paperswithcode.com/paper/inf-ufrgs-at-semeval-2017-task-5-a-supervised
Repo
Framework
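
The feature types named in the abstract (n-grams, hashtags, and the stock cashtag) can be sketched as a small extraction function. This is only an illustration under stated assumptions: the function names, the regular expressions, the feature-name prefixes, and the example tweet are hypothetical, and the external sentiment APIs and the regression model itself are left out.

```python
import re
from collections import Counter


def extract_features(text, n_max=2):
    """Return a bag of word n-grams plus hashtag and cashtag counts."""
    lowered = text.lower()
    hashtags = re.findall(r"#\w+", lowered)
    cashtags = re.findall(r"\$[a-z]+", lowered)
    # Plain words only: skip anything that is part of a hashtag or cashtag.
    words = re.findall(r"(?<![#$\w])[a-z']+", lowered)

    features = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            features["ngram=" + " ".join(words[i:i + n])] += 1
    for tag in hashtags:
        features["hashtag=" + tag] += 1
    for tag in cashtags:
        features["cashtag=" + tag] += 1
    return features


def clip_polarity(score):
    """Keep a predicted score inside the task's [-1.0, 1.0] range."""
    return max(-1.0, min(1.0, score))


# Hypothetical example tweet, not from the task data.
features = extract_features("$AAPL looks strong #bullish")
print(sorted(features))
```

A regression model trained on such feature counts may produce values outside the scale described in the abstract, so `clip_polarity` shows one simple way to map predictions back into the range from -1.0 (very bearish) to 1.0 (very bullish).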

Building Large Chinese Corpus for Spoken Dialogue Research in Specific Domains


Title	Building Large Chinese Corpus for Spoken Dialogue Research in Specific Domains
Authors	Changliang Li, Xiuying Wang
Abstract	Corpora are a valuable resource for information retrieval and data-driven natural language processing systems, especially for spoken dialogue research in specific domains. However, there are few non-English corpora, particularly in Chinese. Spoken by the nation with the largest population in the world, Chinese has become increasingly prevalent and popular among millions of people worldwide. In this paper, we build a large-scale, high-quality Chinese corpus called CSDC (Chinese Spoken Dialogue Corpus). It contains five domains and more than 140 thousand dialogues in all. Unlike other corpora, each sentence in this corpus is additionally annotated with slot information. To the best of our knowledge, this is the largest Chinese spoken dialogue corpus, as well as the first one with slot information. With this corpus, we propose a method and conduct a well-designed experiment, and we report indicative results at the end.
Tasks	Information Retrieval
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2054/
PDF	https://www.aclweb.org/anthology/I17-2054
PWC	https://paperswithcode.com/paper/building-large-chinese-corpus-for-spoken
Repo
Framework

Measuring Topic Coherence through Optimal Word Buckets


Title	Measuring Topic Coherence through Optimal Word Buckets
Authors	Nitin Ramrakhiyani, Sachin Pawar, Swapnil Hingmire, Girish Palshikar
Abstract	Measuring topic quality is essential for scoring the learned topics and their subsequent use in Information Retrieval and Text classification. To measure quality of Latent Dirichlet Allocation (LDA) based topics learned from text, we propose a novel approach based on grouping of topic words into buckets (TBuckets). A single large bucket signifies a single coherent theme, in turn indicating high topic coherence. TBuckets uses word embeddings of topic words and employs singular value decomposition (SVD) and Integer Linear Programming based optimization to create coherent word buckets. TBuckets outperforms the state-of-the-art techniques when evaluated using 3 publicly available datasets and on another one proposed in this paper.
Tasks	Information Retrieval, Text Classification, Topic Models, Word Embeddings
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-2070/
PDF	https://www.aclweb.org/anthology/E17-2070
PWC	https://paperswithcode.com/paper/measuring-topic-coherence-through-optimal
Repo
Framework
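
The idea that "a single large bucket signifies a single coherent theme" can be illustrated with a deliberately simplified sketch. It does not implement TBuckets: in place of the SVD and Integer Linear Programming steps named in the abstract, it swaps in a greedy cosine-similarity threshold, and the two-dimensional embeddings, the threshold value, and the scoring function are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_buckets(words, embeddings, threshold=0.8):
    """Put each word into the first bucket whose seed word is similar enough."""
    buckets = []
    for word in words:
        for bucket in buckets:
            if cosine(embeddings[word], embeddings[bucket[0]]) >= threshold:
                bucket.append(word)
                break
        else:
            # No existing bucket fits, so this word seeds a new one.
            buckets.append([word])
    return buckets


def coherence(buckets):
    """Share of topic words that fall into the largest bucket."""
    total = sum(len(b) for b in buckets)
    return max(len(b) for b in buckets) / total if total else 0.0


# Hypothetical toy embeddings for the top words of one topic.
TOY_EMBEDDINGS = {
    "goal": (1.0, 0.1),
    "match": (0.9, 0.2),
    "league": (0.95, 0.05),
    "tax": (0.1, 1.0),
}
topic_words = ["goal", "match", "league", "tax"]

buckets = greedy_buckets(topic_words, TOY_EMBEDDINGS)
score = coherence(buckets)
print(buckets, score)
```

Here three of the four words share a bucket and "tax" forms its own, so the toy score is lower than it would be for a topic whose words all land in one bucket.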

PP Attachment: Where do We Stand?


Title	PP Attachment: Where do We Stand?
Authors	Daniël de Kok, Jianqiang Ma, Corina Dima, Erhard Hinrichs
Abstract	Prepositional phrase (PP) attachment is a well-known challenge to parsing. In this paper, we combine the insights of different works, namely: (1) treating PP attachment as a classification task with an arbitrary number of attachment candidates; (2) using auxiliary distributions to augment the data beyond the hand-annotated training set; (3) using topological fields to get information about the distribution of PP attachment throughout clauses and (4) using state-of-the-art techniques such as word embeddings and neural networks. We show that jointly using these techniques leads to substantial improvements. We also conduct a qualitative analysis to gauge where the ceiling of the task is in a realistic setup.
Tasks	Word Embeddings
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-2050/
PDF	https://www.aclweb.org/anthology/E17-2050
PWC	https://paperswithcode.com/paper/pp-attachment-where-do-we-stand
Repo
Framework

Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms


Title	Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms
Authors	Shulin Liu, Yubo Chen, Kang Liu, Jun Zhao
Abstract	This paper tackles the task of event detection (ED), which involves identifying and categorizing events. We argue that arguments provide significant clues to this task, but they are either completely ignored or exploited in an indirect manner in existing detection approaches. In this work, we propose to exploit argument information explicitly for ED via supervised attention mechanisms. Specifically, we systematically investigate the proposed model under the supervision of different attention strategies. Experimental results show that our approach advances the state of the art and achieves the best F1 score on the ACE 2005 dataset.
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1164/
PDF	https://www.aclweb.org/anthology/P17-1164
PWC	https://paperswithcode.com/paper/exploiting-argument-information-to-improve
Repo
Framework

Deep Learning in Lexical Analysis and Parsing


Title	Deep Learning in Lexical Analysis and Parsing
Authors	Wanxiang Che, Yue Zhang
Abstract	Neural networks, also known by the fancier name deep learning, can overcome the above "feature engineering" problem. In theory, they can use non-linear activation functions and multiple layers to automatically find useful features. Novel network structures, such as convolutional or recurrent ones, help to reduce the difficulty further. These deep learning models have been successfully used for lexical analysis and parsing. In this tutorial, we will review each line of work, contrasting it with traditional statistical methods and organizing the material in a consistent order.
Tasks	Dependency Parsing, Feature Engineering, Lexical Analysis, Part-Of-Speech Tagging, Structured Prediction
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-5001/
PDF	https://www.aclweb.org/anthology/I17-5001
PWC	https://paperswithcode.com/paper/deep-learning-in-lexical-analysis-and-parsing
Repo
Framework

UIT-DANGNT-CLNLP at SemEval-2017 Task 9: Building Scientific Concept Fixing Patterns for Improving CAMR


Title	UIT-DANGNT-CLNLP at SemEval-2017 Task 9: Building Scientific Concept Fixing Patterns for Improving CAMR
Authors	Khoa Nguyen, Dang Nguyen
Abstract	This paper describes the improvements that we have applied to the CAMR baseline parser (Wang et al., 2016) at Task 8 of SemEval-2016. Our objective is to increase the accuracy of CAMR when parsing sentences from scientific articles, especially articles in the biology domain. To achieve this goal, we built two wrapper layers for CAMR. The first layer, which covers the input data, normalizes the input sentences and adds necessary information to them so that the dependency parser and the aligner better handle reference citations, scientific figures, formulas, etc. The second layer, which covers the output data, modifies and standardizes the output based on a list of scientific concept fixing patterns. This helps CAMR better handle biological concepts that are not in the training dataset. Finally, after applying our approach, CAMR achieved an F-score of 0.65 on the test set of the Biomedical training data and an F-score of 0.61 on the official blind test dataset.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2156/
PDF	https://www.aclweb.org/anthology/S17-2156
PWC	https://paperswithcode.com/paper/uit-dangnt-clnlp-at-semeval-2017-task-9
Repo
Framework
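
The two-wrapper design described in the abstract can be sketched as an input layer and an output layer around an arbitrary parser callable. Everything concrete here is an assumption made for illustration: the citation regex, the single fixing pattern, the stand-in parser, and its output format are hypothetical and do not reflect CAMR's actual interface or the authors' pattern list.

```python
import re

# Hypothetical pattern for parenthetical citations such as "(Smith et al., 2010)".
CITATION = re.compile(r"\s*\([A-Z][A-Za-z]+ et al\., \d{4}\)")

# Hypothetical fixing patterns: (regex over the parser output, replacement).
FIXING_PATTERNS = [
    (re.compile(r"concept:unknown"), "concept:protein"),
]


def preprocess(sentence):
    """Input layer: normalize the sentence before it reaches the parser."""
    return CITATION.sub("", sentence)


def postprocess(parser_output, patterns):
    """Output layer: apply each fixing pattern to the parser output in order."""
    for pattern, replacement in patterns:
        parser_output = pattern.sub(replacement, parser_output)
    return parser_output


def wrapped_parse(sentence, parser, patterns):
    """Run the parser between the input and output wrapper layers."""
    return postprocess(parser(preprocess(sentence)), patterns)


def stub_parser(sentence):
    """Stand-in for a real parser; returns a made-up string format."""
    return "concept:unknown | " + sentence


result = wrapped_parse(
    "BRAF activates MEK (Smith et al., 2010).", stub_parser, FIXING_PATTERNS
)
print(result)
```

Keeping both layers outside the parser means the parser itself stays unchanged, which matches the abstract's description of wrappers built around CAMR rather than modifications to it.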

The Projector: An Interactive Annotation Projection Visualization Tool


Title	The Projector: An Interactive Annotation Projection Visualization Tool
Authors	Alan Akbik, Roland Vollgraf
Abstract	Previous works proposed annotation projection in parallel corpora to inexpensively generate treebanks or propbanks for new languages. In this approach, linguistic annotation is automatically transferred from a resource-rich source language (SL) to translations in a target language (TL). However, annotation projection may be adversely affected by translational divergences between specific language pairs. For this reason, previous work often required careful qualitative analysis of projectability of specific annotation in order to define strategies to address quality and coverage issues. In this demonstration, we present THE PROJECTOR, an interactive GUI designed to assist researchers in such analysis: it allows users to execute and visually inspect annotation projection in a range of different settings. We give an overview of the GUI, discuss use cases and illustrate how the tool can facilitate discussions with the research community.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-2008/
PDF	https://www.aclweb.org/anthology/D17-2008
PWC	https://paperswithcode.com/paper/the-projector-an-interactive-annotation
Repo
Framework

FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers


Title	FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers
Authors	Simon Mille, Roberto Carlini, Alicia Burga, Leo Wanner
Abstract	We present the contribution of Universitat Pompeu Fabra's NLP group to the SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2158/
PDF	https://www.aclweb.org/anthology/S17-2158
PWC	https://paperswithcode.com/paper/forge-at-semeval-2017-task-9-deep-sentence
Repo
Framework

Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing


Title	Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing
Authors	Mika Koistinen, Kimmo Kettunen, Tuula Pääkkönen
Abstract
Tasks	Boundary Detection, Information Retrieval, Machine Translation, Named Entity Recognition, Optical Character Recognition, Tokenization
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0238/
PDF	https://www.aclweb.org/anthology/W17-0238
PWC	https://paperswithcode.com/paper/improving-optical-character-recognition-of
Repo
Framework

MEANT 2.0: Accurate semantic MT evaluation for any output language


Title	MEANT 2.0: Accurate semantic MT evaluation for any output language
Authors	Chi-kiu Lo
Abstract
Tasks	Machine Translation, Semantic Role Labeling, Word Embeddings
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4767/
PDF	https://www.aclweb.org/anthology/W17-4767
PWC	https://paperswithcode.com/paper/meant-20-accurate-semantic-mt-evaluation-for
Repo
Framework

RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation


Title	RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation
Authors	Normunds Gruzitis, Didzis Gosko, Guntis Barzdins
Abstract	By addressing both text-to-AMR parsing and AMR-to-text generation, SemEval-2017 Task 9 established AMR as a powerful semantic interlingua. We strengthen the interlingual aspect of AMR by applying the multilingual Grammatical Framework (GF) for AMR-to-text generation. Our current rule-based GF approach completely covered only 12.3% of the test AMRs, so we combined it with the state-of-the-art JAMR Generator to see whether the combination increases or decreases the overall performance. The combined system achieved an automatic BLEU score of 18.82 and a human Trueskill score of 107.2, to be compared with the plain JAMR Generator results. As for AMR parsing, we added NER extensions to our SemEval-2016 general-domain AMR parser to handle the biomedical genre, which is rich in organic compound names, achieving Smatch F1=54.0%.
Tasks	Amr Parsing, Text Generation
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2159/
PDF	https://www.aclweb.org/anthology/S17-2159
PWC	https://paperswithcode.com/paper/rigotrio-at-semeval-2017-task-9-combining
Repo
Framework
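
One plausible way to combine a low-coverage rule-based generator with a statistical one is a simple fallback, sketched below. The abstract does not state how the two systems were actually combined, so the fallback strategy is an assumption, and the toy generators, their outputs, and the example AMR strings are hypothetical stand-ins rather than GF or JAMR interfaces.

```python
def combine(amr, rule_based, fallback):
    """Prefer the rule-based output; use the fallback when it returns None."""
    sentence = rule_based(amr)
    return sentence if sentence is not None else fallback(amr)


def coverage(amrs, rule_based):
    """Fraction of inputs for which the rule-based generator yields output."""
    covered = sum(1 for amr in amrs if rule_based(amr) is not None)
    return covered / len(amrs) if amrs else 0.0


def toy_rule_based(amr):
    """Stand-in for a grammar-based generator that covers only known inputs."""
    known = {"(w / want-01)": "Someone wants something."}
    return known.get(amr)


def toy_fallback(amr):
    """Stand-in for a statistical generator that always produces something."""
    return "<statistical output for " + amr + ">"


amrs = ["(w / want-01)", "(b / believe-01)"]
outputs = [combine(amr, toy_rule_based, toy_fallback) for amr in amrs]
rule_coverage = coverage(amrs, toy_rule_based)
print(outputs, rule_coverage)
```

Tracking coverage separately, as `coverage` does, makes it possible to report what share of inputs the rule-based component handled on its own, in the spirit of the 12.3% figure quoted in the abstract.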