July 26, 2019

Paper Group NANR 4

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin. HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods. INF-UFRGS at SemEva …

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style

Title Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style
Authors Ben Verhoeven, Iza Škrjanec, Senja Pollak
Abstract We present results of what are, to our knowledge, the first gender classification experiments on Slovene text. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text, comparing a token-based and a lemma-based approach. We find that the token-based approach (92.6% accuracy), which retains gender markings related to the author, outperforms the lemma-based approach by about 5%. Especially in the lemmatized version, we also observe stylistic and content-based differences in writing between men (e.g. more profane language, numerals and beer mentions) and women (e.g. more pronouns, emoticons and character flooding). Many of our findings corroborate previous research on other languages.
Tasks Lemmatization
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1418/
PDF https://www.aclweb.org/anthology/W17-1418
PWC https://paperswithcode.com/paper/gender-profiling-for-slovene-twitter
Repo
Framework
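
The abstract above contrasts token-based and lemma-based features. As a rough illustration of why lemmatization can erase author gender marking in Slovene, here is a minimal sketch. The lemma lookup `TOY_LEMMAS` and both example sentences are assumptions made up for this illustration; they are not taken from the Janes corpus or from the authors' system.

```python
from collections import Counter

# Hypothetical toy lemma lookup; a real system would use a Slovene lemmatizer.
TOY_LEMMAS = {
    "bil": "biti", "bila": "biti", "sem": "biti",
    "vesel": "vesel", "vesela": "vesel",
}

def token_features(text):
    """Bag of lowercased surface tokens (gender endings are kept)."""
    return Counter(text.lower().split())

def lemma_features(text):
    """Bag of lemmas; tokens missing from the toy lookup map to themselves."""
    return Counter(TOY_LEMMAS.get(tok, tok) for tok in text.lower().split())

masculine = "Bil sem vesel"    # "I was happy", written by a male author
feminine = "Bila sem vesela"   # the same sentence, written by a female author

# Surface tokens differ between the two authors, lemmas do not.
print(token_features(masculine) == token_features(feminine))
print(lemma_features(masculine) == lemma_features(feminine))
```

In this toy case the token features separate the two sentences while the lemma features are identical, which is the effect the abstract attributes to gender markings surviving only in the token-based representation.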

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

Title Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin
Authors Géraldine Walther, Benoît Sagot
Abstract In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are meant to improve the speed and reliability of corpus annotations for noisy data involving large amounts of code-switching, occurrences of child-speech and orthographic noise. Being able to increase the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.
Tasks Language Acquisition, Spelling Correction
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2212/
PDF https://www.aclweb.org/anthology/W17-2212
PWC https://paperswithcode.com/paper/speeding-up-corpus-development-for-linguistic
Repo
Framework

HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods

Title HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods
Authors Tobias Cabanski, Julia Romberg, Stefan Conrad
Abstract In this paper, a system for solving SemEval-2017 Task 5 is presented. The task is divided into two tracks, in which the sentiment of microblog messages and news headlines has to be predicted. Since two submissions were allowed, two different machine learning methods were developed to solve this task: a support vector machine approach and a recurrent neural network approach. To feed in data for these approaches, different feature extraction methods are used, mainly word representations and lexica. The best submissions for both tracks are provided by the recurrent neural network, which achieves an F1-score of 0.729 in track 1 and 0.702 in track 2.
Tasks Feature Selection, Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2141/
PDF https://www.aclweb.org/anthology/S17-2141
PWC https://paperswithcode.com/paper/hhu-at-semeval-2017-task-5-fine-grained
Repo
Framework

INF-UFRGS at SemEval-2017 Task 5: A Supervised Identification of Sentiment Score in Tweets and Headlines

Title INF-UFRGS at SemEval-2017 Task 5: A Supervised Identification of Sentiment Score in Tweets and Headlines
Authors Tiago Zini, Karin Becker, Marcelo Dias
Abstract This paper describes a supervised solution for detecting the polarity scores of tweets or news headlines in the financial domain, submitted to the SemEval 2017 Fine-Grained Sentiment Analysis on Financial Microblogs and News Task. The premise is that it is possible to understand market reaction to a company's stock by measuring the positive/negative sentiment contained in financial tweets and news headlines, where polarity is measured on a continuous scale ranging from -1.0 (very bearish) to 1.0 (very bullish). Our system receives as input the textual content of tweets or news headlines, together with their ids, stock cashtag or name of the target company, and the gold-standard polarity score for the training dataset. Our solution extracts features from these text instances using n-grams, hashtags, sentiment scores calculated by external APIs, and other features to train a regression model capable of detecting the continuous score of these sentiments with precision.
Tasks Opinion Mining, Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2142/
PDF https://www.aclweb.org/anthology/S17-2142
PWC https://paperswithcode.com/paper/inf-ufrgs-at-semeval-2017-task-5-a-supervised
Repo
Framework
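
The INF-UFRGS abstract above describes training a regression model on n-gram and other features to predict a score between -1.0 and 1.0. The following is a minimal sketch of that general idea using only word n-grams and plain stochastic gradient descent. The function names, the cashtags and the gold scores are all hypothetical, and this is not the authors' feature set or model.

```python
def ngrams(text, n_max=2):
    """Return word unigrams and bigrams of a lowercased text."""
    toks = text.lower().split()
    return [" ".join(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)]

def train(examples, epochs=200, lr=0.1):
    """Fit a linear regressor on binary n-gram features with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, gold in examples:
            feats = set(ngrams(text))
            pred = bias + sum(weights.get(f, 0.0) for f in feats)
            err = pred - gold
            bias -= lr * err
            for f in feats:
                weights[f] = weights.get(f, 0.0) - lr * err
    return weights, bias

def predict(model, text):
    """Score a text and clip the result to the task's [-1.0, 1.0] range."""
    weights, bias = model
    score = bias + sum(weights.get(f, 0.0) for f in set(ngrams(text)))
    return max(-1.0, min(1.0, score))

# Made-up training headlines with made-up gold polarity scores.
examples = [
    ("$AAA shares surge on strong earnings", 0.8),
    ("$BBB shares plunge after profit warning", -0.7),
    ("$CCC stock steady ahead of results", 0.0),
]
model = train(examples)
print(predict(model, "$AAA shares surge on strong earnings"))
print(predict(model, "$BBB shares plunge after profit warning"))
```

Clipping in `predict` keeps every output inside the bearish-to-bullish scale described in the abstract, whatever the linear model produces for unseen text.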

Building Large Chinese Corpus for Spoken Dialogue Research in Specific Domains

Title Building Large Chinese Corpus for Spoken Dialogue Research in Specific Domains
Authors Changliang Li, Xiuying Wang
Abstract Corpora are a valuable resource for information retrieval and data-driven natural language processing systems, especially for spoken dialogue research in specific domains. However, there are few non-English corpora, particularly Chinese ones. Spoken by the nation with the largest population in the world, Chinese is becoming increasingly prevalent and popular among millions of people worldwide. In this paper, we build a large-scale, high-quality Chinese corpus called CSDC (Chinese Spoken Dialogue Corpus). It covers five domains and contains more than 140 thousand dialogues in all. Unlike other corpora, each sentence in this corpus is additionally annotated with slot information. To the best of our knowledge, this is the largest Chinese spoken dialogue corpus, as well as the first one with slot information. Using this corpus, we propose a method and conduct an experiment, and we report the indicative results at the end.
Tasks Information Retrieval
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2054/
PDF https://www.aclweb.org/anthology/I17-2054
PWC https://paperswithcode.com/paper/building-large-chinese-corpus-for-spoken
Repo
Framework

Measuring Topic Coherence through Optimal Word Buckets

Title Measuring Topic Coherence through Optimal Word Buckets
Authors Nitin Ramrakhiyani, Sachin Pawar, Swapnil Hingmire, Girish Palshikar
Abstract Measuring topic quality is essential for scoring the learned topics and their subsequent use in Information Retrieval and Text classification. To measure quality of Latent Dirichlet Allocation (LDA) based topics learned from text, we propose a novel approach based on grouping of topic words into buckets (TBuckets). A single large bucket signifies a single coherent theme, in turn indicating high topic coherence. TBuckets uses word embeddings of topic words and employs singular value decomposition (SVD) and Integer Linear Programming based optimization to create coherent word buckets. TBuckets outperforms the state-of-the-art techniques when evaluated using 3 publicly available datasets and on another one proposed in this paper.
Tasks Information Retrieval, Text Classification, Topic Models, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2070/
PDF https://www.aclweb.org/anthology/E17-2070
PWC https://paperswithcode.com/paper/measuring-topic-coherence-through-optimal
Repo
Framework
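
The TBuckets abstract above scores a topic by how much of it falls into one large bucket of related words. The sketch below illustrates only that scoring intuition. It swaps in a simple greedy cosine-similarity grouping instead of the SVD and Integer Linear Programming steps the paper describes, and the 2-dimensional vectors in `TOY_VECTORS` and the `threshold` value are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_buckets(words, vectors, threshold=0.7):
    """Put each word in the first bucket whose seed word is similar enough."""
    buckets = []
    for w in words:
        for bucket in buckets:
            if cosine(vectors[w], vectors[bucket[0]]) >= threshold:
                bucket.append(w)
                break
        else:
            buckets.append([w])  # no bucket matched: w seeds a new one
    return buckets

def coherence(words, vectors, threshold=0.7):
    """Fraction of topic words that land in the largest bucket."""
    buckets = greedy_buckets(words, vectors, threshold)
    return max(len(b) for b in buckets) / len(words)

# Hypothetical toy embeddings, not real word vectors.
TOY_VECTORS = {
    "goal": (0.9, 0.1), "match": (0.8, 0.2),
    "team": (0.85, 0.15), "tax": (0.1, 0.9),
}
print(coherence(["goal", "match", "team"], TOY_VECTORS))
print(coherence(["goal", "match", "tax"], TOY_VECTORS))
```

A topic whose words all share one bucket gets the maximum score, while an off-theme word splits off into its own bucket and lowers it.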

PP Attachment: Where do We Stand?

Title PP Attachment: Where do We Stand?
Authors Daniël de Kok, Jianqiang Ma, Corina Dima, Erhard Hinrichs
Abstract Prepositional phrase (PP) attachment is a well-known challenge to parsing. In this paper, we combine the insights of different works, namely: (1) treating PP attachment as a classification task with an arbitrary number of attachment candidates; (2) using auxiliary distributions to augment the data beyond the hand-annotated training set; (3) using topological fields to get information about the distribution of PP attachment throughout clauses and (4) using state-of-the-art techniques such as word embeddings and neural networks. We show that jointly using these techniques leads to substantial improvements. We also conduct a qualitative analysis to gauge where the ceiling of the task is in a realistic setup.
Tasks Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2050/
PDF https://www.aclweb.org/anthology/E17-2050
PWC https://paperswithcode.com/paper/pp-attachment-where-do-we-stand
Repo
Framework

Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms

Title Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms
Authors Shulin Liu, Yubo Chen, Kang Liu, Jun Zhao
Abstract This paper tackles the task of event detection (ED), which involves identifying and categorizing events. We argue that arguments provide significant clues to this task, but they are either completely ignored or exploited in an indirect manner in existing detection approaches. In this work, we propose to exploit argument information explicitly for ED via supervised attention mechanisms. Specifically, we systematically investigate the proposed model under the supervision of different attention strategies. Experimental results show that our approach advances the state of the art and achieves the best F1 score on the ACE 2005 dataset.
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1164/
PDF https://www.aclweb.org/anthology/P17-1164
PWC https://paperswithcode.com/paper/exploiting-argument-information-to-improve
Repo
Framework

Deep Learning in Lexical Analysis and Parsing

Title Deep Learning in Lexical Analysis and Parsing
Authors Wanxiang Che, Yue Zhang
Abstract Neural networks, also known by the fancier name deep learning, can overcome the "feature engineering" problem. In theory, they can use non-linear activation functions and multiple layers to automatically find useful features. Novel network structures, such as convolutional or recurrent ones, help to reduce the difficulty further. These deep learning models have been successfully used for lexical analysis and parsing. In this tutorial, we will give a review of each line of work, contrasting it with traditional statistical methods and organizing the material in a consistent order.
Tasks Dependency Parsing, Feature Engineering, Lexical Analysis, Part-Of-Speech Tagging, Structured Prediction
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-5001/
PDF https://www.aclweb.org/anthology/I17-5001
PWC https://paperswithcode.com/paper/deep-learning-in-lexical-analysis-and-parsing
Repo
Framework

UIT-DANGNT-CLNLP at SemEval-2017 Task 9: Building Scientific Concept Fixing Patterns for Improving CAMR

Title UIT-DANGNT-CLNLP at SemEval-2017 Task 9: Building Scientific Concept Fixing Patterns for Improving CAMR
Authors Khoa Nguyen, Dang Nguyen
Abstract This paper describes the improvements that we have applied to the CAMR baseline parser (Wang et al., 2016) at Task 8 of SemEval-2016. Our objective is to increase the accuracy of CAMR when parsing sentences from scientific articles, especially articles in the biology domain. To achieve this goal, we built two wrapper layers for CAMR. The first layer, which covers the input data, normalizes the input sentences and adds necessary information to them so that the input dependency parser and the aligner better handle reference citations, scientific figures, formulas, etc. The second layer, which covers the output data, modifies and standardizes the output based on a list of scientific concept fixing patterns. This helps CAMR better handle biological concepts that are not in the training dataset. Finally, after applying our approach, CAMR scored 0.65 F-score on the test set of the Biomedical training data and 0.61 F-score on the official blind test dataset.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2156/
PDF https://www.aclweb.org/anthology/S17-2156
PWC https://paperswithcode.com/paper/uit-dangnt-clnlp-at-semeval-2017-task-9
Repo
Framework

The Projector: An Interactive Annotation Projection Visualization Tool

Title The Projector: An Interactive Annotation Projection Visualization Tool
Authors Alan Akbik, Roland Vollgraf
Abstract Previous works proposed annotation projection in parallel corpora to inexpensively generate treebanks or propbanks for new languages. In this approach, linguistic annotation is automatically transferred from a resource-rich source language (SL) to translations in a target language (TL). However, annotation projection may be adversely affected by translational divergences between specific language pairs. For this reason, previous work often required careful qualitative analysis of projectability of specific annotation in order to define strategies to address quality and coverage issues. In this demonstration, we present THE PROJECTOR, an interactive GUI designed to assist researchers in such analysis: it allows users to execute and visually inspect annotation projection in a range of different settings. We give an overview of the GUI, discuss use cases and illustrate how the tool can facilitate discussions with the research community.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-2008/
PDF https://www.aclweb.org/anthology/D17-2008
PWC https://paperswithcode.com/paper/the-projector-an-interactive-annotation
Repo
Framework
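
The abstract for THE PROJECTOR above describes annotation projection: labels are transferred from source-language tokens to aligned target-language tokens. A minimal sketch of that transfer step follows. The `project` helper, the sentence pair, the part-of-speech labels and the hand-written word alignment are assumptions for illustration only and have nothing to do with the tool's actual implementation.

```python
def project(source_labels, alignment, target_len, default="O"):
    """Copy each source token's label to the target token it aligns to.

    alignment is a list of (source_index, target_index) pairs;
    unaligned target tokens keep the default label.
    """
    target_labels = [default] * target_len
    for src_idx, tgt_idx in alignment:
        target_labels[tgt_idx] = source_labels[src_idx]
    return target_labels

source = ["She", "likes", "swimming"]
source_labels = ["PRON", "VERB", "VERB"]
target = ["Sie", "schwimmt", "gern"]
# Hand-written alignment: She-Sie, swimming-schwimmt, likes-gern.
alignment = [(0, 0), (2, 1), (1, 2)]

projected = project(source_labels, alignment, len(target))
for token, label in zip(target, projected):
    print(token, label)
```

Here the German adverb "gern" inherits the verb label of English "likes", a small instance of the translational divergence that the abstract says can adversely affect projection and that calls for the kind of qualitative inspection the tool supports.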

FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers

Title FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers
Authors Simon Mille, Roberto Carlini, Alicia Burga, Leo Wanner
Abstract We present the contribution of Universitat Pompeu Fabra's NLP group to the SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2158/
PDF https://www.aclweb.org/anthology/S17-2158
PWC https://paperswithcode.com/paper/forge-at-semeval-2017-task-9-deep-sentence
Repo
Framework

Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing

Title Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing
Authors Mika Koistinen, Kimmo Kettunen, Tuula Pääkkönen
Abstract
Tasks Boundary Detection, Information Retrieval, Machine Translation, Named Entity Recognition, Optical Character Recognition, Tokenization
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0238/
PDF https://www.aclweb.org/anthology/W17-0238
PWC https://paperswithcode.com/paper/improving-optical-character-recognition-of
Repo
Framework

MEANT 2.0: Accurate semantic MT evaluation for any output language

Title MEANT 2.0: Accurate semantic MT evaluation for any output language
Authors Chi-kiu Lo
Abstract
Tasks Machine Translation, Semantic Role Labeling, Word Embeddings
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4767/
PDF https://www.aclweb.org/anthology/W17-4767
PWC https://paperswithcode.com/paper/meant-20-accurate-semantic-mt-evaluation-for
Repo
Framework

RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation

Title RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation
Authors Normunds Gruzitis, Didzis Gosko, Guntis Barzdins
Abstract By addressing both text-to-AMR parsing and AMR-to-text generation, SemEval-2017 Task 9 established AMR as a powerful semantic interlingua. We strengthen the interlingual aspect of AMR by applying the multilingual Grammatical Framework (GF) for AMR-to-text generation. Our current rule-based GF approach completely covered only 12.3% of the test AMRs; therefore, we combined it with the state-of-the-art JAMR Generator to see whether the combination increases or decreases the overall performance. The combined system achieved an automatic BLEU score of 18.82 and a human Trueskill score of 107.2, to be compared to the plain JAMR Generator results. As for AMR parsing, we added NER extensions to our SemEval-2016 general-domain AMR parser to handle the biomedical genre, which is rich in organic compound names, achieving Smatch F1=54.0%.
Tasks Amr Parsing, Text Generation
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2159/
PDF https://www.aclweb.org/anthology/S17-2159
PWC https://paperswithcode.com/paper/rigotrio-at-semeval-2017-task-9-combining
Repo
Framework