July 26, 2019

Paper Group NANR 3

SiTAKA at SemEval-2017 Task 4: Sentiment Analysis in Twitter Based on a Rich Set of Features


Title	SiTAKA at SemEval-2017 Task 4: Sentiment Analysis in Twitter Based on a Rich Set of Features
Authors	Mohammed Jabreel, Antonio Moreno
Abstract	This paper describes SiTAKA, our system that has been used in task 4A, English and Arabic languages, Sentiment Analysis in Twitter of SemEval2017. The system proposes the representation of tweets using a novel set of features, which include a bag of negated words and the information provided by some lexicons. The polarity of tweets is determined by a classifier based on a Support Vector Machine. Our system ranks 2nd among 8 systems in the Arabic language tweets and ranks 8th among 38 systems in the English-language tweets.
Tasks	Sentiment Analysis, Twitter Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2115/
PDF	https://www.aclweb.org/anthology/S17-2115
PWC	https://paperswithcode.com/paper/sitaka-at-semeval-2017-task-4-sentiment
Repo
Framework

DUTH at SemEval-2017 Task 4: A Voting Classification Approach for Twitter Sentiment Analysis


Title	DUTH at SemEval-2017 Task 4: A Voting Classification Approach for Twitter Sentiment Analysis
Authors	Symeon Symeonidis, Dimitrios Effrosynidis, John Kordonis, Avi Arampatzis
Abstract	This report describes our participation in SemEval-2017 Task 4: Sentiment Analysis in Twitter, specifically in subtasks A, B, and C. The approach for text sentiment classification is based on a Majority Vote scheme and combines supervised machine learning methods with classical linguistic resources, including bag-of-words and sentiment lexicon features.
Tasks	Information Retrieval, Sentiment Analysis, Twitter Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2117/
PDF	https://www.aclweb.org/anthology/S17-2117
PWC	https://paperswithcode.com/paper/duth-at-semeval-2017-task-4-a-voting
Repo
Framework
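
The Majority Vote scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifiers are combined or how ties are resolved, so the function name `majority_vote`, the label strings, and the tie-breaking rule (prefer the earliest classifier among the tied labels) are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    predictions: one list of labels per classifier, all of equal length.
    Ties are broken in favour of the earliest classifier whose label is
    among the most frequent ones (an assumption, not taken from the paper).
    """
    voted = []
    # zip(*predictions) yields, for each tweet, the labels from all classifiers.
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three classifiers on three tweets.
example_predictions = [
    ["pos", "neg", "neu"],
    ["pos", "pos", "neg"],
    ["neg", "pos", "pos"],
]
example_result = majority_vote(example_predictions)
```

In this toy input the third tweet receives three different labels, so the tie-breaking rule decides the outcome; a real system might instead weight classifiers by validation performance.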

YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter


Title	YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter
Authors	Ming Wang, Biao Chu, Qingxun Liu, Xiaobing Zhou
Abstract	Sentiment analysis is one of the central issues in Natural Language Processing and has become more and more important in many fields. Typical sentiment analysis classifies the sentiment of sentences into several discrete classes (e.g., positive or negative). In this paper we describe our deep learning system (combining GRU and SVM) to solve two-, three- and five-tweet polarity classifications. We first trained a gated recurrent neural network using pre-trained word embeddings, then we extracted features from the GRU layer and input these features into a support vector machine to fulfill both the classification and quantification subtasks. The proposed approach achieved 37th, 19th, and 14th places in subtasks A, B and C, respectively.
Tasks	Sentiment Analysis, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2119/
PDF	https://www.aclweb.org/anthology/S17-2119
PWC	https://paperswithcode.com/paper/ynudlg-at-semeval-2017-task-4-a-gru-svm-model
Repo
Framework

The power of absolute discounting: all-dimensional distribution estimation


Title	The power of absolute discounting: all-dimensional distribution estimation
Authors	Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati
Abstract	Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show that (1) absolute discounting recovers classical minimax KL-risk rates, (2) it is adaptive to an effective dimension rather than the true dimension, (3) it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use power-law distributions as the cornerstone of these results. We validate the theory via synthetic data and an application to the Global Terrorism Database.
Tasks	Language Modelling
Published	2017-12-01
URL	http://papers.nips.cc/paper/7243-the-power-of-absolute-discounting-all-dimensional-distribution-estimation
PDF	http://papers.nips.cc/paper/7243-the-power-of-absolute-discounting-all-dimensional-distribution-estimation.pdf
PWC	https://paperswithcode.com/paper/the-power-of-absolute-discounting-all
Repo
Framework
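
The abstract describes absolute discounting as subtracting a constant from the counts of observed categories and redistributing the freed mass among the unobserved ones. The sketch below shows one simple way to do that. Splitting the freed mass uniformly over unobserved categories, the default discount of 0.5, and the name `absolute_discounting` are assumptions for illustration; the paper's exact estimator may differ.

```python
from collections import Counter


def absolute_discounting(samples, categories, discount=0.5):
    """Estimate a categorical distribution with absolute discounting.

    Each observed category loses `discount` from its count; the freed
    probability mass is shared uniformly among unobserved categories
    (the uniform split is an assumption of this sketch).
    Every sample is assumed to belong to `categories`.
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")

    counts = Counter(samples)
    observed = [c for c in categories if counts[c] > 0]
    unobserved = [c for c in categories if counts[c] == 0]

    if not unobserved:
        # Nothing to redistribute to: fall back to empirical frequencies.
        return {c: counts[c] / n for c in observed}

    freed_mass = discount * len(observed) / n
    probs = {c: (counts[c] - discount) / n for c in observed}
    for c in unobserved:
        probs[c] = freed_mass / len(unobserved)
    return probs


# Toy data: four samples over a four-category alphabet.
example_probs = absolute_discounting(["a", "a", "a", "b"], ["a", "b", "c", "d"])
```

With four samples and a discount of 0.5, the two observed categories give up a total mass of 0.25, which the two unobserved categories share equally, so the estimate still sums to one.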

O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot


Title	O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot
Authors	Hugo Gonçalo Oliveira
Abstract
Tasks	Text Generation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-3902/
PDF	https://www.aclweb.org/anthology/W17-3902
PWC	https://paperswithcode.com/paper/o-poeta-artificial-20-increasing
Repo
Framework

Identifying and Avoiding Confusion in Dialogue with People with Alzheimer’s Disease


Title	Identifying and Avoiding Confusion in Dialogue with People with Alzheimer’s Disease
Authors	Hamidreza Chinaei, Leila Chan Currie, Andrew Danks, Hubert Lin, Tejas Mehta, Frank Rudzicz
Abstract	Alzheimer's disease (AD) is an increasingly prevalent cognitive disorder in which memory, language, and executive function deteriorate, usually in that order. There is a growing need to support individuals with AD and other forms of dementia in their daily lives, and our goal is to do so through speech-based interaction. Given that 33% of conversations with people with middle-stage AD involve a breakdown in communication, it is vital that automated dialogue systems be able to identify those breakdowns and, if possible, avoid them. In this article, we discuss several linguistic features that are verbal indicators of confusion in AD (including vocabulary richness, parse tree structures, and acoustic cues) and apply several machine learning algorithms to identify dialogue-relevant confusion from speech with up to 82% accuracy. We also learn dialogue strategies to avoid confusion in the first place, which is accomplished using a partially observable Markov decision process and which obtains accuracies (up to 96.1%) that are significantly higher than several baselines. This work represents a major step towards automated dialogue systems for individuals with dementia.
Tasks
Published	2017-06-01
URL	https://www.aclweb.org/anthology/J17-2004/
PDF	https://www.aclweb.org/anthology/J17-2004
PWC	https://paperswithcode.com/paper/identifying-and-avoiding-confusion-in
Repo
Framework

Acquisition, Representation and Usage of Conceptual Hierarchies


Title	Acquisition, Representation and Usage of Conceptual Hierarchies
Authors	Marius Pasca
Abstract	Through subsumption and instantiation, individual instances ("artificial intelligence", "the spotted pig") otherwise spanning a wide range of domains can be brought together and organized under conceptual hierarchies. The hierarchies connect more specific concepts ("computer science subfields", "gastropubs") to more general concepts ("academic disciplines", "restaurants") through IsA relations. Explicit or implicit properties applicable to, and defining, more general concepts are inherited by their more specific concepts, down to the instances connected to the lower parts of the hierarchies. Subsumption represents a crisp, universally-applicable principle towards consistently representing IsA relations in any knowledge resource. Yet knowledge resources often exhibit significant differences in their scope, representation choices and intended usage, to cause significant differences in their expected usage and impact on various tasks. This tutorial examines the theoretical foundations of subsumption, and its practical embodiment through IsA relations compiled manually or extracted automatically. It addresses IsA relations from their formal definition; through practical choices made in their representation within the larger and more widely-used of the available knowledge resources; to their automatic acquisition from document repositories, as opposed to their manual compilation by human contributors; to their impact in text analysis and information retrieval. As search engines move away from returning a set of links and closer to returning results that more directly answer queries, IsA relations play an increasingly important role towards a better understanding of documents and queries.
The tutorial teaches the audience about definitions, assumptions and practical choices related to modeling and representing IsA relations in existing, human-compiled resources of instances, concepts and resulting conceptual hierarchies; methods for automatically extracting sets of instances within unlabeled or labeled concepts, where the concepts may be considered as a flat set or organized hierarchically; and applications of IsA relations in information retrieval.
Tasks	Information Retrieval
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-3001/
PDF	https://www.aclweb.org/anthology/D17-3001
PWC	https://paperswithcode.com/paper/acquisition-representation-and-usage-of
Repo
Framework

LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification


Title	LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification
Authors	Amal Htait, Sébastien Fournier, Patrice Bellot
Abstract	We present, in this paper, our contribution in SemEval2017 task 4: "Sentiment Analysis in Twitter", subtask A: "Message Polarity Classification", for English and Arabic languages. Our system is based on a list of sentiment seed words adapted for tweets. The sentiment relations between seed words and other terms are captured by cosine similarity between the word embedding representations (word2vec). These seed words are extracted from datasets of annotated tweets available online. Our tests, using these seed words, show significant improvement in results compared to the use of Turney and Littman's (2003) seed words, on polarity classification of tweet messages.
Tasks	Semantic Textual Similarity, Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2120/
PDF	https://www.aclweb.org/anthology/S17-2120
PWC	https://paperswithcode.com/paper/lsis-at-semeval-2017-task-4-using-adapted
Repo
Framework
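
The abstract says that sentiment relations between seed words and other terms are captured by cosine similarity between word embeddings. A minimal sketch of that idea follows. The two-dimensional toy vectors, the seed lists, and the scoring rule (mean similarity to positive seeds minus mean similarity to negative seeds) are hypothetical choices for illustration; the paper's actual seed words, embeddings and aggregation are not given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def polarity_score(word, embeddings, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds.

    A score above zero suggests positive polarity, below zero negative.
    This aggregation rule is an assumption of the sketch.
    """
    vec = embeddings[word]
    pos = sum(cosine(vec, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Toy embeddings standing in for word2vec vectors (hypothetical values).
toy_embeddings = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "great": [0.9, 0.1],
    "awful": [-0.8, 0.2],
}
score_great = polarity_score("great", toy_embeddings, ["good"], ["bad"])
score_awful = polarity_score("awful", toy_embeddings, ["good"], ["bad"])
```

A tweet-level polarity could then be obtained by aggregating the scores of its words, although how the authors do this is not stated in the abstract.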

Ways of Asking and Replying in Duplicate Question Detection


Title	Ways of Asking and Replying in Duplicate Question Detection
Authors	João António Rodrigues, Chakaveh Saedi, Vladislav Maraev, João Silva, António Branco
Abstract	This paper presents the results of systematic experimentation on the impact in duplicate question detection of different types of questions across both a number of established approaches and a novel, superior one used to address this language processing task. This study provides novel insight into the different levels of robustness of the diverse detection methods with respect to different conditions of their application, including the ones that approximate real usage scenarios.
Tasks	Machine Translation, Question Answering, Semantic Textual Similarity
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-1030/
PDF	https://www.aclweb.org/anthology/S17-1030
PWC	https://paperswithcode.com/paper/ways-of-asking-and-replying-in-duplicate
Repo
Framework

Stylometric Analysis of Parliamentary Speeches: Gender Dimension


Title	Stylometric Analysis of Parliamentary Speeches: Gender Dimension
Authors	Justina Mandravickaitė, Tomas Krilavičius
Abstract	The relation between gender and language has been studied by many authors; however, there is still some uncertainty regarding gender influence on language usage in the professional environment. Often, the studied data sets are too small or texts of individual authors are too short to capture differences of language usage with respect to gender successfully. This study draws from a larger corpus of speech transcripts of the Lithuanian Parliament (1990-2013) to explore language differences of political debates by gender via stylometric analysis. The experimental setup consists of stylistic features that indicate lexical style and do not require external linguistic tools, namely the most frequent words, in combination with unsupervised machine learning algorithms. Results show that gender differences in language use remain in the professional environment not only in usage of function words and preferred linguistic constructions, but in the presented topics as well.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1416/
PDF	https://www.aclweb.org/anthology/W17-1416
PWC	https://paperswithcode.com/paper/stylometric-analysis-of-parliamentary
Repo
Framework

A Graph-based Text Similarity Measure That Employs Named Entity Information


Title	A Graph-based Text Similarity Measure That Employs Named Entity Information
Authors	Leonidas Tsekouras, Iraklis Varlamis, George Giannakopoulos
Abstract	Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entities' information extracted from the texts and the n-gram graphs' model for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with various clustering validity metrics shows that the extraction of named entities at a first step can be profitable for the time-performance of similarity measures that are based on the n-gram graph representation without affecting the overall performance of the NLP task.
Tasks	Named Entity Recognition, Part-Of-Speech Tagging, Text Clustering, Tokenization
Published	2017-09-01
URL	https://www.aclweb.org/anthology/R17-1098/
PDF	https://doi.org/10.26615/978-954-452-049-6_098
PWC	https://paperswithcode.com/paper/a-graph-based-text-similarity-measure-that
Repo
Framework

Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language


Title	Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language
Authors	Victoria Bobicev, Cătălina Mărănduc, Cenel Augusto Perez
Abstract	Contemporary standard language corpora are ideal for NLP. There are few morphologically and syntactically annotated corpora for Romanian, and those existing or in progress only deal with the Contemporary Romanian standard. However, the necessity to study the dynamics of natural languages gave rise to balanced corpora, containing non-standard texts. In this paper, we describe the creation of tools for processing non-standard Romanian to build a big balanced corpus. We want to preserve in annotated form as many early stages of language as possible. We have already built a corpus in Old Romanian. We also intend to include the South-Danube dialects, remote to the standard language, along with regional forms closer to the standard. We try to preserve data about endangered idioms such as Aromanian, Meglenoromanian and Istroromanian dialects, and calculate the distance between different regional variants, including the language spoken in the Republic of Moldova. This distance, as well as the mutual understanding between the speakers, is the correct criterion for the classification of idioms as different languages, or as dialects, or as regional variants close to the standard.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-8102/
PDF	http://doi.org/10.26615/978-954-452-046-5_002
PWC	https://paperswithcode.com/paper/tools-for-building-a-corpus-to-study-the
Repo
Framework

TopicThunder at SemEval-2017 Task 4: Sentiment Classification Using a Convolutional Neural Network with Distant Supervision


Title	TopicThunder at SemEval-2017 Task 4: Sentiment Classification Using a Convolutional Neural Network with Distant Supervision
Authors	Simon Müller, Tobias Huonder, Jan Deriu, Mark Cieliebak
Abstract	In this paper, we propose a classifier for predicting topic-specific sentiments of English Twitter messages. Our method is based on a 2-layer CNN. With a distant supervised phase we leverage a large amount of weakly-labelled training data. Our system was evaluated on the data provided by the SemEval-2017 competition in the Topic-Based Message Polarity Classification subtask, where it ranked 4th place.
Tasks	Sentiment Analysis, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2129/
PDF	https://www.aclweb.org/anthology/S17-2129
PWC	https://paperswithcode.com/paper/topicthunder-at-semeval-2017-task-4-sentiment
Repo
Framework

Fine-grained essay scoring of a complex writing task for native speakers


Title	Fine-grained essay scoring of a complex writing task for native speakers
Authors	Andrea Horbach, Dirk Scholten-Akoun, Yuning Ding, Torsten Zesch
Abstract	Automatic essay scoring is nowadays successfully used even in high-stakes tests, but this is mainly limited to holistic scoring of learner essays. We present a new dataset of essays written by highly proficient German native speakers that is scored using a fine-grained rubric with the goal to provide detailed feedback. Our experiments with two state-of-the-art scoring systems (a neural and a SVM-based one) show a large drop in performance compared to existing datasets. This demonstrates the need for such datasets that allow to guide research on more elaborate essay scoring methods.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5040/
PDF	https://www.aclweb.org/anthology/W17-5040
PWC	https://paperswithcode.com/paper/fine-grained-essay-scoring-of-a-complex
Repo
Framework

YNU-HPCC at SemEval 2017 Task 4: Using A Multi-Channel CNN-LSTM Model for Sentiment Classification


Title	YNU-HPCC at SemEval 2017 Task 4: Using A Multi-Channel CNN-LSTM Model for Sentiment Classification
Authors	Haowei Zhang, Jin Wang, Jixian Zhang, Xuejie Zhang
Abstract	In this paper, we propose a multi-channel convolutional neural network-long short-term memory (CNN-LSTM) model that consists of two parts: multi-channel CNN and LSTM to analyze the sentiments of short English messages from Twitter. Unlike a conventional CNN, the proposed model applies a multi-channel strategy that uses several filters of different length to extract active local n-gram features in different scales. This information is then sequentially composed using LSTM. By combining both CNN and LSTM, we can consider both local information within tweets and long-distance dependency across tweets in the classification process. Officially released results show that our system outperforms the baseline algorithm.
Tasks	Sentiment Analysis, Text Classification
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2134/
PDF	https://www.aclweb.org/anthology/S17-2134
PWC	https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2017-task-4-using-a-multi
Repo
Framework