July 26, 2019

Paper Group NANR 3

SiTAKA at SemEval-2017 Task 4: Sentiment Analysis in Twitter Based on a Rich Set of Features

Title SiTAKA at SemEval-2017 Task 4: Sentiment Analysis in Twitter Based on a Rich Set of Features
Authors Mohammed Jabreel, Antonio Moreno
Abstract This paper describes SiTAKA, our system that has been used in task 4A, English and Arabic languages, Sentiment Analysis in Twitter of SemEval2017. The system proposes the representation of tweets using a novel set of features, which include a bag of negated words and the information provided by some lexicons. The polarity of tweets is determined by a classifier based on a Support Vector Machine. Our system ranks 2nd among 8 systems in the Arabic language tweets and ranks 8th among 38 systems in the English-language tweets.
Tasks Sentiment Analysis, Twitter Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2115/
PDF https://www.aclweb.org/anthology/S17-2115
PWC https://paperswithcode.com/paper/sitaka-at-semeval-2017-task-4-sentiment
Repo
Framework

DUTH at SemEval-2017 Task 4: A Voting Classification Approach for Twitter Sentiment Analysis

Title DUTH at SemEval-2017 Task 4: A Voting Classification Approach for Twitter Sentiment Analysis
Authors Symeon Symeonidis, Dimitrios Effrosynidis, John Kordonis, Avi Arampatzis
Abstract This report describes our participation in SemEval-2017 Task 4: Sentiment Analysis in Twitter, specifically in subtasks A, B, and C. The approach for text sentiment classification is based on a Majority Vote scheme and combines supervised machine learning methods with classical linguistic resources, including bag-of-words and sentiment lexicon features.
Tasks Information Retrieval, Sentiment Analysis, Twitter Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2117/
PDF https://www.aclweb.org/anthology/S17-2117
PWC https://paperswithcode.com/paper/duth-at-semeval-2017-task-4-a-voting
Repo
Framework
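
The Majority Vote scheme named in the DUTH abstract can be illustrated with a minimal sketch. The abstract does not say which classifiers were combined or how ties were resolved, so the three classifier outputs below (`svm_out`, `nb_out`, `lr_out`) and the tie rule (the earliest classifier's label wins) are assumptions made purely for illustration.

```python
from collections import Counter


def majority_vote(per_classifier_labels):
    """Combine label sequences from several classifiers by majority vote.

    per_classifier_labels: list of equal-length label lists, one per classifier.
    Ties go to the label seen first, i.e. the earliest classifier in the list
    (an assumption; the abstract does not specify tie handling).
    """
    combined = []
    for votes in zip(*per_classifier_labels):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four tweets.
svm_out = ["positive", "negative", "neutral", "negative"]
nb_out = ["positive", "neutral", "neutral", "positive"]
lr_out = ["negative", "negative", "neutral", "neutral"]

final_labels = majority_vote([svm_out, nb_out, lr_out])
print(final_labels)
```

The last tweet receives three different votes, so the result there depends entirely on the assumed tie rule.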

YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter

Title YNUDLG at SemEval-2017 Task 4: A GRU-SVM Model for Sentiment Classification and Quantification in Twitter
Authors Ming Wang, Biao Chu, Qingxun Liu, Xiaobing Zhou
Abstract Sentiment analysis is one of the central issues in Natural Language Processing and has become more and more important in many fields. Typical sentiment analysis classifies the sentiment of sentences into several discrete classes (e.g., positive or negative). In this paper we describe our deep learning system (combining GRU and SVM) to solve two-, three- and five-tweet polarity classifications. We first trained a gated recurrent neural network using pre-trained word embeddings, then we extracted features from the GRU layer and fed these features into a support vector machine to fulfill both the classification and quantification subtasks. The proposed approach achieved 37th, 19th, and 14th places in subtasks A, B and C, respectively.
Tasks Sentiment Analysis, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2119/
PDF https://www.aclweb.org/anthology/S17-2119
PWC https://paperswithcode.com/paper/ynudlg-at-semeval-2017-task-4-a-gru-svm-model
Repo
Framework

The power of absolute discounting: all-dimensional distribution estimation

Title The power of absolute discounting: all-dimensional distribution estimation
Authors Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati
Abstract Categorical models are a natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show that (1) absolute discounting recovers classical minimax KL-risk rates, (2) it is adaptive to an effective dimension rather than the true dimension, (3) it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use power-law distributions as the cornerstone of these results. We validate the theory via synthetic data and an application to the Global Terrorism Database.
Tasks Language Modelling
Published 2017-12-01
URL http://papers.nips.cc/paper/7243-the-power-of-absolute-discounting-all-dimensional-distribution-estimation
PDF http://papers.nips.cc/paper/7243-the-power-of-absolute-discounting-all-dimensional-distribution-estimation.pdf
PWC https://paperswithcode.com/paper/the-power-of-absolute-discounting-all
Repo
Framework
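
The absolute discounting estimator described in the abstract above (subtract a constant from each observed category's count, then redistribute the removed mass among the unobserved categories) can be sketched as follows. The discount value of 0.5, the equal split among unobserved categories, and the fallback to empirical frequencies when every category has been observed are assumptions of this sketch, not details taken from the abstract.

```python
from collections import Counter


def absolute_discounting(samples, support, discount=0.5):
    """Estimate a categorical distribution over `support` from `samples`.

    Each observed category loses `discount` from its count; the total mass
    removed this way is shared equally among the unobserved categories.
    Assumes every sample belongs to `support` and `samples` is non-empty.
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")
    counts = Counter(samples)
    n = len(samples)
    observed = [c for c in support if counts[c] > 0]
    unobserved = [c for c in support if counts[c] == 0]
    if not unobserved:
        # Nothing to redistribute to: fall back to empirical frequencies.
        return {c: counts[c] / n for c in support}
    freed_mass = discount * len(observed) / n
    estimate = {c: (counts[c] - discount) / n for c in observed}
    for c in unobserved:
        estimate[c] = freed_mass / len(unobserved)
    return estimate


# Toy example: six samples over a five-category support.
estimate = absolute_discounting(list("aaabbc"), list("abcde"), discount=0.5)
for category, prob in estimate.items():
    print(category, round(prob, 4))
```

In the toy example, categories `d` and `e` never appear in the sample yet still receive non-zero probability, which is the point of the correction.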

O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot

Title O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot
Authors Hugo Gonçalo Oliveira
Abstract
Tasks Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3902/
PDF https://www.aclweb.org/anthology/W17-3902
PWC https://paperswithcode.com/paper/o-poeta-artificial-20-increasing
Repo
Framework

Identifying and Avoiding Confusion in Dialogue with People with Alzheimer’s Disease

Title Identifying and Avoiding Confusion in Dialogue with People with Alzheimer’s Disease
Authors Hamidreza Chinaei, Leila Chan Currie, Andrew Danks, Hubert Lin, Tejas Mehta, Frank Rudzicz
Abstract Alzheimer's disease (AD) is an increasingly prevalent cognitive disorder in which memory, language, and executive function deteriorate, usually in that order. There is a growing need to support individuals with AD and other forms of dementia in their daily lives, and our goal is to do so through speech-based interaction. Given that 33% of conversations with people with middle-stage AD involve a breakdown in communication, it is vital that automated dialogue systems be able to identify those breakdowns and, if possible, avoid them. In this article, we discuss several linguistic features that are verbal indicators of confusion in AD (including vocabulary richness, parse tree structures, and acoustic cues) and apply several machine learning algorithms to identify dialogue-relevant confusion from speech with up to 82% accuracy. We also learn dialogue strategies to avoid confusion in the first place, which is accomplished using a partially observable Markov decision process and which obtains accuracies (up to 96.1%) that are significantly higher than several baselines. This work represents a major step towards automated dialogue systems for individuals with dementia.
Tasks
Published 2017-06-01
URL https://www.aclweb.org/anthology/J17-2004/
PDF https://www.aclweb.org/anthology/J17-2004
PWC https://paperswithcode.com/paper/identifying-and-avoiding-confusion-in
Repo
Framework

Acquisition, Representation and Usage of Conceptual Hierarchies

Title Acquisition, Representation and Usage of Conceptual Hierarchies
Authors Marius Pasca
Abstract Through subsumption and instantiation, individual instances ("artificial intelligence", "the spotted pig") otherwise spanning a wide range of domains can be brought together and organized under conceptual hierarchies. The hierarchies connect more specific concepts ("computer science subfields", "gastropubs") to more general concepts ("academic disciplines", "restaurants") through IsA relations. Explicit or implicit properties applicable to, and defining, more general concepts are inherited by their more specific concepts, down to the instances connected to the lower parts of the hierarchies. Subsumption represents a crisp, universally-applicable principle towards consistently representing IsA relations in any knowledge resource. Yet knowledge resources often exhibit significant differences in their scope, representation choices and intended usage, to cause significant differences in their expected usage and impact on various tasks. This tutorial examines the theoretical foundations of subsumption, and its practical embodiment through IsA relations compiled manually or extracted automatically. It addresses IsA relations from their formal definition; through practical choices made in their representation within the larger and more widely-used of the available knowledge resources; to their automatic acquisition from document repositories, as opposed to their manual compilation by human contributors; to their impact in text analysis and information retrieval. As search engines move away from returning a set of links and closer to returning results that more directly answer queries, IsA relations play an increasingly important role towards a better understanding of documents and queries.
The tutorial teaches the audience about definitions, assumptions and practical choices related to modeling and representing IsA relations in existing, human-compiled resources of instances, concepts and resulting conceptual hierarchies; methods for automatically extracting sets of instances within unlabeled or labeled concepts, where the concepts may be considered as a flat set or organized hierarchically; and applications of IsA relations in information retrieval.
Tasks Information Retrieval
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-3001/
PDF https://www.aclweb.org/anthology/D17-3001
PWC https://paperswithcode.com/paper/acquisition-representation-and-usage-of
Repo
Framework

LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification

Title LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification
Authors Amal Htait, Sébastien Fournier, Patrice Bellot
Abstract We present, in this paper, our contribution in SemEval2017 task 4: "Sentiment Analysis in Twitter", subtask A: "Message Polarity Classification", for English and Arabic languages. Our system is based on a list of sentiment seed words adapted for tweets. The sentiment relations between seed words and other terms are captured by cosine similarity between the word embedding representations (word2vec). These seed words are extracted from datasets of annotated tweets available online. Our tests, using these seed words, show significant improvement in results compared to the use of Turney and Littman's (2003) seed words, on polarity classification of tweet messages.
Tasks Semantic Textual Similarity, Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2120/
PDF https://www.aclweb.org/anthology/S17-2120
PWC https://paperswithcode.com/paper/lsis-at-semeval-2017-task-4-using-adapted
Repo
Framework
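
The LSIS abstract says that sentiment relations between seed words and other terms are captured by cosine similarity between word embeddings. A minimal sketch of that idea follows. The scoring rule used here (mean similarity to positive seeds minus mean similarity to negative seeds), the two-dimensional toy vectors, and the seed lists are all assumptions for illustration; the abstract does not give the system's actual formula, seed words, or embedding model settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity_score(word, embeddings, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds.

    This scoring rule is an assumption of the sketch, not the paper's formula.
    """
    vec = embeddings[word]
    pos = sum(cosine(vec, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Hypothetical 2-d embeddings; real word2vec vectors have hundreds of dimensions.
toy_embeddings = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "great": [0.9, 0.2],
    "awful": [-0.8, 0.3],
}

great_score = polarity_score("great", toy_embeddings, ["good"], ["bad"])
awful_score = polarity_score("awful", toy_embeddings, ["good"], ["bad"])
print(great_score > 0, awful_score < 0)
```

A positive score suggests the word sits closer to the positive seeds in embedding space, and a negative score the opposite.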

Ways of Asking and Replying in Duplicate Question Detection

Title Ways of Asking and Replying in Duplicate Question Detection
Authors João António Rodrigues, Chakaveh Saedi, Vladislav Maraev, João Silva, António Branco
Abstract This paper presents the results of systematic experimentation on the impact in duplicate question detection of different types of questions across both a number of established approaches and a novel, superior one used to address this language processing task. This study yields novel insight into the different levels of robustness of the diverse detection methods with respect to different conditions of their application, including the ones that approximate real usage scenarios.
Tasks Machine Translation, Question Answering, Semantic Textual Similarity
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-1030/
PDF https://www.aclweb.org/anthology/S17-1030
PWC https://paperswithcode.com/paper/ways-of-asking-and-replying-in-duplicate
Repo
Framework

Stylometric Analysis of Parliamentary Speeches: Gender Dimension

Title Stylometric Analysis of Parliamentary Speeches: Gender Dimension
Authors Justina Mandravickaitė, Tomas Krilavičius
Abstract The relation between gender and language has been studied by many authors; however, there is still some uncertainty regarding gender influence on language usage in the professional environment. Often, the studied data sets are too small or texts of individual authors are too short to capture differences of language usage with respect to gender successfully. This study draws from a larger corpus of speech transcripts of the Lithuanian Parliament (1990-2013) to explore language differences of political debates by gender via stylometric analysis. The experimental setup consists of stylistic features that indicate lexical style and do not require external linguistic tools, namely the most frequent words, in combination with unsupervised machine learning algorithms. Results show that gender differences in language use remain in the professional environment not only in usage of function words and preferred linguistic constructions, but in the presented topics as well.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1416/
PDF https://www.aclweb.org/anthology/W17-1416
PWC https://paperswithcode.com/paper/stylometric-analysis-of-parliamentary
Repo
Framework

A Graph-based Text Similarity Measure That Employs Named Entity Information

Title A Graph-based Text Similarity Measure That Employs Named Entity Information
Authors Leonidas Tsekouras, Iraklis Varlamis, George Giannakopoulos
Abstract Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entities' information extracted from the texts and the n-gram graphs' model for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with various clustering validity metrics shows that the extraction of named entities at a first step can be profitable for the time-performance of similarity measures that are based on the n-gram graph representation without affecting the overall performance of the NLP task.
Tasks Named Entity Recognition, Part-Of-Speech Tagging, Text Clustering, Tokenization
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1098/
PDF https://doi.org/10.26615/978-954-452-049-6_098
PWC https://paperswithcode.com/paper/a-graph-based-text-similarity-measure-that
Repo
Framework

Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language

Title Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language
Authors Victoria Bobicev, Cătălina Mărănduc, Cenel Augusto Perez
Abstract Contemporary standard language corpora are ideal for NLP. There are few morphologically and syntactically annotated corpora for Romanian, and those existing or in progress only deal with the Contemporary Romanian standard. However, the necessity to study the dynamics of natural languages gave rise to balanced corpora, containing non-standard texts. In this paper, we describe the creation of tools for processing non-standard Romanian to build a big balanced corpus. We want to preserve in annotated form as many early stages of language as possible. We have already built a corpus in Old Romanian. We also intend to include the South-Danube dialects, remote to the standard language, along with regional forms closer to the standard. We try to preserve data about endangered idioms such as Aromanian, Meglenoromanian and Istroromanian dialects, and calculate the distance between different regional variants, including the language spoken in the Republic of Moldova. This distance, as well as the mutual understanding between the speakers, is the correct criterion for the classification of idioms as different languages, or as dialects, or as regional variants close to the standard.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-8102/
PDF http://doi.org/10.26615/978-954-452-046-5_002
PWC https://paperswithcode.com/paper/tools-for-building-a-corpus-to-study-the
Repo
Framework

TopicThunder at SemEval-2017 Task 4: Sentiment Classification Using a Convolutional Neural Network with Distant Supervision

Title TopicThunder at SemEval-2017 Task 4: Sentiment Classification Using a Convolutional Neural Network with Distant Supervision
Authors Simon Müller, Tobias Huonder, Jan Deriu, Mark Cieliebak
Abstract In this paper, we propose a classifier for predicting topic-specific sentiments of English Twitter messages. Our method is based on a 2-layer CNN. With a distant supervised phase we leverage a large amount of weakly-labelled training data. Our system was evaluated on the data provided by the SemEval-2017 competition in the Topic-Based Message Polarity Classification subtask, where it ranked 4th place.
Tasks Sentiment Analysis, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2129/
PDF https://www.aclweb.org/anthology/S17-2129
PWC https://paperswithcode.com/paper/topicthunder-at-semeval-2017-task-4-sentiment
Repo
Framework

Fine-grained essay scoring of a complex writing task for native speakers

Title Fine-grained essay scoring of a complex writing task for native speakers
Authors Andrea Horbach, Dirk Scholten-Akoun, Yuning Ding, Torsten Zesch
Abstract Automatic essay scoring is nowadays successfully used even in high-stakes tests, but this is mainly limited to holistic scoring of learner essays. We present a new dataset of essays written by highly proficient German native speakers that is scored using a fine-grained rubric with the goal of providing detailed feedback. Our experiments with two state-of-the-art scoring systems (a neural and an SVM-based one) show a large drop in performance compared to existing datasets. This demonstrates the need for such datasets to guide research on more elaborate essay scoring methods.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5040/
PDF https://www.aclweb.org/anthology/W17-5040
PWC https://paperswithcode.com/paper/fine-grained-essay-scoring-of-a-complex
Repo
Framework

YNU-HPCC at SemEval 2017 Task 4: Using A Multi-Channel CNN-LSTM Model for Sentiment Classification

Title YNU-HPCC at SemEval 2017 Task 4: Using A Multi-Channel CNN-LSTM Model for Sentiment Classification
Authors Haowei Zhang, Jin Wang, Jixian Zhang, Xuejie Zhang
Abstract In this paper, we propose a multi-channel convolutional neural network-long short-term memory (CNN-LSTM) model that consists of two parts: multi-channel CNN and LSTM to analyze the sentiments of short English messages from Twitter. Unlike a conventional CNN, the proposed model applies a multi-channel strategy that uses several filters of different length to extract active local n-gram features in different scales. This information is then sequentially composed using LSTM. By combining both CNN and LSTM, we can consider both local information within tweets and long-distance dependency across tweets in the classification process. Officially released results show that our system outperforms the baseline algorithm.
Tasks Sentiment Analysis, Text Classification
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2134/
PDF https://www.aclweb.org/anthology/S17-2134
PWC https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2017-task-4-using-a-multi
Repo
Framework