July 26, 2019

2412 words 12 mins read

Paper Group NANR 8

A Shallow Neural Network for Native Language Identification with Character N-grams. Fewer features perform well at Native Language Identification task. Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions. The Power of Character N-grams in Native Language Identification. Addressing Problems across Linguistic Levels …

A Shallow Neural Network for Native Language Identification with Character N-grams

Title A Shallow Neural Network for Native Language Identification with Character N-grams
Authors Yunita Sari, Muhammad Rifqi Fatchurrahman, Meisyarah Dwiastuti
Abstract This paper describes the systems submitted by the GadjahMada team to the Native Language Identification (NLI) Shared Task 2017. Our models use a continuous representation of character n-grams which is learned jointly with a feed-forward neural network classifier. Character n-grams have proved effective for style-based identification tasks, including NLI. Results on the test set demonstrate that the proposed model performs very well on the essay and fusion tracks, obtaining more than 0.8 in both macro F-score and accuracy.
Tasks Language Identification, Native Language Identification
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5027/
PDF https://www.aclweb.org/anthology/W17-5027
PWC https://paperswithcode.com/paper/a-shallow-neural-network-for-native-language
Repo
Framework
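
The entry above describes learning continuous character n-gram representations jointly with a shallow feed-forward classifier. A rough, hedged approximation in scikit-learn is sketched below: it feeds TF-IDF-weighted character n-grams into a small multilayer perceptron rather than learning joint n-gram embeddings, and the data, n-gram range and layer size are toy assumptions, not the authors' setup.

```python
# Rough approximation only: TF-IDF character n-grams fed to a small MLP.
# The paper instead learns continuous n-gram representations jointly with
# the classifier; data, ranges and sizes here are toy assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

train_texts = ["I am agree with this statement .", "He has give me many informations ."]
train_labels = ["SPA", "GER"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    MLPClassifier(hidden_layer_sizes=(128,), max_iter=200, random_state=0),
)
model.fit(train_texts, train_labels)
print(model.predict(["She have told me much informations ."]))
```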

Fewer features perform well at Native Language Identification task

Title Fewer features perform well at Native Language Identification task
Authors Taraka Rama, Çağrı Çöltekin
Abstract This paper describes our results at the NLI Shared Task 2017. We participated in the essay, speech, and fusion tracks, which use text, speech, and i-vectors for the task of identifying the native language of the given input. In the essay track, a linear SVM system using word bigrams and character 7-grams performed best. In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using text features from speech transcriptions and i-vectors. In the fusion track, we experimented with systems that combined i-vectors with higher-order n-gram features, combined i-vectors with word unigrams, a mean-probability ensemble, and a stacked ensemble. Our finding is that word unigrams in combination with i-vectors achieve a higher score than systems trained with a larger number of n-gram features. Our best-performing systems achieved F1-scores of 87.16%, 83.33% and 91.75% on the essay track, the speech track and the fusion track respectively.
Tasks Language Identification, Native Language Identification
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5028/
PDF https://www.aclweb.org/anthology/W17-5028
PWC https://paperswithcode.com/paper/fewer-features-perform-well-at-native
Repo
Framework
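
For the essay-track configuration described above (a linear SVM over word bigrams and character 7-grams), a minimal sketch follows. The feature union and hyperparameters are illustrative assumptions, and the data is a toy stand-in for the TOEFL11 essays.

```python
# Sketch of the best essay-track setup described above: a linear SVM over a
# union of word-bigram and character 7-gram features. Toy data; assumed settings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

features = FeatureUnion([
    ("word_bigrams", TfidfVectorizer(analyzer="word", ngram_range=(2, 2))),
    ("char_7grams", TfidfVectorizer(analyzer="char", ngram_range=(7, 7))),
])
clf = make_pipeline(features, LinearSVC(C=1.0))

essays = ["I am agree with the statement .", "He have much informations ."]
labels = ["SPA", "GER"]
clf.fit(essays, labels)
print(clf.predict(["She have many informations ."]))
```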

Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions

Title Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions
Authors Daniel Rieman, Kokil Jaidka, H. Andrew Schwartz, Lyle Ungar
Abstract Several studies have demonstrated how language models of user attributes, such as personality, can be built by using the Facebook language of social media users in conjunction with their responses to psychology questionnaires. Applying these models to make general predictions about attributes of communities, such as personality distributions across US counties, is challenging because it requires (1) coping with the potential unavailability of the original training data due to privacy and ethical regulations, (2) adapting Facebook language models to Twitter language without retraining the model, and (3) adapting from individual users to county-level collections of tweets. We propose a two-step algorithm, Target Side Domain Adaptation (TSDA), for such domain adaptation when no labeled Twitter/county data is available. TSDA corrects for the different word distributions between Facebook and Twitter and for the varying word distributions across counties by adjusting target-side word frequencies; no changes to the trained model are made. In the case of predicting the Big Five county-level personality traits, TSDA outperforms a state-of-the-art domain adaptation method and gives county-level predictions that have fewer extreme outliers, higher year-to-year stability, and higher correlation with county-level outcomes.
Tasks Domain Adaptation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1077/
PDF https://www.aclweb.org/anthology/I17-1077
PWC https://paperswithcode.com/paper/domain-adaptation-from-user-level-facebook
Repo
Framework
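
The abstract sketches the idea behind TSDA: correct target-side word frequencies so that an unchanged Facebook-trained model can be applied to county-level Twitter aggregates. The exact correction is not spelled out above, so the snippet below is only a loose, assumption-laden illustration that rescales county word frequencies by the ratio of source- to target-platform relative frequencies before applying a toy linear lexicon model.

```python
# Hedged sketch in the spirit of TSDA: county-level Twitter word frequencies
# are rescaled toward the Facebook (source) distribution before applying a
# pre-trained linear lexicon model unchanged. The ratio-based correction is an
# illustrative assumption, not the paper's exact method; all numbers are toys.
from collections import Counter

facebook_freq = Counter({"friends": 120, "party": 80, "work": 200})   # source corpus counts
twitter_freq = Counter({"friends": 60, "party": 160, "work": 100})    # target corpus counts
model_weights = {"friends": 0.4, "party": -0.2, "work": 0.1}          # trained Facebook model
intercept = 0.0

def platform_ratio(word):
    """Ratio of source to target relative frequency for one word."""
    fb_total, tw_total = sum(facebook_freq.values()), sum(twitter_freq.values())
    fb = facebook_freq[word] / fb_total
    tw = twitter_freq[word] / tw_total
    return fb / tw if tw > 0 else 1.0

def predict_county(county_counts):
    """Apply the unchanged model to frequency-corrected county features."""
    total = sum(county_counts.values())
    score = intercept
    for word, count in county_counts.items():
        corrected = (count / total) * platform_ratio(word)
        score += model_weights.get(word, 0.0) * corrected
    return score

print(predict_county(Counter({"friends": 10, "party": 30, "work": 5})))
```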

The Power of Character N-grams in Native Language Identification

Title The Power of Character N-grams in Native Language Identification
Authors Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling
Abstract In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (87.56 F1-score) on the evaluation set using only 1-9 character n-grams as features. We compare this against several ensemble and meta-classifiers in order to examine how the linear system fares when combined with other, especially non-linear classifiers. Special emphasis is placed on the topic bias that exists by virtue of the assessment essay prompt distribution.
Tasks Language Identification, Native Language Identification, Text Classification
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5043/
PDF https://www.aclweb.org/anthology/W17-5043
PWC https://paperswithcode.com/paper/the-power-of-character-n-grams-in-native
Repo
Framework
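
The GRONINGEN system described above is essentially a linear SVM over character 1-9 grams, which is easy to sketch; the toy data, TF-IDF weighting and SVM settings below are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a linear SVM over character 1-9 grams only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

nli_clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 9), sublinear_tf=True),
    LinearSVC(),
)
essays = ["In my opinion , the informations is useful .",
          "I am agree that students must to work hard ."]
labels = ["GER", "SPA"]
nli_clf.fit(essays, labels)
print(nli_clf.predict(["He said me that he have agree ."]))
```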

Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice

Title Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice
Authors Marion Weller-Di Marco, Alexander Fraser, Sabine Schulte im Walde
Abstract Many errors in phrase-based SMT can be attributed to problems on three linguistic levels: morphological complexity in the target language, structural differences and lexical choice. We explore combinations of linguistically motivated approaches to address these problems in English-to-German SMT and show that they are complementary to one another, but also that the popular verbal pre-ordering can cause problems on the morphological and lexical level. A discriminative classifier can overcome these problems, in particular when enriching standard lexical features with features geared towards verbal inflection.
Tasks Word Alignment, Word Sense Disambiguation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2099/
PDF https://www.aclweb.org/anthology/E17-2099
PWC https://paperswithcode.com/paper/addressing-problems-across-linguistic-levels
Repo
Framework

Unsupervised Detection of Argumentative Units though Topic Modeling Techniques

Title Unsupervised Detection of Argumentative Units though Topic Modeling Techniques
Authors Alfio Ferrara, Stefano Montanelli, Georgios Petasis
Abstract In this paper we present a new unsupervised approach, “Attraction to Topics” (A2T), for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extent topic modeling can be used to classify sentences as claims and premises. Preliminary evaluation results suggest that topic information can be successfully used for the detection of argumentative sentences, at least for the corpora used for evaluation. Our approach has been evaluated on two English corpora, the first of which contains 90 persuasive essays, while the second is a collection of 340 documents from user-generated content.
Tasks Argument Mining, Opinion Mining, Stance Detection
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5113/
PDF https://www.aclweb.org/anthology/W17-5113
PWC https://paperswithcode.com/paper/unsupervised-detection-of-argumentative-units
Repo
Framework
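
Since the abstract does not spell out how "attraction" to a topic is scored, the sketch below is only an assumption-laden illustration of the general idea: fit a topic model over sentences and flag as argumentative those whose topic distribution concentrates on a single topic. The threshold and data are placeholders.

```python
# Hedged illustration: LDA over sentences, then flag sentences strongly
# "attracted" to one topic. The concrete A2T scoring is not described in the
# abstract; the max-topic-probability threshold is an assumption.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Schools should ban junk food because it harms student health.",
    "The cafeteria opens at noon.",
    "Banning junk food reduces obesity, which studies link to poor grades.",
    "I had lunch with my friends yesterday.",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)           # per-sentence topic distributions

THRESHOLD = 0.7                        # assumed cut-off, not from the paper
for sent, dist in zip(sentences, theta):
    attracted = dist.max() >= THRESHOLD
    print(f"{'ARG ' if attracted else 'NONE'}  {sent}")
```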

Show Me Your Variance and I Tell You Who You Are - Deriving Compound Compositionality from Word Alignments

Title Show Me Your Variance and I Tell You Who You Are - Deriving Compound Compositionality from Word Alignments
Authors Fabienne Cap
Abstract We use word alignment variance as an indicator for the non-compositionality of German and English noun compounds. Our work-in-progress results are on their own not competitive with state-of-the-art approaches, but they show that alignment variance is correlated with compositionality and thus worth a closer look in the future.
Tasks Word Alignment
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1713/
PDF https://www.aclweb.org/anthology/W17-1713
PWC https://paperswithcode.com/paper/show-me-your-variance-and-i-tell-you-who-you
Repo
Framework
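
The abstract describes measuring how variable a compound's word alignments are across a parallel corpus. The exact variance measure is not stated, so the sketch below uses the normalized entropy of the aligned translations as an illustrative stand-in, with toy alignment data.

```python
# Hedged sketch: for each compound, collect its aligned target translations and
# measure how spread out that distribution is. Normalized entropy is used here
# as a stand-in for the paper's (unspecified) alignment-variance measure.
import math
from collections import Counter

# Toy alignment data: compound -> target phrases it was aligned to.
alignments = {
    "Apfelbaum": ["apple tree", "apple tree", "apple tree", "tree"],
    "Ohrwurm":   ["earworm", "catchy tune", "tune", "song", "earworm"],
}

def normalized_entropy(translations):
    counts = Counter(translations)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p, 2) for p in probs)
    max_entropy = math.log(len(counts), 2) if len(counts) > 1 else 1.0
    return entropy / max_entropy

for compound, targets in alignments.items():
    print(compound, round(normalized_entropy(targets), 3))
```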

Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space

Title Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space
Authors Muhammad Abdul-Mageed
Abstract Although there is by now a considerable amount of research on subjectivity and sentiment analysis for morphologically rich languages, it is still unclear how lexical information can best be modeled in these languages. To bridge this gap, we build effective models exploiting exclusively gold- and machine-segmented lexical input and successfully employ syntactically motivated feature selection to improve classification. Our best models achieve accuracies significantly above the baselines, with 67.93% and 69.37% for subjectivity and sentiment classification respectively.
Tasks Feature Selection, Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1318/
PDF https://www.aclweb.org/anthology/W17-1318
PWC https://paperswithcode.com/paper/not-all-segments-are-created-equal
Repo
Framework
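
The abstract mentions syntactically motivated feature selection without detailing the criteria, so the sketch below illustrates one plausible reading: keep only lexical features whose part-of-speech tags fall in an assumed whitelist before training a standard classifier. Tags, data and classifier are all placeholders, not the paper's setup.

```python
# Hedged sketch: filter lexical features by an assumed POS-tag whitelist before
# classification. The paper's actual syntactic criteria are not given above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

KEEP_TAGS = {"ADJ", "VERB", "NOUN"}   # assumed tag whitelist

def select_tokens(tagged_sentence):
    """Keep only words whose POS tag is in the whitelist."""
    return " ".join(w for w, t in tagged_sentence if t in KEEP_TAGS)

# Toy pre-tagged, pre-segmented input standing in for gold/machine segments.
train = [
    [("the", "DET"), ("movie", "NOUN"), ("was", "VERB"), ("wonderful", "ADJ")],
    [("a", "DET"), ("boring", "ADJ"), ("plot", "NOUN"), ("ruined", "VERB"), ("it", "PRON")],
]
labels = ["pos", "neg"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit([select_tokens(s) for s in train], labels)
print(clf.predict([select_tokens([("wonderful", "ADJ"), ("acting", "NOUN")])]))
```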

Reviewers for Volume 43

Title Reviewers for Volume 43
Authors
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/J17-4008/
PDF https://www.aclweb.org/anthology/J17-4008
PWC https://paperswithcode.com/paper/reviewers-for-volume-43
Repo
Framework

IBA-Sys at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News

Title IBA-Sys at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Authors Zarmeen Nasim
Abstract This paper presents the details of our system IBA-Sys, which participated in the SemEval task on Fine-Grained Sentiment Analysis on Financial Microblogs and News. Our system participated in both tracks. For the microblogs track, a supervised learning approach was adopted and the regressor was trained with the XGBoost regression algorithm on lexicon features. For the news headlines track, an ensemble of regressors was used to predict the sentiment score: one regressor was trained on TF-IDF features and another on n-gram features. The source code is available on GitHub.
Tasks Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2140/
PDF https://www.aclweb.org/anthology/S17-2140
PWC https://paperswithcode.com/paper/iba-sys-at-semeval-2017-task-5-fine-grained
Repo
Framework
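
The description above is concrete enough for a short sketch: an XGBoost regressor over lexicon features for microblogs, and a simple average of a word TF-IDF regressor and a character n-gram regressor for headlines. The specific features, hyperparameters and toy data below are assumptions, not the submitted system.

```python
# Sketch of the two tracks described above, with assumed feature details.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

# Microblogs: toy lexicon features (e.g. positive/negative term counts).
X_lex = np.array([[3, 0], [0, 2], [1, 1]], dtype=float)
y_micro = np.array([0.7, -0.6, 0.1])
micro_model = XGBRegressor(n_estimators=50, max_depth=2)
micro_model.fit(X_lex, y_micro)

# Headlines: average of a word TF-IDF regressor and a char n-gram regressor.
headlines = ["Shares surge after strong earnings", "Profits collapse amid lawsuit"]
y_head = np.array([0.8, -0.7])
reg_word = make_pipeline(TfidfVectorizer(), Ridge()).fit(headlines, y_head)
reg_char = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)), Ridge()).fit(headlines, y_head)

test = ["Earnings surge lifts shares"]
print((reg_word.predict(test) + reg_char.predict(test)) / 2)
```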

Data Augmentation for Visual Question Answering

Title Data Augmentation for Visual Question Answering
Authors Kushal Kafle, Mohammed Yousefhussien, Christopher Kanan
Abstract Data augmentation is widely used to train deep neural networks for image classification tasks. Simply flipping images can help learning tremendously by increasing the number of training images by a factor of two. However, little work has been done studying data augmentation in natural language processing. Here, we describe two methods for data augmentation for Visual Question Answering (VQA). The first uses existing semantic annotations to generate new questions. The second method is a generative approach using recurrent neural networks. Experiments show that the proposed data augmentation improves performance of both baseline and state-of-the-art VQA algorithms.
Tasks Data Augmentation, Image Classification, Question Answering, Text Generation, Visual Question Answering
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3529/
PDF https://www.aclweb.org/anthology/W17-3529
PWC https://paperswithcode.com/paper/data-augmentation-for-visual-question
Repo
Framework
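
The first augmentation method above turns existing semantic annotations into new questions. A hedged, template-based illustration follows; the annotation schema, templates and field names are invented for the example, and the RNN-based generative method is not reproduced.

```python
# Hedged sketch of template-based question generation from (toy) annotations.
annotations = [
    {"image_id": 1, "object": "dog", "color": "brown", "count": 2},
    {"image_id": 2, "object": "car", "color": "red", "count": 1},
]

def generate_qa(ann):
    """Yield synthetic (question, answer) pairs from one annotation record."""
    yield (f"What color is the {ann['object']}?", ann["color"])
    yield (f"How many {ann['object']}s are in the image?", str(ann["count"]))
    yield (f"Is there a {ann['object']} in the image?", "yes")

augmented = [(ann["image_id"], q, a) for ann in annotations for q, a in generate_qa(ann)]
for image_id, question, answer in augmented:
    print(image_id, question, "->", answer)
```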

Computational Sarcasm

Title Computational Sarcasm
Authors Pushpak Bhattacharyya, Aditya Joshi
Abstract Sarcasm is a form of verbal irony that is intended to express contempt or ridicule. Motivated by the challenges sarcastic text poses to sentiment analysis, computational approaches to sarcasm have witnessed growing interest at NLP forums in the past decade. Computational sarcasm refers to automatic approaches pertaining to sarcasm. The tutorial will provide a bird's-eye view of the research in computational sarcasm for text, while focusing on significant milestones. The tutorial begins with linguistic theories of sarcasm, with a focus on incongruity: a useful notion that underlies sarcasm and other forms of figurative language. Since the most significant work in computational sarcasm is sarcasm detection, i.e., predicting whether a given piece of text is sarcastic or not, sarcasm detection forms the focus hereafter. We begin our discussion of sarcasm detection with datasets, touching on strategies, challenges and the nature of datasets. Then, we describe algorithms for sarcasm detection: rule-based (where specific evidence of sarcasm is utilised as a rule), statistical classifier-based (where features are designed for a statistical classifier), a topic model-based technique, and deep learning-based algorithms. For each of these algorithms, we refer to our work on sarcasm detection and share our learnings. Since contextual information beyond the text to be classified is useful for sarcasm detection, we then describe approaches that use such information through conversational context or author-specific context. We follow this with novel areas in computational sarcasm such as sarcasm generation and sarcasm vs. irony classification. We then summarise the tutorial and describe future directions based on errors reported in past work. The tutorial will end with a demonstration of our work on sarcasm detection. This tutorial will be of interest to researchers investigating computational sarcasm and related areas such as computational humour, figurative language understanding, and emotion and sentiment analysis. The tutorial is motivated by our continually evolving survey of sarcasm detection, available on arXiv: Joshi, Aditya, Pushpak Bhattacharyya, and Mark James Carman. “Automatic Sarcasm Detection: A Survey.” arXiv preprint arXiv:1602.03426 (2016).
Tasks Sarcasm Detection, Sentiment Analysis
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-3002/
PDF https://www.aclweb.org/anthology/D17-3002
PWC https://paperswithcode.com/paper/computational-sarcasm
Repo
Framework

CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification

Title CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification
Authors Raj Kumar Gupta, Yinping Yang
Abstract This paper describes a system developed for a shared sentiment analysis task and its subtasks organized by SemEval-2017. A key feature of our system is the embedded ability to detect sarcasm in order to enhance the performance of sentiment classification. We first constructed an affect-cognition-sociolinguistics sarcasm feature model and trained an SVM-based classifier for detecting sarcastic expressions in general tweets. For sentiment prediction, we developed CrystalNest, a two-level cascade classification system using features that combine the sarcasm score derived from our sarcasm classifier, sentiment scores from Alchemy, the NRC lexicon, n-grams, word embedding vectors, and part-of-speech features. We found that the sarcasm-detection-derived features consistently benefited key sentiment analysis evaluation metrics, to different degrees, across the four subtasks A-D.
Tasks Opinion Mining, Sarcasm Detection, Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2103/
PDF https://www.aclweb.org/anthology/S17-2103
PWC https://paperswithcode.com/paper/crystalnest-at-semeval-2017-task-4-using
Repo
Framework
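
The two-level cascade described above can be sketched compactly: a first-stage sarcasm classifier produces a score that is appended to the second-stage sentiment classifier's features. The features, data and classifiers below are toy stand-ins rather than the CrystalNest implementation.

```python
# Hedged sketch of a sarcasm-score cascade for sentiment classification.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

sarcasm_texts = ["Oh great, another Monday morning", "Lovely weather for a walk today"]
sarcasm_labels = [1, 0]                     # 1 = sarcastic (toy labels)
tweets = ["I just love waiting in line for hours", "This phone works really well"]
sentiments = ["negative", "positive"]

# Stage 1: sarcasm classifier trained on its own (toy) labeled data.
sarcasm_vec = TfidfVectorizer()
sarcasm_clf = LinearSVC().fit(sarcasm_vec.fit_transform(sarcasm_texts), sarcasm_labels)

def sarcasm_score(texts):
    """Continuous sarcasm score from the first-stage classifier."""
    return sarcasm_clf.decision_function(sarcasm_vec.transform(texts)).reshape(-1, 1)

# Stage 2: sentiment classifier over text features plus the sarcasm score.
sent_vec = TfidfVectorizer()
X_sent = hstack([sent_vec.fit_transform(tweets), sarcasm_score(tweets)])
sent_clf = LinearSVC().fit(X_sent, sentiments)

test = ["I love being ignored by customer service"]
X_test = hstack([sent_vec.transform(test), sarcasm_score(test)])
print(sent_clf.predict(X_test))
```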

Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network

Title Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network
Authors Abhijit Mishra, Kuntal Dey, Pushpak Bhattacharyya
Abstract Cognitive NLP systems, i.e., NLP systems that make use of behavioral data, augment traditional text-based features with cognitive features extracted from eye-movement patterns, EEG signals, brain imaging, etc. Such extraction of features is typically manual. We contend that manual extraction of features may not be the best way to tackle the text subtleties that characteristically prevail in complex classification tasks like sentiment analysis and sarcasm detection, and that even the extraction and choice of features should be delegated to the learning system. We introduce a framework to automatically extract cognitive features from the eye-movement/gaze data of human readers reading the text and use them, along with textual features, for the tasks of sentiment polarity and sarcasm detection. Our proposed framework is based on a Convolutional Neural Network (CNN). The CNN learns features from both gaze and text and uses them to classify the input text. We test our technique on published sentiment- and sarcasm-labeled datasets enriched with gaze information, and show that using a combination of automatically learned text and gaze features often yields better classification performance than (i) CNN-based systems that rely on text input alone and (ii) existing systems that rely on handcrafted gaze and textual features.
Tasks EEG, Sarcasm Detection, Sentiment Analysis, Text Classification
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1035/
PDF https://www.aclweb.org/anthology/P17-1035
PWC https://paperswithcode.com/paper/learning-cognitive-features-from-gaze-data
Repo
Framework
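
A hedged sketch of the two-branch idea described above follows: one convolutional branch over word embeddings and one over per-word gaze features, merged before classification. The framework (Keras), layer sizes and gaze feature dimensionality are assumptions, not the paper's architecture.

```python
# Hedged two-branch CNN sketch: text branch over word embeddings, gaze branch
# over per-word gaze features (e.g. fixation duration), merged for a binary
# sentiment/sarcasm decision. Sizes and framework are assumptions.
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import (Concatenate, Conv1D, Dense, Embedding,
                                     GlobalMaxPooling1D, Input)

MAX_LEN, VOCAB, GAZE_DIM = 20, 1000, 2   # assumed sizes

text_in = Input(shape=(MAX_LEN,), name="word_ids")
text_branch = Embedding(VOCAB, 50)(text_in)
text_branch = Conv1D(64, 3, activation="relu")(text_branch)
text_branch = GlobalMaxPooling1D()(text_branch)

gaze_in = Input(shape=(MAX_LEN, GAZE_DIM), name="gaze_features")
gaze_branch = Conv1D(32, 3, activation="relu")(gaze_in)
gaze_branch = GlobalMaxPooling1D()(gaze_branch)

merged = Concatenate()([text_branch, gaze_branch])
output = Dense(1, activation="sigmoid")(Dense(64, activation="relu")(merged))

model = Model(inputs=[text_in, gaze_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy batch: random word ids and gaze features for two sentences.
words = np.random.randint(0, VOCAB, size=(2, MAX_LEN))
gaze = np.random.rand(2, MAX_LEN, GAZE_DIM)
model.fit([words, gaze], np.array([1, 0]), epochs=1, verbose=0)
```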

Os Provérbios em manuais de ensino de Português Língua Não Materna (The Proverbs of teaching manuals in Non-Native Portuguese) [In Portuguese]

Title Os Provérbios em manuais de ensino de Português Língua Não Materna (The Proverbs of teaching manuals in Non-Native Portuguese) [In Portuguese]
Authors Sónia Reis, Jorge Baptista
Abstract
Tasks
Published 2017-10-01
URL https://www.aclweb.org/anthology/W17-6629/
PDF https://www.aclweb.org/anthology/W17-6629
PWC https://paperswithcode.com/paper/os-provarbios-em-manuais-de-ensino-de
Repo
Framework