Paper Group NANR 156
Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian
Title | Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian |
Authors | Paula Gombar, Zoran Medić, Domagoj Alagić, Jan Šnajder |
Abstract | Sentiment lexicons are widely used as an intuitive and inexpensive way of tackling sentiment classification, often within a simple lexicon word-counting approach or as part of a supervised model. However, it is an open question whether these approaches can compete with supervised models that use only word-representation features. We address this question in the context of domain-specific sentiment classification for Croatian. We experiment with the graph-based acquisition of sentiment lexicons, analyze their quality, and investigate how effectively they can be used in sentiment classification. Our results indicate that, even with as few as 500 labeled instances, a supervised model substantially outperforms a word-counting model. We also observe that adding lexicon-based features does not significantly improve supervised sentiment classification. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1409/ |
PWC | https://paperswithcode.com/paper/debunking-sentiment-lexicons-a-case-of-domain |
Repo | |
Framework | |
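As a minimal illustration (not the authors' code) of the lexicon word-counting baseline the first paper compares against: sum per-word polarity scores from a lexicon and take the sign of the total. The tiny lexicon below is hypothetical.

```python
# Hypothetical toy lexicon; the paper acquires real Croatian lexicons
# via graph-based methods.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

def word_count_polarity(text, lexicon=LEXICON):
    """Classify a text as 'positive', 'negative', or 'neutral'
    by summing per-token sentiment scores found in the lexicon."""
    score = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The paper's finding is that even ~500 labeled instances let a supervised model substantially beat this kind of counter.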
Comparison of Short-Text Sentiment Analysis Methods for Croatian
Title | Comparison of Short-Text Sentiment Analysis Methods for Croatian |
Authors | Leon Rotim, Jan Šnajder |
Abstract | We focus on the task of supervised sentiment classification of short and informal texts in Croatian, using two simple yet effective methods: word embeddings and string kernels. We investigate whether word embeddings offer any advantage over corpus- and preprocessing-free string kernels, and how these compare to bag-of-words baselines. We conduct a comparison on three different datasets, using different preprocessing methods and kernel functions. Results show that, on two out of three datasets, word embeddings outperform string kernels, which in turn outperform word and n-gram bag-of-words baselines. |
Tasks | Sentiment Analysis, Text Classification, Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1411/ |
PWC | https://paperswithcode.com/paper/comparison-of-short-text-sentiment-analysis |
Repo | |
Framework | |
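String kernels of the kind compared above are "corpus- and preprocessing-free" because they operate directly on character sequences. A simple member of that family is the spectrum kernel, which counts shared character k-grams; the sketch below is illustrative, with k=3 chosen arbitrarily, and is not the paper's implementation.

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """Spectrum (k-gram) string kernel: the inner product of the two
    strings' character k-gram count vectors. No tokenization or
    preprocessing is required."""
    grams_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    grams_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * grams_t[g] for g, c in grams_s.items())
```

A Gram matrix of such values can be fed to any kernel classifier (e.g. an SVM) for short-text sentiment classification.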
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
Title | Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories |
Authors | |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7600/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-16th-international |
Repo | |
Framework | |
Modelling metaphor with attribute-based semantics
Title | Modelling metaphor with attribute-based semantics |
Authors | Luana Bulat, Stephen Clark, Ekaterina Shutova |
Abstract | One of the key problems in computational metaphor modelling is finding the optimal level of abstraction of semantic representations, such that these are able to capture and generalise metaphorical mechanisms. In this paper we present the first metaphor identification method that uses representations constructed from property norms. Such norms have been previously shown to provide a cognitively plausible representation of concepts in terms of semantic properties. Our results demonstrate that such property-based semantic representations provide a suitable model of cross-domain knowledge projection in metaphors, outperforming standard distributional models on a metaphor identification task. |
Tasks | Metaphor Identification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2084/ |
PWC | https://paperswithcode.com/paper/modelling-metaphor-with-attribute-based |
Repo | |
Framework | |
Single and Cross-domain Polarity Classification using String Kernels
Title | Single and Cross-domain Polarity Classification using String Kernels |
Authors | Rosa M. Giménez-Pérez, Marc Franco-Salvador, Paolo Rosso |
Abstract | The polarity classification task aims at automatically identifying whether a subjective text is positive or negative. When the target domain is different from those where a model was trained, we refer to a cross-domain setting. That setting usually implies the use of a domain adaptation method. In this work, we study the single and cross-domain polarity classification tasks from the string kernels perspective. Contrary to classical domain adaptation methods, which employ texts from both domains to detect pivot features, we do not use the target domain for training. Our approach detects the lexical peculiarities that characterise the text polarity and maps them into a domain independent space by means of kernel discriminant analysis. Experimental results show state-of-the-art performance in single and cross-domain polarity classification. |
Tasks | Domain Adaptation, Text Classification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2089/ |
PWC | https://paperswithcode.com/paper/single-and-cross-domain-polarity |
Repo | |
Framework | |
UWat-Emote at EmoInt-2017: Emotion Intensity Detection using Affect Clues, Sentiment Polarity and Word Embeddings
Title | UWat-Emote at EmoInt-2017: Emotion Intensity Detection using Affect Clues, Sentiment Polarity and Word Embeddings |
Authors | Vineet John, Olga Vechtomova |
Abstract | This paper describes the UWaterloo affect prediction system developed for EmoInt-2017. We delve into our feature selection approach for affect intensity, affect presence, sentiment intensity and sentiment presence lexica alongside pre-trained word embeddings, which are utilized to extract emotion intensity signals from tweets in an ensemble learning approach. The system employs emotion specific model training, and utilizes distinct models for each of the emotion corpora in isolation. Our system utilizes gradient boosted regression as the primary learning technique to predict the final emotion intensities. |
Tasks | Emotion Classification, Feature Selection, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5235/ |
PWC | https://paperswithcode.com/paper/uwat-emote-at-emoint-2017-emotion-intensity |
Repo | |
Framework | |
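The UWat-Emote system's primary learner is gradient boosted regression: each round fits a weak regressor to the current residuals of the intensity predictions. A toy stdlib-only sketch with depth-1 stumps on 1-D features (not the authors' ensemble, which uses rich lexicon and embedding features):

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (stump) on 1-D inputs by trying
    every threshold and minimizing squared error."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (ml if x <= thr else mr)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def gradient_boost(xs, ys, rounds=20, lr=0.5):
    """Gradient boosting under squared loss: each round fits a stump
    to the residuals and adds it, scaled by the learning rate."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, resid)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)
```

In the paper, one such model is trained per emotion corpus in isolation, and the regression target is the emotion intensity.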
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Title | Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) |
Authors | |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1000/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-eighth-international-joint |
Repo | |
Framework | |
Predicting Emotional Word Ratings using Distributional Representations and Signed Clustering
Title | Predicting Emotional Word Ratings using Distributional Representations and Signed Clustering |
Authors | João Sedoc, Daniel Preoţiuc-Pietro, Lyle Ungar |
Abstract | Inferring the emotional content of words is important for text-based sentiment analysis, dialogue systems and psycholinguistics, but word ratings are expensive to collect at scale and across languages or domains. We develop a method that automatically extends word-level ratings to unrated words using signed clustering of vector space word representations along with affect ratings. We use our method to determine a word's valence and arousal, which determine its position on the circumplex model of affect, the most popular dimensional model of emotion. Our method achieves superior out-of-sample word rating prediction on both affective dimensions across three different languages when compared to state-of-the-art word similarity based methods. Our method can assist building word ratings for new languages and improve downstream tasks such as sentiment analysis and emotion detection. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2090/ |
PWC | https://paperswithcode.com/paper/predicting-emotional-word-ratings-using |
Repo | |
Framework | |
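The core idea of extending word-level ratings to unrated words can be caricatured, in a drastically simplified stdlib-only form, as propagating each rated word's score to its nearest neighbors in vector space. This is only a stand-in for the paper's signed spectral clustering; the toy 2-D vectors below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def extend_ratings(vectors, ratings):
    """Assign each unrated word the rating of its most similar rated
    word -- a nearest-neighbor stand-in for the signed-clustering
    propagation described in the abstract."""
    out = dict(ratings)
    for word, vec in vectors.items():
        if word in out:
            continue
        nearest = max(ratings, key=lambda r: cosine(vec, vectors[r]))
        out[word] = ratings[nearest]
    return out
```

The paper's actual method clusters the embedding space with sign constraints derived from affect ratings, which scales far better than per-word nearest-neighbor lookups.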
A Simple Multi-Class Boosting Framework with Theoretical Guarantees and Empirical Proficiency
Title | A Simple Multi-Class Boosting Framework with Theoretical Guarantees and Empirical Proficiency |
Authors | Ron Appel, Pietro Perona |
Abstract | There is a need for simple yet accurate white-box learning systems that train quickly and with little data. To this end, we showcase REBEL, a multi-class boosting method, and present a novel family of weak learners called localized similarities. Our framework provably minimizes the training error of any dataset at an exponential rate. We carry out experiments on a variety of synthetic and real datasets, demonstrating a consistent tendency to avoid overfitting. We evaluate our method on MNIST and standard UCI datasets against other state-of-the-art methods, showing the empirical proficiency of our method. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=675 |
PDF | http://proceedings.mlr.press/v70/appel17a/appel17a.pdf |
PWC | https://paperswithcode.com/paper/a-simple-multi-class-boosting-framework-with |
Repo | |
Framework | |
Context-Aware Graph Segmentation for Graph-Based Translation
Title | Context-Aware Graph Segmentation for Graph-Based Translation |
Authors | Liangyou Li, Andy Way, Qun Liu |
Abstract | In this paper, we present an improved graph-based translation model which segments an input graph into node-induced subgraphs by taking source context into consideration. Translations are generated by combining subgraph translations left-to-right using beam search. Experiments on Chinese–English and German–English demonstrate that the context-aware segmentation significantly improves the baseline graph-based model. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2095/ |
PWC | https://paperswithcode.com/paper/context-aware-graph-segmentation-for-graph |
Repo | |
Framework | |
Ranking Convolutional Recurrent Neural Networks for Purchase Stage Identification on Imbalanced Twitter Data
Title | Ranking Convolutional Recurrent Neural Networks for Purchase Stage Identification on Imbalanced Twitter Data |
Authors | Heike Adel, Francine Chen, Yan-Ying Chen |
Abstract | Users often use social media to share their interest in products. We propose to identify purchase stages from Twitter data following the AIDA model (Awareness, Interest, Desire, Action). In particular, we define the task of classifying the purchase stage of each tweet in a user's tweet sequence. We introduce RCRNN, a Ranking Convolutional Recurrent Neural Network which computes tweet representations using convolution over word embeddings and models a tweet sequence with gated recurrent units. Also, we consider various methods to cope with the imbalanced label distribution in our data and show that a ranking layer outperforms class weights. |
Tasks | Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2094/ |
PWC | https://paperswithcode.com/paper/ranking-convolutional-recurrent-neural |
Repo | |
Framework | |
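The abstract's key contrast is a ranking layer versus class weights for imbalanced labels. One common form of ranking objective (shown here as an illustrative sketch, not RCRNN's exact loss) penalizes any competing class whose score comes within a margin of the gold class's score:

```python
def ranking_loss(scores, gold, margin=1.0):
    """Pairwise margin ranking loss over class scores: the gold
    class should outscore every other class by at least `margin`.
    Unlike class weights, this directly shapes the score ordering."""
    return sum(max(0.0, margin - (scores[gold] - s))
               for cls, s in scores.items() if cls != gold)
```

Minimizing this pushes rare-class scores up relative to competitors instead of reweighting a cross-entropy term.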
Reranking Translation Candidates Produced by Several Bilingual Word Similarity Sources
Title | Reranking Translation Candidates Produced by Several Bilingual Word Similarity Sources |
Authors | Laurent Jakubina, Phillippe Langlais |
Abstract | We investigate the reranking of the output of several distributional approaches on the Bilingual Lexicon Induction task. We show that reranking an n-best list produced by any of those approaches leads to very substantial improvements. We further demonstrate that combining several n-best lists by reranking is an effective way of further boosting performance. |
Tasks | Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2096/ |
PWC | https://paperswithcode.com/paper/reranking-translation-candidates-produced-by |
Repo | |
Framework | |
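Combining several n-best lists, as the paper does before reranking, can be sketched with a simple unsupervised fusion rule such as summed reciprocal rank. This is only a stand-in for the paper's trained reranker, which scores candidates with features from the different similarity sources:

```python
def fuse_nbest(nbest_lists):
    """Merge several ranked candidate lists by summing reciprocal
    ranks, then re-sort by the combined score (reciprocal rank
    fusion). Candidates appearing high in several lists win."""
    scores = {}
    for candidates in nbest_lists:
        for rank, cand in enumerate(candidates, start=1):
            scores[cand] = scores.get(cand, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)
```

For bilingual lexicon induction, each input list would be the n-best translations proposed by one distributional approach.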
Uniform Deviation Bounds for k-Means Clustering
Title | Uniform Deviation Bounds for k-Means Clustering |
Authors | Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause |
Abstract | Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$ compared to the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis – the normalized fourth moment which measures the “tailedness” of a distribution. We also provide improved rates under progressively stronger assumptions, namely, bounded higher moments, subgaussianity and bounded support of the underlying distribution. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=523 |
PDF | http://proceedings.mlr.press/v70/bachem17a/bachem17a.pdf |
PWC | https://paperswithcode.com/paper/uniform-deviation-bounds-for-k-means |
Repo | |
Framework | |
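Stated informally (the notation below is assumed for illustration, not taken from the paper), the abstract's bounded-fourth-moment result says the gap between the empirical and expected k-means quantization error vanishes uniformly over all center sets at the improved rate:

```latex
% Uniform deviation bound for k-means (informal restatement):
% for every set Q of k centers, the empirical quantization error on an
% m-point sample is within O(m^{-1/2}) of its expectation, given a
% bounded fourth moment (constants depending on the distribution and k
% are suppressed).
\[
  \sup_{Q \,:\, |Q| = k}
  \Bigl|\, \frac{1}{m}\sum_{i=1}^{m} \min_{q \in Q} \lVert x_i - q \rVert^2
  \;-\; \mathbb{E}\Bigl[\min_{q \in Q} \lVert x - q \rVert^2\Bigr] \Bigr|
  \;\le\; O\!\bigl(m^{-1/2}\bigr)
\]
```

The previously known rate under comparable assumptions was \(O(m^{-1/4})\), and the paper sharpens the constants further via the kurtosis and progressively stronger moment assumptions.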
Lexicalized Reordering for Left-to-Right Hierarchical Phrase-based Translation
Title | Lexicalized Reordering for Left-to-Right Hierarchical Phrase-based Translation |
Authors | Maryam Siahbani, Anoop Sarkar |
Abstract | Phrase-based and hierarchical phrase-based (Hiero) translation models differ radically in the way reordering is modeled. Lexicalized reordering models play an important role in phrase-based MT and such models have been added to CKY-based decoders for Hiero. Watanabe et al. (2006) proposed a promising decoding algorithm for Hiero (LR-Hiero) that visits input spans in arbitrary order and produces the translation in left to right (LR) order which leads to far fewer language model calls and leads to a considerable speedup in decoding. We introduce a novel shift-reduce algorithm to LR-Hiero to decode with our lexicalized reordering model (LRM) and show that it improves translation quality for Czech-English, Chinese-English and German-English. |
Tasks | Language Modelling, Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2097/ |
PWC | https://paperswithcode.com/paper/lexicalized-reordering-for-left-to-right |
Repo | |
Framework | |
Gradient Boosted Decision Trees for High Dimensional Sparse Output
Title | Gradient Boosted Decision Trees for High Dimensional Sparse Output |
Authors | Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh |
Abstract | In this paper, we study the gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is a $L$-dimensional 0/1 vector, where $L$ is number of labels that can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running time in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing $L_0$ regularization. We then discuss in detail how to utilize this sparsity to conduct GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=870 |
PDF | http://proceedings.mlr.press/v70/si17a/si17a.pdf |
PWC | https://paperswithcode.com/paper/gradient-boosted-decision-trees-for-high |
Repo | |
Framework | |
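The sparsity idea at the heart of GBDT-SPARSE can be illustrated with a toy version of the sparse residual step: when the output is an L-dimensional vector with L in the millions, keep only the few largest-magnitude residual entries instead of a dense vector. A stdlib-only sketch (top-k selection standing in for the paper's L0-regularized formulation):

```python
def sparse_residual(pred, target, k):
    """Compute the residual between sparse prediction and target
    vectors (dicts mapping label index -> value), keeping only the
    k largest-magnitude entries. This keeps per-node work sublinear
    in the full label dimension L."""
    indices = set(target) | set(pred)
    resid = {j: target.get(j, 0.0) - pred.get(j, 0.0) for j in indices}
    top = sorted(resid, key=lambda j: abs(resid[j]), reverse=True)[:k]
    return {j: resid[j] for j in top if resid[j] != 0.0}
```

Each boosting round would then fit its tree to this sparsified residual, which is what makes training and prediction feasible for extreme multilabel problems.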