Paper Group NANR 17
Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Title | Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words |
Authors | Joan Serrà, Ilias Leontiadis, Dimitris Spathis, Gianluca Stringhini, Jeremy Blackburn, Athena Vakali |
Abstract | Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the 'ability to describe' seen documents to the 'ability to predict' unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4-11%. |
Tasks | Adversarial Attack, Hate Speech Detection, Text Categorization, Text Classification, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-3005/ |
PWC | https://paperswithcode.com/paper/class-based-prediction-errors-to-detect-hate |
Repo | |
Framework | |
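The idea in the abstract above, using per-class next-character prediction error as classifier input, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify the predictor's architecture, so an add-one smoothed character bigram model stands in for it, and `CharBigramModel`, `error_features`, and the toy training texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharBigramModel:
    """Add-one smoothed character bigram model.

    Assumption: a simple stand-in for the paper's next-character
    predictor, whose architecture the abstract does not describe.
    """

    def __init__(self, texts):
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for text in texts:
            padded = "^" + text  # "^" marks the start of a text
            self.vocab.update(padded)
            for prev, nxt in zip(padded, padded[1:]):
                self.counts[prev][nxt] += 1

    def error(self, text):
        """Mean negative log-probability per predicted character."""
        padded = "^" + text
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for prev, nxt in zip(padded, padded[1:]):
            ctx = self.counts.get(prev, Counter())
            p = (ctx[nxt] + 1) / (sum(ctx.values()) + v)
            total -= math.log(p)
        return total / max(len(text), 1)


def error_features(models, text):
    """One prediction-error value per class, in sorted class-name order."""
    return [models[name].error(text) for name in sorted(models)]


# Hypothetical toy data, for illustration only.
train = {
    "abusive": ["you are an idiot", "stupid idiot", "what an idiot"],
    "neutral": ["have a nice day", "see you at lunch", "nice weather today"],
}
models = {name: CharBigramModel(texts) for name, texts in train.items()}

# A misspelling that never appears in training (an out-of-vocabulary word).
features = error_features(models, "idi0t")
```

Here `features` holds one error per class; the lower error under the "abusive" model is the signal a downstream classifier could exploit, even though the exact token was never seen.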
Good News vs. Bad News: What are they talking about?
Title | Good News vs. Bad News: What are they talking about? |
Authors | Olga Kanishcheva, Victoria Bobicev |
Abstract | Today's massive news streams demand the automated analysis provided by various online news explorers. However, most of them do not provide sentiment analysis. The main problem of sentiment analysis of news is the difference between the writers' and readers' attitudes to the news text. News can be good or bad but has to be delivered in neutral words as pure facts. Although there are applications for sentiment analysis of news, the task of news analysis is still a very topical problem because the latest news impacts people's lives daily. In this paper, we explored the problem of sentiment analysis for Ukrainian and Russian news, developed a corpus of Ukrainian and Russian news and annotated each text using one of three categories: positive, negative and neutral. Each text was marked by at least three independent annotators via the web interface, the inter-annotator agreement was analyzed and the final label for each text was computed. These texts were used in the machine learning experiments. Further, we investigated what kinds of named entities such as Locations, Organizations, Persons are perceived as good or bad by the readers and which of them were the cause for text annotation ambiguity. |
Tasks | Sentiment Analysis |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1044/ |
https://doi.org/10.26615/978-954-452-049-6_044 | |
PWC | https://paperswithcode.com/paper/good-news-vs-bad-news-what-are-they-talking |
Repo | |
Framework | |
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
Title | Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition |
Authors | |
Abstract | |
Tasks | Language Acquisition |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0300/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-joint-workshop-on-nlp-for |
Repo | |
Framework | |
Dependency Structure of Binary Conjunctions (of the IF…, THEN… Type)
Title | Dependency Structure of Binary Conjunctions (of the IF…, THEN… Type) |
Authors | Igor Mel'čuk |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6516/ |
PWC | https://paperswithcode.com/paper/dependency-structure-of-binary-conjunctionsof |
Repo | |
Framework | |
Creation and evaluation of a dictionary-based tagger for virus species and proteins
Title | Creation and evaluation of a dictionary-based tagger for virus species and proteins |
Authors | Helen Cook, Rūdolfs Bērziņš, Cristina Leal Rodríguez, Juan Miguel Cejuela, Lars Juhl Jensen |
Abstract | Text mining automatically extracts information from the literature with the goal of making it available for further analysis, for example by incorporating it into biomedical databases. A key first step towards this goal is to identify and normalize the named entities, such as proteins and species, which are mentioned in text. Despite the large detrimental impact that viruses have on human and agricultural health, very little previous text-mining work has focused on identifying virus species and proteins in the literature. Here, we present an improved dictionary-based system for viral species and the first dictionary for viral proteins, which we benchmark on a new corpus of 300 manually annotated abstracts. We achieve 81.0% precision and 72.7% recall at the task of recognizing and normalizing viral species and 76.2% precision and 34.9% recall on viral proteins. These results are achieved despite the many challenges involved with the names of viral species and, especially, proteins. This work provides a foundation that can be used to extract more complicated relations about viruses from the literature. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2311/ |
PWC | https://paperswithcode.com/paper/creation-and-evaluation-of-a-dictionary-based |
Repo | |
Framework | |
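The abstract above reports precision and recall but no combined score. As a small worked example, the standard F1 measure (the harmonic mean of the two) can be derived from the reported figures. The F1 values are computed here and are not stated in the abstract; the function name `f1_score` is just illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as reported in the abstract.
reported = {
    "viral species": (0.810, 0.727),
    "viral proteins": (0.762, 0.349),
}

# F1 is derived here; the abstract itself does not report it.
f1 = {task: f1_score(p, r) for task, (p, r) in reported.items()}

for task, value in f1.items():
    print(f"{task}: F1 = {value:.3f}")
```

Under this definition the species tagger comes out at roughly 0.77 and the protein tagger at roughly 0.48, which makes the effect of the low protein recall easy to see.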
From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
Title | From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems |
Authors | Mrinmaya Sachan, Kumar Dubey, Eric Xing |
Abstract | Textbooks are rich sources of information. Harvesting structured knowledge from textbooks is a key challenge in many educational applications. As a case study, we present an approach for harvesting structured axiomatic knowledge from math textbooks. Our approach uses rich contextual and typographical features extracted from raw textbooks. It leverages the redundancy and shared ordering across multiple textbooks to further refine the harvested axioms. These axioms are then parsed into rules that are used to improve the state-of-the-art in solving geometry problems. |
Tasks | Question Answering |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1081/ |
PWC | https://paperswithcode.com/paper/from-textbooks-to-knowledge-a-case-study-in |
Repo | |
Framework | |
UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics
Title | UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics |
Authors | Olga Vechtomova |
Abstract | The paper presents a system for locating a pun word. The developed method calculates a score for each word in a pun, using a number of components, including its Inverse Document Frequency (IDF), Normalized Pointwise Mutual Information (NPMI) with other words in the pun text, its position in the text, part-of-speech and some syntactic features. The method achieved the best performance in the Heterographic category and the second best in the Homographic. Further analysis showed that IDF is the most useful characteristic, whereas the count of words with which the given word has high NPMI has a negative effect on performance. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2071/ |
PWC | https://paperswithcode.com/paper/uwaterloo-at-semeval-2017-task-7-locating-the |
Repo | |
Framework | |
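Two of the components named in the abstract above, IDF and NPMI, have standard textbook definitions that can be sketched briefly. The abstract does not say how the system estimates or combines them, so everything below is an assumption for illustration: probabilities are estimated from document-level co-occurrence, the toy corpus is hypothetical, and the function names `idf` and `npmi` are not taken from the paper.

```python
import math


def idf(word, docs):
    """Inverse document frequency: ln(N / df). Returns 0.0 for unseen words."""
    df = sum(1 for doc in docs if word in doc)
    if df == 0:
        return 0.0
    return math.log(len(docs) / df)


def npmi(x, y, docs):
    """Normalized pointwise mutual information in [-1, 1].

    NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / -ln p(x, y),
    with probabilities estimated from document-level co-occurrence
    (an assumption; the abstract does not give the estimation window).
    """
    n = len(docs)
    p_x = sum(1 for doc in docs if x in doc) / n
    p_y = sum(1 for doc in docs if y in doc) / n
    p_xy = sum(1 for doc in docs if x in doc and y in doc) / n
    if p_xy == 0:
        return -1.0  # the words never co-occur
    if p_xy == 1:
        return 0.0  # both words are in every document; no information
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"the", "bank", "river"},
    {"the", "bank", "money"},
    {"the", "river", "flows"},
    {"the", "money", "talks"},
]

idf_common = idf("the", docs)    # appears in every document
idf_rare = idf("flows", docs)    # appears in one document
assoc = npmi("flows", "river", docs)
never = npmi("flows", "talks", docs)
```

A rare word such as "flows" gets a higher IDF than "the", which matches the abstract's finding that IDF is a useful signal for locating the pun word.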
Exploiting Word Internal Structures for Generic Chinese Sentence Representation
Title | Exploiting Word Internal Structures for Generic Chinese Sentence Representation |
Authors | Shaonan Wang, Jiajun Zhang, Chengqing Zong |
Abstract | We introduce a novel mixed character-word architecture to improve Chinese sentence representations, by utilizing rich semantic information of word internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, learning the relation among characters in a word. The second is a max-pooling operation on words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models, which achieves substantial performance gains over baseline models on sentence similarity task. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1029/ |
PWC | https://paperswithcode.com/paper/exploiting-word-internal-structures-for |
Repo | |
Framework | |
Estimating Reactions and Recommending Products with Generative Models of Reviews
Title | Estimating Reactions and Recommending Products with Generative Models of Reviews |
Authors | Jianmo Ni, Zachary C. Lipton, Sharad Vikram, Julian McAuley |
Abstract | Traditional approaches to recommendation focus on learning from large volumes of historical feedback to estimate simple numerical quantities (Will a user click on a product? Make a purchase? etc.). Natural language approaches that model information like product reviews have proved to be incredibly useful in improving the performance of such methods, as reviews provide valuable auxiliary information that can be used to better estimate latent user preferences and item properties. In this paper, rather than using reviews as inputs to a recommender system, we focus on generating reviews as the model's output. This requires us to efficiently model text (at the character level) to capture the preferences of the user, the properties of the item being consumed, and the interaction between them (i.e., the user's preference). We show that this model can be used to (a) generate plausible reviews and estimate nuanced reactions; (b) provide personalized rankings of existing reviews; and (c) recommend existing products more effectively. |
Tasks | Language Modelling, Recommendation Systems, Sentiment Analysis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1079/ |
PWC | https://paperswithcode.com/paper/estimating-reactions-and-recommending |
Repo | |
Framework | |
Towards Implicit Content-Introducing for Generative Short-Text Conversation Systems
Title | Towards Implicit Content-Introducing for Generative Short-Text Conversation Systems |
Authors | Lili Yao, Yaoyuan Zhang, Yansong Feng, Dongyan Zhao, Rui Yan |
Abstract | The study on human-computer conversation systems is a hot research topic nowadays. One of the prevailing methods to build the system is using the generative Sequence-to-Sequence (Seq2Seq) model through neural networks. However, the standard Seq2Seq model is prone to generate trivial responses. In this paper, we aim to generate a more meaningful and informative reply when answering a given question. We propose an implicit content-introducing method which incorporates additional information into the Seq2Seq model in a flexible way. Specifically, we fuse the general decoding and the auxiliary cue word information through our proposed hierarchical gated fusion unit. Experiments on real-life data demonstrate that our model consistently outperforms a set of competitive baselines in terms of BLEU scores and human evaluation. |
Tasks | Short-Text Conversation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1233/ |
PWC | https://paperswithcode.com/paper/towards-implicit-content-introducing-for |
Repo | |
Framework | |
Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction
Title | Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction |
Authors | Flavio Massimiliano Cecchini, Chris Biemann, Martin Riedl |
Abstract | |
Tasks | Word Sense Disambiguation, Word Sense Induction |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0213/ |
PWC | https://paperswithcode.com/paper/using-pseudowords-for-algorithm-comparison-an |
Repo | |
Framework | |
Corpus Linguistic Analysis for Language Planning
Title | Corpus Linguistic Analysis for Language Planning |
Authors | Joel Ilao |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1004/ |
PWC | https://paperswithcode.com/paper/corpus-linguistic-analysis-for-language |
Repo | |
Framework | |
A Type-Logical Approach to Potential Constructions in Japanese
Title | A Type-Logical Approach to Potential Constructions in Japanese |
Authors | Hiroaki Nakamura |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1009/ |
PWC | https://paperswithcode.com/paper/a-type-logical-approach-to-potential |
Repo | |
Framework | |
Personalized Questions, Answers and Grammars: Aiding the Search for Relevant Web Information
Title | Personalized Questions, Answers and Grammars: Aiding the Search for Relevant Web Information |
Authors | Marta Gatius |
Abstract | This work proposes an organization of knowledge to facilitate the generation of personalized questions, answers and grammars from web documents. To reduce the human effort needed in the generation of the linguistic resources for a new domain, the general aspects that can be reused across domains are separated from those that are more specific. The proposed approach is based on the representation of the main domain concepts as a set of attributes. These attributes are related to a syntactico-semantic taxonomy representing the general relationships between conceptual and linguistic knowledge. User models are incorporated by distinguishing different user groups and relating each group to the appropriate conceptual attributes. Then, the data is extracted from the web documents and represented as instances of the domain concepts. Questions, answers and grammars are generated from these instances. |
Tasks | Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3530/ |
PWC | https://paperswithcode.com/paper/personalized-questions-answers-and-grammars |
Repo | |
Framework | |
Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
Title | Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models |
Authors | Haim Dubossarsky, Daphna Weinshall, Eitan Grossman |
Abstract | This article evaluates three proposed laws of semantic change. Our claim is that in order to validate a putative law of semantic change, the effect should be observed in the genuine condition but absent or reduced in a suitably matched control condition, in which no change can possibly have taken place. Our analysis shows that the effects reported in recent literature must be substantially revised: (i) the proposed negative correlation between meaning change and word frequency is shown to be largely an artefact of the models of word representation used; (ii) the proposed negative correlation between meaning change and prototypicality is shown to be much weaker than what has been claimed in prior art; and (iii) the proposed positive correlation between meaning change and polysemy is largely an artefact of word frequency. These empirical observations are corroborated by analytical proofs that show that count representations introduce an inherent dependence on word frequency, and thus word frequency cannot be evaluated as an independent factor with these representations. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1118/ |
PWC | https://paperswithcode.com/paper/outta-control-laws-of-semantic-change-and |
Repo | |
Framework | |