July 26, 2019

Paper Group NANR 17

Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words

Title Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Authors Joan Serrà, Ilias Leontiadis, Dimitris Spathis, Gianluca Stringhini, Jeremy Blackburn, Athena Vakali
Abstract Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the "ability to describe" seen documents to the "ability to predict" unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4-11%.
Tasks Adversarial Attack, Hate Speech Detection, Text Categorization, Text Classification, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-3005/
PDF https://www.aclweb.org/anthology/W17-3005
PWC https://paperswithcode.com/paper/class-based-prediction-errors-to-detect-hate
Repo
Framework
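
As a rough illustration of the idea in the abstract above (one character-level model per class, with each model's prediction error used as a classifier feature), here is a minimal sketch. It swaps the paper's neural next-character model for an add-one smoothed character bigram model, and the class names, training strings, and helper names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharBigramLM:
    """Add-one smoothed character bigram model.

    Assumption: a simple stand-in for a neural next-character predictor.
    """

    def __init__(self, texts, vocab_size):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)
        for text in texts:
            padded = "^" + text  # "^" marks the start of a text
            for prev, nxt in zip(padded, padded[1:]):
                self.counts[prev][nxt] += 1

    def mean_error(self, text):
        """Average negative log-probability per character."""
        padded = "^" + text
        total = 0.0
        for prev, nxt in zip(padded, padded[1:]):
            row = self.counts.get(prev, Counter())
            prob = (row[nxt] + 1) / (sum(row.values()) + self.vocab_size)
            total -= math.log(prob)
        return total / max(len(text), 1)


def error_features(text, models):
    """One error value per class model, in sorted class-name order."""
    return [models[name].mean_error(text) for name in sorted(models)]


# Toy training data; the class names and strings are made up.
train = {"class_a": ["aaaa", "aaab"], "class_b": ["bbbb", "bbba"]}
# Observed characters plus one slot reserved for unseen characters.
vocab_size = len({ch for texts in train.values() for t in texts for ch in t}) + 1
models = {name: CharBigramLM(texts, vocab_size) for name, texts in train.items()}

features = error_features("aaa", models)
print(features)
```

In a fuller pipeline, a vector like `features` would be passed to a downstream classifier; because the models work on characters, an unseen or misspelled word still produces a usable error signal.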

Good News vs. Bad News: What are they talking about?

Title Good News vs. Bad News: What are they talking about?
Authors Olga Kanishcheva, Victoria Bobicev
Abstract Today's massive news streams demand automated analysis, which is provided by various online news explorers. However, most of them do not provide sentiment analysis. The main problem of sentiment analysis of news is the difference between the writers' and readers' attitudes to the news text. News can be good or bad but has to be delivered in neutral words as pure facts. Although there are applications for sentiment analysis of news, the task of news analysis is still a very relevant problem because the latest news impacts people's lives daily. In this paper, we explored the problem of sentiment analysis for Ukrainian and Russian news, developed a corpus of Ukrainian and Russian news and annotated each text using one of three categories: positive, negative and neutral. Each text was marked by at least three independent annotators via the web interface; the inter-annotator agreement was analyzed and the final label for each text was computed. These texts were used in the machine learning experiments. Further, we investigated what kinds of named entities such as Locations, Organizations, Persons are perceived as good or bad by the readers and which of them were the cause of text annotation ambiguity.
Tasks Sentiment Analysis
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1044/
PDF https://doi.org/10.26615/978-954-452-049-6_044
PWC https://paperswithcode.com/paper/good-news-vs-bad-news-what-are-they-talking
Repo
Framework
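
The abstract above mentions computing a final label from at least three annotators and analyzing agreement, but does not say how. The sketch below assumes a plain majority vote and a simple pairwise agreement rate; both choices and the function names are illustrative, not the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def majority_label(labels):
    """Return the most frequent label, or None when the top count is tied."""
    (top, top_n), *rest = Counter(labels).most_common()
    if rest and rest[0][1] == top_n:
        return None  # no clear majority
    return top


def pairwise_agreement(labels):
    """Fraction of annotator pairs that chose the same label."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical annotations for two texts, three annotators each.
clear = ["positive", "positive", "neutral"]
split = ["positive", "negative", "neutral"]

print(majority_label(clear), pairwise_agreement(clear))
print(majority_label(split), pairwise_agreement(split))
```

Texts where `majority_label` returns `None` are the ambiguous cases that would need an extra annotator or a tie-breaking rule.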

Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

Title Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
Authors
Abstract
Tasks Language Acquisition
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0300/
PDF https://www.aclweb.org/anthology/W17-0300
PWC https://paperswithcode.com/paper/proceedings-of-the-joint-workshop-on-nlp-for
Repo
Framework

Dependency Structure of Binary Conjunctions (of the IF..., THEN... Type)

Title Dependency Structure of Binary Conjunctions (of the IF..., THEN... Type)
Authors Igor Mel'čuk
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6516/
PDF https://www.aclweb.org/anthology/W17-6516
PWC https://paperswithcode.com/paper/dependency-structure-of-binary-conjunctionsof
Repo
Framework

Creation and evaluation of a dictionary-based tagger for virus species and proteins

Title Creation and evaluation of a dictionary-based tagger for virus species and proteins
Authors Helen Cook, Rūdolfs Bērziņš, Cristina Leal Rodríguez, Juan Miguel Cejuela, Lars Juhl Jensen
Abstract Text mining automatically extracts information from the literature with the goal of making it available for further analysis, for example by incorporating it into biomedical databases. A key first step towards this goal is to identify and normalize the named entities, such as proteins and species, which are mentioned in text. Despite the large detrimental impact that viruses have on human and agricultural health, very little previous text-mining work has focused on identifying virus species and proteins in the literature. Here, we present an improved dictionary-based system for viral species and the first dictionary for viral proteins, which we benchmark on a new corpus of 300 manually annotated abstracts. We achieve 81.0% precision and 72.7% recall at the task of recognizing and normalizing viral species and 76.2% precision and 34.9% recall on viral proteins. These results are achieved despite the many challenges involved with the names of viral species and, especially, proteins. This work provides a foundation that can be used to extract more complicated relations about viruses from the literature.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2311/
PDF https://www.aclweb.org/anthology/W17-2311
PWC https://paperswithcode.com/paper/creation-and-evaluation-of-a-dictionary-based
Repo
Framework
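
To make the dictionary-based tagging and the precision/recall evaluation in the abstract above concrete, here is a minimal sketch. It assumes whitespace tokenization and longest-match lookup; the dictionary entries and the identifiers (`SPECIES:1`, `PROTEIN:1`) are hypothetical placeholders, not entries from the authors' resources.

```python
def tag(text, dictionary, max_len=4):
    """Greedy longest-match lookup of lowercased token n-grams in a dictionary."""
    tokens = text.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                # Normalize the surface form to its dictionary identifier.
                matches.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches


def precision_recall(predicted, gold):
    """Precision and recall over sets of predicted and gold identifiers."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical dictionary mapping names to made-up identifiers.
dictionary = {
    "influenza a virus": "SPECIES:1",
    "hemagglutinin": "PROTEIN:1",
}

found = tag("Hemagglutinin of influenza A virus binds sialic acid", dictionary)
scores = precision_recall({ident for _, ident in found}, {"SPECIES:1", "PROTEIN:1", "PROTEIN:2"})
print(found, scores)
```

This sketch ignores punctuation, spelling variants, and ambiguity, which are the kinds of naming challenges the abstract alludes to.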

From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems

Title From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
Authors Mrinmaya Sachan, Kumar Dubey, Eric Xing
Abstract Textbooks are rich sources of information. Harvesting structured knowledge from textbooks is a key challenge in many educational applications. As a case study, we present an approach for harvesting structured axiomatic knowledge from math textbooks. Our approach uses rich contextual and typographical features extracted from raw textbooks. It leverages the redundancy and shared ordering across multiple textbooks to further refine the harvested axioms. These axioms are then parsed into rules that are used to improve the state-of-the-art in solving geometry problems.
Tasks Question Answering
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1081/
PDF https://www.aclweb.org/anthology/D17-1081
PWC https://paperswithcode.com/paper/from-textbooks-to-knowledge-a-case-study-in
Repo
Framework

UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics

Title UWaterloo at SemEval-2017 Task 7: Locating the Pun Using Syntactic Characteristics and Corpus-based Metrics
Authors Olga Vechtomova
Abstract The paper presents a system for locating a pun word. The developed method calculates a score for each word in a pun, using a number of components, including its Inverse Document Frequency (IDF), Normalized Pointwise Mutual Information (NPMI) with other words in the pun text, its position in the text, part-of-speech and some syntactic features. The method achieved the best performance in the Heterographic category and the second best in the Homographic. Further analysis showed that IDF is the most useful characteristic, whereas the count of words with which the given word has high NPMI has a negative effect on performance.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2071/
PDF https://www.aclweb.org/anthology/S17-2071
PWC https://paperswithcode.com/paper/uwaterloo-at-semeval-2017-task-7-locating-the
Repo
Framework
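
Two of the score components named in the abstract above, IDF and NPMI, have standard definitions that can be sketched briefly. The version below assumes document-level co-occurrence counts and a smoothed IDF variant; the toy corpus is made up, and how the paper combines these components into a final score is not shown because the abstract does not specify it.

```python
import math


def idf(word, documents):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df))."""
    df = sum(1 for doc in documents if word in doc)
    return math.log((1 + len(documents)) / (1 + df))


def npmi(x, y, documents):
    """Normalized PMI in [-1, 1], using document-level co-occurrence."""
    n = len(documents)
    p_x = sum(1 for doc in documents if x in doc) / n
    p_y = sum(1 for doc in documents if y in doc) / n
    p_xy = sum(1 for doc in documents if x in doc and y in doc) / n
    if p_xy == 0:
        return -1.0  # the words never co-occur
    if p_xy == 1:
        return 0.0  # both words occur everywhere, so PMI is zero
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


# Toy corpus; each document is a set of tokens.
docs = [
    {"time", "flies", "arrow"},
    {"fruit", "flies", "banana"},
    {"time", "waits"},
    {"banana", "split"},
]

print(idf("arrow", docs), idf("flies", docs))
print(npmi("fruit", "banana", docs), npmi("time", "banana", docs))
```

Rarer words get a higher IDF, which is consistent with the abstract's observation that IDF was the most useful characteristic for locating the pun word.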

Exploiting Word Internal Structures for Generic Chinese Sentence Representation

Title Exploiting Word Internal Structures for Generic Chinese Sentence Representation
Authors Shaonan Wang, Jiajun Zhang, Chengqing Zong
Abstract We introduce a novel mixed character-word architecture to improve Chinese sentence representations, by utilizing rich semantic information of word internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, learning the relation among characters in a word. The second is a max-pooling operation on words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models, which achieves substantial performance gains over baseline models on the sentence similarity task.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1029/
PDF https://www.aclweb.org/anthology/D17-1029
PWC https://paperswithcode.com/paper/exploiting-word-internal-structures-for
Repo
Framework
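
The two strategies in the abstract above can be read as simple vector operations. The sketch below is one possible interpretation, not the authors' architecture: the gate values are fixed inputs rather than learned from the characters, composition is a gated sum, and all numbers are invented.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_composition(char_vecs, gate_logits):
    """Scale each character vector element-wise by sigmoid(gate), then sum.

    Assumption: in a trained model the gate logits would be computed from
    the characters by learned parameters; here they are given directly.
    """
    dim = len(char_vecs[0])
    out = [0.0] * dim
    for vec, logits in zip(char_vecs, gate_logits):
        for k in range(dim):
            out[k] += sigmoid(logits[k]) * vec[k]
    return out


def max_pool_mix(atomic, compositional):
    """Element-wise max over the atomic and compositional word vectors."""
    return [max(a, c) for a, c in zip(atomic, compositional)]


# Hypothetical two-character word with 2-dimensional embeddings.
char_vecs = [[1.0, -2.0], [0.5, 4.0]]
gate_logits = [[0.0, 0.0], [0.0, 0.0]]  # sigmoid(0) = 0.5 for every element
atomic = [1.0, 0.2]  # embedding of the word treated as a single unit

compositional = masked_composition(char_vecs, gate_logits)
word_vec = max_pool_mix(atomic, compositional)
print(compositional, word_vec)
```

The element-wise max lets each dimension take whichever of the two representations is stronger, which is one way to read "adaptively finding the optimal mixture".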

Estimating Reactions and Recommending Products with Generative Models of Reviews

Title Estimating Reactions and Recommending Products with Generative Models of Reviews
Authors Jianmo Ni, Zachary C. Lipton, Sharad Vikram, Julian McAuley
Abstract Traditional approaches to recommendation focus on learning from large volumes of historical feedback to estimate simple numerical quantities (Will a user click on a product? Make a purchase? etc.). Natural language approaches that model information like product reviews have proved to be incredibly useful in improving the performance of such methods, as reviews provide valuable auxiliary information that can be used to better estimate latent user preferences and item properties. In this paper, rather than using reviews as inputs to a recommender system, we focus on generating reviews as the model's output. This requires us to efficiently model text (at the character level) to capture the preferences of the user, the properties of the item being consumed, and the interaction between them (i.e., the user's preference). We show that this model can be used to (a) generate plausible reviews and estimate nuanced reactions; (b) provide personalized rankings of existing reviews; and (c) recommend existing products more effectively.
Tasks Language Modelling, Recommendation Systems, Sentiment Analysis
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1079/
PDF https://www.aclweb.org/anthology/I17-1079
PWC https://paperswithcode.com/paper/estimating-reactions-and-recommending
Repo
Framework

Towards Implicit Content-Introducing for Generative Short-Text Conversation Systems

Title Towards Implicit Content-Introducing for Generative Short-Text Conversation Systems
Authors Lili Yao, Yaoyuan Zhang, Yansong Feng, Dongyan Zhao, Rui Yan
Abstract The study on human-computer conversation systems is a hot research topic nowadays. One of the prevailing methods to build the system is using the generative Sequence-to-Sequence (Seq2Seq) model through neural networks. However, the standard Seq2Seq model is prone to generate trivial responses. In this paper, we aim to generate a more meaningful and informative reply when answering a given question. We propose an implicit content-introducing method which incorporates additional information into the Seq2Seq model in a flexible way. Specifically, we fuse the general decoding and the auxiliary cue word information through our proposed hierarchical gated fusion unit. Experiments on real-life data demonstrate that our model consistently outperforms a set of competitive baselines in terms of BLEU scores and human evaluation.
Tasks Short-Text Conversation
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1233/
PDF https://www.aclweb.org/anthology/D17-1233
PWC https://paperswithcode.com/paper/towards-implicit-content-introducing-for
Repo
Framework

Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Title Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction
Authors Flavio Massimiliano Cecchini, Chris Biemann, Martin Riedl
Abstract
Tasks Word Sense Disambiguation, Word Sense Induction
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0213/
PDF https://www.aclweb.org/anthology/W17-0213
PWC https://paperswithcode.com/paper/using-pseudowords-for-algorithm-comparison-an
Repo
Framework

Corpus Linguistic Analysis for Language Planning

Title Corpus Linguistic Analysis for Language Planning
Authors Joel Ilao
Abstract
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/Y17-1004/
PDF https://www.aclweb.org/anthology/Y17-1004
PWC https://paperswithcode.com/paper/corpus-linguistic-analysis-for-language
Repo
Framework

A Type-Logical Approach to Potential Constructions in Japanese

Title A Type-Logical Approach to Potential Constructions in Japanese
Authors Hiroaki Nakamura
Abstract
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/Y17-1009/
PDF https://www.aclweb.org/anthology/Y17-1009
PWC https://paperswithcode.com/paper/a-type-logical-approach-to-potential
Repo
Framework

Personalized Questions, Answers and Grammars: Aiding the Search for Relevant Web Information

Title Personalized Questions, Answers and Grammars: Aiding the Search for Relevant Web Information
Authors Marta Gatius
Abstract This work proposes an organization of knowledge to facilitate the generation of personalized questions, answers and grammars from web documents. To reduce the human effort needed in the generation of the linguistic resources for a new domain, the general aspects that can be reused across domains are separated from those that are more specific. The proposed approach is based on the representation of the main domain concepts as a set of attributes. These attributes are related to a syntactico-semantic taxonomy representing the general relationships between conceptual and linguistic knowledge. User models are incorporated by distinguishing different user groups and relating each group to the appropriate conceptual attributes. Then, the data is extracted from the web documents and represented as instances of the domain concepts. Questions, answers and grammars are generated from these instances.
Tasks Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3530/
PDF https://www.aclweb.org/anthology/W17-3530
PWC https://paperswithcode.com/paper/personalized-questions-answers-and-grammars
Repo
Framework

Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models

Title Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
Authors Haim Dubossarsky, Daphna Weinshall, Eitan Grossman
Abstract This article evaluates three proposed laws of semantic change. Our claim is that in order to validate a putative law of semantic change, the effect should be observed in the genuine condition but absent or reduced in a suitably matched control condition, in which no change can possibly have taken place. Our analysis shows that the effects reported in recent literature must be substantially revised: (i) the proposed negative correlation between meaning change and word frequency is shown to be largely an artefact of the models of word representation used; (ii) the proposed negative correlation between meaning change and prototypicality is shown to be much weaker than what has been claimed in prior art; and (iii) the proposed positive correlation between meaning change and polysemy is largely an artefact of word frequency. These empirical observations are corroborated by analytical proofs that show that count representations introduce an inherent dependence on word frequency, and thus word frequency cannot be evaluated as an independent factor with these representations.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1118/
PDF https://www.aclweb.org/anthology/D17-1118
PWC https://paperswithcode.com/paper/outta-control-laws-of-semantic-change-and
Repo
Framework