July 26, 2019

Paper Group NANR 139

Creating lexical resources for polysynthetic languages—the case of Arapaho


Title	Creating lexical resources for polysynthetic languages—the case of Arapaho
Authors	Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden
Abstract
Tasks
Published	2017-03-01
URL	https://www.aclweb.org/anthology/W17-0102/
PDF	https://www.aclweb.org/anthology/W17-0102
PWC	https://paperswithcode.com/paper/creating-lexical-resources-for-polysynthetic
Repo
Framework

Abusive Language Detection on Arabic Social Media


Title	Abusive Language Detection on Arabic Social Media
Authors	Hamdy Mubarak, Kareem Darwish, Walid Magdy
Abstract	In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations of the site's rules and guidelines.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-3008/
PDF	https://www.aclweb.org/anthology/W17-3008
PWC	https://paperswithcode.com/paper/abusive-language-detection-on-arabic-social
Repo
Framework

Representation of complex terms in a vector space structured by an ontology for a normalization task


Title	Representation of complex terms in a vector space structured by an ontology for a normalization task
Authors	Arnaud Ferré, Pierre Zweigenbaum, Claire Nédellec
Abstract	We propose in this paper a semi-supervised method for labeling terms of texts with concepts of a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. The proposed method relies on a distributional semantics approach, which generates initial vectors for each of the extracted terms. Then these vectors are embedded in the vector space constructed from the structure of the ontology. This embedding is carried out by training a linear model. Finally, we apply a distance calculation to determine the proximity between vectors of terms and vectors of concepts and thus to assign ontology labels to terms. We have evaluated the quality of these representations for a normalization task by using the concepts of an ontology as semantic labels. Normalization of terms is an important step to extract a part of the information contained in texts, but the vector space generated might find other applications. The performance of this method is comparable to that of the state of the art for this normalization task, opening up encouraging prospects.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2312/
PDF	https://www.aclweb.org/anthology/W17-2312
PWC	https://paperswithcode.com/paper/representation-of-complex-terms-in-a-vector
Repo
Framework
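
The last two steps described in the abstract above (embedding term vectors with a linear model, then assigning the closest concept) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrix `W` is hand-picked rather than trained, the concept names and all vectors are hypothetical toy data, and cosine distance is an assumption, since the abstract only says "a distance calculation".

```python
import math

def apply_linear(W, v):
    # Matrix-vector product: maps a term vector into the concept space.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def normalize_term(term_vec, W, concept_vecs):
    # Project the term, then return the label of the nearest concept vector.
    projected = apply_linear(W, term_vec)
    return min(concept_vecs,
               key=lambda c: cosine_distance(projected, concept_vecs[c]))

# Hypothetical toy data: a 3-d distributional space mapped to a 2-d concept space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
concept_vecs = {"bacterium": [1.0, 0.0], "habitat": [0.0, 1.0]}

label = normalize_term([0.1, 0.7, 0.6], W, concept_vecs)
print(label)
```

In a real setting `W` would be fitted on terms whose concepts are already known, and the concept vectors would be derived from the ontology structure.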

Principles of Riemannian Geometry in Neural Networks


Title	Principles of Riemannian Geometry in Neural Networks
Authors	Michael Hauser, Asok Ray
Abstract	This study deals with neural networks in the sense of geometric transformations acting on the coordinate representation of the underlying data manifold which the data is sampled from. It forms part of an attempt to construct a formalized general theory of neural networks in the setting of Riemannian geometry. From this perspective, the following theoretical results are developed and proven for feedforward networks. First it is shown that residual neural networks are finite difference approximations to dynamical systems of first order differential equations, as opposed to ordinary networks that are static. This implies that the network is learning systems of differential equations governing the coordinate transformations that represent the data. Second it is shown that a closed form solution of the metric tensor on the underlying data manifold can be found by backpropagating the coordinate representations learned by the neural network itself. This is formulated in a formal abstract sense as a sequence of Lie group actions on the metric fibre space in the principal and associated bundles on the data manifold. Toy experiments were run to confirm parts of the proposed theory, as well as to provide intuitions as to how neural networks operate on data.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/6873-principles-of-riemannian-geometry-in-neural-networks
PDF	http://papers.nips.cc/paper/6873-principles-of-riemannian-geometry-in-neural-networks.pdf
PWC	https://paperswithcode.com/paper/principles-of-riemannian-geometry-in-neural
Repo
Framework
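
The first result stated in the abstract above, that residual networks are finite difference approximations to first order differential equations, can be made concrete with a small sketch. A residual update of the form x ← x + h·f(x) is one forward Euler step for dx/dt = f(x). The vector field `f`, the step size, and the layer counts below are hypothetical choices for illustration and are not taken from the paper's experiments.

```python
import math

def f(x):
    # Hypothetical vector field standing in for a learned residual branch.
    return -x

def residual_forward(x0, num_layers, total_time=1.0):
    # Each "layer" applies x <- x + h * f(x), i.e. one forward Euler step.
    h = total_time / num_layers
    x = x0
    for _ in range(num_layers):
        x = x + h * f(x)
    return x

# Exact solution of dx/dt = -x at t = 1 with x(0) = 1.
exact = math.exp(-1.0)

err_10 = abs(residual_forward(1.0, 10) - exact)
err_100 = abs(residual_forward(1.0, 100) - exact)
print(err_10, err_100)
```

With more layers (smaller steps) the stacked residual updates track the continuous trajectory more closely, which is the sense in which depth plays the role of discretized time in this view.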

Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural?


Title	Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural?
Authors	Patricia Braunger, Wolfgang Maier
Abstract	Recent spoken dialog systems are moving away from command and control towards a more intuitive and natural style of interaction. In order to choose an appropriate system design which allows the system to deal with naturally spoken user input, a definition of what exactly constitutes naturalness in user input is important. In this paper, we examine how different user groups naturally speak to an automotive spoken dialog system (SDS). We conduct a user study in which we collect freely spoken user utterances for a wide range of use cases in German. By means of a comparative study of the utterances from the study with interpersonal utterances, we provide criteria for what constitutes naturalness in the user input of a state-of-the-art automotive SDS.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-5517/
PDF	https://www.aclweb.org/anthology/W17-5517
PWC	https://paperswithcode.com/paper/natural-language-input-for-in-car-spoken
Repo
Framework

MultiLing 2017 Overview


Title	MultiLing 2017 Overview
Authors	George Giannakopoulos, John Conroy, Jeff Kubina, Peter A. Rankel, Elena Lloret, Josef Steinberger, Marina Litvak, Benoit Favre
Abstract	In this brief report we present an overview of the MultiLing 2017 effort and workshop, as implemented within EACL 2017. MultiLing is a community-driven initiative that pushes the state-of-the-art in Automatic Summarization by providing data sets and fostering further research and development of summarization systems. This year the scope of the workshop was widened, bringing together researchers that work on summarization across sources, languages and genres. We summarize the main tasks planned and implemented this year, the contributions received, and we also provide insights on next steps.
Tasks	Document Summarization
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1001/
PDF	https://www.aclweb.org/anthology/W17-1001
PWC	https://paperswithcode.com/paper/multiling-2017-overview
Repo
Framework

Instant Annotations – Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora


Title	Instant Annotations – Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora
Authors	Ciprian Gerstenberger, Niko Partanen, Michael Rießler, Joshua Wilbur
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-0604/
PDF	https://www.aclweb.org/anthology/W17-0604
PWC	https://paperswithcode.com/paper/instant-annotations-axtendash-applying-nlp
Repo
Framework

Detecting Nastiness in Social Media


Title	Detecting Nastiness in Social Media
Authors	Niloofar Safi Samghabadi, Suraj Maharjan, Alan Sprague, Raquel Diaz-Sprague, Thamar Solorio
Abstract	Although social media has made it easy for people to connect on a virtually unlimited basis, it has also opened doors to people who misuse it to undermine, harass, humiliate, threaten and bully others. There is a lack of adequate resources to detect and hinder its occurrence. In this paper, we present our initial NLP approach to detect invective posts as a first step to eventually detect and deter cyberbullying. We crawl data containing profanities and then determine whether or not it contains invective. Annotations on this data are improved iteratively by in-lab annotations and crowdsourcing. We pursue different NLP approaches containing various typical and some newer techniques to distinguish the use of swear words in a neutral way from those instances in which they are used in an insulting way. We also show that this model not only works for our data set, but also can be successfully applied to different data sets.
Tasks
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-3010/
PDF	https://www.aclweb.org/anthology/W17-3010
PWC	https://paperswithcode.com/paper/detecting-nastiness-in-social-media
Repo
Framework

A Text Normalisation System for Non-Standard English Words


Title	A Text Normalisation System for Non-Standard English Words
Authors	Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines, Paula Buttery
Abstract	This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English. Non-standard words can be defined as those word tokens which do not have a dictionary entry, and cannot be pronounced using the usual letter-to-phoneme conversion rules; e.g. lbs, 99.3%, #EMNLP2017. NSWs pose a challenge to the proper functioning of text-to-speech technology, and the solution is to spell them out in such a way that they can be pronounced appropriately. We describe our four-stage normalisation system made up of components for detection, classification, division and expansion of NSWs. Performance is favourable compared to previous work in the field (Sproat et al. 2001, Normalization of non-standard words), as well as state-of-the-art text-to-speech software. Further, we update Sproat et al.'s NSW taxonomy, and create a more customisable system where users are able to input their own abbreviations and specify into which variety of English (currently available: British or American) they wish to normalise.
Tasks	Speech Recognition
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4414/
PDF	https://www.aclweb.org/anthology/W17-4414
PWC	https://paperswithcode.com/paper/a-text-normalisation-system-for-non-standard
Repo
Framework

The Covert Helps Parse the Overt


Title	The Covert Helps Parse the Overt
Authors	Xun Zhang, Weiwei Sun, Xiaojun Wan
Abstract	This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We design new algorithms to produce dependency trees in which empty elements are allowed, and evaluate the impact of information about empty categories on parsing overt elements. Such information is helpful to reduce the approximation error in a structured parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements, and perform structure regularization via joint decoding. Experiments on English and Chinese TreeBanks with different parsing models indicate that incorporating empty elements consistently improves surface parsing.
Tasks	Dependency Parsing
Published	2017-08-01
URL	https://www.aclweb.org/anthology/K17-1035/
PDF	https://www.aclweb.org/anthology/K17-1035
PWC	https://paperswithcode.com/paper/the-covert-helps-parse-the-overt
Repo
Framework

Paraphrasing Revisited with Neural Machine Translation


Title	Paraphrasing Revisited with Neural Machine Translation
Authors	Jonathan Mallinson, Rico Sennrich, Mirella Lapata
Abstract	Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by "pivoting" over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches.
Tasks	Machine Translation, Question Answering, Semantic Parsing, Semantic Role Labeling
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1083/
PDF	https://www.aclweb.org/anthology/E17-1083
PWC	https://paperswithcode.com/paper/paraphrasing-revisited-with-neural-machine
Repo
Framework
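
The "pivoting" idea mentioned in the abstract above can be sketched in its conventional phrase-table form, where a paraphrase score is obtained by summing over shared translations: p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1). This is only an illustration of the classic phrase-based baseline, not the neural model the paper proposes, and the phrase tables and probabilities below are hypothetical toy values.

```python
from collections import defaultdict

def pivot_paraphrases(src_to_pivot, pivot_to_src, phrase):
    # Sum p(candidate | pivot) * p(pivot | phrase) over all pivot translations.
    scores = defaultdict(float)
    for pivot, p_pivot_given_phrase in src_to_pivot.get(phrase, {}).items():
        for candidate, p_cand_given_pivot in pivot_to_src.get(pivot, {}).items():
            if candidate != phrase:  # skip the trivial self-paraphrase
                scores[candidate] += p_pivot_given_phrase * p_cand_given_pivot
    return dict(scores)

# Hypothetical toy translation tables (English -> German and German -> English).
en_to_de = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
de_to_en = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

paraphrases = pivot_paraphrases(en_to_de, de_to_en, "under control")
for candidate, score in sorted(paraphrases.items(), key=lambda kv: -kv[1]):
    print(candidate, round(score, 3))
```

A phrase with no entry in the first table simply yields an empty result, which is one practical limit of table-based pivoting on unseen input.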

Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits


Title	Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits
Authors	Quanzhi Li, Sameena Shah
Abstract	Previous studies have shown that investor sentiment indicators can predict stock market change. A domain-specific sentiment lexicon and sentiment-oriented word embedding model would help sentiment analysis in the financial domain and the stock market. In this paper, we present a new approach to learning a stock market lexicon from StockTwits, a popular financial social network for investors to share ideas. It learns word polarity by predicting message sentiment, using a neural network. The sentiment-oriented word embeddings are learned from tens of millions of StockTwits posts, and this is the first study presenting sentiment-oriented word embeddings for the stock market. The experiments of predicting investor sentiment show that our lexicon outperformed other lexicons built by the state-of-the-art methods, and the sentiment-oriented word vector was much better than the general word embeddings.
Tasks	Decision Making, Sentiment Analysis, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/K17-1031/
PDF	https://www.aclweb.org/anthology/K17-1031
PWC	https://paperswithcode.com/paper/learning-stock-market-sentiment-lexicon-and
Repo
Framework

Enabling robust and fluid spoken dialogue with cognitively impaired users


Title	Enabling robust and fluid spoken dialogue with cognitively impaired users
Authors	Ramin Yaghoubzadeh, Stefan Kopp
Abstract	We present the flexdiam dialogue management architecture, which was developed in a series of projects dedicated to tailoring spoken interaction to the needs of users with cognitive impairments in an everyday assistive domain, using a multimodal front-end. This hybrid DM architecture affords incremental processing of uncertain input, a flexible, mixed-initiative information grounding process that can be adapted to users' cognitive capacities and interactive idiosyncrasies, and generic mechanisms that foster transitions in the joint discourse state that are understandable and controllable by those users, in order to effect a robust interaction for users with varying capacities.
Tasks	Dialogue Management, Speech Recognition
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-5533/
PDF	https://www.aclweb.org/anthology/W17-5533
PWC	https://paperswithcode.com/paper/enabling-robust-and-fluid-spoken-dialogue
Repo
Framework

oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain


Title	oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain
Authors	Dumitru-Clementin Cercel, Cristian Onose, Stefan Trausan-Matu, Florin Pop
Abstract	Understanding questions and answers in a QA system is a major challenge in the domain of natural language processing. In this paper, we present a question answering system that influences human opinions in a conversation. The opinion words are quantified by using a lexicon-based method. We apply Latent Semantic Analysis and the cosine similarity measure between candidate answers and each question to infer the answer of the chatbot.
Tasks	Chatbot, Information Retrieval, Opinion Mining, Question Answering, Sentiment Analysis
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7702/
PDF	https://doi.org/10.26615/978-954-452-038-0_002
PWC	https://paperswithcode.com/paper/oiqa-an-opinion-influence-oriented-question
Repo
Framework

Affinity-Preserving Random Walk for Multi-Document Summarization


Title	Affinity-Preserving Random Walk for Multi-Document Summarization
Authors	Kexiang Wang, Tianyu Liu, Zhifang Sui, Baobao Chang
Abstract	Multi-document summarization provides users with a short text that summarizes the information in a set of related documents. This paper introduces affinity-preserving random walk to the summarization task, which preserves the affinity relations of sentences by an absorbing random walk model. Meanwhile, we put forward adjustable affinity-preserving random walk to enforce the diversity constraint of summarization in the random walk process. The ROUGE evaluations on the DUC 2003 topic-focused summarization task and the DUC 2004 generic summarization task show the good performance of our method, which has the best ROUGE-2 recall among the graph-based ranking methods.
Tasks	Document Summarization, Multi-Document Summarization, Text Summarization
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1020/
PDF	https://www.aclweb.org/anthology/D17-1020
PWC	https://paperswithcode.com/paper/affinity-preserving-random-walk-for-multi
Repo
Framework