Paper Group NANR 139
Creating lexical resources for polysynthetic languages—the case of Arapaho
Title | Creating lexical resources for polysynthetic languages—the case of Arapaho |
Authors | Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden |
Abstract | |
Tasks | |
Published | 2017-03-01 |
URL | https://www.aclweb.org/anthology/W17-0102/ |
https://www.aclweb.org/anthology/W17-0102 | |
PWC | https://paperswithcode.com/paper/creating-lexical-resources-for-polysynthetic |
Repo | |
Framework | |
Abusive Language Detection on Arabic Social Media
Title | Abusive Language Detection on Arabic Social Media |
Authors | Hamdy Mubarak, Kareem Darwish, Walid Magdy |
Abstract | In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations of the site's rules and guidelines. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-3008/ |
https://www.aclweb.org/anthology/W17-3008 | |
PWC | https://paperswithcode.com/paper/abusive-language-detection-on-arabic-social |
Repo | |
Framework | |
Representation of complex terms in a vector space structured by an ontology for a normalization task
Title | Representation of complex terms in a vector space structured by an ontology for a normalization task |
Authors | Arnaud Ferré, Pierre Zweigenbaum, Claire Nédellec |
Abstract | We propose in this paper a semi-supervised method for labeling terms of texts with concepts of a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. The proposed method relies on a distributional semantics approach, which generates initial vectors for each of the extracted terms. Then these vectors are embedded in the vector space constructed from the structure of the ontology. This embedding is carried out by training a linear model. Finally, we apply a distance calculation to determine the proximity between vectors of terms and vectors of concepts and thus to assign ontology labels to terms. We have evaluated the quality of these representations for a normalization task by using the concepts of an ontology as semantic labels. Normalization of terms is an important step to extract a part of the information contained in texts, but the vector space generated might find other applications. The performance of this method is comparable to that of the state of the art for this task of standardization, opening up encouraging prospects. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2312/ |
https://www.aclweb.org/anthology/W17-2312 | |
PWC | https://paperswithcode.com/paper/representation-of-complex-terms-in-a-vector |
Repo | |
Framework | |
Principles of Riemannian Geometry in Neural Networks
Title | Principles of Riemannian Geometry in Neural Networks |
Authors | Michael Hauser, Asok Ray |
Abstract | This study deals with neural networks in the sense of geometric transformations acting on the coordinate representation of the underlying data manifold which the data is sampled from. It forms part of an attempt to construct a formalized general theory of neural networks in the setting of Riemannian geometry. From this perspective, the following theoretical results are developed and proven for feedforward networks. First it is shown that residual neural networks are finite difference approximations to dynamical systems of first order differential equations, as opposed to ordinary networks that are static. This implies that the network is learning systems of differential equations governing the coordinate transformations that represent the data. Second it is shown that a closed form solution of the metric tensor on the underlying data manifold can be found by backpropagating the coordinate representations learned by the neural network itself. This is formulated in a formal abstract sense as a sequence of Lie group actions on the metric fibre space in the principal and associated bundles on the data manifold. Toy experiments were run to confirm parts of the proposed theory, as well as to provide intuitions as to how neural networks operate on data. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6873-principles-of-riemannian-geometry-in-neural-networks |
http://papers.nips.cc/paper/6873-principles-of-riemannian-geometry-in-neural-networks.pdf | |
PWC | https://paperswithcode.com/paper/principles-of-riemannian-geometry-in-neural |
Repo | |
Framework | |
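The abstract's first result, that a residual network is a finite difference approximation to a first-order differential equation, can be illustrated with a toy calculation. The sketch below is not the authors' code: it assumes a scalar state, a fixed step size, and a hypothetical residual function `decay` standing in for a learned layer. Each "layer" applies x <- x + step * f(x), which is the forward Euler update for dx/dt = f(x).

```python
import math


def residual_stack(x0, f, num_layers, step):
    """Apply the residual update x <- x + step * f(x) once per layer."""
    x = x0
    for _ in range(num_layers):
        # Identity skip connection plus a scaled residual term:
        # exactly one forward Euler step of dx/dt = f(x).
        x = x + step * f(x)
    return x


def decay(x):
    # Hypothetical stand-in for a layer's learned residual function.
    return -x


# Integrate dx/dt = -x from t = 0 to t = 1 with x(0) = 1.
# The exact solution at t = 1 is exp(-1).
exact = math.exp(-1.0)
for layers in (10, 100, 1000):
    approx = residual_stack(1.0, decay, layers, 1.0 / layers)
    print(layers, approx, abs(approx - exact))
```

With more layers and a proportionally smaller step, the output of the stack moves closer to the exact solution of the differential equation, which is the sense in which depth plays the role of integration time in this toy setting.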
Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural?
Title | Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural? |
Authors | Patricia Braunger, Wolfgang Maier |
Abstract | Recent spoken dialog systems are moving away from command and control towards a more intuitive and natural style of interaction. In order to choose an appropriate system design which allows the system to deal with naturally spoken user input, a definition of what exactly constitutes naturalness in user input is important. In this paper, we examine how different user groups naturally speak to an automotive spoken dialog system (SDS). We conduct a user study in which we collect freely spoken user utterances for a wide range of use cases in German. By means of a comparative study of the utterances from the study with interpersonal utterances, we provide criteria for what constitutes naturalness in the user input of a state-of-the-art automotive SDS. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-5517/ |
https://www.aclweb.org/anthology/W17-5517 | |
PWC | https://paperswithcode.com/paper/natural-language-input-for-in-car-spoken |
Repo | |
Framework | |
MultiLing 2017 Overview
Title | MultiLing 2017 Overview |
Authors | George Giannakopoulos, John Conroy, Jeff Kubina, Peter A. Rankel, Elena Lloret, Josef Steinberger, Marina Litvak, Benoit Favre |
Abstract | In this brief report we present an overview of the MultiLing 2017 effort and workshop, as implemented within EACL 2017. MultiLing is a community-driven initiative that pushes the state-of-the-art in Automatic Summarization by providing data sets and fostering further research and development of summarization systems. This year the scope of the workshop was widened, bringing together researchers that work on summarization across sources, languages and genres. We summarize the main tasks planned and implemented this year, the contributions received, and we also provide insights on next steps. |
Tasks | Document Summarization |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1001/ |
https://www.aclweb.org/anthology/W17-1001 | |
PWC | https://paperswithcode.com/paper/multiling-2017-overview |
Repo | |
Framework | |
Instant Annotations – Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora
Title | Instant Annotations – Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora |
Authors | Ciprian Gerstenberger, Niko Partanen, Michael Rießler, Joshua Wilbur |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-0604/ |
https://www.aclweb.org/anthology/W17-0604 | |
PWC | https://paperswithcode.com/paper/instant-annotations-axtendash-applying-nlp |
Repo | |
Framework | |
Detecting Nastiness in Social Media
Title | Detecting Nastiness in Social Media |
Authors | Niloofar Safi Samghabadi, Suraj Maharjan, Alan Sprague, Raquel Diaz-Sprague, Thamar Solorio |
Abstract | Although social media has made it easy for people to connect on a virtually unlimited basis, it has also opened doors to people who misuse it to undermine, harass, humiliate, threaten and bully others. There is a lack of adequate resources to detect and hinder its occurrence. In this paper, we present our initial NLP approach to detect invective posts as a first step to eventually detect and deter cyberbullying. We crawl data containing profanities and then determine whether or not it contains invective. Annotations on this data are improved iteratively by in-lab annotations and crowdsourcing. We pursue different NLP approaches containing various typical and some newer techniques to distinguish the use of swear words in a neutral way from those instances in which they are used in an insulting way. We also show that this model not only works for our data set, but also can be successfully applied to different data sets. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-3010/ |
https://www.aclweb.org/anthology/W17-3010 | |
PWC | https://paperswithcode.com/paper/detecting-nastiness-in-social-media |
Repo | |
Framework | |
A Text Normalisation System for Non-Standard English Words
Title | A Text Normalisation System for Non-Standard English Words |
Authors | Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines, Paula Buttery |
Abstract | This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English. Non-standard words can be defined as those word tokens which do not have a dictionary entry, and cannot be pronounced using the usual letter-to-phoneme conversion rules; e.g. lbs, 99.3%, #EMNLP2017. NSWs pose a challenge to the proper functioning of text-to-speech technology, and the solution is to spell them out in such a way that they can be pronounced appropriately. We describe our four-stage normalisation system made up of components for detection, classification, division and expansion of NSWs. Performance is favourable compared to previous work in the field (Sproat et al. 2001, Normalization of non-standard words), as well as state-of-the-art text-to-speech software. Further, we update Sproat et al.'s NSW taxonomy, and create a more customisable system where users are able to input their own abbreviations and specify into which variety of English (currently available: British or American) they wish to normalise. |
Tasks | Speech Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4414/ |
https://www.aclweb.org/anthology/W17-4414 | |
PWC | https://paperswithcode.com/paper/a-text-normalisation-system-for-non-standard |
Repo | |
Framework | |
The Covert Helps Parse the Overt
Title | The Covert Helps Parse the Overt |
Authors | Xun Zhang, Weiwei Sun, Xiaojun Wan |
Abstract | This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We design new algorithms to produce dependency trees in which empty elements are allowed, and evaluate the impact of information about empty category on parsing overt elements. Such information is helpful to reduce the approximation error in a structured parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements, and perform structure regularization via joint decoding. Experiments on English and Chinese TreeBanks with different parsing models indicate that incorporating empty elements consistently improves surface parsing. |
Tasks | Dependency Parsing |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-1035/ |
https://www.aclweb.org/anthology/K17-1035 | |
PWC | https://paperswithcode.com/paper/the-covert-helps-parse-the-overt |
Repo | |
Framework | |
Paraphrasing Revisited with Neural Machine Translation
Title | Paraphrasing Revisited with Neural Machine Translation |
Authors | Jonathan Mallinson, Rico Sennrich, Mirella Lapata |
Abstract | Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by "pivoting" over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches. |
Tasks | Machine Translation, Question Answering, Semantic Parsing, Semantic Role Labeling |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1083/ |
https://www.aclweb.org/anthology/E17-1083 | |
PWC | https://paperswithcode.com/paper/paraphrasing-revisited-with-neural-machine |
Repo | |
Framework | |
Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits
Title | Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits |
Authors | Quanzhi Li, Sameena Shah |
Abstract | Previous studies have shown that investor sentiment indicators can predict stock market change. A domain-specific sentiment lexicon and sentiment-oriented word embedding model would help the sentiment analysis in financial domain and stock market. In this paper, we present a new approach to learning stock market lexicon from StockTwits, a popular financial social network for investors to share ideas. It learns word polarity by predicting message sentiment, using a neural network. The sentiment-oriented word embeddings are learned from tens of millions of StockTwits posts, and this is the first study presenting sentiment-oriented word embeddings for stock market. The experiments of predicting investor sentiment show that our lexicon outperformed other lexicons built by the state-of-the-art methods, and the sentiment-oriented word vector was much better than the general word embeddings. |
Tasks | Decision Making, Sentiment Analysis, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-1031/ |
https://www.aclweb.org/anthology/K17-1031 | |
PWC | https://paperswithcode.com/paper/learning-stock-market-sentiment-lexicon-and |
Repo | |
Framework | |
Enabling robust and fluid spoken dialogue with cognitively impaired users
Title | Enabling robust and fluid spoken dialogue with cognitively impaired users |
Authors | Ramin Yaghoubzadeh, Stefan Kopp |
Abstract | We present the flexdiam dialogue management architecture, which was developed in a series of projects dedicated to tailoring spoken interaction to the needs of users with cognitive impairments in an everyday assistive domain, using a multimodal front-end. This hybrid DM architecture affords incremental processing of uncertain input, a flexible, mixed-initiative information grounding process that can be adapted to users' cognitive capacities and interactive idiosyncrasies, and generic mechanisms that foster transitions in the joint discourse state that are understandable and controllable by those users, in order to effect a robust interaction for users with varying capacities. |
Tasks | Dialogue Management, Speech Recognition |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-5533/ |
https://www.aclweb.org/anthology/W17-5533 | |
PWC | https://paperswithcode.com/paper/enabling-robust-and-fluid-spoken-dialogue |
Repo | |
Framework | |
oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain
Title | oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain |
Authors | Dumitru-Clementin Cercel, Cristian Onose, Stefan Trausan-Matu, Florin Pop |
Abstract | Understanding questions and answers in QA systems is a major challenge in the domain of natural language processing. In this paper, we present a question answering system that influences human opinions in a conversation. The opinion words are quantified by using a lexicon-based method. We apply Latent Semantic Analysis and the cosine similarity measure between candidate answers and each question to infer the answer of the chatbot. |
Tasks | Chatbot, Information Retrieval, Opinion Mining, Question Answering, Sentiment Analysis |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7702/ |
https://doi.org/10.26615/978-954-452-038-0_002 | |
PWC | https://paperswithcode.com/paper/oiqa-an-opinion-influence-oriented-question |
Repo | |
Framework | |
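The answer-selection step described in the abstract, scoring candidate answers against a question by cosine similarity, can be sketched as follows. This is an illustrative simplification rather than the authors' system: it compares raw term-count vectors instead of vectors in an LSA-reduced space, it omits the opinion lexicon, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, whitespace-tokenised bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def best_answer(question, candidates):
    """Return the candidate whose term vector is closest to the question."""
    q = term_counts(question)
    return max(candidates, key=lambda c: cosine(q, term_counts(c)))


candidates = [
    "the battery lasts two days",
    "this phone has the best camera in its class",
    "shipping is free",
]
print(best_answer("which phone has the best camera", candidates))
```

In a fuller version, the term-count vectors would first be projected into a low-rank space with a truncated SVD (the LSA step) so that answers can match a question without sharing its exact words.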
Affinity-Preserving Random Walk for Multi-Document Summarization
Title | Affinity-Preserving Random Walk for Multi-Document Summarization |
Authors | Kexiang Wang, Tianyu Liu, Zhifang Sui, Baobao Chang |
Abstract | Multi-document summarization provides users with a short text that summarizes the information in a set of related documents. This paper introduces affinity-preserving random walk to the summarization task, which preserves the affinity relations of sentences by an absorbing random walk model. Meanwhile, we put forward adjustable affinity-preserving random walk to enforce the diversity constraint of summarization in the random walk process. The ROUGE evaluations on DUC 2003 topic-focused summarization task and DUC 2004 generic summarization task show the good performance of our method, which has the best ROUGE-2 recall among the graph-based ranking methods. |
Tasks | Document Summarization, Multi-Document Summarization, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1020/ |
https://www.aclweb.org/anthology/D17-1020 | |
PWC | https://paperswithcode.com/paper/affinity-preserving-random-walk-for-multi |
Repo | |
Framework | |