Paper Group NANR 173
Domain-Adaptable Hybrid Generation of RDF Entity Descriptions
Title | Domain-Adaptable Hybrid Generation of RDF Entity Descriptions |
Authors | Or Biran, Kathleen McKeown |
Abstract | RDF ontologies provide structured data on entities in many domains and continue to grow in size and diversity. While they can be useful as a starting point for generating descriptions of entities, they often miss important information about an entity that cannot be captured as simple relations. In addition, generic approaches to generation from RDF cannot capture the unique style and content of specific domains. We describe a framework for hybrid generation of entity descriptions, which combines generation from RDF data with text extracted from a corpus, and extracts unique aspects of the domain from the corpus to create domain-specific generation systems. We show that each component of our approach significantly increases the satisfaction of readers with the text across multiple applications and domains. |
Tasks | Domain Adaptation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1031/ |
PDF | https://www.aclweb.org/anthology/I17-1031 |
PWC | https://paperswithcode.com/paper/domain-adaptable-hybrid-generation-of-rdf |
Repo | |
Framework | |
Leveraging Newswire Treebanks for Parsing Conversational Data with Argument Scrambling
Title | Leveraging Newswire Treebanks for Parsing Conversational Data with Argument Scrambling |
Authors | Riyaz A. Bhat, Irshad Bhat, Dipti Sharma |
Abstract | We investigate the problem of parsing conversational data of morphologically-rich languages such as Hindi where argument scrambling occurs frequently. We evaluate a state-of-the-art non-linear transition-based parsing system on a new dataset containing 506 dependency trees for sentences from Bollywood (Hindi) movie scripts and Twitter posts of Hindi monolingual speakers. We show that a dependency parser trained on a newswire treebank is strongly biased towards the canonical structures and degrades when applied to conversational data. Inspired by Transformational Generative Grammar (Chomsky, 1965), we mitigate the sampling bias by generating all theoretically possible alternative word orders of a clause from the existing (kernel) structures in the treebank. Training our parser on canonical and transformed structures improves performance on conversational data by around 9% LAS over the baseline newswire parser. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6309/ |
PDF | https://www.aclweb.org/anthology/W17-6309 |
PWC | https://paperswithcode.com/paper/leveraging-newswire-treebanks-for-parsing-1 |
Repo | |
Framework | |
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts
Title | Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts |
Authors | Gerold Schneider, Eva Pettersson, Michael Percillier |
Abstract | |
Tasks | Machine Translation |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0508/ |
PDF | https://www.aclweb.org/anthology/W17-0508 |
PWC | https://paperswithcode.com/paper/comparing-rule-based-and-smt-based-spelling |
Repo | |
Framework | |
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Title | Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) |
Authors | |
Abstract | |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1200/ |
PDF | https://www.aclweb.org/anthology/W17-1200 |
PWC | https://paperswithcode.com/paper/proceedings-of-the-fourth-workshop-on-nlp-for-1 |
Repo | |
Framework | |
The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities
Title | The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities |
Authors | Arun Suggala, Mladen Kolar, Pradeep K. Ravikumar |
Abstract | Non-parametric multivariate density estimation faces strong statistical and computational bottlenecks, and the more practical approaches impose near-parametric assumptions on the form of the density functions. In this paper, we leverage recent developments to propose a class of non-parametric models which have very attractive computational and statistical properties. Our approach relies on the simple function space assumption that the conditional distribution of each variable conditioned on the other variables has a non-parametric exponential family form. |
Tasks | Density Estimation |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7031-the-expxorcist-nonparametric-graphical-models-via-conditional-exponential-densities |
PDF | http://papers.nips.cc/paper/7031-the-expxorcist-nonparametric-graphical-models-via-conditional-exponential-densities.pdf |
PWC | https://paperswithcode.com/paper/the-expxorcist-nonparametric-graphical-models |
Repo | |
Framework | |
Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification
Title | Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification |
Authors | Wei Shi, Frances Yung, Raphael Rubino, Vera Demberg |
Abstract | Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate it into an English connective and use it to infer a label with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art. |
Tasks | Implicit Discourse Relation Classification, Machine Translation, Question Answering, Relation Classification |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1049/ |
PDF | https://www.aclweb.org/anthology/I17-1049 |
PWC | https://paperswithcode.com/paper/using-explicit-discourse-connectives-in |
Repo | |
Framework | |
LIMSI Submission for WMT’17 Shared Task on Bandit Learning
Title | LIMSI Submission for WMT’17 Shared Task on Bandit Learning |
Authors | Guillaume Wisniewski |
Abstract | |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4779/ |
PDF | https://www.aclweb.org/anthology/W17-4779 |
PWC | https://paperswithcode.com/paper/limsi-submission-for-wmt17-shared-task-on |
Repo | |
Framework | |
Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification
Title | Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification |
Authors | Huy Thanh Nguyen, Minh Le Nguyen |
Abstract | Tweet-level sentiment classification in Twitter social networking has many challenges: exploiting syntax, semantic, sentiment, and context in tweets. To address these problems, we propose a novel approach to sentiment analysis that uses lexicon features for building lexicon embeddings (LexW2Vs) and generates character attention vectors (CharAVs) by using a Deep Convolutional Neural Network (DeepCNN). Our approach integrates LexW2Vs and CharAVs with continuous word embeddings (ContinuousW2Vs) and dependency-based word embeddings (DependencyW2Vs) simultaneously in order to increase information for each word into a Bidirectional Contextual Gated Recurrent Neural Network (Bi-CGRNN). We evaluate our model on two Twitter sentiment classification datasets. Experimental results show that our model can improve the classification accuracy of sentence-level sentiment analysis in Twitter social networking. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1054/ |
PDF | https://www.aclweb.org/anthology/I17-1054 |
PWC | https://paperswithcode.com/paper/sentence-modeling-with-deep-neural |
Repo | |
Framework | |
Boundary-based MWE segmentation with text partitioning
Title | Boundary-based MWE segmentation with text partitioning |
Authors | Jake Williams |
Abstract | This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review. |
Tasks | Chunking, Information Retrieval, Machine Translation, Tokenization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4401/ |
PDF | https://www.aclweb.org/anthology/W17-4401 |
PWC | https://paperswithcode.com/paper/boundary-based-mwe-segmentation-with-text |
Repo | |
Framework | |
Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks
Title | Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks |
Authors | Shikib Mehri, Giuseppe Carenini |
Abstract | Thread disentanglement is a precursor to any high-level analysis of multiparticipant chats. Existing research approaches the problem by calculating the likelihood of two messages belonging in the same thread. Our approach leverages a newly annotated dataset to identify reply relationships. Furthermore, we explore the usage of an RNN, along with large quantities of unlabeled data, to learn semantic relationships between messages. Our proposed pipeline, which utilizes a reply classifier and an RNN to generate a set of disentangled threads, is novel and performs well against previous work. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1062/ |
PDF | https://www.aclweb.org/anthology/I17-1062 |
PWC | https://paperswithcode.com/paper/chat-disentanglement-identifying-semantic |
Repo | |
Framework | |
The SUMMA Platform Prototype
Title | The SUMMA Platform Prototype |
Authors | Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell |
Abstract | We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams. |
Tasks | Machine Translation, Semantic Parsing, Speech Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-3029/ |
PDF | https://www.aclweb.org/anthology/E17-3029 |
PWC | https://paperswithcode.com/paper/the-summa-platform-prototype |
Repo | |
Framework | |
A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora
Title | A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora |
Authors | Aleksi Vesanto, Filip Ginter, Hannu Salmi, Asko Nivala, Tapio Salakoski |
Abstract | |
Tasks | Optical Character Recognition |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0249/ |
PDF | https://www.aclweb.org/anthology/W17-0249 |
PWC | https://paperswithcode.com/paper/a-system-for-identifying-and-exploring-text |
Repo | |
Framework | |
Cascading Multiway Attentions for Document-level Sentiment Classification
Title | Cascading Multiway Attentions for Document-level Sentiment Classification |
Authors | Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang, Xu Sun |
Abstract | Document-level sentiment classification aims to assign the user reviews a sentiment polarity. Previous methods either just utilized the document content without consideration of user and product information, or did not comprehensively consider what roles the three kinds of information play in text modeling. In this paper, to reasonably use all the information, we present the idea that user, product and their combination can all influence the generation of attentions to words and sentences, when judging the sentiment of a document. With this idea, we propose a cascading multiway attention (CMA) model, where multiple ways of using user and product information are cascaded to influence the generation of attentions on the word and sentence layers. Then, sentences and documents are well modeled by multiple representation vectors, which provide rich information for sentiment classification. Experiments on IMDB and Yelp datasets demonstrate the effectiveness of our model. |
Tasks | Product Recommendation, Sentiment Analysis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1064/ |
PDF | https://www.aclweb.org/anthology/I17-1064 |
PWC | https://paperswithcode.com/paper/cascading-multiway-attentions-for-document |
Repo | |
Framework | |
Measuring Semantic Relations between Human Activities
Title | Measuring Semantic Relations between Human Activities |
Authors | Steven Wilson, Rada Mihalcea |
Abstract | The things people do in their daily lives can provide valuable insights into their personality, values, and interests. Unstructured text data on social media platforms are rich in behavioral content, and automated systems can be deployed to learn about human activity on a broad scale if these systems are able to reason about the content of interest. In order to aid in the evaluation of such systems, we introduce a new phrase-level semantic textual similarity dataset comprised of human activity phrases, providing a testbed for automated systems that analyze relationships between phrasal descriptions of people's actions. Our set of 1,000 pairs of activities is annotated by human judges across four relational dimensions including similarity, relatedness, motivational alignment, and perceived actor congruence. We evaluate a set of strong baselines for the task of generating scores that correlate highly with human ratings, and we introduce several new approaches to the phrase-level similarity task in the domain of human activities. |
Tasks | Semantic Textual Similarity |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1067/ |
PDF | https://www.aclweb.org/anthology/I17-1067 |
PWC | https://paperswithcode.com/paper/measuring-semantic-relations-between-human |
Repo | |
Framework | |
Open-Domain Neural Dialogue Systems
Title | Open-Domain Neural Dialogue Systems |
Authors | Yun-Nung Chen, Jianfeng Gao |
Abstract | In the past decade, spoken dialogue systems have been the most prominent component in today's personal assistants. A lot of devices have incorporated dialogue system modules, which allow users to speak naturally in order to finish tasks more efficiently. The traditional conversational systems have rather complex and/or modular pipelines. The advance of deep learning technologies has recently risen the applications of neural models to dialogue modeling. Nevertheless, applying deep learning technologies for building robust and scalable dialogue systems is still a challenging task and an open research area as it requires deeper understanding of the classic pipelines as well as detailed knowledge on the benchmark of the models of the prior work and the recent state-of-the-art work. Therefore, this tutorial is designed to focus on an overview of the dialogue system development while describing most recent research for building task-oriented and chit-chat dialogue systems, and summarizing the challenges. We target the audience of students and practitioners who have some deep learning background, who want to get more familiar with conversational dialogue systems. |
Tasks | Dialogue Management, Dialogue State Tracking, Intent Classification, Spoken Dialogue Systems, Task-Oriented Dialogue Systems, Text Generation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-5003/ |
PDF | https://www.aclweb.org/anthology/I17-5003 |
PWC | https://paperswithcode.com/paper/open-domain-neural-dialogue-systems |
Repo | |
Framework | |