July 26, 2019

Paper Group NANR 173

Domain-Adaptable Hybrid Generation of RDF Entity Descriptions

Title Domain-Adaptable Hybrid Generation of RDF Entity Descriptions
Authors Or Biran, Kathleen McKeown
Abstract RDF ontologies provide structured data on entities in many domains and continue to grow in size and diversity. While they can be useful as a starting point for generating descriptions of entities, they often miss important information about an entity that cannot be captured as simple relations. In addition, generic approaches to generation from RDF cannot capture the unique style and content of specific domains. We describe a framework for hybrid generation of entity descriptions, which combines generation from RDF data with text extracted from a corpus, and extracts unique aspects of the domain from the corpus to create domain-specific generation systems. We show that each component of our approach significantly increases the satisfaction of readers with the text across multiple applications and domains.
Tasks Domain Adaptation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1031/
PDF https://www.aclweb.org/anthology/I17-1031
PWC https://paperswithcode.com/paper/domain-adaptable-hybrid-generation-of-rdf
Repo
Framework

Leveraging Newswire Treebanks for Parsing Conversational Data with Argument Scrambling

Title Leveraging Newswire Treebanks for Parsing Conversational Data with Argument Scrambling
Authors Riyaz A. Bhat, Irshad Bhat, Dipti Sharma
Abstract We investigate the problem of parsing conversational data of morphologically-rich languages such as Hindi where argument scrambling occurs frequently. We evaluate a state-of-the-art non-linear transition-based parsing system on a new dataset containing 506 dependency trees for sentences from Bollywood (Hindi) movie scripts and Twitter posts of Hindi monolingual speakers. We show that a dependency parser trained on a newswire treebank is strongly biased towards the canonical structures and degrades when applied to conversational data. Inspired by Transformational Generative Grammar (Chomsky, 1965), we mitigate the sampling bias by generating all theoretically possible alternative word orders of a clause from the existing (kernel) structures in the treebank. Training our parser on canonical and transformed structures improves performance on conversational data by around 9% LAS over the baseline newswire parser.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6309/
PDF https://www.aclweb.org/anthology/W17-6309
PWC https://paperswithcode.com/paper/leveraging-newswire-treebanks-for-parsing-1
Repo
Framework
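
The idea of generating alternative word orders from a canonical clause can be illustrated with a minimal sketch. This is not the authors' implementation: the paper transforms dependency trees in a treebank, whereas the hypothetical `scramble_clause` helper below only permutes flat argument phrases, and keeping the verb clause-final is a simplifying assumption made here for illustration. The transliterated Hindi clause is a toy example.

```python
from itertools import permutations


def scramble_clause(arguments, verb):
    """Return every ordering of the argument phrases, with the verb kept last.

    Hypothetical helper: it works on flat strings, not on dependency trees.
    """
    return [list(order) + [verb] for order in permutations(arguments)]


# Toy clause, roughly "Ram gave Sita a book" (transliteration is illustrative).
arguments = ["raam ne", "siitaa ko", "kitaab"]
verb = "dii"

orders = scramble_clause(arguments, verb)
for order in orders:
    print(" ".join(order))
```

With three arguments the sketch yields six orders, the first of which is the canonical one. A real system would also have to carry the dependency arcs along with each reordered token so that the transformed sentences remain usable as parser training data.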

Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts

Title Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts
Authors Gerold Schneider, Eva Pettersson, Michael Percillier
Abstract
Tasks Machine Translation
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0508/
PDF https://www.aclweb.org/anthology/W17-0508
PWC https://paperswithcode.com/paper/comparing-rule-based-and-smt-based-spelling
Repo
Framework

Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

Title Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Authors
Abstract
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1200/
PDF https://www.aclweb.org/anthology/W17-1200
PWC https://paperswithcode.com/paper/proceedings-of-the-fourth-workshop-on-nlp-for-1
Repo
Framework

The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities

Title The Expxorcist: Nonparametric Graphical Models Via Conditional Exponential Densities
Authors Arun Suggala, Mladen Kolar, Pradeep K. Ravikumar
Abstract Non-parametric multivariate density estimation faces strong statistical and computational bottlenecks, and the more practical approaches impose near-parametric assumptions on the form of the density functions. In this paper, we leverage recent developments to propose a class of non-parametric models which have very attractive computational and statistical properties. Our approach relies on the simple function space assumption that the conditional distribution of each variable conditioned on the other variables has a non-parametric exponential family form.
Tasks Density Estimation
Published 2017-12-01
URL http://papers.nips.cc/paper/7031-the-expxorcist-nonparametric-graphical-models-via-conditional-exponential-densities
PDF http://papers.nips.cc/paper/7031-the-expxorcist-nonparametric-graphical-models-via-conditional-exponential-densities.pdf
PWC https://paperswithcode.com/paper/the-expxorcist-nonparametric-graphical-models
Repo
Framework

Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification

Title Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification
Authors Wei Shi, Frances Yung, Raphael Rubino, Vera Demberg
Abstract Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate it into an English connective and use it to infer a label with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.
Tasks Implicit Discourse Relation Classification, Machine Translation, Question Answering, Relation Classification
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1049/
PDF https://www.aclweb.org/anthology/I17-1049
PWC https://paperswithcode.com/paper/using-explicit-discourse-connectives-in
Repo
Framework
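
The labeling step described in the abstract, inferring a relation from a back-translated connective, might look roughly like the sketch below. Everything in it is an assumption for illustration: the tiny `CONNECTIVE_TO_RELATION` table, the `infer_relation` and `harvest` helpers, and the example sentence pairs are hypothetical, and the paper's actual pipeline (translation, alignment, and confidence filtering) is not reproduced. The four relation names are the top-level PDTB classes.

```python
# Hypothetical, deliberately tiny table of unambiguous connectives.
# Ambiguous connectives (e.g. "since") are left out on purpose.
CONNECTIVE_TO_RELATION = {
    "because": "Contingency",
    "however": "Comparison",
    "for example": "Expansion",
    "afterwards": "Temporal",
}


def infer_relation(back_translated_connective):
    """Map a connective to a relation label, or None if it is not in the table."""
    key = " ".join(back_translated_connective.lower().split())
    return CONNECTIVE_TO_RELATION.get(key)


def harvest(pairs):
    """Keep only the argument pairs whose connective yields a label."""
    labeled = []
    for arg1, arg2, connective in pairs:
        relation = infer_relation(connective)
        if relation is not None:
            labeled.append((arg1, arg2, relation))
    return labeled


# Toy input: (first argument, second argument, back-translated connective).
pairs = [
    ("The road was closed.", "We took the train.", "Because"),
    ("He trained for months.", "He lost the race.", "however"),
    ("She has lived here.", "She was a child.", "since"),
]
extra_training_data = harvest(pairs)
```

Skipping connectives that are missing from the table is one simple way to keep only labels that can be assigned with some confidence; the third toy pair is dropped for that reason.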

LIMSI Submission for WMT’17 Shared Task on Bandit Learning

Title LIMSI Submission for WMT’17 Shared Task on Bandit Learning
Authors Guillaume Wisniewski
Abstract
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4779/
PDF https://www.aclweb.org/anthology/W17-4779
PWC https://paperswithcode.com/paper/limsi-submission-for-wmt17-shared-task-on
Repo
Framework

Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification

Title Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification
Authors Huy Thanh Nguyen, Minh Le Nguyen
Abstract Tweet-level sentiment classification in Twitter social networking has many challenges: exploiting syntax, semantic, sentiment, and context in tweets. To address these problems, we propose a novel approach to sentiment analysis that uses lexicon features for building lexicon embeddings (LexW2Vs) and generates character attention vectors (CharAVs) by using a Deep Convolutional Neural Network (DeepCNN). Our approach integrates LexW2Vs and CharAVs with continuous word embeddings (ContinuousW2Vs) and dependency-based word embeddings (DependencyW2Vs) simultaneously in order to increase information for each word into a Bidirectional Contextual Gated Recurrent Neural Network (Bi-CGRNN). We evaluate our model on two Twitter sentiment classification datasets. Experimental results show that our model can improve the classification accuracy of sentence-level sentiment analysis in Twitter social networking.
Tasks Sentiment Analysis, Word Embeddings
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1054/
PDF https://www.aclweb.org/anthology/I17-1054
PWC https://paperswithcode.com/paper/sentence-modeling-with-deep-neural
Repo
Framework

Boundary-based MWE segmentation with text partitioning

Title Boundary-based MWE segmentation with text partitioning
Authors Jake Williams
Abstract This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review.
Tasks Chunking, Information Retrieval, Machine Translation, Tokenization
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4401/
PDF https://www.aclweb.org/anthology/W17-4401
PWC https://paperswithcode.com/paper/boundary-based-mwe-segmentation-with-text
Repo
Framework

Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks

Title Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks
Authors Shikib Mehri, Giuseppe Carenini
Abstract Thread disentanglement is a precursor to any high-level analysis of multiparticipant chats. Existing research approaches the problem by calculating the likelihood of two messages belonging in the same thread. Our approach leverages a newly annotated dataset to identify reply relationships. Furthermore, we explore the usage of an RNN, along with large quantities of unlabeled data, to learn semantic relationships between messages. Our proposed pipeline, which utilizes a reply classifier and an RNN to generate a set of disentangled threads, is novel and performs well against previous work.
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1062/
PDF https://www.aclweb.org/anthology/I17-1062
PWC https://paperswithcode.com/paper/chat-disentanglement-identifying-semantic
Repo
Framework

The SUMMA Platform Prototype

Title The SUMMA Platform Prototype
Authors Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
Abstract We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.
Tasks Machine Translation, Semantic Parsing, Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3029/
PDF https://www.aclweb.org/anthology/E17-3029
PWC https://paperswithcode.com/paper/the-summa-platform-prototype
Repo
Framework

A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora

Title A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora
Authors Aleksi Vesanto, Filip Ginter, Hannu Salmi, Asko Nivala, Tapio Salakoski
Abstract
Tasks Optical Character Recognition
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0249/
PDF https://www.aclweb.org/anthology/W17-0249
PWC https://paperswithcode.com/paper/a-system-for-identifying-and-exploring-text
Repo
Framework

Cascading Multiway Attentions for Document-level Sentiment Classification

Title Cascading Multiway Attentions for Document-level Sentiment Classification
Authors Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang, Xu Sun
Abstract Document-level sentiment classification aims to assign the user reviews a sentiment polarity. Previous methods either just utilized the document content without consideration of user and product information, or did not comprehensively consider what roles the three kinds of information play in text modeling. In this paper, to reasonably use all the information, we present the idea that user, product and their combination can all influence the generation of attentions to words and sentences, when judging the sentiment of a document. With this idea, we propose a cascading multiway attention (CMA) model, where multiple ways of using user and product information are cascaded to influence the generation of attentions on the word and sentence layers. Then, sentences and documents are well modeled by multiple representation vectors, which provide rich information for sentiment classification. Experiments on IMDB and Yelp datasets demonstrate the effectiveness of our model.
Tasks Product Recommendation, Sentiment Analysis
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1064/
PDF https://www.aclweb.org/anthology/I17-1064
PWC https://paperswithcode.com/paper/cascading-multiway-attentions-for-document
Repo
Framework

Measuring Semantic Relations between Human Activities

Title Measuring Semantic Relations between Human Activities
Authors Steven Wilson, Rada Mihalcea
Abstract The things people do in their daily lives can provide valuable insights into their personality, values, and interests. Unstructured text data on social media platforms are rich in behavioral content, and automated systems can be deployed to learn about human activity on a broad scale if these systems are able to reason about the content of interest. In order to aid in the evaluation of such systems, we introduce a new phrase-level semantic textual similarity dataset comprised of human activity phrases, providing a testbed for automated systems that analyze relationships between phrasal descriptions of people's actions. Our set of 1,000 pairs of activities is annotated by human judges across four relational dimensions including similarity, relatedness, motivational alignment, and perceived actor congruence. We evaluate a set of strong baselines for the task of generating scores that correlate highly with human ratings, and we introduce several new approaches to the phrase-level similarity task in the domain of human activities.
Tasks Semantic Textual Similarity
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1067/
PDF https://www.aclweb.org/anthology/I17-1067
PWC https://paperswithcode.com/paper/measuring-semantic-relations-between-human
Repo
Framework

Open-Domain Neural Dialogue Systems

Title Open-Domain Neural Dialogue Systems
Authors Yun-Nung Chen, Jianfeng Gao
Abstract In the past decade, spoken dialogue systems have been the most prominent component in today's personal assistants. A lot of devices have incorporated dialogue system modules, which allow users to speak naturally in order to finish tasks more efficiently. The traditional conversational systems have rather complex and/or modular pipelines. The advance of deep learning technologies has recently risen the applications of neural models to dialogue modeling. Nevertheless, applying deep learning technologies for building robust and scalable dialogue systems is still a challenging task and an open research area as it requires deeper understanding of the classic pipelines as well as detailed knowledge on the benchmark of the models of the prior work and the recent state-of-the-art work. Therefore, this tutorial is designed to focus on an overview of the dialogue system development while describing most recent research for building task-oriented and chit-chat dialogue systems, and summarizing the challenges. We target the audience of students and practitioners who have some deep learning background, who want to get more familiar with conversational dialogue systems.
Tasks Dialogue Management, Dialogue State Tracking, Intent Classification, Spoken Dialogue Systems, Task-Oriented Dialogue Systems, Text Generation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-5003/
PDF https://www.aclweb.org/anthology/I17-5003
PWC https://paperswithcode.com/paper/open-domain-neural-dialogue-systems
Repo
Framework