May 5, 2019

Paper Group NANR 146

PentoRef: A Corpus of Spoken References in Task-oriented Dialogues

Title PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
Authors Sina Zarrieß, Julian Hough, Casey Kennington, Ramesh Manuvinakurike, David DeVault, Raquel Fernández, David Schlangen
Abstract PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus a rich resource for research on spoken referring expressions in generation and resolution. The corpus includes several sub-corpora that correspond to different dialogue situations where parameters related to interactivity, visual access, and verbal channel have been manipulated in systematic ways. The corpus thus lends itself to very targeted studies of reference in spontaneous dialogue.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1019/
PDF https://www.aclweb.org/anthology/L16-1019
PWC https://paperswithcode.com/paper/pentoref-a-corpus-of-spoken-references-in
Repo
Framework

Targeted Sentiment to Understand Student Comments

Title Targeted Sentiment to Understand Student Comments
Authors Charles Welch, Rada Mihalcea
Abstract We address the task of targeted sentiment as a means of understanding the sentiment that students hold toward courses and instructors, as expressed by students in their comments. We introduce a new dataset consisting of student comments annotated for targeted sentiment and describe a system that can both identify the courses and instructors mentioned in student comments and label the students' sentiment toward those entities. Through several comparative evaluations, we show that our system outperforms previous work on a similar task.
Tasks Decision Making, Entity Extraction, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1233/
PDF https://www.aclweb.org/anthology/C16-1233
PWC https://paperswithcode.com/paper/targeted-sentiment-to-understand-student
Repo
Framework

NASTEA: Investigating Narrative Schemas through Annotated Entities

Title NASTEA: Investigating Narrative Schemas through Annotated Entities
Authors Dan Simonson, Anthony Davis
Abstract
Tasks Language Modelling
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-5707/
PDF https://www.aclweb.org/anthology/W16-5707
PWC https://paperswithcode.com/paper/nastea-investigating-narrative-schemas
Repo
Framework

Disentangling factors of variation in deep representation using adversarial training

Title Disentangling factors of variation in deep representation using adversarial training
Authors Michael F. Mathieu, Junbo Jake Zhao, Aditya Ramesh, Pablo Sprechmann, Yann LeCun
Abstract We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g. different objects observed in the exact same conditions. This assumption is often not encountered in realistic settings where data acquisition is not controlled and labels for the uninformative components are not available. In this work, we propose to overcome this limitation by augmenting deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of disentangling the influences of style and content factors using a flexible representation, as well as generalizing to unseen styles or content classes.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training
PDF http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training.pdf
PWC https://paperswithcode.com/paper/disentangling-factors-of-variation-in-deep-1
Repo
Framework

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs

Title Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs
Authors Ximing Li, Jinjin Chi, Changchun Li, Jihong Ouyang, Bo Fu
Abstract Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distribution over word types with a multivariate Gaussian distribution on the embedding space. This can take semantic information of words into account. However, the Euclidean similarity used in Gaussian topics is not an optimal semantic measure for word embeddings. It is widely acknowledged that cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics, and then develop a novel mix-vMF topic model (MvTM). Using public pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model can discover more coherent topics than the state-of-the-art baseline models, and achieve competitive classification performance.
Tasks Topic Models, Word Embeddings
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1015/
PDF https://www.aclweb.org/anthology/C16-1015
PWC https://paperswithcode.com/paper/integrating-topic-modeling-with-word
Repo
Framework
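
The contrast between Euclidean and cosine similarity drawn in the abstract above can be illustrated with a minimal sketch. The vectors are made-up toy values, not real word embeddings, and the function names and the `kappa` value are hypothetical. The vMF score shown is only the unnormalized log-density, `kappa * <mu, x>` on unit vectors. This is not the MvTM model itself.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity: dot product of the unit-length vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def vmf_log_density_unnormalized(x, mu, kappa):
    """kappa * <mu, x> on unit vectors; the normalizing constant is omitted."""
    return kappa * sum(a * b for a, b in zip(normalize(mu), normalize(x)))

# Toy vectors (assumed for illustration only).
topic_mean = [1.0, 1.0, 0.0]
same_direction = [5.0, 5.0, 0.0]  # same direction as the mean, larger norm
nearby_other = [1.0, 0.0, 0.0]    # closer in Euclidean terms, different direction

# Euclidean distance prefers nearby_other; cosine and vMF prefer same_direction.
print(euclidean(topic_mean, same_direction), euclidean(topic_mean, nearby_other))
print(cosine(topic_mean, same_direction), cosine(topic_mean, nearby_other))
```

In this toy case the two measures rank the candidate vectors in opposite order, which is the kind of mismatch that motivates a direction-based distribution such as the vMF.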

Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies

Title Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies
Authors Julian Hough, David Schlangen
Abstract
Tasks Object Recognition, Text Generation
Published 2016-09-01
URL https://www.aclweb.org/anthology/W16-3637/
PDF https://www.aclweb.org/anthology/W16-3637
PWC https://paperswithcode.com/paper/investigating-fluidity-for-human-robot
Repo
Framework

Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola

Title Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
Authors Marco Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, Viviana Patti
Abstract In this paper we present the TWitterBuonaScuola corpus (TW-BS), a novel Italian linguistic resource for Sentiment Analysis, developed with the main aim of analyzing the online debate on the controversial Italian political reform "Buona Scuola" (Good school), aimed at reorganizing the national educational and training systems. We describe the methodologies applied in the collection and annotation of data. The collection has been driven by the detection of the hashtags mainly used by the participants in the debate, while the annotation has been focused on sentiment polarity and irony, but also extended to mark the aspects of the reform that were mainly discussed in the debate. An in-depth study of the disagreement among annotators is included. We describe the collection and annotation stages, and the in-depth analysis of disagreement made with Crowdflower, a crowdsourcing annotation platform.
Tasks Sentiment Analysis
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1462/
PDF https://www.aclweb.org/anthology/L16-1462
PWC https://paperswithcode.com/paper/annotating-sentiment-and-irony-in-the-online
Repo
Framework

EstNLTK - NLP Toolkit for Estonian

Title EstNLTK - NLP Toolkit for Estonian
Authors Siim Orasmaa, Timo Petmanson, Alexander Tkachenko, Sven Laur, Heiki-Jaan Kaalep
Abstract Although there are many tools for natural language processing tasks in Estonian, these tools are very loosely interoperable, and it is not easy to build practical applications on top of them. In this paper, we introduce a new Python library for natural language processing in Estonian, which provides a unified programming interface for various NLP components. The EstNLTK toolkit provides utilities for basic NLP tasks including tokenization, morphological analysis, lemmatisation and named entity recognition, and also offers more advanced features such as clause segmentation, temporal expression extraction and normalization, verb chain detection, Estonian Wordnet integration and rule-based information extraction. Accompanied by detailed API documentation and comprehensive tutorials, EstNLTK is suitable for a wide range of audiences. We believe EstNLTK is mature enough to be used for developing NLP-backed systems both in industry and research. EstNLTK is freely available under the GNU GPL version 2+ license, which is standard for academic software.
Tasks Morphological Analysis, Named Entity Recognition, Tokenization
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1390/
PDF https://www.aclweb.org/anthology/L16-1390
PWC https://paperswithcode.com/paper/estnltk-nlp-toolkit-for-estonian
Repo
Framework

Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Title Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Authors
Abstract
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4600/
PDF https://www.aclweb.org/anthology/W16-4600
PWC https://paperswithcode.com/paper/proceedings-of-the-3rd-workshop-on-asian
Repo
Framework

Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases

Title Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Authors Svetlana Kiritchenko, Saif Mohammad
Abstract Sentiment composition is the determination of the sentiment of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents. We focus on sentiment composition in phrases formed by at least one positive and at least one negative word: phrases like 'happy accident' and 'best winter break'. We refer to such phrases as opposing polarity phrases. We manually annotate a collection of opposing polarity phrases and their constituent single words with real-valued sentiment intensity scores using a method known as Best-Worst Scaling. We show that the obtained annotations are consistent. We explore the entries in the lexicon for linguistic regularities that govern sentiment composition in opposing polarity phrases. Finally, we list the current and possible future applications of the lexicon.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1184/
PDF https://www.aclweb.org/anthology/L16-1184
PWC https://paperswithcode.com/paper/happy-accident-a-sentiment-composition
Repo
Framework
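
As a rough illustration of how Best-Worst Scaling annotations can be turned into real-valued scores, the sketch below uses a common counting procedure: the number of times an item is chosen as best, minus the number of times it is chosen as worst, divided by the number of times it appears. The abstract above does not state which scoring procedure was used, so this is an assumption, and the function name and the two toy annotations are invented for the example.

```python
from collections import Counter

def bws_scores(annotations):
    """Score items from Best-Worst Scaling annotations by simple counting.

    annotations: list of (items, best_item, worst_item) tuples, where
    items is the tuple of terms shown to the annotator.
    Returns a dict mapping each item to (best - worst) / appearances,
    a value between -1 and 1.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, best_item, worst_item in annotations:
        seen.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Toy annotations (made up for illustration, not taken from the lexicon).
terms = ("happy accident", "best winter break", "accident", "happy")
toy = [
    (terms, "happy", "accident"),
    (terms, "best winter break", "accident"),
]
scores = bws_scores(toy)
print(scores)
```

With these two toy annotations, "accident" receives the lowest possible score because it was picked as worst every time it appeared, while "happy accident" scores 0 because it was never picked as either extreme.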

A Neural Model for Part-of-Speech Tagging in Historical Texts

Title A Neural Model for Part-of-Speech Tagging in Historical Texts
Authors Christian Hardmeier
Abstract Historical texts are challenging for natural language processing because they differ linguistically from modern texts and because of their lack of orthographical and grammatical standardisation. We use a character-level neural network to build a part-of-speech (POS) tagger that can process historical data directly without requiring a separate spelling normalisation stage. Its performance in a Swedish verb identification and a German POS tagging task is similar to that of a two-stage model. We analyse the performance of this tagger and a more traditional baseline system, discuss some of the remaining problems for tagging historical data and suggest how the flexibility of our neural tagger could be exploited to address diachronic divergences in morphology and syntax in early modern Swedish with the help of data from closely related languages.
Tasks Part-Of-Speech Tagging
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1088/
PDF https://www.aclweb.org/anthology/C16-1088
PWC https://paperswithcode.com/paper/a-neural-model-for-part-of-speech-tagging-in
Repo
Framework

Evaluating a dictionary of human phenotype terms focusing on rare diseases

Title Evaluating a dictionary of human phenotype terms focusing on rare diseases
Authors Simon Kocbek, Toyofumi Fujiwara, Jin-Dong Kim, Toshihisa Takagi, Tudor Groza
Abstract Annotating medical text such as clinical notes with human phenotype descriptors is an important task that can, for example, assist in building patient profiles. To automatically annotate text one usually needs a dictionary of predefined terms. However, due to the variety of human expressiveness, current state-of-the-art phenotype concept recognizers and automatic annotators struggle with specific domain issues and challenges. In this paper we present results of annotating a gold standard corpus with a dictionary containing lexical variants for the Human Phenotype Ontology terms. The main purpose of the dictionary is to improve the recall of phenotype concept recognition systems. We compare the method with four other approaches and present results.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4712/
PDF https://www.aclweb.org/anthology/W16-4712
PWC https://paperswithcode.com/paper/evaluating-a-dictionary-of-human-phenotype
Repo
Framework

Extracting Discriminative Keyphrases with Learned Semantic Hierarchies

Title Extracting Discriminative Keyphrases with Learned Semantic Hierarchies
Authors Yunli Wang, Yong Jin, Xiaodan Zhu, Cyril Goutte
Abstract The goal of keyphrase extraction is to automatically identify the most salient phrases from documents. The technique has a wide range of applications such as rendering a quick glimpse of a document, or extracting key content for further use. While previous work often assumes keyphrases are a static property of a given document, in many applications, the appropriate set of keyphrases that should be extracted depends on the set of documents that are being considered together. In particular, good keyphrases should not only accurately describe the content of a document, but also reveal what discriminates it from the other documents. In this paper, we study this problem of extracting discriminative keyphrases. Specifically, we propose to use the hierarchical semantic structure between candidate keyphrases to promote keyphrases that have the right level of specificity to clearly distinguish the target document from others. We show that such knowledge can be used to construct better discriminative keyphrase extraction systems that do not assume a static, fixed set of keyphrases for a document. We show how this helps identify key expertise of authors from their papers, as well as competencies covered by online courses within different domains.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1089/
PDF https://www.aclweb.org/anthology/C16-1089
PWC https://paperswithcode.com/paper/extracting-discriminative-keyphrases-with
Repo
Framework

Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification

Title Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification
Authors Jianfei Yu, Jing Jiang
Abstract
Tasks Domain Adaptation, Sentence Embedding, Sentence Embeddings, Sentiment Analysis, Word Embeddings
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1023/
PDF https://www.aclweb.org/anthology/D16-1023
PWC https://paperswithcode.com/paper/learning-sentence-embeddings-with-auxiliary
Repo
Framework

Multilingual Supervision of Semantic Annotation

Title Multilingual Supervision of Semantic Annotation
Authors Peter Exner, Marcus Klang, Pierre Nugues
Abstract In this paper, we investigate the annotation projection of semantic units in a practical setting. Previous approaches have focused on using parallel corpora for semantic transfer. We evaluate an alternative approach using loosely parallel corpora that does not require the corpora to be exact translations of each other. We developed a method that transfers semantic annotations from one language to another using sentences aligned by entities, and we extended it to include alignments by entity-like linguistic units. We conducted our experiments on a large scale using the English, Swedish, and French language editions of Wikipedia. Our results show that the annotation projection using entities in combination with loosely parallel corpora provides a viable approach to extending previous attempts. In addition, it allows the generation of proposition banks upon which semantic parsers can be trained.
Tasks Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Text Summarization
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1096/
PDF https://www.aclweb.org/anthology/C16-1096
PWC https://paperswithcode.com/paper/multilingual-supervision-of-semantic
Repo
Framework