May 5, 2019


Paper Group NANR 146


PentoRef: A Corpus of Spoken References in Task-oriented Dialogues


Title	PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
Authors	Sina Zarrieß, Julian Hough, Casey Kennington, Ramesh Manuvinakurike, David DeVault, Raquel Fernández, David Schlangen
Abstract	PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus a rich resource for research on spoken referring expressions in generation and resolution. The corpus includes several sub-corpora that correspond to different dialogue situations where parameters related to interactivity, visual access, and verbal channel have been manipulated in systematic ways. The corpus thus lends itself to very targeted studies of reference in spontaneous dialogue.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1019/
PDF	https://www.aclweb.org/anthology/L16-1019
PWC	https://paperswithcode.com/paper/pentoref-a-corpus-of-spoken-references-in
Repo
Framework

Targeted Sentiment to Understand Student Comments


Title	Targeted Sentiment to Understand Student Comments
Authors	Charles Welch, Rada Mihalcea
Abstract	We address the task of targeted sentiment as a means of understanding the sentiment that students hold toward courses and instructors, as expressed by students in their comments. We introduce a new dataset consisting of student comments annotated for targeted sentiment and describe a system that can both identify the courses and instructors mentioned in student comments, as well as label the students' sentiment toward those entities. Through several comparative evaluations, we show that our system outperforms previous work on a similar task.
Tasks	Decision Making, Entity Extraction, Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1233/
PDF	https://www.aclweb.org/anthology/C16-1233
PWC	https://paperswithcode.com/paper/targeted-sentiment-to-understand-student
Repo
Framework

NASTEA: Investigating Narrative Schemas through Annotated Entities


Title	NASTEA: Investigating Narrative Schemas through Annotated Entities
Authors	Dan Simonson, Anthony Davis
Abstract
Tasks	Language Modelling
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-5707/
PDF	https://www.aclweb.org/anthology/W16-5707
PWC	https://paperswithcode.com/paper/nastea-investigating-narrative-schemas
Repo
Framework

Disentangling factors of variation in deep representation using adversarial training


Title	Disentangling factors of variation in deep representation using adversarial training
Authors	Michael F. Mathieu, Junbo Jake Zhao, Aditya Ramesh, Pablo Sprechmann, Yann LeCun
Abstract	We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g. different objects observed in the exact same conditions. This assumption is often not encountered in realistic settings where data acquisition is not controlled and labels for the uninformative components are not available. In this work, we propose to overcome this limitation by augmenting deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of disentangling the influences of style and content factors using a flexible representation, as well as generalizing to unseen styles or content classes.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training
PDF	http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training.pdf
PWC	https://paperswithcode.com/paper/disentangling-factors-of-variation-in-deep-1
Repo
Framework

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs


Title	Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs
Authors	Ximing Li, Jinjin Chi, Changchun Li, Jihong Ouyang, Bo Fu
Abstract	Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distribution over word types with a multivariate Gaussian distribution on the embedding space. This can take semantic information of words into account. However, the Euclidean similarity used in Gaussian topics is not an optimal semantic measure for word embeddings. It is widely acknowledged that cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics, and then develop a novel mix-vMF topic model (MvTM). Using public pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model can discover more coherent topics than the state-of-the-art baseline models, and achieve competitive classification performance.
Tasks	Topic Models, Word Embeddings
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1015/
PDF	https://www.aclweb.org/anthology/C16-1015
PWC	https://paperswithcode.com/paper/integrating-topic-modeling-with-word
Repo
Framework
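
The abstract above motivates vMF mixtures by noting that cosine similarity suits word embeddings better than Euclidean distance. The following is a minimal sketch of that idea only, not the authors' MvTM: it computes the posterior over the components of a vMF mixture for a single embedding. It assumes, for simplicity, that all components share one concentration parameter `kappa`, so the vMF normalizing constant cancels. The function names and toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity: the dot product of the unit-length vectors."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))

def vmf_responsibilities(x, means, weights, kappa):
    """Posterior over mixture components for one embedding x.

    Assumption: every component shares the same concentration kappa,
    so the vMF normalizing constant cancels and each component's
    density is proportional to exp(kappa * cos(mu, x)).
    """
    logits = [math.log(w) + kappa * cosine(x, mu)
              for w, mu in zip(weights, means)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-d "embeddings" (made-up data): two component mean directions.
means = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]
word = [3.0, 0.5]  # a long vector pointing mostly along the first mean
resp = vmf_responsibilities(word, means, weights, kappa=10.0)
```

Because only direction matters, the length of `word` has no effect on `resp`; a Gaussian component, by contrast, would penalize the vector's distance from the mean.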

Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies


Title	Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies
Authors	Julian Hough, David Schlangen
Abstract
Tasks	Object Recognition, Text Generation
Published	2016-09-01
URL	https://www.aclweb.org/anthology/W16-3637/
PDF	https://www.aclweb.org/anthology/W16-3637
PWC	https://paperswithcode.com/paper/investigating-fluidity-for-human-robot
Repo
Framework

Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola


Title	Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
Authors	Marco Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, Viviana Patti
Abstract	In this paper we present the TWitterBuonaScuola corpus (TW-BS), a novel Italian linguistic resource for Sentiment Analysis, developed with the main aim of analyzing the online debate on the controversial Italian political reform "Buona Scuola" (Good school), aimed at reorganizing the national educational and training systems. We describe the methodologies applied in the collection and annotation of data. The collection has been driven by the detection of the hashtags mainly used by the participants in the debate, while the annotation has been focused on sentiment polarity and irony, but also extended to mark the aspects of the reform that were mainly discussed in the debate. An in-depth study of the disagreement among annotators is included. We describe the collection and annotation stages, and the in-depth analysis of disagreement made with Crowdflower, a crowdsourcing annotation platform.
Tasks	Sentiment Analysis
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1462/
PDF	https://www.aclweb.org/anthology/L16-1462
PWC	https://paperswithcode.com/paper/annotating-sentiment-and-irony-in-the-online
Repo
Framework

EstNLTK - NLP Toolkit for Estonian


Title	EstNLTK - NLP Toolkit for Estonian
Authors	Siim Orasmaa, Timo Petmanson, Alexander Tkachenko, Sven Laur, Heiki-Jaan Kaalep
Abstract	Although there are many tools for natural language processing tasks in Estonian, these tools are very loosely interoperable, and it is not easy to build practical applications on top of them. In this paper, we introduce a new Python library for natural language processing in Estonian, which provides a unified programming interface for various NLP components. The EstNLTK toolkit provides utilities for basic NLP tasks including tokenization, morphological analysis, lemmatisation and named entity recognition, and also offers more advanced features such as clause segmentation, temporal expression extraction and normalization, verb chain detection, Estonian Wordnet integration and rule-based information extraction. Accompanied by detailed API documentation and comprehensive tutorials, EstNLTK is suitable for a wide audience. We believe EstNLTK is mature enough to be used for developing NLP-backed systems both in industry and research. EstNLTK is freely available under the GNU GPL version 2+ license, which is standard for academic software.
Tasks	Morphological Analysis, Named Entity Recognition, Tokenization
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1390/
PDF	https://www.aclweb.org/anthology/L16-1390
PWC	https://paperswithcode.com/paper/estnltk-nlp-toolkit-for-estonian
Repo
Framework

Proceedings of the 3rd Workshop on Asian Translation (WAT2016)


Title	Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Authors
Abstract
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4600/
PDF	https://www.aclweb.org/anthology/W16-4600
PWC	https://paperswithcode.com/paper/proceedings-of-the-3rd-workshop-on-asian
Repo
Framework

Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases


Title	Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Authors	Svetlana Kiritchenko, Saif Mohammad
Abstract	Sentiment composition is the determination of the sentiment of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents. We focus on sentiment composition in phrases formed by at least one positive and at least one negative word, such as 'happy accident' and 'best winter break'. We refer to such phrases as opposing polarity phrases. We manually annotate a collection of opposing polarity phrases and their constituent single words with real-valued sentiment intensity scores using a method known as Best-Worst Scaling. We show that the obtained annotations are consistent. We explore the entries in the lexicon for linguistic regularities that govern sentiment composition in opposing polarity phrases. Finally, we list the current and possible future applications of the lexicon.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1184/
PDF	https://www.aclweb.org/anthology/L16-1184
PWC	https://paperswithcode.com/paper/happy-accident-a-sentiment-composition
Repo
Framework
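
The abstract above mentions Best-Worst Scaling without describing how judgments become real-valued scores. A common counting procedure, assumed here rather than taken from the abstract, scores each item as the fraction of times it was chosen as best minus the fraction of times it was chosen as worst. The sketch below illustrates that procedure; the function name and the judgments are hypothetical, made-up data.

```python
from collections import Counter

def bws_scores(annotations):
    """Score items from Best-Worst Scaling judgments.

    Each annotation is a tuple (items_shown, best_item, worst_item).
    score = (times chosen best - times chosen worst) / times shown,
    so every score lies in the range [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical judgments over 4-tuples of terms (made-up data).
annotations = [
    (("happy accident", "best winter break", "accident", "happy"),
     "happy", "accident"),
    (("happy accident", "best winter break", "accident", "happy"),
     "best winter break", "accident"),
    (("happy accident", "accident", "happy", "winter"),
     "happy", "accident"),
]
scores = bws_scores(annotations)
```

In this toy data, 'accident' is picked as worst every time it appears and so receives the minimum score, while 'happy accident' is never picked as either extreme and lands at zero.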

A Neural Model for Part-of-Speech Tagging in Historical Texts


Title	A Neural Model for Part-of-Speech Tagging in Historical Texts
Authors	Christian Hardmeier
Abstract	Historical texts are challenging for natural language processing because they differ linguistically from modern texts and because of their lack of orthographical and grammatical standardisation. We use a character-level neural network to build a part-of-speech (POS) tagger that can process historical data directly without requiring a separate spelling normalisation stage. Its performance in a Swedish verb identification and a German POS tagging task is similar to that of a two-stage model. We analyse the performance of this tagger and a more traditional baseline system, discuss some of the remaining problems for tagging historical data and suggest how the flexibility of our neural tagger could be exploited to address diachronic divergences in morphology and syntax in early modern Swedish with the help of data from closely related languages.
Tasks	Part-Of-Speech Tagging
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1088/
PDF	https://www.aclweb.org/anthology/C16-1088
PWC	https://paperswithcode.com/paper/a-neural-model-for-part-of-speech-tagging-in
Repo
Framework

Evaluating a dictionary of human phenotype terms focusing on rare diseases


Title	Evaluating a dictionary of human phenotype terms focusing on rare diseases
Authors	Simon Kocbek, Toyofumi Fujiwara, Jin-Dong Kim, Toshihisa Takagi, Tudor Groza
Abstract	Annotating medical text such as clinical notes with human phenotype descriptors is an important task that can, for example, assist in building patient profiles. To automatically annotate text one usually needs a dictionary of predefined terms. However, due to the variety of human expressiveness, current state-of-the-art phenotype concept recognizers and automatic annotators struggle with specific domain issues and challenges. In this paper we present results of annotating a gold standard corpus with a dictionary containing lexical variants for the Human Phenotype Ontology terms. The main purpose of the dictionary is to improve the recall of phenotype concept recognition systems. We compare the method with four other approaches and present results.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4712/
PDF	https://www.aclweb.org/anthology/W16-4712
PWC	https://paperswithcode.com/paper/evaluating-a-dictionary-of-human-phenotype
Repo
Framework

Extracting Discriminative Keyphrases with Learned Semantic Hierarchies


Title	Extracting Discriminative Keyphrases with Learned Semantic Hierarchies
Authors	Yunli Wang, Yong Jin, Xiaodan Zhu, Cyril Goutte
Abstract	The goal of keyphrase extraction is to automatically identify the most salient phrases from documents. The technique has a wide range of applications such as rendering a quick glimpse of a document, or extracting key content for further use. While previous work often assumes keyphrases are a static property of a given document, in many applications, the appropriate set of keyphrases that should be extracted depends on the set of documents that are being considered together. In particular, good keyphrases should not only accurately describe the content of a document, but also reveal what discriminates it from the other documents. In this paper, we study this problem of extracting discriminative keyphrases. Specifically, we propose to use the hierarchical semantic structure between candidate keyphrases to promote keyphrases that have the right level of specificity to clearly distinguish the target document from others. We show that such knowledge can be used to construct better discriminative keyphrase extraction systems that do not assume a static, fixed set of keyphrases for a document. We show how this helps identify key expertise of authors from their papers, as well as competencies covered by online courses within different domains.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1089/
PDF	https://www.aclweb.org/anthology/C16-1089
PWC	https://paperswithcode.com/paper/extracting-discriminative-keyphrases-with
Repo
Framework

Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification


Title	Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification
Authors	Jianfei Yu, Jing Jiang
Abstract
Tasks	Domain Adaptation, Sentence Embedding, Sentence Embeddings, Sentiment Analysis, Word Embeddings
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1023/
PDF	https://www.aclweb.org/anthology/D16-1023
PWC	https://paperswithcode.com/paper/learning-sentence-embeddings-with-auxiliary
Repo
Framework

Multilingual Supervision of Semantic Annotation


Title	Multilingual Supervision of Semantic Annotation
Authors	Peter Exner, Marcus Klang, Pierre Nugues
Abstract	In this paper, we investigate the annotation projection of semantic units in a practical setting. Previous approaches have focused on using parallel corpora for semantic transfer. We evaluate an alternative approach using loosely parallel corpora that does not require the corpora to be exact translations of each other. We developed a method that transfers semantic annotations from one language to another using sentences aligned by entities, and we extended it to include alignments by entity-like linguistic units. We conducted our experiments on a large scale using the English, Swedish, and French language editions of Wikipedia. Our results show that the annotation projection using entities in combination with loosely parallel corpora provides a viable approach to extending previous attempts. In addition, it allows the generation of proposition banks upon which semantic parsers can be trained.
Tasks	Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Text Summarization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1096/
PDF	https://www.aclweb.org/anthology/C16-1096
PWC	https://paperswithcode.com/paper/multilingual-supervision-of-semantic
Repo
Framework