Paper Group NANR 146
PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
Title | PentoRef: A Corpus of Spoken References in Task-oriented Dialogues |
Authors | Sina Zarrieß, Julian Hough, Casey Kennington, Ramesh Manuvinakurike, David DeVault, Raquel Fernández, David Schlangen |
Abstract | PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus a rich resource for research on spoken referring expressions in generation and resolution. The corpus includes several sub-corpora that correspond to different dialogue situations where parameters related to interactivity, visual access, and verbal channel have been manipulated in systematic ways. The corpus thus lends itself to very targeted studies of reference in spontaneous dialogue. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1019/ |
https://www.aclweb.org/anthology/L16-1019 | |
PWC | https://paperswithcode.com/paper/pentoref-a-corpus-of-spoken-references-in |
Repo | |
Framework | |
Targeted Sentiment to Understand Student Comments
Title | Targeted Sentiment to Understand Student Comments |
Authors | Charles Welch, Rada Mihalcea |
Abstract | We address the task of targeted sentiment as a means of understanding the sentiment that students hold toward courses and instructors, as expressed by students in their comments. We introduce a new dataset consisting of student comments annotated for targeted sentiment and describe a system that can both identify the courses and instructors mentioned in student comments and label the students' sentiment toward those entities. Through several comparative evaluations, we show that our system outperforms previous work on a similar task. |
Tasks | Decision Making, Entity Extraction, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1233/ |
https://www.aclweb.org/anthology/C16-1233 | |
PWC | https://paperswithcode.com/paper/targeted-sentiment-to-understand-student |
Repo | |
Framework | |
NASTEA: Investigating Narrative Schemas through Annotated Entities
Title | NASTEA: Investigating Narrative Schemas through Annotated Entities |
Authors | Dan Simonson, Anthony Davis |
Abstract | |
Tasks | Language Modelling |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-5707/ |
https://www.aclweb.org/anthology/W16-5707 | |
PWC | https://paperswithcode.com/paper/nastea-investigating-narrative-schemas |
Repo | |
Framework | |
Disentangling factors of variation in deep representation using adversarial training
Title | Disentangling factors of variation in deep representation using adversarial training |
Authors | Michael F. Mathieu, Junbo Jake Zhao, Aditya Ramesh, Pablo Sprechmann, Yann LeCun |
Abstract | We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g. different objects observed in the exact same conditions. This assumption is often not encountered in realistic settings where data acquisition is not controlled and labels for the uninformative components are not available. In this work, we propose to overcome this limitation by augmenting deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of disentangling the influences of style and content factors using a flexible representation, as well as generalizing to unseen styles or content classes. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training |
http://papers.nips.cc/paper/6051-disentangling-factors-of-variation-in-deep-representation-using-adversarial-training.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-factors-of-variation-in-deep-1 |
Repo | |
Framework | |
Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs
Title | Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs |
Authors | Ximing Li, Jinjin Chi, Changchun Li, Jihong Ouyang, Bo Fu |
Abstract | Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distribution over word types with a multivariate Gaussian distribution on the embedding space. This can take semantic information of words into account. However, the Euclidean similarity used in Gaussian topics is not an optimal semantic measure for word embeddings. It is widely acknowledged that the cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics, and then develop a novel mix-vMF topic model (MvTM). Using public pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model can discover more coherent topics than the state-of-the-art baseline models, and achieve competitive classification performance. |
Tasks | Topic Models, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1015/ |
https://www.aclweb.org/anthology/C16-1015 | |
PWC | https://paperswithcode.com/paper/integrating-topic-modeling-with-word |
Repo | |
Framework | |
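The contrast the abstract above draws between Euclidean and cosine similarity can be illustrated with a minimal sketch. The two-dimensional vectors are hypothetical toy values, not real word embeddings, and the function names are illustrative only; nothing here is taken from the MvTM implementation. The last function is the unnormalized von Mises-Fisher log-density, which for unit vectors is the concentration `kappa` times the cosine between the point and the mean direction.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (ignores vector length)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean_distance(u, v):
    """Straight-line distance between u and v (sensitive to length)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def vmf_log_kernel(x, mu, kappa):
    """Unnormalized vMF log-density: kappa * cos(x, mu).

    x and mu are normalized to unit length first, so the value depends
    only on direction. The normalizing constant is omitted.
    """
    return kappa * cosine_similarity(x, mu)

# Hypothetical toy vectors: b has the same direction as a but is 10x longer,
# while c is close to a in space but points in a different direction.
a = [1.0, 2.0]
b = [10.0, 20.0]
c = [2.0, 1.0]

print(euclidean_distance(a, b), euclidean_distance(a, c))
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Under Euclidean distance `c` is the nearer neighbour of `a`, while under cosine similarity `b` is, which is the kind of disagreement that motivates a direction-based topic distribution.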
Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies
Title | Investigating Fluidity for Human-Robot Interaction with Real-time, Real-world Grounding Strategies |
Authors | Julian Hough, David Schlangen |
Abstract | |
Tasks | Object Recognition, Text Generation |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3637/ |
https://www.aclweb.org/anthology/W16-3637 | |
PWC | https://paperswithcode.com/paper/investigating-fluidity-for-human-robot |
Repo | |
Framework | |
Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
Title | Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola |
Authors | Marco Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, Viviana Patti |
Abstract | In this paper we present the TWitterBuonaScuola corpus (TW-BS), a novel Italian linguistic resource for Sentiment Analysis, developed with the main aim of analyzing the online debate on the controversial Italian political reform "Buona Scuola" (Good school), aimed at reorganizing the national educational and training systems. We describe the methodologies applied in the collection and annotation of data. The collection has been driven by the detection of the hashtags mainly used by the participants in the debate, while the annotation has been focused on sentiment polarity and irony, but also extended to mark the aspects of the reform that were mainly discussed in the debate. An in-depth study of the disagreement among annotators is included. We describe the collection and annotation stages, and the in-depth analysis of disagreement made with Crowdflower, a crowdsourcing annotation platform. |
Tasks | Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1462/ |
https://www.aclweb.org/anthology/L16-1462 | |
PWC | https://paperswithcode.com/paper/annotating-sentiment-and-irony-in-the-online |
Repo | |
Framework | |
EstNLTK - NLP Toolkit for Estonian
Title | EstNLTK - NLP Toolkit for Estonian |
Authors | Siim Orasmaa, Timo Petmanson, Alexander Tkachenko, Sven Laur, Heiki-Jaan Kaalep |
Abstract | Although there are many tools for natural language processing tasks in Estonian, these tools are very loosely interoperable, and it is not easy to build practical applications on top of them. In this paper, we introduce a new Python library for natural language processing in Estonian, which provides a unified programming interface for various NLP components. The EstNLTK toolkit provides utilities for basic NLP tasks including tokenization, morphological analysis, lemmatisation and named entity recognition, and also offers more advanced features such as clause segmentation, temporal expression extraction and normalization, verb chain detection, Estonian Wordnet integration and rule-based information extraction. Accompanied by detailed API documentation and comprehensive tutorials, EstNLTK is suitable for a wide audience. We believe EstNLTK is mature enough to be used for developing NLP-backed systems both in industry and research. EstNLTK is freely available under the GNU GPL version 2+ license, which is standard for academic software. |
Tasks | Morphological Analysis, Named Entity Recognition, Tokenization |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1390/ |
https://www.aclweb.org/anthology/L16-1390 | |
PWC | https://paperswithcode.com/paper/estnltk-nlp-toolkit-for-estonian |
Repo | |
Framework | |
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Title | Proceedings of the 3rd Workshop on Asian Translation (WAT2016) |
Authors | |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4600/ |
https://www.aclweb.org/anthology/W16-4600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-3rd-workshop-on-asian |
Repo | |
Framework | |
Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Title | Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases |
Authors | Svetlana Kiritchenko, Saif Mohammad |
Abstract | Sentiment composition is the determination of the sentiment of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents. We focus on sentiment composition in phrases formed by at least one positive and at least one negative word, such as 'happy accident' and 'best winter break'. We refer to such phrases as opposing polarity phrases. We manually annotate a collection of opposing polarity phrases and their constituent single words with real-valued sentiment intensity scores using a method known as Best-Worst Scaling. We show that the obtained annotations are consistent. We explore the entries in the lexicon for linguistic regularities that govern sentiment composition in opposing polarity phrases. Finally, we list the current and possible future applications of the lexicon. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1184/ |
https://www.aclweb.org/anthology/L16-1184 | |
PWC | https://paperswithcode.com/paper/happy-accident-a-sentiment-composition |
Repo | |
Framework | |
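The abstract above names Best-Worst Scaling but does not say how real-valued scores are derived from the annotations. A common counting procedure, assumed here rather than taken from the paper, scores each term as the share of times it was chosen as best minus the share of times it was chosen as worst. The function name and the two annotation tuples below are hypothetical and exist only to make the sketch runnable.

```python
from collections import Counter

def bws_scores(annotations):
    """Score terms from Best-Worst Scaling judgments by simple counting.

    annotations: iterable of (items, best_item, worst_item), where items
    is the tuple of terms shown to the annotator in one question.
    Returns {term: (times_best - times_worst) / times_shown}, in [-1, 1].
    """
    best, worst, shown = Counter(), Counter(), Counter()
    for items, best_item, worst_item in annotations:
        shown.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {term: (best[term] - worst[term]) / shown[term] for term in shown}

# Hypothetical judgments over one 4-tuple of terms (not data from the lexicon).
terms = ("happy accident", "best winter break", "accident", "happy")
annotations = [
    (terms, "happy", "accident"),
    (terms, "best winter break", "accident"),
]

scores = bws_scores(annotations)
for term, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{term}: {score:+.2f}")
```

With enough overlapping questions, counts of this kind yield a ranking over all terms without asking annotators for absolute scores directly.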
A Neural Model for Part-of-Speech Tagging in Historical Texts
Title | A Neural Model for Part-of-Speech Tagging in Historical Texts |
Authors | Christian Hardmeier |
Abstract | Historical texts are challenging for natural language processing because they differ linguistically from modern texts and because of their lack of orthographical and grammatical standardisation. We use a character-level neural network to build a part-of-speech (POS) tagger that can process historical data directly without requiring a separate spelling normalisation stage. Its performance in a Swedish verb identification and a German POS tagging task is similar to that of a two-stage model. We analyse the performance of this tagger and a more traditional baseline system, discuss some of the remaining problems for tagging historical data and suggest how the flexibility of our neural tagger could be exploited to address diachronic divergences in morphology and syntax in early modern Swedish with the help of data from closely related languages. |
Tasks | Part-Of-Speech Tagging |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1088/ |
https://www.aclweb.org/anthology/C16-1088 | |
PWC | https://paperswithcode.com/paper/a-neural-model-for-part-of-speech-tagging-in |
Repo | |
Framework | |
Evaluating a dictionary of human phenotype terms focusing on rare diseases
Title | Evaluating a dictionary of human phenotype terms focusing on rare diseases |
Authors | Simon Kocbek, Toyofumi Fujiwara, Jin-Dong Kim, Toshihisa Takagi, Tudor Groza |
Abstract | Annotating medical text such as clinical notes with human phenotype descriptors is an important task that can, for example, assist in building patient profiles. To automatically annotate text one usually needs a dictionary of predefined terms. However, due to the variety of human expressiveness, current state-of-the-art phenotype concept recognizers and automatic annotators struggle with specific domain issues and challenges. In this paper we present results of annotating a gold standard corpus with a dictionary containing lexical variants for the Human Phenotype Ontology terms. The main purpose of the dictionary is to improve the recall of phenotype concept recognition systems. We compare the method with four other approaches and present results. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4712/ |
https://www.aclweb.org/anthology/W16-4712 | |
PWC | https://paperswithcode.com/paper/evaluating-a-dictionary-of-human-phenotype |
Repo | |
Framework | |
Extracting Discriminative Keyphrases with Learned Semantic Hierarchies
Title | Extracting Discriminative Keyphrases with Learned Semantic Hierarchies |
Authors | Yunli Wang, Yong Jin, Xiaodan Zhu, Cyril Goutte |
Abstract | The goal of keyphrase extraction is to automatically identify the most salient phrases from documents. The technique has a wide range of applications such as rendering a quick glimpse of a document, or extracting key content for further use. While previous work often assumes keyphrases are a static property of a given document, in many applications, the appropriate set of keyphrases that should be extracted depends on the set of documents that are being considered together. In particular, good keyphrases should not only accurately describe the content of a document, but also reveal what discriminates it from the other documents. In this paper, we study this problem of extracting discriminative keyphrases. Specifically, we propose to use the hierarchical semantic structure between candidate keyphrases to promote keyphrases that have the right level of specificity to clearly distinguish the target document from others. We show that such knowledge can be used to construct better discriminative keyphrase extraction systems that do not assume a static, fixed set of keyphrases for a document. We show how this helps identify key expertise of authors from their papers, as well as competencies covered by online courses within different domains. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1089/ |
https://www.aclweb.org/anthology/C16-1089 | |
PWC | https://paperswithcode.com/paper/extracting-discriminative-keyphrases-with |
Repo | |
Framework | |
Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification
Title | Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification |
Authors | Jianfei Yu, Jing Jiang |
Abstract | |
Tasks | Domain Adaptation, Sentence Embedding, Sentence Embeddings, Sentiment Analysis, Word Embeddings |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1023/ |
https://www.aclweb.org/anthology/D16-1023 | |
PWC | https://paperswithcode.com/paper/learning-sentence-embeddings-with-auxiliary |
Repo | |
Framework | |
Multilingual Supervision of Semantic Annotation
Title | Multilingual Supervision of Semantic Annotation |
Authors | Peter Exner, Marcus Klang, Pierre Nugues |
Abstract | In this paper, we investigate the annotation projection of semantic units in a practical setting. Previous approaches have focused on using parallel corpora for semantic transfer. We evaluate an alternative approach using loosely parallel corpora that does not require the corpora to be exact translations of each other. We developed a method that transfers semantic annotations from one language to another using sentences aligned by entities, and we extended it to include alignments by entity-like linguistic units. We conducted our experiments on a large scale using the English, Swedish, and French language editions of Wikipedia. Our results show that the annotation projection using entities in combination with loosely parallel corpora provides a viable approach to extending previous attempts. In addition, it allows the generation of proposition banks upon which semantic parsers can be trained. |
Tasks | Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Text Summarization |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1096/ |
https://www.aclweb.org/anthology/C16-1096 | |
PWC | https://paperswithcode.com/paper/multilingual-supervision-of-semantic |
Repo | |
Framework | |