Paper Group NANR 140
An Investigation on The Effectiveness of Employing Topic Modeling Techniques to Provide Topic Awareness For Conversational Agents. Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities. Quality Estimation for Language Output Applications. Combination of Convolutional and Recurrent …
An Investigation on The Effectiveness of Employing Topic Modeling Techniques to Provide Topic Awareness For Conversational Agents
Title | An Investigation on The Effectiveness of Employing Topic Modeling Techniques to Provide Topic Awareness For Conversational Agents |
Authors | Omid Moradiannasab |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-3012/ |
PWC | https://paperswithcode.com/paper/an-investigation-on-the-effectiveness-of |
Repo | |
Framework | |
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
Title | Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities |
Authors | Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, Khalid Choukri |
Abstract | This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, by highlighting extant obstacles and identifying relevant technologies to overcome them. To this end, it first proposes a typology of e-commerce static and dynamic textual genres and identifies those that may be more successfully targeted by SMT. The specific challenges concerning the automatic translation of user-generated content are discussed in detail. Secondly, the paper highlights the risk of data sparsity inherent to e-commerce and explores the state-of-the-art strategies to achieve domain adequacy via adaptation. Thirdly, it proposes a robust workflow for the development of SMT systems adapted to the e-commerce domain by relying on inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes to obtain monolingual target-language data to train language models and aligned parallel corpora to tune and evaluate MT systems by means of crowdsourcing. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1721/ |
PWC | https://paperswithcode.com/paper/enhancing-cross-border-eu-e-commerce-through |
Repo | |
Framework | |
Quality Estimation for Language Output Applications
Title | Quality Estimation for Language Output Applications |
Authors | Carolina Scarton, Gustavo Paetzold, Lucia Specia |
Abstract | Quality Estimation (QE) of language output applications is a research area that has been attracting significant attention. The goal of QE is to estimate the quality of language output applications without the need for human references. Instead, machine learning algorithms are used to build supervised models based on a few labelled training instances. Such models are able to generalise over unseen data and thus QE is a robust method applicable to scenarios where human input is not available or possible. One such scenario where QE is particularly appealing is that of Machine Translation, where a score for predicted quality can help decide whether or not a translation is useful (e.g. for post-editing) or reliable (e.g. for gisting). Other potential applications within Natural Language Processing (NLP) include Text Summarisation and Text Simplification. In this tutorial we present the task of QE and its application in NLP, focusing on Machine Translation. We also introduce QuEst++, a toolkit for QE that encompasses feature extraction and machine learning, and propose a practical activity to extend this toolkit in various ways. |
Tasks | Machine Translation, Multi-Task Learning, Text Simplification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-3004/ |
PWC | https://paperswithcode.com/paper/quality-estimation-for-language-output |
Repo | |
Framework | |
Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts
Title | Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts |
Authors | Xingyou Wang, Weijie Jiang, Zhiyong Luo |
Abstract | Sentiment analysis of short texts is challenging because of the limited contextual information they usually contain. In recent years, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to text sentiment analysis with comparatively remarkable results. In this paper, we describe a joint CNN and RNN architecture, taking advantage of the coarse-grained local features generated by the CNN and the long-distance dependencies learned via the RNN for sentiment analysis of short texts. Experimental results show an obvious improvement upon the state-of-the-art on three benchmark corpora, MR, SST1 and SST2, with 82.28%, 51.50% and 89.95% accuracy, respectively. |
Tasks | Information Retrieval, Sentiment Analysis, Speech Recognition, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1229/ |
PWC | https://paperswithcode.com/paper/combination-of-convolutional-and-recurrent |
Repo | |
Framework | |
Adjusting Word Embeddings with Semantic Intensity Orders
Title | Adjusting Word Embeddings with Semantic Intensity Orders |
Authors | Joo-Kyung Kim, Marie-Catherine de Marneffe, Eric Fosler-Lussier |
Abstract | |
Tasks | Representation Learning, Word Embeddings |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1607/ |
PWC | https://paperswithcode.com/paper/adjusting-word-embeddings-with-semantic |
Repo | |
Framework | |
Assisting Discussion Forum Users using Deep Recurrent Neural Networks
Title | Assisting Discussion Forum Users using Deep Recurrent Neural Networks |
Authors | Jacob Hagstedt P Suorra, Olof Mogren |
Abstract | |
Tasks | Representation Learning |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1606/ |
PWC | https://paperswithcode.com/paper/assisting-discussion-forum-users-using-deep |
Repo | |
Framework | |
Entity Disambiguation by Knowledge and Text Jointly Embedding
Title | Entity Disambiguation by Knowledge and Text Jointly Embedding |
Authors | Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, Ming Li |
Abstract | |
Tasks | Entity Disambiguation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1026/ |
PWC | https://paperswithcode.com/paper/entity-disambiguation-by-knowledge-and-text |
Repo | |
Framework | |
Chinese Tense Labelling and Causal Analysis
Title | Chinese Tense Labelling and Causal Analysis |
Authors | Hen-Hsen Huang, Chang-Rui Yang, Hsin-Hsi Chen |
Abstract | This paper explores the role of tense information in Chinese causal analysis. Experiments on both causal type classification and causal directionality identification show the significant improvement gained from tense features. To automatically extract the tense features, a Chinese tense predictor is proposed. Based on a large amount of parallel data, our semi-supervised approach improves the dependency-based convolutional neural network (DCNN) models for Chinese tense labelling and thus the causal analysis. |
Tasks | Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1210/ |
PWC | https://paperswithcode.com/paper/chinese-tense-labelling-and-causal-analysis |
Repo | |
Framework | |
Domain Adaptation for Authorship Attribution: Improved Structural Correspondence Learning
Title | Domain Adaptation for Authorship Attribution: Improved Structural Correspondence Learning |
Authors | Upendra Sapkota, Thamar Solorio, Manuel Montes, Steven Bethard |
Abstract | |
Tasks | Dimensionality Reduction, Domain Adaptation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1210/ |
PWC | https://paperswithcode.com/paper/domain-adaptation-for-authorship-attribution |
Repo | |
Framework | |
Decoding Anagrammed Texts Written in an Unknown Language and Script
Title | Decoding Anagrammed Texts Written in an Unknown Language and Script |
Authors | Bradley Hauer, Grzegorz Kondrak |
Abstract | Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It obtains an average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report the results on the Voynich manuscript, an unsolved fifteenth century cipher, which suggest Hebrew as the language of the document. |
Tasks | Language Identification, Optical Character Recognition, Transliteration |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/Q16-1006/ |
PWC | https://paperswithcode.com/paper/decoding-anagrammed-texts-written-in-an |
Repo | |
Framework | |
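The cipher model named in the abstract above (a monoalphabetic substitution followed by transposing the letters within each word) can be illustrated with a minimal Python sketch. It shows only how such a ciphertext could be produced, not the paper's decoding approach; the helper names `make_key`, `substitute` and `anagram_words`, the fixed seed and the sample sentence are all assumptions made for illustration.

```python
import random
import string


def make_key(rng):
    """Build a random one-to-one mapping over the lowercase Latin alphabet."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def substitute(text, key):
    """Apply the substitution; characters outside the key (spaces) are kept."""
    return "".join(key.get(ch, ch) for ch in text)


def anagram_words(text, rng):
    """Shuffle the letters inside each word while keeping word boundaries."""
    scrambled = []
    for word in text.split():
        chars = list(word)
        rng.shuffle(chars)
        scrambled.append("".join(chars))
    return " ".join(scrambled)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
key = make_key(rng)
plain = "the quick brown fox"  # hypothetical plaintext
substituted = substitute(plain, key)
cipher = anagram_words(substituted, rng)

# Inverting the key undoes the substitution step (but not the anagramming).
inverse_key = {v: k for k, v in key.items()}
recovered = substitute(substituted, inverse_key)
```

Anagramming destroys letter order but preserves each word's length and its multiset of letters, so any decoder for this setting has to work from order-free word statistics.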
Graph-Based Induction of Word Senses in Croatian
Title | Graph-Based Induction of Word Senses in Croatian |
Authors | Marko Bekavac, Jan Šnajder |
Abstract | Word sense induction (WSI) seeks to induce senses of words from unannotated corpora. In this paper, we address the WSI task for the Croatian language. We adopt the word clustering approach based on co-occurrence graphs, in which senses are taken to correspond to strongly inter-connected components of co-occurring words. We experiment with a number of graph construction techniques and clustering algorithms, and evaluate the sense inventories both as a clustering problem and extrinsically on a word sense disambiguation (WSD) task. In the cluster-based evaluation, the Chinese Whispers algorithm outperformed Markov Clustering, yielding a normalized mutual information score of 64.3. In contrast, in the WSD evaluation Markov Clustering performed better, yielding an accuracy of about 75%. We are making available two induced sense inventories of the 10,000 most frequent Croatian words: one coarse-grained and one fine-grained inventory, both obtained using the Markov Clustering algorithm. |
Tasks | graph construction, Word Sense Disambiguation, Word Sense Induction |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1481/ |
PWC | https://paperswithcode.com/paper/graph-based-induction-of-word-senses-in |
Repo | |
Framework | |
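As a rough illustration of the graph-clustering idea in the abstract above, the following Python sketch runs a basic Chinese Whispers label propagation over a tiny weighted co-occurrence graph. It is a simplified version under stated assumptions, not the authors' implementation: the function name `chinese_whispers`, the alphabetical tie-breaking rule, the iteration count and the toy English neighbours of an ambiguous word such as "bank" are all hypothetical.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples."""
    rng = random.Random(seed)
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w

    # Every node starts in its own cluster, labelled by its own name.
    labels = {node: node for node in neighbours}
    nodes = list(neighbours)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order on each pass
        for node in nodes:
            # Sum the edge weights of the labels seen among the neighbours.
            scores = defaultdict(float)
            for nb, w in neighbours[node].items():
                scores[labels[nb]] += w
            # Adopt the strongest label; ties go to the alphabetically first.
            labels[node] = max(sorted(scores), key=scores.get)
    return labels


# Hypothetical co-occurrence neighbours of an ambiguous word such as "bank".
edges = [
    ("money", "loan", 1.0), ("loan", "deposit", 1.0), ("money", "deposit", 1.0),
    ("river", "shore", 1.0), ("shore", "water", 1.0), ("river", "water", 1.0),
]
labels = chinese_whispers(edges)
```

Each distinct label in `labels` is read as one induced sense; here the two disconnected triangles end up as two clusters, one financial and one geographical.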
Learning Cross-lingual Representations with Matrix Factorization
Title | Learning Cross-lingual Representations with Matrix Factorization |
Authors | Hanan Aldarmaki, Mona Diab |
Abstract | |
Tasks | Cross-Lingual Document Classification, Cross-Lingual Semantic Textual Similarity, Document Classification, Machine Translation, Question Answering, Semantic Textual Similarity, Sentence Embeddings, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1201/ |
PWC | https://paperswithcode.com/paper/learning-cross-lingual-representations-with |
Repo | |
Framework | |
Pair Distance Distribution: A Model of Semantic Representation
Title | Pair Distance Distribution: A Model of Semantic Representation |
Authors | Yonatan Ramni, Oded Maimon, Evgeni Khmelnitsky |
Abstract | |
Tasks | Dimensionality Reduction, Representation Learning, Semantic Textual Similarity |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1621/ |
PWC | https://paperswithcode.com/paper/pair-distance-distribution-a-model-of |
Repo | |
Framework | |
A Corpus of Preposition Supersenses
Title | A Corpus of Preposition Supersenses |
Authors | Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Meredith Green, Abhijit Suresh, Kathryn Conger, Tim O'Gorman, Martha Palmer |
Abstract | |
Tasks | Machine Translation, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1712/ |
PWC | https://paperswithcode.com/paper/a-corpus-of-preposition-supersenses |
Repo | |
Framework | |
ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain
Title | ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain |
Authors | Sergio Oramas, Luis Espinosa Anke, Mohamed Sordo, Horacio Saggion, Xavier Serra |
Abstract | In this paper we present a gold standard dataset for Entity Linking (EL) in the Music Domain. It contains thousands of musical named entities such as Artist, Song or Record Label, which have been automatically annotated on a set of artist biographies coming from the music website and social network Last.fm. The annotation process relies on the analysis of the hyperlinks present in the source texts and on a voting-based algorithm for EL, which considers, for each entity mention in text, the degree of agreement across three state-of-the-art EL systems. Manual evaluation shows that EL Precision is at least 94%, and due to its tunable nature, it is possible to derive annotations favouring higher Precision or Recall, at will. We make available the annotated dataset along with evaluation data and the code. |
Tasks | Entity Linking |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1528/ |
PWC | https://paperswithcode.com/paper/elmd-an-automatically-generated-entity |
Repo | |
Framework | |
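The agreement-based voting mentioned in the ELMD abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the released ELMD code: the function `vote_entity`, the `min_agreement` threshold and the example identifiers are hypothetical, and the listing above does not give the paper's actual voting rule.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_entity(candidates: Sequence[Optional[str]],
                min_agreement: int = 2) -> Optional[str]:
    """Return the entity proposed by at least `min_agreement` systems.

    `candidates` holds one proposal per EL system for a single mention;
    None means that system produced no link for the mention.
    """
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    entity, count = votes.most_common(1)[0]
    return entity if count >= min_agreement else None


# Hypothetical outputs of three EL systems for one mention.
majority = vote_entity(["artist:Radiohead", "artist:Radiohead", "song:Creep"])
no_consensus = vote_entity(["artist:Radiohead", "song:Creep", None])
strict = vote_entity(["artist:Radiohead", "artist:Radiohead", None],
                     min_agreement=3)
```

Raising `min_agreement` keeps only links that more systems agree on, which trades Recall for Precision; lowering it does the opposite, which is one way to read the "tunable nature" described in the abstract.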