May 5, 2019

Paper Group NANR 134

Title CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
Authors Roland Schäfer
Abstract In this paper, I describe a method of creating massively huge web corpora from the CommonCrawl data sets and redistributing the resulting annotations in a stand-off format. Current EU (and especially German) copyright legislation categorically forbids the redistribution of downloaded material without express prior permission by the authors. Therefore, such stand-off annotations (or other derivates) are the only format in which European researchers (like myself) are allowed to re-distribute the respective corpora. In order to make the full corpora available to the public despite such restrictions, the stand-off format presented here allows anybody to locally reconstruct the full corpora with the least possible computational effort.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1712/
PDF https://www.aclweb.org/anthology/L16-1712
PWC https://paperswithcode.com/paper/commoncow-massively-huge-web-corpora-from
Repo
Framework

Evaluating Lexical Similarity to build Sentiment Similarity

Title Evaluating Lexical Similarity to build Sentiment Similarity
Authors Grégoire Jadi, Vincent Claveau, Béatrice Daille, Laura Monceaux
Abstract In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representations have been used to build and extend opinion resources such as lexicons and ontologies, and their performance has been evaluated on sentiment analysis tasks. We question this method by measuring the correlation between the sentiment proximity provided by opinion resources and the semantic similarity provided by word representations using different correlation coefficients. We also compare the neighbors found in word representations with lists of similar opinion words. Our results show that the proximity of words in state-of-the-art word representations is not very effective for building sentiment similarity.
Tasks Information Retrieval, Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1190/
PDF https://www.aclweb.org/anthology/L16-1190
PWC https://paperswithcode.com/paper/evaluating-lexical-similarity-to-build
Repo
Framework
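
The comparison described in this abstract, correlating embedding similarity with sentiment proximity over word pairs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity, the definition of sentiment proximity as the negated absolute difference of polarity scores, and the choice of Spearman's rank correlation as one possible coefficient.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sentiment_vs_embedding(vectors, polarity):
    """Correlate the two similarity notions over all pairs of shared words.

    vectors:  hypothetical dict mapping word -> embedding (list of floats)
    polarity: hypothetical dict mapping word -> polarity score from a lexicon
    """
    words = sorted(set(vectors) & set(polarity))
    emb_sim, sent_prox = [], []
    for a, b in combinations(words, 2):
        emb_sim.append(cosine(vectors[a], vectors[b]))
        # Assumed proximity measure: closer polarity scores -> higher value.
        sent_prox.append(-abs(polarity[a] - polarity[b]))
    return spearman(emb_sim, sent_prox)
```

A coefficient near zero from `sentiment_vs_embedding` would be in line with the abstract's conclusion that embedding proximity is a weak proxy for sentiment similarity, although the paper's actual resources and measures may differ from this sketch.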

SemEval 2016 Task 11: Complex Word Identification

Title SemEval 2016 Task 11: Complex Word Identification
Authors Gustavo Paetzold, Lucia Specia
Abstract
Tasks Complex Word Identification, Lexical Simplification, Text Simplification
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1085/
PDF https://www.aclweb.org/anthology/S16-1085
PWC https://paperswithcode.com/paper/semeval-2016-task-11-complex-word
Repo
Framework

The development of a web corpus of Hindi language and corpus-based comparative studies to Japanese

Title The development of a web corpus of Hindi language and corpus-based comparative studies to Japanese
Authors Miki Nishioka, Shiro Akasegawa
Abstract In this paper, we discuss our creation of a web corpus of spoken Hindi (COSH), one of the Indo-Aryan languages spoken mainly in the Indian subcontinent. We also point out notable problems we've encountered in the web corpus and the special concordancer. After observing the kind of technical problems we encountered, especially regarding annotation tagged by Shiva Reddy's tagger, we argue how they can be solved when using COSH for linguistic studies. Finally, we mention the kinds of linguistic research that we non-native speakers of Hindi can do using the corpus, especially in pragmatics and semantics, and from a comparative viewpoint to Japanese.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3712/
PDF https://www.aclweb.org/anthology/W16-3712
PWC https://paperswithcode.com/paper/the-development-of-a-web-corpus-of-hindi
Repo
Framework

The Universal Dependencies Treebank of Spoken Slovenian

Title The Universal Dependencies Treebank of Spoken Slovenian
Authors Kaja Dobrovoljc, Joakim Nivre
Abstract This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoperability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirms significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, fewer and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1248/
PDF https://www.aclweb.org/anthology/L16-1248
PWC https://paperswithcode.com/paper/the-universal-dependencies-treebank-of-spoken
Repo
Framework

Improving Statistical Machine Translation with Selectional Preferences

Title Improving Statistical Machine Translation with Selectional Preferences
Authors Haiqing Tang, Deyi Xiong, Min Zhang, Zhengxian Gong
Abstract Long-distance semantic dependencies are crucial for lexical choice in statistical machine translation. In this paper, we study semantic dependencies between verbs and their arguments by modeling selectional preferences in the context of machine translation. We incorporate preferences that verbs impose on subjects and objects into translation. In addition, bilingual selectional preferences between source-side verbs and target-side arguments are also investigated. Our experiments on Chinese-to-English translation tasks with large-scale training data demonstrate that statistical machine translation using verbal selectional preferences can achieve statistically significant improvements over a state-of-the-art baseline.
Tasks Machine Translation, Semantic Role Labeling, Word Sense Disambiguation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1203/
PDF https://www.aclweb.org/anthology/C16-1203
PWC https://paperswithcode.com/paper/improving-statistical-machine-translation-6
Repo
Framework

An Entity-Based approach to Answering Recurrent and Non-Recurrent Questions with Past Answers

Title An Entity-Based approach to Answering Recurrent and Non-Recurrent Questions with Past Answers
Authors Anietie Andy, Mugizi Rwebangira, Satoshi Sekine
Abstract Community question answering (CQA) systems such as Yahoo! Answers allow registered users to ask and answer questions in various question categories. However, a significant percentage of asked questions in Yahoo! Answers are unanswered. In this paper, we propose to reduce this percentage by reusing answers to past resolved questions from the site. Specifically, we propose to satisfy unanswered questions in entity-rich categories by searching for and reusing the best answers to past resolved questions with shared needs. For unanswered questions that do not have a past resolved question with a shared need, we propose to use the best answer to a past resolved question with similar needs. Our experiments on a Yahoo! Answers dataset show that our approach retrieves most of the past resolved questions that have shared and similar needs to unanswered questions.
Tasks Community Question Answering, Entity Linking, Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4405/
PDF https://www.aclweb.org/anthology/W16-4405
PWC https://paperswithcode.com/paper/an-entity-based-approach-to-answering
Repo
Framework

The Manner/Result Complementarity in Chinese Motion Verbs Revisited

Title The Manner/Result Complementarity in Chinese Motion Verbs Revisited
Authors Lei Qiu
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2011/
PDF https://www.aclweb.org/anthology/Y16-2011
PWC https://paperswithcode.com/paper/the-mannerresult-complementarity-in-chinese
Repo
Framework

Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals

Title Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals
Authors Xinying Qiu, Gangqin Zhu
Abstract We present research on learning an Indonesian-Chinese bilingual lexicon using monolingual word embeddings and bilingual seed lexicons to build a shared bilingual word embedding space. We make a first attempt to examine the impact of different monolingual signals for the choice of seed lexicons on model performance. We found that although monolingual signals alone do not seem to outperform signals covering all words, the significant improvement for learning word translation of the same signal types may suggest that linguistic features possess value for further study in distinguishing the semantic margins of the shared word embedding space.
Tasks Document Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3720/
PDF https://www.aclweb.org/anthology/W16-3720
PWC https://paperswithcode.com/paper/learning-indonesian-chinese-lexicon-with
Repo
Framework

UTCNN: a Deep Learning Model of Stance Classification on Social Media Text

Title UTCNN: a Deep Learning Model of Stance Classification on Social Media Text
Authors Wei-Fan Chen, Lun-Wei Ku
Abstract Most neural network models for document classification on social media focus on text information to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro average f-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minor class. In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work, showing that UTCNN performs well regardless of language or platform.
Tasks Document Classification, Text Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1154/
PDF https://www.aclweb.org/anthology/C16-1154
PWC https://paperswithcode.com/paper/utcnn-a-deep-learning-model-of-stance-1
Repo
Framework

ST-MVL: Filling Missing Values in Geo-Sensory Time Series Data

Title ST-MVL: Filling Missing Values in Geo-Sensory Time Series Data
Authors Xiuwen Yi, Yu Zheng, Junbo Zhang, Tianrui Li
Abstract Many sensors have been deployed in the physical world, generating massive geo-tagged time series data. In reality, readings of sensors are usually lost at various unexpected moments because of sensor or communication errors. Those missing readings do not only affect real-time monitoring but also compromise the performance of further data analysis. In this paper, we propose a spatio-temporal multi-view-based learning (ST-MVL) method to collectively fill missing readings in a collection of geosensory time series data, considering 1) the temporal correlation between readings at different timestamps in the same series and 2) the spatial correlation between different time series. Our method combines empirical statistic models, consisting of Inverse Distance Weighting and Simple Exponential Smoothing, with data-driven algorithms, comprised of User-based and Item-based Collaborative Filtering. The former models handle general missing cases based on empirical assumptions derived from history data over a long period, standing for two global views from spatial and temporal perspectives respectively. The latter algorithms deal with special cases where empirical assumptions may not hold, based on recent contexts of data, denoting two local views from spatial and temporal perspectives respectively. The predictions of the four views are aggregated to a final value in a multi-view learning algorithm. We evaluate our method based on Beijing air quality and meteorological data, finding advantages to our model compared with ten baseline approaches.
Tasks Imputation, Multivariate Time Series Imputation, Multi-View Learning, Time Series
Published 2016-07-09
URL https://www.microsoft.com/en-us/research/publication/st-mvl-filling-missing-values-in-geo-sensory-time-series-data/
PDF https://www.ijcai.org/Proceedings/16/Papers/384.pdf
PWC https://paperswithcode.com/paper/st-mvl-filling-missing-values-in-geo-sensory
Repo
Framework
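
The two global views named in this abstract, Inverse Distance Weighting over space and exponential smoothing over time, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the coordinate representation, the parameter defaults, the exact form of the temporal weights, and the plain averaging used to merge views are all assumptions made for illustration. The two collaborative-filtering views and the multi-view learning step that aggregates the four predictions in the paper are not reproduced here.

```python
from math import hypot


def idw_estimate(target_xy, neighbors, power=2.0):
    """Spatial view: inverse-distance-weighted mean of neighboring sensors.

    neighbors: list of ((x, y), reading) pairs for sensors that reported
    a value at the timestamp being filled (hypothetical input format).
    """
    num = den = 0.0
    for (x, y), value in neighbors:
        d = hypot(x - target_xy[0], y - target_xy[1])
        if d == 0:
            return value  # a co-located sensor: use its reading directly
        w = d ** -power
        num += w * value
        den += w
    if den == 0:
        raise ValueError("no neighboring readings available")
    return num / den


def ses_estimate(series, t, beta=0.5):
    """Temporal view: exponentially decaying weights around position t.

    series: readings of one sensor, with None marking missing values.
    Assumed weight for a reading k steps away: beta * (1 - beta) ** (k - 1).
    """
    num = den = 0.0
    for i, value in enumerate(series):
        if i == t or value is None:
            continue
        w = beta * (1 - beta) ** (abs(t - i) - 1)
        num += w * value
        den += w
    if den == 0:
        raise ValueError("no other readings in this series")
    return num / den


def combine(estimates):
    """Placeholder aggregation: unweighted mean of the view estimates."""
    return sum(estimates) / len(estimates)


# Fill one missing reading using both views.
spatial = idw_estimate((0.0, 0.0), [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)])
temporal = ses_estimate([10.0, None, 20.0], t=1, beta=0.5)
filled = combine([spatial, temporal])
```

The unweighted mean in `combine` is only a stand-in: the abstract states that the per-view predictions are aggregated by a multi-view learning algorithm, which would replace it with learned weights.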

AMR Parsing with an Incremental Joint Model

Title AMR Parsing with an Incremental Joint Model
Authors Junsheng Zhou, Feiyu Xu, Hans Uszkoreit, Weiguang Qu, Ran Li, Yanhui Gu
Abstract
Tasks Abstractive Text Summarization, AMR Parsing, Entity Linking, Machine Translation, Natural Language Inference, Question Answering
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1065/
PDF https://www.aclweb.org/anthology/D16-1065
PWC https://paperswithcode.com/paper/amr-parsing-with-an-incremental-joint-model
Repo
Framework

Using mention accessibility to improve coreference resolution

Title Using mention accessibility to improve coreference resolution
Authors Kellie Webster, Joel Nothman
Abstract
Tasks Coreference Resolution
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2070/
PDF https://www.aclweb.org/anthology/P16-2070
PWC https://paperswithcode.com/paper/using-mention-accessibility-to-improve
Repo
Framework

Incorporating Selectional Preferences in Multi-hop Relation Extraction

Title Incorporating Selectional Preferences in Multi-hop Relation Extraction
Authors Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum
Abstract
Tasks Knowledge Base Completion, Question Answering, Relation Extraction
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-1304/
PDF https://www.aclweb.org/anthology/W16-1304
PWC https://paperswithcode.com/paper/incorporating-selectional-preferences-in
Repo
Framework

Sentence Clustering using PageRank Topic Model

Title Sentence Clustering using PageRank Topic Model
Authors Kenshin Ikegami, Yukio Ohsawa
Abstract
Tasks Decision Making, Language Modelling, Topic Models
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-3003/
PDF https://www.aclweb.org/anthology/Y16-3003
PWC https://paperswithcode.com/paper/sentence-clustering-using-pagerank-topic
Repo
Framework