May 5, 2019

1873 words 9 mins read

Paper Group NANR 134

CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws. Evaluating Lexical Similarity to build Sentiment Similarity. SemEval 2016 Task 11: Complex Word Identification. The development of a web corpus of Hindi language and corpus-based comparative studies to Japanese. Th …

CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws


Title	CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
Authors	Roland Schäfer
Abstract	In this paper, I describe a method of creating massively huge web corpora from the CommonCrawl data sets and redistributing the resulting annotations in a stand-off format. Current EU (and especially German) copyright legislation categorically forbids the redistribution of downloaded material without express prior permission by the authors. Therefore, such stand-off annotations (or other derivatives) are the only format in which European researchers (like myself) are allowed to redistribute the respective corpora. In order to make the full corpora available to the public despite such restrictions, the stand-off format presented here allows anybody to locally reconstruct the full corpora with the least possible computational effort.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1712/
PDF	https://www.aclweb.org/anthology/L16-1712
PWC	https://paperswithcode.com/paper/commoncow-massively-huge-web-corpora-from
Repo
Framework
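The abstract does not spell out the stand-off format, so the following is only a minimal sketch of the general idea, assuming a hypothetical record layout (a document ID, character offsets, and a label). `StandoffRecord` and `reconstruct` are illustrative names, not part of CommonCOW: only offsets and labels are shared, and the text itself is sliced out of the user's own local copy of the crawl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffRecord:
    """Hypothetical stand-off entry: which source document, which span, which label."""
    doc_id: str  # identifier of the document in the locally downloaded crawl
    start: int   # character offset where the span begins (inclusive)
    end: int     # character offset where the span ends (exclusive)
    label: str   # annotation attached to the span, e.g. a POS tag


def reconstruct(records, local_docs):
    """Rebuild (text, label) pairs by slicing locally stored source documents.

    local_docs maps doc_id -> raw text that the user fetched themselves;
    records pointing at documents that are not available locally are skipped.
    """
    out = []
    for rec in records:
        text = local_docs.get(rec.doc_id)
        if text is None:
            continue
        out.append((text[rec.start:rec.end], rec.label))
    return out
```

For example, with a local document `"Das ist ein Test."` under ID `"d1"`, the records `("d1", 0, 3, "ART")` and `("d1", 4, 7, "VAFIN")` rebuild `("Das", "ART")` and `("ist", "VAFIN")` without the text ever being redistributed.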

Evaluating Lexical Similarity to build Sentiment Similarity


Title	Evaluating Lexical Similarity to build Sentiment Similarity
Authors	Grégoire Jadi, Vincent Claveau, Béatrice Daille, Laura Monceaux
Abstract	In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representations have been used to build and extend opinion resources such as lexicons and ontologies, and their performance has been evaluated on sentiment analysis tasks. We question this method by measuring the correlation between the sentiment proximity provided by opinion resources and the semantic similarity provided by word representations, using different correlation coefficients. We also compare the neighbors found in word representations with lists of similar opinion words. Our results show that the proximity of words in state-of-the-art word representations is not very effective for building sentiment similarity.
Tasks	Information Retrieval, Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1190/
PDF	https://www.aclweb.org/anthology/L16-1190
PWC	https://paperswithcode.com/paper/evaluating-lexical-similarity-to-build
Repo
Framework
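To illustrate the evaluation idea (correlating embedding similarity with sentiment proximity), here is a minimal pure-Python sketch of cosine similarity and Spearman's rank correlation, with tied values given the average of their ranks. The inputs are assumed to be parallel lists of scores for the same word pairs; this is not the authors' evaluation code, and the paper also reports other correlation coefficients.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

In this setting, `x` would hold `cosine(...)` scores for a list of word pairs and `y` the sentiment proximity that an opinion resource assigns to the same pairs; a rho near zero would indicate that embedding neighbourhoods say little about sentiment.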

SemEval 2016 Task 11: Complex Word Identification


Title	SemEval 2016 Task 11: Complex Word Identification
Authors	Gustavo Paetzold, Lucia Specia
Abstract
Tasks	Complex Word Identification, Lexical Simplification, Text Simplification
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1085/
PDF	https://www.aclweb.org/anthology/S16-1085
PWC	https://paperswithcode.com/paper/semeval-2016-task-11-complex-word
Repo
Framework

The development of a web corpus of Hindi language and corpus-based comparative studies to Japanese


Title	The development of a web corpus of Hindi language and corpus-based comparative studies to Japanese
Authors	Miki Nishioka, Shiro Akasegawa
Abstract	In this paper, we discuss our creation of a web corpus of spoken Hindi (COSH), one of the Indo-Aryan languages spoken mainly in the Indian subcontinent. We also point out notable problems we've encountered in the web corpus and the special concordancer. After observing the kinds of technical problems we encountered, especially regarding the annotation produced by Shiva Reddy's tagger, we discuss how they can be solved when using COSH for linguistic studies. Finally, we mention the kinds of linguistic research that we, as non-native speakers of Hindi, can do using the corpus, especially in pragmatics and semantics, and from a comparative viewpoint with Japanese.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3712/
PDF	https://www.aclweb.org/anthology/W16-3712
PWC	https://paperswithcode.com/paper/the-development-of-a-web-corpus-of-hindi
Repo
Framework
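The abstract mentions a concordancer built for COSH. As a rough illustration of what a concordancer does, here is a minimal keyword-in-context (KWIC) sketch over an already tokenized text; the `kwic` function and its `window` parameter are hypothetical and say nothing about how the COSH concordancer is implemented.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence of keyword.

    Contexts are at most `window` tokens on each side, joined with spaces.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits
```

For instance, on the romanized tokens of `"vah ghar ja raha hai aur main bhi ghar ja raha hun"` with `window=2`, the keyword `"ghar"` yields two lines: `("vah", "ghar", "ja raha")` and `("main bhi", "ghar", "ja raha")`.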

The Universal Dependencies Treebank of Spoken Slovenian


Title	The Universal Dependencies Treebank of Spoken Slovenian
Authors	Kaja Dobrovoljc, Joakim Nivre
Abstract	This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoperability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirm significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, fewer and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1248/
PDF	https://www.aclweb.org/anthology/L16-1248
PWC	https://paperswithcode.com/paper/the-universal-dependencies-treebank-of-spoken
Repo
Framework
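UD treebanks are distributed in the CoNLL-U format: ten tab-separated columns per word (with the dependency relation in the eighth column), `#` comment lines, and blank lines between sentences. The sketch below computes two of the kinds of statistics behind the spoken/written comparison, mean sentence length and dependency-relation counts. It is a simplified reader that skips multiword-token ranges and empty nodes, not the tooling used in the paper.

```python
from collections import Counter


def conllu_stats(text):
    """Mean sentence length (in syntactic words) and DEPREL counts from CoNLL-U text."""
    lengths, deprels = [], Counter()
    current = 0
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        line = line.strip()
        if not line:
            if current:
                lengths.append(current)
            current = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id or text
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        current += 1
        deprels[cols[7]] += 1
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return mean, deprels
```

Running this over a spoken and a written treebank side by side would give directly comparable numbers, e.g. the relative frequency of `discourse` or `reparandum` relations.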

Improving Statistical Machine Translation with Selectional Preferences


Title	Improving Statistical Machine Translation with Selectional Preferences
Authors	Haiqing Tang, Deyi Xiong, Min Zhang, Zhengxian Gong
Abstract	Long-distance semantic dependencies are crucial for lexical choice in statistical machine translation. In this paper, we study semantic dependencies between verbs and their arguments by modeling selectional preferences in the context of machine translation. We incorporate preferences that verbs impose on subjects and objects into translation. In addition, bilingual selectional preferences between source-side verbs and target-side arguments are also investigated. Our experiments on Chinese-to-English translation tasks with large-scale training data demonstrate that statistical machine translation using verbal selectional preferences can achieve statistically significant improvements over a state-of-the-art baseline.
Tasks	Machine Translation, Semantic Role Labeling, Word Sense Disambiguation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1203/
PDF	https://www.aclweb.org/anthology/C16-1203
PWC	https://paperswithcode.com/paper/improving-statistical-machine-translation-6
Repo
Framework
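The abstract does not give the selectional preference model, so the following is only a generic count-based sketch of the underlying idea: how much more likely an argument is for a given verb and role than for that role in general. The PMI-style score and the `preference_scores` helper are assumptions for illustration, not the model used in the paper.

```python
import math
from collections import Counter


def preference_scores(triples):
    """PMI-style selectional preference for observed (verb, role, argument) triples.

    score(v, r, a) = log P(a | v, r) - log P(a | r)
    Positive: the verb favours this argument more than the role does on average.
    Unseen triples get no score (no smoothing in this sketch).
    """
    vra = Counter(triples)
    vr = Counter((v, r) for v, r, _ in triples)
    ra = Counter((r, a) for _, r, a in triples)
    r_tot = Counter(r for _, r, _ in triples)
    scores = {}
    for (v, r, a), c in vra.items():
        p_a_given_vr = c / vr[(v, r)]
        p_a_given_r = ra[(r, a)] / r_tot[r]
        scores[(v, r, a)] = math.log(p_a_given_vr / p_a_given_r)
    return scores
```

In a translation system, such scores could be attached as a feature to hypotheses whose target-side verb and argument form a known triple; the bilingual variant in the abstract would pair source-side verbs with target-side arguments instead.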

An Entity-Based approach to Answering Recurrent and Non-Recurrent Questions with Past Answers


Title	An Entity-Based approach to Answering Recurrent and Non-Recurrent Questions with Past Answers
Authors	Anietie Andy, Mugizi Rwebangira, Satoshi Sekine
Abstract	Community question answering (CQA) systems such as Yahoo! Answers allow registered users to ask and answer questions in various question categories. However, a significant percentage of questions asked on Yahoo! Answers remain unanswered. In this paper, we propose to reduce this percentage by reusing answers to past resolved questions from the site. Specifically, we propose to satisfy unanswered questions in entity-rich categories by searching for and reusing the best answers to past resolved questions with shared needs. For unanswered questions that do not have a past resolved question with a shared need, we propose to use the best answer to a past resolved question with similar needs. Our experiments on a Yahoo! Answers dataset show that our approach retrieves most of the past resolved questions that have shared and similar needs to unanswered questions.
Tasks	Community Question Answering, Entity Linking, Question Answering
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4405/
PDF	https://www.aclweb.org/anthology/W16-4405
PWC	https://paperswithcode.com/paper/an-entity-based-approach-to-answering
Repo
Framework
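As a rough illustration of the shared-need/similar-need fallback described in the abstract, here is a minimal sketch that compares questions by the overlap of their entity sets. It assumes entity linking has already been done, and the Jaccard measure, the `best_past_answer` helper, and its `shared_threshold` parameter are hypothetical choices, not the authors' retrieval method.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_past_answer(query_entities, past, shared_threshold=1.0):
    """Pick a reusable answer from past resolved questions by entity overlap.

    past: list of (entity_set, best_answer) pairs.
    Returns (answer, "shared") if the best overlap reaches shared_threshold,
    (answer, "similar") for a partial overlap, or (None, None) if nothing overlaps.
    """
    best, best_score = None, 0.0
    for entities, answer in past:
        score = jaccard(query_entities, entities)
        if score > best_score:
            best, best_score = answer, score
    if best is None:
        return None, None
    return best, ("shared" if best_score >= shared_threshold else "similar")
```

With past questions about `{"Paris", "Eiffel Tower"}` and `{"Paris"}`, a new question about `{"Paris", "Louvre"}` has no exact match and falls back to the answer of the `{"Paris"}` question as a "similar" need.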

The Manner/Result Complementarity in Chinese Motion Verbs Revisited


Title	The Manner/Result Complementarity in Chinese Motion Verbs Revisited
Authors	Lei Qiu
Abstract
Tasks
Published	2016-10-01
URL	https://www.aclweb.org/anthology/Y16-2011/
PDF	https://www.aclweb.org/anthology/Y16-2011
PWC	https://paperswithcode.com/paper/the-mannerresult-complementarity-in-chinese
Repo
Framework

Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals


Title	Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals
Authors	Xinying Qiu, Gangqin Zhu
Abstract	We present research on learning an Indonesian-Chinese bilingual lexicon using monolingual word embeddings and bilingual seed lexicons to build a shared bilingual word embedding space. We make a first attempt to examine the impact of different monolingual signals for the choice of seed lexicons on model performance. We found that although monolingual signals alone do not seem to outperform signals covering all words, the significant improvement in learning word translations of the same signal types may suggest that linguistic features are worth further study for distinguishing the semantic margins of the shared word embedding space.
Tasks	Document Classification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3720/
PDF	https://www.aclweb.org/anthology/W16-3720
PWC	https://paperswithcode.com/paper/learning-indonesian-chinese-lexicon-with
Repo
Framework
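One common way to build a shared space from monolingual embeddings and a seed lexicon is the linear translation-matrix approach of Mikolov et al. (2013): learn a matrix that maps source vectors onto the vectors of their seed translations, then translate by nearest neighbour. The abstract does not say that this exact method was used, so treat the sketch below as an illustration of the general technique; `learn_mapping` and `translate` are hypothetical helpers.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Least-squares linear map W such that src_vecs @ W approximates tgt_vecs.

    Row i of src_vecs (e.g. Indonesian) and tgt_vecs (e.g. Chinese) is one seed pair.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def translate(vec, W, tgt_matrix, tgt_words):
    """Map a source vector into the target space and return the nearest target word by cosine."""
    mapped = vec @ W
    sims = tgt_matrix @ mapped / (np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped))
    return tgt_words[int(np.argmax(sims))]
```

The choice of seed pairs (the "monolingual signals" studied in the paper) only changes which rows go into `src_vecs` and `tgt_vecs`; the mapping step itself stays the same.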


UTCNN: a Deep Learning Model of Stance Classification on Social Media Text


Title	UTCNN: a Deep Learning Model of Stance Classification on Social Media Text
Authors	Wei-Fan Chen, Lun-Wei Ku
Abstract	Most neural network models for document classification on social media focus on text information to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro-average F-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minority class. In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work, showing that UTCNN performs well regardless of language or platform.
Tasks	Document Classification, Text Classification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1154/
PDF	https://www.aclweb.org/anthology/C16-1154
PWC	https://paperswithcode.com/paper/utcnn-a-deep-learning-model-of-stance-1
Repo
Framework
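The 0.755 figure above is a macro-average F-score: F1 is computed separately for each stance class and then averaged without weighting, so a small minority class counts as much as the majority class. A minimal sketch of that metric follows; it is the standard definition, not the authors' evaluation script, and the label names in the usage note are hypothetical.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1s = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

For gold labels `["sup", "sup", "neu", "uns"]` and predictions `["sup", "neu", "neu", "uns"]`, the per-class F1 scores are 2/3, 2/3 and 1, giving a macro F-score of 7/9 (about 0.778), whereas accuracy would be 0.75.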

ST-MVL: Filling Missing Values in Geo-Sensory Time Series Data


Title	ST-MVL: Filling Missing Values in Geo-Sensory Time Series Data
Authors	Xiuwen Yi, Yu Zheng, Junbo Zhang, Tianrui Li
Abstract	Many sensors have been deployed in the physical world, generating massive geo-tagged time series data. In reality, sensor readings are often lost at various unexpected moments because of sensor or communication errors. These missing readings not only affect real-time monitoring but also compromise the performance of further data analysis. In this paper, we propose a spatio-temporal multi-view-based learning (ST-MVL) method to collectively fill missing readings in a collection of geo-sensory time series data, considering 1) the temporal correlation between readings at different timestamps in the same series and 2) the spatial correlation between different time series. Our method combines empirical statistical models, consisting of Inverse Distance Weighting and Simple Exponential Smoothing, with data-driven algorithms, consisting of User-based and Item-based Collaborative Filtering. The former models handle general missing cases based on empirical assumptions derived from historical data over a long period, standing for two global views from the spatial and temporal perspectives respectively. The latter algorithms deal with special cases where empirical assumptions may not hold, based on recent contexts of the data, denoting two local views from the spatial and temporal perspectives respectively. The predictions of the four views are aggregated into a final value by a multi-view learning algorithm. We evaluate our method on Beijing air quality and meteorological data and find that our model compares favorably with ten baseline approaches.
Tasks	Imputation, Multivariate Time Series Imputation, Multi-View Learning, Time Series
Published	2016-07-09
URL	https://www.microsoft.com/en-us/research/publication/st-mvl-filling-missing-values-in-geo-sensory-time-series-data/
PDF	https://www.ijcai.org/Proceedings/16/Papers/384.pdf
PWC	https://paperswithcode.com/paper/st-mvl-filling-missing-values-in-geo-sensory
Repo
Framework
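The two "global views" named in the abstract are classic estimators, sketched below: Inverse Distance Weighting over readings from other sensors (spatial) and Simple Exponential Smoothing over the same sensor's earlier readings (temporal). The collaborative-filtering local views are omitted, and `fill_missing` simply averages the two estimates as a hypothetical stand-in for the learned multi-view aggregation; the `power` and `alpha` defaults are assumptions, not values from the paper.

```python
def idw(neighbors, power=2.0):
    """Inverse Distance Weighting over (distance, reading) pairs from other sensors."""
    num = den = 0.0
    for dist, value in neighbors:
        if dist == 0:
            return value  # co-located sensor: use its reading directly
        w = 1.0 / dist ** power
        num += w * value
        den += w
    return num / den


def ses(history, alpha=0.5):
    """Simple Exponential Smoothing: one-step-ahead estimate from past readings."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def fill_missing(neighbors, history, alpha=0.5, power=2.0):
    """Hypothetical aggregation: plain mean of the spatial and temporal global views."""
    return (idw(neighbors, power) + ses(history, alpha)) / 2
```

For a missing reading with neighbours at distances 1 and 2 reporting 10 and 20, IDW gives 12; if the sensor's own last readings were 10 and 20, SES with `alpha=0.5` gives 15, and the averaged estimate is 13.5.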

AMR Parsing with an Incremental Joint Model


Title	AMR Parsing with an Incremental Joint Model
Authors	Junsheng Zhou, Feiyu Xu, Hans Uszkoreit, Weiguang Qu, Ran Li, Yanhui Gu
Abstract
Tasks	Abstractive Text Summarization, AMR Parsing, Entity Linking, Machine Translation, Natural Language Inference, Question Answering
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1065/
PDF	https://www.aclweb.org/anthology/D16-1065
PWC	https://paperswithcode.com/paper/amr-parsing-with-an-incremental-joint-model
Repo
Framework

Using mention accessibility to improve coreference resolution


Title	Using mention accessibility to improve coreference resolution
Authors	Kellie Webster, Joel Nothman
Abstract
Tasks	Coreference Resolution
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2070/
PDF	https://www.aclweb.org/anthology/P16-2070
PWC	https://paperswithcode.com/paper/using-mention-accessibility-to-improve
Repo
Framework

Incorporating Selectional Preferences in Multi-hop Relation Extraction


Title	Incorporating Selectional Preferences in Multi-hop Relation Extraction
Authors	Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum
Abstract
Tasks	Knowledge Base Completion, Question Answering, Relation Extraction
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-1304/
PDF	https://www.aclweb.org/anthology/W16-1304
PWC	https://paperswithcode.com/paper/incorporating-selectional-preferences-in
Repo
Framework

Sentence Clustering using PageRank Topic Model


Title	Sentence Clustering using PageRank Topic Model
Authors	Kenshin Ikegami, Yukio Ohsawa
Abstract
Tasks	Decision Making, Language Modelling, Topic Models
Published	2016-10-01
URL	https://www.aclweb.org/anthology/Y16-3003/
PDF	https://www.aclweb.org/anthology/Y16-3003
PWC	https://paperswithcode.com/paper/sentence-clustering-using-pagerank-topic
Repo
Framework