May 5, 2019

Paper Group NANR 16

Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts


Title	Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
Authors	Kimi Kaneko, Saku Sugawara, Koji Mineshima, Daisuke Bekki
Abstract	This paper proposes a methodology for building a specialized Japanese data set for recognizing temporal relations and discourse relations. In addition to temporal and discourse relations, multi-layered situational relations that distinguish generic and specific states belonging to different layers in a discourse are annotated. Our methodology has been applied to 170 text fragments taken from Wikinews articles in Japanese. The validity of our methodology is evaluated and analyzed in terms of degree of annotator agreement and frequency of errors.
Tasks	Natural Language Inference, Question Answering, Text Summarization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5402/
PDF	https://www.aclweb.org/anthology/W16-5402
PWC	https://paperswithcode.com/paper/annotation-and-analysis-of-discourse
Repo
Framework

Twitter Geolocation Prediction Shared Task of the 2016 Workshop on Noisy User-generated Text


Title	Twitter Geolocation Prediction Shared Task of the 2016 Workshop on Noisy User-generated Text
Authors	Bo Han, Afshin Rahimi, Leon Derczynski, Timothy Baldwin
Abstract	This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016. We discuss details of task settings, data preparations and participant systems. The derived dataset and performance figures from each system provide baselines for future research in this realm.
Tasks	Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3928/
PDF	https://www.aclweb.org/anthology/W16-3928
PWC	https://paperswithcode.com/paper/twitter-geolocation-prediction-shared-task-of
Repo
Framework

Automatic Generation of Context-Based Fill-in-the-Blank Exercises Using Co-occurrence Likelihoods and Google n-grams


Title	Automatic Generation of Context-Based Fill-in-the-Blank Exercises Using Co-occurrence Likelihoods and Google n-grams
Authors	Jennifer Hill, Rahul Simha
Abstract
Tasks	Reading Comprehension
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0503/
PDF	https://www.aclweb.org/anthology/W16-0503
PWC	https://paperswithcode.com/paper/automatic-generation-of-context-based-fill-in
Repo
Framework

Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures


Title	Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures
Authors	Zhiguo Yu, Trevor Cohen, Byron Wallace, Elmer Bernstam, Todd Johnson
Abstract
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-6106/
PDF	https://www.aclweb.org/anthology/W16-6106
PWC	https://paperswithcode.com/paper/retrofitting-word-vectors-of-mesh-terms-to
Repo
Framework

Rule-based Automatic Multi-word Term Extraction and Lemmatization


Title	Rule-based Automatic Multi-word Term Extraction and Lemmatization
Authors	Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac
Abstract	In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from the mining domain containing more than 600,000 simple word forms. Extracted and lemmatized multi-word terms are filtered in order to reject falsely offered lemmas and then ranked by introducing measures that combine linguistic and statistical information (C-Value, T-Score, LLR, and Keyness). Mean average precision for retrieval of MWU forms ranges from 0.789 to 0.804, while mean average precision of lemma production ranges from 0.956 to 0.960. The evaluation showed that 94% of distinct multi-word forms were evaluated as proper multi-word units, and among them 97% were associated with correct lemmas.
Tasks	Lemmatization
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1081/
PDF	https://www.aclweb.org/anthology/L16-1081
PWC	https://paperswithcode.com/paper/rule-based-automatic-multi-word-term
Repo
Framework
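
The abstract above names C-Value as one of the ranking measures. As a rough illustration only, the sketch below implements the commonly cited C-value definition: a candidate's frequency is discounted by the mean frequency of the longer candidates that contain it, then scaled by the log of its length in words. The candidate terms and counts are hypothetical English examples, and nothing here is taken from the paper's own implementation, which may use a different variant.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs):
    """Score multi-word candidates with a simplified C-value.

    freqs maps a candidate term (space-separated words) to its corpus frequency.
    """
    scores = {}
    for term, f in freqs.items():
        words = tuple(term.split())
        # Frequencies of longer candidates in which this term is nested.
        nesting = [fb for other, fb in freqs.items()
                   if contains(tuple(other.split()), words)]
        adjusted = f - sum(nesting) / len(nesting) if nesting else f
        scores[term] = math.log2(len(words)) * adjusted
    return scores


# Hypothetical candidates and counts, for illustration only.
freqs = {"open pit mine": 6, "pit mine": 9, "mine shaft": 5}
scores = c_value(freqs)
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 3))
```

Note that single-word candidates score zero under this definition because of the log factor, which is why it is normally applied to multi-word candidates only.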

A Dependency Treebank of the Chinese Buddhist Canon


Title	A Dependency Treebank of the Chinese Buddhist Canon
Authors	Tak-sum Wong, John Lee
Abstract	We present a dependency treebank of the Chinese Buddhist Canon, which contains 1,514 texts with about 50 million Chinese characters. The treebank was created by an automatic parser trained on a smaller treebank, containing four manually annotated sutras (Lee and Kong, 2014). We report results on word segmentation, part-of-speech tagging and dependency parsing, and discuss challenges posed by the processing of medieval Chinese. In a case study, we exploit the treebank to examine verbs frequently associated with Buddha, and to analyze usage patterns of quotative verbs in direct speech. Our results suggest that certain quotative verbs imply status differences between the speaker and the listener.
Tasks	Dependency Parsing, Part-Of-Speech Tagging
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1265/
PDF	https://www.aclweb.org/anthology/L16-1265
PWC	https://paperswithcode.com/paper/a-dependency-treebank-of-the-chinese-buddhist
Repo
Framework

“How Bullying is this Message?”: A Psychometric Thermometer for Bullying


Title	“How Bullying is this Message?”: A Psychometric Thermometer for Bullying
Authors	Parma Nand, Rivindu Perera, Abhijeet Kasture
Abstract
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/papers/C16-1067/c16-1067
PDF	https://www.aclweb.org/anthology/C16-1067
PWC	https://paperswithcode.com/paper/how-bullying-is-this-message-a-psychometric
Repo
Framework

Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun


Title	Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun
Authors	Václava Kettnerová, Eduard Bejček
Abstract	In this paper, we focus on Czech complex predicates formed by a light verb and a predicative noun expressed as the direct object. Although Czech, as an inflectional language encoding syntactic relations via morphological cases, provides an excellent opportunity to study the distribution of valency complements in the syntactic structure with complex predicates, this distribution has not been described so far. On the basis of a manual analysis of the richly annotated data from the Prague Dependency Treebank, we thus formulate principles governing this distribution. In an automatic experiment, we verify these principles on well-formed syntactic structures from the Prague Dependency Treebank and the Prague Czech-English Dependency Treebank with very satisfactory results: the distribution of 97% of valency complements in the surface structure is governed by the proposed principles. These results corroborate that the surface structure formation of complex predicates is a regular process.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1082/
PDF	https://www.aclweb.org/anthology/L16-1082
PWC	https://paperswithcode.com/paper/distribution-of-valency-complements-in-czech
Repo
Framework

Clustering with Bregman Divergences: an Asymptotic Analysis


Title	Clustering with Bregman Divergences: an Asymptotic Analysis
Authors	Chaoyue Liu, Mikhail Belkin
Abstract	Clustering, in particular $k$-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of $k$-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters $k$ is large. We establish quantization rates and describe the limiting distribution of the centers as $k\to \infty$, extending well-known results for $k$-means clustering.
Tasks	Quantization
Published	2016-12-01
URL	http://papers.nips.cc/paper/6550-clustering-with-bregman-divergences-an-asymptotic-analysis
PDF	http://papers.nips.cc/paper/6550-clustering-with-bregman-divergences-an-asymptotic-analysis.pdf
PWC	https://paperswithcode.com/paper/clustering-with-bregman-divergences-an
Repo
Framework
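
For readers unfamiliar with the setting of the abstract above, here is a minimal sketch of hard clustering with a Bregman divergence: points are assigned to the center with the smallest divergence, and each center is updated to the arithmetic mean of its members, exactly as in $k$-means. The function names, the toy points, and the choice of two example divergences are assumptions made for illustration; the paper's asymptotic analysis is not reproduced here.

```python
import math


def squared_euclidean(x, y):
    """Bregman divergence generated by phi(x) = ||x||^2 (ordinary k-means)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def generalized_kl(x, y):
    """Bregman divergence generated by negative entropy; entries must be positive."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def nearest(point, centers, divergence):
    """Index of the center with the smallest divergence from the point."""
    return min(range(len(centers)), key=lambda i: divergence(point, centers[i]))


def bregman_cluster(points, centers, divergence, iterations=20):
    """Lloyd-style alternation: assign by divergence, update centers to means."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers, divergence)].append(p)
        for j, members in enumerate(clusters):
            if members:  # leave an empty cluster's center unchanged
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    labels = [nearest(p, centers, divergence) for p in points]
    return centers, labels


# Toy data, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
centers, labels = bregman_cluster(points, [points[0], points[2]], squared_euclidean)
kl_centers, kl_labels = bregman_cluster(points, [points[0], points[2]], generalized_kl)
print(labels, kl_labels)
```

The mean update is the same for both divergences because the arithmetic mean minimizes the total Bregman divergence from the members to the center, whichever generating function is used.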

Phrase-Level Combination of SMT and TM Using Constrained Word Lattice


Title	Phrase-Level Combination of SMT and TM Using Constrained Word Lattice
Authors	Liangyou Li, Andy Way, Qun Liu
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2045/
PDF	https://www.aclweb.org/anthology/P16-2045
PWC	https://paperswithcode.com/paper/phrase-level-combination-of-smt-and-tm-using
Repo
Framework

A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions


Title	A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
Authors	Chaya Liebeskind, Yaakov HaCohen-Kerner
Abstract	A verb-noun Multi-Word Expression (MWE) is a combination of a verb and a noun with or without other words, in which the combination has a meaning different from the meaning of the words considered separately. In this paper, we present a new lexical resource of Hebrew Verb-Noun MWEs (VN-MWEs). The VN-MWEs of this resource were manually collected and annotated from five different web resources. In addition, we analyze the lexical properties of Hebrew VN-MWEs by classifying them into three types: morphological, syntactic, and semantic. These two contributions are essential for designing algorithms for automatic VN-MWE extraction. The analysis suggests some interesting features of VN-MWEs for exploration. The lexical resource makes it possible to sample a set of positive examples for Hebrew VN-MWEs. This set of examples can be used either for training supervised algorithms or as seeds in unsupervised bootstrapping algorithms. Thus, this resource is a first step towards automatic identification of Hebrew VN-MWEs, which is important for natural language understanding, generation and translation systems.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1083/
PDF	https://www.aclweb.org/anthology/L16-1083
PWC	https://paperswithcode.com/paper/a-lexical-resource-of-hebrew-verb-noun-multi
Repo
Framework

K-SRL: Instance-based Learning for Semantic Role Labeling


Title	K-SRL: Instance-based Learning for Semantic Role Labeling
Authors	Alan Akbik, Yunyao Li
Abstract	Semantic role labeling (SRL) is the task of identifying and labeling predicate-argument structures in sentences with semantic frame and role labels. A known challenge in SRL is the large number of low-frequency exceptions in training data, which are highly context-specific and difficult to generalize. To overcome this challenge, we propose the use of instance-based learning that performs no explicit generalization, but rather extrapolates predictions from the most similar instances in the training data. We present a variant of k-nearest neighbors (kNN) classification with composite features to identify nearest neighbors for SRL. We show that high-quality predictions can be derived from a very small number of similar instances. In a comparative evaluation we experimentally demonstrate that our instance-based learning approach significantly outperforms current state-of-the-art systems on both in-domain and out-of-domain data, reaching F1-scores of 89.28% and 79.91% respectively.
Tasks	Machine Translation, Question Answering, Semantic Role Labeling
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1058/
PDF	https://www.aclweb.org/anthology/C16-1058
PWC	https://paperswithcode.com/paper/k-srl-instance-based-learning-for-semantic
Repo
Framework
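
The instance-based idea in the abstract above can be illustrated with a plain k-nearest-neighbors classifier over symbolic features. This is only a sketch under stated assumptions: the feature strings, the overlap-count similarity, and the role labels below are hypothetical, and the paper's composite features and its kNN variant are not reproduced.

```python
from collections import Counter


def knn_predict(train, query_features, k=3):
    """Predict a label by majority vote among the k most similar instances.

    train is a list of (feature_set, label) pairs; similarity is the number
    of features an instance shares with the query.
    """
    ranked = sorted(train,
                    key=lambda inst: len(inst[0] & query_features),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training instances: feature sets paired with role labels.
train = [
    (frozenset({"pred=give", "dep=nsubj", "voice=active"}), "A0"),
    (frozenset({"pred=give", "dep=dobj", "voice=active"}), "A1"),
    (frozenset({"pred=give", "dep=iobj", "voice=active"}), "A2"),
    (frozenset({"pred=hand", "dep=nsubj", "voice=active"}), "A0"),
]

query = frozenset({"pred=give", "dep=nsubj", "voice=active"})
prediction = knn_predict(train, query, k=1)
print(prediction)
```

No model is fitted here: the prediction is read directly off the stored instances, which is the sense in which instance-based learning performs no explicit generalization.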

SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking


Title	SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking
Authors	Marie-Jean Meurs, Hayda Almeida, Ludovic Jean-Louis, Eric Charton
Abstract	This paper presents SemLinker, an open source system that discovers named entities, connects them to a reference knowledge base, and clusters them semantically. SemLinker relies on several modules that perform surface form generation, mutual disambiguation, entity clustering, and make use of two annotation engines. SemLinker was evaluated in the English Entity Discovery and Linking track of the Text Analysis Conference on Knowledge Base Population, organized by the US National Institute of Standards and Technology. Along with the SemLinker source code, we release our annotation files containing the discovered named entities, their types, and position across processed documents.
Tasks	Knowledge Base Population
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1085/
PDF	https://www.aclweb.org/anthology/L16-1085
PWC	https://paperswithcode.com/paper/semlinker-a-modular-and-open-source-framework
Repo
Framework

OSU_CHGCG at SemEval-2016 Task 9: Chinese Semantic Dependency Parsing with Generalized Categorial Grammar


Title	OSU_CHGCG at SemEval-2016 Task 9: Chinese Semantic Dependency Parsing with Generalized Categorial Grammar
Authors	Manjuan Duan, Lifeng Jin, William Schuler
Abstract
Tasks	Dependency Parsing, Question Answering, Semantic Dependency Parsing
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1189/
PDF	https://www.aclweb.org/anthology/S16-1189
PWC	https://paperswithcode.com/paper/osu_chgcg-at-semeval-2016-task-9-chinese
Repo
Framework

Multilingual Aliasing for Auto-Generating Proposition Banks


Title	Multilingual Aliasing for Auto-Generating Proposition Banks
Authors	Alan Akbik, Xinyu Guan, Yunyao Li
Abstract	Semantic Role Labeling (SRL) is the task of identifying the predicate-argument structure in sentences with semantic frame and role labels. For the English language, the Proposition Bank provides both a lexicon of all possible semantic frames and large amounts of labeled training data. In order to expand SRL beyond English, previous work investigated automatic approaches based on parallel corpora to automatically generate Proposition Banks for new target languages (TLs). However, this approach heuristically produces the frame lexicon from word alignments, leading to a range of lexicon-level errors and inconsistencies. To address these issues, we propose to manually alias TL verbs to existing English frames. For instance, the German verb drehen may evoke several meanings, including “turn something” and “film something”. Accordingly, we alias the former to the frame TURN.01 and the latter to a group of frames that includes FILM.01 and SHOOT.03. We execute a large-scale manual aliasing effort for three target languages and apply the new lexicons to automatically generate large Proposition Banks for Chinese, French and German with manually curated frames. We present a detailed evaluation in which we find that our proposed approach significantly increases the quality and consistency of the generated Proposition Banks. We release these resources to the research community.
Tasks	Machine Translation, Question Answering, Semantic Role Labeling
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1327/
PDF	https://www.aclweb.org/anthology/C16-1327
PWC	https://paperswithcode.com/paper/multilingual-aliasing-for-auto-generating
Repo
Framework