Paper Group NANR 12
Agreement and Disagreement: Comparison of Points of View in the Political Domain
Title | Agreement and Disagreement: Comparison of Points of View in the Political Domain |
Authors | Stefano Menini, Sara Tonelli |
Abstract | The automated comparison of points of view between two politicians is a very challenging task, due not only to the lack of annotated resources, but also to the different dimensions involved in the definition of agreement and disagreement. In order to shed light on this complex task, we first carry out a pilot study to manually annotate the components involved in detecting agreement and disagreement. Then, based on these findings, we implement different features to capture them automatically via supervised classification. We do not focus on debates in dialogical form; rather, we consider sets of documents in which politicians may express their position with respect to different topics in an implicit or explicit way, as during an electoral campaign. We create and make available three different datasets. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1232/ |
https://www.aclweb.org/anthology/C16-1232 | |
PWC | https://paperswithcode.com/paper/agreement-and-disagreement-comparison-of |
Repo | |
Framework | |
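
The entry above describes capturing agreement and disagreement between sets of documents via supervised classification. A minimal sketch of such a pipeline is given below, assuming a hypothetical CSV of topic-aligned statement pairs labelled agree/disagree; the file name, columns, and feature choice are illustrative assumptions, not the authors' actual setup.

```python
# Sketch of a supervised (dis)agreement classifier over topic-aligned text pairs.
# File name, columns, and features are illustrative placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical dataset: two politicians' statements on the same topic,
# labelled "agree" or "disagree".
data = pd.read_csv("agreement_pairs.csv")        # columns: text_a, text_b, label
pairs = data.text_a + " [SEP] " + data.text_b    # crude pairing of the two texts

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
scores = cross_val_score(clf, pairs, data.label, cv=5, scoring="f1_macro")
print("macro-F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```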
CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings
Title | CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings |
Authors | Mohammed Attia, Suraj Maharjan, Younes Samih, Laura Kallmeyer, Thamar Solorio |
Abstract | This paper describes our system submission to the CogALex-2016 Shared Task on Corpus-Based Identification of Semantic Relations. Our system won first place for Task-1 and second place for Task-2. The evaluation results of our system on the test set are 88.1% (79.0% for TRUE only) F-measure for Task-1 on detecting semantic similarity, and 76.0% (42.3% when excluding RANDOM) for Task-2 on identifying finer-grained semantic relations. In our experiments, we try word analogy, linear regression, and multi-task Convolutional Neural Networks (CNNs) with word embeddings from publicly available word vectors. We found that linear regression performs better in the binary classification (Task-1), while CNNs have better performance in the multi-class semantic classification (Task-2). We assume that word analogy is better suited for deterministic answers than for handling the ambiguity of one-to-many and many-to-many relationships. We also show that classifier performance could benefit from balancing the distribution of labels in the training data. |
Tasks | Information Retrieval, Knowledge Graphs, Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5311/ |
https://www.aclweb.org/anthology/W16-5311 | |
PWC | https://paperswithcode.com/paper/cogalex-v-shared-task-ghhh-detecting-semantic |
Repo | |
Framework | |
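
The GHHH system combines word embeddings with linear models and CNNs. Below is a minimal sketch of a linear classifier over embedding-pair features for the binary Task-1 setting; the embedding file, the toy training pairs, and the use of logistic regression (in place of the paper's linear regression) are assumptions for illustration only.

```python
# Sketch of a linear classifier over word-embedding pair features for binary
# semantic-relation detection. Embedding file and training pairs are placeholders.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

# Pre-trained embeddings in word2vec binary format (placeholder file name).
vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)
DIM = vectors.vector_size

def pair_features(w1, w2):
    """Concatenate the two word vectors and their difference."""
    v1 = vectors[w1] if w1 in vectors else np.zeros(DIM)
    v2 = vectors[w2] if w2 in vectors else np.zeros(DIM)
    return np.concatenate([v1, v2, v1 - v2])

# Toy training data: 1 = semantically related, 0 = random pair.
train_pairs = [("dog", "animal"), ("car", "banana")]
train_labels = [1, 0]

X = np.vstack([pair_features(w1, w2) for w1, w2 in train_pairs])
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
print(clf.predict(np.vstack([pair_features("cat", "pet")])))
```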
A Study of the Bump Alternation in Japanese from the Perspective of Extended/Onset Causation
Title | A Study of the Bump Alternation in Japanese from the Perspective of Extended/Onset Causation |
Authors | Natsuno Aoki, Kentaro Nakatani |
Abstract | This paper deals with a seldom-studied object/oblique alternation phenomenon in Japanese, which we call the bump alternation. This phenomenon, first discussed by Sadanobu (1990), is similar to the English with/against alternation: compare hit the wall with the bat [=immobile-as-direct-object frame] to hit the bat against the wall [=mobile-as-direct-object frame]. However, in the Japanese version, the case frame remains constant. Although we fundamentally question Sadanobu's acceptability judgment, we also claim that the causation type (i.e., whether the event is an instance of onset or extended causation; Talmy, 1988; 2000) could make a difference: an extended causative interpretation could improve the acceptability of the otherwise awkward immobile-as-direct-object frame. We examined this claim through a rating study, and the results showed an interaction between the causation type (extended/onset) and the object type (mobile/immobile) in the direction we predicted. We propose that a perspective shift on what is moving causes the "extended causation" advantage. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5317/ |
https://www.aclweb.org/anthology/W16-5317 | |
PWC | https://paperswithcode.com/paper/a-study-of-the-bump-alternation-in-japanese |
Repo | |
Framework | |
The Trouble with Machine Translation Coherence
Title | The Trouble with Machine Translation Coherence |
Authors | Karin Sim Smith, Wilker Aziz, Lucia Specia |
Abstract | |
Tasks | Machine Translation |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/W16-3407/ |
https://www.aclweb.org/anthology/W16-3407 | |
PWC | https://paperswithcode.com/paper/the-trouble-with-machine-translation |
Repo | |
Framework | |
GhoSt-PV: A Representative Gold Standard of German Particle Verbs
Title | GhoSt-PV: A Representative Gold Standard of German Particle Verbs |
Authors | Stefan Bott, Nana Khvtisavrishvili, Max Kisselew, Sabine Schulte im Walde |
Abstract | German particle verbs represent a frequent type of multi-word expression that forms a highly productive paradigm in the lexicon. Similarly to other multi-word expressions, particle verbs exhibit various levels of compositionality. One of the major obstacles for the study of compositionality is the lack of representative gold standards of human ratings. In order to address this bottleneck, this paper presents such a gold standard data set containing 400 randomly selected German particle verbs. It is balanced across several particle types and three frequency bands, and accompanied by human ratings on the degree of semantic compositionality. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5318/ |
https://www.aclweb.org/anthology/W16-5318 | |
PWC | https://paperswithcode.com/paper/ghost-pv-a-representative-gold-standard-of |
Repo | |
Framework | |
Improving Information Extraction from Wikipedia Texts using Basic English
Title | Improving Information Extraction from Wikipedia Texts using Basic English |
Authors | Teresa Rodríguez-Ferreira, Adrián Rabadán, Raquel Hervás, Alberto Díaz |
Abstract | The aim of this paper is to study the effect that the use of Basic English versus common English has on information extraction from online resources. The amount of online information available to the public grows exponentially, and is potentially an excellent resource for information extraction. The problem is that this information often comes in an unstructured format, such as plain text. In order to retrieve knowledge from this type of text, it must first be analysed to find the relevant details, and the nature of the language used can greatly impact the quality of the extracted information. In this paper, we compare triplets that represent definitions or properties of concepts obtained from three online collaborative resources (English Wikipedia, Simple English Wikipedia and Simple English Wiktionary) and study the differences in the results when Basic English is used instead of common English. The results show that resources written in Basic English produce fewer triplets, but of higher quality. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1062/ |
https://www.aclweb.org/anthology/L16-1062 | |
PWC | https://paperswithcode.com/paper/improving-information-extraction-from |
Repo | |
Framework | |
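
The comparison above relies on extracting triplets that represent definitions or properties of concepts from text. As a generic illustration (not the extraction method used in the paper), here is a minimal subject-verb-object triplet extractor built on spaCy's dependency parse:

```python
# Rough subject-verb-object triplet extraction with spaCy; a generic sketch,
# not the paper's extraction pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")   # small English model, assumed installed

def extract_triplets(text):
    """Collect (subject, verb lemma, object) triples from each sentence."""
    triplets = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ in ("VERB", "AUX"):
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
                if subjects and objects:
                    triplets.append((subjects[0].text, token.lemma_, objects[0].text))
    return triplets

print(extract_triplets("The cat chased the mouse."))   # [('cat', 'chase', 'mouse')]
```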
Lexfom: a lexical functions ontology model
Title | Lexfom: a lexical functions ontology model |
Authors | Alexsandro Fonseca, Fatiha Sadat, François Lareau |
Abstract | A lexical function represents a type of relation that exists between lexical units (words or expressions) in any language. For example, antonymy is a type of relation represented by the lexical function Anti: Anti(big) = small. Those relations include both paradigmatic relations, i.e. vertical relations such as synonymy, antonymy and meronymy, and syntagmatic relations, i.e. horizontal relations such as objective qualification (legitimate demand), subjective qualification (fruitful analysis), positive evaluation (good review) and support verbs (pay a visit, subject to an interrogation). In this paper, we present the Lexical Functions Ontology Model (lexfom) to represent lexical functions and the relations among lexical units. Lexfom is divided into four modules: lexical function representation (lfrep), lexical function family (lffam), lexical function semantic perspective (lfsem) and lexical function relations (lfrel). Moreover, we show how it combines with the Lexical Model for Ontologies (lemon) for the transformation of lexical networks into semantic web formats. So far, we have implemented 100 simple and 500 complex lexical functions, and encoded about 8,000 syntagmatic and 46,000 paradigmatic relations, for the French language. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5320/ |
https://www.aclweb.org/anthology/W16-5320 | |
PWC | https://paperswithcode.com/paper/lexfom-a-lexical-functions-ontology-model |
Repo | |
Framework | |
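
To illustrate the general idea of publishing lexical-function relations as linked data, the sketch below encodes Anti(big) = small as RDF triples with rdflib; the namespaces and property names are placeholders, not the actual lexfom or lemon vocabulary.

```python
# Sketch of encoding the lexical function Anti(big) = small as RDF triples.
# Namespaces and property names are illustrative placeholders.
from rdflib import Graph, Namespace

LEXFOM = Namespace("http://example.org/lexfom#")   # hypothetical namespace
LEX = Namespace("http://example.org/lexicon/en#")  # hypothetical lexicon namespace

g = Graph()
g.bind("lexfom", LEXFOM)
g.bind("lex", LEX)
g.add((LEX.big, LEXFOM.hasLexicalFunction, LEXFOM.Anti))   # 'big' participates in Anti
g.add((LEX.big, LEXFOM.Anti, LEX.small))                   # Anti(big) = small
print(g.serialize(format="turtle"))
```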
CASSAurus: A Resource of Simpler Spanish Synonyms
Title | CASSAurus: A Resource of Simpler Spanish Synonyms |
Authors | Ricardo Baeza-Yates, Luz Rello, Julia Dembowski |
Abstract | In this work we introduce and describe a language resource composed of lists of simpler synonyms for Spanish. The synonyms are divided into different senses taken from the Spanish OpenThesaurus, where context disambiguation was performed by using statistical information from the Web and Google Books Ngrams. This resource is freely available online and can be used for different NLP tasks such as lexical simplification. Indeed, it has already been integrated into four tools. |
Tasks | Lexical Simplification |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1151/ |
https://www.aclweb.org/anthology/L16-1151 | |
PWC | https://paperswithcode.com/paper/cassaurus-a-resource-of-simpler-spanish |
Repo | |
Framework | |
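
The core idea, selecting synonyms that are simpler than the target word within a given sense, can be sketched with a frequency-based notion of simplicity. The frequency counts below are hypothetical; the actual resource uses Web and Google Books Ngram statistics together with sense disambiguation from OpenThesaurus.

```python
# Sketch: within one sense, treat more frequent synonyms as simpler.
# Frequency counts and sense inventory are hypothetical placeholders.
freq = {"empezar": 1_200_000, "comenzar": 900_000, "iniciar": 400_000}

def simpler_synonyms(word, sense_synonyms):
    """Return synonyms of `word` that are more frequent (hence simpler), most frequent first."""
    base = freq.get(word, 0)
    candidates = [s for s in sense_synonyms if freq.get(s, 0) > base]
    return sorted(candidates, key=lambda s: freq[s], reverse=True)

print(simpler_synonyms("iniciar", ["empezar", "comenzar"]))   # ['empezar', 'comenzar']
```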
Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows
Title | Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows |
Authors | Peter Bourgonje, Julian Moreno Schneider, Georg Rehm, Felix Sasaki |
Abstract | |
Tasks | Efficient Exploration, Entity Linking, Machine Translation, Named Entity Recognition, Text Generation |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3503/ |
https://www.aclweb.org/anthology/W16-3503 | |
PWC | https://paperswithcode.com/paper/processing-document-collections-to |
Repo | |
Framework | |
Multiword Expressions at the Grammar-Lexicon Interface
Title | Multiword Expressions at the Grammar-Lexicon Interface |
Authors | Timothy Baldwin |
Abstract | In this talk, I will outline a range of challenges presented by multiword expressions in terms of (lexicalist) precision grammar engineering, and different strategies for accommodating those challenges, in an attempt to strike the right balance in terms of generalisation and over- and under-generation. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3802/ |
https://www.aclweb.org/anthology/W16-3802 | |
PWC | https://paperswithcode.com/paper/multiword-expressions-at-the-grammar-lexicon |
Repo | |
Framework | |
Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks
Title | Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks |
Authors | Sebastian Schuster, Christopher D. Manning |
Abstract | Many shallow natural language understanding tasks use dependency trees to extract relations between content words. However, strict surface-structure dependency trees tend to follow the linguistic structure of sentences too closely and frequently fail to provide direct relations between content words. To mitigate this problem, the original Stanford Dependencies representation also defines two dependency graph representations which contain additional and augmented relations that explicitly capture otherwise implicit relations between content words. In this paper, we revisit and extend these dependency graph representations in light of the recent Universal Dependencies (UD) initiative and provide a detailed account of an enhanced and an enhanced++ English UD representation. We further present a converter from constituency to basic, i.e., strict surface structure, UD trees, and a converter from basic UD trees to enhanced and enhanced++ English UD graphs. We release both converters as part of Stanford CoreNLP and the Stanford Parser. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1376/ |
https://www.aclweb.org/anthology/L16-1376 | |
PWC | https://paperswithcode.com/paper/enhanced-english-universal-dependencies-an |
Repo | |
Framework | |
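
One example of what the enhanced representation adds is the propagation of a subject to conjoined predicates. The toy sketch below shows that augmentation over a hand-written list of basic UD edges; it is a simplified illustration of the idea, not the CoreNLP converter released by the authors.

```python
# Toy illustration of one "enhanced" UD augmentation: propagating a subject
# to conjoined predicates.
# Sentence: "The store buys and sells cameras."
# Basic UD edges as (head, dependent, relation), tokens numbered from 1.
basic = [(3, 2, "nsubj"), (2, 1, "det"), (3, 6, "obj"),
         (3, 5, "conj:and"), (5, 4, "cc")]

def propagate_conj_subjects(edges):
    """Copy the governor's subject onto each conjoined predicate."""
    enhanced = list(edges)
    subjects = {head: dep for head, dep, rel in edges if rel.startswith("nsubj")}
    for head, dep, rel in edges:
        if rel.startswith("conj") and head in subjects:
            enhanced.append((dep, subjects[head], "nsubj"))
    return enhanced

# Adds (5, 2, 'nsubj'): "store" is also the subject of "sells".
print(propagate_conj_subjects(basic))
```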
A Unified Architecture for Semantic Role Labeling and Relation Classification
Title | A Unified Architecture for Semantic Role Labeling and Relation Classification |
Authors | Jiang Guo, Wanxiang Che, Haifeng Wang, Ting Liu, Jun Xu |
Abstract | This paper describes a unified neural architecture for identifying and classifying multi-typed semantic relations between words in a sentence. We investigate two typical and well-studied tasks: semantic role labeling (SRL) which identifies the relations between predicates and arguments, and relation classification (RC) which focuses on the relation between two entities or nominals. While mostly studied separately in prior work, we show that the two tasks can be effectively connected and modeled using a general architecture. Experiments on CoNLL-2009 benchmark datasets show that our SRL models significantly outperform state-of-the-art approaches. Our RC models also yield competitive performance with the best published records. Furthermore, we show that the two tasks can be trained jointly with multi-task learning, resulting in additive significant improvements for SRL. |
Tasks | Feature Engineering, Information Retrieval, Multi-Task Learning, Named Entity Recognition, Part-Of-Speech Tagging, Relation Classification, Semantic Role Labeling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1120/ |
https://www.aclweb.org/anthology/C16-1120 | |
PWC | https://paperswithcode.com/paper/a-unified-architecture-for-semantic-role |
Repo | |
Framework | |
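
The paper's central point is that SRL and RC can share one encoder with task-specific outputs. A minimal multi-task sketch in PyTorch follows; the BiLSTM encoder, dimensions, and heads are illustrative assumptions (the paper itself also investigates CNN-based components), not the authors' exact architecture.

```python
# Minimal shared-encoder multi-task sketch: one BiLSTM encoder, a token-level
# head for SRL tags and a sentence-level head for relation classification.
# All dimensions are illustrative, not the authors' configuration.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=200, n_srl_tags=20, n_rel=19):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.srl_head = nn.Linear(2 * hidden, n_srl_tags)   # per-token SRL tags
        self.rel_head = nn.Linear(2 * hidden, n_rel)         # per-sentence relation label

    def forward(self, token_ids, task):
        states, _ = self.encoder(self.embed(token_ids))
        if task == "srl":
            return self.srl_head(states)           # (batch, seq_len, n_srl_tags)
        return self.rel_head(states.mean(dim=1))   # (batch, n_rel)

model = SharedEncoderMTL(vocab_size=10000)
x = torch.randint(0, 10000, (2, 12))               # toy batch of token ids
print(model(x, task="srl").shape, model(x, task="rc").shape)
```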
An Overview of BPPT’s Indonesian Language Resources
Title | An Overview of BPPT’s Indonesian Language Resources |
Authors | Gunarso Gunarso, Hammam Riza |
Abstract | This paper describes various Indonesian language resources that the Agency for the Assessment and Application of Technology (BPPT) has developed and collected since the mid-80s, when we joined MMTS (Multilingual Machine Translation System), an international project coordinated by CICC-Japan to develop a machine translation system for five Asian languages (Bahasa Indonesia, Malay, Thai, Japanese, and Chinese). Since then, we have been actively doing many types of research in the fields of statistical machine translation, speech recognition, and speech synthesis, which require many text and speech corpora. The most recent cooperation, within ASEAN-IVO, is the development of the Indonesian ALT (Asian Language Treebank), which has added new NLP tools. |
Tasks | Machine Translation, Speech Recognition, Speech Synthesis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5409/ |
https://www.aclweb.org/anthology/W16-5409 | |
PWC | https://paperswithcode.com/paper/an-overview-of-bppts-indonesian-language |
Repo | |
Framework | |
BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’
Title | BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’ |
Authors | Masayuki Asahara, Yuji Matsumoto |
Abstract | Paratactic syntactic structures are difficult to represent in syntactic dependency tree structures. As such, we propose an annotation schema for syntactic dependency annotation of Japanese, in which coordinate structures are split from and overlaid on bunsetsu-based (base phrase unit) dependency. The schema represents nested coordinate structures, non-constituent conjuncts, and forward sharing as sets of regions. The annotation was performed on the core data of the ‘Balanced Corpus of Contemporary Written Japanese’, which comprises about one million words and 1,980 samples from six registers, such as newspapers, books, magazines, and web texts. |
Tasks | Dependency Parsing |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5406/ |
https://www.aclweb.org/anthology/W16-5406 | |
PWC | https://paperswithcode.com/paper/bccwj-deppara-a-syntactic-annotation-treebank |
Repo | |
Framework | |
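
A possible way to represent the schema described above, bunsetsu-based dependencies with coordinate structures overlaid as regions, is sketched below; the field names and the toy Japanese example are illustrative, not the corpus's actual file format.

```python
# Sketch of a data structure for bunsetsu-based dependencies with coordinate
# structures overlaid as regions. Field names are illustrative placeholders.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Sentence:
    bunsetsu: List[str]                          # base phrase units
    deps: List[Tuple[int, int]]                  # (dependent, head) bunsetsu indices
    coord_regions: List[List[Tuple[int, int]]] = field(default_factory=list)
    # each coordinate structure is a list of (start, end) conjunct spans

sent = Sentence(
    bunsetsu=["taroo-ga", "hon-to", "zasshi-o", "katta"],   # "Taro bought a book and a magazine"
    deps=[(0, 3), (1, 2), (2, 3)],
    coord_regions=[[(1, 1), (2, 2)]],            # "hon-to" and "zasshi-o" are conjuncts
)
print(sent)
```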
Construction and Analysis of a Large Vietnamese Text Corpus
Title | Construction and Analysis of a Large Vietnamese Text Corpus |
Authors | Dieu-Thu Le, Uwe Quasthoff |
Abstract | This paper presents a new Vietnamese text corpus which contains around 4.05 billion words. It is a collection of Wikipedia texts, newspaper articles and random web texts. The paper describes the process of collecting, cleaning and creating the corpus. Processing Vietnamese texts posed several challenges: for example, unlike many Latin-script languages, Vietnamese does not use blanks to separate words, so common tokenizers that treat blanks as word boundaries do not work. A short review of different approaches to Vietnamese tokenization is presented, together with how the corpus has been processed and created. After that, some statistical analysis of the data is reported, including the number of syllables, average word length, sentence length and topic analysis. The corpus is integrated into a framework which allows searching and browsing. Using this web interface, users can find out how many times a particular word appears in the corpus, see sample sentences in which it occurs, and inspect its left and right neighbors. |
Tasks | Tokenization |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1065/ |
https://www.aclweb.org/anthology/L16-1065 | |
PWC | https://paperswithcode.com/paper/construction-and-analysis-of-a-large |
Repo | |
Framework | |
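
One common baseline for the tokenization problem discussed above is greedy longest matching of syllable sequences against a word list; the tiny lexicon and example below are illustrative, and this is not necessarily the approach used to build this corpus.

```python
# Greedy longest-matching word segmentation over Vietnamese syllables.
# The lexicon is a tiny illustrative placeholder.
lexicon = {"sinh viên", "đại học", "học"}

def segment(sentence, max_len=4):
    """Join space-separated syllables into words by longest dictionary match."""
    syllables = sentence.split()          # Vietnamese writes spaces between syllables
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate.replace(" ", "_"))
                i += n
                break
    return words

print(segment("sinh viên đại học"))       # ['sinh_viên', 'đại_học']
```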