Paper Group NANR 52
Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity. A Lexical Resource for the Identification of "Weak Words" in German Specification Documents. Improving corpus search via parsing. Affective Lexicon Creation for the Greek Language. A Hungarian Sentiment Corpus Manually Annotated at Aspect Level. SVALex: …
Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity
Title | Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity |
Authors | Valentina Apresjan |
Abstract | The paper is a corpus study of the factors involved in resolving potential scope ambiguity in sentences with negation and a universal quantifier, such as "I don't want to talk to all these people", which can mean either 'I don't want to talk to any of these people' or 'I don't want to talk to some of these people'. The relevant factors are demonstrated to be largely different from those involved in disambiguating lexical polysemy. They include the syntactic function of the constituent containing the quantifier "all" (subject, direct complement, adjunct), as well as the depth of its embedding; the status of the main predicate and the "all" constituent with respect to the information structure of the utterance (topic vs. focus, given vs. new information); and pragmatic implicatures pertaining to the situations described in the utterances. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3801/ |
PWC | https://paperswithcode.com/paper/information-structure-syntax-and-pragmatics |
Repo | |
Framework | |
A Lexical Resource for the Identification of "Weak Words" in German Specification Documents
Title | A Lexical Resource for the Identification of "Weak Words" in German Specification Documents |
Authors | Jennifer Krisch, Melanie Dick, Ronny Jauch, Ulrich Heid |
Abstract | We report on the creation of a lexical resource for the identification of potentially unspecific or imprecise constructions in German requirements documentation from the car manufacturing industry. In requirements engineering, such expressions are called "weak words": they are not sufficiently precise to ensure an unambiguous interpretation by the contractual partners, who typically rely on specification documents to define their cooperation (Melchisedech, 2000). Examples are dimension adjectives such as kurz or lang ('short', 'long'), which need to be modified by adverbials indicating the exact duration, size, etc. Contrary to standard practice in requirements engineering, where the identification of such weak words is based merely on stopword lists, we identify weak uses in context by querying annotated text. The queries are part of the resource, as they define the conditions under which a word use is weak. We evaluate the recognition of weak uses on our development corpus and on an unseen evaluation corpus, reaching stable F1-scores above 0.95. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1454/ |
PWC | https://paperswithcode.com/paper/a-lexical-resource-for-the-identification-of |
Repo | |
Framework | |
Improving corpus search via parsing
Title | Improving corpus search via parsing |
Authors | Natalia Klyueva, Pavel Straňák |
Abstract | In this paper, we describe an addition to the corpus query system Kontext that enhances search with syntactic attributes in addition to the existing features, mainly lemmas and morphological categories. We present the enhancements of the corpus query system itself, the attributes we use to represent syntactic structures in the data, and some examples of querying syntactically annotated corpora, such as treebanks in various languages as well as a large automatically parsed corpus. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1457/ |
PWC | https://paperswithcode.com/paper/improving-corpus-search-via-parsing |
Repo | |
Framework | |
Affective Lexicon Creation for the Greek Language
Title | Affective Lexicon Creation for the Greek Language |
Authors | Elisavet Palogiannidi, Polychronis Koutsakis, Elias Iosif, Alexandros Potamianos |
Abstract | Starting from the English affective lexicon ANEW (Bradley and Lang, 1999a) we have created the first Greek affective lexicon. It contains human ratings for the three continuous affective dimensions of valence, arousal and dominance for 1034 words. The Greek affective lexicon is compared with affective lexica in English, Spanish and Portuguese. The lexicon is automatically expanded by selecting a small number of manually annotated words to bootstrap the process of estimating affective ratings of unknown words. We experimented with the parameters of the semantic-affective model in order to investigate their impact on its performance, which reaches 85% binary classification accuracy (positive vs. negative ratings). We share the Greek affective lexicon that consists of 1034 words and the automatically expanded Greek affective lexicon that contains 407K words. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1458/ |
PWC | https://paperswithcode.com/paper/affective-lexicon-creation-for-the-greek |
Repo | |
Framework | |
A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
Title | A Hungarian Sentiment Corpus Manually Annotated at Aspect Level |
Authors | Martina Katalin Szabó, Veronika Vincze, Katalin Ilona Simkó, Viktor Varga, Viktor Hangya |
Abstract | In this paper we present a Hungarian sentiment corpus manually annotated at aspect level. Our corpus consists of Hungarian opinion texts written about different types of products. The main aim of creating the corpus was to produce an appropriate database for developing text mining software tools. The corpus is a unique Hungarian database: to the best of our knowledge, no digitized Hungarian sentiment corpus annotated at the level of fragments and targets has been made so far. In addition, many language elements of the corpus that are relevant to sentiment analysis received distinct types of tags in the annotation. We first present the annotation method and discuss the difficulties of the text annotation process. We then provide some quantitative and qualitative data on the corpus. We conclude with a description of the applicability of the corpus. |
Tasks | Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1459/ |
PWC | https://paperswithcode.com/paper/a-hungarian-sentiment-corpus-manually |
Repo | |
Framework | |
SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
Title | SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners |
Authors | Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack |
Abstract | The paper introduces SVALex, a lexical resource primarily aimed at learners and teachers of Swedish as a foreign and second language that describes the distribution of 15,681 words and expressions across the Common European Framework of Reference (CEFR). The resource is based on a corpus of coursebook texts, and thus describes receptive vocabulary learners are exposed to during reading activities, as opposed to productive vocabulary they use when speaking or writing. The paper describes the methodology applied to create the list and to estimate the frequency distribution. It also discusses some characteristics of the resulting resource and compares it to other lexical resources for Swedish. An interesting feature of this resource is the possibility to separate the wheat from the chaff, identifying the core vocabulary at each level, i.e. vocabulary shared by several coursebook writers at each level, from peripheral vocabulary which is used by the minority of the coursebook writers. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1032/ |
PWC | https://paperswithcode.com/paper/svalex-a-cefr-graded-lexical-resource-for |
Repo | |
Framework | |
Effect Functors for Opinion Inference
Title | Effect Functors for Opinion Inference |
Authors | Josef Ruppenhofer, Jasper Brandes |
Abstract | Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation. |
Tasks | Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1460/ |
PWC | https://paperswithcode.com/paper/effect-functors-for-opinion-inference |
Repo | |
Framework | |
The Hebrew FrameNet Project
Title | The Hebrew FrameNet Project |
Authors | Avi Hayoun, Michael Elhadad |
Abstract | We present the Hebrew FrameNet project, describe the development and annotation processes and enumerate the challenges we faced along the way. We have developed semi-automatic tools to help speed the annotation and data collection process. The resource currently covers 167 frames, 3,000 lexical units and about 500 fully annotated sentences. We have started training and testing automatic SRL tools on the seed data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1688/ |
PWC | https://paperswithcode.com/paper/the-hebrew-framenet-project |
Repo | |
Framework | |
OPFI: A Tool for Opinion Finding in Polish
Title | OPFI: A Tool for Opinion Finding in Polish |
Authors | Aleksander Wawer |
Abstract | The paper describes OPFI: Opinion Finder for the Polish Language, a freely available tool for opinion target extraction. The goal of the tool is opinion finding: the task of identifying tuples composed of a sentiment (positive or negative) and its target (what or whom the sentiment is about). OPFI does not depend on any particular method of sentiment identification and provides a built-in sentiment dictionary as a convenient option. Technically, it implements three different modes of opinion tuple generation: a hybrid one based on dependency parsing and CRF, a second based on shallow parsing, and a third based on deep learning, namely a GRU neural network. The paper also describes related language resources: two annotated treebanks and one set of tweets. |
Tasks | Dependency Parsing |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1464/ |
PWC | https://paperswithcode.com/paper/opfi-a-tool-for-opinion-finding-in-polish |
Repo | |
Framework | |
A Portable Method for Parallel and Comparable Document Alignment
Title | A Portable Method for Parallel and Comparable Document Alignment |
Authors | Thierry Etchegoyhen, Andoni Azpeitia |
Abstract | |
Tasks | Machine Translation |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/W16-3412/ |
PWC | https://paperswithcode.com/paper/a-portable-method-for-parallel-and-comparable |
Repo | |
Framework | |
Uzbek-English and Turkish-English Morpheme Alignment Corpora
Title | Uzbek-English and Turkish-English Morpheme Alignment Corpora |
Authors | Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel |
Abstract | Morphologically-rich languages pose problems for machine translation (MT) systems, including word-alignment errors, data sparsity and multiple affixes. Current word-level alignment models do not distinguish words from morphemes, thus yielding low-quality alignment and subsequently affecting end translation quality. Models using morpheme-level alignment can reduce the vocabulary size of morphologically-rich languages and overcome data sparsity. Alignment data based on the smallest units reveals subtle language features and enhances translation quality. Recent research proves such morpheme-level alignment (MA) data to be a valuable linguistic resource for SMT, particularly for languages with rich morphology. In support of this research trend, the Linguistic Data Consortium (LDC) created Uzbek-English and Turkish-English alignment data that are manually aligned at the morpheme level. This paper describes the creation of the MA corpora, including the alignment and tagging processes and approaches, highlighting annotation challenges and specific features of languages with rich morphology. The light tagging annotation on the alignment layer adds extra value to the MA data, letting users flexibly tailor the data for training various MT models. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1467/ |
PWC | https://paperswithcode.com/paper/uzbek-english-and-turkish-english-morpheme |
Repo | |
Framework | |
An extension of ISO-Space for annotating object direction
Title | An extension of ISO-Space for annotating object direction |
Authors | Daiki Gotou, Hitoshi Nishikawa, Takenobu Tokunaga |
Abstract | In this paper, we extend the existing annotation scheme ISO-Space to annotate the spatial information needed for the task of placing a specified object at a specified location with a specified direction according to a natural language instruction. We call this task the spatial placement problem. Our extension focuses particularly on describing the object direction when the object is placed on a 2D plane. We conducted an annotation experiment in which a corpus of 20 situated dialogues was annotated. The annotation result showed that the number of tags newly introduced by our proposal is not negligible. We also implemented an analyser that automatically assigns the proposed tags to the corpus and evaluated its performance. The result showed that performance for entity tags was quite high, ranging from 0.68 to 0.99 in F-measure, whereas performance for relation tags was low, at less than 0.4 in F-measure. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5401/ |
PWC | https://paperswithcode.com/paper/an-extension-of-iso-space-for-annotating |
Repo | |
Framework | |
MEANTIME, the NewsReader Multilingual Event and Time Corpus
Title | MEANTIME, the NewsReader Multilingual Event and Time Corpus |
Authors | Anne-Lyse Minard, Manuela Speranza, Ruben Urizar, Begoña Altuna, Marieke van Erp, Anneleen Schoen, Chantal van Son |
Abstract | In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles, i.e. 120 English news articles and their translations in Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and entity and event intra-document coreference. The corpus-level annotation includes entity and event cross-document coreference. Semantic annotation on the English section was performed manually; for the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements. This not only sped up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The "First CLIN Dutch Shared Task" at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized. |
Tasks | Semantic Role Labeling |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1699/ |
PWC | https://paperswithcode.com/paper/meantime-the-newsreader-multilingual-event |
Repo | |
Framework | |
Adversarial Multiclass Classification: A Risk Minimization Perspective
Title | Adversarial Multiclass Classification: A Risk Minimization Perspective |
Authors | Rizal Fathony, Anqi Liu, Kaiser Asif, Brian Ziebart |
Abstract | Recently proposed adversarial classification methods have shown promising results for cost-sensitive and multivariate losses. In contrast with empirical risk minimization (ERM) methods, which use convex surrogate losses to approximate the desired non-convex target loss function, adversarial methods minimize non-convex losses by treating the properties of the training data as being uncertain and worst case within a minimax game. Despite this difference in formulation, we recast adversarial classification under zero-one loss as an ERM method with a novel prescribed loss function. We demonstrate a number of theoretical and practical advantages over the very closely related hinge loss ERM methods. This establishes adversarial classification under the zero-one loss as a method that fills the long-standing gap in multiclass hinge loss classification, simultaneously guaranteeing Fisher consistency and universal consistency, while also providing dual parameter sparsity and high-accuracy predictions in practice. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6088-adversarial-multiclass-classification-a-risk-minimization-perspective |
PDF | http://papers.nips.cc/paper/6088-adversarial-multiclass-classification-a-risk-minimization-perspective.pdf |
PWC | https://paperswithcode.com/paper/adversarial-multiclass-classification-a-risk |
Repo | |
Framework | |
Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
Title | Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning |
Authors | Boliang Zhang, Xiaoman Pan, Tianlu Wang, Ashish Vaswani, Heng Ji, Kevin Knight, Daniel Marcu |
Abstract | |
Tasks | Cross-Lingual Entity Linking, Entity Linking |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1029/ |
PWC | https://paperswithcode.com/paper/name-tagging-for-low-resource-incident |
Repo | |
Framework | |