May 5, 2019


Paper Group NANR 52

Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity. A Lexical Resource for the Identification of "Weak Words" in German Specification Documents. Improving corpus search via parsing. Affective Lexicon Creation for the Greek Language. A Hungarian Sentiment Corpus Manually Annotated at Aspect Level. SVALex: …

Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity


Title	Information structure, syntax, and pragmatics and other factors in resolving scope ambiguity
Authors	Valentina Apresjan
Abstract	The paper is a corpus study of the factors involved in resolving potential scope ambiguity in sentences with negation and a universal quantifier, such as "I don't want to talk to all these people", which can mean either 'I don't want to talk to any of these people' or 'I don't want to talk to some of these people'. The relevant factors are shown to be largely different from those involved in disambiguating lexical polysemy. They include the syntactic function of the constituent containing the quantifier "all" (subject, direct complement, adjunct), as well as the depth of its embedding; the status of the main predicate and the "all" constituent with respect to the information structure of the utterance (topic vs. focus, given vs. new information); and pragmatic implicatures pertaining to the situations described in the utterances.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3801/
PDF	https://www.aclweb.org/anthology/W16-3801
PWC	https://paperswithcode.com/paper/information-structure-syntax-and-pragmatics
Repo
Framework

A Lexical Resource for the Identification of "Weak Words" in German Specification Documents


Title	A Lexical Resource for the Identification of "Weak Words" in German Specification Documents
Authors	Jennifer Krisch, Melanie Dick, Ronny Jauch, Ulrich Heid
Abstract	We report on the creation of a lexical resource for the identification of potentially unspecific or imprecise constructions in German requirements documentation from the car manufacturing industry. In requirements engineering, such expressions are called "weak words": they are not sufficiently precise to ensure an unambiguous interpretation by the contractual partners, who typically rely on specification documents to define their cooperation (Melchisedech, 2000). One example is dimension adjectives, such as kurz or lang ('short', 'long'), which need to be modified by adverbials indicating the exact duration, size, etc. Contrary to standard practice in requirements engineering, where the identification of such weak words is merely based on stopword lists, we identify weak uses in context by querying annotated text. The queries are part of the resource, as they define the conditions under which a word use is weak. We evaluate the recognition of weak uses on our development corpus and on an unseen evaluation corpus, reaching stable F1-scores above 0.95.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1454/
PDF	https://www.aclweb.org/anthology/L16-1454
PWC	https://paperswithcode.com/paper/a-lexical-resource-for-the-identification-of
Repo
Framework

Improving corpus search via parsing


Title	Improving corpus search via parsing
Authors	Natalia Klyueva, Pavel Straňák
Abstract	In this paper, we describe an addition to the corpus query system Kontext that makes it possible to enhance search using syntactic attributes in addition to the existing features, mainly lemmas and morphological categories. We present the enhancements to the corpus query system itself, the attributes we use to represent syntactic structures in the data, and some examples of querying syntactically annotated corpora, such as treebanks in various languages as well as a large automatically parsed corpus.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1457/
PDF	https://www.aclweb.org/anthology/L16-1457
PWC	https://paperswithcode.com/paper/improving-corpus-search-via-parsing
Repo
Framework

Affective Lexicon Creation for the Greek Language


Title	Affective Lexicon Creation for the Greek Language
Authors	Elisavet Palogiannidi, Polychronis Koutsakis, Elias Iosif, Alexandros Potamianos
Abstract	Starting from the English affective lexicon ANEW (Bradley and Lang, 1999a), we have created the first Greek affective lexicon. It contains human ratings for the three continuous affective dimensions of valence, arousal and dominance for 1034 words. The Greek affective lexicon is compared with affective lexica in English, Spanish and Portuguese. The lexicon is automatically expanded by selecting a small number of manually annotated words to bootstrap the process of estimating affective ratings of unknown words. We experimented with the parameters of the semantic-affective model in order to investigate their impact on its performance, which reaches 85% binary classification accuracy (positive vs. negative ratings). We share the Greek affective lexicon that consists of 1034 words and the automatically expanded Greek affective lexicon that contains 407K words.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1458/
PDF	https://www.aclweb.org/anthology/L16-1458
PWC	https://paperswithcode.com/paper/affective-lexicon-creation-for-the-greek
Repo
Framework

A Hungarian Sentiment Corpus Manually Annotated at Aspect Level


Title	A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
Authors	Martina Katalin Szabó, Veronika Vincze, Katalin Ilona Simkó, Viktor Varga, Viktor Hangya
Abstract	In this paper we present a Hungarian sentiment corpus manually annotated at aspect level. Our corpus consists of Hungarian opinion texts written about different types of products. The main aim of creating the corpus was to produce an appropriate database for developing text mining software tools. The corpus is a unique Hungarian database: to the best of our knowledge, no digitized Hungarian sentiment corpus annotated at the level of fragments and targets has been made so far. In addition, many language elements of the corpus that are relevant to sentiment analysis received distinct types of tags in the annotation. In this paper, on the one hand, we present the method of annotation and discuss the difficulties of the text annotation process. On the other hand, we provide some quantitative and qualitative data on the corpus. We conclude with a description of the applicability of the corpus.
Tasks	Sentiment Analysis
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1459/
PDF	https://www.aclweb.org/anthology/L16-1459
PWC	https://paperswithcode.com/paper/a-hungarian-sentiment-corpus-manually
Repo
Framework

SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners


Title	SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
Authors	Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack
Abstract	The paper introduces SVALex, a lexical resource primarily aimed at learners and teachers of Swedish as a foreign and second language that describes the distribution of 15,681 words and expressions across the Common European Framework of Reference (CEFR). The resource is based on a corpus of coursebook texts, and thus describes the receptive vocabulary learners are exposed to during reading activities, as opposed to the productive vocabulary they use when speaking or writing. The paper describes the methodology applied to create the list and to estimate the frequency distribution. It also discusses some characteristics of the resulting resource and compares it to other lexical resources for Swedish. An interesting feature of this resource is the possibility to separate the wheat from the chaff, distinguishing the core vocabulary at each level, i.e. vocabulary shared by several coursebook writers at that level, from peripheral vocabulary, which is used by a minority of the coursebook writers.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1032/
PDF	https://www.aclweb.org/anthology/L16-1032
PWC	https://paperswithcode.com/paper/svalex-a-cefr-graded-lexical-resource-for
Repo
Framework

Effect Functors for Opinion Inference


Title	Effect Functors for Opinion Inference
Authors	Josef Ruppenhofer, Jasper Brandes
Abstract	Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation.
Tasks	Sentiment Analysis
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1460/
PDF	https://www.aclweb.org/anthology/L16-1460
PWC	https://paperswithcode.com/paper/effect-functors-for-opinion-inference
Repo
Framework

The Hebrew FrameNet Project


Title	The Hebrew FrameNet Project
Authors	Avi Hayoun, Michael Elhadad
Abstract	We present the Hebrew FrameNet project, describe the development and annotation processes and enumerate the challenges we faced along the way. We have developed semi-automatic tools to help speed the annotation and data collection process. The resource currently covers 167 frames, 3,000 lexical units and about 500 fully annotated sentences. We have started training and testing automatic SRL tools on the seed data.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1688/
PDF	https://www.aclweb.org/anthology/L16-1688
PWC	https://paperswithcode.com/paper/the-hebrew-framenet-project
Repo
Framework

OPFI: A Tool for Opinion Finding in Polish


Title	OPFI: A Tool for Opinion Finding in Polish
Authors	Aleksander Wawer
Abstract	The paper contains a description of OPFI: Opinion Finder for the Polish Language, a freely available tool for opinion target extraction. The goal of the tool is opinion finding: the task of identifying tuples composed of a sentiment (positive or negative) and its target (what or whom the sentiment is expressed about). OPFI is not dependent on any particular method of sentiment identification and provides a built-in sentiment dictionary as a convenient option. Technically, it contains implementations of three different modes of opinion tuple generation: one hybrid based on dependency parsing and CRF, the second based on shallow parsing and the third on deep learning, namely a GRU neural network. The paper also contains a description of related language resources: two annotated treebanks and one set of tweets.
Tasks	Dependency Parsing
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1464/
PDF	https://www.aclweb.org/anthology/L16-1464
PWC	https://paperswithcode.com/paper/opfi-a-tool-for-opinion-finding-in-polish
Repo
Framework

A Portable Method for Parallel and Comparable Document Alignment


Title	A Portable Method for Parallel and Comparable Document Alignment
Authors	Thierry Etchegoyhen, Andoni Azpeitia
Abstract
Tasks	Machine Translation
Published	2016-01-01
URL	https://www.aclweb.org/anthology/W16-3412/
PDF	https://www.aclweb.org/anthology/W16-3412
PWC	https://paperswithcode.com/paper/a-portable-method-for-parallel-and-comparable
Repo
Framework

Uzbek-English and Turkish-English Morpheme Alignment Corpora


Title	Uzbek-English and Turkish-English Morpheme Alignment Corpora
Authors	Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel
Abstract	Morphologically-rich languages pose problems for machine translation (MT) systems, including word-alignment errors, data sparsity and multiple affixes. Current word-level alignment models do not distinguish between words and morphemes, thus yielding low-quality alignment and subsequently affecting end translation quality. Models using morpheme-level alignment can reduce the vocabulary size of morphologically-rich languages and overcome data sparsity. Alignment data based on the smallest units reveals subtle language features and enhances translation quality. Recent research proves such morpheme-level alignment (MA) data to be valuable linguistic resources for SMT, particularly for languages with rich morphology. In support of this research trend, the Linguistic Data Consortium (LDC) created Uzbek-English and Turkish-English alignment data which are manually aligned at the morpheme level. This paper describes the creation of the MA corpora, including the alignment and tagging processes and approaches, highlighting annotation challenges and specific features of languages with rich morphology. The light tagging annotation on the alignment layer adds extra value to the MA data, allowing users to flexibly tailor the data for training various MT models.
Tasks	Machine Translation, Word Alignment
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1467/
PDF	https://www.aclweb.org/anthology/L16-1467
PWC	https://paperswithcode.com/paper/uzbek-english-and-turkish-english-morpheme
Repo
Framework

An extension of ISO-Space for annotating object direction


Title	An extension of ISO-Space for annotating object direction
Authors	Daiki Gotou, Hitoshi Nishikawa, Takenobu Tokunaga
Abstract	In this paper, we extend an existing annotation scheme, ISO-Space, to annotate the spatial information necessary for the task of placing a specified object at a specified location with a specified direction according to a natural language instruction. We call this task the spatial placement problem. Our extension particularly focuses on describing the object direction when the object is placed on a 2D plane. We conducted an annotation experiment in which a corpus of 20 situated dialogues was annotated. The annotation result showed that the number of tags newly introduced by our proposal is not negligible. We also implemented an analyser that automatically assigns the proposed tags to the corpus and evaluated its performance. The result showed that the performance for entity tags was quite high, ranging from 0.68 to 0.99 in F-measure, but this was not the case for relation tags, which scored less than 0.4 in F-measure.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5401/
PDF	https://www.aclweb.org/anthology/W16-5401
PWC	https://paperswithcode.com/paper/an-extension-of-iso-space-for-annotating
Repo
Framework

MEANTIME, the NewsReader Multilingual Event and Time Corpus


Title	MEANTIME, the NewsReader Multilingual Event and Time Corpus
Authors	Anne-Lyse Minard, Manuela Speranza, Ruben Urizar, Begoña Altuna, Marieke van Erp, Anneleen Schoen, Chantal van Son
Abstract	In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles, i.e. 120 English news articles and their translations in Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and entity and event intra-document coreference. The corpus-level annotation includes entity and event cross-document coreference. Semantic annotation on the English section was performed manually; for the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements; this not only sped up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The "First CLIN Dutch Shared Task" at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized.
Tasks	Semantic Role Labeling
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1699/
PDF	https://www.aclweb.org/anthology/L16-1699
PWC	https://paperswithcode.com/paper/meantime-the-newsreader-multilingual-event
Repo
Framework

Adversarial Multiclass Classification: A Risk Minimization Perspective


Title	Adversarial Multiclass Classification: A Risk Minimization Perspective
Authors	Rizal Fathony, Anqi Liu, Kaiser Asif, Brian Ziebart
Abstract	Recently proposed adversarial classification methods have shown promising results for cost-sensitive and multivariate losses. In contrast with empirical risk minimization (ERM) methods, which use convex surrogate losses to approximate the desired non-convex target loss function, adversarial methods minimize non-convex losses by treating the properties of the training data as being uncertain and worst case within a minimax game. Despite this difference in formulation, we recast adversarial classification under zero-one loss as an ERM method with a novel prescribed loss function. We demonstrate a number of theoretical and practical advantages over the very closely related hinge loss ERM methods. This establishes adversarial classification under the zero-one loss as a method that fills the long-standing gap in multiclass hinge loss classification, simultaneously guaranteeing Fisher consistency and universal consistency, while also providing dual parameter sparsity and high-accuracy predictions in practice.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6088-adversarial-multiclass-classification-a-risk-minimization-perspective
PDF	http://papers.nips.cc/paper/6088-adversarial-multiclass-classification-a-risk-minimization-perspective.pdf
PWC	https://paperswithcode.com/paper/adversarial-multiclass-classification-a-risk
Repo
Framework

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning


Title	Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
Authors	Boliang Zhang, Xiaoman Pan, Tianlu Wang, Ashish Vaswani, Heng Ji, Kevin Knight, Daniel Marcu
Abstract
Tasks	Cross-Lingual Entity Linking, Entity Linking
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1029/
PDF	https://www.aclweb.org/anthology/N16-1029
PWC	https://paperswithcode.com/paper/name-tagging-for-low-resource-incident
Repo
Framework