May 5, 2019

Paper Group NANR 35

BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains

Title BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
Authors Necati Cihan Camgöz, Ahmet Alp Kındıroğlu, Serpil Karabüklü, Meltem Kelepir, Ayşe Sumru Özsoy, Lale Akarun
Abstract There are as many sign languages as there are deaf communities in the world. Linguists have been collecting corpora of different sign languages and annotating them extensively in order to study and understand their properties. On the other hand, the field of computer vision has approached the sign language recognition problem as a grand challenge, and research efforts have intensified in the last 20 years. However, corpora collected for studying linguistic properties are often not suitable for sign language recognition, as the statistical methods used in the field require large amounts of data. Recently, with the availability of inexpensive depth cameras, groups from the computer vision community have started collecting corpora with a large number of repetitions for sign language recognition research. In this paper, we present the BosphorusSign Turkish Sign Language corpus, which consists of 855 sign and phrase samples from the health, finance and everyday life domains. The corpus is collected using the state-of-the-art Microsoft Kinect v2 depth sensor and will be the first of its kind in the sign language research field. Furthermore, linguists will annotate the corpus so that it appeals to both the linguistic and the sign language recognition research communities.
Tasks Sign Language Recognition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1220/
PDF https://www.aclweb.org/anthology/L16-1220
PWC https://paperswithcode.com/paper/bosphorussign-a-turkish-sign-language
Repo
Framework

A short proof that O_2 is an MCFL

Title A short proof that O_2 is an MCFL
Authors Mark-Jan Nederhof
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1106/
PDF https://www.aclweb.org/anthology/P16-1106
PWC https://paperswithcode.com/paper/a-short-proof-that-o_2-is-an-mcfl-1
Repo
Framework

Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality

Title Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality
Authors Amir Zeldes, Dan Simonson
Abstract
Tasks Coreference Resolution, Dependency Parsing, Domain Adaptation, Language Acquisition, Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1709/
PDF https://www.aclweb.org/anthology/W16-1709
PWC https://paperswithcode.com/paper/different-flavors-of-gum-evaluating-genre-and
Repo
Framework

Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities

Title Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
Authors Georg Rehm, Jan Hajič, Josef van Genabith, Andrejs Vasiljevs
Abstract META-NET is a European network of excellence, founded in 2010, that consists of 60 research centres in 34 European countries. One of the key visions and goals of META-NET is a truly multilingual Europe, substantially supported and realised through language technologies. In this article we provide an overview of recent developments around the multilingual Europe topic; we also describe recent and upcoming events as well as recent and upcoming strategy papers. Furthermore, we provide overviews of two new emerging initiatives: the CEF.AT and ELRC activity on the one hand, and the Cracking the Language Barrier federation on the other. The paper closes with several suggested next steps to address the current challenges and to open up new opportunities.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1251/
PDF https://www.aclweb.org/anthology/L16-1251
PWC https://paperswithcode.com/paper/fostering-the-next-generation-of-european
Repo
Framework

Trends in HLT Research: A Survey of LDC’s Data Scholarship Program

Title Trends in HLT Research: A Survey of LDC’s Data Scholarship Program
Authors Denise DiPersio, Christopher Cieri
Abstract Since its inception in 2010, the Linguistic Data Consortium's data scholarship program has awarded no-cost grants in data to 64 recipients from 26 countries. A survey of the twelve cycles to date (two awards each in the Fall and Spring semesters from Fall 2010 through Spring 2016) yields an interesting view into graduate program research trends in human language technology and related fields, and into the particular data sets deemed important to support that research. The survey also reveals regions in which such activity appears to be on the rise, including Arabic-speaking regions and portions of the Americas and Asia.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1255/
PDF https://www.aclweb.org/anthology/L16-1255
PWC https://paperswithcode.com/paper/trends-in-hlt-research-a-survey-of-ldcs-data
Repo
Framework

Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous

Title Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
Authors Cristina Bosco, Mirko Lai, Viviana Patti, Daniela Virone
Abstract The paper introduces a new annotated French data set for Sentiment Analysis, a resource that is currently missing. It focuses on collecting Twitter data related to the socio-political debate about the marriage reform bill in France. The paper describes the design of the annotation scheme, which extends a polarity label set with tags for marking target semantic areas and figurative language devices. The annotation process is presented and annotator disagreement is discussed, in particular from the perspectives of figurative language use and semantically oriented annotation, both of which are open challenges for NLP systems.
Tasks Sentiment Analysis
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1256/
PDF https://www.aclweb.org/anthology/L16-1256
PWC https://paperswithcode.com/paper/tweeting-and-being-ironic-in-the-debate-about
Repo
Framework

Towards a Corpus of Violence Acts in Arabic Social Media

Title Towards a Corpus of Violence Acts in Arabic Social Media
Authors Ayman Alhelbawy, Massimo Poesio, Udo Kruschwitz
Abstract In this paper we present a new corpus of Arabic tweets that mention some form of violent event, developed to support the automatic identification of human rights abuses. The dataset was manually labelled for seven classes of violence using crowdsourcing.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1257/
PDF https://www.aclweb.org/anthology/L16-1257
PWC https://paperswithcode.com/paper/towards-a-corpus-of-violence-acts-in-arabic
Repo
Framework

Variational Inference in Mixed Probabilistic Submodular Models

Title Variational Inference in Mixed Probabilistic Submodular Models
Authors Josip Djolonga, Sebastian Tschiatschek, Andreas Krause
Abstract We consider the problem of variational inference in probabilistic models with both log-submodular and log-supermodular higher-order potentials. These models can represent arbitrary distributions over binary variables, and thus generalize the commonly used pairwise Markov random fields and models with log-supermodular potentials only, for which efficient approximate inference algorithms are known. While inference in the considered models is #P-hard in general, we present efficient approximate algorithms exploiting recent advances in the field of discrete optimization. We demonstrate the effectiveness of our approach in a large set of experiments, where our model allows reasoning about preferences over sets of items with complements and substitutes.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6225-variational-inference-in-mixed-probabilistic-submodular-models
PDF http://papers.nips.cc/paper/6225-variational-inference-in-mixed-probabilistic-submodular-models.pdf
PWC https://paperswithcode.com/paper/variational-inference-in-mixed-probabilistic
Repo
Framework

WMT2016: A Hybrid Approach to Bilingual Document Alignment

Title WMT2016: A Hybrid Approach to Bilingual Document Alignment
Authors Sainik Mahata, Dipankar Das, Santanu Pal
Abstract
Tasks Machine Translation, Word Sense Disambiguation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2373/
PDF https://www.aclweb.org/anthology/W16-2373
PWC https://paperswithcode.com/paper/wmt2016-a-hybrid-approach-to-bilingual
Repo
Framework

Utilizing Temporal Information for Taxonomy Construction

Title Utilizing Temporal Information for Taxonomy Construction
Authors Luu Anh Tuan, Siu Cheung Hui, See Kiong Ng
Abstract Taxonomies play an important role in many applications by organizing domain knowledge into a hierarchy of 'is-a' relations between terms. Previous work on automatic construction of taxonomies from text documents either ignored temporal information or used fixed time periods to discretize the time series of documents. In this paper, we propose a time-aware method to automatically construct and effectively maintain a taxonomy from a given series of documents preclustered for a domain of interest. The method extracts temporal information from the documents and uses a timestamp contribution function to score the temporal relevance of the evidence from source texts when identifying the taxonomic relations for constructing the taxonomy. Experimental results show that our proposed method outperforms the state-of-the-art methods, increasing F-measure by 7%–20%. Furthermore, the proposed method can incrementally update the taxonomy by adding fresh relations from new data and removing outdated relations using an information decay function. It thus avoids rebuilding the whole taxonomy from scratch for every update and keeps the taxonomy effectively up-to-date in order to track the latest information trends in the rapidly evolving domain.
Tasks Question Answering, Time Series
Published 2016-01-01
URL https://www.aclweb.org/anthology/Q16-1039/
PDF https://www.aclweb.org/anthology/Q16-1039
PWC https://paperswithcode.com/paper/utilizing-temporal-information-for-taxonomy
Repo
Framework

Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks

Title Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
Authors Carole Lailler, Anaïs Landeau, Frédéric Béchet, Yannick Estève, Paul Deléglise
Abstract In this article, we present the RATP-DECODA Corpus, which is composed of 67 hours of speech from telephone conversations of a Customer Care Service (CCS). The first version of this corpus is already available online at http://sldr.org/sldr000847/fr. However, many enhancements have since been made in order to allow the development of automatic techniques to transcribe conversations and to capture their meaning. These enhancements fall into two categories: first, we have increased the size of the corpus with manual transcriptions from a new operational day; second, we have added new linguistic annotations to the whole corpus (either manually or through automatic processing) in order to support various linguistic tasks, from syntactic and semantic parsing to dialog act tagging and dialog summarization.
Tasks Semantic Parsing
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1166/
PDF https://www.aclweb.org/anthology/L16-1166
PWC https://paperswithcode.com/paper/enhancing-the-ratp-decoda-corpus-with
Repo
Framework

Challenges of Adjective Mapping between plWordNet and Princeton WordNet

Title Challenges of Adjective Mapping between plWordNet and Princeton WordNet
Authors Ewa Rudnicka, Wojciech Witkowski, Katarzyna Podlaska
Abstract The paper presents the strategy and results of mapping adjective synsets between plWordNet (the wordnet of Polish, cf. Piasecki et al. 2009, Maziarz et al. 2013) and Princeton WordNet (cf. Fellbaum 1998). The main challenge of this enterprise has been the very different synset relation structures in the two networks: horizontal and dumbbell-model based in PWN, vertical and hyponymy-based in plWN. Moreover, the two wordnets differ in how they group adjectives into semantic domains and in the size of the adjective category. To handle these contrasts, a series of automatic prompt algorithms and a manual mapping procedure, relying on corresponding synset and lexical unit relations as well as on inter-lingual relations between noun synsets, were proposed in the pilot stage of mapping (Rudnicka et al. 2015). In the paper we discuss the final results of the mapping process and explain example mapping choices. Suggestions for further development of the mapping are also given.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1382/
PDF https://www.aclweb.org/anthology/L16-1382
PWC https://paperswithcode.com/paper/challenges-of-adjective-mapping-between
Repo
Framework

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments

Title Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
Authors Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly
Abstract Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism, where speakers switch between two or more languages when interacting with each other. CS has been extensively studied in spoken language by linguists for several decades, but with the popularity of social media and less formal Computer Mediated Communication, we now see a big rise in the use of CS in written text. This poses interesting challenges and creates a need for computational processing of such code-switched data. As with any Computational Linguistic analysis and Natural Language Processing tools and applications, we need annotated data for understanding, processing, and generating code-switched language. In this study, we focus on CS between English and Hindi in tweets extracted from the Twitter stream of Hindi-English bilinguals. We present an annotation scheme, based on a linguistic analysis, for annotating the pragmatic functions of CS in Hindi-English (Hi-En) code-switched tweets, along with some initial experiments.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1260/
PDF https://www.aclweb.org/anthology/L16-1260
PWC https://paperswithcode.com/paper/functions-of-code-switching-in-tweets-an
Repo
Framework

The Negochat Corpus of Human-agent Negotiation Dialogues

Title The Negochat Corpus of Human-agent Negotiation Dialogues
Authors Vasily Konovalov, Ron Artstein, Oren Melamud, Ido Dagan
Abstract Annotated in-domain corpora are crucial to the successful development of dialogue systems of automated agents, and in particular for developing natural language understanding (NLU) components of such systems. Unfortunately, such important resources are scarce. In this work, we introduce an annotated natural language human-agent dialogue corpus in the negotiation domain. The corpus was collected using Amazon Mechanical Turk following the 'Wizard-of-Oz' approach, in which a human 'wizard' translates the participants' natural language utterances in real time into a semantic language. Once dialogue collection was completed, utterances were annotated with intent labels by two independent annotators, achieving high inter-annotator agreement. Our initial experiments with an SVM classifier show that automatically inferring such labels from the utterances is far from trivial. We make our corpus publicly available to serve as an aid in the development of dialogue systems for negotiation agents, and suggest that analogous corpora can be created following our methodology and using our available source code. To the best of our knowledge, this is the first publicly available negotiation dialogue corpus.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1501/
PDF https://www.aclweb.org/anthology/L16-1501
PWC https://paperswithcode.com/paper/the-negochat-corpus-of-human-agent
Repo
Framework

Enriching Cold Start Personalized Language Model Using Social Network Information

Title Enriching Cold Start Personalized Language Model Using Social Network Information
Authors Yu-Yang Huang, Rui Yan, Tsung-Ting Kuo, Shou-De Lin
Abstract
Tasks Information Retrieval, Language Modelling
Published 2016-06-01
URL https://www.aclweb.org/anthology/O16-2003/
PDF https://www.aclweb.org/anthology/O16-2003
PWC https://paperswithcode.com/paper/enriching-cold-start-personalized-language
Repo
Framework