May 5, 2019

Paper Group NANR 5

Orthogonality regularizer for question answering


Title	Orthogonality regularizer for question answering
Authors	Chunyang Xiao, Guillaume Bouchard, Marc Dymetman, Claire Gardent
Abstract
Tasks	Information Retrieval, Open-Domain Question Answering, Question Answering
Published	2016-08-01
URL	https://www.aclweb.org/anthology/S16-2019/
PDF	https://www.aclweb.org/anthology/S16-2019
PWC	https://paperswithcode.com/paper/orthogonality-regularizer-for-question
Repo
Framework

Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers


Title	Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
Authors	Marie Garnier, Patrick Saint-Dizier
Abstract	In most international industries, English is the main language of communication for technical documents. These documents are designed to be as unambiguous as possible for their users. For international industries based in non-English speaking countries, the professionals in charge of writing requirements are often non-native speakers of English, who rarely receive adequate training in the use of English for this task. As a result, requirements can contain a relatively large diversity of lexical and grammatical errors, which are not eliminated by the use of guidelines from controlled languages. This article investigates the distribution of errors in a corpus of requirements written in English by native speakers of French. Errors are defined on the basis of grammaticality and acceptability principles, and classified using comparable categories. Results show a high proportion of errors in the Noun Phrase, notably through modifier stacking, and errors consistent with simplification strategies. Comparisons with similar corpora in other genres reveal the specificity of the distribution of errors in requirements. This research also introduces possible applied uses, in the form of strategies for the automatic detection of errors, and in-person training provided by certification boards in requirements authoring.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1029/
PDF	https://www.aclweb.org/anthology/L16-1029
PWC	https://paperswithcode.com/paper/error-typology-and-remediation-strategies-for
Repo
Framework

Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario


Title	Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
Authors	Lena Keiper, Andrea Horbach, Stefan Thater
Abstract	We present a novel method to automatically improve the accuracy of part-of-speech taggers on learner language. The key idea underlying our approach is to exploit the structure of a typical language learner task and automatically induce POS information for out-of-vocabulary (OOV) words. To evaluate the effectiveness of our approach, we add manual POS and normalization information to an existing language learner corpus. Our evaluation shows an increase in accuracy from 72.4% to 81.5% on OOV words.
Tasks	Reading Comprehension
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1030/
PDF	https://www.aclweb.org/anthology/L16-1030
PWC	https://paperswithcode.com/paper/improving-pos-tagging-of-german-learner
Repo
Framework

Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies


Title	Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
Authors	Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen
Abstract	This paper presents the first version of the Estonian Universal Dependencies Treebank, which has been semi-automatically acquired from the Estonian Dependency Treebank and comprises ca 400,000 words (ca 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. The article analyses the differences between the two annotation schemes and the conversion procedure to Universal Dependencies format. The conversion has been conducted by manually created Constraint Grammar transfer rules. As the rules make it possible to consider unbounded context and to include lexical information and both flat and tree structure features at the same time, the method has proved to be reliable and flexible enough to handle most transformations. The automatic conversion procedure achieved LAS 95.2%, UAS 96.3% and LA 98.4%. If punctuation marks were excluded from the calculations, we observed LAS 96.4%, UAS 97.7% and LA 98.2%. Still, refinement of the guidelines and methodology is needed in order to re-annotate some syntactic phenomena, e.g. inter-clausal relations. Although automatic rules usually make quite a good guess even in obscure conditions, some relations should be checked and annotated manually after the main conversion.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1247/
PDF	https://www.aclweb.org/anthology/L16-1247
PWC	https://paperswithcode.com/paper/estonian-dependency-treebank-from-constraint
Repo
Framework

Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish


Title	Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
Authors	Neasa Ní Chiaráin, Ailbhe Ní Chasaide
Abstract	This paper describes the development and evaluation of a chatbot platform designed for the teaching/learning of Irish. The chatbot uses synthetic voices developed for the dialects of Irish. Speech-enabled chatbot technology offers a potentially powerful tool for dealing with the challenges of teaching/learning an endangered language where learners have limited access to native speaker models of the language and limited exposure to the language in a truly communicative setting. The sociolinguistic context that motivates the present development is explained. The evaluation of the chatbot was carried out in 13 schools by 228 pupils and consisted of two parts. Firstly, learners' opinions of the overall chatbot platform as a learning environment were elicited. Secondly, learners evaluated the intelligibility, quality, and attractiveness of the synthetic voices used in this platform. Results were overwhelmingly positive to both the learning platform and the synthetic voices and indicate that the time may now be ripe for language learning applications which exploit speech and language technologies. It is further argued that these technologies have a particularly vital role to play in the maintenance of the endangered language.
Tasks	Chatbot
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1547/
PDF	https://www.aclweb.org/anthology/L16-1547
PWC	https://paperswithcode.com/paper/chatbot-technology-with-synthetic-voices-in
Repo
Framework

Read my points: Effect of animation type when speech-reading from EMA data


Title	Read my points: Effect of animation type when speech-reading from EMA data
Authors	Kristy James, Martijn Wieling
Abstract
Tasks	Motion Capture
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2014/
PDF	https://www.aclweb.org/anthology/W16-2014
PWC	https://paperswithcode.com/paper/read-my-points-effect-of-animation-type-when
Repo
Framework

Using longest common subsequence and character models to predict word forms


Title	Using longest common subsequence and character models to predict word forms
Authors	Alexey Sorokin
Abstract
Tasks	Lemmatization, Morphological Inflection
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2009/
PDF	https://www.aclweb.org/anthology/W16-2009
PWC	https://paperswithcode.com/paper/using-longest-common-subsequence-and
Repo
Framework

Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts


Title	Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts
Authors	Areej Alshutayri, Eric Atwell, Abdulrahman Alosaimy, James Dickins, Michael Ingleby, Janet Watson
Abstract	This paper describes an Arabic dialect identification system which we developed for the Discriminating Similar Languages (DSL) 2016 shared task. We classified Arabic dialects by using Waikato Environment for Knowledge Analysis (WEKA) data analytic tool which contains many alternative filters and classifiers for machine learning. We experimented with several classifiers and the best accuracy was achieved using the Sequential Minimal Optimization (SMO) algorithm for training and testing process set to three different feature-sets for each testing process. Our approach achieved an accuracy equal to 42.85% which is considerably worse in comparison to the evaluation scores on the training set of 80-90% and with training set "60:40" percentage split which achieved accuracy around 50%. We observed that Buckwalter transcripts from the Saarland Automatic Speech Recognition (ASR) system are given without short vowels, though the Buckwalter system has notation for these. We elaborate such observations, describe our methods and analyse the training dataset.
Tasks	Language Identification, Speech Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4826/
PDF	https://www.aclweb.org/anthology/W16-4826
PWC	https://paperswithcode.com/paper/arabic-language-weka-based-dialect-classifier
Repo
Framework

QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition


Title	QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition
Authors	Guillaume Cleuziou, Jose G. Moreno
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1205/
PDF	https://www.aclweb.org/anthology/S16-1205
PWC	https://paperswithcode.com/paper/qassit-at-semeval-2016-task-13-on-the
Repo
Framework

Implicit Polarity and Implicit Aspect Recognition in Opinion Mining


Title	Implicit Polarity and Implicit Aspect Recognition in Opinion Mining
Authors	Huan-Yuan Chen, Hsin-Hsi Chen
Abstract
Tasks	Opinion Mining, Sentiment Analysis
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2004/
PDF	https://www.aclweb.org/anthology/P16-2004
PWC	https://paperswithcode.com/paper/implicit-polarity-and-implicit-aspect
Repo
Framework

Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast


Title	Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast
Authors	Svitlana Volkova, Yoram Bachrach
Abstract
Tasks	Recommendation Systems
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1148/
PDF	https://www.aclweb.org/anthology/P16-1148
PWC	https://paperswithcode.com/paper/inferring-perceived-demographics-from-user
Repo
Framework

An Unsupervised Morphological Criterion for Discriminating Similar Languages


Title	An Unsupervised Morphological Criterion for Discriminating Similar Languages
Authors	Adrien Barbaresi
Abstract	In this study conducted on the occasion of the Discriminating between Similar Languages shared task, I introduce an additional decision factor focusing on the token and subtoken level. The motivation behind this submission is to test whether a morphologically-informed criterion can add linguistically relevant information to global categorization and thus improve performance. The contributions of this paper are (1) a description of the unsupervised, low-resource method; (2) an evaluation and analysis of its raw performance; and (3) an assessment of its impact within a model comprising common indicators used in language identification. I present and discuss the systems used in the task A, a 12-way language identification task comprising varieties of five main language groups. Additionally I introduce a new off-the-shelf Naive Bayes classifier using a contrastive word and subword n-gram model ("Bayesline") which outperforms the best submissions.
Tasks	Language Identification, Text Categorization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4827/
PDF	https://www.aclweb.org/anthology/W16-4827
PWC	https://paperswithcode.com/paper/an-unsupervised-morphological-criterion-for
Repo
Framework

Morphological Smoothing and Extrapolation of Word Embeddings


Title	Morphological Smoothing and Extrapolation of Word Embeddings
Authors	Ryan Cotterell, Hinrich Schütze, Jason Eisner
Abstract
Tasks	Word Embeddings
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1156/
PDF	https://www.aclweb.org/anthology/P16-1156
PWC	https://paperswithcode.com/paper/morphological-smoothing-and-extrapolation-of
Repo
Framework

Brave New World: Uncovering Topical Dynamics in the ACL Anthology Reference Corpus Using Term Life Cycle Information


Title	Brave New World: Uncovering Topical Dynamics in the ACL Anthology Reference Corpus Using Term Life Cycle Information
Authors	Anne-Kathrin Schumann
Abstract
Tasks	Text Classification
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2101/
PDF	https://www.aclweb.org/anthology/W16-2101
PWC	https://paperswithcode.com/paper/brave-new-world-uncovering-topical-dynamics
Repo
Framework

COMMIT at SemEval-2016 Task 5: Sentiment Analysis with Rhetorical Structure Theory


Title	COMMIT at SemEval-2016 Task 5: Sentiment Analysis with Rhetorical Structure Theory
Authors	Kim Schouten, Flavius Frasincar
Abstract
Tasks	Aspect-Based Sentiment Analysis, Sentiment Analysis
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1057/
PDF	https://www.aclweb.org/anthology/S16-1057
PWC	https://paperswithcode.com/paper/commit-at-semeval-2016-task-5-sentiment
Repo
Framework