May 5, 2019

Paper Group NANR 5

Orthogonality regularizer for question answering. Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers. Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario. Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies. Chatbot Technology with …

Orthogonality regularizer for question answering

Title Orthogonality regularizer for question answering
Authors Chunyang Xiao, Guillaume Bouchard, Marc Dymetman, Claire Gardent
Abstract
Tasks Information Retrieval, Open-Domain Question Answering, Question Answering
Published 2016-08-01
URL https://www.aclweb.org/anthology/S16-2019/
PDF https://www.aclweb.org/anthology/S16-2019
PWC https://paperswithcode.com/paper/orthogonality-regularizer-for-question
Repo
Framework

Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers

Title Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
Authors Marie Garnier, Patrick Saint-Dizier
Abstract In most international industries, English is the main language of communication for technical documents. These documents are designed to be as unambiguous as possible for their users. For international industries based in non-English speaking countries, the professionals in charge of writing requirements are often non-native speakers of English, who rarely receive adequate training in the use of English for this task. As a result, requirements can contain a relatively large diversity of lexical and grammatical errors, which are not eliminated by the use of guidelines from controlled languages. This article investigates the distribution of errors in a corpus of requirements written in English by native speakers of French. Errors are defined on the basis of grammaticality and acceptability principles, and classified using comparable categories. Results show a high proportion of errors in the Noun Phrase, notably through modifier stacking, and errors consistent with simplification strategies. Comparisons with similar corpora in other genres reveal the specificity of the distribution of errors in requirements. This research also introduces possible applied uses, in the form of strategies for the automatic detection of errors, and in-person training provided by certification boards in requirements authoring.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1029/
PDF https://www.aclweb.org/anthology/L16-1029
PWC https://paperswithcode.com/paper/error-typology-and-remediation-strategies-for
Repo
Framework

Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario

Title Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
Authors Lena Keiper, Andrea Horbach, Stefan Thater
Abstract We present a novel method to automatically improve the accuracy of part-of-speech taggers on learner language. The key idea underlying our approach is to exploit the structure of a typical language learner task and automatically induce POS information for out-of-vocabulary (OOV) words. To evaluate the effectiveness of our approach, we add manual POS and normalization information to an existing language learner corpus. Our evaluation shows an increase in accuracy from 72.4% to 81.5% on OOV words.
Tasks Reading Comprehension
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1030/
PDF https://www.aclweb.org/anthology/L16-1030
PWC https://paperswithcode.com/paper/improving-pos-tagging-of-german-learner
Repo
Framework

Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies

Title Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
Authors Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen
Abstract This paper presents the first version of the Estonian Universal Dependencies Treebank, which has been semi-automatically acquired from the Estonian Dependency Treebank and comprises ca 400,000 words (ca 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. The article analyses the differences between the two annotation schemes and the conversion procedure to the Universal Dependencies format. The conversion has been conducted by manually created Constraint Grammar transfer rules. As the rules make it possible to consider unbounded context and to include lexical information and both flat and tree structure features at the same time, the method has proved to be reliable and flexible enough to handle most of the transformations. The automatic conversion procedure achieved LAS 95.2%, UAS 96.3% and LA 98.4%. If punctuation marks were excluded from the calculations, we observed LAS 96.4%, UAS 97.7% and LA 98.2%. Still, refinement of the guidelines and methodology is needed in order to re-annotate some syntactic phenomena, e.g. inter-clausal relations. Although automatic rules usually make quite a good guess even in obscure conditions, some relations should be checked and annotated manually after the main conversion.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1247/
PDF https://www.aclweb.org/anthology/L16-1247
PWC https://paperswithcode.com/paper/estonian-dependency-treebank-from-constraint
Repo
Framework
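
The scores quoted in the abstract above (LAS, UAS, LA) are conventionally the share of tokens with the correct head and label, the correct head, and the correct label, respectively. The following is a minimal sketch of that convention, assuming each token is represented as a (head index, label) pair; the function name and the toy sentence are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical token representation: (index of the head token, dependency label).
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float, float]:
    """Return (LAS, UAS, LA) as fractions of tokens, given aligned gold and predicted parses."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # UAS: head matches; LA: label matches; LAS: both match.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas, la


# Toy four-token sentence (made up for illustration).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

las, uas, la = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f} LA={la:.2f}")
```

Excluding punctuation, as the abstract does for its second set of figures, would amount to filtering out punctuation tokens from both lists before calling the function.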

Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish

Title Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
Authors Neasa Ní Chiaráin, Ailbhe Ní Chasaide
Abstract This paper describes the development and evaluation of a chatbot platform designed for the teaching/learning of Irish. The chatbot uses synthetic voices developed for the dialects of Irish. Speech-enabled chatbot technology offers a potentially powerful tool for dealing with the challenges of teaching/learning an endangered language where learners have limited access to native speaker models of the language and limited exposure to the language in a truly communicative setting. The sociolinguistic context that motivates the present development is explained. The evaluation of the chatbot was carried out in 13 schools by 228 pupils and consisted of two parts. Firstly, learners' opinions of the overall chatbot platform as a learning environment were elicited. Secondly, learners evaluated the intelligibility, quality, and attractiveness of the synthetic voices used in this platform. Results were overwhelmingly positive to both the learning platform and the synthetic voices and indicate that the time may now be ripe for language learning applications which exploit speech and language technologies. It is further argued that these technologies have a particularly vital role to play in the maintenance of the endangered language.
Tasks Chatbot
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1547/
PDF https://www.aclweb.org/anthology/L16-1547
PWC https://paperswithcode.com/paper/chatbot-technology-with-synthetic-voices-in
Repo
Framework

Read my points: Effect of animation type when speech-reading from EMA data

Title Read my points: Effect of animation type when speech-reading from EMA data
Authors Kristy James, Martijn Wieling
Abstract
Tasks Motion Capture
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2014/
PDF https://www.aclweb.org/anthology/W16-2014
PWC https://paperswithcode.com/paper/read-my-points-effect-of-animation-type-when
Repo
Framework

Using longest common subsequence and character models to predict word forms

Title Using longest common subsequence and character models to predict word forms
Authors Alexey Sorokin
Abstract
Tasks Lemmatization, Morphological Inflection
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2009/
PDF https://www.aclweb.org/anthology/W16-2009
PWC https://paperswithcode.com/paper/using-longest-common-subsequence-and
Repo
Framework

Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts

Title Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts
Authors Areej Alshutayri, Eric Atwell, Abdulrahman Alosaimy, James Dickins, Michael Ingleby, Janet Watson
Abstract This paper describes an Arabic dialect identification system which we developed for the Discriminating Similar Languages (DSL) 2016 shared task. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. We experimented with several classifiers and the best accuracy was achieved using the Sequential Minimal Optimization (SMO) algorithm for training and testing process set to three different feature-sets for each testing process. Our approach achieved an accuracy equal to 42.85%, which is considerably worse in comparison to the evaluation scores on the training set of 80-90% and with training set "60:40" percentage split which achieved accuracy around 50%. We observed that Buckwalter transcripts from the Saarland Automatic Speech Recognition (ASR) system are given without short vowels, though the Buckwalter system has notation for these. We elaborate on such observations, describe our methods and analyse the training dataset.
Tasks Language Identification, Speech Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4826/
PDF https://www.aclweb.org/anthology/W16-4826
PWC https://paperswithcode.com/paper/arabic-language-weka-based-dialect-classifier
Repo
Framework

QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition

Title QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition
Authors Guillaume Cleuziou, Jose G. Moreno
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1205/
PDF https://www.aclweb.org/anthology/S16-1205
PWC https://paperswithcode.com/paper/qassit-at-semeval-2016-task-13-on-the
Repo
Framework

Implicit Polarity and Implicit Aspect Recognition in Opinion Mining

Title Implicit Polarity and Implicit Aspect Recognition in Opinion Mining
Authors Huan-Yuan Chen, Hsin-Hsi Chen
Abstract
Tasks Opinion Mining, Sentiment Analysis
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2004/
PDF https://www.aclweb.org/anthology/P16-2004
PWC https://paperswithcode.com/paper/implicit-polarity-and-implicit-aspect
Repo
Framework

Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast

Title Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast
Authors Svitlana Volkova, Yoram Bachrach
Abstract
Tasks Recommendation Systems
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1148/
PDF https://www.aclweb.org/anthology/P16-1148
PWC https://paperswithcode.com/paper/inferring-perceived-demographics-from-user
Repo
Framework

An Unsupervised Morphological Criterion for Discriminating Similar Languages

Title An Unsupervised Morphological Criterion for Discriminating Similar Languages
Authors Adrien Barbaresi
Abstract In this study conducted on the occasion of the Discriminating between Similar Languages shared task, I introduce an additional decision factor focusing on the token and subtoken level. The motivation behind this submission is to test whether a morphologically-informed criterion can add linguistically relevant information to global categorization and thus improve performance. The contributions of this paper are (1) a description of the unsupervised, low-resource method; (2) an evaluation and analysis of its raw performance; and (3) an assessment of its impact within a model comprising common indicators used in language identification. I present and discuss the systems used in the task A, a 12-way language identification task comprising varieties of five main language groups. Additionally I introduce a new off-the-shelf Naive Bayes classifier using a contrastive word and subword n-gram model ("Bayesline") which outperforms the best submissions.
Tasks Language Identification, Text Categorization
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4827/
PDF https://www.aclweb.org/anthology/W16-4827
PWC https://paperswithcode.com/paper/an-unsupervised-morphological-criterion-for
Repo
Framework
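
To give a rough idea of what a Naive Bayes classifier over subword n-grams looks like, here is a minimal sketch. It is not the paper's "Bayesline" system: the class name, the add-one smoothing, the 1-to-3 character n-gram range, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Character n-grams of a lowercased text, padded with spaces at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log((self.counts[label][gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this sketch.
model = NgramNaiveBayes().fit(
    ["the cat sat on the mat", "this is the house",
     "die katze sitzt auf der matte", "das ist das haus"],
    ["en", "en", "de", "de"],
)
print(model.predict("das ist die katze"))
```

A real system for closely related language varieties would need far more training data and, as the abstract notes, word-level features alongside the subword ones.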

Morphological Smoothing and Extrapolation of Word Embeddings

Title Morphological Smoothing and Extrapolation of Word Embeddings
Authors Ryan Cotterell, Hinrich Schütze, Jason Eisner
Abstract
Tasks Word Embeddings
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1156/
PDF https://www.aclweb.org/anthology/P16-1156
PWC https://paperswithcode.com/paper/morphological-smoothing-and-extrapolation-of
Repo
Framework

Brave New World: Uncovering Topical Dynamics in the ACL Anthology Reference Corpus Using Term Life Cycle Information

Title Brave New World: Uncovering Topical Dynamics in the ACL Anthology Reference Corpus Using Term Life Cycle Information
Authors Anne-Kathrin Schumann
Abstract
Tasks Text Classification
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2101/
PDF https://www.aclweb.org/anthology/W16-2101
PWC https://paperswithcode.com/paper/brave-new-world-uncovering-topical-dynamics
Repo
Framework

COMMIT at SemEval-2016 Task 5: Sentiment Analysis with Rhetorical Structure Theory

Title COMMIT at SemEval-2016 Task 5: Sentiment Analysis with Rhetorical Structure Theory
Authors Kim Schouten, Flavius Frasincar
Abstract
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1057/
PDF https://www.aclweb.org/anthology/S16-1057
PWC https://paperswithcode.com/paper/commit-at-semeval-2016-task-5-sentiment
Repo
Framework