May 5, 2019

Paper Group NANR 4

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model. Annotating the Little Prince with Chinese AMRs. Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments. Applying Universal Dependency to the Arapaho Language. Filtering …

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model

Title DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model
Authors Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý
Abstract In this paper we investigate two approaches to discrimination of similar languages: Expectation–maximization algorithm for estimating conditional probability P(word|language) and byte level language models similar to compression-based language modelling methods. The accuracy of these methods reached respectively 86.6% and 88.3% on set A of the DSL Shared task 2016 competition.
Tasks Language Modelling
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4815/
PDF https://www.aclweb.org/anthology/W16-4815
PWC https://paperswithcode.com/paper/dsl-shared-task-2016-perfect-is-the-enemy-of
Repo
Framework
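
To make the byte-level language-model idea in the abstract above concrete, here is a minimal sketch. It is not the authors' system: the class name `ByteNgramModel`, the fixed order of 3, the add-one smoothing, and the zero-byte padding are all assumptions chosen for illustration. Each language gets its own model, and a text is assigned to the language whose model gives it the highest log-probability.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Add-one smoothed byte n-gram model for one language (illustrative only)."""

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the (n-1)-byte contexts

    def _grams(self, text):
        # Pad with zero bytes so the first bytes of the text also have a context.
        data = b"\x00" * (self.order - 1) + text.encode("utf-8")
        for i in range(len(data) - self.order + 1):
            yield data[i:i + self.order]

    def train(self, text):
        for gram in self._grams(text):
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def log_prob(self, text):
        total = 0.0
        for gram in self._grams(text):
            # Add-one smoothing over the 256 possible next bytes.
            total += math.log(
                (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + 256)
            )
        return total


def classify(text, models):
    """Return the label whose model assigns the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))
```

In use, one would train one `ByteNgramModel` per language on that language's training lines and pass a `{label: model}` dictionary to `classify`. Working on bytes rather than words avoids any tokenization step, which is the property the abstract compares to compression-based methods.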

Annotating the Little Prince with Chinese AMRs

Title Annotating the Little Prince with Chinese AMRs
Authors Bin Li, Yuan Wen, Weiguang Qu, Lijun Bu, Nianwen Xue
Abstract
Tasks Dependency Parsing, Semantic Role Labeling
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1702/
PDF https://www.aclweb.org/anthology/W16-1702
PWC https://paperswithcode.com/paper/annotating-the-little-prince-with-chinese
Repo
Framework

Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments

Title Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments
Authors Ariani Di-Felippo, Ani Nenkova
Abstract
Tasks Sentence Compression
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1717/
PDF https://www.aclweb.org/anthology/W16-1717
PWC https://paperswithcode.com/paper/phrase-generalization-a-corpus-study-in-multi
Repo
Framework

Applying Universal Dependency to the Arapaho Language

Title Applying Universal Dependency to the Arapaho Language
Authors Irina Wagner, Andrew Cowell, Jena D. Hwang
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1719/
PDF https://www.aclweb.org/anthology/W16-1719
PWC https://paperswithcode.com/paper/applying-universal-dependency-to-the-arapaho
Repo
Framework

Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments

Title Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments
Authors Carlos Ramisch, Silvio Cordeiro, Aline Villavicencio
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1804/
PDF https://www.aclweb.org/anthology/W16-1804
PWC https://paperswithcode.com/paper/filtering-and-measuring-the-intrinsic-quality
Repo
Framework

Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies

Title Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies
Authors Yoshihide Kato, Shigeki Matsubara
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1088/
PDF https://www.aclweb.org/anthology/P16-1088
PWC https://paperswithcode.com/paper/transition-based-left-corner-parsing-for
Repo
Framework

UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels

Title UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels
Authors Radu Tudor Ionescu, Marius Popescu
Abstract The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Unlike the common approach, we present a method that uses only character p-grams (also known as n-grams) as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge. The proposed approach combines several string kernels using multiple kernel learning. In the learning stage, we try both Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR), and we choose KDA as it gives better results in a 10-fold cross-validation carried out on the training set. Our approach is shallow and simple, but the empirical results obtained in the ADI Shared Task prove that it achieves very good results. Indeed, we ranked on the second place with an accuracy of 50.91% and a weighted F1 score of 51.31%. We also present improved results in this paper, which we obtained after the competition ended. Simply by adding more regularization into our model to make it more suitable for test data that comes from a different distribution than training data, we obtain an accuracy of 51.82% and a weighted F1 score of 52.18%. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.
Tasks Text Categorization
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4818/
PDF https://www.aclweb.org/anthology/W16-4818
PWC https://paperswithcode.com/paper/unibuckernel-an-approach-for-arabic-dialect
Repo
Framework
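
As a rough illustration of what a string kernel over character p-grams computes, the sketch below implements a plain p-spectrum kernel in pure Python. It is an assumption-laden simplification, not the UnibucKernel system: the function names are hypothetical, the unweighted average over p = 2, 3, 4 merely stands in for the learned weights of multiple kernel learning, and no KDA or KRR classifier is included.

```python
import math
from collections import Counter


def pgram_counts(text, p):
    """Count all overlapping character p-grams in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """p-spectrum kernel: inner product of the two p-gram count vectors."""
    cs, ct = pgram_counts(s, p), pgram_counts(t, p)
    return sum(count * ct[gram] for gram, count in cs.items())


def normalized_kernel(s, t, p):
    """Spectrum kernel scaled so that k(s, s) = 1 for any non-trivial s."""
    denom = math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


def combined_kernel(s, t, p_values=(2, 3, 4)):
    """Unweighted average of normalized kernels (stand-in for learned weights)."""
    return sum(normalized_kernel(s, t, p) for p in p_values) / len(p_values)
```

A kernel matrix built by calling `combined_kernel` on every pair of training texts could then be handed to any kernel method. Because the features are raw character sequences, nothing here depends on a tokenizer or other language-specific tool, which matches the language-independence claim in the abstract.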

Unshared task: (Dis)agreement in online debates

Title Unshared task: (Dis)agreement in online debates
Authors Maria Skeppstedt, Magnus Sahlgren, Carita Paradis, Andreas Kerren
Abstract
Tasks Argument Mining
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2818/
PDF https://www.aclweb.org/anthology/W16-2818
PWC https://paperswithcode.com/paper/unshared-task-disagreement-in-online-debates
Repo
Framework

Leveraging Inflection Tables for Stemming and Lemmatization.

Title Leveraging Inflection Tables for Stemming and Lemmatization.
Authors Garrett Nicolai, Grzegorz Kondrak
Abstract
Tasks Lemmatization
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1108/
PDF https://www.aclweb.org/anthology/P16-1108
PWC https://paperswithcode.com/paper/leveraging-inflection-tables-for-stemming-and
Repo
Framework

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages

Title Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages
Authors Alan Akbik, Vishwajeet Kumar, Yunyao Li
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1102/
PDF https://www.aclweb.org/anthology/D16-1102
PWC https://paperswithcode.com/paper/towards-semi-automatic-generation-of
Repo
Framework

ASIREM Participation at the Discriminating Similar Languages Shared Task 2016

Title ASIREM Participation at the Discriminating Similar Languages Shared Task 2016
Authors Wafia Adouane, Nasredine Semmar, Richard Johansson
Abstract This paper presents the system built by ASIREM team for the Discriminating between Similar Languages (DSL) Shared task 2016. It describes the system which uses character-based and word-based n-grams separately. ASIREM participated in both sub-tasks (sub-task 1 and sub-task 2) and in both open and closed tracks. For the sub-task 1 which deals with Discriminating between similar languages and national language varieties, the system achieved an accuracy of 87.79% on the closed track, ending up ninth (the best results being 89.38%). In sub-task 2, which deals with Arabic dialect identification, the system achieved its best performance using character-based n-grams (49.67% accuracy), ranking fourth in the closed track (the best result being 51.16%), and an accuracy of 53.18%, ranking first in the open track.
Tasks Language Identification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4821/
PDF https://www.aclweb.org/anthology/W16-4821
PWC https://paperswithcode.com/paper/asirem-participation-at-the-discriminating
Repo
Framework

Accurate Pinyin-English Codeswitched Language Identification

Title Accurate Pinyin-English Codeswitched Language Identification
Authors Meng Xuan Xia, Jackie Chi Kit Cheung
Abstract
Tasks Language Identification
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-5809/
PDF https://www.aclweb.org/anthology/W16-5809
PWC https://paperswithcode.com/paper/accurate-pinyin-english-codeswitched-language
Repo
Framework

Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks

Title Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks
Authors Chinnappa Guggilla
Abstract In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We have participated in the Arabic dialect identification sub-task of DSL 2016 shared task for distinguishing different Arabic language texts under closed submission track. Our proposed approach is language independent and works for discriminating any given set of languages, varieties, and dialects. We have obtained 43.29% weighted-F1 accuracy in this sub-task using CNN approach using default network parameters.
Tasks Information Retrieval, Language Identification, Machine Translation, Speaker Identification, Text Generation
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4824/
PDF https://www.aclweb.org/anthology/W16-4824
PWC https://paperswithcode.com/paper/discrimination-between-similar-languages
Repo
Framework

Beyond Prefix-Based Interactive Translation Prediction

Title Beyond Prefix-Based Interactive Translation Prediction
Authors Jesús González-Rubio, Daniel Ortiz-Martínez, Francisco Casacuberta, José Miguel Benedi Ruiz
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/K16-1020/
PDF https://www.aclweb.org/anthology/K16-1020
PWC https://paperswithcode.com/paper/beyond-prefix-based-interactive-translation
Repo
Framework

Language and Dialect Discrimination Using Compression-Inspired Language Models

Title Language and Dialect Discrimination Using Compression-Inspired Language Models
Authors Paul McNamee
Abstract The DSL 2016 shared task continued previous evaluations from 2014 and 2015 that facilitated the study of automated language and dialect identification. This paper describes results for this year's shared task and from several related experiments conducted at the Johns Hopkins University Human Language Technology Center of Excellence (JHU HLTCOE). Previously the HLTCOE has explored the use of compression-inspired language modeling for language and dialect identification, using news, Wikipedia, blog post, and Twitter corpora. The technique we have relied upon is based on prediction by partial matching (PPM), a state of the art text compression technique. Due to the close relationship between adaptive compression and language modeling, such compression techniques can also be applied to multi-way text classification problems, and previous studies have examined tasks such as authorship attribution, email spam detection, and topical classification. We applied our approach to the multi-class decision that considered each dialect or language as a possibility for the given shared task input line. Results for test-set A were in accord with our expectations, however results for test-sets B and C appear to be markedly worse. We had not anticipated the inclusion of multiple communications in differing languages in test-set B (social media) input lines, and had not expected the test-set C (dialectal Arabic) data to be represented phonetically instead of in native orthography.
Tasks Language Identification, Language Modelling, Text Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4825/
PDF https://www.aclweb.org/anthology/W16-4825
PWC https://paperswithcode.com/paper/language-and-dialect-discrimination-using
Repo
Framework
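
The link between compression and classification described in the abstract above can be sketched in a few lines. Note the substitution: the paper relies on PPM, which is not in the Python standard library, so this sketch uses `zlib` (DEFLATE) purely as a stand-in compressor. The function names and the "extra bytes" scoring rule are assumptions for illustration, not the HLTCOE implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, text: str) -> int:
    """How many more compressed bytes are needed once text follows corpus."""
    base = corpus.encode("utf-8")
    return compressed_size(base + text.encode("utf-8")) - compressed_size(base)


def classify(text, corpora):
    """Pick the label whose training corpus makes text cheapest to encode."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], text))
```

Here `corpora` is a `{label: training_text}` dictionary. A text that resembles a corpus compresses well after it, so the smallest increase in compressed size signals the most likely language. DEFLATE only looks back 32 KiB, so this stand-in is suitable for toy corpora only; an adaptive model such as PPM does not have that window limit.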