May 5, 2019

Paper Group NANR 4

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model. Annotating the Little Prince with Chinese AMRs. Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments. Applying Universal Dependency to the Arapaho Language. Filtering …

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model


Title	DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model
Authors	Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý
Abstract	In this paper we investigate two approaches to discrimination of similar languages: Expectation–maximization algorithm for estimating conditional probability P(word|language) and byte level language models similar to compression-based language modelling methods. The accuracy of these methods reached respectively 86.6% and 88.3% on set A of the DSL Shared task 2016 competition.
Tasks	Language Modelling
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4815/
PDF	https://www.aclweb.org/anthology/W16-4815
PWC	https://paperswithcode.com/paper/dsl-shared-task-2016-perfect-is-the-enemy-of
Repo
Framework

Annotating the Little Prince with Chinese AMRs


Title	Annotating the Little Prince with Chinese AMRs
Authors	Bin Li, Yuan Wen, Weiguang Qu, Lijun Bu, Nianwen Xue
Abstract
Tasks	Dependency Parsing, Semantic Role Labeling
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1702/
PDF	https://www.aclweb.org/anthology/W16-1702
PWC	https://paperswithcode.com/paper/annotating-the-little-prince-with-chinese
Repo
Framework

Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments


Title	Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments
Authors	Ariani Di-Felippo, Ani Nenkova
Abstract
Tasks	Sentence Compression
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1717/
PDF	https://www.aclweb.org/anthology/W16-1717
PWC	https://paperswithcode.com/paper/phrase-generalization-a-corpus-study-in-multi
Repo
Framework

Applying Universal Dependency to the Arapaho Language


Title	Applying Universal Dependency to the Arapaho Language
Authors	Irina Wagner, Andrew Cowell, Jena D. Hwang
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1719/
PDF	https://www.aclweb.org/anthology/W16-1719
PWC	https://paperswithcode.com/paper/applying-universal-dependency-to-the-arapaho
Repo
Framework

Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments


Title	Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments
Authors	Carlos Ramisch, Silvio Cordeiro, Aline Villavicencio
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1804/
PDF	https://www.aclweb.org/anthology/W16-1804
PWC	https://paperswithcode.com/paper/filtering-and-measuring-the-intrinsic-quality
Repo
Framework

Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies


Title	Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies
Authors	Yoshihide Kato, Shigeki Matsubara
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1088/
PDF	https://www.aclweb.org/anthology/P16-1088
PWC	https://paperswithcode.com/paper/transition-based-left-corner-parsing-for
Repo
Framework

UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels


Title	UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels
Authors	Radu Tudor Ionescu, Marius Popescu
Abstract	The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Unlike the common approach, we present a method that uses only character p-grams (also known as n-grams) as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge. The proposed approach combines several string kernels using multiple kernel learning. In the learning stage, we try both Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR), and we choose KDA as it gives better results in a 10-fold cross-validation carried out on the training set. Our approach is shallow and simple, but the empirical results obtained in the ADI Shared Task prove that it achieves very good results. Indeed, we ranked on the second place with an accuracy of 50.91% and a weighted F1 score of 51.31%. We also present improved results in this paper, which we obtained after the competition ended. Simply by adding more regularization into our model to make it more suitable for test data that comes from a different distribution than training data, we obtain an accuracy of 51.82% and a weighted F1 score of 52.18%. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.
Tasks	Text Categorization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4818/
PDF	https://www.aclweb.org/anthology/W16-4818
PWC	https://paperswithcode.com/paper/unibuckernel-an-approach-for-arabic-dialect
Repo
Framework
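
The character p-gram idea in the abstract above can be illustrated with a small sketch. It assumes a plain p-spectrum kernel (the dot product of p-gram count vectors) and an unweighted average over a few values of p; the abstract does not say which string kernels or combination weights the authors used, so the function names and the choice of p = 2, 3, 4 are illustrative assumptions only, and the KDA/KRR learning stage is not shown.

```python
import math
from collections import Counter


def char_pgrams(text, p):
    """Count every contiguous substring of length p in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a, b, p):
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    ca, cb = char_pgrams(a, p), char_pgrams(b, p)
    # Iterate over the smaller profile; Counter returns 0 for missing keys.
    if len(ca) > len(cb):
        ca, cb = cb, ca
    return sum(n * cb[g] for g, n in ca.items())


def normalized_kernel(a, b, p):
    """Cosine-style normalization so that k(x, x) == 1 for non-empty profiles."""
    kab = spectrum_kernel(a, b, p)
    if kab == 0:
        return 0.0
    return kab / math.sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))


def combined_kernel(a, b, ps=(2, 3, 4)):
    """Unweighted average of normalized kernels (an assumed combination rule)."""
    return sum(normalized_kernel(a, b, p) for p in ps) / len(ps)


if __name__ == "__main__":
    print(combined_kernel("hello", "help"))
    print(combined_kernel("hello", "world"))
```

A kernel matrix built this way over all training texts could then be handed to any kernel method that accepts precomputed kernels; no tokenizer or other NLP tool is involved, which is the language-independence point the abstract makes.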

Unshared task: (Dis)agreement in online debates


Title	Unshared task: (Dis)agreement in online debates
Authors	Maria Skeppstedt, Magnus Sahlgren, Carita Paradis, Andreas Kerren
Abstract
Tasks	Argument Mining
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2818/
PDF	https://www.aclweb.org/anthology/W16-2818
PWC	https://paperswithcode.com/paper/unshared-task-disagreement-in-online-debates
Repo
Framework

Leveraging Inflection Tables for Stemming and Lemmatization.


Title	Leveraging Inflection Tables for Stemming and Lemmatization.
Authors	Garrett Nicolai, Grzegorz Kondrak
Abstract
Tasks	Lemmatization
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1108/
PDF	https://www.aclweb.org/anthology/P16-1108
PWC	https://paperswithcode.com/paper/leveraging-inflection-tables-for-stemming-and
Repo
Framework

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages


Title	Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages
Authors	Alan Akbik, Vishwajeet Kumar, Yunyao Li
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1102/
PDF	https://www.aclweb.org/anthology/D16-1102
PWC	https://paperswithcode.com/paper/towards-semi-automatic-generation-of
Repo
Framework

ASIREM Participation at the Discriminating Similar Languages Shared Task 2016


Title	ASIREM Participation at the Discriminating Similar Languages Shared Task 2016
Authors	Wafia Adouane, Nasredine Semmar, Richard Johansson
Abstract	This paper presents the system built by ASIREM team for the Discriminating between Similar Languages (DSL) Shared task 2016. It describes the system which uses character-based and word-based n-grams separately. ASIREM participated in both sub-tasks (sub-task 1 and sub-task 2) and in both open and closed tracks. For the sub-task 1 which deals with Discriminating between similar languages and national language varieties, the system achieved an accuracy of 87.79% on the closed track, ending up ninth (the best results being 89.38%). In sub-task 2, which deals with Arabic dialect identification, the system achieved its best performance using character-based n-grams (49.67% accuracy), ranking fourth in the closed track (the best result being 51.16%), and an accuracy of 53.18%, ranking first in the open track.
Tasks	Language Identification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4821/
PDF	https://www.aclweb.org/anthology/W16-4821
PWC	https://paperswithcode.com/paper/asirem-participation-at-the-discriminating
Repo
Framework

Accurate Pinyin-English Codeswitched Language Identification


Title	Accurate Pinyin-English Codeswitched Language Identification
Authors	Meng Xuan Xia, Jackie Chi Kit Cheung
Abstract
Tasks	Language Identification
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-5809/
PDF	https://www.aclweb.org/anthology/W16-5809
PWC	https://paperswithcode.com/paper/accurate-pinyin-english-codeswitched-language
Repo
Framework

Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks


Title	Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks
Authors	Chinnappa Guggilla
Abstract	In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We have participated in the Arabic dialect identification sub-task of DSL 2016 shared task for distinguishing different Arabic language texts under closed submission track. Our proposed approach is language independent and works for discriminating any given set of languages, varieties, and dialects. We have obtained 43.29% weighted-F1 accuracy in this sub-task using CNN approach using default network parameters.
Tasks	Information Retrieval, Language Identification, Machine Translation, Speaker Identification, Text Generation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4824/
PDF	https://www.aclweb.org/anthology/W16-4824
PWC	https://paperswithcode.com/paper/discrimination-between-similar-languages
Repo
Framework

Beyond Prefix-Based Interactive Translation Prediction


Title	Beyond Prefix-Based Interactive Translation Prediction
Authors	Jesús González-Rubio, Daniel Ortiz-Martínez, Francisco Casacuberta, José Miguel Benedi Ruiz
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/K16-1020/
PDF	https://www.aclweb.org/anthology/K16-1020
PWC	https://paperswithcode.com/paper/beyond-prefix-based-interactive-translation
Repo
Framework

Language and Dialect Discrimination Using Compression-Inspired Language Models


Title	Language and Dialect Discrimination Using Compression-Inspired Language Models
Authors	Paul McNamee
Abstract	The DSL 2016 shared task continued previous evaluations from 2014 and 2015 that facilitated the study of automated language and dialect identification. This paper describes results for this year's shared task and from several related experiments conducted at the Johns Hopkins University Human Language Technology Center of Excellence (JHU HLTCOE). Previously the HLTCOE has explored the use of compression-inspired language modeling for language and dialect identification, using news, Wikipedia, blog post, and Twitter corpora. The technique we have relied upon is based on prediction by partial matching (PPM), a state of the art text compression technique. Due to the close relationship between adaptive compression and language modeling, such compression techniques can also be applied to multi-way text classification problems, and previous studies have examined tasks such as authorship attribution, email spam detection, and topical classification. We applied our approach to the multi-class decision that considered each dialect or language as a possibility for the given shared task input line. Results for test-set A were in accord with our expectations, however results for test-sets B and C appear to be markedly worse. We had not anticipated the inclusion of multiple communications in differing languages in test-set B (social media) input lines, and had not expected the test-set C (dialectal Arabic) data to be represented phonetically instead of in native orthography.
Tasks	Language Identification, Language Modelling, Text Classification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4825/
PDF	https://www.aclweb.org/anthology/W16-4825
PWC	https://paperswithcode.com/paper/language-and-dialect-discrimination-using
Repo
Framework
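
The link between compression and classification described in the abstract above can be sketched as follows: train one byte-level model per language, then assign an input line to the language whose model would encode it in the fewest bits. This is a deliberately simplified stand-in, assuming a fixed-order model with additive smoothing rather than PPM (which blends several context orders through escape probabilities); the class name, the order, the smoothing constant, and the toy training strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class ByteNgramModel:
    """Fixed-order byte model with additive smoothing (a simplification, not PPM)."""

    def __init__(self, order=3, alpha=0.5):
        self.order = order
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # context bytes -> next-byte counts

    def train(self, text):
        data = text.encode("utf-8")
        for i in range(len(data)):
            context = data[max(0, i - self.order):i]
            self.counts[context][data[i]] += 1

    def code_length(self, text):
        """Bits needed to encode text under this model; lower means a better fit."""
        data = text.encode("utf-8")
        bits = 0.0
        for i in range(len(data)):
            context = data[max(0, i - self.order):i]
            seen = self.counts.get(context)  # .get avoids creating empty entries
            count = seen[data[i]] if seen else 0
            total = sum(seen.values()) if seen else 0
            prob = (count + self.alpha) / (total + self.alpha * 256)
            bits -= math.log2(prob)
        return bits


def classify(line, models):
    """Pick the label whose model gives the shortest code length for the line."""
    return min(models, key=lambda label: models[label].code_length(line))


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    corpora = {
        "en": "the cat sat on the mat. the dog ate the bone. " * 20,
        "de": "der hund sitzt auf der matte. die katze isst den fisch. " * 20,
    }
    models = {}
    for label, text in corpora.items():
        models[label] = ByteNgramModel()
        models[label].train(text)
    print(classify("the dog sat on the mat", models))
    print(classify("die katze sitzt auf der matte", models))
```

Because the decision is a minimum over per-language code lengths, the same structure covers the multi-class setting in the abstract, where each language or dialect is scored as a candidate for every input line.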