Paper Group NANR 4
DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model. Annotating the Little Prince with Chinese AMRs. Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments. Applying Universal Dependency to the Arapaho Language. Filtering …
DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model
Title | DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model |
Authors | Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý |
Abstract | In this paper we investigate two approaches to discriminating between similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word ∣ language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition. |
Tasks | Language Modelling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4815/ |
PWC | https://paperswithcode.com/paper/dsl-shared-task-2016-perfect-is-the-enemy-of |
Repo | |
Framework | |
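The abstract names expectation-maximization for estimating P(word ∣ language) but does not give the exact formulation. The following is only a minimal sketch under an assumption: a textbook semi-supervised multinomial naive Bayes model, in which labelled sentences keep their language, unlabelled sentences receive soft language assignments (E-step), and P(language) and P(word ∣ language) are re-estimated from the resulting soft counts with add-alpha smoothing (M-step). The function names (`m_step`, `posterior`, `em`, `classify`) and the toy data are hypothetical, not the authors' code.

```python
import math
from collections import Counter


def m_step(weighted_docs, languages, vocab, alpha=1.0):
    """M-step: re-estimate P(language) and P(word | language) from soft counts."""
    word_counts = {lang: Counter() for lang in languages}
    doc_mass = Counter()
    for tokens, weights in weighted_docs:
        for lang, w in weights.items():
            doc_mass[lang] += w
            for tok in tokens:
                word_counts[lang][tok] += w
    total_docs = sum(doc_mass.values())
    priors = {
        lang: (doc_mass[lang] + alpha) / (total_docs + alpha * len(languages))
        for lang in languages
    }
    word_probs = {}
    for lang in languages:
        # Add-alpha smoothing over the shared vocabulary, so each row sums to 1.
        denom = sum(word_counts[lang].values()) + alpha * len(vocab)
        word_probs[lang] = {w: (word_counts[lang][w] + alpha) / denom for w in vocab}
    return priors, word_probs


def posterior(tokens, priors, word_probs):
    """E-step for one document: P(language | tokens), skipping unknown tokens."""
    scores = {}
    for lang in priors:
        probs = word_probs[lang]
        scores[lang] = math.log(priors[lang]) + sum(
            math.log(probs[t]) for t in tokens if t in probs
        )
    top = max(scores.values())  # log-sum-exp trick for numerical stability
    exps = {lang: math.exp(s - top) for lang, s in scores.items()}
    z = sum(exps.values())
    return {lang: v / z for lang, v in exps.items()}


def em(labeled, unlabeled, iterations=10, alpha=1.0):
    """labeled: list of (tokens, language); unlabeled: list of token lists."""
    languages = sorted({lang for _, lang in labeled})
    vocab = {t for toks, _ in labeled for t in toks} | {t for toks in unlabeled for t in toks}
    fixed = [(toks, {lang: 1.0}) for toks, lang in labeled]  # labels are never revised
    priors, word_probs = m_step(fixed, languages, vocab, alpha)  # initialise from labelled data
    for _ in range(iterations):
        soft = [(toks, posterior(toks, priors, word_probs)) for toks in unlabeled]  # E-step
        priors, word_probs = m_step(fixed + soft, languages, vocab, alpha)  # M-step
    return priors, word_probs


def classify(tokens, priors, word_probs):
    """Pick the language with the highest posterior probability."""
    post = posterior(tokens, priors, word_probs)
    return max(post, key=post.get)
```

For example, with labelled sentences `"el perro come"` (es) and `"o cão come"` (pt) and the unlabelled sentence `"o cão dorme"`, the E-step assigns the unlabelled sentence mostly to pt, so the M-step moves probability mass for the unseen word `dorme` towards the pt estimate.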
Annotating the Little Prince with Chinese AMRs
Title | Annotating the Little Prince with Chinese AMRs |
Authors | Bin Li, Yuan Wen, Weiguang Qu, Lijun Bu, Nianwen Xue |
Abstract | |
Tasks | Dependency Parsing, Semantic Role Labeling |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1702/ |
PWC | https://paperswithcode.com/paper/annotating-the-little-prince-with-chinese |
Repo | |
Framework | |
Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments
Title | Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments |
Authors | Ariani Di-Felippo, Ani Nenkova |
Abstract | |
Tasks | Sentence Compression |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1717/ |
PWC | https://paperswithcode.com/paper/phrase-generalization-a-corpus-study-in-multi |
Repo | |
Framework | |
Applying Universal Dependency to the Arapaho Language
Title | Applying Universal Dependency to the Arapaho Language |
Authors | Irina Wagner, Andrew Cowell, Jena D. Hwang |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1719/ |
PWC | https://paperswithcode.com/paper/applying-universal-dependency-to-the-arapaho |
Repo | |
Framework | |
Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments
Title | Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments |
Authors | Carlos Ramisch, Silvio Cordeiro, Aline Villavicencio |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1804/ |
PWC | https://paperswithcode.com/paper/filtering-and-measuring-the-intrinsic-quality |
Repo | |
Framework | |
Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies
Title | Transition-Based Left-Corner Parsing for Identifying PTB-Style Nonlocal Dependencies |
Authors | Yoshihide Kato, Shigeki Matsubara |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1088/ |
PWC | https://paperswithcode.com/paper/transition-based-left-corner-parsing-for |
Repo | |
Framework | |
UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels
Title | UnibucKernel: An Approach for Arabic Dialect Identification Based on Multiple String Kernels |
Authors | Radu Tudor Ionescu, Marius Popescu |
Abstract | The most common approach in text-mining classification tasks is to rely on features such as words, part-of-speech tags, stems, or other high-level linguistic features. Unlike the common approach, we present a method that uses only character p-grams (also known as n-grams) as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge. The proposed approach combines several string kernels using multiple kernel learning. In the learning stage, we try both Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR), and we choose KDA because it gives better results in a 10-fold cross-validation carried out on the training set. Our approach is shallow and simple, but the empirical results obtained in the ADI Shared Task show that it is competitive: we ranked second, with an accuracy of 50.91% and a weighted F1 score of 51.31%. We also present improved results, which we obtained after the competition ended. Simply by adding more regularization to our model, making it better suited to test data drawn from a different distribution than the training data, we obtain an accuracy of 51.82% and a weighted F1 score of 52.18%. Furthermore, the proposed approach has an important advantage: it is language independent and linguistic-theory neutral, as it does not require any NLP tools. |
Tasks | Text Categorization |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4818/ |
PWC | https://paperswithcode.com/paper/unibuckernel-an-approach-for-arabic-dialect |
Repo | |
Framework | |
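UnibucKernel combines several string kernels computed over character p-grams. As a minimal sketch under stated assumptions, the code below uses two common string kernels, the p-spectrum kernel (dot product of p-gram counts) and the intersection kernel (sum of minimum p-gram counts), each summed ("blended") over a range of p, normalized, and combined by an unweighted sum. The choice of kernels, the range of p, and the unweighted sum are assumptions; a real multiple kernel learning setup would learn or tune the combination. All function names are hypothetical.

```python
import math
from collections import Counter


def pgram_counts(text, p):
    """Count the character p-grams (n-grams of length p) in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    cs, ct = pgram_counts(s, p), pgram_counts(t, p)
    return sum(c * ct[g] for g, c in cs.items())


def intersection_kernel(s, t, p):
    """Intersection kernel: sum of the minimum counts of shared p-grams."""
    cs, ct = pgram_counts(s, p), pgram_counts(t, p)
    return sum(min(c, ct[g]) for g, c in cs.items())


def normalized_blend(kernel, s, t, p_range):
    """Sum a kernel over p_range, then normalize so k(s, s) == 1."""
    kst = sum(kernel(s, t, p) for p in p_range)
    kss = sum(kernel(s, s, p) for p in p_range)
    ktt = sum(kernel(t, t, p) for p in p_range)
    if kss == 0 or ktt == 0:
        return 0.0
    return kst / math.sqrt(kss * ktt)


def combined_kernel(s, t, p_range=range(3, 6)):
    """Unweighted sum of the normalized blended kernels (an assumption)."""
    return (normalized_blend(spectrum_kernel, s, t, p_range)
            + normalized_blend(intersection_kernel, s, t, p_range))


def gram_matrix(texts, p_range=range(3, 6)):
    """Pairwise kernel values, the input a kernel classifier would consume."""
    return [[combined_kernel(a, b, p_range) for b in texts] for a in texts]
```

The Gram matrix would then be passed to a kernel method such as KDA or KRR, which is not shown here. Normalizing each kernel before summing keeps one kernel from dominating the other merely because its raw values are larger.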
Unshared task: (Dis)agreement in online debates
Title | Unshared task: (Dis)agreement in online debates |
Authors | Maria Skeppstedt, Magnus Sahlgren, Carita Paradis, Andreas Kerren |
Abstract | |
Tasks | Argument Mining |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2818/ |
PWC | https://paperswithcode.com/paper/unshared-task-disagreement-in-online-debates |
Repo | |
Framework | |
Leveraging Inflection Tables for Stemming and Lemmatization.
Title | Leveraging Inflection Tables for Stemming and Lemmatization. |
Authors | Garrett Nicolai, Grzegorz Kondrak |
Abstract | |
Tasks | Lemmatization |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1108/ |
PWC | https://paperswithcode.com/paper/leveraging-inflection-tables-for-stemming-and |
Repo | |
Framework | |
Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages
Title | Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages |
Authors | Alan Akbik, Vishwajeet Kumar, Yunyao Li |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1102/ |
PWC | https://paperswithcode.com/paper/towards-semi-automatic-generation-of |
Repo | |
Framework | |
ASIREM Participation at the Discriminating Similar Languages Shared Task 2016
Title | ASIREM Participation at the Discriminating Similar Languages Shared Task 2016 |
Authors | Wafia Adouane, Nasredine Semmar, Richard Johansson |
Abstract | This paper presents the system built by the ASIREM team for the Discriminating between Similar Languages (DSL) Shared Task 2016. The system uses character-based and word-based n-grams separately. ASIREM participated in both sub-tasks (sub-task 1 and sub-task 2) and in both the open and closed tracks. For sub-task 1, which deals with discriminating between similar languages and national language varieties, the system achieved an accuracy of 87.79% on the closed track, ranking ninth (the best result being 89.38%). In sub-task 2, which deals with Arabic dialect identification, the system achieved its best performance using character-based n-grams (49.67% accuracy), ranking fourth in the closed track (the best result being 51.16%), and achieved an accuracy of 53.18% in the open track, ranking first. |
Tasks | Language Identification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4821/ |
PWC | https://paperswithcode.com/paper/asirem-participation-at-the-discriminating |
Repo | |
Framework | |
Accurate Pinyin-English Codeswitched Language Identification
Title | Accurate Pinyin-English Codeswitched Language Identification |
Authors | Meng Xuan Xia, Jackie Chi Kit Cheung |
Abstract | |
Tasks | Language Identification |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/W16-5809/ |
PWC | https://paperswithcode.com/paper/accurate-pinyin-english-codeswitched-language |
Repo | |
Framework | |
Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks
Title | Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks |
Authors | Chinnappa Guggilla |
Abstract | In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We participated in the Arabic dialect identification sub-task of the DSL 2016 shared task, which involves distinguishing between texts in different Arabic dialects, in the closed submission track. Our proposed approach is language independent and works for discriminating any given set of languages, varieties, and dialects. We obtained a weighted F1 score of 43.29% in this sub-task using the CNN approach with default network parameters. |
Tasks | Information Retrieval, Language Identification, Machine Translation, Speaker Identification, Text Generation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4824/ |
PWC | https://paperswithcode.com/paper/discrimination-between-similar-languages |
Repo | |
Framework | |
Beyond Prefix-Based Interactive Translation Prediction
Title | Beyond Prefix-Based Interactive Translation Prediction |
Authors | Jesús González-Rubio, Daniel Ortiz-Martínez, Francisco Casacuberta, José Miguel Benedi Ruiz |
Abstract | |
Tasks | Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1020/ |
PWC | https://paperswithcode.com/paper/beyond-prefix-based-interactive-translation |
Repo | |
Framework | |
Language and Dialect Discrimination Using Compression-Inspired Language Models
Title | Language and Dialect Discrimination Using Compression-Inspired Language Models |
Authors | Paul McNamee |
Abstract | The DSL 2016 shared task continued previous evaluations from 2014 and 2015 that facilitated the study of automated language and dialect identification. This paper describes results for this year's shared task and from several related experiments conducted at the Johns Hopkins University Human Language Technology Center of Excellence (JHU HLTCOE). The HLTCOE has previously explored compression-inspired language modeling for language and dialect identification, using news, Wikipedia, blog post, and Twitter corpora. The technique we have relied upon is based on prediction by partial matching (PPM), a state-of-the-art text compression technique. Because adaptive compression and language modeling are closely related, such compression techniques can also be applied to multi-way text classification problems, and previous studies have examined tasks such as authorship attribution, email spam detection, and topical classification. We applied our approach to the multi-class decision that considered each dialect or language as a possible label for each shared-task input line. Results for test set A were in line with our expectations; however, results for test sets B and C appear to be markedly worse. We had not anticipated that test set B (social media) input lines would contain multiple communications in differing languages, nor had we expected the test set C (dialectal Arabic) data to be represented phonetically instead of in native orthography. |
Tasks | Language Identification, Language Modelling, Text Classification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4825/ |
PWC | https://paperswithcode.com/paper/language-and-dialect-discrimination-using |
Repo | |
Framework | |
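PPM predicts each byte from the longest previously seen context and escapes to shorter contexts when the byte has not been seen there; a compression-based classifier then picks the class whose model would encode the input in the fewest bits. The sketch below is a simplified, hypothetical version, not the HLTCOE implementation. It assumes PPM method C escape estimates (the escape count equals the number of distinct symbols seen in the context), omits the exclusion mechanism, keeps each class model static after training rather than updating it adaptively while coding, and falls back to a uniform distribution over 256 byte values. The model order (3) and all names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class PPMModel:
    """Simplified byte-level PPM-C model (static, no exclusions)."""

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(Counter)  # context bytes -> counts of next byte

    def train(self, data):
        """Record every (context, next byte) pair for context lengths 0..order."""
        for i in range(len(data)):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[data[i - k:i]][data[i]] += 1

    def symbol_bits(self, history, symbol):
        """Bits needed to code one byte, escaping from long to short contexts."""
        bits = 0.0
        for k in range(min(self.order, len(history)), -1, -1):
            seen = self.counts.get(history[len(history) - k:])
            if not seen:
                continue  # context never observed: fall through at no cost
            total, distinct = sum(seen.values()), len(seen)
            if symbol in seen:
                return bits - math.log2(seen[symbol] / (total + distinct))
            bits -= math.log2(distinct / (total + distinct))  # escape cost
        return bits + math.log2(self.alphabet_size)  # order -1: uniform over bytes

    def code_length(self, data):
        """Total bits to code data under this model."""
        return sum(
            self.symbol_bits(data[max(0, i - self.order):i], data[i])
            for i in range(len(data))
        )


def classify(text, models):
    """Choose the label whose model yields the shortest code length."""
    data = text.encode("utf-8")
    return min(models, key=lambda label: models[label].code_length(data))
```

Because code length is the negative log2 probability of the input under a model, choosing the model with the fewest bits is the same as choosing the class language model with the highest likelihood; this is the link between compression and language modeling that the abstract describes.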