Paper Group NANR 110
CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects
Title | CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects |
Authors | Simon Clematide, Peter Makarov |
Abstract | Our submissions for the GDI 2017 Shared Task are the results from three different types of classifiers: Naïve Bayes, Conditional Random Fields (CRF), and Support Vector Machine (SVM). Our CRF-based run achieves a weighted F1 score of 65% (third rank), 0.9% behind the best system. Measured by classification accuracy, our ensemble run (Naïve Bayes, CRF, SVM) reaches 67% (second rank), 1% below the best system. We also describe our experiments with Recurrent Neural Network (RNN) architectures. Since they performed worse than our non-neural approaches, we did not include them in the submission. |
Tasks | Language Identification, Text Classification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1221/ |
PWC | https://paperswithcode.com/paper/cluzh-at-vardial-gdi-2017-testing-a-variety |
Repo | |
Framework | |
Arabic Dialect Identification Using iVectors and ASR Transcripts
Title | Arabic Dialect Identification Using iVectors and ASR Transcripts |
Authors | Shervin Malmasi, Marcos Zampieri |
Abstract | This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017. The goal of the task is to evaluate computational models to identify the dialect of Arabic utterances using both audio and text transcriptions. The ADI shared task dataset included Modern Standard Arabic (MSA) and four Arabic dialects: Egyptian, Gulf, Levantine, and North-African. The three systems submitted by MAZA are based on combinations of multiple machine learning classifiers arranged as (1) a voting ensemble; (2) a mean probability ensemble; (3) a meta-classifier. The best results were obtained by the meta-classifier, achieving 71.7% accuracy and ranking second among the six teams that participated in the ADI shared task. |
Tasks | Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1222/ |
PWC | https://paperswithcode.com/paper/arabic-dialect-identification-using-ivectors |
Repo | |
Framework | |
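The voting and mean-probability ensembles described in the MAZA abstract above can be sketched as follows. This is an illustrative toy example, not the authors' code: the base classifiers, character features, and utterance snippets are stand-ins.

```python
# Toy sketch of (1) a majority-vote ensemble and (2) a mean-probability
# ensemble over several base classifiers, as in the abstract above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Invented placeholder "transcripts" and dialect labels.
texts = ["halla shlonak", "azayak ya basha", "labas alik", "kifak ya habibi"]
labels = ["gulf", "egyptian", "north-african", "levantine"]

X = TfidfVectorizer(analyzer="char", ngram_range=(1, 3)).fit_transform(texts)
models = [MultinomialNB(), LogisticRegression(), DecisionTreeClassifier()]
probas = [m.fit(X, labels).predict_proba(X) for m in models]
classes = models[0].classes_  # sklearn sorts classes identically per model

# (1) Majority vote over each model's hard predictions.
votes = np.array([p.argmax(axis=1) for p in probas])  # (n_models, n_samples)
vote_pred = [classes[np.bincount(col).argmax()] for col in votes.T]

# (2) Average the posterior probabilities, then take the argmax.
mean_pred = classes[np.mean(probas, axis=0).argmax(axis=1)]
```

A meta-classifier (the authors' best run) would instead stack the per-model probabilities as features for a second-level learner rather than averaging them.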
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Title | Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis |
Authors | |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5200/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-8th-workshop-on |
Repo | |
Framework | |
Compositionality for perceptual classification
Title | Compositionality for perceptual classification |
Authors | Staffan Larsson |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-6923/ |
PWC | https://paperswithcode.com/paper/compositionality-for-perceptual |
Repo | |
Framework | |
Exploring Lexical and Syntactic Features for Language Variety Identification
Title | Exploring Lexical and Syntactic Features for Language Variety Identification |
Authors | Chris van der Lee, Antal van den Bosch |
Abstract | We present a method to discriminate between texts written in either the Netherlandic or the Flemish variant of the Dutch language. The method draws on a feature bundle representing text statistics, syntactic features, and word n-grams. Text statistics include average word length and sentence length, while syntactic features include ratios of function words and part-of-speech n-grams. The effectiveness of the classifier was measured by classifying Dutch subtitles developed for either Dutch or Flemish television. Several machine learning algorithms were compared, as well as feature combination methods, in order to find the optimal generalization performance. A machine-learning meta classifier based on AdaBoost attained the best F-score of 0.92. |
Tasks | Language Identification, Text Classification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1224/ |
PWC | https://paperswithcode.com/paper/exploring-lexical-and-syntactic-features-for |
Repo | |
Framework | |
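The feature-bundle-plus-AdaBoost setup described in the abstract above can be sketched roughly as follows. This is a toy illustration under assumed details, not the paper's system: the example subtitles, the restriction to word n-grams, and the pipeline choices are invented.

```python
# Toy sketch: word-n-gram features feeding an AdaBoost classifier to
# separate Netherlandic from Flemish Dutch, loosely following the abstract.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Invented placeholder subtitles and variety labels.
subtitles = ["dat is echt leuk zeg", "da's keiplezant hoor",
             "wat een mooie dag vandaag", "amai wat een schoon weer"]
variety = ["netherlandic", "flemish", "netherlandic", "flemish"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # word unigrams and bigrams
    AdaBoostClassifier(n_estimators=50),
)
clf.fit(subtitles, variety)
pred = clf.predict(["wat een leuke dag"])
```

The paper's full feature bundle additionally includes text statistics (average word and sentence length) and syntactic ratios, which would be concatenated with the n-gram counts before boosting.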
Learning to Identify Arabic and German Dialects using Multiple Kernels
Title | Learning to Identify Arabic and German Dialects using Multiple Kernels |
Authors | Radu Tudor Ionescu, Andrei Butnaru |
Abstract | We present a machine learning approach for the Arabic Dialect Identification (ADI) and the German Dialect Identification (GDI) Closed Shared Tasks of the DSL 2017 Challenge. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided only for the Arabic data. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Our approach is shallow and simple, but the empirical results obtained in the shared tasks prove that it achieves very good results. Indeed, we ranked first in the ADI Shared Task with a weighted F1 score of 76.32% (4.62% above the second place) and fifth in the GDI Shared Task with a weighted F1 score of 63.67% (2.57% below the first place). |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1225/ |
PWC | https://paperswithcode.com/paper/learning-to-identify-arabic-and-german |
Repo | |
Framework | |
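The character p-gram kernels and their combination described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the spectrum-style kernel construction, the unweighted kernel sum, and the toy transcripts are assumptions, and sklearn's Kernel Ridge Regression stands in for the KRR stage.

```python
# Toy sketch: linear kernels over character p-gram counts, combined by
# summation, then Kernel Ridge Regression on the precomputed Gram matrix.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.kernel_ridge import KernelRidge

# Invented placeholder "transcripts" with a +1/-1 dialect encoding.
train = ["grüezi mitenand", "moin moin", "servus beinand", "grüessech wohl"]
y = np.array([1.0, -1.0, -1.0, 1.0])

def pgram_kernel(docs, p):
    """Linear kernel over character p-gram counts (a 'spectrum' kernel)."""
    counts = CountVectorizer(analyzer="char", ngram_range=(p, p)).fit_transform(docs)
    return (counts @ counts.T).toarray().astype(float)

# Multiple-kernel combination: here a simple unweighted sum of p-gram kernels.
K = sum(pgram_kernel(train, p) for p in (2, 3, 4))

krr = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, y)
pred = np.sign(krr.predict(K))  # binary decision from the regression output
```

At test time, the Gram matrix between test and training transcripts would be passed to `predict` instead; the paper additionally learns kernel weights rather than summing uniformly.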
LIPN at SemEval-2017 Task 10: Filtering Candidate Keyphrases from Scientific Publications with Part-of-Speech Tag Sequences to Train a Sequence Labeling Model
Title | LIPN at SemEval-2017 Task 10: Filtering Candidate Keyphrases from Scientific Publications with Part-of-Speech Tag Sequences to Train a Sequence Labeling Model |
Authors | Simon David Hernandez, Davide Buscaldi, Thierry Charnois |
Abstract | This paper describes the system used by the team LIPN in SemEval 2017 Task 10: Extracting Keyphrases and Relations from Scientific Publications. The team participated in Scenario 1, which includes three subtasks: identification of keyphrases (Subtask A), classification of identified keyphrases (Subtask B), and extraction of relationships between two identified keyphrases (Subtask C). The presented system was mainly focused on the use of part-of-speech tag sequences to filter candidate keyphrases for Subtask A. Subtasks A and B were addressed as a sequence labeling problem using Conditional Random Fields (CRFs), and even though Subtask C was out of the scope of this approach, one rule was included to identify synonyms. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2174/ |
PWC | https://paperswithcode.com/paper/lipn-at-semeval-2017-task-10-filtering |
Repo | |
Framework | |
IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network
Title | IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network |
Authors | Deepak Gupta, Pabitra Lenka, Harsimran Bedi, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | Analyzing customer feedback is the best way to channel the data into new marketing strategies that benefit entrepreneurs as well as customers. Therefore, an automated system that can analyze customer behavior is in great demand. Users may write feedback in any language, and hence mining appropriate information often becomes intractable. Especially in a traditional feature-based supervised model, it is difficult to build a generic system, as one has to understand the concerned language to find the relevant features. In order to overcome this, we propose deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not require handcrafting of features. We evaluate these techniques for analyzing customer feedback sentences in four languages, namely English, French, Japanese, and Spanish. Our empirical analysis shows that our models perform well in all four languages on the setups of the IJCNLP Shared Task on Customer Feedback Analysis. Our model achieved the second rank in French, with an accuracy of 71.75%, and third rank in all the other languages. |
Tasks | Document Classification, Emotion Classification, Sentiment Analysis |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/I17-4031/ |
PWC | https://paperswithcode.com/paper/iitp-at-ijcnlp-2017-task-4-auto-analysis-of |
Repo | |
Framework | |
A Neural Architecture for Dialectal Arabic Segmentation
Title | A Neural Architecture for Dialectal Arabic Segmentation |
Authors | Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish |
Abstract | The automated processing of Arabic dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general. Segmentation of words into their constituent parts is an important processing building block. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks, without any normalization or use of lexical features or lexical resources. We treat segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that rely on additional resources. |
Tasks | Machine Translation, Morphological Analysis, Part-Of-Speech Tagging |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1306/ |
PWC | https://paperswithcode.com/paper/a-neural-architecture-for-dialectal-arabic |
Repo | |
Framework | |
A Morphological Analyzer for Gulf Arabic Verbs
Title | A Morphological Analyzer for Gulf Arabic Verbs |
Authors | Salam Khalifa, Sara Hassan, Nizar Habash |
Abstract | We present CALIMA-GLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas. We describe in detail the process of building the analyzer, starting from phonetic dictionary entries to fully inflected orthographic paradigms and the associated lexicon and orthographic variants. We evaluate the coverage of CALIMA-GLF against Modern Standard Arabic and Egyptian Arabic analyzers on part of a Gulf Arabic novel. CALIMA-GLF verb analysis token recall for identifying the correct POS tag outperforms both the Modern Standard Arabic and Egyptian Arabic analyzers by over 27.4% and 16.9% absolute, respectively. |
Tasks | Morphological Tagging, Part-Of-Speech Tagging |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1305/ |
PWC | https://paperswithcode.com/paper/a-morphological-analyzer-for-gulf-arabic |
Repo | |
Framework | |
Unsupervised Domain Adaptation for Clinical Negation Detection
Title | Unsupervised Domain Adaptation for Clinical Negation Detection |
Authors | Timothy Miller, Steven Bethard, Hadi Amiri, Guergana Savova |
Abstract | Detecting negated concepts in clinical texts is an important part of NLP information extraction systems. However, generalizability of negation systems is lacking, as cross-domain experiments suffer dramatic performance losses. We examine the performance of multiple unsupervised domain adaptation algorithms on clinical negation detection, finding only modest gains that fall well short of in-domain performance. |
Tasks | Domain Adaptation, Negation Detection, Unsupervised Domain Adaptation |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2320/ |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-clinical |
Repo | |
Framework | |
Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach
Title | Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach |
Authors | Fahad Albogamy, Allan Ramsay, Hanady Ahmed |
Abstract | In this paper, we propose using a "bootstrapping" method for constructing a dependency treebank of Arabic tweets. This method uses a rule-based parser to create a small treebank of one thousand Arabic tweets and a data-driven parser to create a larger treebank by using the small treebank as a seed training set. We are able to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve the speed of training the parser and the accuracy of the resulting parsers. |
Tasks | Domain Adaptation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1312/ |
PWC | https://paperswithcode.com/paper/arabic-tweets-treebanking-and-parsing-a |
Repo | |
Framework | |
Post-Processing Techniques for Improving Predictions of Multilabel Learning Approaches
Title | Post-Processing Techniques for Improving Predictions of Multilabel Learning Approaches |
Authors | Akshay Soni, Aasish Pappu, Jerry Chia-mau Ni, Troy Chevalier |
Abstract | In Multilabel Learning (MLL), each training instance is associated with a set of labels, and the task is to learn a function that maps an unseen instance to its corresponding label set. In this paper, we present a suite of MLL-algorithm-independent post-processing techniques that exploit conditional and directional label dependencies in order to make the predictions from any MLL approach more coherent and precise. We solve a constrained optimization problem over the output produced by any MLL approach, and the result is a refined version of the input predicted label set. Using the proposed techniques, we show an absolute improvement of 3% on the English News and 10% on the Chinese E-commerce datasets for the P@K metric. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2011/ |
PWC | https://paperswithcode.com/paper/post-processing-techniques-for-improving |
Repo | |
Framework | |
Robust Dictionary Lookup in Multiple Noisy Orthographies
Title | Robust Dictionary Lookup in Multiple Noisy Orthographies |
Authors | Lingliang Zhang, Nizar Habash, Godfried Toussaint |
Abstract | We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate's "did you mean" feature, as well as the Yamli smart Arabic keyboard. |
Tasks | Transliteration |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1315/ |
PWC | https://paperswithcode.com/paper/robust-dictionary-lookup-in-multiple-noisy |
Repo | |
Framework | |
ULISBOA at SemEval-2017 Task 12: Extraction and classification of temporal expressions and events
Title | ULISBOA at SemEval-2017 Task 12: Extraction and classification of temporal expressions and events |
Authors | Andre Lamurias, Diana Sousa, Sofia Pereira, Luka Clarke, Francisco M. Couto |
Abstract | This paper presents our approach to participating in the SemEval 2017 Task 12: Clinical TempEval challenge, specifically in the event and time expression span and attribute identification subtasks (ES, EA, TS, TA). Our approach consisted of training Conditional Random Fields (CRF) classifiers using the provided annotations, and creating manually curated rules to classify the attributes of each event and time expression. We used a set of common features for the event and time CRF classifiers, and a set of features specific to each type of entity, based on domain knowledge. Training only on the source domain data, our best F-scores were 0.683 and 0.485 for the event and time span identification subtasks. When adding target domain annotations to the training data, the best F-scores obtained were 0.729 and 0.554 for the same subtasks. We obtained the second-highest F-score of the challenge on the event polarity subtask (0.708). The source code of our system, Clinical Timeline Annotation (CiTA), is available at https://github.com/lasigeBioTM/CiTA. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2179/ |
PWC | https://paperswithcode.com/paper/ulisboa-at-semeval-2017-task-12-extraction |
Repo | |
Framework | |