July 26, 2019


Paper Group NANR 110



CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects

Title CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects
Authors Simon Clematide, Peter Makarov
Abstract Our submissions for the GDI 2017 Shared Task are the results from three different types of classifiers: Naïve Bayes, Conditional Random Fields (CRF), and Support Vector Machines (SVM). Our CRF-based run achieves a weighted F1 score of 65% (third rank), 0.9% below the best system. Measured by classification accuracy, our ensemble run (Naïve Bayes, CRF, SVM) reaches 67% (second rank), 1% lower than the best system. We also describe our experiments with Recurrent Neural Network (RNN) architectures. Since they performed worse than our non-neural approaches, we did not include them in the submission.
Tasks Language Identification, Text Classification
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1221/
PDF https://www.aclweb.org/anthology/W17-1221
PWC https://paperswithcode.com/paper/cluzh-at-vardial-gdi-2017-testing-a-variety
Repo
Framework
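The ensemble run above combines Naïve Bayes, CRF, and SVM label predictions. As an illustration (not the authors' code), a hard majority-vote combiner over aligned per-classifier prediction lists can be sketched in Python:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label predictions by plurality vote.

    predictions: list of lists, one inner list of labels per classifier,
    all aligned over the same test instances.
    """
    n_instances = len(predictions[0])
    combined = []
    for i in range(n_instances):
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        # most_common(1) breaks ties by first-seen order
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical classifiers predicting Swiss German dialect labels
nb  = ["BE", "ZH", "LU", "BS"]
crf = ["BE", "ZH", "ZH", "BS"]
svm = ["BE", "LU", "ZH", "BE"]
print(majority_vote([nb, crf, svm]))  # ['BE', 'ZH', 'ZH', 'BS']
```

With probabilistic classifiers one could instead average class probabilities (a soft-voting ensemble).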

Arabic Dialect Identification Using iVectors and ASR Transcripts

Title Arabic Dialect Identification Using iVectors and ASR Transcripts
Authors Shervin Malmasi, Marcos Zampieri
Abstract This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017. The goal of the task is to evaluate computational models to identify the dialect of Arabic utterances using both audio and text transcriptions. The ADI shared task dataset included Modern Standard Arabic (MSA) and four Arabic dialects: Egyptian, Gulf, Levantine, and North-African. The three systems submitted by MAZA are based on combinations of multiple machine learning classifiers arranged as (1) a voting ensemble; (2) a mean probability ensemble; (3) a meta-classifier. The best results were obtained by the meta-classifier, which achieved 71.7% accuracy, ranking second among the six teams that participated in the ADI shared task.
Tasks Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1222/
PDF https://www.aclweb.org/anthology/W17-1222
PWC https://paperswithcode.com/paper/arabic-dialect-identification-using-ivectors
Repo
Framework
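Of the three combiners named above, the mean probability ensemble is the easiest to sketch: average each classifier's per-class probabilities and take the argmax. A minimal illustration (not the authors' implementation, and with hypothetical class labels):

```python
def mean_probability_ensemble(prob_lists):
    """Average per-class probabilities from several classifiers and
    take the argmax class for each instance.

    prob_lists: one list per classifier; each inner list holds one
    {class: probability} dict per test instance, all aligned.
    """
    n_instances = len(prob_lists[0])
    labels = []
    for i in range(n_instances):
        classes = prob_lists[0][i].keys()
        avg = {c: sum(clf[i].get(c, 0.0) for clf in prob_lists) / len(prob_lists)
               for c in classes}
        labels.append(max(avg, key=avg.get))
    return labels

# Two hypothetical classifiers scoring one utterance over three classes
svm_probs = [{"MSA": 0.5, "EGY": 0.3, "GLF": 0.2}]
nb_probs  = [{"MSA": 0.2, "EGY": 0.5, "GLF": 0.3}]
print(mean_probability_ensemble([svm_probs, nb_probs]))  # ['EGY']
```

A meta-classifier goes one step further and treats the stacked probability vectors as features for a second-level learner.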

Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Title Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Authors
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5200/
PDF https://www.aclweb.org/anthology/W17-5200
PWC https://paperswithcode.com/paper/proceedings-of-the-8th-workshop-on
Repo
Framework

Compositionality for perceptual classification

Title Compositionality for perceptual classification
Authors Staffan Larsson
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6923/
PDF https://www.aclweb.org/anthology/W17-6923
PWC https://paperswithcode.com/paper/compositionality-for-perceptual
Repo
Framework

Exploring Lexical and Syntactic Features for Language Variety Identification

Title Exploring Lexical and Syntactic Features for Language Variety Identification
Authors Chris van der Lee, Antal van den Bosch
Abstract We present a method to discriminate between texts written in either the Netherlandic or the Flemish variant of the Dutch language. The method draws on a feature bundle representing text statistics, syntactic features, and word n-grams. Text statistics include average word length and sentence length, while syntactic features include ratios of function words and part-of-speech n-grams. The effectiveness of the classifier was measured by classifying Dutch subtitles developed for either Dutch or Flemish television. Several machine learning algorithms and feature combination methods were compared in order to find the best generalization performance. A machine-learning meta-classifier based on AdaBoost attained the best F-score of 0.92.
Tasks Language Identification, Text Classification
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1224/
PDF https://www.aclweb.org/anthology/W17-1224
PWC https://paperswithcode.com/paper/exploring-lexical-and-syntactic-features-for
Repo
Framework
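The text-statistics features described above (average word length, average sentence length, function-word ratio) are simple surface counts. A minimal sketch, using an illustrative Dutch function-word list rather than the authors' actual resources:

```python
import re

# Illustrative function-word set, NOT the list used in the paper
FUNCTION_WORDS = {"de", "het", "een", "en", "van", "in"}

def text_statistics(text, function_words=FUNCTION_WORDS):
    """Compute three surface features of a text: average word length,
    average sentence length (in words), and function-word ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "avg_sent_len": len(words) / len(sentences),
        "function_word_ratio": sum(w in function_words for w in words) / len(words),
    }

print(text_statistics("de kat zit. de hond slaapt."))
```

In the paper these statistics are only one part of a larger bundle that also includes word and POS n-grams.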

Learning to Identify Arabic and German Dialects using Multiple Kernels

Title Learning to Identify Arabic and German Dialects using Multiple Kernels
Authors Radu Tudor Ionescu, Andrei Butnaru
Abstract We present a machine learning approach for the Arabic Dialect Identification (ADI) and the German Dialect Identification (GDI) Closed Shared Tasks of the DSL 2017 Challenge. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided only for the Arabic data. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Our approach is shallow and simple, but the empirical results obtained in the shared tasks show that it achieves very good performance. Indeed, we ranked first in the ADI Shared Task with a weighted F1 score of 76.32% (4.62% above the second place) and fifth in the GDI Shared Task with a weighted F1 score of 63.67% (2.57% below the first place).
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1225/
PDF https://www.aclweb.org/anthology/W17-1225
PWC https://paperswithcode.com/paper/learning-to-identify-arabic-and-german
Repo
Framework
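A character p-gram kernel of the kind described compares two transcripts by counting their shared p-grams (the spectrum kernel); summing such kernels over several values of p is one simple way to combine kernels. A sketch, not the authors' implementation:

```python
from collections import Counter

def pgram_kernel(a, b, p=3):
    """Spectrum kernel: the number of shared character p-grams,
    counted with multiplicity, between two strings."""
    ca = Counter(a[i:i + p] for i in range(len(a) - p + 1))
    cb = Counter(b[i:i + p] for i in range(len(b) - p + 1))
    return sum(ca[g] * cb[g] for g in ca if g in cb)

def combined_kernel(a, b, p_values=(2, 3, 4, 5)):
    """Unweighted sum of spectrum kernels over several p-gram sizes;
    multiple kernel learning would instead learn the weights."""
    return sum(pgram_kernel(a, b, p) for p in p_values)

print(pgram_kernel("banana", "bandana", 3))  # 3 (ban once, ana twice)
```

In the paper, the resulting kernel matrices are fed to KDA or KRR; the i-vector kernel for the Arabic audio is combined in the same way.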

LIPN at SemEval-2017 Task 10: Filtering Candidate Keyphrases from Scientific Publications with Part-of-Speech Tag Sequences to Train a Sequence Labeling Model

Title LIPN at SemEval-2017 Task 10: Filtering Candidate Keyphrases from Scientific Publications with Part-of-Speech Tag Sequences to Train a Sequence Labeling Model
Authors Simon David Hernandez, Davide Buscaldi, Thierry Charnois
Abstract This paper describes the system used by the team LIPN in SemEval 2017 Task 10: Extracting Keyphrases and Relations from Scientific Publications. The team participated in Scenario 1, that includes three subtasks, Identification of keyphrases (Subtask A), Classification of identified keyphrases (Subtask B) and Extraction of relationships between two identified keyphrases (Subtask C). The presented system was mainly focused on the use of part-of-speech tag sequences to filter candidate keyphrases for Subtask A. Subtasks A and B were addressed as a sequence labeling problem using Conditional Random Fields (CRFs) and even though Subtask C was out of the scope of this approach, one rule was included to identify synonyms.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2174/
PDF https://www.aclweb.org/anthology/S17-2174
PWC https://paperswithcode.com/paper/lipn-at-semeval-2017-task-10-filtering
Repo
Framework
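Filtering keyphrase candidates by their POS tag sequences can be done with a regular expression over the joined tag string. The pattern below (optional adjectives followed by one or more nouns) is a hypothetical example of such a filter, not the one curated in the paper:

```python
import re

# Hypothetical keyphrase-like shape: zero or more adjectives, then nouns
CANDIDATE = re.compile(r"^(JJ )*NNS?( NNS?)*$")

def is_candidate(pos_tags):
    """Return True if a token span's POS tag sequence matches the
    keyphrase-like pattern above."""
    return bool(CANDIDATE.match(" ".join(pos_tags)))

print(is_candidate(["JJ", "NN", "NNS"]))  # True: "novel gradient methods"
print(is_candidate(["DT", "NN"]))         # False: starts with a determiner
```

Spans passing the filter would then be handed to the CRF sequence labeler for Subtasks A and B.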

IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network

Title IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network
Authors Deepak Gupta, Pabitra Lenka, Harsimran Bedi, Asif Ekbal, Pushpak Bhattacharyya
Abstract Analyzing customer feedback is the best way to channel the data into new marketing strategies that benefit entrepreneurs as well as customers. Therefore, an automated system that can analyze customer behavior is in great demand. Users may write feedback in any language, and hence mining appropriate information often becomes intractable. Especially in a traditional feature-based supervised model, it is difficult to build a generic system, as one has to understand the language concerned to find the relevant features. To overcome this, we propose deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not require handcrafted features. We evaluate these techniques for analyzing customer feedback sentences in four languages: English, French, Japanese, and Spanish. Our empirical analysis shows that our models perform well in all four languages on the setups of the IJCNLP Shared Task on Customer Feedback Analysis. Our model achieved the second rank in French, with an accuracy of 71.75%, and third rank in each of the other languages.
Tasks Document Classification, Emotion Classification, Sentiment Analysis
Published 2017-12-01
URL https://www.aclweb.org/anthology/I17-4031/
PDF https://www.aclweb.org/anthology/I17-4031
PWC https://paperswithcode.com/paper/iitp-at-ijcnlp-2017-task-4-auto-analysis-of
Repo
Framework
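Systems like this build their CNN/GRU models in a deep learning framework; purely to show the recurrence a GRU performs, here is a single GRU update step written out in plain Python (dimensions and parameter values are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x, h, W, U, b):
    """One GRU update. W, U, b each hold the (update, reset, candidate)
    parameters: W = input matrices, U = recurrent matrices, b = biases."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    # Update and reset gates
    z = [sigmoid(a + c + d) for a, c, d in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    r = [sigmoid(a + c + d) for a, c, d in zip(matvec(Wr, x), matvec(Ur, h), br)]
    # Candidate state uses the reset-gated previous hidden state
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + c + d)
               for a, c, d in zip(matvec(Wh, x), matvec(Uh, rh), bh)]
    # Interpolate between old state and candidate
    return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]
```

With all-zero parameters the gates sit at 0.5 and the candidate at 0, so each step simply halves the hidden state; a trained model's parameters shape this interpolation to remember or forget.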

A Neural Architecture for Dialectal Arabic Segmentation

Title A Neural Architecture for Dialectal Arabic Segmentation
Authors Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish
Abstract The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the general scarcity of annotated data and resources. Segmenting words into their constituent parts is an important processing building block. In this paper, we show how a segmenter can be trained with neural networks on only 350 annotated tweets, without any normalization or use of lexical features or lexical resources. We treat segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that rely on additional resources.
Tasks Machine Translation, Morphological Analysis, Part-Of-Speech Tagging
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1306/
PDF https://www.aclweb.org/anthology/W17-1306
PWC https://paperswithcode.com/paper/a-neural-architecture-for-dialectal-arabic
Repo
Framework
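Character-level sequence labeling for segmentation assigns each character a label such as B (begins a segment) or I (inside a segment); decoding the labels back into segments is then trivial. A sketch with an English example for readability (the paper works on Arabic dialect tweets):

```python
def decode_segments(chars, labels):
    """Turn per-character B/I labels into word segments: B starts a
    new segment, I appends to the current one."""
    segments = []
    for ch, lab in zip(chars, labels):
        if lab == "B" or not segments:
            segments.append(ch)
        else:
            segments[-1] += ch
    return segments

# Hypothetical labels splitting a word into stem + suffix
print(decode_segments(list("walked"), ["B", "I", "I", "I", "B", "I"]))
# ['walk', 'ed']
```

The model's job is to predict those B/I labels, typically with a recurrent network over character embeddings.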

A Morphological Analyzer for Gulf Arabic Verbs

Title A Morphological Analyzer for Gulf Arabic Verbs
Authors Salam Khalifa, Sara Hassan, Nizar Habash
Abstract We present CALIMA-GLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas. We describe in detail the process of building the analyzer, starting from phonetic dictionary entries and moving to fully inflected orthographic paradigms with an associated lexicon and orthographic variants. We evaluate the coverage of CALIMA-GLF against Modern Standard Arabic and Egyptian Arabic analyzers on part of a Gulf Arabic novel. CALIMA-GLF verb analysis token recall for identifying the correct POS tag outperforms both the Modern Standard Arabic and Egyptian Arabic analyzers by over 27.4% and 16.9% absolute, respectively.
Tasks Morphological Tagging, Part-Of-Speech Tagging
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1305/
PDF https://www.aclweb.org/anthology/W17-1305
PWC https://paperswithcode.com/paper/a-morphological-analyzer-for-gulf-arabic
Repo
Framework

Unsupervised Domain Adaptation for Clinical Negation Detection

Title Unsupervised Domain Adaptation for Clinical Negation Detection
Authors Timothy Miller, Steven Bethard, Hadi Amiri, Guergana Savova
Abstract Detecting negated concepts in clinical texts is an important part of NLP information extraction systems. However, generalizability of negation systems is lacking, as cross-domain experiments suffer dramatic performance losses. We examine the performance of multiple unsupervised domain adaptation algorithms on clinical negation detection, finding only modest gains that fall well short of in-domain performance.
Tasks Domain Adaptation, Negation Detection, Unsupervised Domain Adaptation
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2320/
PDF https://www.aclweb.org/anthology/W17-2320
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-clinical
Repo
Framework

Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach

Title Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach
Authors Fahad Albogamy, Allan Ramsay, Hanady Ahmed
Abstract In this paper, we propose a "bootstrapping" method for constructing a dependency treebank of Arabic tweets. This method uses a rule-based parser to create a small treebank of one thousand Arabic tweets and a data-driven parser to create a larger treebank by using the small treebank as a seed training set. We are able to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve the speed of training the parser and the accuracy of the resulting parsers.
Tasks Domain Adaptation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1312/
PDF https://www.aclweb.org/anthology/W17-1312
PWC https://paperswithcode.com/paper/arabic-tweets-treebanking-and-parsing-a
Repo
Framework

Post-Processing Techniques for Improving Predictions of Multilabel Learning Approaches

Title Post-Processing Techniques for Improving Predictions of Multilabel Learning Approaches
Authors Akshay Soni, Aasish Pappu, Jerry Chia-mau Ni, Troy Chevalier
Abstract In Multilabel Learning (MLL), each training instance is associated with a set of labels, and the task is to learn a function that maps an unseen instance to its corresponding label set. In this paper, we present a suite of MLL-algorithm-independent post-processing techniques that exploit conditional and directional label dependences to make the predictions from any MLL approach more coherent and precise. We solve a constrained optimization problem over the output produced by any MLL approach, and the result is a refined version of the input predicted label set. Using the proposed techniques, we show an absolute improvement of 3% on an English news dataset and 10% on a Chinese e-commerce dataset for the P@K metric.
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2011/
PDF https://www.aclweb.org/anthology/I17-2011
PWC https://paperswithcode.com/paper/post-processing-techniques-for-improving
Repo
Framework
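The P@K metric reported above measures what fraction of the top-K predicted labels are actually in the gold label set; a minimal implementation:

```python
def precision_at_k(predicted, gold, k):
    """Precision@K: fraction of the top-K ranked predicted labels
    that appear in the gold label set."""
    top_k = predicted[:k]
    return sum(label in gold for label in top_k) / k

# Hypothetical ranked predictions for one document
print(precision_at_k(["sports", "politics", "tech"], {"sports", "tech"}, 2))
# 0.5: of the top 2 predictions, only "sports" is gold
```

The paper's post-processing aims to reorder or prune the predicted set so that metrics like this improve without retraining the underlying MLL model.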

Robust Dictionary Lookup in Multiple Noisy Orthographies

Title Robust Dictionary Lookup in Multiple Noisy Orthographies
Authors Lingliang Zhang, Nizar Habash, Godfried Toussaint
Abstract We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can optionally be machine-learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate's "did you mean" feature, as well as the Yamli smart Arabic keyboard.
Tasks Transliteration
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1315/
PDF https://www.aclweb.org/anthology/W17-1315
PWC https://paperswithcode.com/paper/robust-dictionary-lookup-in-multiple-noisy
Repo
Framework
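A phonetic distance metric of the general kind described can be realized as a weighted Levenshtein distance whose substitution costs reflect phonetic confusability; the cost table below is a hypothetical example, not the learned metric from the paper:

```python
def phonetic_distance(a, b, sub_cost):
    """Weighted Levenshtein distance. sub_cost maps (char_a, char_b)
    pairs to substitution costs; unlisted pairs cost 1, identical
    characters cost 0. Insertions and deletions cost 1."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i-1] == b[j-1] else sub_cost.get((a[i-1], b[j-1]), 1.0)
            d[i][j] = min(d[i-1][j] + 1,      # deletion
                          d[i][j-1] + 1,      # insertion
                          d[i-1][j-1] + cost) # substitution
    return d[m][n]

# Hypothetical cheap confusion between plain and emphatic "t"
costs = {("t", "T"): 0.2}
print(phonetic_distance("kitab", "kiTab", costs))  # 0.2
```

Dictionary lookup then returns the entries whose phonetic representation is closest to the noisy query under this metric.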

ULISBOA at SemEval-2017 Task 12: Extraction and classification of temporal expressions and events

Title ULISBOA at SemEval-2017 Task 12: Extraction and classification of temporal expressions and events
Authors Andre Lamurias, Diana Sousa, Sofia Pereira, Luka Clarke, Francisco M. Couto
Abstract This paper presents our approach to participate in the SemEval 2017 Task 12: Clinical TempEval challenge, specifically in the event and time expressions span and attribute identification subtasks (ES, EA, TS, TA). Our approach consisted in training Conditional Random Fields (CRF) classifiers using the provided annotations, and in creating manually curated rules to classify the attributes of each event and time expression. We used a set of common features for the event and time CRF classifiers, and a set of features specific to each type of entity, based on domain knowledge. Training only on the source domain data, our best F-scores were 0.683 and 0.485 for event and time span identification subtasks. When adding target domain annotations to the training data, the best F-scores obtained were 0.729 and 0.554, for the same subtasks. We obtained the second highest F-score of the challenge on the event polarity subtask (0.708). The source code of our system, Clinical Timeline Annotation (CiTA), is available at https://github.com/lasigeBioTM/CiTA.
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2179/
PDF https://www.aclweb.org/anthology/S17-2179
PWC https://paperswithcode.com/paper/ulisboa-at-semeval-2017-task-12-extraction
Repo
Framework
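To give a flavor of manually curated attribute rules like those described, a toy polarity rule might mark a clinical event as negated when a negation cue appears shortly before it (the cue list and window size are illustrative, not the authors' rules):

```python
# Illustrative negation cue list, NOT the curated rules from CiTA
NEGATION_CUES = {"no", "not", "denies", "without", "negative"}

def event_polarity(tokens, event_index, window=4):
    """Toy polarity rule: an event token is NEG if a negation cue
    occurs within `window` tokens before it, otherwise POS."""
    start = max(0, event_index - window)
    context = tokens[start:event_index]
    return "NEG" if any(t.lower() in NEGATION_CUES for t in context) else "POS"

tokens = "Patient denies chest pain".split()
print(event_polarity(tokens, 3))  # 'NEG': "pain" is preceded by "denies"
```

In the actual system, spans come from the CRF classifiers first; rules like this then fill in attributes such as polarity per detected event.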