July 26, 2019

Paper Group NANR 85

MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks

Title MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks
Authors Miguel J. Rodrigues, Francisco M. Couto
Abstract This paper describes our system, dubbed MoRS (Modular Ranking System), pronounced 'Morse', which participated in Task 3 of SemEval-2017. We used MoRS to perform the Community Question Answering Task 3, which consisted of reordering a set of comments according to their usefulness in answering the question in the thread. This was done for a large collection of questions created by a user community. For this challenge, we wanted to go back to simple, easy-to-use, and somewhat forgotten technologies that we think non-expert users could reuse on their own data sets. Some of our techniques included the annotation of text, the retrieval of meta-data for each comment, POS tagging and Named Entity Recognition, among others. These enabled syntactic analysis and semantic measurements. Finally, we show and discuss our results and the context of our approach, which is part of a more comprehensive system in development, named MoQA.
Tasks Community Question Answering, Information Retrieval, Named Entity Recognition, Question Answering
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2046/
PDF https://www.aclweb.org/anthology/S17-2046
PWC https://paperswithcode.com/paper/mors-at-semeval-2017-task-3-easy-to-use-svm
Repo
Framework

Sentiment Analysis and Lexical Cohesion for the Story Cloze Task

Title Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
Authors Michael Flor, Swapna Somasundaran
Abstract We present two NLP components for the Story Cloze Task: dictionary-based sentiment analysis and lexical cohesion. While previous research found no contribution from sentiment analysis to the accuracy on this task, we demonstrate that sentiment is an important aspect. We describe a new approach, using a rule that estimates sentiment congruence in a story. Our sentiment-based system achieves strong results on this task. Our lexical cohesion system achieves accuracy comparable to previously published baseline results. A combination of the two systems achieves better accuracy than published baselines. We argue that sentiment analysis should be considered an integral part of narrative comprehension.
Tasks Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-0909/
PDF https://www.aclweb.org/anthology/W17-0909
PWC https://paperswithcode.com/paper/sentiment-analysis-and-lexical-cohesion-for
Repo
Framework

Frame-based Data Factorizations

Title Frame-based Data Factorizations
Authors Sebastian Mair, Ahcène Boubekki, Ulf Brefeld
Abstract Archetypal Analysis is the method of choice to compute interpretable matrix factorizations. Every data point is represented as a convex combination of factors, i.e., points on the boundary of the convex hull of the data. This renders computation inefficient. In this paper, we show that the set of vertices of a convex hull, the so-called frame, can be efficiently computed by a quadratic program. We provide theoretical and empirical results for our proposed approach and make use of the frame to accelerate Archetypal Analysis. The novel method yields similar reconstruction errors as baseline competitors but is much faster to compute.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=722
PDF http://proceedings.mlr.press/v70/mair17a/mair17a.pdf
PWC https://paperswithcode.com/paper/frame-based-data-factorizations
Repo
Framework
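
As a rough illustration of the frame concept from the abstract above (the set of vertices of the data's convex hull), the following sketch finds those vertices for 2-D points. It uses Andrew's monotone chain algorithm, not the quadratic program proposed in the paper; the function name `frame_2d` and the sample points are assumptions made for illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def frame_2d(points):
    """Return the convex hull vertices of 2-D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    # Build the lower hull left to right, dropping points that do not turn left.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper hull right to left in the same way.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Illustrative data: four corners of a square, one interior point, one edge point.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
frame = frame_2d(points)
```

Only the four corners end up in `frame`; the interior point and the point lying on an edge can be written as convex combinations of the corners, so they are not part of the frame.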

An Extensible Multilingual Open Source Lemmatizer

Title An Extensible Multilingual Open Source Lemmatizer
Authors Ahmet Aker, Johann Petrak, Firas Sabbah
Abstract We present GATE DictLemmatizer, a multilingual open source lemmatizer for the GATE NLP framework that currently supports English, German, Italian, French, Dutch, and Spanish, and is easily extensible to other languages. The software is freely available under the LGPL license. The lemmatization is based on the Helsinki Finite-State Transducer Technology (HFST) and lemma dictionaries automatically created from Wiktionary. We evaluate the performance of the lemmatizers against TreeTagger, which is only freely available for research purposes. Our evaluation shows that DictLemmatizer achieves similar or even better results than TreeTagger for languages where there is support from HFST. The performance drops when there is no support from HFST and the entire lemmatization process is based on lemma dictionaries. However, the results are still satisfactory given that DictLemmatizer is open-source and can be easily extended to other languages. The software for extending the lemmatizer by creating word lists from Wiktionary dictionaries is also freely available as open-source software.
Tasks Information Retrieval, Lemmatization
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1006/
PDF https://doi.org/10.26615/978-954-452-049-6_006
PWC https://paperswithcode.com/paper/an-extensible-multilingual-open-source
Repo
Framework
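
The dictionary-only path mentioned in the abstract above can be pictured with a minimal sketch. This is not the GATE DictLemmatizer API: the `LEMMA_DICTS` table, its entries, and the `lemmatize` function are hypothetical, and a real system would load dictionaries derived from Wiktionary rather than hard-code them.

```python
# Hypothetical lemma dictionaries keyed by language code (illustrative entries only).
LEMMA_DICTS = {
    "en": {"mice": "mouse", "ran": "run"},
    "de": {"häuser": "Haus", "ging": "gehen"},
}


def lemmatize(token, lang):
    """Look up the lowercased token in the language's dictionary.

    Falls back to the token itself when the language or the word is unknown.
    """
    return LEMMA_DICTS.get(lang, {}).get(token.lower(), token)
```

Adding a language in this sketch only means adding another dictionary, which mirrors the extensibility argument made in the abstract.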

Interactive Visualization for Linguistic Structure

Title Interactive Visualization for Linguistic Structure
Authors Aaron Sarnat, Vidur Joshi, Cristian Petrescu-Prahova, Alvaro Herrasti, Brandon Stilson, Mark Hopkins
Abstract We provide a visualization library and web interface for interactively exploring a parse tree or a forest of parses. The library is not tied to any particular linguistic representation, but provides a general-purpose API for the interactive exploration of hierarchical linguistic structure. To facilitate rapid understanding of a complex structure, the API offers several important features, including expand/collapse functionality, positional and color cues, explicit visual support for sequential structure, and dynamic highlighting to convey node-to-text correspondence.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-2009/
PDF https://www.aclweb.org/anthology/D17-2009
PWC https://paperswithcode.com/paper/interactive-visualization-for-linguistic
Repo
Framework

Translating Dialectal Arabic as Low Resource Language using Word Embedding

Title Translating Dialectal Arabic as Low Resource Language using Word Embedding
Authors Ebtesam H Almansor, Ahmed Al-Ani
Abstract A number of machine translation methods have been proposed in recent years to deal with the increasingly important problem of automatic translation between texts of different languages or languages and their dialects. These methods have produced promising results when applied to some of the widely studied languages. Existing translation methods are mainly implemented using rule-based and statistical machine translation approaches. Rule-based approaches utilize language translation rules that can either be constructed by an expert, which is quite difficult when dealing with dialects, or rely on rule construction algorithms, which require very large parallel datasets. Statistical approaches also require large parallel datasets to build the translation models. However, large parallel datasets do not exist for languages with low resources, such as the Arabic language and its dialects. In this paper we propose an algorithm that attempts to overcome this limitation, and apply it to translate the Egyptian dialect (EGY) to Modern Standard Arabic (MSA). A monolingual corpus was collected for both MSA and EGY, and a relatively small parallel language pair set was built to train the models. The proposed method utilizes word embeddings, as they require monolingual data rather than a parallel corpus. Both Continuous Bag of Words and Skip-gram were used to build word vectors. The proposed method was validated on four different datasets using a four-fold cross validation approach.
Tasks Machine Translation, Sentiment Analysis, Text Generation, Word Embeddings
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1008/
PDF https://doi.org/10.26615/978-954-452-049-6_008
PWC https://paperswithcode.com/paper/translating-dialectal-arabic-as-low-resource
Repo
Framework
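
The abstract above mentions both Continuous Bag of Words (CBOW) and Skip-gram. The sketch below shows how the two differ in the training examples they derive from a single monolingual sentence. It illustrates the general technique only, not the paper's pipeline; the window size, the function names, and the English sample sentence are assumptions.

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i] itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window=1):
    """CBOW: predict the centre word from its surrounding context words."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=1):
    """Skip-gram: predict each context word from the centre word."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


# Illustrative sentence; no parallel data is needed to build these examples.
sentence = ["the", "cat", "sat"]
cbow = cbow_examples(sentence)
skipgram = skipgram_examples(sentence)
```

Both functions consume plain monolingual text, which is why word embeddings suit a low-resource setting where parallel corpora are scarce.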

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Title Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Authors
Abstract
Tasks
Published 2017-08-01
URL https://www.aclweb.org/anthology/K17-3000/
PDF https://www.aclweb.org/anthology/K17-3000
PWC https://paperswithcode.com/paper/proceedings-of-the-conll-2017-shared-task
Repo
Framework

DCU System Report on the WMT 2017 Multi-modal Machine Translation Task

Title DCU System Report on the WMT 2017 Multi-modal Machine Translation Task
Authors Iacer Calixto, Koel Dutta Chowdhury, Qun Liu
Abstract
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4747/
PDF https://www.aclweb.org/anthology/W17-4747
PWC https://paperswithcode.com/paper/dcu-system-report-on-the-wmt-2017-multi-modal
Repo
Framework

DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics

Title DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics
Authors Ankit Srivastava, Georg Rehm, Julian Moreno Schneider
Abstract We describe our submissions for SemEval-2017 Task 8, Determining Rumour Veracity and Support for Rumours. The Digital Curation Technologies (DKT) team at the German Research Center for Artificial Intelligence (DFKI) participated in two subtasks: Subtask A (determining the stance of a message) and Subtask B (determining veracity of a message, closed variant). In both cases, our implementation consisted of a Multivariate Logistic Regression (Maximum Entropy) classifier coupled with hand-written patterns and rules (heuristics) applied in a post-process cascading fashion. We provide a detailed analysis of the system performance and report on variants of our systems that were not part of the official submission.
Tasks Rumour Detection
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2085/
PDF https://www.aclweb.org/anthology/S17-2085
PWC https://paperswithcode.com/paper/dfki-dkt-at-semeval-2017-task-8-rumour
Repo
Framework
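
The architecture described in the abstract above, a statistical classifier whose output is post-processed by a cascade of hand-written rules, can be sketched as follows. The classifier here is a stub standing in for a trained model, and the two rules are hypothetical examples, not the ones used by the DFKI-DKT system. The labels follow the Subtask A stance classes (support, deny, query, comment).

```python
def base_classifier(text):
    """Stand-in for a trained statistical classifier (hypothetical stub)."""
    return "comment"


# Hypothetical hand-written rules as (predicate, label) pairs, tried in order.
RULES = [
    (lambda t: t.rstrip().endswith("?"), "query"),
    (lambda t: "not true" in t.lower() or "fake" in t.lower(), "deny"),
]


def classify_with_cascade(text):
    """Take the classifier's label, then let the first matching rule override it."""
    label = base_classifier(text)
    for predicate, rule_label in RULES:
        if predicate(text):
            return rule_label
    return label
```

Because the rules are tried in order, their ordering encodes priority: a message that matches several rules receives the label of the earliest one.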

A statistical model for morphology inspired by the Amis language

Title A statistical model for morphology inspired by the Amis language
Authors Isabelle Bril, Achraf Lassoued, Michel de Rougemont
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7009/
PDF https://www.aclweb.org/anthology/W17-7009
PWC https://paperswithcode.com/paper/a-statistical-model-for-morphology-inspired
Repo
Framework

Coreference Resolution for Swedish and German using Distant Supervision

Title Coreference Resolution for Swedish and German using Distant Supervision
Authors Alexander Wallin, Pierre Nugues
Abstract
Tasks Coreference Resolution, Knowledge Graphs, Question Answering
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0206/
PDF https://www.aclweb.org/anthology/W17-0206
PWC https://paperswithcode.com/paper/coreference-resolution-for-swedish-and-german
Repo
Framework

Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks

Title Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks
Authors Shaoyang Duan, Ruifang He, Wenli Zhao
Abstract This paper tackles the task of event detection, which involves identifying and categorizing events. Previous work mainly suffers from two problems: (1) traditional feature-based methods apply cross-sentence information, yet require a large amount of human effort to design complicated feature sets and inference rules; (2) representation-based methods overcome the problem of manually extracting features, but depend only on local sentence representations. Because local sentence context is insufficient to resolve ambiguities in identifying particular event types, we propose a novel document-level Recurrent Neural Network (DLRNN) model, which can automatically extract cross-sentence clues to improve sentence-level event detection without designing complex reasoning rules. Experimental results show that our approach outperforms other state-of-the-art methods on the ACE 2005 dataset without an external knowledge base.
Tasks
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1036/
PDF https://www.aclweb.org/anthology/I17-1036
PWC https://paperswithcode.com/paper/exploiting-document-level-information-to
Repo
Framework

A Hybrid System to apply Natural Language Inference over Dependency Trees

Title A Hybrid System to apply Natural Language Inference over Dependency Trees
Authors Ali Almiman, Allan Ramsay
Abstract This paper presents the development of a natural language inference engine that benefits from two current standard approaches, i.e., shallow and deep approaches. This system combines two non-deterministic algorithms: the approximate matching from the shallow approach and a theorem prover from the deep approach for handling multi-step inference tasks. The theorem prover is customized to accept dependency trees and apply inference rules to these trees. The inference rules are automatically generated as syllogistic rules from our test data (FraCaS test suite). The theorem prover exploits a non-deterministic matching algorithm within a standard backward chaining inference engine. We employ continuation programming as a way of seamlessly handling the combination of these two non-deterministic algorithms. Testing the matching algorithm on the "Generalized quantifiers" and "adjectives" topics in FraCaS (MacCartney and Manning 2007), we achieved an accuracy of 92.8% on the single-premise cases. For multi-step inference, we checked the validity of our syllogistic rules and then extracted four generic instances that can be applied to more than one problem.
Tasks Natural Language Inference
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1010/
PDF https://doi.org/10.26615/978-954-452-049-6_010
PWC https://paperswithcode.com/paper/a-hybrid-system-to-apply-natural-language
Repo
Framework
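
For readers unfamiliar with backward chaining, which the abstract above names as the core of the theorem prover, here is a minimal propositional sketch. It is far simpler than the system described: it works on plain strings rather than dependency trees and has no approximate matching or continuations. The `prove` function and the sample fact and rules are assumptions for illustration.

```python
def prove(goal, facts, rules, seen=frozenset()):
    """Backward chaining: a goal holds if it is a known fact, or if some rule
    concludes it and every premise of that rule can itself be proved."""
    if goal in facts:
        return True
    if goal in seen:  # avoid looping on circular rules
        return False
    for head, body in rules:
        if head == goal and all(
            prove(premise, facts, rules, seen | {goal}) for premise in body
        ):
            return True
    return False


# Illustrative knowledge base with a two-step inference chain.
facts = {"socrates_is_human"}
rules = [
    ("socrates_is_mortal", ["socrates_is_human"]),
    ("socrates_will_die", ["socrates_is_mortal"]),
]
```

Proving `socrates_will_die` requires chaining two rules, a small analogue of the multi-step inference mentioned in the abstract.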

Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories

Title Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories
Authors Eduard Barbu
Abstract Recent years have witnessed an increasing interest in automatic methods for spotting false translation units in translation memories. This problem is of great interest to industry, as many translation memories contain errors. A closely related line of research deals with identifying sentences that do not align in parallel corpora mined from the web. The task of spotting false translations is modeled as a binary classification problem. It is known that under certain conditions ensembles of classifiers improve over the performance of their individual members. In this paper we benchmark the most popular ensembles of classifiers (Majority Voting, Bagging, Stacking and AdaBoost) on the task of spotting false translation units for translation memories and parallel web corpora. We want to know whether, for this specific problem, any ensemble technique improves the performance of the individual classifiers, and whether there is a difference between the data in translation memories and parallel web corpora with respect to this task.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1011/
PDF https://doi.org/10.26615/978-954-452-049-6_011
PWC https://paperswithcode.com/paper/ensembles-of-classifiers-for-cleaning-web
Repo
Framework
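
Of the ensemble methods listed in the abstract above, majority voting is the simplest to sketch. In the example below each classifier labels a translation unit (a source and target string pair) as valid (`True`) or false (`False`). The three classifiers are hypothetical toy heuristics, not the features or models used in the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy classifiers: True means the unit looks like a valid translation.
def length_ratio_ok(unit):
    src, tgt = unit
    ratio = len(tgt) / max(len(src), 1)
    return 0.5 <= ratio <= 2.0


def target_not_empty(unit):
    return bool(unit[1].strip())


def not_a_copy(unit):
    return unit[0].strip().lower() != unit[1].strip().lower()


CLASSIFIERS = [length_ratio_ok, target_not_empty, not_a_copy]


def ensemble_predict(unit):
    """Binary decision for one translation unit by majority vote."""
    return majority_vote([clf(unit) for clf in CLASSIFIERS])
```

With an odd number of binary classifiers no ties can occur, so the vote is always decisive.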

Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages

Title Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages
Authors Marco Basaldella, Muhammad Helmy, Elisa Antolli, Mihai Horia Popescu, Giuseppe Serra, Carlo Tasso
Abstract This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages which lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for languages which lack one. Then, we add a Machine Learning module trained on a well-known English language corpus and we evaluate the performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic language corpus to check whether using a language-specific corpus brings a further improvement in performance. On the five languages we analyzed, results show an improvement in performance when using a machine learning algorithm, even if the algorithm is not trained and tested on the same language.
Tasks Information Retrieval, Text Summarization
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1012/
PDF https://doi.org/10.26615/978-954-452-049-6_012
PWC https://paperswithcode.com/paper/exploiting-and-evaluating-a-supervised
Repo
Framework