Paper Group NANR 85
MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks. Sentiment Analysis and Lexical Cohesion for the Story Cloze Task. Frame-based Data Factorizations. An Extensible Multilingual Open Source Lemmatizer. Interactive Visualization for Linguistic Structure. Translating Dialectal Arabic as Low Resource Language using Word Embedding. Proceedin …
MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks
Title | MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks |
Authors | Miguel J. Rodrigues, Francisco M. Couto |
Abstract | This paper describes our system, dubbed MoRS (Modular Ranking System), pronounced 'Morse', which participated in Task 3 of SemEval-2017. We used MoRS for the Community Question Answering task, which consisted of reordering a set of comments according to their usefulness in answering the question in the thread, for a large collection of questions created by a user community. For this challenge we wanted to go back to simple, easy-to-use, and somewhat forgotten technologies that, we believe, non-expert users could reuse on their own data sets. Our techniques included text annotation, retrieval of meta-data for each comment, POS tagging, and Named Entity Recognition, among others, which fed into syntactic analysis and semantic measurements. Finally, we show and discuss our results and the context of our approach, which is part of a more comprehensive system in development, named MoQA. |
Tasks | Community Question Answering, Information Retrieval, Named Entity Recognition, Question Answering |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2046/ |
https://www.aclweb.org/anthology/S17-2046 | |
PWC | https://paperswithcode.com/paper/mors-at-semeval-2017-task-3-easy-to-use-svm |
Repo | |
Framework | |
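The core ranking idea described above can be illustrated in a few lines of scikit-learn: train a binary SVM on useful/not-useful comments and reorder a thread by the decision margin. This is a minimal sketch, not the authors' MoRS pipeline; the TF-IDF features and toy data stand in for their metadata, POS, and NER features.

```python
# Minimal sketch: rank a thread's comments by an SVM decision score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training data: comments labelled useful (1) / not useful (0).
train_comments = ["try reinstalling the driver", "lol no idea", "check the official docs"]
train_labels = [1, 0, 1]

vectorizer = TfidfVectorizer()
svm = LinearSVC().fit(vectorizer.fit_transform(train_comments), train_labels)

# Reorder a new thread by the SVM margin (higher = judged more useful).
thread = ["works for me", "update the firmware first", "see the manual, section 3"]
scores = svm.decision_function(vectorizer.transform(thread))
ranked = [comment for _, comment in sorted(zip(scores, thread), reverse=True)]
print(ranked)
```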
Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
Title | Sentiment Analysis and Lexical Cohesion for the Story Cloze Task |
Authors | Michael Flor, Swapna Somasundaran |
Abstract | We present two NLP components for the Story Cloze Task – dictionary-based sentiment analysis and lexical cohesion. While previous research found no contribution from sentiment analysis to the accuracy on this task, we demonstrate that sentiment is an important aspect. We describe a new approach, using a rule that estimates sentiment congruence in a story. Our sentiment-based system achieves strong results on this task. Our lexical cohesion system achieves accuracy comparable to previously published baseline results. A combination of the two systems achieves better accuracy than published baselines. We argue that sentiment analysis should be considered an integral part of narrative comprehension. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-0909/ |
https://www.aclweb.org/anthology/W17-0909 | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-and-lexical-cohesion-for |
Repo | |
Framework | |
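To make the sentiment-congruence idea concrete, here is a minimal sketch that picks the candidate ending whose lexicon-based polarity agrees with the story context. The tiny lexicon and the rule are illustrative assumptions, not the paper's dictionaries or its actual congruence rule.

```python
# Illustrative sentiment-congruence rule for choosing a story ending.
import re

POSITIVE = {"happy", "loved", "great", "relieved", "smiled"}   # toy lexicon
NEGATIVE = {"sad", "angry", "terrible", "cried", "lost"}

def polarity(text):
    """Return +1, -1, or 0 from simple lexicon counts."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)

def choose_ending(context, endings):
    """Prefer the ending whose polarity agrees with the story context."""
    target = polarity(context)
    return max(endings, key=lambda e: polarity(e) == target)

story = "Anna trained for months. On race day she ran her best time and smiled."
print(choose_ending(story, ["She cried all the way home.",
                            "She was happy with the result."]))
```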
Frame-based Data Factorizations
Title | Frame-based Data Factorizations |
Authors | Sebastian Mair, Ahcène Boubekki, Ulf Brefeld |
Abstract | Archetypal Analysis is the method of choice to compute interpretable matrix factorizations. Every data point is represented as a convex combination of factors, i.e., points on the boundary of the convex hull of the data. This renders computation inefficient. In this paper, we show that the set of vertices of a convex hull, the so-called frame, can be efficiently computed by a quadratic program. We provide theoretical and empirical results for our proposed approach and make use of the frame to accelerate Archetypal Analysis. The novel method yields similar reconstruction errors as baseline competitors but is much faster to compute. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=722 |
http://proceedings.mlr.press/v70/mair17a/mair17a.pdf | |
PWC | https://paperswithcode.com/paper/frame-based-data-factorizations |
Repo | |
Framework | |
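The frame mentioned in the abstract is simply the set of convex-hull vertices. The sketch below recovers it with one small linear-programming feasibility test per point; the paper itself casts the problem as a quadratic program, so this is only a simplified stand-in for the idea.

```python
# Identify the frame (convex-hull vertices) of a point set via LP feasibility tests.
import numpy as np
from scipy.optimize import linprog

def frame_indices(X):
    """Indices of points in X (n x d) that are NOT convex combinations of the others."""
    n = X.shape[0]
    vertices = []
    for i in range(n):
        others = np.delete(X, i, axis=0)
        A_eq = np.vstack([others.T, np.ones(n - 1)])   # convex-combination constraints
        b_eq = np.concatenate([X[i], [1.0]])
        res = linprog(c=np.zeros(n - 1), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if not res.success:            # infeasible: x_i is a hull vertex (frame point)
            vertices.append(i)
    return vertices

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]])
print(frame_indices(X))   # [0, 1, 2, 3]; the interior point (index 4) is excluded
```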
An Extensible Multilingual Open Source Lemmatizer
Title | An Extensible Multilingual Open Source Lemmatizer |
Authors | Ahmet Aker, Johann Petrak, Firas Sabbah |
Abstract | We present GATE DictLemmatizer, a multilingual open source lemmatizer for the GATE NLP framework that currently supports English, German, Italian, French, Dutch, and Spanish, and is easily extensible to other languages. The software is freely available under the LGPL license. The lemmatization is based on the Helsinki Finite-State Transducer Technology (HFST) and lemma dictionaries automatically created from Wiktionary. We evaluate the performance of the lemmatizers against TreeTagger, which is only freely available for research purposes. Our evaluation shows that DictLemmatizer achieves similar or even better results than TreeTagger for languages where there is support from HFST. The performance drops when there is no support from HFST and the entire lemmatization process is based on lemma dictionaries. However, the results are still satisfactory given the fact that DictLemmatizer is open-source and can be easily extended to other languages. The software for extending the lemmatizer by creating word lists from Wiktionary dictionaries is also freely available as open-source software. |
Tasks | Information Retrieval, Lemmatization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1006/ |
https://doi.org/10.26615/978-954-452-049-6_006 | |
PWC | https://paperswithcode.com/paper/an-extensible-multilingual-open-source |
Repo | |
Framework | |
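A dictionary-driven lemmatizer of the kind described above reduces, at its core, to a lookup with a fallback. The sketch below shows that control flow only; the miniature dictionary and suffix rules are made-up placeholders for the Wiktionary-derived resources and the HFST component.

```python
# Dictionary lookup with a crude suffix-stripping fallback (illustrative only).
LEMMA_DICT = {          # would normally be generated from Wiktionary dumps
    "geese": "goose",
    "ran": "run",
    "mice": "mouse",
}
SUFFIX_RULES = [("ies", "y"), ("s", "")]   # very rough English fallback

def lemmatize(token):
    token = token.lower()
    if token in LEMMA_DICT:                      # 1) exact dictionary hit
        return LEMMA_DICT[token]
    for suffix, replacement in SUFFIX_RULES:     # 2) fallback suffix stripping
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    return token                                 # 3) give up: return the surface form

print([lemmatize(w) for w in ["Geese", "houses", "ran", "berries"]])
# ['goose', 'house', 'run', 'berry']
```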
Interactive Visualization for Linguistic Structure
Title | Interactive Visualization for Linguistic Structure |
Authors | Aaron Sarnat, Vidur Joshi, Cristian Petrescu-Prahova, Alvaro Herrasti, Brandon Stilson, Mark Hopkins |
Abstract | We provide a visualization library and web interface for interactively exploring a parse tree or a forest of parses. The library is not tied to any particular linguistic representation, but provides a general-purpose API for the interactive exploration of hierarchical linguistic structure. To facilitate rapid understanding of a complex structure, the API offers several important features, including expand/collapse functionality, positional and color cues, explicit visual support for sequential structure, and dynamic highlighting to convey node-to-text correspondence. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-2009/ |
https://www.aclweb.org/anthology/D17-2009 | |
PWC | https://paperswithcode.com/paper/interactive-visualization-for-linguistic |
Repo | |
Framework | |
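The expand/collapse behaviour at the heart of such an API can be conveyed with a very small tree structure. This is not the library's actual interface, just an assumed minimal analogue of the collapse-aware traversal it describes.

```python
# A collapsible tree with an indented text rendering (illustrative, not the real API).
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    collapsed: bool = False

def render(node, depth=0):
    """Print an indented view, hiding the subtrees of collapsed nodes."""
    marker = "+" if node.collapsed and node.children else "-"
    print("  " * depth + marker + " " + node.label)
    if not node.collapsed:
        for child in node.children:
            render(child, depth + 1)

tree = Node("S", [Node("NP", [Node("DT the"), Node("NN cat")]),
                  Node("VP", [Node("VBD sat")])])
tree.children[0].collapsed = True      # collapse the NP subtree
render(tree)
```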
Translating Dialectal Arabic as Low Resource Language using Word Embedding
Title | Translating Dialectal Arabic as Low Resource Language using Word Embedding |
Authors | Ebtesam H Almansor, Ahmed Al-Ani |
Abstract | A number of machine translation methods have been proposed in recent years to deal with the increasingly important problem of automatic translation between texts of different languages, or between languages and their dialects. These methods have produced promising results when applied to some of the widely studied languages. Existing translation methods are mainly implemented using rule-based and statistical machine translation approaches. Rule-based approaches utilize language translation rules that can either be constructed by an expert, which is quite difficult when dealing with dialects, or rely on rule construction algorithms, which require very large parallel datasets. Statistical approaches also require large parallel datasets to build the translation models. However, large parallel datasets do not exist for languages with low resources, such as the Arabic language and its dialects. In this paper we propose an algorithm that attempts to overcome this limitation, and apply it to translate the Egyptian dialect (EGY) to Modern Standard Arabic (MSA). Monolingual corpora were collected for both MSA and EGY, and a relatively small parallel language pair set was built to train the models. The proposed method utilizes word embeddings, as they require monolingual data rather than a parallel corpus. Both Continuous Bag of Words and Skip-gram were used to build word vectors. The proposed method was validated on four different datasets using a four-fold cross-validation approach. |
Tasks | Machine Translation, Sentiment Analysis, Text Generation, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1008/ |
https://doi.org/10.26615/978-954-452-049-6_008 | |
PWC | https://paperswithcode.com/paper/translating-dialectal-arabic-as-low-resource |
Repo | |
Framework | |
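The embedding step can be sketched with gensim: CBOW and Skip-gram are the same `Word2Vec` class toggled by the `sg` flag. The toy corpora and the nearest-neighbour lookup below are assumptions for illustration; the paper's actual translation procedure and its seed parallel set are not reproduced here.

```python
# Train CBOW and Skip-gram embeddings on (toy) monolingual corpora; gensim 4 API.
from gensim.models import Word2Vec

egy_corpus = [["ezayak", "3amel", "eh"], ["ezayak", "ya", "basha"]]   # toy EGY sentences
msa_corpus = [["kayfa", "haluka"], ["kayfa", "haluka", "alyawm"]]     # toy MSA sentences

cbow = Word2Vec(sentences=egy_corpus, vector_size=50, window=3, min_count=1, sg=0)
skipgram = Word2Vec(sentences=msa_corpus, vector_size=50, window=3, min_count=1, sg=1)

# Nearest neighbours inside each monolingual space; a full system would also learn a
# mapping between the two spaces from the small parallel seed set.
print(cbow.wv.most_similar("ezayak", topn=2))
print(skipgram.wv.most_similar("kayfa", topn=1))
```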
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Title | Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies |
Authors | |
Abstract | |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-3000/ |
https://www.aclweb.org/anthology/K17-3000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-conll-2017-shared-task |
Repo | |
Framework | |
DCU System Report on the WMT 2017 Multi-modal Machine Translation Task
Title | DCU System Report on the WMT 2017 Multi-modal Machine Translation Task |
Authors | Iacer Calixto, Koel Dutta Chowdhury, Qun Liu |
Abstract | |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4747/ |
https://www.aclweb.org/anthology/W17-4747 | |
PWC | https://paperswithcode.com/paper/dcu-system-report-on-the-wmt-2017-multi-modal |
Repo | |
Framework | |
DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics
Title | DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics |
Authors | Ankit Srivastava, Georg Rehm, Julian Moreno Schneider |
Abstract | We describe our submissions for SemEval-2017 Task 8, Determining Rumour Veracity and Support for Rumours. The Digital Curation Technologies (DKT) team at the German Research Center for Artificial Intelligence (DFKI) participated in two subtasks: Subtask A (determining the stance of a message) and Subtask B (determining veracity of a message, closed variant). In both cases, our implementation consisted of a Multivariate Logistic Regression (Maximum Entropy) classifier coupled with hand-written patterns and rules (heuristics) applied in a post-process cascading fashion. We provide a detailed analysis of the system performance and report on variants of our systems that were not part of the official submission. |
Tasks | Rumour Detection |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2085/ |
https://www.aclweb.org/anthology/S17-2085 | |
PWC | https://paperswithcode.com/paper/dfki-dkt-at-semeval-2017-task-8-rumour |
Repo | |
Framework | |
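The classifier-plus-heuristics cascade can be illustrated in a few lines: a logistic-regression stance classifier whose output is overridden by hand-written rules applied in a fixed order. Features, training data, and rules below are illustrative placeholders, not the team's actual ones.

```python
# Logistic-regression stance classifier followed by cascading rule overrides.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["this is true, confirmed", "fake news, not true", "is this real?", "I agree"]
train_stance = ["support", "deny", "query", "support"]

vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(train_texts), train_stance)

def cascading_rules(text, predicted):
    """Hand-written overrides applied after the statistical classifier."""
    if text.strip().endswith("?"):        # questions are treated as queries
        return "query"
    if "not true" in text.lower():        # strong lexical cue for denial
        return "deny"
    return predicted

test = "really? is that confirmed?"
print(cascading_rules(test, clf.predict(vec.transform([test]))[0]))
```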
A statistical model for morphology inspired by the Amis language
Title | A statistical model for morphology inspired by the Amis language |
Authors | Isabelle Bril, Achraf Lassoued, Michel de Rougemont |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7009/ |
https://www.aclweb.org/anthology/W17-7009 | |
PWC | https://paperswithcode.com/paper/a-statistical-model-for-morphology-inspired |
Repo | |
Framework | |
Coreference Resolution for Swedish and German using Distant Supervision
Title | Coreference Resolution for Swedish and German using Distant Supervision |
Authors | Alexander Wallin, Pierre Nugues |
Abstract | |
Tasks | Coreference Resolution, Knowledge Graphs, Question Answering |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0206/ |
https://www.aclweb.org/anthology/W17-0206 | |
PWC | https://paperswithcode.com/paper/coreference-resolution-for-swedish-and-german |
Repo | |
Framework | |
Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks
Title | Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks |
Authors | Shaoyang Duan, Ruifang He, Wenli Zhao |
Abstract | This paper tackles the task of event detection, which involves identifying and categorizing events. Previous work suffers from two main problems: (1) traditional feature-based methods exploit cross-sentence information but require a large amount of human effort to design complicated feature sets and inference rules; (2) representation-based methods avoid manual feature engineering but depend only on local sentence representations. Since local sentence context is insufficient to resolve ambiguities in identifying particular event types, we propose a novel document-level Recurrent Neural Network (DLRNN) model, which can automatically extract cross-sentence clues to improve sentence-level event detection without designing complex reasoning rules. Experiment results show that our approach outperforms other state-of-the-art methods on the ACE 2005 dataset without an external knowledge base. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1036/ |
https://www.aclweb.org/anthology/I17-1036 | |
PWC | https://paperswithcode.com/paper/exploiting-document-level-information-to |
Repo | |
Framework | |
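The key architectural idea, concatenating a document-level vector with each word's sentence-level representation before classification, can be sketched in PyTorch. Dimensions, the mean-pooled document encoder, and the tagging head below are assumptions, not the paper's exact DLRNN.

```python
# Sentence-level bi-GRU states augmented with a document vector for trigger tagging.
import torch
import torch.nn as nn

class DocAwareTagger(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hid=64, n_types=34):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.sent_rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(4 * hid, n_types)       # word states (2*hid) + doc vector (2*hid)

    def forward(self, doc):                          # doc: (n_sents, sent_len) word ids
        states, _ = self.sent_rnn(self.emb(doc))     # (n_sents, sent_len, 2*hid)
        doc_vec = states.reshape(-1, states.size(-1)).mean(0)     # crude document vector
        doc_vec = doc_vec.expand(states.size(0), states.size(1), -1)
        return self.out(torch.cat([states, doc_vec], dim=-1))     # per-word event-type logits

doc = torch.randint(0, 1000, (3, 12))       # 3 sentences of 12 tokens
print(DocAwareTagger()(doc).shape)          # torch.Size([3, 12, 34])
```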
A Hybrid System to apply Natural Language Inference over Dependency Trees
Title | A Hybrid System to apply Natural Language Inference over Dependency Trees |
Authors | Ali Almiman, Allan Ramsay |
Abstract | This paper presents the development of a natural language inference engine that benefits from two current standard approaches, i.e., shallow and deep approaches. This system combines two non-deterministic algorithms: the approximate matching from the shallow approach and a theorem prover from the deep approach for handling multi-step inference tasks. The theorem prover is customized to accept dependency trees and apply inference rules to these trees. The inference rules are automatically generated as syllogistic rules from our test data (FraCaS test suite). The theorem prover exploits a non-deterministic matching algorithm within a standard backward chaining inference engine. We employ continuation programming as a way of seamlessly handling the combination of these two non-deterministic algorithms. Testing the matching algorithm on the "Generalized quantifiers" and "adjectives" topics in FraCaS (MacCartney and Manning 2007), we achieved an accuracy of 92.8% on the single-premise cases. For the multi-step inferences, we checked the validity of our syllogistic rules and then extracted four generic instances that can be applied to more than one problem. |
Tasks | Natural Language Inference |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1010/ |
https://doi.org/10.26615/978-954-452-049-6_010 | |
PWC | https://paperswithcode.com/paper/a-hybrid-system-to-apply-natural-language |
Repo | |
Framework | |
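The multi-step inference component is, at its core, a backward-chaining prover. The toy sketch below uses propositional stand-ins instead of dependency trees, and omits the approximate matching and continuation-based control the paper describes, so it only conveys the chaining idea.

```python
# Tiny backward-chaining prover: a goal holds if it is a fact or if some rule body
# for it can be proven recursively.
RULES = {                                   # head -> list of alternative bodies
    "mortal(socrates)": [["man(socrates)"]],
    "man(socrates)": [["greek(socrates)", "male(socrates)"]],
}
FACTS = {"greek(socrates)", "male(socrates)"}

def prove(goal, depth=0):
    print("  " * depth + "proving " + goal)
    if goal in FACTS:
        return True
    return any(all(prove(subgoal, depth + 1) for subgoal in body)
               for body in RULES.get(goal, []))

print(prove("mortal(socrates)"))   # True, via two chained rules
```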
Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories
Title | Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories |
Authors | Eduard Barbu |
Abstract | The last years witnessed an increasing interest in automatic methods for spotting false translation units in translation memories. This problem is of great interest to industry, as many translation memories contain errors. A closely related line of research deals with identifying sentences that do not align in parallel corpora mined from the web. The task of spotting false translations is modeled as a binary classification problem. It is known that under certain conditions ensembles of classifiers improve over the performance of the individual members. In this paper we benchmark the most popular ensembles of classifiers, Majority Voting, Bagging, Stacking, and AdaBoost, on the task of spotting false translation units in translation memories and parallel web corpora. We want to know whether, for this specific problem, any ensemble technique improves the performance of the individual classifiers, and whether there is a difference between the data in translation memories and parallel web corpora with respect to this task. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1011/ |
https://doi.org/10.26615/978-954-452-049-6_011 | |
PWC | https://paperswithcode.com/paper/ensembles-of-classifiers-for-cleaning-web |
Repo | |
Framework | |
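All four ensembles named in the abstract are available off the shelf in scikit-learn, so a benchmark skeleton is short. The synthetic data below stands in for the paper's translation-unit features; only the ensemble wiring is the point.

```python
# Compare Majority Voting, Bagging, Stacking, and AdaBoost on a toy binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)), ("dt", DecisionTreeClassifier())]

ensembles = {
    "voting": VotingClassifier(estimators=base, voting="hard"),
    "bagging": BaggingClassifier(n_estimators=20, random_state=0),
    "stacking": StackingClassifier(estimators=base, final_estimator=LogisticRegression()),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```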
Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages
Title | Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages |
Authors | Marco Basaldella, Muhammad Helmy, Elisa Antolli, Mihai Horia Popescu, Giuseppe Serra, Carlo Tasso |
Abstract | This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages which lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for the languages which lack one. Then, we add a Machine Learning module trained on a well-known English corpus and we evaluate its performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic corpus to check whether using a language-specific corpus brings a further improvement in performance. On the five languages we analyzed, results show an improvement in performance when using a machine learning algorithm, even when that algorithm is not trained and tested on the same language. |
Tasks | Information Retrieval, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1012/ |
https://doi.org/10.26615/978-954-452-049-6_012 | |
PWC | https://paperswithcode.com/paper/exploiting-and-evaluating-a-supervised |
Repo | |
Framework | |
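A supervised keyphrase pipeline of the kind evaluated above boils down to candidate generation, feature extraction, and a ranker. The sketch below uses made-up features and a one-document toy corpus purely to show the shape of such a pipeline.

```python
# Candidate n-grams -> simple features -> classifier-based ranking (illustrative).
import re
from sklearn.linear_model import LogisticRegression

def candidates(text, max_len=3):
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + n]) for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)}

def features(phrase, text):
    text = text.lower()
    return [len(phrase.split()),                      # phrase length
            text.count(phrase),                       # frequency in the document
            text.find(phrase) / max(len(text), 1)]    # relative first position

# Hypothetical labelled document (a real pipeline trains on an annotated corpus).
doc = "keyphrase extraction assigns keyphrases to documents for indexing and retrieval"
gold = {"keyphrase extraction", "indexing"}
cands = sorted(candidates(doc))
X = [features(c, doc) for c in cands]
y = [int(c in gold) for c in cands]

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]
print(sorted(zip(scores, cands), reverse=True)[:3])   # top-ranked candidates
```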