Paper Group NANR 85
MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks. Sentiment Analysis and Lexical Cohesion for the Story Cloze Task. Frame-based Data Factorizations. An Extensible Multilingual Open Source Lemmatizer. Interactive Visualization for Linguistic Structure. Translating Dialectal Arabic as Low Resource Language using Word Embedding. Proceedin …
MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks
Title | MoRS at SemEval-2017 Task 3: Easy to use SVM in Ranking Tasks |
Authors | Miguel J. Rodrigues, Francisco M. Couto |
Abstract | This paper describes our system, dubbed MoRS (Modular Ranking System), pronounced 'Morse', which participated in Task 3 of SemEval-2017. We used MoRS for the Community Question Answering task, which consisted of reordering a set of comments according to their usefulness in answering the question in the thread, for a large collection of questions created by a user community. For this challenge we wanted to go back to simple, easy-to-use, and somewhat forgotten technologies that, we believe, non-expert users could reuse on their own data sets. Our techniques included text annotation, retrieval of meta-data for each comment, POS tagging, and Named Entity Recognition, among others, which fed into syntactic analysis and semantic measurements. Finally, we show and discuss our results and the context of our approach, which is part of a more comprehensive system in development, named MoQA. |
Tasks | Community Question Answering, Information Retrieval, Named Entity Recognition, Question Answering |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2046/ |
https://www.aclweb.org/anthology/S17-2046 | |
PWC | https://paperswithcode.com/paper/mors-at-semeval-2017-task-3-easy-to-use-svm |
Repo | |
Framework | |
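The core ranking idea described above can be illustrated in a few lines of scikit-learn: train a binary SVM on useful/not-useful comments and reorder a thread by the decision margin. This is a minimal sketch, not the authors' MoRS pipeline; the TF-IDF features and toy data stand in for their metadata, POS, and NER features.

```python
# Minimal sketch: rank a thread's comments by an SVM decision score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training data: comments labelled useful (1) / not useful (0).
train_comments = ["try reinstalling the driver", "lol no idea", "check the official docs"]
train_labels = [1, 0, 1]

vectorizer = TfidfVectorizer()
svm = LinearSVC().fit(vectorizer.fit_transform(train_comments), train_labels)

# Reorder a new thread by the SVM margin (higher = judged more useful).
thread = ["works for me", "update the firmware first", "see the manual, section 3"]
scores = svm.decision_function(vectorizer.transform(thread))
ranked = [comment for _, comment in sorted(zip(scores, thread), reverse=True)]
print(ranked)
```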
Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
Title | Sentiment Analysis and Lexical Cohesion for the Story Cloze Task |
Authors | Michael Flor, Swapna Somasundaran |
Abstract | We present two NLP components for the Story Cloze Task – dictionary-based sentiment analysis and lexical cohesion. While previous research found no contribution from sentiment analysis to the accuracy on this task, we demonstrate that sentiment is an important aspect. We describe a new approach, using a rule that estimates sentiment congruence in a story. Our sentiment-based system achieves strong results on this task. Our lexical cohesion system achieves accuracy comparable to previously published baseline results. A combination of the two systems achieves better accuracy than published baselines. We argue that sentiment analysis should be considered an integral part of narrative comprehension. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-0909/ |
https://www.aclweb.org/anthology/W17-0909 | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-and-lexical-cohesion-for |
Repo | |
Framework | |
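To make the sentiment-congruence idea concrete, here is a minimal sketch that picks the candidate ending whose lexicon-based polarity agrees with the story context. The tiny lexicon and the rule are illustrative assumptions, not the paper's dictionaries or its actual congruence rule.

```python
# Illustrative sentiment-congruence rule for choosing a story ending.
import re

POSITIVE = {"happy", "loved", "great", "relieved", "smiled"}   # toy lexicon
NEGATIVE = {"sad", "angry", "terrible", "cried", "lost"}

def polarity(text):
    """Return +1, -1, or 0 from simple lexicon counts."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)

def choose_ending(context, endings):
    """Prefer the ending whose polarity agrees with the story context."""
    target = polarity(context)
    return max(endings, key=lambda e: polarity(e) == target)

story = "Anna trained for months. On race day she ran her best time and smiled."
print(choose_ending(story, ["She cried all the way home.",
                            "She was happy with the result."]))
```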
Frame-based Data Factorizations
Title | Frame-based Data Factorizations |
Authors | Sebastian Mair, Ahcène Boubekki, Ulf Brefeld |
Abstract | Archetypal Analysis is the method of choice to compute interpretable matrix factorizations. Every data point is represented as a convex combination of factors, i.e., points on the boundary of the convex hull of the data. This renders computation inefficient. In this paper, we show that the set of vertices of a convex hull, the so-called frame, can be efficiently computed by a quadratic program. We provide theoretical and empirical results for our proposed approach and make use of the frame to accelerate Archetypal Analysis. The novel method yields similar reconstruction errors as baseline competitors but is much faster to compute. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=722 |
http://proceedings.mlr.press/v70/mair17a/mair17a.pdf | |
PWC | https://paperswithcode.com/paper/frame-based-data-factorizations |
Repo | |
Framework | |
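The frame mentioned in the abstract is simply the set of convex-hull vertices. The sketch below recovers it with one small linear-programming feasibility test per point; the paper itself casts the problem as a quadratic program, so this is only a simplified stand-in for the idea.

```python
# Identify the frame (convex-hull vertices) of a point set via LP feasibility tests.
import numpy as np
from scipy.optimize import linprog

def frame_indices(X):
    """Indices of points in X (n x d) that are NOT convex combinations of the others."""
    n = X.shape[0]
    vertices = []
    for i in range(n):
        others = np.delete(X, i, axis=0)
        A_eq = np.vstack([others.T, np.ones(n - 1)])   # convex-combination constraints
        b_eq = np.concatenate([X[i], [1.0]])
        res = linprog(c=np.zeros(n - 1), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if not res.success:            # infeasible: x_i is a hull vertex (frame point)
            vertices.append(i)
    return vertices

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]])
print(frame_indices(X))   # [0, 1, 2, 3]; the interior point (index 4) is excluded
```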
An Extensible Multilingual Open Source Lemmatizer
Title | An Extensible Multilingual Open Source Lemmatizer |
Authors | Ahmet Aker, Johann Petrak, Firas Sabbah |
Abstract | We present GATE DictLemmatizer, a multilingual open source lemmatizer for the GATE NLP framework that currently supports English, German, Italian, French, Dutch, and Spanish, and is easily extensible to other languages. The software is freely available under the LGPL license. The lemmatization is based on the Helsinki Finite-State Transducer Technology (HFST) and lemma dictionaries automatically created from Wiktionary. We evaluate the performance of the lemmatizers against TreeTagger, which is only freely available for research purposes. Our evaluation shows that DictLemmatizer achieves similar or even better results than TreeTagger for languages where there is support from HFST. The performance drops when there is no support from HFST and the entire lemmatization process is based on lemma dictionaries. However, the results are still satisfactory given the fact that DictLemmatizer is open-source and can be easily extended to other languages. The software for extending the lemmatizer by creating word lists from Wiktionary dictionaries is also freely available as open-source software. |
Tasks | Information Retrieval, Lemmatization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1006/ |
https://doi.org/10.26615/978-954-452-049-6_006 | |
PWC | https://paperswithcode.com/paper/an-extensible-multilingual-open-source |
Repo | |
Framework | |
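A dictionary-driven lemmatizer of the kind described above reduces, at its core, to a lookup with a fallback. The sketch below shows that control flow only; the miniature dictionary and suffix rules are made-up placeholders for the Wiktionary-derived resources and the HFST component.

```python
# Dictionary lookup with a crude suffix-stripping fallback (illustrative only).
LEMMA_DICT = {          # would normally be generated from Wiktionary dumps
    "geese": "goose",
    "ran": "run",
    "mice": "mouse",
}
SUFFIX_RULES = [("ies", "y"), ("s", "")]   # very rough English fallback

def lemmatize(token):
    token = token.lower()
    if token in LEMMA_DICT:                      # 1) exact dictionary hit
        return LEMMA_DICT[token]
    for suffix, replacement in SUFFIX_RULES:     # 2) fallback suffix stripping
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    return token                                 # 3) give up: return the surface form

print([lemmatize(w) for w in ["Geese", "houses", "ran", "berries"]])
# ['goose', 'house', 'run', 'berry']
```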
Interactive Visualization for Linguistic Structure
Title | Interactive Visualization for Linguistic Structure |
Authors | Aaron Sarnat, Vidur Joshi, Cristian Petrescu-Prahova, Alvaro Herrasti, Brandon Stilson, Mark Hopkins |
Abstract | We provide a visualization library and web interface for interactively exploring a parse tree or a forest of parses. The library is not tied to any particular linguistic representation, but provides a general-purpose API for the interactive exploration of hierarchical linguistic structure. To facilitate rapid understanding of a complex structure, the API offers several important features, including expand/collapse functionality, positional and color cues, explicit visual support for sequential structure, and dynamic highlighting to convey node-to-text correspondence. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-2009/ |
https://www.aclweb.org/anthology/D17-2009 | |
PWC | https://paperswithcode.com/paper/interactive-visualization-for-linguistic |
Repo | |
Framework | |
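The expand/collapse behaviour at the heart of such an API can be conveyed with a very small tree structure. This is not the library's actual interface, just an assumed minimal analogue of the collapse-aware traversal it describes.

```python
# A collapsible tree with an indented text rendering (illustrative, not the real API).
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    collapsed: bool = False

def render(node, depth=0):
    """Print an indented view, hiding the subtrees of collapsed nodes."""
    marker = "+" if node.collapsed and node.children else "-"
    print("  " * depth + marker + " " + node.label)
    if not node.collapsed:
        for child in node.children:
            render(child, depth + 1)

tree = Node("S", [Node("NP", [Node("DT the"), Node("NN cat")]),
                  Node("VP", [Node("VBD sat")])])
tree.children[0].collapsed = True      # collapse the NP subtree
render(tree)
```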
Translating Dialectal Arabic as Low Resource Language using Word Embedding
Title | Translating Dialectal Arabic as Low Resource Language using Word Embedding |
Authors | Ebtesam H Almansor, Ahmed Al-Ani |
Abstract | A number of machine translation methods have been proposed in recent years to deal with the increasingly important problem of automatic translation between texts of different languages, or between languages and their dialects. These methods have produced promising results when applied to some of the widely studied languages. Existing translation methods are mainly implemented using rule-based and statistical machine translation approaches. Rule-based approaches utilize language translation rules that can either be constructed by an expert, which is quite difficult when dealing with dialects, or rely on rule construction algorithms, which require very large parallel datasets. Statistical approaches also require large parallel datasets to build the translation models. However, large parallel datasets do not exist for languages with low resources, such as the Arabic language and its dialects. In this paper we propose an algorithm that attempts to overcome this limitation, and apply it to translate the Egyptian dialect (EGY) to Modern Standard Arabic (MSA). Monolingual corpora were collected for both MSA and EGY, and a relatively small parallel language pair set was built to train the models. The proposed method utilizes word embeddings, as they require monolingual data rather than a parallel corpus. Both Continuous Bag of Words and Skip-gram were used to build word vectors. The proposed method was validated on four different datasets using a four-fold cross-validation approach. |
Tasks | Machine Translation, Sentiment Analysis, Text Generation, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1008/ |
https://doi.org/10.26615/978-954-452-049-6_008 | |
PWC | https://paperswithcode.com/paper/translating-dialectal-arabic-as-low-resource |
Repo | |
Framework | |
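The embedding step can be sketched with gensim: CBOW and Skip-gram are the same `Word2Vec` class toggled by the `sg` flag. The toy corpora and the nearest-neighbour lookup below are assumptions for illustration; the paper's actual translation procedure and its seed parallel set are not reproduced here.

```python
# Train CBOW and Skip-gram embeddings on (toy) monolingual corpora; gensim 4 API.
from gensim.models import Word2Vec

egy_corpus = [["ezayak", "3amel", "eh"], ["ezayak", "ya", "basha"]]   # toy EGY sentences
msa_corpus = [["kayfa", "haluka"], ["kayfa", "haluka", "alyawm"]]     # toy MSA sentences

cbow = Word2Vec(sentences=egy_corpus, vector_size=50, window=3, min_count=1, sg=0)
skipgram = Word2Vec(sentences=msa_corpus, vector_size=50, window=3, min_count=1, sg=1)

# Nearest neighbours inside each monolingual space; a full system would also learn a
# mapping between the two spaces from the small parallel seed set.
print(cbow.wv.most_similar("ezayak", topn=2))
print(skipgram.wv.most_similar("kayfa", topn=1))
```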
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Title | Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies |
Authors | |
Abstract | |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-3000/ |
https://www.aclweb.org/anthology/K17-3000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-conll-2017-shared-task |
Repo | |
Framework | |
DCU System Report on the WMT 2017 Multi-modal Machine Translation Task
Title | DCU System Report on the WMT 2017 Multi-modal Machine Translation Task |
Authors | Iacer Calixto, Koel Dutta Chowdhury, Qun Liu |
Abstract | |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4747/ |
https://www.aclweb.org/anthology/W17-4747 | |
PWC | https://paperswithcode.com/paper/dcu-system-report-on-the-wmt-2017-multi-modal |
Repo | |
Framework | |
DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics
Title | DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics |
Authors | Ankit Srivastava, Georg Rehm, Julian Moreno Schneider |
Abstract | We describe our submissions for SemEval-2017 Task 8, Determining Rumour Veracity and Support for Rumours. The Digital Curation Technologies (DKT) team at the German Research Center for Artificial Intelligence (DFKI) participated in two subtasks: Subtask A (determining the stance of a message) and Subtask B (determining veracity of a message, closed variant). In both cases, our implementation consisted of a Multivariate Logistic Regression (Maximum Entropy) classifier coupled with hand-written patterns and rules (heuristics) applied in a post-process cascading fashion. We provide a detailed analysis of the system performance and report on variants of our systems that were not part of the official submission. |
Tasks | Rumour Detection |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2085/ |
https://www.aclweb.org/anthology/S17-2085 | |
PWC | https://paperswithcode.com/paper/dfki-dkt-at-semeval-2017-task-8-rumour |
Repo | |
Framework | |
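The classifier-plus-heuristics cascade can be illustrated in a few lines: a logistic-regression stance classifier whose output is overridden by hand-written rules applied in a fixed order. Features, training data, and rules below are illustrative placeholders, not the team's actual ones.

```python
# Logistic-regression stance classifier followed by cascading rule overrides.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["this is true, confirmed", "fake news, not true", "is this real?", "I agree"]
train_stance = ["support", "deny", "query", "support"]

vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(train_texts), train_stance)

def cascading_rules(text, predicted):
    """Hand-written overrides applied after the statistical classifier."""
    if text.strip().endswith("?"):        # questions are treated as queries
        return "query"
    if "not true" in text.lower():        # strong lexical cue for denial
        return "deny"
    return predicted

test = "really? is that confirmed?"
print(cascading_rules(test, clf.predict(vec.transform([test]))[0]))
```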
A statistical model for morphology inspired by the Amis language
Title | A statistical model for morphology inspired by the Amis language |
Authors | Isabelle Bril, Achraf Lassoued, Michel de Rougemont |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7009/ |
https://www.aclweb.org/anthology/W17-7009 | |
PWC | https://paperswithcode.com/paper/a-statistical-model-for-morphology-inspired |
Repo | |
Framework | |
Coreference Resolution for Swedish and German using Distant Supervision
Title | Coreference Resolution for Swedish and German using Distant Supervision |
Authors | Alexander Wallin, Pierre Nugues |
Abstract | |
Tasks | Coreference Resolution, Knowledge Graphs, Question Answering |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0206/ |
https://www.aclweb.org/anthology/W17-0206 | |
PWC | https://paperswithcode.com/paper/coreference-resolution-for-swedish-and-german |
Repo | |
Framework | |
Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks
Title | Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks |
Authors | Shaoyang Duan, Ruifang He, Wenli Zhao |
Abstract | This paper tackles the task of event detection, which involves identifying and categorizing events. Previous work suffers from two main problems: (1) traditional feature-based methods exploit cross-sentence information but require a large amount of human effort to design complicated feature sets and inference rules; (2) representation-based methods avoid manual feature engineering but depend only on local sentence representations. Since local sentence context is insufficient to resolve ambiguities in identifying particular event types, we propose a novel document-level Recurrent Neural Network (DLRNN) model, which can automatically extract cross-sentence clues to improve sentence-level event detection without designing complex reasoning rules. Experiment results show that our approach outperforms other state-of-the-art methods on the ACE 2005 dataset without an external knowledge base. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1036/ |
https://www.aclweb.org/anthology/I17-1036 | |
PWC | https://paperswithcode.com/paper/exploiting-document-level-information-to |
Repo | |
Framework | |
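The key architectural idea, concatenating a document-level vector with each word's sentence-level representation before classification, can be sketched in PyTorch. Dimensions, the mean-pooled document encoder, and the tagging head below are assumptions, not the paper's exact DLRNN.

```python
# Sentence-level bi-GRU states augmented with a document vector for trigger tagging.
import torch
import torch.nn as nn

class DocAwareTagger(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hid=64, n_types=34):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.sent_rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(4 * hid, n_types)       # word states (2*hid) + doc vector (2*hid)

    def forward(self, doc):                          # doc: (n_sents, sent_len) word ids
        states, _ = self.sent_rnn(self.emb(doc))     # (n_sents, sent_len, 2*hid)
        doc_vec = states.reshape(-1, states.size(-1)).mean(0)     # crude document vector
        doc_vec = doc_vec.expand(states.size(0), states.size(1), -1)
        return self.out(torch.cat([states, doc_vec], dim=-1))     # per-word event-type logits

doc = torch.randint(0, 1000, (3, 12))       # 3 sentences of 12 tokens
print(DocAwareTagger()(doc).shape)          # torch.Size([3, 12, 34])
```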
A Hybrid System to apply Natural Language Inference over Dependency Trees
Title | A Hybrid System to apply Natural Language Inference over Dependency Trees |
Authors | Ali Almiman, Allan Ramsay |
Abstract | This paper presents the development of a natural language inference engine that benefits from two current standard approaches, i.e., shallow and deep approaches. This system combines two non-deterministic algorithms: the approximate matching from the shallow approach and a theorem prover from the deep approach for handling multi-step inference tasks. The theorem prover is customized to accept dependency trees and apply inference rules to these trees. The inference rules are automatically generated as syllogistic rules from our test data (FraCaS test suite). The theorem prover exploits a non-deterministic matching algorithm within a standard backward chaining inference engine. We employ continuation programming as a way of seamlessly handling the combination of these two non-deterministic algorithms. Testing the matching algorithm on the "Generalized quantifiers" and "adjectives" topics in FraCaS (MacCartney and Manning 2007), we achieved an accuracy of 92.8% on the single-premise cases. For the multi-step inferences, we checked the validity of our syllogistic rules and then extracted four generic instances that can be applied to more than one problem. |
Tasks | Natural Language Inference |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1010/ |
https://doi.org/10.26615/978-954-452-049-6_010 | |
PWC | https://paperswithcode.com/paper/a-hybrid-system-to-apply-natural-language |
Repo | |
Framework | |
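The multi-step inference component is, at its core, a backward-chaining prover. The toy sketch below uses propositional stand-ins instead of dependency trees, and omits the approximate matching and continuation-based control the paper describes, so it only conveys the chaining idea.

```python
# Tiny backward-chaining prover: a goal holds if it is a fact or if some rule body
# for it can be proven recursively.
RULES = {                                   # head -> list of alternative bodies
    "mortal(socrates)": [["man(socrates)"]],
    "man(socrates)": [["greek(socrates)", "male(socrates)"]],
}
FACTS = {"greek(socrates)", "male(socrates)"}

def prove(goal, depth=0):
    print("  " * depth + "proving " + goal)
    if goal in FACTS:
        return True
    return any(all(prove(subgoal, depth + 1) for subgoal in body)
               for body in RULES.get(goal, []))

print(prove("mortal(socrates)"))   # True, via two chained rules
```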
Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories
Title | Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories |
Authors | Eduard Barbu |
Abstract | The last years witnessed an increasing interest in automatic methods for spotting false translation units in translation memories. This problem is of great interest to industry, as many translation memories contain errors. A closely related line of research deals with identifying sentences that do not align in parallel corpora mined from the web. The task of spotting false translations is modeled as a binary classification problem. It is known that under certain conditions ensembles of classifiers improve over the performance of the individual members. In this paper we benchmark the most popular ensembles of classifiers, Majority Voting, Bagging, Stacking, and AdaBoost, on the task of spotting false translation units in translation memories and parallel web corpora. We want to know whether, for this specific problem, any ensemble technique improves the performance of the individual classifiers, and whether there is a difference between the data in translation memories and parallel web corpora with respect to this task. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1011/ |
https://doi.org/10.26615/978-954-452-049-6_011 | |
PWC | https://paperswithcode.com/paper/ensembles-of-classifiers-for-cleaning-web |
Repo | |
Framework | |
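All four ensembles named in the abstract are available off the shelf in scikit-learn, so a benchmark skeleton is short. The synthetic data below stands in for the paper's translation-unit features; only the ensemble wiring is the point.

```python
# Compare Majority Voting, Bagging, Stacking, and AdaBoost on a toy binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)), ("dt", DecisionTreeClassifier())]

ensembles = {
    "voting": VotingClassifier(estimators=base, voting="hard"),
    "bagging": BaggingClassifier(n_estimators=20, random_state=0),
    "stacking": StackingClassifier(estimators=base, final_estimator=LogisticRegression()),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```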
Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages
Title | Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages |
Authors | Marco Basaldella, Muhammad Helmy, Elisa Antolli, Mihai Horia Popescu, Giuseppe Serra, Carlo Tasso |
Abstract | This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages which lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for the languages which lack one. Then, we add a Machine Learning module trained on a well-known English corpus and we evaluate its performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic corpus to check whether using a language-specific corpus brings a further improvement in performance. On the five languages we analyzed, results show an improvement in performance when using a machine learning algorithm, even when that algorithm is not trained and tested on the same language. |
Tasks | Information Retrieval, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1012/ |
https://doi.org/10.26615/978-954-452-049-6_012 | |
PWC | https://paperswithcode.com/paper/exploiting-and-evaluating-a-supervised |
Repo | |
Framework | |
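A supervised keyphrase pipeline of the kind evaluated above boils down to candidate generation, feature extraction, and a ranker. The sketch below uses made-up features and a one-document toy corpus purely to show the shape of such a pipeline.

```python
# Candidate n-grams -> simple features -> classifier-based ranking (illustrative).
import re
from sklearn.linear_model import LogisticRegression

def candidates(text, max_len=3):
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + n]) for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)}

def features(phrase, text):
    text = text.lower()
    return [len(phrase.split()),                      # phrase length
            text.count(phrase),                       # frequency in the document
            text.find(phrase) / max(len(text), 1)]    # relative first position

# Hypothetical labelled document (a real pipeline trains on an annotated corpus).
doc = "keyphrase extraction assigns keyphrases to documents for indexing and retrieval"
gold = {"keyphrase extraction", "indexing"}
cands = sorted(candidates(doc))
X = [features(c, doc) for c in cands]
y = [int(c in gold) for c in cands]

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]
print(sorted(zip(scores, cands), reverse=True)[:3])   # top-ranked candidates
```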