Paper Group NANR 96
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer. Syntactic and Lexical Complexity in Italian Noncanonical Structures. Translation systems and experimental results of the EHR group for WAT2016 tasks. From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse. Thematic fit eval …
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
Title | YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer |
Authors | Salam Khalifa, Nasser Zalmout, Nizar Habash |
Abstract | In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-the-art MADAMIRA system, with slightly lower quality. In addition to speed, YAMAMA outputs a rich representation which allows for a wider spectrum of use. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but provides specific outputs catering to specific applications. |
Tasks | Lemmatization, Morphological Analysis, Tokenization, Transliteration |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2047/ |
PWC | https://paperswithcode.com/paper/yamama-yet-another-multi-dialect-arabic |
Repo | |
Framework | |
Syntactic and Lexical Complexity in Italian Noncanonical Structures
Title | Syntactic and Lexical Complexity in Italian Noncanonical Structures |
Authors | Rodolfo Delmonte |
Abstract | In this paper we deal with different levels of complexity in the processing of Italian, a Romance language inheriting many properties from Latin which make it an almost free word order language. The paper is concerned with syntactic complexity as measurable on the basis of the cognitive parser that incrementally builds up a syntactic representation to be used by the semantic component. The underlying theory is LFG, and parsing preferences are used to justify one choice from both a principled and a processing point of view. LFG is a transformationless theory in which there is no deep structure separate from surface syntactic structure. This is partially in accordance with constructional theories in which noncanonical structures containing non-argument functions FOCUS/TOPIC are treated as multifunctional constituents. Complexity is computed on a processing basis following suggestions made by Blache and demonstrated by Kluender and Chesi |
Tasks | Language Modelling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4108/ |
PWC | https://paperswithcode.com/paper/syntactic-and-lexical-complexity-in-italian |
Repo | |
Framework | |
Translation systems and experimental results of the EHR group for WAT2016 tasks
Title | Translation systems and experimental results of the EHR group for WAT2016 tasks |
Authors | Terumasa Ehara |
Abstract | System architecture, experimental settings and experimental results of the group for the WAT2016 tasks are described. We participate in six tasks: en-ja, zh-ja, JPCzh-ja, JPCko-ja, HINDENen-hi and HINDENhi-ja. Although the basic architecture of our systems is PBSMT with reordering, several additional techniques are applied. In particular, the system for the HINDENhi-ja task, which pivots through English, uses the reordering technique. Because Hindi and Japanese are both OV-type languages and English is a VO-type language, we can apply the reordering technique to the pivot language. For the sentence-level pivoting of this task, the reordering technique improves the BLEU score from 7.47 to 7.66. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4609/ |
PWC | https://paperswithcode.com/paper/translation-systems-and-experimental-results |
Repo | |
Framework | |
From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Title | From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse |
Authors | Ekaterina Lapshinova-Koltunski, Kerstin Anna Kunz, Anna Nedoluzhko |
Abstract | In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both frameworks into more abstract categories and considering only those phenomena that have a direct match in German and Czech. The discourse properties we focus on are relations of identity, semantic similarity, ellipsis and discourse relations. Our study shows that the application of interoperable schemes allows an exploitation of discourse-related phenomena analysed in different projects and on the basis of different frameworks. As corpus compilation and annotation is a time-consuming task, positive results of this experiment open up new paths for contrastive linguistics, translation studies and NLP, including machine translation. |
Tasks | Machine Translation, Semantic Similarity, Semantic Textual Similarity |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1157/ |
PWC | https://paperswithcode.com/paper/from-interoperable-annotations-towards |
Repo | |
Framework | |
Thematic fit evaluation: an aspect of selectional preferences
Title | Thematic fit evaluation: an aspect of selectional preferences |
Authors | Asad Sayeed, Clayton Greenberg, Vera Demberg |
Abstract | |
Tasks | Decision Making |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2518/ |
PWC | https://paperswithcode.com/paper/thematic-fit-evaluation-an-aspect-of |
Repo | |
Framework | |
BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content
Title | BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content |
Authors | Hao Wu, Heyan Huang, Wenpeng Lu |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Question Answering, Recommendation Systems, Semantic Textual Similarity, Text Summarization, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1105/ |
PWC | https://paperswithcode.com/paper/bit-at-semeval-2016-task-1-sentence |
Repo | |
Framework | |
Automatic evaluation of surface coherence in L2 texts in Czech
Title | Automatic evaluation of surface coherence in L2 texts in Czech |
Authors | Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1021/ |
PWC | https://paperswithcode.com/paper/automatic-evaluation-of-surface-coherence-in |
Repo | |
Framework | |
Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence
Title | Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence |
Authors | Gakuto Kurata, Bing Xiang, Bowen Zhou |
Abstract | |
Tasks | Multi-Label Classification, Multi-Label Text Classification, Text Classification, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1063/ |
PWC | https://paperswithcode.com/paper/improved-neural-network-based-multi-label |
Repo | |
Framework | |
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Title | context2vec: Learning Generic Context Embedding with Bidirectional LSTM |
Authors | Oren Melamud, Jacob Goldberger, Ido Dagan |
Abstract | |
Tasks | Chunking, Coreference Resolution, Named Entity Recognition, Semantic Role Labeling, Word Embeddings, Word Sense Disambiguation, Word Sense Induction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1006/ |
PWC | https://paperswithcode.com/paper/context2vec-learning-generic-context |
Repo | |
Framework | |
Developing Universal Dependencies for Mandarin Chinese
Title | Developing Universal Dependencies for Mandarin Chinese |
Authors | Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes, John Lee |
Abstract | This article proposes a Universal Dependency Annotation Scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify cases of idiosyncrasy of Mandarin Chinese that are difficult to fit into the current schema which has mainly been based on the descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Dependencies and the Chinese Dependency Treebank. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5403/ |
PWC | https://paperswithcode.com/paper/developing-universal-dependencies-for |
Repo | |
Framework | |
Ambiguity Diagnosis for Terms in Digital Humanities
Title | Ambiguity Diagnosis for Terms in Digital Humanities |
Authors | Béatrice Daille, Evelyne Jacquey, Gaël Lejeune, Luis Felipe Melo, Yannick Toussaint |
Abstract | Among all the research dedicated to terminology and word sense disambiguation, little attention has been devoted to the ambiguity of term occurrences. If a lexical unit is indeed a term of the domain, it is not true, even in a specialised corpus, that all its occurrences are terminological. Some occurrences are terminological and others are not. Thus, a global decision at the corpus level about the terminological status of all occurrences of a lexical unit would be erroneous. In this paper, we propose three original methods to characterise the ambiguity of term occurrences in the domain of social sciences for French. These methods model the context of the term occurrences differently: one relies on text mining, the second is based on textometry, and the last focuses on text genre properties. The experimental results show the potential of the proposed approaches and give an opportunity to discuss their hybridisation. |
Tasks | Word Sense Disambiguation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1690/ |
PWC | https://paperswithcode.com/paper/ambiguity-diagnosis-for-terms-in-digital |
Repo | |
Framework | |
DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
Title | DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining |
Authors | Mauro Dragoni, Andrea Tettamanzi, Célia da Costa Pereira |
Abstract | Opinion Mining is a topic that has attracted a lot of interest in recent years. In the literature, it is often hard to replicate system evaluations because the data used for the evaluation is unavailable or the protocol used in the campaign is insufficiently detailed. In this paper, we propose an evaluation protocol, called DRANZIERA, composed of a multi-domain dataset and guidelines that make it possible both to evaluate opinion mining systems in different contexts (Closed, Semi-Open, and Open) and to compare them to each other and to a number of baselines. |
Tasks | Opinion Mining |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1041/ |
PWC | https://paperswithcode.com/paper/dranziera-an-evaluation-protocol-for-multi |
Repo | |
Framework | |
The Value of Semantic Parse Labeling for Knowledge Base Question Answering
Title | The Value of Semantic Parse Labeling for Knowledge Base Question Answering |
Authors | Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, Jina Suh |
Abstract | |
Tasks | Knowledge Base Question Answering, Question Answering, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2033/ |
PWC | https://paperswithcode.com/paper/the-value-of-semantic-parse-labeling-for |
Repo | |
Framework | |
The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
Title | The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles |
Authors | Christine Meunier, Cecile Fougeron, Corinne Fredouille, Brigitte Bigi, Lise Crevier-Buchman, Elisabeth Delais-Roussarie, Laurianne Georgeton, Alain Ghio, Imed Laaridh, Thierry Legou, Claire Pillot-Loiseau, Gilles Pouchoulin |
Abstract | This paper presents the TYPALOC corpus of French Dysarthric and Healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different speech conditions (read and unprepared speech). More precisely, we aim to compare the extent, types and location of phonetic variation within these different populations and speech conditions. The TYPALOC corpus is constituted of a selection of 28 dysarthric patients (three different pathologies) and of 12 healthy control speakers recorded while reading the same text and in a more natural continuous speech condition. Each audio signal has been segmented into Inter-Pausal Units. Then, the corpus has been manually transcribed and automatically aligned. The alignment has been corrected by an expert phonetician. Moreover, the corpus benefits from an automatic syllabification and an Automatic Detection of Acoustic Phone-Based Anomalies. Finally, in order to interpret phonetic variations due to pathologies, a perceptual evaluation of each patient has been conducted. Quantitative data are provided at the end of the paper. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1738/ |
PWC | https://paperswithcode.com/paper/the-typaloc-corpus-a-collection-of-various |
Repo | |
Framework | |
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Title | UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines |
Authors | Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich |
Abstract | We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an underlying source code versioning system and various means to support collaboration for software development and code modification management. In order to automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central, as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1397/ |
PWC | https://paperswithcode.com/paper/uima-based-jcore-20-goes-github-and-maven |
Repo | |
Framework | |