May 5, 2019

Paper Group NANR 96

YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer

Title YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
Authors Salam Khalifa, Nasser Zalmout, Nizar Habash
Abstract In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-the-art MADAMIRA system, with slightly lower quality. In addition to speed, YAMAMA outputs a rich representation which allows for a wider spectrum of use. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but provides specific outputs catering to specific applications.
Tasks Lemmatization, Morphological Analysis, Tokenization, Transliteration
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2047/
PDF https://www.aclweb.org/anthology/C16-2047
PWC https://paperswithcode.com/paper/yamama-yet-another-multi-dialect-arabic
Repo
Framework

Syntactic and Lexical Complexity in Italian Noncanonical Structures

Title Syntactic and Lexical Complexity in Italian Noncanonical Structures
Authors Rodolfo Delmonte
Abstract In this paper we will be dealing with different levels of complexity in the processing of Italian, a Romance language inheriting many properties from Latin which make it an almost free word order language. The paper is concerned with syntactic complexity as measurable on the basis of the cognitive parser that incrementally builds up a syntactic representation to be used by the semantic component. The theory behind will be LFG and parsing preferences will be used to justify one choice both from a principled and a processing point of view. LFG is a transformationless theory in which there is no deep structure separate from surface syntactic structure. This is partially in accordance with constructional theories in which noncanonical structures containing non-argument functions FOCUS/TOPIC are treated as multifunctional constituents. Complexity is computed on a processing basis following suggestions made by Blache and demonstrated by Kluender and Chesi.
Tasks Language Modelling
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4108/
PDF https://www.aclweb.org/anthology/W16-4108
PWC https://paperswithcode.com/paper/syntactic-and-lexical-complexity-in-italian
Repo
Framework

Translation systems and experimental results of the EHR group for WAT2016 tasks

Title Translation systems and experimental results of the EHR group for WAT2016 tasks
Authors Terumasa Ehara
Abstract System architecture, experimental settings and experimental results of the group for the WAT2016 tasks are described. We participate in six tasks: en-ja, zh-ja, JPCzh-ja, JPCko-ja, HINDENen-hi and HINDENhi-ja. Although the basic architecture of our systems is PBSMT with reordering, several techniques are conducted. Especially, the system for the HINDENhi-ja task with pivoting by English uses the reordering technique. Because Hindi and Japanese are both OV type languages and English is a VO type language, we can use reordering technique to the pivot language. We can improve BLEU score from 7.47 to 7.66 by the reordering technique for the sentence level pivoting of this task.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4609/
PDF https://www.aclweb.org/anthology/W16-4609
PWC https://paperswithcode.com/paper/translation-systems-and-experimental-results
Repo
Framework

From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse

Title From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Authors Ekaterina Lapshinova-Koltunski, Kerstin Anna Kunz, Anna Nedoluzhko
Abstract In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both frameworks into more abstract categories and considering only those phenomena that have a direct match in German and Czech. The discourse properties we focus on are relations of identity, semantic similarity, ellipsis and discourse relations. Our study shows that the application of interoperable schemes allows an exploitation of discourse-related phenomena analysed in different projects and on the basis of different frameworks. As corpus compilation and annotation is a time-consuming task, positive results of this experiment open up new paths for contrastive linguistics, translation studies and NLP, including machine translation.
Tasks Machine Translation, Semantic Similarity, Semantic Textual Similarity
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1157/
PDF https://www.aclweb.org/anthology/L16-1157
PWC https://paperswithcode.com/paper/from-interoperable-annotations-towards
Repo
Framework

Thematic fit evaluation: an aspect of selectional preferences

Title Thematic fit evaluation: an aspect of selectional preferences
Authors Asad Sayeed, Clayton Greenberg, Vera Demberg
Abstract
Tasks Decision Making
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2518/
PDF https://www.aclweb.org/anthology/W16-2518
PWC https://paperswithcode.com/paper/thematic-fit-evaluation-an-aspect-of
Repo
Framework

BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content

Title BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content
Authors Hao Wu, Heyan Huang, Wenpeng Lu
Abstract
Tasks Information Retrieval, Machine Translation, Question Answering, Recommendation Systems, Semantic Textual Similarity, Text Summarization, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1105/
PDF https://www.aclweb.org/anthology/S16-1105
PWC https://paperswithcode.com/paper/bit-at-semeval-2016-task-1-sentence
Repo
Framework

Automatic evaluation of surface coherence in L2 texts in Czech

Title Automatic evaluation of surface coherence in L2 texts in Czech
Authors Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1021/
PDF https://www.aclweb.org/anthology/O16-1021
PWC https://paperswithcode.com/paper/automatic-evaluation-of-surface-coherence-in
Repo
Framework

Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence

Title Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence
Authors Gakuto Kurata, Bing Xiang, Bowen Zhou
Abstract
Tasks Multi-Label Classification, Multi-Label Text Classification, Text Classification, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1063/
PDF https://www.aclweb.org/anthology/N16-1063
PWC https://paperswithcode.com/paper/improved-neural-network-based-multi-label
Repo
Framework

context2vec: Learning Generic Context Embedding with Bidirectional LSTM

Title context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Authors Oren Melamud, Jacob Goldberger, Ido Dagan
Abstract
Tasks Chunking, Coreference Resolution, Named Entity Recognition, Semantic Role Labeling, Word Embeddings, Word Sense Disambiguation, Word Sense Induction
Published 2016-08-01
URL https://www.aclweb.org/anthology/K16-1006/
PDF https://www.aclweb.org/anthology/K16-1006
PWC https://paperswithcode.com/paper/context2vec-learning-generic-context
Repo
Framework

Developing Universal Dependencies for Mandarin Chinese

Title Developing Universal Dependencies for Mandarin Chinese
Authors Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes, John Lee
Abstract This article proposes a Universal Dependency Annotation Scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify cases of idiosyncrasy of Mandarin Chinese that are difficult to fit into the current schema which has mainly been based on the descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Dependencies and the Chinese Dependency Treebank.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5403/
PDF https://www.aclweb.org/anthology/W16-5403
PWC https://paperswithcode.com/paper/developing-universal-dependencies-for
Repo
Framework

Ambiguity Diagnosis for Terms in Digital Humanities

Title Ambiguity Diagnosis for Terms in Digital Humanities
Authors Béatrice Daille, Evelyne Jacquey, Gaël Lejeune, Luis Felipe Melo, Yannick Toussaint
Abstract Among all the research dedicated to terminology and word sense disambiguation, little attention has been devoted to the ambiguity of term occurrences. If a lexical unit is indeed a term of the domain, it is not true, even in a specialised corpus, that all its occurrences are terminological. Some occurrences are terminological and others are not. Thus, a global decision at the corpus level about the terminological status of all occurrences of a lexical unit would be erroneous. In this paper, we propose three original methods to characterise the ambiguity of term occurrences in the domain of social sciences for French. These methods model the context of the term occurrences differently: the first relies on text mining, the second is based on textometry, and the last focuses on text genre properties. The experimental results show the potential of the proposed approaches and give an opportunity to discuss their hybridisation.
Tasks Word Sense Disambiguation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1690/
PDF https://www.aclweb.org/anthology/L16-1690
PWC https://paperswithcode.com/paper/ambiguity-diagnosis-for-terms-in-digital
Repo
Framework

DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining

Title DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
Authors Mauro Dragoni, Andrea Tettamanzi, Célia da Costa Pereira
Abstract Opinion Mining is a topic which has attracted a lot of interest in recent years. By observing the literature, it is often hard to replicate system evaluation due to the unavailability of the data used for the evaluation or to the lack of details about the protocol used in the campaign. In this paper, we propose an evaluation protocol, called DRANZIERA, composed of a multi-domain dataset and guidelines allowing both to evaluate opinion mining systems in different contexts (Closed, Semi-Open, and Open) and to compare them to each other and to a number of baselines.
Tasks Opinion Mining
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1041/
PDF https://www.aclweb.org/anthology/L16-1041
PWC https://paperswithcode.com/paper/dranziera-an-evaluation-protocol-for-multi
Repo
Framework

The Value of Semantic Parse Labeling for Knowledge Base Question Answering

Title The Value of Semantic Parse Labeling for Knowledge Base Question Answering
Authors Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, Jina Suh
Abstract
Tasks Knowledge Base Question Answering, Question Answering, Semantic Parsing
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2033/
PDF https://www.aclweb.org/anthology/P16-2033
PWC https://paperswithcode.com/paper/the-value-of-semantic-parse-labeling-for
Repo
Framework

The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles

Title The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
Authors Christine Meunier, Cecile Fougeron, Corinne Fredouille, Brigitte Bigi, Lise Crevier-Buchman, Elisabeth Delais-Roussarie, Laurianne Georgeton, Alain Ghio, Imed Laaridh, Thierry Legou, Claire Pillot-Loiseau, Gilles Pouchoulin
Abstract This paper presents the TYPALOC corpus of French Dysarthric and Healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different speech conditions (read and unprepared speech). More precisely, we aim to compare the extent, types and location of phonetic variation within these different populations and speech conditions. The TYPALOC corpus is constituted of a selection of 28 dysarthric patients (three different pathologies) and of 12 healthy control speakers recorded while reading the same text and in a more natural continuous speech condition. Each audio signal has been segmented into Inter-Pausal Units. Then, the corpus has been manually transcribed and automatically aligned. The alignment has been corrected by an expert phonetician. Moreover, the corpus benefits from an automatic syllabification and an Automatic Detection of Acoustic Phone-Based Anomalies. Finally, in order to interpret phonetic variations due to pathologies, a perceptual evaluation of each patient has been conducted. Quantitative data are provided at the end of the paper.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1738/
PDF https://www.aclweb.org/anthology/L16-1738
PWC https://paperswithcode.com/paper/the-typaloc-corpus-a-collection-of-various
Repo
Framework

UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines

Title UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Authors Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich
Abstract We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an underlying source code versioning system and various means to support collaboration for software development and code modification management. In order to automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central, as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1397/
PDF https://www.aclweb.org/anthology/L16-1397
PWC https://paperswithcode.com/paper/uima-based-jcore-20-goes-github-and-maven
Repo
Framework