May 5, 2019


Paper Group NANR 96

YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer. Syntactic and Lexical Complexity in Italian Noncanonical Structures. Translation systems and experimental results of the EHR group for WAT2016 tasks. From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse. Thematic fit eval …

YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer


Title	YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
Authors	Salam Khalifa, Nasser Zalmout, Nizar Habash
Abstract	In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-the-art MADAMIRA system with a slightly lower quality. In addition to speed, YAMAMA outputs a rich representation which allows for a wider spectrum of use. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but provides specific outputs catering to specific applications.
Tasks	Lemmatization, Morphological Analysis, Tokenization, Transliteration
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2047/
PDF	https://www.aclweb.org/anthology/C16-2047
PWC	https://paperswithcode.com/paper/yamama-yet-another-multi-dialect-arabic
Repo
Framework

Syntactic and Lexical Complexity in Italian Noncanonical Structures


Title	Syntactic and Lexical Complexity in Italian Noncanonical Structures
Authors	Rodolfo Delmonte
Abstract	In this paper we will be dealing with different levels of complexity in the processing of Italian, a Romance language inheriting many properties from Latin which make it an almost free word order language. The paper is concerned with syntactic complexity as measurable on the basis of the cognitive parser that incrementally builds up a syntactic representation to be used by the semantic component. The theory behind will be LFG and parsing preferences will be used to justify one choice both from a principled and a processing point of view. LFG is a transformationless theory in which there is no deep structure separate from surface syntactic structure. This is partially in accordance with constructional theories in which noncanonical structures containing non-argument functions FOCUS/TOPIC are treated as multifunctional constituents. Complexity is computed on a processing basis following suggestions made by Blache and demonstrated by Kluender and Chesi.
Tasks	Language Modelling
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4108/
PDF	https://www.aclweb.org/anthology/W16-4108
PWC	https://paperswithcode.com/paper/syntactic-and-lexical-complexity-in-italian
Repo
Framework

Translation systems and experimental results of the EHR group for WAT2016 tasks


Title	Translation systems and experimental results of the EHR group for WAT2016 tasks
Authors	Terumasa Ehara
Abstract	System architecture, experimental settings and experimental results of the group for the WAT2016 tasks are described. We participate in six tasks: en-ja, zh-ja, JPCzh-ja, JPCko-ja, HINDENen-hi and HINDENhi-ja. Although the basic architecture of our systems is PBSMT with reordering, several techniques are conducted. Especially, the system for the HINDENhi-ja task with pivoting by English uses the reordering technique. Because Hindi and Japanese are both OV type languages and English is a VO type language, we can use reordering technique to the pivot language. We can improve BLEU score from 7.47 to 7.66 by the reordering technique for the sentence level pivoting of this task.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4609/
PDF	https://www.aclweb.org/anthology/W16-4609
PWC	https://paperswithcode.com/paper/translation-systems-and-experimental-results
Repo
Framework

From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse


Title	From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Authors	Ekaterina Lapshinova-Koltunski, Kerstin Anna Kunz, Anna Nedoluzhko
Abstract	In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both frameworks into more abstract categories and considering only those phenomena that have a direct match in German and Czech. The discourse properties we focus on are relations of identity, semantic similarity, ellipsis and discourse relations. Our study shows that the application of interoperable schemes allows an exploitation of discourse-related phenomena analysed in different projects and on the basis of different frameworks. As corpus compilation and annotation is a time-consuming task, positive results of this experiment open up new paths for contrastive linguistics, translation studies and NLP, including machine translation.
Tasks	Machine Translation, Semantic Similarity, Semantic Textual Similarity
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1157/
PDF	https://www.aclweb.org/anthology/L16-1157
PWC	https://paperswithcode.com/paper/from-interoperable-annotations-towards
Repo
Framework

Thematic fit evaluation: an aspect of selectional preferences


Title	Thematic fit evaluation: an aspect of selectional preferences
Authors	Asad Sayeed, Clayton Greenberg, Vera Demberg
Abstract
Tasks	Decision Making
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2518/
PDF	https://www.aclweb.org/anthology/W16-2518
PWC	https://paperswithcode.com/paper/thematic-fit-evaluation-an-aspect-of
Repo
Framework

BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content


Title	BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content
Authors	Hao Wu, Heyan Huang, Wenpeng Lu
Abstract
Tasks	Information Retrieval, Machine Translation, Question Answering, Recommendation Systems, Semantic Textual Similarity, Text Summarization, Word Embeddings
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1105/
PDF	https://www.aclweb.org/anthology/S16-1105
PWC	https://paperswithcode.com/paper/bit-at-semeval-2016-task-1-sentence
Repo
Framework

Automatic evaluation of surface coherence in L2 texts in Czech


Title	Automatic evaluation of surface coherence in L2 texts in Czech
Authors	Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský
Abstract
Tasks
Published	2016-10-01
URL	https://www.aclweb.org/anthology/O16-1021/
PDF	https://www.aclweb.org/anthology/O16-1021
PWC	https://paperswithcode.com/paper/automatic-evaluation-of-surface-coherence-in
Repo
Framework

Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence


Title	Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence
Authors	Gakuto Kurata, Bing Xiang, Bowen Zhou
Abstract
Tasks	Multi-Label Classification, Multi-Label Text Classification, Text Classification, Word Embeddings
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1063/
PDF	https://www.aclweb.org/anthology/N16-1063
PWC	https://paperswithcode.com/paper/improved-neural-network-based-multi-label
Repo
Framework

context2vec: Learning Generic Context Embedding with Bidirectional LSTM


Title	context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Authors	Oren Melamud, Jacob Goldberger, Ido Dagan
Abstract
Tasks	Chunking, Coreference Resolution, Named Entity Recognition, Semantic Role Labeling, Word Embeddings, Word Sense Disambiguation, Word Sense Induction
Published	2016-08-01
URL	https://www.aclweb.org/anthology/K16-1006/
PDF	https://www.aclweb.org/anthology/K16-1006
PWC	https://paperswithcode.com/paper/context2vec-learning-generic-context
Repo
Framework

Developing Universal Dependencies for Mandarin Chinese


Title	Developing Universal Dependencies for Mandarin Chinese
Authors	Herman Leung, Rafaël Poiret, Tak-sum Wong, Xinying Chen, Kim Gerdes, John Lee
Abstract	This article proposes a Universal Dependency Annotation Scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify cases of idiosyncrasy of Mandarin Chinese that are difficult to fit into the current schema which has mainly been based on the descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Dependencies and the Chinese Dependency Treebank.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5403/
PDF	https://www.aclweb.org/anthology/W16-5403
PWC	https://paperswithcode.com/paper/developing-universal-dependencies-for
Repo
Framework

Ambiguity Diagnosis for Terms in Digital Humanities


Title	Ambiguity Diagnosis for Terms in Digital Humanities
Authors	Béatrice Daille, Evelyne Jacquey, Gaël Lejeune, Luis Felipe Melo, Yannick Toussaint
Abstract	Among all researches dedicating to terminology and word sense disambiguation, little attention has been devoted to the ambiguity of term occurrences. If a lexical unit is indeed a term of the domain, it is not true, even in a specialised corpus, that all its occurrences are terminological. Some occurrences are terminological and other are not. Thus, a global decision at the corpus level about the terminological status of all occurrences of a lexical unit would then be erroneous. In this paper, we propose three original methods to characterise the ambiguity of term occurrences in the domain of social sciences for French. These methods differently model the context of the term occurrences: one is relying on text mining, the second is based on textometry, and the last one focuses on text genre properties. The experimental results show the potential of the proposed approaches and give an opportunity to discuss about their hybridisation.
Tasks	Word Sense Disambiguation
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1690/
PDF	https://www.aclweb.org/anthology/L16-1690
PWC	https://paperswithcode.com/paper/ambiguity-diagnosis-for-terms-in-digital
Repo
Framework

DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining


Title	DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
Authors	Mauro Dragoni, Andrea Tettamanzi, Célia da Costa Pereira
Abstract	Opinion Mining is a topic which attracted a lot of interest in the last years. By observing the literature, it is often hard to replicate system evaluation due to the unavailability of the data used for the evaluation or to the lack of details about the protocol used in the campaign. In this paper, we propose an evaluation protocol, called DRANZIERA, composed of a multi-domain dataset and guidelines allowing both to evaluate opinion mining systems in different contexts (Closed, Semi-Open, and Open) and to compare them to each other and to a number of baselines.
Tasks	Opinion Mining
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1041/
PDF	https://www.aclweb.org/anthology/L16-1041
PWC	https://paperswithcode.com/paper/dranziera-an-evaluation-protocol-for-multi
Repo
Framework

The Value of Semantic Parse Labeling for Knowledge Base Question Answering


Title	The Value of Semantic Parse Labeling for Knowledge Base Question Answering
Authors	Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, Jina Suh
Abstract
Tasks	Knowledge Base Question Answering, Question Answering, Semantic Parsing
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2033/
PDF	https://www.aclweb.org/anthology/P16-2033
PWC	https://paperswithcode.com/paper/the-value-of-semantic-parse-labeling-for
Repo
Framework

The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles


Title	The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
Authors	Christine Meunier, Cecile Fougeron, Corinne Fredouille, Brigitte Bigi, Lise Crevier-Buchman, Elisabeth Delais-Roussarie, Laurianne Georgeton, Alain Ghio, Imed Laaridh, Thierry Legou, Claire Pillot-Loiseau, Gilles Pouchoulin
Abstract	This paper presents the TYPALOC corpus of French Dysarthric and Healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different speech conditions (read and unprepared speech). More precisely, we aim to compare the extent, types and location of phonetic variation within these different populations and speech conditions. The TYPALOC corpus is constituted of a selection of 28 dysarthric patients (three different pathologies) and of 12 healthy control speakers recorded while reading the same text and in a more natural continuous speech condition. Each audio signal has been segmented into Inter-Pausal Units. Then, the corpus has been manually transcribed and automatically aligned. The alignment has been corrected by an expert phonetician. Moreover, the corpus benefits from an automatic syllabification and an Automatic Detection of Acoustic Phone-Based Anomalies. Finally, in order to interpret phonetic variations due to pathologies, a perceptual evaluation of each patient has been conducted. Quantitative data are provided at the end of the paper.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1738/
PDF	https://www.aclweb.org/anthology/L16-1738
PWC	https://paperswithcode.com/paper/the-typaloc-corpus-a-collection-of-various
Repo
Framework

UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central - State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines


Title	UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central - State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
Authors	Udo Hahn, Franz Matthies, Erik Faessler, Johannes Hellrich
Abstract	We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform, with an underlying source code versioning system and various means to support collaboration for software development and code modification management. In order to automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central, as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1397/
PDF	https://www.aclweb.org/anthology/L16-1397
PWC	https://paperswithcode.com/paper/uima-based-jcore-20-goes-github-and-maven
Repo
Framework