Paper Group NANR 64
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
Title | Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models |
Authors | Amir Hazem, Emmanuel Morin |
Abstract | There is a rich flora of word space models that have proven their efficiency in many different applications including information retrieval (Dumais, 1988), word sense disambiguation (Schütze, 1992), various semantic knowledge tests (Lund et al., 1995; Karlgren, 2001), and text categorization (Sahlgren, 2005). Based on the assumption that each model captures some aspects of word meanings and provides its own empirical evidence, we present in this paper a systematic exploration of the principal corpus-based word space models for bilingual terminology extraction from comparable corpora. We find that, once we have identified the best procedures, a very simple combination approach leads to significant improvements compared to individual models. |
Tasks | Information Retrieval, Text Categorization, Word Sense Disambiguation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1661/ |
PWC | https://paperswithcode.com/paper/improving-bilingual-terminology-extraction |
Repo | |
Framework | |
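
The combination result reported in this abstract lends itself to a small illustration. Below is a minimal sketch, assuming each word-space model outputs a ranked list of candidate translations for a source term and that a simple mean-reciprocal-rank merge stands in for the authors' combination procedure; the model outputs are invented.

```python
# Illustrative sketch: combining candidate-translation rankings from
# several word-space models. The merging rule (summed reciprocal rank)
# is an assumption, not necessarily the paper's exact combination.

def combine_rankings(rankings):
    """rankings: list of dicts mapping candidate translation -> rank (1 = best)."""
    scores = {}
    for ranking in rankings:
        for candidate, rank in ranking.items():
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / rank
    # Higher combined score = better candidate.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of three word-space models for one source term:
model_a = {"network": 1, "grid": 2, "web": 3}
model_b = {"grid": 1, "network": 2, "lattice": 3}
model_c = {"network": 1, "lattice": 2, "grid": 3}

print(combine_rankings([model_a, model_b, model_c]))
# ['network', 'grid', 'lattice', 'web']
```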
Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts
Title | Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts |
Authors | Sowmya Vajjala, Detmar Meurers, Alexander Eitel, Katharina Scheiter |
Abstract | Computational approaches to readability assessment are generally built and evaluated using gold standard corpora labeled by publishers or teachers rather than being grounded in observations about human performance. Considering that both the reading process and the outcome can be observed, there is an empirical wealth that could be used to ground computational analysis of text readability. This will also support explicit readability models connecting text complexity and the reader's language proficiency to the reading process and outcomes. This paper takes a step in this direction by reporting on an experiment studying how the relation between text complexity and the reader's language proficiency affects the reading process and the performance outcomes of readers after reading. We modeled the reading process using three eye tracking variables: fixation count, average fixation count, and second pass reading duration. Our models for these variables explained 78.9%, 74% and 67.4% of the variance, respectively. Performance outcome was modeled through recall and comprehension questions, and these models explained 58.9% and 27.6% of the variance, respectively. While the online models give us a better understanding of the cognitive correlates of reading with text complexity and language proficiency, modeling of the offline measures can be particularly relevant for incorporating user aspects into readability models. |
Tasks | Eye Tracking |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4105/ |
PWC | https://paperswithcode.com/paper/towards-grounding-computational-linguistic |
Repo | |
Framework | |
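
As a rough illustration of the modeling setup the abstract reports, here is a minimal sketch: a linear regression predicts an online reading measure (fixation count) from text complexity and reader proficiency, and R² gives the explained variance. The synthetic data and the two predictors are assumptions; the paper's models are richer.

```python
# Illustrative sketch: regressing an online reading measure (fixation
# count) on text complexity and reader proficiency, then reporting R^2
# (explained variance). The data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
complexity = rng.uniform(0, 1, n)    # hypothetical text complexity score
proficiency = rng.uniform(0, 1, n)   # hypothetical reader proficiency score
# Fixation count rises with complexity, falls with proficiency (plus noise).
fixations = 50 + 30 * complexity - 20 * proficiency + rng.normal(0, 5, n)

X = np.column_stack([complexity, proficiency, complexity * proficiency])
model = LinearRegression().fit(X, fixations)
print(f"explained variance (R^2): {model.score(X, fixations):.3f}")
```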
Dynamic pause assessment of keystroke logged data for the detection of complexity in translation and monolingual text production
Title | Dynamic pause assessment of keystroke logged data for the detection of complexity in translation and monolingual text production |
Authors | Arndt Heilmann, Stella Neumann |
Abstract | Pause analysis of keystroke-logged translations is a hallmark of process-based translation studies. However, an exact definition of what constitutes a cognitively effortful pause during the translation process has not yet been established (Saldanha and O'Brien, 2013). This paper investigates the design of a keystroke- and subject-dependent identification system for cognitive effort to track complexity in translation with keystroke logging (cf. also Dragsted (2005); Couto-Vale (in preparation)). It is an elastic measure that takes into account the idiosyncratic pause durations of translators as well as further confounds such as bigram frequency, letter frequency and some of the motor tasks involved in writing. The method is compared to a common static threshold of 1000 ms in an analysis of cognitive effort during the translation of grammatical functions from English to German. Additionally, the results are triangulated with eye tracking data for further validation. The findings show that, at least for smaller sets of data, a dynamic pause assessment may lead to more accurate results than a generic static pause threshold of similar duration. |
Tasks | Eye Tracking |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4111/ |
PWC | https://paperswithcode.com/paper/dynamic-pause-assessment-of-keystroke-logged |
Repo | |
Framework | |
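
The contrast between a static 1000 ms threshold and a subject-dependent ("elastic") threshold can be sketched as follows. The percentile rule standing in for the paper's dynamic assessment is an assumption, and the keystroke intervals are invented; the actual method also controls for bigram frequency, letter frequency and motor confounds.

```python
# Illustrative sketch: flagging cognitively effortful pauses with a
# per-subject dynamic threshold instead of a fixed 1000 ms cut-off.
# The percentile-based rule is an assumption for illustration only.
import numpy as np

def dynamic_pauses(intervals_ms, percentile=95):
    """Return the subject-specific threshold and the indices of
    inter-keystroke intervals above it."""
    threshold = np.percentile(intervals_ms, percentile)
    return threshold, [i for i, d in enumerate(intervals_ms) if d > threshold]

def static_pauses(intervals_ms, threshold=1000):
    return [i for i, d in enumerate(intervals_ms) if d > threshold]

# Hypothetical keystroke log (inter-keystroke intervals in ms):
intervals = [120, 80, 95, 2400, 110, 130, 900, 105, 1800, 90]
thr, dyn = dynamic_pauses(intervals)
print(f"dynamic threshold: {thr:.0f} ms, pauses at {dyn}")
print(f"static 1000 ms pauses at {static_pauses(intervals)}")
```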
Larger-Context Language Modelling with Recurrent Neural Network
Title | Larger-Context Language Modelling with Recurrent Neural Network |
Authors | Tian Wang, Kyunghyun Cho |
Abstract | |
Tasks | Language Modelling |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1125/ |
PWC | https://paperswithcode.com/paper/larger-context-language-modelling-with |
Repo | |
Framework | |
A Turkish-German Code-Switching Corpus
Title | A Turkish-German Code-Switching Corpus |
Authors | {"O}zlem {\c{C}}etino{\u{g}}lu |
Abstract | Bilingual communities often alternate between languages in both spoken and written communication. One such community, Germany residents of Turkish origin, produces Turkish-German code-switching by heavily mixing the two languages at the discourse, sentence, or word level. Code-switching in general, and Turkish-German code-switching in particular, has long been studied from a linguistic perspective. Yet resources to study it from a more computational perspective are limited, due to either small size or licence issues. In this work we contribute towards solving this problem with a corpus. We present a Turkish-German code-switching corpus consisting of 1029 tweets, with a majority of intra-sentential switches. We describe the different types of code-switching we have observed in our collection and our processing steps. The first step is data collection and filtering, followed by manual tokenisation and normalisation. Finally, we annotate the data with word-level language identification information. The resulting corpus is available for research purposes. |
Tasks | Language Identification |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1667/ |
PWC | https://paperswithcode.com/paper/a-turkish-german-code-switching-corpus |
Repo | |
Framework | |
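
The word-level language identification layer described in this abstract can be pictured with a minimal sketch: a naive lexicon lookup stands in for the corpus's manual annotation, and the toy lexicons and example tweet are constructed for illustration, not taken from the corpus.

```python
# Illustrative sketch: word-level language ID tags for a code-switched
# tweet, the annotation layer the corpus provides. The tiny lexicons
# and the example sentence are constructed for illustration.
TR = {"bugün", "çok", "güzel"}        # toy Turkish lexicon
DE = {"aber", "ich", "bin", "müde"}   # toy German lexicon

def tag_tokens(tokens):
    tags = []
    for tok in tokens:
        low = tok.lower()
        if low in TR:
            tags.append((tok, "TR"))
        elif low in DE:
            tags.append((tok, "DE"))
        else:
            tags.append((tok, "OTHER"))  # e.g. punctuation, named entities
    return tags

tweet = "Bugün çok güzel aber ich bin müde".split()
print(tag_tokens(tweet))
# [('Bugün', 'TR'), ('çok', 'TR'), ('güzel', 'TR'), ('aber', 'DE'), ...]
```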
A Multi-media Approach to Cross-lingual Entity Knowledge Transfer
Title | A Multi-media Approach to Cross-lingual Entity Knowledge Transfer |
Authors | Di Lu, Xiaoman Pan, Nima Pourdamghani, Shih-Fu Chang, Heng Ji, Kevin Knight |
Abstract | |
Tasks | Cross-Lingual Entity Linking, Entity Linking, Face Recognition, Image Retrieval, Machine Translation, Transfer Learning |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1006/ |
PWC | https://paperswithcode.com/paper/a-multi-media-approach-to-cross-lingual |
Repo | |
Framework | |
Modelling a Parallel Corpus of French and French Belgian Sign Language
Title | Modelling a Parallel Corpus of French and French Belgian Sign Language |
Authors | Laurence Meurant, Maxime Gobert, Anthony Cleve |
Abstract | The overarching objective underlying this research is to develop an online tool, based on a parallel corpus of French Belgian Sign Language (LSFB) and written Belgian French. This tool is aimed at assisting various tasks related to the comparison of LSFB and French, to the benefit of general users as well as teachers in bilingual schools, translators, interpreters, and linguists. These tasks include (1) the comprehension of LSFB or French texts, (2) the production of LSFB or French texts, (3) the translation between LSFB and French in both directions and (4) the contrastive analysis of these languages. The first step of the investigation aims at creating a unidirectional French-LSFB concordancer, able to align a one- or multiple-word expression from the translated French text with its corresponding expressions in the videotaped LSFB productions. We aim to test the efficiency of this concordancer for the extraction of a dictionary of meanings in context. In this paper, we present the modelling of the different data sources at our disposal and specifically the way they interact with one another. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1670/ |
PWC | https://paperswithcode.com/paper/modelling-a-parallel-corpus-of-french-and |
Repo | |
Framework | |
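
A minimal sketch of the unidirectional French-LSFB concordancer outlined above: a French expression is looked up in sentences aligned to time-coded LSFB video segments. The storage format and the example alignments are assumptions about how such data might be organised, not the project's actual model.

```python
# Illustrative sketch: a unidirectional French -> LSFB concordancer.
# Each French sentence is aligned to a time-coded segment of an LSFB
# video; querying an expression returns the matching segments.
alignments = [
    # (French sentence, video file, start_s, end_s) -- hypothetical data
    ("Le cours commence demain.", "lsfb_042.mp4", 12.4, 15.1),
    ("Demain, nous partons tôt.", "lsfb_042.mp4", 15.1, 18.0),
]

def concordance(expression):
    """Return the aligned LSFB video segments for every French sentence
    containing the queried expression."""
    expr = expression.lower()
    return [(sent, video, start, end)
            for sent, video, start, end in alignments
            if expr in sent.lower()]

for hit in concordance("demain"):
    print(hit)
```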
Multi-language Speech Collection for NIST LRE
Title | Multi-language Speech Collection for NIST LRE |
Authors | Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright |
Abstract | The Multi-language Speech (MLS) Corpus supports NIST's Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband data in 20 languages/dialects. The corpus was built with the intention of testing system performance in the matter of distinguishing closely related or confusable linguistic varieties, and careful manual auditing of collected data was an important aspect of this work. This paper lists the specific data requirements for the collection and provides both a commentary on the rationale for those requirements as well as an outline of the various steps taken to ensure all goals were met as specified. LDC conducted a large-scale recruitment effort involving the implementation of candidate assessment and interview techniques suitable for hiring a large contingent of telecommuting workers, and this recruitment effort is discussed in detail. We also describe the telephone and broadcast collection infrastructure and protocols, and provide details of the steps taken to pre-process collected data prior to auditing. Finally, annotation training, procedures and outcomes are presented in detail. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1674/ |
PWC | https://paperswithcode.com/paper/multi-language-speech-collection-for-nist-lre |
Repo | |
Framework | |
使用字典學習法於強健性語音辨識(The Use of Dictionary Learning Approach for Robustness Speech Recognition) [In Chinese]
Title | 使用字典學習法於強健性語音辨識(The Use of Dictionary Learning Approach for Robustness Speech Recognition) [In Chinese] |
Authors | Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen |
Abstract | |
Tasks | Dictionary Learning, Speech Recognition |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1003/ |
PWC | https://paperswithcode.com/paper/a12c-aa-a-c314a14aeae3e34-ethe-use-of |
Repo | |
Framework | |
Recognizing Reference Spans and Classifying their Discourse Facets
Title | Recognizing Reference Spans and Classifying their Discourse Facets |
Authors | Kun Lu, Jin Mao, Gang Li, Jian Xu |
Abstract | |
Tasks | Information Retrieval, Learning-To-Rank, Text Classification |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1516/ |
PWC | https://paperswithcode.com/paper/recognizing-reference-spans-and-classifying |
Repo | |
Framework | |
A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification
Title | A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification |
Authors | Yunlun Yang, Yunhai Tong, Shulei Ma, Zhi-Hong Deng |
Abstract | |
Tasks | Feature Selection, Information Retrieval, Machine Translation, Relation Classification |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1007/ |
PWC | https://paperswithcode.com/paper/a-position-encoding-convolutional-neural |
Repo | |
Framework | |
Exploiting Arabic Diacritization for High Quality Automatic Annotation
Title | Exploiting Arabic Diacritization for High Quality Automatic Annotation |
Authors | Nizar Habash, Anas Shahrour, Muhamed Al-Khalil |
Abstract | We present a novel technique for Arabic morphological annotation. The technique utilizes diacritization to produce morphological annotations of quality comparable to human annotators. Although Arabic text is generally written without diacritics, diacritization is already available for large corpora of Arabic text in several genres. Furthermore, diacritization can be generated at a low cost for new text as it does not require specialized training beyond what educated Arabic typists know. The basic approach is to enrich the input to a state-of-the-art Arabic morphological analyzer with word diacritics (full or partial) to enhance its performance. When applied to fully diacritized text, our approach produces annotations with an accuracy of over 97% on lemma, part-of-speech, and tokenization combined. |
Tasks | Tokenization |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1681/ |
PWC | https://paperswithcode.com/paper/exploiting-arabic-diacritization-for-high |
Repo | |
Framework | |
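
The core idea of the abstract, using available diacritics to narrow down the analyses of an undiacritized word, can be sketched minimally. The toy analysis table for the Arabic form كتب is invented; the paper enriches the input of a state-of-the-art morphological analyzer rather than filtering a lookup table.

```python
# Illustrative sketch: using diacritics to disambiguate among the
# analyses a morphological analyzer would return for the undiacritized
# surface form. The toy analyses for Arabic "ktb" are invented.
analyses = {
    "كتب": [  # undiacritized surface form
        {"diac": "كَتَبَ", "lemma": "كَتَبَ", "pos": "verb"},   # kataba 'he wrote'
        {"diac": "كُتُب", "lemma": "كِتَاب", "pos": "noun"},   # kutub 'books'
    ],
}

def analyze(word, diacritized=None):
    """Return analyses of `word`, filtered to those matching the
    diacritized form when one is available."""
    candidates = analyses.get(word, [])
    if diacritized is None:
        return candidates
    return [a for a in candidates if a["diac"] == diacritized]

print(analyze("كتب"))                        # ambiguous: two analyses
print(analyze("كتب", diacritized="كُتُب"))   # diacritics resolve it: noun
```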
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Title | Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy |
Authors | Mohamed Outahajala, Paolo Rosso |
Abstract | Like most languages that have only recently begun to be investigated for Natural Language Processing (NLP) tasks, Amazigh lacks annotated corpora and still suffers from a scarcity of linguistic tools and resources. The main aim of this paper is to present a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. In line with this goal, we have trained Conditional Random Fields (CRFs) to build a POS tagger for the Amazigh language. We have used the 10-fold technique to evaluate and validate our approach. The CRF 10-fold average accuracy is 87.95%, and the best single-fold result is 91.18%. In order to improve this result, we gathered a set of about 8k words with their POS tags. The collected lexicon was used with the CRF confidence measure in order to obtain a more accurate POS tagger. Hence, we obtained a better performance of 93.82%. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1683/ |
PWC | https://paperswithcode.com/paper/using-a-small-lexicon-with-crfs-confidence |
Repo | |
Framework | |
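
The lexicon-plus-confidence correction step the abstract describes admits a small sketch: when the tagger's per-token confidence falls below a threshold and the word appears in the POS lexicon, the lexicon tag wins. The threshold, the stand-in confidence scores, and the toy lexicon entries are assumptions.

```python
# Illustrative sketch: overriding low-confidence tagger predictions
# with a small POS lexicon, as the abstract describes for the Amazigh
# tagger. Threshold and confidence values are assumptions.
lexicon = {"tamazight": "NOUN", "isawal": "VERB"}  # toy stand-in for the 8k-word lexicon

def correct_with_lexicon(tagged, threshold=0.7):
    """tagged: list of (word, predicted_tag, confidence) triples,
    e.g. per-token marginal probabilities from a CRF."""
    corrected = []
    for word, tag, conf in tagged:
        if conf < threshold and word in lexicon:
            tag = lexicon[word]  # trust the lexicon over a shaky prediction
        corrected.append((word, tag))
    return corrected

# Hypothetical CRF output with per-token confidences:
output = [("isawal", "NOUN", 0.55), ("tamazight", "NOUN", 0.92)]
print(correct_with_lexicon(output))
# [('isawal', 'VERB'), ('tamazight', 'NOUN')]
```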
Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
Title | Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources |
Authors | Hugo Gonçalo Oliveira, Fábio Santos |
Abstract | Although represented as such in wordnets, word senses are not discrete. To handle word senses as fuzzy objects, we exploit the graph structure of synonymy pairs acquired from different sources to discover synsets where words have different membership degrees that reflect confidence. Following this approach, a wide-coverage fuzzy thesaurus was discovered from a synonymy network compiled from seven Portuguese lexical-semantic resources. Based on a crowdsourcing evaluation, we can say that the quality of the obtained synsets is far from perfect but, as expected of a confidence measure, it increases significantly for higher cut-points on the membership and, at a certain point, reaches a 100% correct rate. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1687/ |
PWC | https://paperswithcode.com/paper/discovering-fuzzy-synsets-from-the-redundancy |
Repo | |
Framework | |
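
The redundancy-based membership degrees described above can be sketched minimally: a word's membership in a synset is taken here as the fraction of resources attesting its synonymy link with the seed word, and a cut-point on that degree trades coverage for confidence. The normalization rule and the Portuguese examples are assumptions for illustration.

```python
# Illustrative sketch: deriving fuzzy synset membership from redundancy
# across resources, then applying a cut-point on the membership degree.
# Each pair maps (word_a, word_b) -> number of resources (of 7) that
# list them as synonyms; the counts are invented.
pair_support = {
    ("carro", "automóvel"): 6,
    ("carro", "viatura"): 3,
    ("carro", "vagão"): 1,
}

def fuzzy_synset(seed, n_resources=7, cut=0.0):
    """Membership of each neighbour of `seed`, normalized by the number
    of resources; keep only members at or above the cut-point."""
    members = {seed: 1.0}
    for (a, b), support in pair_support.items():
        other = b if a == seed else a if b == seed else None
        if other is not None:
            members[other] = support / n_resources
    return {w: round(m, 2) for w, m in members.items() if m >= cut}

print(fuzzy_synset("carro"))            # full fuzzy synset
print(fuzzy_synset("carro", cut=0.5))   # higher cut-point, higher confidence
```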
Sarcasm Detection in Chinese Using a Crowdsourced Corpus
Title | Sarcasm Detection in Chinese Using a Crowdsourced Corpus |
Authors | Shih-Kai Lin, Shu-Kai Hsieh |
Abstract | |
Tasks | Sarcasm Detection, Sentiment Analysis |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1027/ |
PWC | https://paperswithcode.com/paper/sarcasm-detection-in-chinese-using-a |
Repo | |
Framework | |