May 5, 2019


Paper Group NANR 64



Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models

Title Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
Authors Amir Hazem, Emmanuel Morin
Abstract There is a rich flora of word space models that have proven their efficiency in many different applications including information retrieval (Dumais, 1988), word sense disambiguation (Schutze, 1992), various semantic knowledge tests (Lund et al., 1995; Karlgren, 2001), and text categorization (Sahlgren, 2005). Based on the assumption that each model captures some aspects of word meanings and provides its own empirical evidence, we present in this paper a systematic exploration of the principal corpus-based word space models for bilingual terminology extraction from comparable corpora. We find that, once we have identified the best procedures, a very simple combination approach leads to significant improvements compared to individual models.
Tasks Information Retrieval, Text Categorization, Word Sense Disambiguation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1661/
PDF https://www.aclweb.org/anthology/L16-1661
PWC https://paperswithcode.com/paper/improving-bilingual-terminology-extraction
Repo
Framework
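The paper's core finding is that a very simple combination of individual word-space models outperforms each model alone. A minimal sketch of one plausible combination scheme (score averaging; the function names and toy scores are illustrative assumptions, not the authors' exact procedure):

```python
# Hypothetical sketch: combining candidate-translation rankings from
# several word-space models by averaging their similarity scores.
def combine_models(scores_per_model):
    """scores_per_model: list of dicts mapping candidate -> similarity."""
    combined = {}
    for scores in scores_per_model:
        for cand, s in scores.items():
            combined[cand] = combined.get(cand, 0.0) + s
    n = len(scores_per_model)
    # Average across models and rank candidates by combined score.
    return sorted(((c, s / n) for c, s in combined.items()),
                  key=lambda x: x[1], reverse=True)

# Toy example: two models scoring French candidates for one source term.
ranked = combine_models([
    {"maladie": 0.8, "virus": 0.3},   # e.g. a count-based vector model
    {"maladie": 0.6, "virus": 0.5},   # e.g. a word-embedding model
])
```

The intuition matches the abstract's assumption: each model captures different aspects of word meaning, so pooling their evidence smooths out individual weaknesses.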

Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts

Title Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts
Authors Sowmya Vajjala, Detmar Meurers, Alexander Eitel, Katharina Scheiter
Abstract Computational approaches to readability assessment are generally built and evaluated using gold standard corpora labeled by publishers or teachers rather than being grounded in observations about human performance. Considering that both the reading process and the outcome can be observed, there is a wealth of empirical evidence that could be used to ground computational analysis of text readability. This would also support explicit readability models connecting text complexity and the reader's language proficiency to the reading process and its outcomes. This paper takes a step in this direction by reporting on an experiment studying how the relation between text complexity and the reader's language proficiency affects the reading process and the performance outcomes of readers after reading. We modeled the reading process using three eye-tracking variables: fixation count, average fixation count, and second-pass reading duration. Our models for these variables explained 78.9%, 74%, and 67.4% of the variance, respectively. Performance outcome was modeled through recall and comprehension questions, and these models explained 58.9% and 27.6% of the variance, respectively. While the online models give us a better understanding of the cognitive correlates of reading with text complexity and language proficiency, modeling of the offline measures can be particularly relevant for incorporating user aspects into readability models.
Tasks Eye Tracking
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4105/
PDF https://www.aclweb.org/anthology/W16-4105
PWC https://paperswithcode.com/paper/towards-grounding-computational-linguistic
Repo
Framework
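The "explained X% of the variance" figures in the abstract are R² values from regression models of each eye-tracking variable. An illustrative sketch (not the authors' actual models, which involve more predictors) of fitting a one-predictor least-squares regression and computing R² on toy data:

```python
# Fit y = intercept + slope * x by ordinary least squares and return R^2,
# the proportion of variance in y explained by the model.
def fit_r2(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Toy data: fixation counts (y) against a text-complexity score (x).
r2 = fit_r2([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```

A high R² for an online measure such as fixation count, as reported in the paper, means text complexity and proficiency jointly account for most of its variation.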

Dynamic pause assessment of keystroke logged data for the detection of complexity in translation and monolingual text production

Title Dynamic pause assessment of keystroke logged data for the detection of complexity in translation and monolingual text production
Authors Arndt Heilmann, Stella Neumann
Abstract Pause analysis of keystroke-logged translations is a hallmark of process-based translation studies. However, an exact definition of what constitutes a cognitively effortful pause during the translation process has not yet been established (Saldanha and O'Brien, 2013). This paper investigates the design of a keystroke- and subject-dependent system for identifying cognitive effort, used to track complexity in translation with keystroke logging (cf. Dragsted, 2005; Couto-Vale, in preparation). It is an elastic measure that takes into account the idiosyncratic pause durations of translators as well as further confounds such as bigram frequency, letter frequency, and some of the motor tasks involved in writing. The method is compared to a common static threshold of 1000 ms in an analysis of cognitive effort during the translation of grammatical functions from English to German. Additionally, the results are triangulated with eye-tracking data for further validation. The findings show that, at least for smaller data sets, a dynamic pause assessment may lead to more accurate results than a generic static pause threshold of similar duration.
Tasks Eye Tracking
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4111/
PDF https://www.aclweb.org/anthology/W16-4111
PWC https://paperswithcode.com/paper/dynamic-pause-assessment-of-keystroke-logged
Repo
Framework
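The contrast between a static 1000 ms cut-off and an "elastic", subject-dependent one can be sketched as follows. The scaling rule here (median inter-keystroke interval times a factor) is my assumption for illustration, not the paper's exact formula:

```python
# Derive a subject-specific pause threshold from that subject's own
# inter-keystroke intervals instead of a generic static 1000 ms cut-off.
def dynamic_threshold(intervals_ms, factor=3.0):
    """Per-subject threshold: median typing interval scaled by `factor`."""
    s = sorted(intervals_ms)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return factor * median

def effortful_pauses(intervals_ms, threshold):
    """Intervals long enough (for this subject) to count as effortful."""
    return [i for i in intervals_ms if i > threshold]

# A fast typist: a 700 ms gap already signals effort for this subject,
# though a static 1000 ms threshold would miss it entirely.
intervals = [120, 150, 130, 140, 700, 160]
thr = dynamic_threshold(intervals)  # 3 * 145 = 435 ms
```

This is the sense in which the measure is "elastic": the same absolute pause duration can be effortful for one translator and routine for another.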

Larger-Context Language Modelling with Recurrent Neural Network

Title Larger-Context Language Modelling with Recurrent Neural Network
Authors Tian Wang, Kyunghyun Cho
Abstract
Tasks Language Modelling
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1125/
PDF https://www.aclweb.org/anthology/P16-1125
PWC https://paperswithcode.com/paper/larger-context-language-modelling-with
Repo
Framework

A Turkish-German Code-Switching Corpus

Title A Turkish-German Code-Switching Corpus
Authors Özlem Çetinoğlu
Abstract Bilingual communities often alternate between languages in both spoken and written communication. One such community, residents of Germany with Turkish origin, produces Turkish-German code-switching by heavily mixing the two languages at the discourse, sentence, or word level. Code-switching in general, and Turkish-German code-switching in particular, has long been studied from a linguistic perspective. Yet resources to study it from a more computational perspective are limited, due either to small size or to licence issues. In this work we address this problem by contributing a corpus. We present a Turkish-German code-switching corpus consisting of 1029 tweets, with a majority of intra-sentential switches. We describe the different types of code-switching we have observed in our collection and outline our processing steps. The first step is data collection and filtering, followed by manual tokenisation and normalisation. Finally, we annotate the data with word-level language identification information. The resulting corpus is available for research purposes.
Tasks Language Identification
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1667/
PDF https://www.aclweb.org/anthology/L16-1667
PWC https://paperswithcode.com/paper/a-turkish-german-code-switching-corpus
Repo
Framework

A Multi-media Approach to Cross-lingual Entity Knowledge Transfer

Title A Multi-media Approach to Cross-lingual Entity Knowledge Transfer
Authors Di Lu, Xiaoman Pan, Nima Pourdamghani, Shih-Fu Chang, Heng Ji, Kevin Knight
Abstract
Tasks Cross-Lingual Entity Linking, Entity Linking, Face Recognition, Image Retrieval, Machine Translation, Transfer Learning
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1006/
PDF https://www.aclweb.org/anthology/P16-1006
PWC https://paperswithcode.com/paper/a-multi-media-approach-to-cross-lingual
Repo
Framework

Modelling a Parallel Corpus of French and French Belgian Sign Language

Title Modelling a Parallel Corpus of French and French Belgian Sign Language
Authors Laurence Meurant, Maxime Gobert, Anthony Cleve
Abstract The overarching objective underlying this research is to develop an online tool based on a parallel corpus of French Belgian Sign Language (LSFB) and written Belgian French. This tool is intended to assist with a variety of tasks related to the comparison of LSFB and French, to the benefit of general users, teachers in bilingual schools, translators and interpreters, and linguists. These tasks include (1) the comprehension of LSFB or French texts, (2) the production of LSFB or French texts, (3) translation between LSFB and French in both directions, and (4) the contrastive analysis of these languages. The first step of the investigation aims at creating a unidirectional French-LSFB concordancer, able to align a one- or multiple-word expression from the French translated text with its corresponding expressions in the videotaped LSFB productions. We aim to test the efficiency of this concordancer for the extraction of a dictionary of meanings in context. In this paper, we present the modelling of the different data sources at our disposal and, specifically, the way they interact with one another.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1670/
PDF https://www.aclweb.org/anthology/L16-1670
PWC https://paperswithcode.com/paper/modelling-a-parallel-corpus-of-french-and
Repo
Framework

Multi-language Speech Collection for NIST LRE

Title Multi-language Speech Collection for NIST LRE
Authors Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright
Abstract The Multi-language Speech (MLS) Corpus supports NIST's Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband data in 20 languages/dialects. The corpus was built with the intention of testing system performance in the matter of distinguishing closely related or confusable linguistic varieties, and careful manual auditing of collected data was an important aspect of this work. This paper lists the specific data requirements for the collection and provides both a commentary on the rationale for those requirements as well as an outline of the various steps taken to ensure all goals were met as specified. LDC conducted a large-scale recruitment effort involving the implementation of candidate assessment and interview techniques suitable for hiring a large contingent of telecommuting workers, and this recruitment effort is discussed in detail. We also describe the telephone and broadcast collection infrastructure and protocols, and provide details of the steps taken to pre-process collected data prior to auditing. Finally, annotation training, procedures and outcomes are presented in detail.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1674/
PDF https://www.aclweb.org/anthology/L16-1674
PWC https://paperswithcode.com/paper/multi-language-speech-collection-for-nist-lre
Repo
Framework

使用字典學習法於強健性語音辨識(The Use of Dictionary Learning Approach for Robustness Speech Recognition) [In Chinese]

Title 使用字典學習法於強健性語音辨識(The Use of Dictionary Learning Approach for Robustness Speech Recognition) [In Chinese]
Authors Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen
Abstract
Tasks Dictionary Learning, Speech Recognition
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1003/
PDF https://www.aclweb.org/anthology/O16-1003
PWC https://paperswithcode.com/paper/a12c-aa-a-c314a14aeae3e34-ethe-use-of
Repo
Framework

Recognizing Reference Spans and Classifying their Discourse Facets

Title Recognizing Reference Spans and Classifying their Discourse Facets
Authors Kun Lu, Jin Mao, Gang Li, Jian Xu
Abstract
Tasks Information Retrieval, Learning-To-Rank, Text Classification
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-1516/
PDF https://www.aclweb.org/anthology/W16-1516
PWC https://paperswithcode.com/paper/recognizing-reference-spans-and-classifying
Repo
Framework

A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification

Title A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification
Authors Yunlun Yang, Yunhai Tong, Shulei Ma, Zhi-Hong Deng
Abstract
Tasks Feature Selection, Information Retrieval, Machine Translation, Relation Classification
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1007/
PDF https://www.aclweb.org/anthology/D16-1007
PWC https://paperswithcode.com/paper/a-position-encoding-convolutional-neural
Repo
Framework

Exploiting Arabic Diacritization for High Quality Automatic Annotation

Title Exploiting Arabic Diacritization for High Quality Automatic Annotation
Authors Nizar Habash, Anas Shahrour, Muhamed Al-Khalil
Abstract We present a novel technique for Arabic morphological annotation. The technique utilizes diacritization to produce morphological annotations of quality comparable to human annotators. Although Arabic text is generally written without diacritics, diacritization is already available for large corpora of Arabic text in several genres. Furthermore, diacritization can be generated at a low cost for new text as it does not require specialized training beyond what educated Arabic typists know. The basic approach is to enrich the input to a state-of-the-art Arabic morphological analyzer with word diacritics (full or partial) to enhance its performance. When applied to fully diacritized text, our approach produces annotations with an accuracy of over 97% on lemma, part-of-speech, and tokenization combined.
Tasks Tokenization
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1681/
PDF https://www.aclweb.org/anthology/L16-1681
PWC https://paperswithcode.com/paper/exploiting-arabic-diacritization-for-high
Repo
Framework
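The basic mechanism described in the abstract, using diacritics to narrow down the analyses a morphological analyzer returns, can be sketched as below. The function and field names are illustrative assumptions (the romanized forms stand in for Arabic script), not the authors' actual interface:

```python
# Use the word's diacritics to filter the candidate analyses returned by
# a morphological analyzer, keeping only those whose diacritized form
# matches the (fully or partially) diacritized input.
def filter_by_diacritics(diacritized_word, analyses):
    """analyses: list of dicts with a 'diac' (diacritized form) key."""
    matching = [a for a in analyses if a["diac"] == diacritized_word]
    return matching or analyses  # fall back to all analyses if none match

# Toy example: three ambiguous analyses; diacritics disambiguate them.
picked = filter_by_diacritics(
    "kataba",  # romanized stand-in for a diacritized Arabic form
    [{"diac": "kataba", "pos": "VERB"},
     {"diac": "kutiba", "pos": "VERB"},
     {"diac": "kutub", "pos": "NOUN"}],
)
```

The undiacritized surface form is highly ambiguous; matching against the diacritized form prunes most competing analyses, which is why the enriched input raises annotation accuracy.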

Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy

Title Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
Authors Mohamed Outahajala, Paolo Rosso
Abstract Like most languages that have only recently begun to be investigated for Natural Language Processing (NLP) tasks, Amazigh lacks annotated corpora and still suffers from a scarcity of linguistic tools and resources. The main aim of this paper is to present a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. In line with this goal, we have trained Conditional Random Fields (CRFs) to build a POS tagger for the Amazigh language. We used 10-fold cross-validation to evaluate and validate our approach. The 10-fold average accuracy of the CRF tagger is 87.95%, and the best single-fold result is 91.18%. In order to improve this result, we gathered a set of about 8k words with their POS tags. The collected lexicon was used with a CRF confidence measure in order to obtain a more accurate POS tagger. As a result, we obtained a better performance of 93.82%.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1683/
PDF https://www.aclweb.org/anthology/L16-1683
PWC https://paperswithcode.com/paper/using-a-small-lexicon-with-crfs-confidence
Repo
Framework
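The combination of a CRF confidence measure with the small lexicon can be sketched as a simple fallback rule. The names and the 0.9 cut-off here are illustrative assumptions (the paper does not specify its exact threshold), and the CRF marginals are mocked rather than computed:

```python
# Fall back to a small curated lexicon when the CRF tagger's confidence
# (marginal probability) in its own prediction is low.
def tag_with_lexicon(tokens, crf_predictions, lexicon, threshold=0.9):
    """crf_predictions: list of (tag, marginal_probability) per token."""
    out = []
    for tok, (tag, conf) in zip(tokens, crf_predictions):
        if conf < threshold and tok in lexicon:
            out.append(lexicon[tok])   # low confidence: trust the lexicon
        else:
            out.append(tag)            # high confidence: trust the CRF
    return out

# Toy example with mock CRF marginals for two Amazigh-like tokens.
tags = tag_with_lexicon(
    ["azul", "tamazight"],
    [("NOUN", 0.95), ("VERB", 0.40)],  # mock (tag, confidence) pairs
    {"tamazight": "NOUN"},
)
```

This matches the abstract's reported effect: overriding only the low-confidence CRF decisions with lexicon entries lifts accuracy without retraining the model.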

Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources

Title Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
Authors Hugo Gonçalo Oliveira, Fábio Santos
Abstract Although represented as such in wordnets, word senses are not discrete. To handle word senses as fuzzy objects, we exploit the graph structure of synonymy pairs acquired from different sources to discover synsets in which words have different membership degrees that reflect confidence. Following this approach, a wide-coverage fuzzy thesaurus was discovered from a synonymy network compiled from seven Portuguese lexical-semantic resources. Based on a crowdsourcing evaluation, we can say that the quality of the obtained synsets is far from perfect but, as expected of a confidence measure, it increases significantly for higher cut-points on the membership degree and, at a certain point, reaches a 100% correctness rate.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1687/
PDF https://www.aclweb.org/anthology/L16-1687
PWC https://paperswithcode.com/paper/discovering-fuzzy-synsets-from-the-redundancy
Repo
Framework
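A minimal sketch of the membership idea as I read the abstract (the exact scoring in the paper is graph-based; here redundancy is simplified to a count of attesting resources, and the Portuguese words are illustrative):

```python
# Membership degree of a word in a fuzzy synset: the fraction of source
# resources that attest its synonymy links, so redundancy across the
# seven resources acts as a confidence score.
def membership_degrees(word_attestations, n_resources):
    """word_attestations: dict mapping word -> number of attesting resources."""
    return {w: k / n_resources for w, k in word_attestations.items()}

def cut(fuzzy_synset, cut_point):
    """Keep only members at or above the cut-point, as in the evaluation."""
    return {w for w, m in fuzzy_synset.items() if m >= cut_point}

# Toy fuzzy synset over 7 resources: widely attested words score high.
fuzzy = membership_degrees({"carro": 7, "viatura": 4, "auto": 1}, 7)
core = cut(fuzzy, 0.5)
```

Raising the cut-point discards weakly attested members first, which is why the evaluated correctness rate climbs with higher cut-points, as the abstract reports.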

Sarcasm Detection in Chinese Using a Crowdsourced Corpus

Title Sarcasm Detection in Chinese Using a Crowdsourced Corpus
Authors Shih-Kai Lin, Shu-Kai Hsieh
Abstract
Tasks Sarcasm Detection, Sentiment Analysis
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1027/
PDF https://www.aclweb.org/anthology/O16-1027
PWC https://paperswithcode.com/paper/sarcasm-detection-in-chinese-using-a
Repo
Framework