May 5, 2019

Paper Group NANR 31

Introducing the Asian Language Treebank (ALT). French Learners Audio Corpus of German Speech (FLACGS). JEDI: Joint Entity and Relation Detection using Type Inference. Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF. PolyU at CL-SciSumm 2016. LREC as a Graph: People and Resources in a Network. Linguistically Inspired Language Model Augmentation …

Introducing the Asian Language Treebank (ALT)

Title Introducing the Asian Language Treebank (ALT)
Authors Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, Eiichiro Sumita
Abstract This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Laos, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1249/
PDF https://www.aclweb.org/anthology/L16-1249
PWC https://paperswithcode.com/paper/introducing-the-asian-language-treebank-alt
Repo
Framework

French Learners Audio Corpus of German Speech (FLACGS)

Title French Learners Audio Corpus of German Speech (FLACGS)
Authors Jane Wottawa, Martine Adda-Decker
Abstract The French Learners Audio Corpus of German Speech (FLACGS) was created to compare German speech production of German native speakers (GG) and French learners of German (FG) across three speech production tasks of increasing production complexity: repetition, reading and picture description. 40 speakers, 20 GG and 20 FG, performed each of the three tasks, which in total leads to approximately 7h of speech. The corpus was manually transcribed and automatically aligned. Analyses that can be performed on this type of corpus include, for instance, segmental differences in the speech production of L2 learners compared to native speakers. We chose the realization of the velar nasal consonant engma. In spoken French, engma does not appear in a VCV context, which leads to production difficulties in FG. With increasing speech production complexity (reading and picture description), engma is realized as engma + plosive by FG in over 50% of the cases. The results of a two-way ANOVA with unequal sample sizes on the durations of the different realizations of engma indicate that duration is a reliable factor to distinguish between engma and engma + plosive in FG productions compared to the engma productions in GG in a VCV context. The FLACGS corpus allows the study of L2 production and perception.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1512/
PDF https://www.aclweb.org/anthology/L16-1512
PWC https://paperswithcode.com/paper/french-learners-audio-corpus-of-german-speech
Repo
Framework

JEDI: Joint Entity and Relation Detection using Type Inference

Title JEDI: Joint Entity and Relation Detection using Type Inference
Authors Johannes Kirschnick, Holmer Hemsen, Volker Markl
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-4011/
PDF https://www.aclweb.org/anthology/P16-4011
PWC https://paperswithcode.com/paper/jedi-joint-entity-and-relation-detection
Repo
Framework

Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF

Title Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Authors Ouafae Nahli, Francesca Frontini, Monica Monachini, Fahad Khan, Arsalan Zarghili, Mustapha Khalfi
Abstract This paper describes the conversion into LMF, a standard lexicographic digital format, of ‘al-qāmūs al-muḥīṭ, a Medieval Arabic lexicon. The lexicon is first described, then all the steps required for the conversion are illustrated. The work will produce a useful lexicographic resource for Arabic NLP, but is also interesting per se, to study the implications of adapting the LMF model to the Arabic language. Some reflections are offered as to the status of roots with respect to previously suggested representations. In particular, roots, in our opinion, are not to be treated as lexical entries, but modeled as lexical metadata for classifying and identifying lexical entries. In this manner, each root connects all entries that are derived from it.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1150/
PDF https://www.aclweb.org/anthology/L16-1150
PWC https://paperswithcode.com/paper/al-qamus-al-muhit-a-medieval-arabic-lexicon
Repo
Framework

PolyU at CL-SciSumm 2016

Title PolyU at CL-SciSumm 2016
Authors Ziqiang Cao, Wenjie Li, Dapeng Wu
Abstract
Tasks Information Retrieval
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-1515/
PDF https://www.aclweb.org/anthology/W16-1515
PWC https://paperswithcode.com/paper/polyu-at-cl-scisumm-2016
Repo
Framework

LREC as a Graph: People and Resources in a Network

Title LREC as a Graph: People and Resources in a Network
Authors Riccardo Del Gratta, Francesca Frontini, Monica Monachini, Gabriella Pardelli, Irene Russo, Roberto Bartolini, Fahad Khan, Claudia Soria, Nicoletta Calzolari
Abstract This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1401/
PDF https://www.aclweb.org/anthology/L16-1401
PWC https://paperswithcode.com/paper/lrec-as-a-graph-people-and-resources-in-a
Repo
Framework

Linguistically Inspired Language Model Augmentation for MT

Title Linguistically Inspired Language Model Augmentation for MT
Authors George Tambouratzis, Vasiliki Pouli
Abstract The present article reports on efforts to improve the translation accuracy of a corpus-based Machine Translation (MT) system. In order to achieve that, an error analysis performed on past translation outputs has indicated the likelihood of improving the translation accuracy by augmenting the coverage of the Target-Language (TL) side language model. The method adopted for improving the language model is initially presented, based on the concatenation of consecutive phrases. The algorithmic steps are then described that form the process for augmenting the language model. The key idea is to only augment the language model to cover the most frequent cases of phrase sequences, as counted over a TL-side corpus, in order to maximize the cases covered by the new language model entries. Experiments presented in the article show that substantial improvements in translation accuracy are achieved via the proposed method, when integrating the grown language model to the corpus-based MT system.
Tasks Language Modelling, Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1091/
PDF https://www.aclweb.org/anthology/L16-1091
PWC https://paperswithcode.com/paper/linguistically-inspired-language-model
Repo
Framework
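
The abstract above describes growing a language model by concatenating consecutive phrases and keeping only the most frequent phrase sequences counted over a TL-side corpus. A minimal sketch of that counting step follows. It is not the authors' implementation: the function names, the `top_k` cutoff, the toy corpus, and the assumption that sentences arrive already split into phrases are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_phrase_pairs(chunked_sentences, top_k):
    """Count adjacent phrase pairs and return the top_k most frequent ones."""
    counts = Counter()
    for phrases in chunked_sentences:
        # Pair every phrase with its right-hand neighbour in the sentence.
        for left, right in zip(phrases, phrases[1:]):
            counts[(left, right)] += 1
    return counts.most_common(top_k)


def augment_entries(existing_entries, chunked_sentences, top_k=2):
    """Return the existing entries plus concatenations of the top_k pairs."""
    augmented = set(existing_entries)
    for (left, right), _count in frequent_phrase_pairs(chunked_sentences, top_k):
        augmented.add(f"{left} {right}")
    return augmented


# Hypothetical TL-side corpus, already segmented into phrases.
corpus = [
    ["the red car", "stopped", "at the corner"],
    ["the red car", "stopped", "suddenly"],
    ["a bus", "stopped", "at the corner"],
]
entries = {"the red car", "stopped", "at the corner", "suddenly", "a bus"}

# Only the concatenations that were not already entries.
new_entries = augment_entries(entries, corpus, top_k=2) - entries
```

Restricting the additions to the most frequent pairs keeps the grown model small while covering the sequences most likely to recur, which is the trade-off the abstract points to.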

Bridging the gap between computable and expressive event representations in Social Media

Title Bridging the gap between computable and expressive event representations in Social Media
Authors Darina Benikova, Torsten Zesch
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6002/
PDF https://www.aclweb.org/anthology/W16-6002
PWC https://paperswithcode.com/paper/bridging-the-gap-between-computable-and
Repo
Framework

Automated classification of collaborative problem solving interactions in simulated science tasks

Title Automated classification of collaborative problem solving interactions in simulated science tasks
Authors Michael Flor, Su-Youn Yoon, Jiangang Hao, Lei Liu, Alina von Davier
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0504/
PDF https://www.aclweb.org/anthology/W16-0504
PWC https://paperswithcode.com/paper/automated-classification-of-collaborative
Repo
Framework

Large-scale Analysis of Spoken Free-verse Poetry

Title Large-scale Analysis of Spoken Free-verse Poetry
Authors Timo Baumann, Burkhard Meyer-Sickendiek
Abstract Most modern and post-modern poems have developed a post-metrical idea of lyrical prosody that employs rhythmical features of everyday language and prose instead of a strict adherence to rhyme and metrical schemes. This development is subsumed under the term free verse prosody. We present our methodology for the large-scale analysis of modern and post-modern poetry in both their written form and as spoken aloud by the author. We employ language processing tools to align text and speech, to generate a null-model of how the poem would be spoken by a naïve reader, and to extract contrastive prosodic features used by the poet. On these, we intend to build our model of free verse prosody, which will help to understand, differentiate and relate the different styles of free verse poetry. We plan to use our processing scheme on large amounts of data to iteratively build models of styles, to validate and guide manual style annotation, to identify further rhythmical categories, and ultimately to broaden our understanding of free verse poetry. In this paper, we report on a proof-of-concept of our methodology using smaller amounts of poems and a limited set of features. We find that our methodology helps to extract differentiating features in the authors' speech that can be explained by philological insight. Thus, our automatic method helps to guide the literary analysis and this in turn helps to improve our computational models.
Tasks Speech Synthesis
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4017/
PDF https://www.aclweb.org/anthology/W16-4017
PWC https://paperswithcode.com/paper/large-scale-analysis-of-spoken-free-verse
Repo
Framework

Annotating Discourse Relations with the PDTB Annotator

Title Annotating Discourse Relations with the PDTB Annotator
Authors Alan Lee, Rashmi Prasad, Bonnie Webber, Aravind K. Joshi
Abstract The PDTB Annotator is a tool for annotating and adjudicating discourse relations based on the annotation framework of the Penn Discourse TreeBank (PDTB). This demo describes the benefits of using the PDTB Annotator, gives an overview of the PDTB Framework and discusses the tool's features, setup requirements and how it can also be used for adjudication.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2026/
PDF https://www.aclweb.org/anthology/C16-2026
PWC https://paperswithcode.com/paper/annotating-discourse-relations-with-the-pdtb
Repo
Framework

Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish

Title Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
Authors Andre Quispesaravia, Walter Perez, Marco Sobrevilla Cabezudo, Fernando Alva-Manchego
Abstract Text Complexity Analysis is a useful task in Education. For example, it can help teachers select appropriate texts for their students according to their educational level. This task requires the analysis of several text features that people do mostly manually (e.g. syntactic complexity, words variety, etc.). In this paper, we present a tool useful for Complexity Analysis, called Coh-Metrix-Esp. This is the Spanish version of Coh-Metrix and is able to calculate 45 readability indices. We analyse how these indices behave in a corpus of “simple” and “complex” documents, and also use them as features in a complexity binary classifier for texts in Spanish. After some experiments with machine learning algorithms, we got 0.9 F-measure for a corpus that contains tales for kids and adults and 0.82 F-measure for a corpus with texts written for students of Spanish as a foreign language.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1745/
PDF https://www.aclweb.org/anthology/L16-1745
PWC https://paperswithcode.com/paper/coh-metrix-esp-a-complexity-analysis-tool-for
Repo
Framework

Argument linking in LTAG: A constraint-based implementation with XMG

Title Argument linking in LTAG: A constraint-based implementation with XMG
Authors Laura Kallmeyer, Timm Lichte, Rainer Osswald, Simon Petitjean
Abstract
Tasks Text Generation
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-3305/
PDF https://www.aclweb.org/anthology/W16-3305
PWC https://paperswithcode.com/paper/argument-linking-in-ltag-a-constraint-based
Repo
Framework
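
評估尺度相關最佳化方法於華語錯誤發音檢測之研究(Evaluation Metric-related Optimization Methods for Mandarin Mispronunciation Detection) [In Chinese]

Title 評估尺度相關最佳化方法於華語錯誤發音檢測之研究(Evaluation Metric-related Optimization Methods for Mandarin Mispronunciation Detection) [In Chinese]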
Authors Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Yi-Ju Lin, Berlin Chen
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1001/
PDF https://www.aclweb.org/anthology/O16-1001
PWC https://paperswithcode.com/paper/ea14aoaoc-ea123a1314e-eae-eac14e3a-a1c
Repo
Framework

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

Title Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
Authors Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
Abstract We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016. The challenge offered two subtasks: subtask 1 focused on the identification of very similar languages and language varieties in newswire texts, whereas subtask 2 dealt with Arabic dialect identification in speech transcripts. A total of 37 teams registered to participate in the task, 24 teams submitted test results, and 20 teams also wrote system description papers. High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.
Tasks Language Identification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4801/
PDF https://www.aclweb.org/anthology/W16-4801
PWC https://paperswithcode.com/paper/discriminating-between-similar-languages-and
Repo
Framework
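
The abstract above reports that high-order character n-grams were the most successful feature in the shared task. The sketch below shows how such features can be extracted and compared. It is a toy illustration only: the cosine-similarity nearest-profile classifier is a simple stand-in for the SVM, logistic regression, and language-model systems the abstract mentions, and the variety labels, training strings, and the choice of `n=4` are assumptions rather than details from the task.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count every overlapping character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, profiles, n=4):
    """Return the label whose n-gram profile is most similar to text."""
    features = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(features, profiles[label]))


# Hypothetical training text for two closely related varieties.
training = {
    "variety_a": "colour favourite centre organise",
    "variety_b": "color favorite center organize",
}
profiles = {label: char_ngrams(text, 4) for label, text in training.items()}

predicted = classify("my favourite colour", profiles)
```

Character n-grams capture spelling and morphological cues that whole-word features miss, which is one plausible reason they work well when the varieties share most of their vocabulary.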