May 5, 2019


Paper Group NANR 31

Introducing the Asian Language Treebank (ALT). French Learners Audio Corpus of German Speech (FLACGS). JEDI: Joint Entity and Relation Detection using Type Inference. Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF. PolyU at CL-SciSumm 2016. LREC as a Graph: People and Resources in a Network. Linguistically Inspired Language Model Augmentation …

Introducing the Asian Language Treebank (ALT)


Title	Introducing the Asian Language Treebank (ALT)
Authors	Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, Eiichiro Sumita
Abstract	This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Laos, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1249/
PDF	https://www.aclweb.org/anthology/L16-1249
PWC	https://paperswithcode.com/paper/introducing-the-asian-language-treebank-alt
Repo
Framework

French Learners Audio Corpus of German Speech (FLACGS)


Title	French Learners Audio Corpus of German Speech (FLACGS)
Authors	Jane Wottawa, Martine Adda-Decker
Abstract	The French Learners Audio Corpus of German Speech (FLACGS) was created to compare German speech production of German native speakers (GG) and French learners of German (FG) across three speech production tasks of increasing production complexity: repetition, reading and picture description. Forty speakers (20 GG and 20 FG) performed each of the three tasks, which in total leads to approximately 7h of speech. The corpus was manually transcribed and automatically aligned. Analyses that can be performed on this type of corpus include, for instance, segmental differences in the speech production of L2 learners compared to native speakers. We chose the realization of the velar nasal consonant engma. In spoken French, engma does not appear in a VCV context, which leads to production difficulties in FG. With increasing speech production complexity (reading and picture description), engma is realized as engma + plosive by FG in over 50% of the cases. The results of a two-way ANOVA with unequal sample sizes on the durations of the different realizations of engma indicate that duration is a reliable factor to distinguish between engma and engma + plosive in FG productions compared to the engma productions in GG in a VCV context. The FLACGS corpus allows the study of L2 production and perception.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1512/
PDF	https://www.aclweb.org/anthology/L16-1512
PWC	https://paperswithcode.com/paper/french-learners-audio-corpus-of-german-speech
Repo
Framework

JEDI: Joint Entity and Relation Detection using Type Inference


Title	JEDI: Joint Entity and Relation Detection using Type Inference
Authors	Johannes Kirschnick, Holmer Hemsen, Volker Markl
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-4011/
PDF	https://www.aclweb.org/anthology/P16-4011
PWC	https://paperswithcode.com/paper/jedi-joint-entity-and-relation-detection
Repo
Framework

Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF


Title	Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Authors	Ouafae Nahli, Francesca Frontini, Monica Monachini, Fahad Khan, Arsalan Zarghili, Mustapha Khalfi
Abstract	This paper describes the conversion into LMF, a standard lexicographic digital format, of al-qāmūs al-muḥīṭ, a Medieval Arabic lexicon. The lexicon is first described, then all the steps required for the conversion are illustrated. The work will produce a useful lexicographic resource for Arabic NLP, but is also interesting per se, to study the implications of adapting the LMF model to the Arabic language. Some reflections are offered as to the status of roots with respect to previously suggested representations. In particular, roots, in our opinion, are not to be treated as lexical entries, but modeled as lexical metadata for classifying and identifying lexical entries. In this manner, each root connects all entries that are derived from it.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1150/
PDF	https://www.aclweb.org/anthology/L16-1150
PWC	https://paperswithcode.com/paper/al-qamus-al-muhit-a-medieval-arabic-lexicon
Repo
Framework

PolyU at CL-SciSumm 2016


Title	PolyU at CL-SciSumm 2016
Authors	Ziqiang Cao, Wenjie Li, Dapeng Wu
Abstract
Tasks	Information Retrieval
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-1515/
PDF	https://www.aclweb.org/anthology/W16-1515
PWC	https://paperswithcode.com/paper/polyu-at-cl-scisumm-2016
Repo
Framework

LREC as a Graph: People and Resources in a Network


Title	LREC as a Graph: People and Resources in a Network
Authors	Riccardo Del Gratta, Francesca Frontini, Monica Monachini, Gabriella Pardelli, Irene Russo, Roberto Bartolini, Fahad Khan, Claudia Soria, Nicoletta Calzolari
Abstract	This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1401/
PDF	https://www.aclweb.org/anthology/L16-1401
PWC	https://paperswithcode.com/paper/lrec-as-a-graph-people-and-resources-in-a
Repo
Framework

Linguistically Inspired Language Model Augmentation for MT


Title	Linguistically Inspired Language Model Augmentation for MT
Authors	George Tambouratzis, Vasiliki Pouli
Abstract	The present article reports on efforts to improve the translation accuracy of a corpus-based Machine Translation (MT) system. In order to achieve that, an error analysis performed on past translation outputs has indicated the likelihood of improving the translation accuracy by augmenting the coverage of the Target-Language (TL) side language model. The method adopted for improving the language model is initially presented, based on the concatenation of consecutive phrases. The algorithmic steps are then described that form the process for augmenting the language model. The key idea is to only augment the language model to cover the most frequent cases of phrase sequences, as counted over a TL-side corpus, in order to maximize the cases covered by the new language model entries. Experiments presented in the article show that substantial improvements in translation accuracy are achieved via the proposed method, when integrating the grown language model to the corpus-based MT system.
Tasks	Language Modelling, Machine Translation
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1091/
PDF	https://www.aclweb.org/anthology/L16-1091
PWC	https://paperswithcode.com/paper/linguistically-inspired-language-model
Repo
Framework

Bridging the gap between computable and expressive event representations in Social Media


Title	Bridging the gap between computable and expressive event representations in Social Media
Authors	Darina Benikova, Torsten Zesch
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-6002/
PDF	https://www.aclweb.org/anthology/W16-6002
PWC	https://paperswithcode.com/paper/bridging-the-gap-between-computable-and
Repo
Framework

Automated classification of collaborative problem solving interactions in simulated science tasks


Title	Automated classification of collaborative problem solving interactions in simulated science tasks
Authors	Michael Flor, Su-Youn Yoon, Jiangang Hao, Lei Liu, Alina von Davier
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0504/
PDF	https://www.aclweb.org/anthology/W16-0504
PWC	https://paperswithcode.com/paper/automated-classification-of-collaborative
Repo
Framework

Large-scale Analysis of Spoken Free-verse Poetry


Title	Large-scale Analysis of Spoken Free-verse Poetry
Authors	Timo Baumann, Burkhard Meyer-Sickendiek
Abstract	Most modern and post-modern poems have developed a post-metrical idea of lyrical prosody that employs rhythmical features of everyday language and prose instead of a strict adherence to rhyme and metrical schemes. This development is subsumed under the term free verse prosody. We present our methodology for the large-scale analysis of modern and post-modern poetry in both their written form and as spoken aloud by the author. We employ language processing tools to align text and speech, to generate a null-model of how the poem would be spoken by a naïve reader, and to extract contrastive prosodic features used by the poet. On these, we intend to build our model of free verse prosody, which will help to understand, differentiate and relate the different styles of free verse poetry. We plan to use our processing scheme on large amounts of data to iteratively build models of styles, to validate and guide manual style annotation, to identify further rhythmical categories, and ultimately to broaden our understanding of free verse poetry. In this paper, we report on a proof-of-concept of our methodology using smaller amounts of poems and a limited set of features. We find that our methodology helps to extract differentiating features in the authors' speech that can be explained by philological insight. Thus, our automatic method helps to guide the literary analysis and this in turn helps to improve our computational models.
Tasks	Speech Synthesis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4017/
PDF	https://www.aclweb.org/anthology/W16-4017
PWC	https://paperswithcode.com/paper/large-scale-analysis-of-spoken-free-verse
Repo
Framework

Annotating Discourse Relations with the PDTB Annotator


Title	Annotating Discourse Relations with the PDTB Annotator
Authors	Alan Lee, Rashmi Prasad, Bonnie Webber, Aravind K. Joshi
Abstract	The PDTB Annotator is a tool for annotating and adjudicating discourse relations based on the annotation framework of the Penn Discourse TreeBank (PDTB). This demo describes the benefits of using the PDTB Annotator, gives an overview of the PDTB Framework and discusses the tool's features, setup requirements and how it can also be used for adjudication.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2026/
PDF	https://www.aclweb.org/anthology/C16-2026
PWC	https://paperswithcode.com/paper/annotating-discourse-relations-with-the-pdtb
Repo
Framework

Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish


Title	Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
Authors	Andre Quispesaravia, Walter Perez, Marco Sobrevilla Cabezudo, Fernando Alva-Manchego
Abstract	Text Complexity Analysis is a useful task in Education. For example, it can help teachers select appropriate texts for their students according to their educational level. This task requires the analysis of several text features that people do mostly manually (e.g. syntactic complexity, word variety, etc.). In this paper, we present a tool useful for Complexity Analysis, called Coh-Metrix-Esp. This is the Spanish version of Coh-Metrix and is able to calculate 45 readability indices. We analyse how these indices behave in a corpus of "simple" and "complex" documents, and also use them as features in a complexity binary classifier for texts in Spanish. After some experiments with machine learning algorithms, we got 0.9 F-measure for a corpus that contains tales for kids and adults and 0.82 F-measure for a corpus with texts written for students of Spanish as a foreign language.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1745/
PDF	https://www.aclweb.org/anthology/L16-1745
PWC	https://paperswithcode.com/paper/coh-metrix-esp-a-complexity-analysis-tool-for
Repo
Framework

Argument linking in LTAG: A constraint-based implementation with XMG


Title	Argument linking in LTAG: A constraint-based implementation with XMG
Authors	Laura Kallmeyer, Timm Lichte, Rainer Osswald, Simon Petitjean
Abstract
Tasks	Text Generation
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-3305/
PDF	https://www.aclweb.org/anthology/W16-3305
PWC	https://paperswithcode.com/paper/argument-linking-in-ltag-a-constraint-based
Repo
Framework

評估尺度相關最佳化方法於華語錯誤發音檢測之研究(Evaluation Metric-related Optimization Methods for Mandarin Mispronunciation Detection) [In Chinese]


Title	評估尺度相關最佳化方法於華語錯誤發音檢測之研究(Evaluation Metric-related Optimization Methods for Mandarin Mispronunciation Detection) [In Chinese]
Authors	Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Yi-Ju Lin, Berlin Chen
Abstract
Tasks
Published	2016-10-01
URL	https://www.aclweb.org/anthology/O16-1001/
PDF	https://www.aclweb.org/anthology/O16-1001
PWC	https://paperswithcode.com/paper/ea14aoaoc-ea123a1314e-eae-eac14e3a-a1c
Repo
Framework

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task


Title	Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
Authors	Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
Abstract	We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016. The challenge offered two subtasks: subtask 1 focused on the identification of very similar languages and language varieties in newswire texts, whereas subtask 2 dealt with Arabic dialect identification in speech transcripts. A total of 37 teams registered to participate in the task, 24 teams submitted test results, and 20 teams also wrote system description papers. High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.
Tasks	Language Identification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4801/
PDF	https://www.aclweb.org/anthology/W16-4801
PWC	https://paperswithcode.com/paper/discriminating-between-similar-languages-and
Repo
Framework