May 5, 2019


Paper Group NANR 29

ILLC-UvA Adaptation System (Scorpio) at WMT’16 IT-DOMAIN Task. CEPLEXicon ― A Lexicon of Child European Portuguese. YODA System for WMT16 Shared Task: Bilingual Document Alignment. Referential Translation Machines for Predicting Translation Performance. Extracting Weighted Language Lexicons from Wikipedia. Structured Generative Models of Continuous …

ILLC-UvA Adaptation System (Scorpio) at WMT’16 IT-DOMAIN Task


Title	ILLC-UvA Adaptation System (Scorpio) at WMT’16 IT-DOMAIN Task
Authors	Hoang Cuong, Stella Frank, Khalil Sima'an
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2330/
PDF	https://www.aclweb.org/anthology/W16-2330
PWC	https://paperswithcode.com/paper/illc-uva-adaptation-system-scorpio-at-wmt16
Repo
Framework

CEPLEXicon ― A Lexicon of Child European Portuguese


Title	CEPLEXicon ― A Lexicon of Child European Portuguese
Authors	Ana Lúcia Santos, Maria João Freitas, Aida Cardoso
Abstract	CEPLEXicon (version 1.1) is a child lexicon resulting from the automatic tagging of two child corpora: the corpus Santos (Santos, 2006; Santos et al. 2014) and the corpus Child ― Adult Interaction (Freitas et al. 2012), which integrates information from the corpus Freitas (Freitas, 1997). This lexicon includes spontaneous speech produced by seven children (1;02.00 to 3;11.12) during approximately 86h of child-adult interaction. The automatic tagging comprised the lemmatization and morphosyntactic classification of the speech produced by the seven children included in the two child corpora; the lexicon contains information pertaining to lemmas and syntactic categories as well as absolute number of occurrences and frequencies in three age intervals: < 2 years; ≥ 2 years and < 3 years; ≥ 3 years. The information included in this lexicon and the format in which it is presented enable research in different areas and allow researchers to obtain measures of lexical growth. CEPLEXicon is available through the ELRA catalogue.
Tasks	Lemmatization
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1216/
PDF	https://www.aclweb.org/anthology/L16-1216
PWC	https://paperswithcode.com/paper/ceplexicon-a-lexicon-of-child-european
Repo
Framework

YODA System for WMT16 Shared Task: Bilingual Document Alignment


Title	YODA System for WMT16 Shared Task: Bilingual Document Alignment
Authors	Aswarth Abhilash Dara, Yiu-Chang Lin
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2366/
PDF	https://www.aclweb.org/anthology/W16-2366
PWC	https://paperswithcode.com/paper/yoda-system-for-wmt16-shared-task-bilingual
Repo
Framework

Referential Translation Machines for Predicting Translation Performance


Title	Referential Translation Machines for Predicting Translation Performance
Authors	Ergun Biçici
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2382/
PDF	https://www.aclweb.org/anthology/W16-2382
PWC	https://paperswithcode.com/paper/referential-translation-machines-for-1
Repo
Framework

Extracting Weighted Language Lexicons from Wikipedia


Title	Extracting Weighted Language Lexicons from Wikipedia
Authors	Gregory Grefenstette
Abstract	Language models are used in applications as diverse as speech recognition, optical character recognition and information retrieval. They are used to predict word appearance, and to weight the importance of words in these applications. One basic element of language models is the list of words in a language. Another is the unigram frequency of each word. But this basic information is not available for most languages in the world. Since the multilingual Wikipedia project encourages the production of encyclopedic-like articles in many world languages, we can find there an ever-growing source of text from which to extract these two language modelling elements: word list and frequency. Here we present a simple technique for converting this Wikipedia text into lexicons of weighted unigrams for the more than 280 languages currently present in Wikipedia. The lexicons produced, and the source code for producing them in a Linux-based system, are here made available for free on the Web.
Tasks	Information Retrieval, Language Modelling, Optical Character Recognition, Speech Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1217/
PDF	https://www.aclweb.org/anthology/L16-1217
PWC	https://paperswithcode.com/paper/extracting-weighted-language-lexicons-from
Repo
Framework
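
The abstract above describes turning raw text into a lexicon of weighted unigrams (a word list plus a frequency for each word). The following is only a minimal sketch of that general idea, not the author's released code: the function name `weighted_unigrams`, the regex tokenizer, and the sample sentence are all assumptions made for illustration, and real Wikipedia dumps would first need their markup stripped.

```python
import re
from collections import Counter


def weighted_unigrams(text):
    """Return {word: relative frequency} for a plain-text string.

    Hypothetical helper for illustration; the paper's own pipeline
    is not reproduced here.
    """
    # Lowercase, then take runs of Unicode word characters as tokens.
    # This assumes the language separates words with spaces or punctuation.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Weight each word by its share of all tokens in the text.
    return {word: n / total for word, n in counts.items()}


sample = "the cat sat on the mat and the dog sat too"
lexicon = weighted_unigrams(sample)

# Show the three heaviest entries, one "word<TAB>weight" pair per line.
for word, weight in sorted(lexicon.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}\t{weight:.3f}")
```

Relative frequency is used as the weight here so that lexicons built from corpora of different sizes stay comparable. Languages written without word spacing would need a dedicated segmenter in place of the regex.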

Structured Generative Models of Continuous Features for Word Sense Induction


Title	Structured Generative Models of Continuous Features for Word Sense Induction
Authors	Alexandros Komninos, Suresh Manandhar
Abstract	We propose a structured generative latent variable model that integrates information from multiple contextual representations for Word Sense Induction. Our approach jointly models global lexical, local lexical and dependency syntactic context. Each context type is associated with a latent variable and the three types of variables share a hierarchical structure. We use skip-gram based word and dependency context embeddings to construct all three types of representations, reducing the total number of parameters to be estimated and enabling better generalization. We describe an EM algorithm to efficiently estimate model parameters and use the Integrated Complete Likelihood criterion to automatically estimate the number of senses. Our model achieves state-of-the-art results on the SemEval-2010 and SemEval-2013 Word Sense Induction datasets.
Tasks	Word Embeddings, Word Sense Disambiguation, Word Sense Induction
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1337/
PDF	https://www.aclweb.org/anthology/C16-1337
PWC	https://paperswithcode.com/paper/structured-generative-models-of-continuous
Repo
Framework

Unraveling the English-Bengali Code-Mixing Phenomenon


Title	Unraveling the English-Bengali Code-Mixing Phenomenon
Authors	Arunavha Chanda, Dipankar Das, Chandan Mazumdar
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/W16-5810/
PDF	https://www.aclweb.org/anthology/W16-5810
PWC	https://paperswithcode.com/paper/unraveling-the-english-bengali-code-mixing
Repo
Framework

Context-Dependent Sense Embedding


Title	Context-Dependent Sense Embedding
Authors	Lin Qiu, Kewei Tu, Yong Yu
Abstract
Tasks	Word Embeddings, Word Sense Disambiguation, Word Sense Induction
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1018/
PDF	https://www.aclweb.org/anthology/D16-1018
PWC	https://paperswithcode.com/paper/context-dependent-sense-embedding
Repo
Framework

A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety


Title	A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
Authors	Mathieu Chollet, Torsten Wörtwein, Louis-Philippe Morency, Stefan Scherer
Abstract	The ability to efficiently speak in public is an essential asset for many professions and is used in everyday life. As such, tools enabling the improvement of public speaking performance and the assessment and mitigation of anxiety related to public speaking would be very useful. Multimodal interaction technologies, such as computer vision and embodied conversational agents, have recently been investigated for the training and assessment of interpersonal skills. One central requirement for these technologies is multimodal corpora for training machine learning models. This paper addresses the need of these technologies by presenting and sharing a multimodal corpus of public speaking presentations. These presentations were collected in an experimental study investigating the potential of interactive virtual audiences for public speaking training. This corpus includes audio-visual data and automatically extracted features, measures of public speaking anxiety and personality, annotations of participants' behaviors and expert ratings of behavioral aspects and overall performance of the presenters. We hope this corpus will help other research teams in developing tools for supporting public speaking training.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1078/
PDF	https://www.aclweb.org/anthology/L16-1078
PWC	https://paperswithcode.com/paper/a-multimodal-corpus-for-the-assessment-of
Repo
Framework

Wiktionnaire’s Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary


Title	Wiktionnaire’s Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
Authors	Nabil Hathout, Franck Sajous
Abstract	GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies and NLP applications to psycholinguistic experimentation. They can all take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens the way to the renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1218/
PDF	https://www.aclweb.org/anthology/L16-1218
PWC	https://paperswithcode.com/paper/wiktionnaires-wikicode-glawified-a-workable
Repo
Framework

Transition-Based Parsing for Deep Dependency Structures


Title	Transition-Based Parsing for Deep Dependency Structures
Authors	Xun Zhang, Yantao Du, Weiwei Sun, Xiaojun Wan
Abstract
Tasks
Published	2016-09-01
URL	https://www.aclweb.org/anthology/J16-3001/
PDF	https://www.aclweb.org/anthology/J16-3001
PWC	https://paperswithcode.com/paper/transition-based-parsing-for-deep-dependency
Repo
Framework

Proceedings of the Sixth Named Entity Workshop


Title	Proceedings of the Sixth Named Entity Workshop
Authors
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2700/
PDF	https://www.aclweb.org/anthology/W16-2700
PWC	https://paperswithcode.com/paper/proceedings-of-the-sixth-named-entity
Repo
Framework

On Bias-free Crawling and Representative Web Corpora


Title	On Bias-free Crawling and Representative Web Corpora
Authors	Roland Schäfer
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2612/
PDF	https://www.aclweb.org/anthology/W16-2612
PWC	https://paperswithcode.com/paper/on-bias-free-crawling-and-representative-web
Repo
Framework

Genre classification for a corpus of academic webpages


Title	Genre classification for a corpus of academic webpages
Authors	Erika Dalan, Serge Sharoff
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2611/
PDF	https://www.aclweb.org/anthology/W16-2611
PWC	https://paperswithcode.com/paper/genre-classification-for-a-corpus-of-academic
Repo
Framework

Experimental Study of Vowels in Nagamese, Ao and Lotha: Languages of Nagaland


Title	Experimental Study of Vowels in Nagamese, Ao and Lotha: Languages of Nagaland
Authors	Joyanta Basu, Tulika Basu, Soma Khan, Madhab Pal, Rajib Roy, Tapan Kumar Basu
Abstract
Tasks	Language Identification, Speech Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-6339/
PDF	https://www.aclweb.org/anthology/W16-6339
PWC	https://paperswithcode.com/paper/experimental-study-of-vowels-in-nagamese-ao
Repo
Framework