May 5, 2019


Paper Group NANR 46

Discontinuous VP in Bulgarian. The Alaskan Athabascan Grammar Database. A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus. Passing a USA National Bar Exam: a First Corpus for Experimentation. Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference. Word Segmentation for Akkadian Cuneiform. Corpus …

Discontinuous VP in Bulgarian


Title	Discontinuous VP in Bulgarian
Authors	Elisaveta Balabanova
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0904/
PDF	https://www.aclweb.org/anthology/W16-0904
PWC	https://paperswithcode.com/paper/discontinuous-vp-in-bulgarian
Repo
Framework

The Alaskan Athabascan Grammar Database


Title	The Alaskan Athabascan Grammar Database
Authors	Sebastian Nordhoff, Siri Tuttle, Olga Lovick
Abstract	This paper describes a repository of example sentences in three endangered Athabascan languages: Koyukon, Upper Tanana and Lower Tanana. The repository allows researchers and language teachers to browse the example sentence corpus, either to investigate the languages or to prepare teaching materials. The originally heterogeneous text collection was imported into a SOLR store via the POIO bridge. This paper describes the requirements, implementation, advantages and drawbacks of this approach and discusses the potential for applying it to other languages of the Athabascan family or beyond.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1523/
PDF	https://www.aclweb.org/anthology/L16-1523
PWC	https://paperswithcode.com/paper/the-alaskan-athabascan-grammar-database
Repo
Framework

A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus


Title	A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
Authors	Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain
Abstract	This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project. This project aims to help users with speech and motor disabilities to operate their home appliances using voice commands. The audio recorded during such interactions consists of realistic data from speakers with severe dysarthria. The majority of the homeService corpus is recorded in real home environments where voice control is often the normal means by which users interact with their devices. The collection of the corpus is motivated by the shortage of realistic dysarthric speech corpora available to the scientific community. Along with details on how the data is organised and how it can be accessed, a brief description of the framework used to make the recordings is provided. Finally, the performance of the homeService automatic recogniser for dysarthric speech, trained with single-speaker data from the corpus, is provided as an initial baseline. Access to the homeService corpus is provided through the dedicated web page at http://mini.dcs.shef.ac.uk/resources/homeservice-corpus/, which will also host the most up-to-date description of the data. At the time of writing, the collection process is still ongoing.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1315/
PDF	https://www.aclweb.org/anthology/L16-1315
PWC	https://paperswithcode.com/paper/a-framework-for-collecting-realistic
Repo
Framework

Passing a USA National Bar Exam: a First Corpus for Experimentation


Title	Passing a USA National Bar Exam: a First Corpus for Experimentation
Authors	Biralatei Fawei, Adam Wyner, Jeff Pan
Abstract	Bar exams provide a key watershed by which legal professionals demonstrate their knowledge of the law and its application. Passing the bar entitles one to practice law in a given jurisdiction. The bar exam provides an excellent benchmark for the performance of legal information systems, since passing it would arguably signal that the system has acquired key aspects of legal reasoning on a par with a human lawyer. The paper provides a corpus and experimental results with material derived from a real bar exam, treating the problem as a form of textual entailment from the question to an answer. The providers of the bar exam material set the gold standard, which is the answer key. The experiments were carried out using the 'out of the box' Excitement Open Platform for textual entailment. The results and evaluation show that the tool can identify wrong answers (non-entailment) with a high F1 score, but it performs poorly in identifying the correct answer (entailment). The results provide a baseline performance measure against which to evaluate future improvements. The reasons for the poor performance are examined, and proposals are made to augment the tool in the future. The corpus facilitates experimentation by other researchers.
Tasks	Natural Language Inference
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1538/
PDF	https://www.aclweb.org/anthology/L16-1538
PWC	https://paperswithcode.com/paper/passing-a-usa-national-bar-exam-a-first
Repo
Framework
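The gap between the two F1 scores in the bar-exam abstract follows naturally from class imbalance, which a per-class F1 computation makes concrete. The sketch below is illustrative only: the labels are hypothetical toy data, not the paper's results, and it assumes a multiple-choice layout in which each question yields one entailed (correct) answer and three non-entailed (wrong) ones.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two questions, each with one entailed (correct)
# answer followed by three non-entailed (wrong) answers.
ENT, NON = "ENTAILMENT", "NONENTAILMENT"
gold = [ENT, NON, NON, NON, ENT, NON, NON, NON]
# A system biased towards non-entailment misses the first correct answer.
pred = [NON, NON, NON, NON, ENT, NON, NON, NON]

for label in (NON, ENT):
    print(f"{label:>13} F1: {f1_for_label(gold, pred, label):.2f}")
```

With these toy labels a single missed correct answer costs the majority class little (F1 of 12/13, about 0.92) but halves the recall of the minority class (F1 of 2/3, about 0.67).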

Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.


Title	Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
Authors	Jon Chamberlain, Massimo Poesio, Udo Kruschwitz
Abstract	Natural Language Engineering tasks require large and complex annotated datasets to build more advanced models of language. Corpora are typically annotated by several experts to create a gold standard; however, there are now compelling reasons to use a non-expert crowd to annotate text, driven by cost, speed and scalability. Phrase Detectives Corpus 1.0 is an anaphorically-annotated corpus of encyclopedic and narrative text that contains a gold standard created by multiple experts, as well as a set of annotations created by a large non-expert crowd. Analysis shows very good inter-expert agreement (kappa=.88-.93) but a more variable baseline crowd agreement (kappa=.52-.96). Encyclopedic texts show less agreement (and by implication are harder to annotate) than narrative texts. The release of this corpus is intended to encourage research into the use of crowds for text annotation and the development of more advanced, probabilistic language models, in particular for anaphoric coreference.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1323/
PDF	https://www.aclweb.org/anthology/L16-1323
PWC	https://paperswithcode.com/paper/phrase-detectives-corpus-10-crowdsourced
Repo
Framework
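The agreement figures in the Phrase Detectives abstract are kappa values, which correct the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The abstract does not say which kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators; the label names (DN, DO, NR) are hypothetical placeholders for anaphoric categories.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    # Observed agreement p_o: share of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: both annotators independently pick the same label,
    # estimated from each annotator's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of six markables by two annotators.
annotator_1 = ["DN", "DO", "DN", "NR", "DO", "DN"]
annotator_2 = ["DN", "DO", "DO", "NR", "DO", "DN"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Here the annotators agree on 5 of 6 items (p_o = 5/6), but chance alone would produce p_e = 13/36, so kappa is 17/23, roughly 0.74, noticeably lower than the raw agreement.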

Word Segmentation for Akkadian Cuneiform


Title	Word Segmentation for Akkadian Cuneiform
Authors	Timo Homburg, Christian Chiarcos
Abstract	We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about 3 millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles East Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe results of rule-based algorithms, dictionary-based algorithms, and statistical and machine learning approaches. Our results point to promising directions for cuneiform word segmentation that can help create and improve natural language processing in this area.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1642/
PDF	https://www.aclweb.org/anthology/L16-1642
PWC	https://paperswithcode.com/paper/word-segmentation-for-akkadian-cuneiform
Repo
Framework
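Among the dictionary-based algorithms borrowed from Chinese segmentation, forward maximum matching is a classic example; whether it is among the exact variants the paper evaluates is an assumption here. It greedily takes the longest lexicon entry starting at the current sign. The sketch below operates on transliterated signs with a hypothetical three-word lexicon.

```python
from typing import List, Sequence, Set, Tuple


def max_match(signs: Sequence[str], lexicon: Set[Tuple[str, ...]], max_len: int = 4) -> List[List[str]]:
    """Greedy forward maximum matching over a sequence of signs."""
    words: List[List[str]] = []
    i = 0
    while i < len(signs):
        # Try the longest window first and shrink it until a lexicon entry matches;
        # a single unknown sign is emitted as a word of its own.
        for length in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + length])
            if length == 1 or candidate in lexicon:
                words.append(list(candidate))
                i += length
                break
    return words


# Hypothetical toy lexicon of transliterated words (a-na šar-ri be-li2-ia,
# "to the king, my lord"); a real system would draw on an Akkadian dictionary.
lexicon = {("a", "na"), ("šar", "ri"), ("be", "li2", "ia")}
signs = ["a", "na", "šar", "ri", "be", "li2", "ia"]
print(max_match(signs, lexicon))
```

The greedy choice is fast but cannot recover from an early wrong match, which is one reason statistical and machine learning approaches are worth comparing against it.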

Corpus-Based Diacritic Restoration for South Slavic Languages


Title	Corpus-Based Diacritic Restoration for South Slavic Languages
Authors	Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Abstract	In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing. Such text is typically easy for humans to understand but very difficult for computational processing, because many words become ambiguous or unknown. Letter-level approaches to diacritic restoration generalise better and do not require much training data, but word-level approaches tend to yield better results. However, word-level approaches typically rely on a lexicon, which is an expensive resource that does not cover non-standard forms and is often unavailable for less-resourced languages. In this paper we present diacritic restoration models that are trained on easy-to-acquire corpora. We test three types of corpora (Wikipedia, general web, Twitter) for three South Slavic languages (Croatian, Serbian and Slovene) and evaluate them on two types of text: standard (Wikipedia) and non-standard (Twitter). The proposed approach considerably outperforms charlifter, so far the only open-source tool available for this task. We make the best-performing systems freely available.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1573/
PDF	https://www.aclweb.org/anthology/L16-1573
PWC	https://paperswithcode.com/paper/corpus-based-diacritic-restoration-for-south
Repo
Framework
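The point that word-level restoration can be learned from raw corpora rather than a curated lexicon can be illustrated with a deliberately simple unigram baseline: strip diacritics from every corpus token and remember the most frequent original form for each stripped key. This is not the authors' model, which the abstract does not detail; it ignores context entirely, and the tiny Croatian training text is hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# "đ"/"Đ" have no Unicode decomposition, so they are mapped by hand.
_NO_DECOMPOSITION = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(word: str) -> str:
    """Drop combining marks after NFD decomposition, e.g. 'čaša' -> 'casa'."""
    decomposed = unicodedata.normalize("NFD", word.translate(_NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(corpus_tokens: Iterable[str]) -> Dict[str, str]:
    """Map each stripped form to its most frequent diacritized form in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for token in corpus_tokens:
        counts[strip_diacritics(token)][token] += 1
    return {key: forms.most_common(1)[0][0] for key, forms in counts.items()}


def restore(tokens: List[str], model: Dict[str, str]) -> List[str]:
    """Restore diacritics word by word; words unseen in training stay unchanged."""
    return [model.get(strip_diacritics(t), t) for t in tokens]


# Hypothetical Croatian training text; a real model would use a large corpus.
corpus = "kuća je velika . čaša je na stolu . kuća i čaša".split()
model = train(corpus)
print(restore("kuca i casa su na stolu".split(), model))
```

Because the stripped key is computed from corpus text, non-standard spellings seen on the web or Twitter are covered automatically, which a hand-built lexicon would miss.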

A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks


Title	A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks
Authors	Jiang Guo, Wanxiang Che, Haifeng Wang, Ting Liu
Abstract	Various treebanks have been released for dependency parsing. Although treebanks may belong to different languages or follow different annotation schemes, they contain common syntactic knowledge that can potentially benefit each other. This paper presents a universal framework for transfer parsing across multi-typed treebanks with deep multi-task learning. We consider two kinds of treebanks as sources: the multilingual universal treebanks and the monolingual heterogeneous treebanks. Knowledge is transferred effectively across the source and target treebanks through multi-level parameter sharing. Experiments on several benchmark datasets in various languages demonstrate that our approach can make effective use of arbitrary source treebanks to improve target parsing models.
Tasks	Dependency Parsing, Information Retrieval, Multi-Task Learning
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1002/
PDF	https://www.aclweb.org/anthology/C16-1002
PWC	https://paperswithcode.com/paper/a-universal-framework-for-inductive-transfer
Repo
Framework

Factuality Annotation and Learning in Spanish Texts


Title	Factuality Annotation and Learning in Spanish Texts
Authors	Dina Wonsever, Aiala Rosá, Marisa Malcuori
Abstract	We present a proposal for annotating the factuality of event mentions in Spanish texts, together with a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, reflecting a casual reader's judgements about the realis/irrealis status of mentioned events. We also report learning experiments (SVM and CRF) that show encouraging results.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1329/
PDF	https://www.aclweb.org/anthology/L16-1329
PWC	https://paperswithcode.com/paper/factuality-annotation-and-learning-in-spanish
Repo
Framework

NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models


Title	NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
Authors	Frederico Tommasi Caroli, André Freitas, João Carlos Pereira da Silva, Siegfried Handschuh
Abstract	Lately, with the success of deep learning techniques in some computational linguistics tasks, many researchers want to explore new models for their linguistic applications. These models tend to differ considerably from standard neural network architectures, which limits the usefulness of standard neural network frameworks. This work presents NNBlocks, a new framework written in Python for building and training neural networks that are not constrained to a specific kind of architecture, making it suitable for computational linguistics.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1330/
PDF	https://www.aclweb.org/anthology/L16-1330
PWC	https://paperswithcode.com/paper/nnblocks-a-deep-learning-framework-for
Repo
Framework

Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese]


Title	Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese]
Authors	Yuan-Hao Lin, Chia-Hui Chang
Abstract
Tasks	Named Entity Recognition, Relation Extraction, Sequential Pattern Mining
Published	2016-10-01
URL	https://www.aclweb.org/anthology/O16-1022/
PDF	https://www.aclweb.org/anthology/O16-1022
PWC	https://paperswithcode.com/paper/facebook-aaoaac3cfacebook-activity-event
Repo
Framework

A Regional News Corpora for Contextualized Entity Discovery and Linking


Title	A Regional News Corpora for Contextualized Entity Discovery and Linking
Authors	Adrian Braşoveanu, Lyndon J.B. Nixon, Albert Weichselbraun, Arno Scharl
Abstract	This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guidelines, the annotation process, the NIL clustering techniques and the conversion to popular NEL formats such as NIF and TAC that were used to construct this corpus from news transcripts of the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities in the annotated corpus. The paper concludes with an evaluation of how several well-known NEL tools perform on the corpus, a discussion of the results, and suggestions on how to keep evaluation corpora and datasets up to date.
Tasks	Entity Linking, Knowledge Base Population
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1531/
PDF	https://www.aclweb.org/anthology/L16-1531
PWC	https://paperswithcode.com/paper/a-regional-news-corpora-for-contextualized
Repo
Framework
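The regional news corpus abstract mentions NIL clustering, i.e. grouping mentions that have no knowledge-base entry so that mentions of the same unknown entity share one identifier. The techniques actually used for the corpus are not described here; the sketch below shows only the simplest baseline, exact matching on a normalized surface form, and its id format and example names are hypothetical.

```python
import unicodedata
from typing import Dict, List


def normalize(surface: str) -> str:
    """NFKC-normalize, case-fold and collapse whitespace in a mention string."""
    return " ".join(unicodedata.normalize("NFKC", surface).casefold().split())


def cluster_nil_mentions(mentions: List[str]) -> List[str]:
    """Give NIL mentions with the same normalized form the same cluster id."""
    cluster_ids: Dict[str, str] = {}
    assignment: List[str] = []
    for mention in mentions:
        key = normalize(mention)
        if key not in cluster_ids:
            # Hypothetical id scheme: NIL followed by a zero-padded counter.
            cluster_ids[key] = f"NIL{len(cluster_ids) + 1:04d}"
        assignment.append(cluster_ids[key])
    return assignment


mentions = ["Anna Schulze", "anna  schulze", "Kiezbäckerei Sonnenschein"]
print(cluster_nil_mentions(mentions))
```

Exact matching merges spelling variants that differ only in case or spacing but will not link "Anna Schulze" to a later "Schulze"; handling such partial mentions is where more elaborate clustering techniques come in.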

Label Embedding for Zero-shot Fine-grained Named Entity Typing


Title	Label Embedding for Zero-shot Fine-grained Named Entity Typing
Authors	Yukun Ma, Erik Cambria, Sa Gao
Abstract	Named entity typing is the task of detecting the types of a named entity in context. For instance, given "Eric is giving a presentation", our goal is to infer that 'Eric' is a speaker or a presenter and a person. Existing approaches to named entity typing cannot work with a growing type set and fail to recognize entity mentions of unseen types. In this paper, we present a label embedding method that incorporates prototypical and hierarchical information to learn pre-trained label embeddings. In addition, we adapt a zero-shot learning framework that can predict both seen and previously unseen entity types. We perform evaluation on three benchmark datasets with two settings: 1) few-shot recognition, where all types are covered by the training set; and 2) zero-shot recognition, where fine-grained types are assumed absent from the training set. Results show that prior knowledge encoded using our label embedding methods can significantly boost classification performance in both cases.
Tasks	Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Sentiment Analysis, Zero-Shot Learning
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1017/
PDF	https://www.aclweb.org/anthology/C16-1017
PWC	https://paperswithcode.com/paper/label-embedding-for-zero-shot-fine-grained
Repo
Framework
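The zero-shot idea in the label-embedding abstract can be sketched as scoring a mention against an embedding of every type label, so a type unseen in training can still be predicted as long as it has an embedding. This is a simplified sketch, not the authors' model: it builds each label embedding from hypothetical prototype words only (the hierarchical information is omitted), uses made-up 3-dimensional vectors, and replaces a learned compatibility function with a plain dot product, assuming mentions and labels already share one vector space.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def build_label_embeddings(prototypes: Dict[str, List[str]],
                           word_vecs: Dict[str, Vector]) -> Dict[str, Vector]:
    """Embed each type label as the mean vector of its prototype words."""
    return {label: mean([word_vecs[w] for w in words])
            for label, words in prototypes.items()}


def predict_type(mention_vec: Vector, label_vecs: Dict[str, Vector]) -> str:
    """Return the label whose embedding scores highest against the mention."""
    return max(label_vecs, key=lambda label: dot(mention_vec, label_vecs[label]))


# Hypothetical word vectors and prototypes; neither type needs training examples.
word_vecs = {
    "speaker": [0.9, 0.1, 0.0], "lecturer": [0.8, 0.2, 0.1],
    "city": [0.0, 0.1, 0.9], "town": [0.1, 0.0, 0.8],
}
prototypes = {"/person/speaker": ["speaker", "lecturer"],
              "/location/city": ["city", "town"]}
label_vecs = build_label_embeddings(prototypes, word_vecs)

# Hypothetical context vector for 'Eric' in "Eric is giving a presentation".
print(predict_type([0.85, 0.15, 0.05], label_vecs))
```

Adding a new type only requires a new entry in `prototypes`; no classifier weights tied to a fixed type set need to be retrained, which is what lets the type inventory grow.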

Japanese Word―Color Associations with and without Contexts


Title	Japanese Word―Color Associations with and without Contexts
Authors	Jun Harashima
Abstract	Although some words carry strong associations with specific colors (e.g., the word danger is associated with the color red), few studies have investigated these relationships. This may be due to the relative rarity of databases that contain large quantities of such information. Additionally, these resources are often limited to particular languages, such as English. Moreover, the existing resources often do not consider the possible contexts of words in assessing the associations between a word and a color. As a result, the influence of context on word―color associations is not fully understood. In this study, we constructed a novel language resource for word―color associations. The resource has two characteristics: First, our resource is the first to include Japanese word―color associations, which were collected via crowdsourcing. Second, the word―color associations in the resource are linked to contexts. We show that word―color associations depend on language and that associations with certain colors are affected by context information.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1336/
PDF	https://www.aclweb.org/anthology/L16-1336
PWC	https://paperswithcode.com/paper/japanese-wordcolor-associations-with-and
Repo
Framework

Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages


Title	Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages
Authors	Ramy Eskander, Owen Rambow, Tianchun Yang
Abstract	We investigate using Adaptor Grammars for unsupervised morphological segmentation. Using six development languages, we investigate in detail different grammars, the use of morphological knowledge from outside sources, and the use of a cascaded architecture. Using cross-validation on our development languages, we propose a system which is language-independent. We show that it outperforms two state-of-the-art systems on 5 out of 6 languages.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1086/
PDF	https://www.aclweb.org/anthology/C16-1086
PWC	https://paperswithcode.com/paper/extending-the-use-of-adaptor-grammars-for
Repo
Framework