May 5, 2019

Paper Group NANR 46

Discontinuous VP in Bulgarian. The Alaskan Athabascan Grammar Database. A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus. Passing a USA National Bar Exam: a First Corpus for Experimentation. Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.. Word Segmentation for Akkadian Cuneiform. Corpus …

Discontinuous VP in Bulgarian

Title Discontinuous VP in Bulgarian
Authors Elisaveta Balabanova
Abstract
Tasks
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0904/
PDF https://www.aclweb.org/anthology/W16-0904
PWC https://paperswithcode.com/paper/discontinuous-vp-in-bulgarian
Repo
Framework

The Alaskan Athabascan Grammar Database

Title The Alaskan Athabascan Grammar Database
Authors Sebastian Nordhoff, Siri Tuttle, Olga Lovick
Abstract This paper describes a repository of example sentences in three endangered Athabascan languages: Koyukon, Upper Tanana, Lower Tanana. The repository allows researchers or language teachers to browse the example sentence corpus to either investigate the languages or to prepare teaching materials. The originally heterogeneous text collection was imported into a SOLR store via the POIO bridge. This paper describes the requirements, implementation, advantages and drawbacks of this approach and discusses the potential to apply it for other languages of the Athabascan family or beyond.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1523/
PDF https://www.aclweb.org/anthology/L16-1523
PWC https://paperswithcode.com/paper/the-alaskan-athabascan-grammar-database
Repo
Framework

A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus

Title A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
Authors Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain
Abstract This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project. This project aims to help users with speech and motor disabilities to operate their home appliances using voice commands. The audio recorded during such interactions consists of realistic data of speakers with severe dysarthria. The majority of the homeService corpus is recorded in real home environments where voice control is often the normal means by which users interact with their devices. The collection of the corpus is motivated by the shortage of realistic dysarthric speech corpora available to the scientific community. Along with the details on how the data is organised and how it can be accessed, a brief description of the framework used to make the recordings is provided. Finally, the performance of the homeService automatic recogniser for dysarthric speech trained with single-speaker data from the corpus is provided as an initial baseline. Access to the homeService corpus is provided through the dedicated web page at http://mini.dcs.shef.ac.uk/resources/homeservice-corpus/. This will also have the most updated description of the data. At the time of writing the collection process is still ongoing.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1315/
PDF https://www.aclweb.org/anthology/L16-1315
PWC https://paperswithcode.com/paper/a-framework-for-collecting-realistic
Repo
Framework

Passing a USA National Bar Exam: a First Corpus for Experimentation

Title Passing a USA National Bar Exam: a First Corpus for Experimentation
Authors Biralatei Fawei, Adam Wyner, Jeff Pan
Abstract Bar exams provide a key watershed by which legal professionals demonstrate their knowledge of the law and its application. Passing the bar entitles one to practice the law in a given jurisdiction. The bar provides an excellent benchmark for the performance of legal information systems since passing the bar would arguably signal that the system has acquired key aspects of legal reason on a par with a human lawyer. The paper provides a corpus and experimental results with material derived from a real bar exam, treating the problem as a form of textual entailment from the question to an answer. The providers of the bar exam material set the Gold Standard, which is the answer key. The experiments were carried out using the Excitement Open Platform for textual entailment 'out of the box'. The results and evaluation show that the tool can identify wrong answers (non-entailment) with a high F1 score, but it performs poorly in identifying the correct answer (entailment). The results provide a baseline performance measure against which to evaluate future improvements. The reasons for the poor performance are examined, and proposals are made to augment the tool in the future. The corpus facilitates experimentation by other researchers.
Tasks Natural Language Inference
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1538/
PDF https://www.aclweb.org/anthology/L16-1538
PWC https://paperswithcode.com/paper/passing-a-usa-national-bar-exam-a-first
Repo
Framework

Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.

Title Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
Authors Jon Chamberlain, Massimo Poesio, Udo Kruschwitz
Abstract Natural Language Engineering tasks require large and complex annotated datasets to build more advanced models of language. Corpora are typically annotated by several experts to create a gold standard; however, there are now compelling reasons to use a non-expert crowd to annotate text, driven by cost, speed and scalability. Phrase Detectives Corpus 1.0 is an anaphorically-annotated corpus of encyclopedic and narrative text that contains a gold standard created by multiple experts, as well as a set of annotations created by a large non-expert crowd. Analysis shows very good inter-expert agreement (kappa=.88-.93) but a more variable baseline crowd agreement (kappa=.52-.96). Encyclopedic texts show less agreement (and by implication are harder to annotate) than narrative texts. The release of this corpus is intended to encourage research into the use of crowds for text annotation and the development of more advanced, probabilistic language models, in particular for anaphoric coreference.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1323/
PDF https://www.aclweb.org/anthology/L16-1323
PWC https://paperswithcode.com/paper/phrase-detectives-corpus-10-crowdsourced
Repo
Framework
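
The agreement figures quoted in this abstract are kappa values. The abstract does not say which kappa variant was computed, so the following is only a minimal sketch of Cohen's kappa for two annotators, with a hypothetical function name and invented labels, to show how chance-corrected agreement differs from raw agreement.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # according to their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Invented toy labels, not data from the corpus.
annotator_1 = ["new", "old", "new", "old"]
annotator_2 = ["new", "old", "old", "old"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four items (0.75 raw agreement), but half of that agreement is expected by chance, so kappa comes out lower than the raw figure.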

Word Segmentation for Akkadian Cuneiform

Title Word Segmentation for Akkadian Cuneiform
Authors Timo Homburg, Christian Chiarcos
Abstract We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about 3 millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles Eastern Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe results of rule-based algorithms, dictionary-based algorithms, statistical and machine learning approaches. Our results may indicate possible promising steps in cuneiform word segmentation that can create and improve natural language processing in this area.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1642/
PDF https://www.aclweb.org/anthology/L16-1642
PWC https://paperswithcode.com/paper/word-segmentation-for-akkadian-cuneiform
Repo
Framework
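
The abstract mentions dictionary-based segmentation algorithms borrowed from Chinese and Japanese without naming them. One classic member of that family is greedy forward maximum matching; the sketch below illustrates it under that assumption, and is not taken from the paper. The function name, the `max_len` parameter, and the placeholder sign names are all hypothetical.

```python
def max_match(signs, lexicon, max_len=4):
    """Greedy forward maximum matching over a sequence of signs.

    `lexicon` is a set of tuples of signs, each tuple being one known word.
    """
    words = []
    i = 0
    while i < len(signs):
        # Try the longest candidate first; a single sign always matches
        # so that unknown material is still consumed.
        for size in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + size])
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Placeholder sign names and a toy lexicon, not real Akkadian data.
toy_lexicon = {("s1", "s2"), ("s3", "s4", "s5")}
segmented = max_match(["s1", "s2", "s3", "s4", "s5", "s6"], toy_lexicon)
```

The greedy choice keeps the method simple and fast, at the cost of errors whenever a shorter word followed by another known word would have been the right split.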

Corpus-Based Diacritic Restoration for South Slavic Languages

Title Corpus-Based Diacritic Restoration for South Slavic Languages
Authors Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Abstract In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing. Such text is typically easily understandable to humans but very difficult for computational processing because many words become ambiguous or unknown. Letter-level approaches to diacritic restoration generalise better and do not require a lot of training data, but word-level approaches tend to yield better results. However, they typically rely on a lexicon, which is an expensive resource, does not cover non-standard forms, and is often not available for less-resourced languages. In this paper we present diacritic restoration models that are trained on easy-to-acquire corpora. We test three different types of corpora (Wikipedia, general web, Twitter) for three South Slavic languages (Croatian, Serbian and Slovene) and evaluate them on two types of text: standard (Wikipedia) and non-standard (Twitter). The proposed approach considerably outperforms charlifter, so far the only open source tool available for this task. We make the best performing systems freely available.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1573/
PDF https://www.aclweb.org/anthology/L16-1573
PWC https://paperswithcode.com/paper/corpus-based-diacritic-restoration-for-south
Repo
Framework
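
To make the idea of a word-level model trained on a plain corpus concrete, here is a deliberately simplified sketch: it maps each diacritic-free form to the most frequent diacritized form seen in training text. This is an assumed baseline for illustration, not the authors' system, which the abstract does not describe in detail. All function names and the toy tokens are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks; đ/Đ needs manual handling."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # đ/Đ has no Unicode decomposition, so map it by hand.
    return stripped.replace("đ", "d").replace("Đ", "D")


def train(tokens):
    """Map each stripped form to its most frequent original form."""
    counts = defaultdict(Counter)
    for tok in tokens:
        counts[strip_diacritics(tok)][tok] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}


def restore(tokens, model):
    """Replace known stripped forms; leave unseen tokens unchanged."""
    return [model.get(tok, tok) for tok in tokens]


# Toy training tokens, invented for illustration.
model = train(["čaša", "kuća", "kuća", "kuca"])
restored = restore(["casa", "kuca", "xyz"], model)
```

Because the lookup ignores context, an ambiguous form such as "kuca" is always resolved to its majority reading, which is one reason a purely frequency-based table is only a starting point.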

A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks

Title A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks
Authors Jiang Guo, Wanxiang Che, Haifeng Wang, Ting Liu
Abstract Various treebanks have been released for dependency parsing. Although treebanks may belong to different languages or have different annotation schemes, they contain common syntactic knowledge that has the potential to benefit each other. This paper presents a universal framework for transfer parsing across multi-typed treebanks with deep multi-task learning. We consider two kinds of treebanks as source: the multilingual universal treebanks and the monolingual heterogeneous treebanks. Knowledge across the source and target treebanks is effectively transferred through multi-level parameter sharing. Experiments on several benchmark datasets in various languages demonstrate that our approach can make effective use of arbitrary source treebanks to improve target parsing models.
Tasks Dependency Parsing, Information Retrieval, Multi-Task Learning
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1002/
PDF https://www.aclweb.org/anthology/C16-1002
PWC https://paperswithcode.com/paper/a-universal-framework-for-inductive-transfer
Repo
Framework

Factuality Annotation and Learning in Spanish Texts

Title Factuality Annotation and Learning in Spanish Texts
Authors Dina Wonsever, Aiala Rosá, Marisa Malcuori
Abstract We present a proposal for the annotation of factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. Some learning experiments (SVM and CRF) have also been carried out, showing encouraging results.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1329/
PDF https://www.aclweb.org/anthology/L16-1329
PWC https://paperswithcode.com/paper/factuality-annotation-and-learning-in-spanish
Repo
Framework

NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models

Title NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
Authors Frederico Tommasi Caroli, André Freitas, João Carlos Pereira da Silva, Siegfried Handschuh
Abstract Lately, with the success of Deep Learning techniques in some computational linguistics tasks, many researchers want to explore new models for their linguistics applications. These models tend to be very different from what standard Neural Networks look like, limiting the possibility to use standard Neural Networks frameworks. This work presents NNBlocks, a new framework written in Python to build and train Neural Networks that are not constrained by a specific kind of architecture, making it possible to use it in computational linguistics.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1330/
PDF https://www.aclweb.org/anthology/L16-1330
PWC https://paperswithcode.com/paper/nnblocks-a-deep-learning-framework-for
Repo
Framework

Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese]

Title Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese]
Authors Yuan-Hao Lin, Chia-Hui Chang
Abstract
Tasks Named Entity Recognition, Relation Extraction, Sequential Pattern Mining
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1022/
PDF https://www.aclweb.org/anthology/O16-1022
PWC https://paperswithcode.com/paper/facebook-aaoaac3cfacebook-activity-event
Repo
Framework

A Regional News Corpora for Contextualized Entity Discovery and Linking

Title A Regional News Corpora for Contextualized Entity Discovery and Linking
Authors Adrian Braşoveanu, Lyndon J.B. Nixon, Albert Weichselbraun, Arno Scharl
Abstract This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date.
Tasks Entity Linking, Knowledge Base Population
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1531/
PDF https://www.aclweb.org/anthology/L16-1531
PWC https://paperswithcode.com/paper/a-regional-news-corpora-for-contextualized
Repo
Framework

Label Embedding for Zero-shot Fine-grained Named Entity Typing

Title Label Embedding for Zero-shot Fine-grained Named Entity Typing
Authors Yukun Ma, Erik Cambria, Sa Gao
Abstract Named entity typing is the task of detecting the types of a named entity in context. For instance, given "Eric is giving a presentation", our goal is to infer that 'Eric' is a speaker or a presenter and a person. Existing approaches to named entity typing cannot work with a growing type set and fail to recognize entity mentions of unseen types. In this paper, we present a label embedding method that incorporates prototypical and hierarchical information to learn pre-trained label embeddings. In addition, we adapt a zero-shot learning framework that can predict both seen and previously unseen entity types. We perform evaluation on three benchmark datasets with two settings: 1) few-shots recognition where all types are covered by the training set; and 2) zero-shot recognition where fine-grained types are assumed absent from training set. Results show that prior knowledge encoded using our label embedding methods can significantly boost the performance of classification for both cases.
Tasks Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Sentiment Analysis, Zero-Shot Learning
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1017/
PDF https://www.aclweb.org/anthology/C16-1017
PWC https://paperswithcode.com/paper/label-embedding-for-zero-shot-fine-grained
Repo
Framework

Japanese Word―Color Associations with and without Contexts

Title Japanese Word―Color Associations with and without Contexts
Authors Jun Harashima
Abstract Although some words carry strong associations with specific colors (e.g., the word danger is associated with the color red), few studies have investigated these relationships. This may be due to the relative rarity of databases that contain large quantities of such information. Additionally, these resources are often limited to particular languages, such as English. Moreover, the existing resources often do not consider the possible contexts of words in assessing the associations between a word and a color. As a result, the influence of context on word―color associations is not fully understood. In this study, we constructed a novel language resource for word―color associations. The resource has two characteristics: First, our resource is the first to include Japanese word―color associations, which were collected via crowdsourcing. Second, the word―color associations in the resource are linked to contexts. We show that word―color associations depend on language and that associations with certain colors are affected by context information.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1336/
PDF https://www.aclweb.org/anthology/L16-1336
PWC https://paperswithcode.com/paper/japanese-wordcolor-associations-with-and
Repo
Framework

Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages

Title Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages
Authors Ramy Eskander, Owen Rambow, Tianchun Yang
Abstract We investigate using Adaptor Grammars for unsupervised morphological segmentation. Using six development languages, we investigate in detail different grammars, the use of morphological knowledge from outside sources, and the use of a cascaded architecture. Using cross-validation on our development languages, we propose a system which is language-independent. We show that it outperforms two state-of-the-art systems on 5 out of 6 languages.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1086/
PDF https://www.aclweb.org/anthology/C16-1086
PWC https://paperswithcode.com/paper/extending-the-use-of-adaptor-grammars-for
Repo
Framework