Paper Group NANR 46
Discontinuous VP in Bulgarian. The Alaskan Athabascan Grammar Database. A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus. Passing a USA National Bar Exam: a First Corpus for Experimentation. Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.. Word Segmentation for Akkadian Cuneiform. Corpus …
Discontinuous VP in Bulgarian
Title | Discontinuous VP in Bulgarian |
Authors | Elisaveta Balabanova |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0904/ |
PWC | https://paperswithcode.com/paper/discontinuous-vp-in-bulgarian |
Repo | |
Framework | |
The Alaskan Athabascan Grammar Database
Title | The Alaskan Athabascan Grammar Database |
Authors | Sebastian Nordhoff, Siri Tuttle, Olga Lovick |
Abstract | This paper describes a repository of example sentences in three endangered Athabascan languages: Koyukon, Upper Tanana, Lower Tanana. The repository allows researchers or language teachers to browse the example sentence corpus to either investigate the languages or to prepare teaching materials. The originally heterogeneous text collection was imported into a SOLR store via the POIO bridge. This paper describes the requirements, implementation, advantages and drawbacks of this approach and discusses the potential to apply it for other languages of the Athabascan family or beyond. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1523/ |
PWC | https://paperswithcode.com/paper/the-alaskan-athabascan-grammar-database |
Repo | |
Framework | |
A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
Title | A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus |
Authors | Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain |
Abstract | This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project. This project aims to help users with speech and motor disabilities to operate their home appliances using voice commands. The audio recorded during such interactions consists of realistic data of speakers with severe dysarthria. The majority of the homeService corpus is recorded in real home environments where voice control is often the normal means by which users interact with their devices. The collection of the corpus is motivated by the shortage of realistic dysarthric speech corpora available to the scientific community. Along with the details on how the data is organised and how it can be accessed, a brief description of the framework used to make the recordings is provided. Finally, the performance of the homeService automatic recogniser for dysarthric speech trained with single-speaker data from the corpus is provided as an initial baseline. Access to the homeService corpus is provided through the dedicated web page at http://mini.dcs.shef.ac.uk/resources/homeservice-corpus/. This will also have the most updated description of the data. At the time of writing the collection process is still ongoing. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1315/ |
PWC | https://paperswithcode.com/paper/a-framework-for-collecting-realistic |
Repo | |
Framework | |
Passing a USA National Bar Exam: a First Corpus for Experimentation
Title | Passing a USA National Bar Exam: a First Corpus for Experimentation |
Authors | Biralatei Fawei, Adam Wyner, Jeff Pan |
Abstract | Bar exams provide a key watershed by which legal professionals demonstrate their knowledge of the law and its application. Passing the bar entitles one to practice the law in a given jurisdiction. The bar provides an excellent benchmark for the performance of legal information systems since passing the bar would arguably signal that the system has acquired key aspects of legal reason on a par with a human lawyer. The paper provides a corpus and experimental results with material derived from a real bar exam, treating the problem as a form of textual entailment from the question to an answer. The providers of the bar exam material set the Gold Standard, which is the answer key. The experiments were carried out using the 'out of the box' Excitement Open Platform for textual entailment. The results and evaluation show that the tool can identify wrong answers (non-entailment) with a high F1 score, but it performs poorly in identifying the correct answer (entailment). The results provide a baseline performance measure against which to evaluate future improvements. The reasons for the poor performance are examined, and proposals are made to augment the tool in the future. The corpus facilitates experimentation by other researchers. |
Tasks | Natural Language Inference |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1538/ |
PWC | https://paperswithcode.com/paper/passing-a-usa-national-bar-exam-a-first |
Repo | |
Framework | |
Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
Title | Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference. |
Authors | Jon Chamberlain, Massimo Poesio, Udo Kruschwitz |
Abstract | Natural Language Engineering tasks require large and complex annotated datasets to build more advanced models of language. Corpora are typically annotated by several experts to create a gold standard; however, there are now compelling reasons to use a non-expert crowd to annotate text, driven by cost, speed and scalability. Phrase Detectives Corpus 1.0 is an anaphorically-annotated corpus of encyclopedic and narrative text that contains a gold standard created by multiple experts, as well as a set of annotations created by a large non-expert crowd. Analysis shows very good inter-expert agreement (kappa=.88-.93) but a more variable baseline crowd agreement (kappa=.52-.96). Encyclopedic texts show less agreement (and by implication are harder to annotate) than narrative texts. The release of this corpus is intended to encourage research into the use of crowds for text annotation and the development of more advanced, probabilistic language models, in particular for anaphoric coreference. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1323/ |
PWC | https://paperswithcode.com/paper/phrase-detectives-corpus-10-crowdsourced |
Repo | |
Framework | |
Word Segmentation for Akkadian Cuneiform
Title | Word Segmentation for Akkadian Cuneiform |
Authors | Timo Homburg, Christian Chiarcos |
Abstract | We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about 3 millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles Eastern Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe results of rule-based algorithms, dictionary-based algorithms, statistical and machine learning approaches. Our results may indicate possible promising steps in cuneiform word segmentation that can create and improve natural language processing in this area. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1642/ |
PWC | https://paperswithcode.com/paper/word-segmentation-for-akkadian-cuneiform |
Repo | |
Framework | |
Corpus-Based Diacritic Restoration for South Slavic Languages
Title | Corpus-Based Diacritic Restoration for South Slavic Languages |
Authors | Nikola Ljubešić, Tomaž Erjavec, Darja Fišer |
Abstract | In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing. Such text is typically easily understandable to humans but very difficult for computational processing because many words become ambiguous or unknown. Letter-level approaches to diacritic restoration generalise better and do not require a lot of training data, but word-level approaches tend to yield better results. However, they typically rely on a lexicon, which is an expensive resource, does not cover non-standard forms, and is often not available for less-resourced languages. In this paper we present diacritic restoration models that are trained on easy-to-acquire corpora. We test three different types of corpora (Wikipedia, general web, Twitter) for three South Slavic languages (Croatian, Serbian and Slovene) and evaluate them on two types of text: standard (Wikipedia) and non-standard (Twitter). The proposed approach considerably outperforms charlifter, so far the only open source tool available for this task. We make the best performing systems freely available. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1573/ |
PWC | https://paperswithcode.com/paper/corpus-based-diacritic-restoration-for-south |
Repo | |
Framework | |
A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks
Title | A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks |
Authors | Jiang Guo, Wanxiang Che, Haifeng Wang, Ting Liu |
Abstract | Various treebanks have been released for dependency parsing. Although treebanks may belong to different languages or have different annotation schemes, they contain common syntactic knowledge that has the potential to benefit each other. This paper presents a universal framework for transfer parsing across multi-typed treebanks with deep multi-task learning. We consider two kinds of treebanks as source: the multilingual universal treebanks and the monolingual heterogeneous treebanks. Knowledge across the source and target treebanks is effectively transferred through multi-level parameter sharing. Experiments on several benchmark datasets in various languages demonstrate that our approach can make effective use of arbitrary source treebanks to improve target parsing models. |
Tasks | Dependency Parsing, Information Retrieval, Multi-Task Learning |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1002/ |
PWC | https://paperswithcode.com/paper/a-universal-framework-for-inductive-transfer |
Repo | |
Framework | |
Factuality Annotation and Learning in Spanish Texts
Title | Factuality Annotation and Learning in Spanish Texts |
Authors | Dina Wonsever, Aiala Rosá, Marisa Malcuori |
Abstract | We present a proposal for the annotation of factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. Also, some learning experiments (SVM and CRF) have been carried out, showing encouraging results. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1329/ |
PWC | https://paperswithcode.com/paper/factuality-annotation-and-learning-in-spanish |
Repo | |
Framework | |
NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
Title | NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models |
Authors | Frederico Tommasi Caroli, André Freitas, João Carlos Pereira da Silva, Siegfried Handschuh |
Abstract | Lately, with the success of Deep Learning techniques in some computational linguistics tasks, many researchers want to explore new models for their linguistics applications. These models tend to be very different from what standard Neural Networks look like, limiting the possibility to use standard Neural Networks frameworks. This work presents NNBlocks, a new framework written in Python to build and train Neural Networks that are not constrained by a specific kind of architecture, making it possible to use it in computational linguistics. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1330/ |
PWC | https://paperswithcode.com/paper/nnblocks-a-deep-learning-framework-for |
Repo | |
Framework | |
Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese]
Title | Facebook 活動事件擷取系統(Facebook Activity Event Extraction System)[In Chinese] |
Authors | Yuan-Hao Lin, Chia-Hui Chang |
Abstract | |
Tasks | Named Entity Recognition, Relation Extraction, Sequential Pattern Mining |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1022/ |
PWC | https://paperswithcode.com/paper/facebook-aaoaac3cfacebook-activity-event |
Repo | |
Framework | |
A Regional News Corpora for Contextualized Entity Discovery and Linking
Title | A Regional News Corpora for Contextualized Entity Discovery and Linking |
Authors | Adrian Braşoveanu, Lyndon J.B. Nixon, Albert Weichselbraun, Arno Scharl |
Abstract | This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date. |
Tasks | Entity Linking, Knowledge Base Population |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1531/ |
PWC | https://paperswithcode.com/paper/a-regional-news-corpora-for-contextualized |
Repo | |
Framework | |
Label Embedding for Zero-shot Fine-grained Named Entity Typing
Title | Label Embedding for Zero-shot Fine-grained Named Entity Typing |
Authors | Yukun Ma, Erik Cambria, Sa Gao |
Abstract | Named entity typing is the task of detecting the types of a named entity in context. For instance, given "Eric is giving a presentation", our goal is to infer that 'Eric' is a speaker or a presenter and a person. Existing approaches to named entity typing cannot work with a growing type set and fail to recognize entity mentions of unseen types. In this paper, we present a label embedding method that incorporates prototypical and hierarchical information to learn pre-trained label embeddings. In addition, we adapt a zero-shot learning framework that can predict both seen and previously unseen entity types. We perform evaluation on three benchmark datasets with two settings: 1) few-shot recognition where all types are covered by the training set; and 2) zero-shot recognition where fine-grained types are assumed absent from the training set. Results show that prior knowledge encoded using our label embedding methods can significantly boost the performance of classification for both cases. |
Tasks | Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Sentiment Analysis, Zero-Shot Learning |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1017/ |
PWC | https://paperswithcode.com/paper/label-embedding-for-zero-shot-fine-grained |
Repo | |
Framework | |
Japanese Word―Color Associations with and without Contexts
Title | Japanese Word―Color Associations with and without Contexts |
Authors | Jun Harashima |
Abstract | Although some words carry strong associations with specific colors (e.g., the word danger is associated with the color red), few studies have investigated these relationships. This may be due to the relative rarity of databases that contain large quantities of such information. Additionally, these resources are often limited to particular languages, such as English. Moreover, the existing resources often do not consider the possible contexts of words in assessing the associations between a word and a color. As a result, the influence of context on word―color associations is not fully understood. In this study, we constructed a novel language resource for word―color associations. The resource has two characteristics: First, our resource is the first to include Japanese word―color associations, which were collected via crowdsourcing. Second, the word―color associations in the resource are linked to contexts. We show that word―color associations depend on language and that associations with certain colors are affected by context information. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1336/ |
PWC | https://paperswithcode.com/paper/japanese-wordcolor-associations-with-and |
Repo | |
Framework | |
Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages
Title | Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages |
Authors | Ramy Eskander, Owen Rambow, Tianchun Yang |
Abstract | We investigate using Adaptor Grammars for unsupervised morphological segmentation. Using six development languages, we investigate in detail different grammars, the use of morphological knowledge from outside sources, and the use of a cascaded architecture. Using cross-validation on our development languages, we propose a system which is language-independent. We show that it outperforms two state-of-the-art systems on 5 out of 6 languages. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1086/ |
PWC | https://paperswithcode.com/paper/extending-the-use-of-adaptor-grammars-for |
Repo | |
Framework | |