May 5, 2019

Paper Group NANR 48

A Document Repository for Social Media and Speech Conversations

Title A Document Repository for Social Media and Speech Conversations
Authors Adam Funk, Robert Gaizauskas, Benoit Favre
Abstract We present a successfully implemented document repository REST service for flexible SCRUD (search, create, read, update, delete) storage of social media conversations, using a GATE/TIPSTER-like document object model and providing a query language for document features. This software is currently being used in the SENSEI research project and will be published as open-source software before the project ends. It is, to the best of our knowledge, the first freely available, general purpose data repository to support large-scale multimodal (i.e., speech or text) conversation analytics.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1070/
PDF https://www.aclweb.org/anthology/L16-1070
PWC https://paperswithcode.com/paper/a-document-repository-for-social-media-and
Repo
Framework

OCR++: A Robust Framework For Information Extraction from Scholarly Articles

Title OCR++: A Robust Framework For Information Extraction from Scholarly Articles
Authors Mayank Singh, Barnopriyo Barua, Priyank Palod, Manvi Garg, Sidhartha Satapathy, Samuel Bushi, Kumar Ayush, Krishna Sai Rohith, Tulasi Gamidi, Pawan Goyal, Animesh Mukherjee
Abstract This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written in English to understand generic writing patterns and formulate rules to develop this hybrid framework. Extensive evaluations show that the proposed framework outperforms the existing state-of-the-art tools by a large margin in structural information extraction along with improved performance in metadata and bibliography extraction tasks, both in terms of accuracy (around 50% improvement) and processing time (around 52% improvement). A user experience study conducted with the help of 30 researchers reveals that the researchers found this system to be very helpful. As an additional objective, we discuss two novel use cases including automatically extracting links to public datasets from the proceedings, which would further accelerate the advancement in digital libraries. The result of the framework can be exported as a whole into structured TEI-encoded documents. Our framework is accessible online at http://www.cnergres.iitkgp.ac.in/OCR++/home/.
Tasks Optical Character Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1320/
PDF https://www.aclweb.org/anthology/C16-1320
PWC https://paperswithcode.com/paper/ocr-a-robust-framework-for-information
Repo
Framework

A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet

Title A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet
Authors Shafqat Mumtaz Virk, Philippe Muller, Juliette Conrath
Abstract Frame semantics is a theory of linguistic meanings, and is considered to be a useful framework for shallow semantic analysis of natural language. FrameNet, which is based on frame semantics, is a popular lexical semantic resource. In addition to providing a set of core semantic frames and their frame elements, FrameNet also provides relations between those frames (hence providing a network of frames i.e. FrameNet). We address here the limited coverage of the network of conceptual relations between frames in FrameNet, which has previously been pointed out by others. We present a supervised model using rich features from three different sources: structural features from the existing FrameNet network, information from the WordNet relations between synsets projected into semantic frames, and corpus-collected lexical associations. We show large improvements over baselines consisting of each of the three groups of features in isolation. We then use this model to select frame pairs as candidate relations, and perform evaluation on a sample with good precision.
Tasks Coreference Resolution, Question Answering
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1334/
PDF https://www.aclweb.org/anthology/C16-1334
PWC https://paperswithcode.com/paper/a-supervised-approach-for-enriching-the
Repo
Framework

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality

Title Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
Authors Dhouha Bouamor, Leonardo Campillos Llanos, Anne-Laure Ligozat, Sophie Rosset, Pierre Zweigenbaum
Abstract While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language. In this paper, we train a learning-to-rank model to determine a specialization degree for each term found in a given list. Since no training data for this task exist for French, we train our system with non-lexical features on English data, namely, the Consumer Health Vocabulary, then apply it to French. The features include the likelihood ratio of the term based on specialized and lay language models, and tests for containing morphologically complex words. The evaluation of this approach is conducted on 134 terms from the UMLS Metathesaurus and 868 terms from the Eugloss thesaurus. The Normalized Discounted Cumulative Gain obtained by our system is over 0.8 on both test sets. Besides, thanks to the learning-to-rank approach, adding morphological features to the language model features improves the results on the Eugloss thesaurus.
Tasks Language Modelling, Learning-To-Rank
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1366/
PDF https://www.aclweb.org/anthology/L16-1366
PWC https://paperswithcode.com/paper/transfer-based-learning-to-rank-assessment-of
Repo
Framework
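
The abstract above reports a Normalized Discounted Cumulative Gain (NDCG) above 0.8. As a rough illustration of how such a score is computed, here is a minimal sketch. It assumes the common linear-gain, log2-discount form of DCG; the abstract does not say which variant the authors used, and the relevance grades in the example are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumption: linear gain with a log2 position discount. Other
    formulations (e.g. exponential gain) exist and give different values.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical technicality grades (higher = more technical),
# listed in the order a ranking system returned the terms.
system_ranking = [3, 2, 3, 0, 1]
print(round(ndcg(system_ranking), 3))
```

A ranking that already lists the terms from most to least technical scores 1.0; misplacing highly technical terms lower in the list reduces the score.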

Manual and Automatic Paraphrases for MT Evaluation

Title Manual and Automatic Paraphrases for MT Evaluation
Authors Aleš Tamchyna, Petra Barančíková
Abstract Paraphrasing of reference translations has been shown to improve the correlation with human judgements in automatic evaluation of machine translation (MT) outputs. In this work, we present a new dataset for evaluating English-Czech translation based on automatic paraphrases. We compare this dataset with an existing set of manually created paraphrases and find that even automatic paraphrases can improve MT evaluation. We also propose and evaluate several criteria for selecting suitable reference translations from a larger set.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1563/
PDF https://www.aclweb.org/anthology/L16-1563
PWC https://paperswithcode.com/paper/manual-and-automatic-paraphrases-for-mt
Repo
Framework

Using Data Mining Techniques for Sentiment Shifter Identification

Title Using Data Mining Techniques for Sentiment Shifter Identification
Authors Samira Noferesti, Mehrnoush Shamsfard
Abstract Sentiment shifters, i.e., words and expressions that can affect text polarity, play an important role in opinion mining. However, the limited ability of current automated opinion mining systems to handle shifters represents a major challenge. The majority of existing approaches rely on a manual list of shifters; few attempts have been made to automatically identify shifters in text, and most of them focus only on negating shifters. This paper presents a novel and efficient semi-automatic method for identifying sentiment shifters in drug reviews, aiming at improving the overall accuracy of opinion mining systems. To this end, we use weighted association rule mining (WARM), a well-known data mining technique, for finding frequent dependency patterns representing sentiment shifters from a domain-specific corpus. These patterns, which include different kinds of shifter words such as shifter verbs and quantifiers, are able to handle both local and long-distance shifters. We also combine these patterns with a lexicon-based approach for the polarity classification task. Experiments on drug reviews demonstrate that extracted shifters can improve the precision of the lexicon-based approach for polarity classification by 9.25 percent.
Tasks Opinion Mining
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1431/
PDF https://www.aclweb.org/anthology/L16-1431
PWC https://paperswithcode.com/paper/using-data-mining-techniques-for-sentiment
Repo
Framework

MWEs in Treebanks: From Survey to Guidelines

Title MWEs in Treebanks: From Survey to Guidelines
Authors Victoria Rosén, Koenraad De Smedt, Gyri Smørdal Losnegaard, Eduard Bejček, Agata Savary, Petya Osenova
Abstract By means of an online survey, we have investigated ways in which various types of multiword expressions are annotated in existing treebanks. The results indicate that there is considerable variation in treatments across treebanks and thereby also, to some extent, across languages and across theoretical frameworks. The comparison is focused on the annotation of light verb constructions and verbal idioms. The survey shows that the light verb constructions either get special annotations as such, or are treated as ordinary verbs, while VP idioms are handled through different strategies. Based on insights from our investigation, we propose some general guidelines for annotating multiword expressions in treebanks. The recommendations address the following application-based needs: distinguishing MWEs from similar but compositional constructions; searching distinct types of MWEs in treebanks; awareness of literal and nonliteral meanings; and normalization of the MWE representation. The cross-lingually and cross-theoretically focused survey is intended as an aid to accessing treebanks and an aid for further work on treebank annotation.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1368/
PDF https://www.aclweb.org/anthology/L16-1368
PWC https://paperswithcode.com/paper/mwes-in-treebanks-from-survey-to-guidelines
Repo
Framework

Incorporating Relational Knowledge into Word Representations using Subspace Regularization

Title Incorporating Relational Knowledge into Word Representations using Subspace Regularization
Authors Abhishek Kumar, Jun Araki
Abstract
Tasks Chunking, Dependency Parsing, Knowledge Base Completion, Machine Translation, Named Entity Recognition, Relation Extraction, Sentiment Analysis, Word Embeddings
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2082/
PDF https://www.aclweb.org/anthology/P16-2082
PWC https://paperswithcode.com/paper/incorporating-relational-knowledge-into-word
Repo
Framework

metaTED: a Corpus of Metadiscourse for Spoken Language

Title metaTED: a Corpus of Metadiscourse for Spoken Language
Authors Rui Correia, Nuno Mamede, Jorge Baptista, Maxine Eskenazi
Abstract This paper describes metaTED, a freely available corpus of metadiscursive acts in spoken language collected via crowdsourcing. Metadiscursive acts were annotated on a set of 180 randomly chosen TED talks in English, spanning different speakers and topics. The taxonomy used for annotation is composed of 16 categories, adapted from Adel (2010). This adaptation takes into account both the material to annotate and the setting in which the annotation task is performed. The crowdsourcing setup is described, including considerations regarding training and quality control. The collected data is evaluated in terms of quantity of occurrences, inter-annotator agreement, and annotation-related measures (such as average time on task and self-reported confidence). Results show different levels of agreement among metadiscourse acts (α ∈ [0.15; 0.49]). To further assess the collected material, a subset of the annotations was submitted to experts, who validated which of the marked occurrences truly correspond to instances of the metadiscursive act at hand. As with the crowd, experts showed different levels of agreement between categories (α ∈ [0.18; 0.72]). The paper concludes with a discussion on the applicability of metaTED with respect to each of the 16 categories of metadiscourse.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1618/
PDF https://www.aclweb.org/anthology/L16-1618
PWC https://paperswithcode.com/paper/metated-a-corpus-of-metadiscourse-for-spoken
Repo
Framework

Controlled Propagation of Concept Annotations in Textual Corpora

Title Controlled Propagation of Concept Annotations in Textual Corpora
Authors Cyril Grouin
Abstract In this paper, we present the annotation propagation tool we designed to be used in conjunction with the BRAT rapid annotation tool. We designed two experiments to annotate a corpus of 60 files, first without our tool and then with our propagation tool. We evaluated the annotation time and the quality of annotations. We show that using the annotation propagation tool reduces the time spent annotating the corpus by 31.7% while yielding better-quality results.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1643/
PDF https://www.aclweb.org/anthology/L16-1643
PWC https://paperswithcode.com/paper/controlled-propagation-of-concept-annotations
Repo
Framework

Identifying Eyewitness News-worthy Events on Twitter

Title Identifying Eyewitness News-worthy Events on Twitter
Authors Erika Doggett, Alejandro Cantarero
Abstract
Tasks
Published 2016-11-01
URL https://www.aclweb.org/anthology/W16-6202/
PDF https://www.aclweb.org/anthology/W16-6202
PWC https://paperswithcode.com/paper/identifying-eyewitness-news-worthy-events-on
Repo
Framework

Computational Natural Language Learning: +-20years +-Data +-Features +-Multimodal +-Bioplausible

Title Computational Natural Language Learning: +-20years +-Data +-Features +-Multimodal +-Bioplausible
Authors David Powers
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/K16-1001/
PDF https://www.aclweb.org/anthology/K16-1001
PWC https://paperswithcode.com/paper/computational-natural-language-learning
Repo
Framework

Interpretable Nonlinear Dynamic Modeling of Neural Trajectories

Title Interpretable Nonlinear Dynamic Modeling of Neural Trajectories
Authors Yuan Zhao, Il Memming Park
Abstract A central challenge in neuroscience is understanding how a neural system implements computation through its dynamics. We propose a nonlinear time series model aimed at characterizing interpretable dynamics from neural trajectories. Our model assumes low-dimensional continuous dynamics in a finite volume. It incorporates a prior assumption about globally contractional dynamics to avoid overly enthusiastic extrapolation outside of the support of observed trajectories. We show that our model can recover qualitative features of the phase portrait such as attractors, slow points, and bifurcations, while also producing reliable long-term future predictions in a variety of dynamical models and in real neural data.
Tasks Time Series
Published 2016-12-01
URL http://papers.nips.cc/paper/6543-interpretable-nonlinear-dynamic-modeling-of-neural-trajectories
PDF http://papers.nips.cc/paper/6543-interpretable-nonlinear-dynamic-modeling-of-neural-trajectories.pdf
PWC https://paperswithcode.com/paper/interpretable-nonlinear-dynamic-modeling-of
Repo
Framework

Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words

Title Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words
Authors Ryu Takeda, Kazunori Komatani
Abstract This paper describes a Bayesian language model for predicting spontaneous utterances. People sometimes say unexpected words, such as fillers or hesitations, that cause the misprediction of words in normal N-gram models. Our proposed model considers mixtures of possible segmental contexts, that is, a kind of context-word selection. It can reduce negative effects caused by unexpected words because it represents conditional occurrence probabilities of a word as weighted mixtures of possible segmental contexts. The tuning of mixture weights is the key issue in this approach as the segment patterns become numerous; we therefore resolve it by using a Bayesian model. The generative process is achieved by combining the stick-breaking process and the process used in the variable order Pitman-Yor language model. Experimental evaluations revealed that our model outperformed contiguous N-gram models in terms of perplexity for noisy text including hesitations.
Tasks Language Modelling, Speech Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1016/
PDF https://www.aclweb.org/anthology/C16-1016
PWC https://paperswithcode.com/paper/bayesian-language-model-based-on-mixture-of
Repo
Framework
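
The abstract above describes a word's conditional probability as a weighted mixture over possible contexts. The following toy sketch illustrates only that mixture idea and is not the authors' model: it uses add-one smoothed bigram counts, fixed hand-set weights instead of Bayesian inference, and just two contexts (the previous word, and the word before it with the intervening word skipped). All function names and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count next-word occurrences for each preceding word."""
    bigrams = defaultdict(Counter)
    vocab = set()
    for words in sentences:
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return bigrams, vocab

def bigram_prob(bigrams, vocab, context, word):
    """Add-one smoothed P(word | context) over the training vocabulary."""
    counter = bigrams.get(context, Counter())
    return (counter[word] + 1) / (sum(counter.values()) + len(vocab))

def mixture_prob(bigrams, vocab, history, word, weights=(0.5, 0.5)):
    """Mix two contexts: the adjacent word, and the word before it
    (i.e. treating the most recent word as skippable).

    Assumption: weights are fixed by hand here; the paper infers them
    with a Bayesian model instead.
    """
    p_adjacent = bigram_prob(bigrams, vocab, history[-1], word)
    p_skip = bigram_prob(bigrams, vocab, history[-2], word)
    return weights[0] * p_adjacent + weights[1] * p_skip

# Train on clean text, then predict after an unseen filler ("uh").
bigrams, vocab = train_bigrams([["i", "want", "coffee"], ["you", "want", "coffee"]])
history = ["want", "uh"]
print(bigram_prob(bigrams, vocab, "uh", "coffee"))    # adjacent context only
print(mixture_prob(bigrams, vocab, history, "coffee"))  # mixture of both contexts
```

In this toy setting the filler carries no information about the next word, so letting part of the probability mass come from the context that skips it raises the probability of the word that actually follows.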

Crosswalking from CMDI to Dublin Core and MARC 21

Title Crosswalking from CMDI to Dublin Core and MARC 21
Authors Claus Zinn, Thorsten Trippel, Steve Kaminski, Emanuel Dima
Abstract The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1395/
PDF https://www.aclweb.org/anthology/L16-1395
PWC https://paperswithcode.com/paper/crosswalking-from-cmdi-to-dublin-core-and
Repo
Framework
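
To make the idea of a metadata crosswalk more concrete, here is a minimal sketch of a table-driven mapping onto Dublin Core elements. The target element names (title, creator, date, language) are standard Dublin Core elements, but the source paths are invented for illustration and are not taken from any actual NaLiDa or CMDI profile; the function name is likewise hypothetical.

```python
# Hypothetical flattened source paths mapped to Dublin Core element names.
# The paths are illustrative only, not from a real CMDI profile.
CMDI_TO_DC = {
    "GeneralInfo/ResourceName": "title",
    "Creation/Creators/Person": "creator",
    "GeneralInfo/PublicationDate": "date",
    "GeneralInfo/Language": "language",
}

def crosswalk(record, mapping=CMDI_TO_DC):
    """Map a flattened record (path -> value) onto Dublin Core elements.

    Returns (dublin_core, unmapped): values grouped by DC element (as
    lists, since DC elements are repeatable) and the source paths that
    have no entry in the mapping.
    """
    dublin_core, unmapped = {}, []
    for path, value in record.items():
        element = mapping.get(path)
        if element is None:
            unmapped.append(path)
        else:
            dublin_core.setdefault(element, []).append(value)
    return dublin_core, unmapped

record = {
    "GeneralInfo/ResourceName": "Example Corpus",
    "GeneralInfo/Language": "de",
    "Annotation/Tool": "example-tool",  # no Dublin Core counterpart in this mapping
}
dc, unmapped = crosswalk(record)
print(dc)
print(unmapped)
```

Reporting the unmapped paths separately reflects the difficulty the abstract mentions: a flexible source schema can contain fields for which a fixed target standard has no obvious slot.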