May 5, 2019

2377 words 12 mins read

Paper Group NAWR 10

A Semantically Compositional Annotation Scheme for Time Normalization. Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages. FLAT: Constructing a CLARIN Compatible Home for Language Resources. Temporal Action Detection Using a Statistical Language Model. Building compositional semantics and higher-order inf …

A Semantically Compositional Annotation Scheme for Time Normalization

Title A Semantically Compositional Annotation Scheme for Time Normalization
Authors Steven Bethard, Jonathan Parker
Abstract We present a new annotation scheme for normalizing time expressions, such as "three days ago", to computer-readable forms, such as 2016-03-07. The annotation scheme addresses several weaknesses of the existing TimeML standard, allowing the representation of time expressions that align to more than one calendar unit (e.g., "the past three summers"), that are defined relative to events (e.g., "three weeks postoperative"), and that are unions or intersections of smaller time expressions (e.g., "Tuesdays and Thursdays"). It achieves this by modeling time expression interpretation as the semantic composition of temporal operators like UNION, NEXT, and AFTER. We have applied the annotation scheme to 34 documents so far, producing 1104 annotations and achieving inter-annotator agreement of 0.821.
Tasks Semantic Composition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1599/
PDF https://www.aclweb.org/anthology/L16-1599
PWC https://paperswithcode.com/paper/a-semantically-compositional-annotation
Repo https://github.com/bethard/timenorm
Framework none
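The compositional idea is easy to illustrate: operators such as NEXT, AFTER, and UNION take a calendar anchor or smaller intervals and return new intervals. The Python sketch below is illustrative only; the timenorm repository itself is Scala, and these toy operator implementations are assumptions, not its API.

from datetime import date, timedelta

# Toy compositional temporal operators over (start, end) date intervals.
# Names follow the paper (UNION, NEXT, AFTER); implementations are
# illustrative stand-ins, not the timenorm library.

def next_weekday(anchor, weekday):
    """NEXT: the first given weekday (0=Monday) strictly after anchor."""
    days = (weekday - anchor.weekday() - 1) % 7 + 1
    start = anchor + timedelta(days=days)
    return (start, start + timedelta(days=1))

def after(anchor, n_units, unit_days):
    """AFTER: the one-unit interval beginning n_units after anchor."""
    start = anchor + timedelta(days=n_units * unit_days)
    return (start, start + timedelta(days=unit_days))

def union(*intervals):
    """UNION: all argument intervals, in chronological order."""
    return sorted(intervals)

anchor = date(2016, 3, 7)  # a Monday
# "three weeks postoperative" ~ AFTER(operation_date, 3 weeks)
print(after(anchor, 3, 7))
# "Tuesdays and Thursdays" ~ UNION(NEXT(Tuesday), NEXT(Thursday))
print(union(next_weekday(anchor, 1), next_weekday(anchor, 3)))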

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

Title Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
Authors Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya
Abstract The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resources to cover more languages, such as EuroWordNet and Global WordNet. In this paper, we report on the construction of large-scale multilingual semantic lexicons for twelve languages, which employ the unified Lancaster semantic taxonomy and provide a multilingual lexical knowledge base for the automatic UCREL semantic annotation system (USAS). Our work contributes towards the goal of constructing larger-scale and higher-quality multilingual semantic lexical resources and developing corpus annotation tools based on them. Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them. Our evaluation shows that some semantic lexicons such as those for Finnish and Italian have achieved lexical coverage of over 90% while others need further expansion.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1416/
PDF https://www.aclweb.org/anthology/L16-1416
PWC https://paperswithcode.com/paper/lexical-coverage-evaluation-of-large-scale
Repo https://github.com/UCREL/Multilingual-USAS
Framework none
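Lexical coverage as evaluated here is a straightforward corpus statistic: the proportion of running tokens to which the lexicon can assign a semantic tag. A minimal sketch of the token-level computation, using a made-up lexicon and corpus rather than the actual USAS resources:

from collections import Counter

# Token-level lexical coverage: the share of running corpus tokens found
# in the semantic lexicon. The lexicon and corpus below are toy examples.

def token_coverage(corpus_tokens, lexicon):
    counts = Counter(t.lower() for t in corpus_tokens)
    covered = sum(n for tok, n in counts.items() if tok in lexicon)
    return covered / sum(counts.values())

lexicon = {"the", "cat", "sat", "mat"}
tokens = "The cat sat on the mat".split()
print(f"coverage: {token_coverage(tokens, lexicon):.1%}")  # -> 83.3%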

FLAT: Constructing a CLARIN Compatible Home for Language Resources

Title FLAT: Constructing a CLARIN Compatible Home for Language Resources
Authors Menzo Windhouwer, Marc Kemps-Snijders, Paul Trilsbeek, André Moreira, Bas van der Veen, Guilherme Silva, Daniel von Reihn
Abstract Language resources are valuable assets, both for institutions and researchers. To safeguard these resources, requirements for repository systems and data management have been specified by various branch organizations, e.g., CLARIN and the Data Seal of Approval. This paper describes these requirements, along with some additional ones posed by the authors' home institutions, and shows how they are met by FLAT, which provides a new home for language resources. The basis of FLAT is formed by the Fedora Commons repository system. This repository system meets many of the requirements out of the box, but additional configuration and some development work are still needed to meet the remaining ones, e.g., to add support for Handles and Component Metadata. This paper describes design decisions taken in the construction of FLAT's system architecture via a mix-and-match strategy, with a preference for the reuse of existing solutions. FLAT is developed and used by the Meertens Institute and The Language Archive, but is also freely available for anyone in need of a CLARIN-compliant repository for their language resources.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1393/
PDF https://www.aclweb.org/anthology/L16-1393
PWC https://paperswithcode.com/paper/flat-constructing-a-clarin-compatible-home
Repo https://github.com/TheLanguageArchive/FLAT
Framework none

Temporal Action Detection Using a Statistical Language Model

Title Temporal Action Detection Using a Statistical Language Model
Authors Alexander Richard, Juergen Gall
Abstract While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection still falls far short of comparable results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components: a length model, a language model, and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets.
Tasks Action Detection, Language Modelling, Temporal Action Localization
Published 2016-06-01
URL http://openaccess.thecvf.com/content_cvpr_2016/html/Richard_Temporal_Action_Detection_CVPR_2016_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2016/papers/Richard_Temporal_Action_Detection_CVPR_2016_paper.pdf
PWC https://paperswithcode.com/paper/temporal-action-detection-using-a-statistical
Repo https://github.com/alexanderrichard/squirrel
Framework none
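The joint optimization can be pictured as a Viterbi-style dynamic program over segment boundaries and class labels. The unoptimized sketch below is a generic instance of that idea with placeholder model callables, not the paper's implementation:

import math

# Dynamic-programming sketch: score a labelling of T frames as the sum of a
# discriminative frame model, a segment-length model, and a bigram "language"
# model over action classes. All three models here are toy placeholders.

def best_segmentation(frame_scores, length_logp, lm_logp, n_classes):
    """frame_scores[t][c] = log p(frame t | class c); length_logp(l, c) and
    lm_logp(c_prev, c) are log-probability callables.
    Returns (best log-prob, segments) with segments as (start, end, class)."""
    T = len(frame_scores)
    # dp[t][c]: best log-prob of frames [0, t) whose last segment has class c.
    dp = [[-math.inf] * n_classes for _ in range(T + 1)]
    back = {}
    for c in range(n_classes):
        dp[0][c] = 0.0
    for t in range(1, T + 1):
        for c in range(n_classes):
            obs = 0.0
            for s in range(t - 1, -1, -1):  # last segment covers frames [s, t)
                obs += frame_scores[s][c]
                for c_prev in range(n_classes):
                    trans = 0.0 if s == 0 else lm_logp(c_prev, c)
                    cand = dp[s][c_prev] + trans + length_logp(t - s, c) + obs
                    if cand > dp[t][c]:
                        dp[t][c] = cand
                        back[(t, c)] = (s, c_prev)
    final_c = max(range(n_classes), key=lambda c: dp[T][c])
    segments, t, c = [], T, final_c
    while t > 0:
        s, c_prev = back[(t, c)]
        segments.append((s, t, c))
        t, c = s, c_prev
    return dp[T][final_c], list(reversed(segments))

# Toy usage: 4 frames, 2 classes, uniform length and transition models.
scores = [[0.0, -2.0], [0.0, -2.0], [-2.0, 0.0], [-2.0, 0.0]]
print(best_segmentation(scores, lambda l, c: -0.1 * l,
                        lambda a, b: math.log(0.5), 2))
# -> segments [(0, 2, 0), (2, 4, 1)]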

Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser

Title Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Authors Koji Mineshima, Ribeka Tanaka, Pascual Martínez-Gómez, Yusuke Miyao, Daisuke Bekki
Abstract
Tasks Dependency Parsing, Natural Language Inference, Question Answering, Semantic Parsing
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1242/
PDF https://www.aclweb.org/anthology/D16-1242
PWC https://paperswithcode.com/paper/building-compositional-semantics-and-higher
Repo https://github.com/mynlp/ccg2lambda
Framework none

Semi-supervised Named Entity Recognition in noisy-text

Title Semi-supervised Named Entity Recognition in noisy-text
Authors Shubhanshu Mishra, Jana Diesner
Abstract Many of the existing Named Entity Recognition (NER) solutions are built on news corpus data with proper syntax. These solutions might not lead to highly accurate results when applied to noisy, user-generated data, e.g., tweets, which can feature sloppy spelling, concept drift, and limited contextualization of terms and concepts due to length constraints. The models described in this paper are based on linear-chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data. The considered features include word clusters and pre-trained distributed word representations, updated gazetteer features, and global context predictions. The latter feature allows for ingesting the meaning of new or rare tokens into the system via unsupervised learning and for alleviating the need to learn lexicon-based features, which usually tend to be high-dimensional. In this paper, we report on the solution [ST] we submitted to the WNUT 2016 NER shared task. We also present an improvement over our original submission [SI], which we built using semi-supervised learning on labelled training data and pre-trained resources constructed from unlabelled tweet data. Our ST solution achieved an F1 score 1.2% higher than the baseline (35.1% F1) for the task of extracting 10 entity types. SI resulted in an increase of 8.2% in F1 score over the baseline (7.08% over ST). Finally, the SI model's evaluation on the test data achieved an F1 score of 47.3% (~1.15% increase over the 2nd-best submitted solution). Our experimental setup and results are available as a standalone Twitter NER tool at https://github.com/napsternxg/TwitterNER.
Tasks Named Entity Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3927/
PDF https://www.aclweb.org/anthology/W16-3927
PWC https://paperswithcode.com/paper/semi-supervised-named-entity-recognition-in
Repo https://github.com/napsternxg/TwitterNER
Framework none
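For readers unfamiliar with the BIEOU scheme (also written BILOU): each multi-token entity span is marked with B(egin)/I(nside)/E(nd) tags, single-token entities get U(nit), and non-entity tokens get O(utside). A small illustrative converter from spans to tags, not the submitted system's code:

# Sketch: convert entity spans to BIEOU (a.k.a. BILOU) token tags.
# Single-token entities get U-, longer ones B-/I-/E-; everything else O.

def to_bieou(n_tokens, spans):
    """spans: list of (start, end, type) with end exclusive."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"U-{etype}"
        else:
            tags[start] = f"B-{etype}"
            tags[end - 1] = f"E-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
    return tags

tokens = "Shubhanshu Mishra visits NYC".split()
print(to_bieou(len(tokens), [(0, 2, "PER"), (3, 4, "LOC")]))
# -> ['B-PER', 'E-PER', 'O', 'U-LOC']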

A Neural Network Approach for Knowledge-Driven Response Generation

Title A Neural Network Approach for Knowledge-Driven Response Generation
Authors Pavlos Vougiouklis, Jonathon Hare, Elena Simperl
Abstract We present a novel response generation system. The system assumes the hypothesis that participants in a conversation base their response not only on previous dialog utterances but also on their background knowledge. Our model is based on a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolutional Neural Network that is trained over Wikipedia sentences, and a formulation that couples the two trained embeddings in a multimodal space. We create a dataset of aligned Wikipedia sentences and sequences of Reddit utterances, which we use to train our model. Given a sequence of past utterances and a set of sentences that represent the background knowledge, our end-to-end learnable model is able to generate context-sensitive and knowledge-driven responses by leveraging the alignment of two different data sources. Our approach achieves up to 55% improvement in perplexity compared to purely sequential models based on RNNs that are trained only on sequences of utterances.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1318/
PDF https://www.aclweb.org/anthology/C16-1318
PWC https://paperswithcode.com/paper/a-neural-network-approach-for-knowledge
Repo https://github.com/pvougiou/Aligning-Reddit-and-Wikipedia
Framework none
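The coupling step described in the abstract can be sketched generically: two independently trained encodings are projected into a shared space and fused before decoding. The dimensions, the additive fusion, and the tanh nonlinearity below are assumptions for illustration, not the paper's exact formulation:

import numpy as np

# Sketch of multimodal coupling: a dialogue-context embedding (e.g., a final
# RNN state over comments) and a knowledge embedding (e.g., a CNN encoding of
# a Wikipedia sentence) are projected into one shared space and combined.

rng = np.random.default_rng(0)
d_dialog, d_know, d_joint = 256, 128, 200  # assumed sizes

h_dialog = rng.normal(size=d_dialog)  # stand-in for the RNN context vector
h_know = rng.normal(size=d_know)      # stand-in for the CNN sentence vector

W_d = rng.normal(size=(d_joint, d_dialog)) * 0.01  # learned in practice
W_k = rng.normal(size=(d_joint, d_know)) * 0.01
b = np.zeros(d_joint)

# Additive fusion with a tanh nonlinearity; the fused vector would then
# condition the response decoder.
m = np.tanh(W_d @ h_dialog + W_k @ h_know + b)
print(m.shape)  # -> (200,)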

Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods

Title Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods
Authors Martin Riedl, Chris Biemann
Abstract
Tasks Lemmatization
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1075/
PDF https://www.aclweb.org/anthology/N16-1075
PWC https://paperswithcode.com/paper/unsupervised-compound-splitting-with
Repo https://github.com/riedlma/SECOS
Framework none

Coreference in Prague Czech-English Dependency Treebank

Title Coreference in Prague Czech-English Dependency Treebank
Authors Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, Jiří Mírovský
Abstract We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as coreference information already present there. We characterize the coreference annotation scheme, give the statistics and compare our annotation with the coreference annotation in OntoNotes and Prague Dependency Treebank for Czech. We also present the experiments made using this corpus to improve the alignment of coreferential expressions, which helps us to collect better statistics of correspondences between types of coreferential relations in Czech and English. The corpus released as PCEDT 2.0 Coref is publicly available.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1026/
PDF https://www.aclweb.org/anthology/L16-1026
PWC https://paperswithcode.com/paper/coreference-in-prague-czech-english
Repo https://github.com/ufal/pcedt2.0-coref
Framework none

Towards perspective-free object counting with deep learning

Title Towards perspective-free object counting with deep learning
Authors Daniel Oñoro-Rubio, Roberto J. López-Sastre
Abstract In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in traffic congestion, or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of the image patches to their corresponding object density maps. Our second contribution is a scale-aware counting model, the Hydra CNN, able to estimate object densities in very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve state-of-the-art performance.
Tasks Object Counting
Published 2016-01-01
URL http://agamenon.tsc.uah.es/Investigacion/gram/publications/eccv2016-onoro.pdf
PDF http://agamenon.tsc.uah.es/Investigacion/gram/publications/eccv2016-onoro.pdf
PWC https://paperswithcode.com/paper/towards-perspective-free-object-counting-with
Repo https://github.com/gramuah/ccnn
Framework none
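The counting-by-regression formulation has a simple core: the network predicts a per-pixel object density map, and the count is the integral (sum) of that map. In the toy sketch below, synthetic Gaussian bumps stand in for the CCNN's predicted density:

import numpy as np

# Counting as density-map integration: each object contributes unit mass to
# the density map, so summing the map recovers the count. The "prediction"
# here is synthetic, standing in for the network output.

def gaussian_bump(shape, center, sigma=3.0):
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()  # normalize: each object has total mass 1

density = np.zeros((64, 64))
for c in [(10, 12), (30, 40), (50, 20)]:  # three "objects"
    density += gaussian_bump(density.shape, c)

print(round(density.sum()))  # -> 3, the estimated object count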

Improved Variational Inference with Inverse Autoregressive Flow

Title Improved Variational Inference with Inverse Autoregressive Flow
Authors Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
Abstract The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
Tasks Image Generation
Published 2016-12-01
URL http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow
PDF http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow.pdf
PWC https://paperswithcode.com/paper/improved-variational-inference-with-inverse
Repo https://github.com/openai/iaf
Framework tf
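A single IAF step is easy to state: z' = sigma * z + mu (elementwise), where mu and sigma for dimension i depend only on z_{<i}, so the Jacobian is triangular and log|det| is just the sum of log sigma. The sketch below uses a toy linear autoregressive map in place of the paper's neural network:

import numpy as np

# One inverse autoregressive flow (IAF) step. Strictly lower-triangular
# weight matrices make mu_i and sigma_i functions of z_{<i} only, which is
# what keeps the log-determinant cheap to compute.

rng = np.random.default_rng(1)
D = 5
z = rng.normal(size=D)

W_mu = np.tril(rng.normal(size=(D, D)) * 0.1, k=-1)
W_s = np.tril(rng.normal(size=(D, D)) * 0.1, k=-1)

mu = W_mu @ z
sigma = np.exp(W_s @ z)          # positive scales
z_new = sigma * z + mu
log_det = np.sum(np.log(sigma))  # log|det Jacobian| of the transformation

print(z_new, log_det)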

A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis

Title A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis
Authors Yanping Chen, Qinghua Zheng, Feng Tian, Deli Zheng
Abstract
Tasks Chinese Word Segmentation
Published 2016-06-01
URL https://www.aclweb.org/anthology/O16-2001/
PDF https://www.aclweb.org/anthology/O16-2001
PWC https://paperswithcode.com/paper/a-segmentation-matrix-method-for-chinese
Repo https://github.com/YPench/SMatrix
Framework none

A Gold Standard for Scalar Adjectives

Title A Gold Standard for Scalar Adjectives
Authors Bryan Wilkinson, Tim Oates
Abstract We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to determine not only scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1424/
PDF https://www.aclweb.org/anthology/L16-1424
PWC https://paperswithcode.com/paper/a-gold-standard-for-scalar-adjectives
Repo https://github.com/Coral-Lab/scales
Framework none
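The Cultural Consensus Theory step can be sketched numerically: under the model, the inter-informant agreement matrix factors (off its diagonal) as the outer product of a competence vector, which a first-eigenvector approximation recovers. This is a simplified version of Romney et al.'s procedure; the binary toy answers below are made up, whereas the paper's first task gathered open-ended data:

import numpy as np

# Simplified Cultural Consensus Theory: estimate informant "competence"
# from pairwise answer agreement via the leading eigenvector.

answers = np.array([  # informants x items (1 = "belongs to the scale")
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],  # an informant who disagrees with the others
])

# Proportion of matching answers for each informant pair.
match = (answers[:, None, :] == answers[None, :, :]).mean(axis=2)

# First-eigenvector approximation to the competence scores.
vals, vecs = np.linalg.eigh(match)   # eigenvalues in ascending order
v = vecs[:, -1]
v = v * np.sign(v.sum())             # eigenvector sign is arbitrary
d = v * np.sqrt(vals[-1])
print(np.round(d, 2))  # higher = more consistent with the consensus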

Data-Driven Spelling Correction using Weighted Finite-State Methods

Title Data-Driven Spelling Correction using Weighted Finite-State Methods
Authors Miikka Silfverberg, Pekka Kauppinen, Krister Lindén
Abstract
Tasks Optical Character Recognition, Spelling Correction
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2406/
PDF https://www.aclweb.org/anthology/W16-2406
PWC https://paperswithcode.com/paper/data-driven-spelling-correction-using
Repo https://github.com/mpsilfve/ocrpp
Framework none

HeLI, a Word-Based Backoff Method for Language Identification

Title HeLI, a Word-Based Backoff Method for Language Identification
Authors Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
Abstract In this paper we describe the Helsinki language identification method, HeLI, and the resources we created for and used in the 3rd edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2016 workshop. The shared task comprised a total of 8 tracks, of which we participated in 7. The shared task had a record number of participants, with 17 teams providing results for the closed track of the test set A. Our system reached 2nd position in 4 tracks (A closed and open, B1 open and B2 open), and in this paper we focus on the methods and data used for those tracks. We describe our word-based backoff method in mathematical notation. We also describe how we selected the corpus we used in the open tracks.
Tasks Language Identification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4820/
PDF https://www.aclweb.org/anthology/W16-4820
PWC https://paperswithcode.com/paper/heli-a-word-based-backoff-method-for-language
Repo https://github.com/tosaja/HeLI
Framework none
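The word-based backoff idea is simple to sketch: score each word with a language's word model when the word is known, otherwise back off to its character n-grams, longest order first, and pick the language with the lowest average penalty. The toy models below stand in for frequencies estimated from real corpora, and the fallback penalty constant is an assumption, not HeLI's tuned value:

import math

# Word-based backoff language identification in the spirit of HeLI.

PENALTY = 7.0  # fallback cost for events unseen at every level (assumed)

def word_score(word, model):
    """Negative log10-probability of one word under one language's models."""
    if word in model["words"]:
        return -math.log10(model["words"][word])
    for n in sorted(model["ngrams"], reverse=True):  # back off: long to short
        grams = [word[i:i + n] for i in range(len(word) - n + 1)]
        known = [g for g in grams if g in model["ngrams"][n]]
        if known:
            return -sum(math.log10(model["ngrams"][n][g]) for g in known) / len(known)
    return PENALTY

def identify(text, models):
    words = text.lower().split()
    return min(models, key=lambda lang: sum(
        word_score(w, models[lang]) for w in words) / len(words))

models = {
    "en": {"words": {"the": 0.05, "of": 0.03},
           "ngrams": {2: {"th": 0.02, "he": 0.02, "ng": 0.01}}},
    "fi": {"words": {"ja": 0.04, "on": 0.03},
           "ngrams": {2: {"ai": 0.02, "en": 0.02, "kk": 0.01}}},
}
print(identify("the morning", models))  # -> 'en'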