Paper Group NAWR 10
A Semantically Compositional Annotation Scheme for Time Normalization
Title | A Semantically Compositional Annotation Scheme for Time Normalization |
Authors | Steven Bethard, Jonathan Parker |
Abstract | We present a new annotation scheme for normalizing time expressions, such as "three days ago", to computer-readable forms, such as 2016-03-07. The annotation scheme addresses several weaknesses of the existing TimeML standard, allowing the representation of time expressions that align to more than one calendar unit (e.g., "the past three summers"), that are defined relative to events (e.g., "three weeks postoperative"), and that are unions or intersections of smaller time expressions (e.g., "Tuesdays and Thursdays"). It achieves this by modeling time expression interpretation as the semantic composition of temporal operators like UNION, NEXT, and AFTER. We have applied the annotation scheme to 34 documents so far, producing 1104 annotations, and achieving inter-annotator agreement of 0.821. |
Tasks | Semantic Composition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1599/ |
PWC | https://paperswithcode.com/paper/a-semantically-compositional-annotation |
Repo | https://github.com/bethard/timenorm |
Framework | none |
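The operator composition described in the abstract above is easy to see in miniature. The sketch below is not the authors' timenorm implementation (see the repo above); the operator classes and their semantics are simplifying assumptions, illustrating only how NEXT and UNION could compose to interpret "Tuesdays and Thursdays".

```python
from datetime import date, timedelta

# Minimal sketch of operator composition for time normalization.
# The class names mirror operators named in the abstract (NEXT, UNION);
# the signatures and semantics are simplifying assumptions, not the
# authors' actual timenorm implementation.

class Next:
    """The next occurrence of a weekday strictly after an anchor date."""
    def __init__(self, weekday):          # 0 = Monday ... 6 = Sunday
        self.weekday = weekday

    def evaluate(self, anchor):
        days_ahead = (self.weekday - anchor.weekday() - 1) % 7 + 1
        return anchor + timedelta(days=days_ahead)

class Union:
    """The earliest of several candidate time expressions."""
    def __init__(self, *operators):
        self.operators = operators

    def evaluate(self, anchor):
        return min(op.evaluate(anchor) for op in self.operators)

# "Tuesdays and Thursdays" as a union of two NEXT operators:
tue_or_thu = Union(Next(1), Next(3))
print(tue_or_thu.evaluate(date(2016, 3, 7)))  # 2016-03-08 (the next Tuesday)
```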
Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
Title | Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages |
Authors | Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya |
Abstract | The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resources to cover more languages, such as EuroWordNet and Global WordNet. In this paper, we report on the construction of large-scale multilingual semantic lexicons for twelve languages, which employ the unified Lancaster semantic taxonomy and provide a multilingual lexical knowledge base for the automatic UCREL semantic annotation system (USAS). Our work contributes towards the goal of constructing larger-scale and higher-quality multilingual semantic lexical resources and developing corpus annotation tools based on them. Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them. Our evaluation shows that some semantic lexicons, such as those for Finnish and Italian, have achieved lexical coverage of over 90% while others need further expansion. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1416/ |
PWC | https://paperswithcode.com/paper/lexical-coverage-evaluation-of-large-scale |
Repo | https://github.com/UCREL/Multilingual-USAS |
Framework | none |
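Lexical coverage in the sense evaluated above reduces to the share of corpus tokens (or types) that have a lexicon entry. A minimal token-level sketch with invented data follows; it is not the USAS tagger, which also assigns Lancaster semantic tags.

```python
# Minimal sketch of token-level lexical coverage, in the spirit of the
# evaluation described above. The toy lexicon and corpus are invented;
# the real USAS lexicons map words to Lancaster semantic tags.

def lexical_coverage(tokens, lexicon):
    """Fraction of corpus tokens that have an entry in the lexicon."""
    covered = sum(1 for tok in tokens if tok.lower() in lexicon)
    return covered / len(tokens) if tokens else 0.0

lexicon = {"the", "cat", "sat", "on", "mat"}
corpus = "The cat sat on the zarf".split()
print(f"coverage = {lexical_coverage(corpus, lexicon):.1%}")  # 83.3%
```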
FLAT: Constructing a CLARIN Compatible Home for Language Resources
Title | FLAT: Constructing a CLARIN Compatible Home for Language Resources |
Authors | Menzo Windhouwer, Marc Kemps-Snijders, Paul Trilsbeek, André Moreira, Bas van der Veen, Guilherme Silva, Daniel von Reihn |
Abstract | Language resources are valuable assets, both for institutions and researchers. To safeguard these resources, requirements for repository systems and data management have been specified by various branch organizations, e.g., CLARIN and the Data Seal of Approval. This paper describes these requirements, along with some additional ones posed by the authors' home institutions, and shows how they are met by FLAT, to provide a new home for language resources. The basis of FLAT is formed by the Fedora Commons repository system. This repository system can meet many of the requirements out of the box, but additional configuration and some development work is still needed to meet the remaining ones, e.g., to add support for Handles and Component Metadata. This paper describes design decisions taken in the construction of FLAT's system architecture via a mix-and-match strategy, with a preference for the reuse of existing solutions. FLAT is developed and used by the Meertens Institute and The Language Archive, but is also freely available for anyone in need of a CLARIN-compliant repository for their language resources. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1393/ |
PWC | https://paperswithcode.com/paper/flat-constructing-a-clarin-compatible-home |
Repo | https://github.com/TheLanguageArchive/FLAT |
Framework | none |
Temporal Action Detection Using a Statistical Language Model
Title | Temporal Action Detection Using a Statistical Language Model |
Authors | Alexander Richard, Juergen Gall |
Abstract | While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components, a length and language model and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets. |
Tasks | Action Detection, Language Modelling, Temporal Action Localization |
Published | 2016-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2016/html/Richard_Temporal_Action_Detection_CVPR_2016_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2016/papers/Richard_Temporal_Action_Detection_CVPR_2016_paper.pdf |
PWC | https://paperswithcode.com/paper/temporal-action-detection-using-a-statistical |
Repo | https://github.com/alexanderrichard/squirrel |
Framework | none |
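A minimal sketch of the dynamic program described in the abstract above, with toy log-probabilities standing in for the paper's learned length, language, and action models; the segmentation recursion is the point, not the models.

```python
import math

# Minimal sketch (toy models, not the paper's learned components) of the
# dynamic program described above: choose segment boundaries and action
# labels that maximize the combined log-score of an action model, a
# class-dependent length model, and a bigram label language model.

def best_segmentation(frame_lp, n_labels, length_lp, bigram_lp, max_len):
    """frame_lp[t][a] = log p(frame t | action a); returns the best score."""
    T = len(frame_lp)
    # best[t][a]: best score over segmentations of frames 0..t-1 whose
    # final segment carries label a.
    best = [[-math.inf] * n_labels for _ in range(T + 1)]
    for t in range(1, T + 1):
        for a in range(n_labels):
            for ell in range(1, min(max_len, t) + 1):
                emit = sum(frame_lp[s][a] for s in range(t - ell, t))
                if t == ell:  # first segment: no preceding label
                    prev = 0.0
                else:
                    prev = max(best[t - ell][p] + bigram_lp[p][a]
                               for p in range(n_labels))
                best[t][a] = max(best[t][a], prev + emit + length_lp(a, ell))
    return max(best[T])

# Toy usage: 4 frames, 2 actions; frames 0-1 favor action 0, frames 2-3
# favor action 1.
frame_lp = [[-1.0, -3.0], [-1.0, -3.0], [-3.0, -1.0], [-3.0, -1.0]]
print(best_segmentation(frame_lp, 2,
                        lambda a, ell: -0.1 * ell,
                        [[-0.7, -0.7], [-0.7, -0.7]], max_len=4))
```

Keeping backpointers alongside `best` would recover the actual segment boundaries and labels; the recursion runs in O(T · max_len · n_labels²) time.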
Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Title | Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser |
Authors | Koji Mineshima, Ribeka Tanaka, Pascual Martínez-Gómez, Yusuke Miyao, Daisuke Bekki |
Abstract | |
Tasks | Dependency Parsing, Natural Language Inference, Question Answering, Semantic Parsing |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1242/ |
PWC | https://paperswithcode.com/paper/building-compositional-semantics-and-higher |
Repo | https://github.com/mynlp/ccg2lambda |
Framework | none |
Semi-supervised Named Entity Recognition in noisy-text
Title | Semi-supervised Named Entity Recognition in noisy-text |
Authors | Shubhanshu Mishra, Jana Diesner |
Abstract | Many of the existing Named Entity Recognition (NER) solutions are built based on news corpus data with proper syntax. These solutions might not lead to highly accurate results when being applied to noisy, user generated data, e.g., tweets, which can feature sloppy spelling, concept drift, and limited contextualization of terms and concepts due to length constraints. The models described in this paper are based on linear chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data. The considered features include word clusters and pre-trained distributed word representations, updated gazetteer features, and global context predictions. The latter feature allows for ingesting the meaning of new or rare tokens into the system via unsupervised learning and for alleviating the need to learn lexicon based features, which usually tend to be high dimensional. In this paper, we report on the solution [ST] we submitted to the WNUT 2016 NER shared task. We also present an improvement over our original submission [SI], which we built by using semi-supervised learning on labelled training data and pre-trained resources constructed from unlabelled tweet data. Our ST solution achieved an F1 score 1.2% higher than the baseline (35.1% F1) for the task of extracting 10 entity types. The SI resulted in an increase of 8.2% in F1 score over the baseline (7.08% over ST). Finally, the SI model's evaluation on the test data achieved an F1 score of 47.3% (~1.15% increase over the 2nd-best submitted solution). Our experimental setup and results are available as a standalone Twitter NER tool at https://github.com/napsternxg/TwitterNER. |
Tasks | Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3927/ |
PWC | https://paperswithcode.com/paper/semi-supervised-named-entity-recognition-in |
Repo | https://github.com/napsternxg/TwitterNER |
Framework | none |
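For readers unfamiliar with the BIEOU scheme mentioned above (also written BILOU: Begin / Inside / End / Outside / Unit-length), the sketch below converts entity spans to per-token tags. The span format `(start, end_exclusive, type)` is an assumption for illustration, not the shared task's data format.

```python
# Minimal sketch of BIEOU (BILOU) tagging: single-token entities get U-,
# multi-token entities get B- ... I- ... E-, everything else is O.

def to_bieou(tokens, spans):
    """spans: list of (start, end_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"U-{etype}"
        else:
            tags[start] = f"B-{etype}"
            tags[end - 1] = f"E-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
    return tags

tokens = "Jana visited New York City".split()
print(to_bieou(tokens, [(0, 1, "PER"), (2, 5, "LOC")]))
# ['U-PER', 'O', 'B-LOC', 'I-LOC', 'E-LOC']
```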
A Neural Network Approach for Knowledge-Driven Response Generation
Title | A Neural Network Approach for Knowledge-Driven Response Generation |
Authors | Pavlos Vougiouklis, Jonathon Hare, Elena Simperl |
Abstract | We present a novel response generation system. The system assumes the hypothesis that participants in a conversation base their response not only on previous dialog utterances but also on their background knowledge. Our model is based on a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolutional Neural Network that is trained over Wikipedia sentences, and a formulation that couples the two trained embeddings in a multimodal space. We create a dataset of aligned Wikipedia sentences and sequences of Reddit utterances, which we use to train our model. Given a sequence of past utterances and a set of sentences that represent the background knowledge, our end-to-end learnable model is able to generate context-sensitive and knowledge-driven responses by leveraging the alignment of two different data sources. Our approach achieves up to 55% improvement in perplexity compared to purely sequential models based on RNNs that are trained only on sequences of utterances. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1318/ |
PWC | https://paperswithcode.com/paper/a-neural-network-approach-for-knowledge |
Repo | https://github.com/pvougiou/Aligning-Reddit-and-Wikipedia |
Framework | none |
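The coupling described above (an RNN over utterances, a CNN over knowledge sentences, and a joint multimodal projection) can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the authors' model: the vocabulary size, dimensions, max-pooling, and the GRU/Conv1d choices are all assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' architecture) of coupling a dialog
# encoder and a knowledge encoder in a shared space, which would then
# condition a response decoder.

class KnowledgeDrivenEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)             # dialog context
        self.cnn = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # knowledge
        self.to_shared = nn.Linear(2 * dim, dim)                  # multimodal space

    def forward(self, utterances, knowledge):
        _, h = self.rnn(self.embed(utterances))              # h: (1, B, dim)
        k = self.cnn(self.embed(knowledge).transpose(1, 2))  # (B, dim, T)
        k = k.max(dim=2).values                              # pool over time
        return self.to_shared(torch.cat([h[-1], k], dim=-1))

enc = KnowledgeDrivenEncoder()
ctx = enc(torch.randint(0, 1000, (2, 10)), torch.randint(0, 1000, (2, 20)))
print(ctx.shape)  # torch.Size([2, 64])
```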
Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods
Title | Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods |
Authors | Martin Riedl, Chris Biemann |
Abstract | |
Tasks | Lemmatization |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1075/ |
PWC | https://paperswithcode.com/paper/unsupervised-compound-splitting-with |
Repo | https://github.com/riedlma/SECOS |
Framework | none |
Coreference in Prague Czech-English Dependency Treebank
Title | Coreference in Prague Czech-English Dependency Treebank |
Authors | Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, Jiří Mírovský |
Abstract | We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as coreference information already present there. We characterize the coreference annotation scheme, give the statistics, and compare our annotation with the coreference annotation in OntoNotes and the Prague Dependency Treebank for Czech. We also present the experiments made using this corpus to improve the alignment of coreferential expressions, which helps us to collect better statistics of correspondences between types of coreferential relations in Czech and English. The corpus released as PCEDT 2.0 Coref is publicly available. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1026/ |
PWC | https://paperswithcode.com/paper/coreference-in-prague-czech-english |
Repo | https://github.com/ufal/pcedt2.0-coref |
Framework | none |
Towards perspective-free object counting with deep learning
Title | Towards perspective-free object counting with deep learning |
Authors | Daniel Oñoro-Rubio, Roberto J. López-Sastre |
Abstract | In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in a traffic congestion or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of image patches to their corresponding object density maps. Our second contribution consists of a scale-aware counting model, the Hydra CNN, able to estimate object densities in a variety of very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve state-of-the-art performance. |
Tasks | Object Counting |
Published | 2016-01-01 |
URL | http://agamenon.tsc.uah.es/Investigacion/gram/publications/eccv2016-onoro.pdf |
PWC | https://paperswithcode.com/paper/towards-perspective-free-object-counting-with |
Repo | https://github.com/gramuah/ccnn |
Framework | none |
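Density-map counting is easy to illustrate: the network regresses a per-pixel density whose integral over the patch is the object count. The toy network below, assuming PyTorch, shows only the input/output contract; it is a stand-in, not the CCNN architecture from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of density-map regression for counting. Layer sizes are
# illustrative assumptions, not the CCNN architecture described above.

class TinyDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # one-channel density map
        )

    def forward(self, patch):
        return self.features(patch)

net = TinyDensityNet()
patch = torch.rand(1, 3, 72, 72)   # one RGB patch
density = net(patch)               # (1, 1, 72, 72)
print("estimated count:", density.sum().item())
# Training would regress `density` against ground-truth maps built by
# placing a Gaussian at every annotated object location.
```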
Improved Variational Inference with Inverse Autoregressive Flow
Title | Improved Variational Inference with Inverse Autoregressive Flow |
Authors | Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling |
Abstract | The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis. |
Tasks | Image Generation |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow |
PDF | http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow.pdf |
PWC | https://paperswithcode.com/paper/improved-variational-inference-with-inverse |
Repo | https://github.com/openai/iaf |
Framework | tf |
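The flow step itself is compact. The sketch below implements the numerically stable IAF update described in the paper, z_new = σ ⊙ z + (1 − σ) ⊙ m, with random tensors standing in for the autoregressive network's outputs; a real implementation would produce (m, s) with a MADE-style masked network, as in the authors' TensorFlow repo above. PyTorch is used here as an assumption.

```python
import torch

# Minimal sketch of one inverse autoregressive flow (IAF) step. The
# (m, s) inputs are stand-ins for the outputs of an autoregressive
# network over z; with that autoregressive structure the Jacobian is
# triangular, so the log-determinant is just sum(log sigma).

def iaf_step(z, m, s):
    """One IAF transformation; returns z_new and the log-det-Jacobian."""
    sigma = torch.sigmoid(s)                 # numerically stable gate
    z_new = sigma * z + (1 - sigma) * m
    log_det = torch.log(sigma).sum(dim=-1)
    return z_new, log_det

z = torch.randn(4, 8)                        # batch of latent samples
m, s = torch.randn(4, 8), torch.randn(4, 8)  # stand-ins for AR-net outputs
z_new, log_det = iaf_step(z, m, s)
print(z_new.shape, log_det.shape)  # torch.Size([4, 8]) torch.Size([4])
```

Stacking several such steps (with the variable ordering reversed between steps) yields the chain of invertible transformations the abstract describes.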
A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis
Title | A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis |
Authors | Yanping Chen, Qinghua Zheng, Feng Tian, Deli Zheng |
Abstract | |
Tasks | Chinese Word Segmentation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/O16-2001/ |
PWC | https://paperswithcode.com/paper/a-segmentation-matrix-method-for-chinese |
Repo | https://github.com/YPench/SMatrix |
Framework | none |
A Gold Standard for Scalar Adjectives
Title | A Gold Standard for Scalar Adjectives |
Authors | Bryan Wilkinson, Tim Oates |
Abstract | We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to determine not only scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1424/ |
PWC | https://paperswithcode.com/paper/a-gold-standard-for-scalar-adjectives |
Repo | https://github.com/Coral-Lab/scales |
Framework | none |
Data-Driven Spelling Correction using Weighted Finite-State Methods
Title | Data-Driven Spelling Correction using Weighted Finite-State Methods |
Authors | Miikka Silfverberg, Pekka Kauppinen, Krister Lindén |
Abstract | |
Tasks | Optical Character Recognition, Spelling Correction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2406/ |
PWC | https://paperswithcode.com/paper/data-driven-spelling-correction-using |
Repo | https://github.com/mpsilfve/ocrpp |
Framework | none |
HeLI, a Word-Based Backoff Method for Language Identification
Title | HeLI, a Word-Based Backoff Method for Language Identification |
Authors | Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen |
Abstract | In this paper we describe the Helsinki language identification method, HeLI, and the resources we created for and used in the 3rd edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2016 workshop. The shared task comprised a total of 8 tracks, of which we participated in 7. The shared task had a record number of participants, with 17 teams providing results for the closed track of test set A. Our system reached 2nd position in 4 tracks (A closed and open, B1 open and B2 open), and in this paper we focus on the methods and data used for those tracks. We describe our word-based backoff method in mathematical notation. We also describe how we selected the corpus we used in the open tracks. |
Tasks | Language Identification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4820/ |
PWC | https://paperswithcode.com/paper/heli-a-word-based-backoff-method-for-language |
Repo | https://github.com/tosaja/HeLI |
Framework | none |
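The backoff structure behind a method like HeLI is simple to sketch: score each word with a per-language word model and fall back to character n-grams for out-of-vocabulary words. The toy models, the penalty value, and the exact scoring below are illustrative assumptions, not HeLI's estimated parameters or its precise formulation (see the paper for the mathematical notation).

```python
# Minimal sketch of word-based backoff language identification.
# Toy log-probability models and penalty; not HeLI's actual parameters.

def score(text, word_lp, char_lp, n=3, penalty=-12.0):
    """Average log-probability of `text` under one language's models."""
    total, count = 0.0, 0
    for word in text.lower().split():
        if word in word_lp:                   # word model
            total += word_lp[word]
        else:                                 # back off to char n-grams
            padded = f" {word} "
            grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
            total += sum(char_lp.get(g, penalty) for g in grams) / len(grams)
        count += 1
    return total / count

def identify(text, models):
    """models: {language: (word_lp, char_lp)}; pick the best-scoring one."""
    return max(models, key=lambda lang: score(text, *models[lang]))

models = {
    "en": ({"the": -2.0, "cat": -5.0}, {" th": -3.0, "the": -3.0}),
    "fi": ({"kissa": -5.0}, {" ki": -3.0, "kis": -3.0}),
}
print(identify("the cat", models))  # 'en'
```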