Paper Group NAWR 10
A Semantically Compositional Annotation Scheme for Time Normalization
Title | A Semantically Compositional Annotation Scheme for Time Normalization |
Authors | Steven Bethard, Jonathan Parker |
Abstract | We present a new annotation scheme for normalizing time expressions, such as "three days ago", to computer-readable forms, such as 2016-03-07. The annotation scheme addresses several weaknesses of the existing TimeML standard, allowing the representation of time expressions that align to more than one calendar unit (e.g., "the past three summers"), that are defined relative to events (e.g., "three weeks postoperative"), and that are unions or intersections of smaller time expressions (e.g., "Tuesdays and Thursdays"). It achieves this by modeling time expression interpretation as the semantic composition of temporal operators like UNION, NEXT, and AFTER. We have applied the annotation scheme to 34 documents so far, producing 1104 annotations, and achieving inter-annotator agreement of 0.821. |
Tasks | Semantic Composition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1599/ |
PWC | https://paperswithcode.com/paper/a-semantically-compositional-annotation |
Repo | https://github.com/bethard/timenorm |
Framework | none |
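The operator composition described in the abstract above is easy to see in miniature. The sketch below is not the authors' timenorm implementation (see the repo above); the operator classes and their semantics are simplifying assumptions, illustrating only how NEXT and UNION could compose to interpret "Tuesdays and Thursdays".

```python
from datetime import date, timedelta

# Minimal sketch of operator composition for time normalization.
# The class names mirror operators named in the abstract (NEXT, UNION);
# the signatures and semantics are simplifying assumptions, not the
# authors' actual timenorm implementation.

class Next:
    """The next occurrence of a weekday strictly after an anchor date."""
    def __init__(self, weekday):          # 0 = Monday ... 6 = Sunday
        self.weekday = weekday

    def evaluate(self, anchor):
        days_ahead = (self.weekday - anchor.weekday() - 1) % 7 + 1
        return anchor + timedelta(days=days_ahead)

class Union:
    """The earliest of several candidate time expressions."""
    def __init__(self, *operators):
        self.operators = operators

    def evaluate(self, anchor):
        return min(op.evaluate(anchor) for op in self.operators)

# "Tuesdays and Thursdays" as a union of two NEXT operators:
tue_or_thu = Union(Next(1), Next(3))
print(tue_or_thu.evaluate(date(2016, 3, 7)))  # 2016-03-08 (the next Tuesday)
```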
Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
Title | Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages |
Authors | Scott Piao, Paul Rayson, Dawn Archer, Francesca Bianchi, Carmen Dayrell, Mahmoud El-Haj, Ricardo-María Jiménez, Dawn Knight, Michal Křen, Laura Löfberg, Rao Muhammad Adeel Nawab, Jawad Shafi, Phoey Lee Teh, Olga Mudraya |
Abstract | The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resources to cover more languages, such as EuroWordNet and Global WordNet. In this paper, we report on the construction of large-scale multilingual semantic lexicons for twelve languages, which employ the unified Lancaster semantic taxonomy and provide a multilingual lexical knowledge base for the automatic UCREL semantic annotation system (USAS). Our work contributes towards the goal of constructing larger-scale and higher-quality multilingual semantic lexical resources and developing corpus annotation tools based on them. Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them. Our evaluation shows that some semantic lexicons, such as those for Finnish and Italian, have achieved lexical coverage of over 90% while others need further expansion. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1416/ |
PWC | https://paperswithcode.com/paper/lexical-coverage-evaluation-of-large-scale |
Repo | https://github.com/UCREL/Multilingual-USAS |
Framework | none |
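Lexical coverage in the sense evaluated above reduces to the share of corpus tokens (or types) that have a lexicon entry. A minimal token-level sketch with invented data follows; it is not the USAS tagger, which also assigns Lancaster semantic tags.

```python
# Minimal sketch of token-level lexical coverage, in the spirit of the
# evaluation described above. The toy lexicon and corpus are invented;
# the real USAS lexicons map words to Lancaster semantic tags.

def lexical_coverage(tokens, lexicon):
    """Fraction of corpus tokens that have an entry in the lexicon."""
    covered = sum(1 for tok in tokens if tok.lower() in lexicon)
    return covered / len(tokens) if tokens else 0.0

lexicon = {"the", "cat", "sat", "on", "mat"}
corpus = "The cat sat on the zarf".split()
print(f"coverage = {lexical_coverage(corpus, lexicon):.1%}")  # 83.3%
```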
FLAT: Constructing a CLARIN Compatible Home for Language Resources
Title | FLAT: Constructing a CLARIN Compatible Home for Language Resources |
Authors | Menzo Windhouwer, Marc Kemps-Snijders, Paul Trilsbeek, André Moreira, Bas van der Veen, Guilherme Silva, Daniel von Reihn |
Abstract | Language resources are valuable assets, both for institutions and researchers. To safeguard these resources, requirements for repository systems and data management have been specified by various branch organizations, e.g., CLARIN and the Data Seal of Approval. This paper describes these requirements, along with some additional ones posed by the authors' home institutions, and shows how they are met by FLAT, to provide a new home for language resources. The basis of FLAT is formed by the Fedora Commons repository system. This repository system can meet many of the requirements out of the box, but additional configuration and some development work is still needed to meet the remaining ones, e.g., to add support for Handles and Component Metadata. This paper describes design decisions taken in the construction of FLAT's system architecture via a mix-and-match strategy, with a preference for the reuse of existing solutions. FLAT is developed and used by the Meertens Institute and The Language Archive, but is also freely available for anyone in need of a CLARIN-compliant repository for their language resources. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1393/ |
PWC | https://paperswithcode.com/paper/flat-constructing-a-clarin-compatible-home |
Repo | https://github.com/TheLanguageArchive/FLAT |
Framework | none |
Temporal Action Detection Using a Statistical Language Model
Title | Temporal Action Detection Using a Statistical Language Model |
Authors | Alexander Richard, Juergen Gall |
Abstract | While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components, a length and language model and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets. |
Tasks | Action Detection, Language Modelling, Temporal Action Localization |
Published | 2016-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2016/html/Richard_Temporal_Action_Detection_CVPR_2016_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2016/papers/Richard_Temporal_Action_Detection_CVPR_2016_paper.pdf |
PWC | https://paperswithcode.com/paper/temporal-action-detection-using-a-statistical |
Repo | https://github.com/alexanderrichard/squirrel |
Framework | none |
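A minimal sketch of the dynamic program described in the abstract above, with toy log-probabilities standing in for the paper's learned length, language, and action models; the segmentation recursion is the point, not the models.

```python
import math

# Minimal sketch (toy models, not the paper's learned components) of the
# dynamic program described above: choose segment boundaries and action
# labels that maximize the combined log-score of an action model, a
# class-dependent length model, and a bigram label language model.

def best_segmentation(frame_lp, n_labels, length_lp, bigram_lp, max_len):
    """frame_lp[t][a] = log p(frame t | action a); returns the best score."""
    T = len(frame_lp)
    # best[t][a]: best score over segmentations of frames 0..t-1 whose
    # final segment carries label a.
    best = [[-math.inf] * n_labels for _ in range(T + 1)]
    for t in range(1, T + 1):
        for a in range(n_labels):
            for ell in range(1, min(max_len, t) + 1):
                emit = sum(frame_lp[s][a] for s in range(t - ell, t))
                if t == ell:  # first segment: no preceding label
                    prev = 0.0
                else:
                    prev = max(best[t - ell][p] + bigram_lp[p][a]
                               for p in range(n_labels))
                best[t][a] = max(best[t][a], prev + emit + length_lp(a, ell))
    return max(best[T])

# Toy usage: 4 frames, 2 actions; frames 0-1 favor action 0, frames 2-3
# favor action 1.
frame_lp = [[-1.0, -3.0], [-1.0, -3.0], [-3.0, -1.0], [-3.0, -1.0]]
print(best_segmentation(frame_lp, 2,
                        lambda a, ell: -0.1 * ell,
                        [[-0.7, -0.7], [-0.7, -0.7]], max_len=4))
```

Keeping backpointers alongside `best` would recover the actual segment boundaries and labels; the recursion runs in O(T · max_len · n_labels²) time.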
Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Title | Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser |
Authors | Koji Mineshima, Ribeka Tanaka, Pascual Martínez-Gómez, Yusuke Miyao, Daisuke Bekki |
Abstract | |
Tasks | Dependency Parsing, Natural Language Inference, Question Answering, Semantic Parsing |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1242/ |
PWC | https://paperswithcode.com/paper/building-compositional-semantics-and-higher |
Repo | https://github.com/mynlp/ccg2lambda |
Framework | none |
Semi-supervised Named Entity Recognition in noisy-text
Title | Semi-supervised Named Entity Recognition in noisy-text |
Authors | Shubhanshu Mishra, Jana Diesner |
Abstract | Many of the existing Named Entity Recognition (NER) solutions are built based on news corpus data with proper syntax. These solutions might not lead to highly accurate results when being applied to noisy, user generated data, e.g., tweets, which can feature sloppy spelling, concept drift, and limited contextualization of terms and concepts due to length constraints. The models described in this paper are based on linear chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data. The considered features include word clusters and pre-trained distributed word representations, updated gazetteer features, and global context predictions. The latter feature allows for ingesting the meaning of new or rare tokens into the system via unsupervised learning and for alleviating the need to learn lexicon based features, which usually tend to be high dimensional. In this paper, we report on the solution [ST] we submitted to the WNUT 2016 NER shared task. We also present an improvement over our original submission [SI], which we built by using semi-supervised learning on labelled training data and pre-trained resources constructed from unlabelled tweet data. Our ST solution achieved an F1 score 1.2% higher than the baseline (35.1% F1) for the task of extracting 10 entity types. The SI resulted in an increase of 8.2% in F1 score over the baseline (7.08% over ST). Finally, the SI model's evaluation on the test data achieved an F1 score of 47.3% (~1.15% increase over the 2nd-best submitted solution). Our experimental setup and results are available as a standalone Twitter NER tool at https://github.com/napsternxg/TwitterNER. |
Tasks | Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3927/ |
PWC | https://paperswithcode.com/paper/semi-supervised-named-entity-recognition-in |
Repo | https://github.com/napsternxg/TwitterNER |
Framework | none |
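For readers unfamiliar with the BIEOU scheme mentioned above (also written BILOU: Begin / Inside / End / Outside / Unit-length), the sketch below converts entity spans to per-token tags. The span format `(start, end_exclusive, type)` is an assumption for illustration, not the shared task's data format.

```python
# Minimal sketch of BIEOU (BILOU) tagging: single-token entities get U-,
# multi-token entities get B- ... I- ... E-, everything else is O.

def to_bieou(tokens, spans):
    """spans: list of (start, end_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"U-{etype}"
        else:
            tags[start] = f"B-{etype}"
            tags[end - 1] = f"E-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
    return tags

tokens = "Jana visited New York City".split()
print(to_bieou(tokens, [(0, 1, "PER"), (2, 5, "LOC")]))
# ['U-PER', 'O', 'B-LOC', 'I-LOC', 'E-LOC']
```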
A Neural Network Approach for Knowledge-Driven Response Generation
Title | A Neural Network Approach for Knowledge-Driven Response Generation |
Authors | Pavlos Vougiouklis, Jonathon Hare, Elena Simperl |
Abstract | We present a novel response generation system. The system assumes the hypothesis that participants in a conversation base their response not only on previous dialog utterances but also on their background knowledge. Our model is based on a Recurrent Neural Network (RNN) that is trained over concatenated sequences of comments, a Convolutional Neural Network that is trained over Wikipedia sentences, and a formulation that couples the two trained embeddings in a multimodal space. We create a dataset of aligned Wikipedia sentences and sequences of Reddit utterances, which we use to train our model. Given a sequence of past utterances and a set of sentences that represent the background knowledge, our end-to-end learnable model is able to generate context-sensitive and knowledge-driven responses by leveraging the alignment of two different data sources. Our approach achieves up to 55% improvement in perplexity compared to purely sequential models based on RNNs that are trained only on sequences of utterances. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1318/ |
PWC | https://paperswithcode.com/paper/a-neural-network-approach-for-knowledge |
Repo | https://github.com/pvougiou/Aligning-Reddit-and-Wikipedia |
Framework | none |
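The coupling described above (an RNN over utterances, a CNN over knowledge sentences, and a joint multimodal projection) can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the authors' model: the vocabulary size, dimensions, max-pooling, and the GRU/Conv1d choices are all assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' architecture) of coupling a dialog
# encoder and a knowledge encoder in a shared space, which would then
# condition a response decoder.

class KnowledgeDrivenEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)             # dialog context
        self.cnn = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # knowledge
        self.to_shared = nn.Linear(2 * dim, dim)                  # multimodal space

    def forward(self, utterances, knowledge):
        _, h = self.rnn(self.embed(utterances))              # h: (1, B, dim)
        k = self.cnn(self.embed(knowledge).transpose(1, 2))  # (B, dim, T)
        k = k.max(dim=2).values                              # pool over time
        return self.to_shared(torch.cat([h[-1], k], dim=-1))

enc = KnowledgeDrivenEncoder()
ctx = enc(torch.randint(0, 1000, (2, 10)), torch.randint(0, 1000, (2, 20)))
print(ctx.shape)  # torch.Size([2, 64])
```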
Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods
Title | Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods |
Authors | Martin Riedl, Chris Biemann |
Abstract | |
Tasks | Lemmatization |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1075/ |
PWC | https://paperswithcode.com/paper/unsupervised-compound-splitting-with |
Repo | https://github.com/riedlma/SECOS |
Framework | none |
Coreference in Prague Czech-English Dependency Treebank
Title | Coreference in Prague Czech-English Dependency Treebank |
Authors | Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, Jiří Mírovský |
Abstract | We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as coreference information already present there. We characterize the coreference annotation scheme, give the statistics, and compare our annotation with the coreference annotation in OntoNotes and the Prague Dependency Treebank for Czech. We also present the experiments made using this corpus to improve the alignment of coreferential expressions, which helps us to collect better statistics of correspondences between types of coreferential relations in Czech and English. The corpus released as PCEDT 2.0 Coref is publicly available. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1026/ |
PWC | https://paperswithcode.com/paper/coreference-in-prague-czech-english |
Repo | https://github.com/ufal/pcedt2.0-coref |
Framework | none |
Towards perspective-free object counting with deep learning
Title | Towards perspective-free object counting with deep learning |
Authors | Daniel Oñoro-Rubio, Roberto J. López-Sastre |
Abstract | In this paper we address the problem of counting object instances in images. Our models are able to precisely estimate the number of vehicles in a traffic congestion or to count the humans in a very crowded scene. Our first contribution is the proposal of a novel convolutional neural network solution, named Counting CNN (CCNN). Essentially, the CCNN is formulated as a regression model where the network learns how to map the appearance of image patches to their corresponding object density maps. Our second contribution consists of a scale-aware counting model, the Hydra CNN, able to estimate object densities in a variety of very crowded scenarios where no geometric information of the scene can be provided. Hydra CNN learns a multiscale non-linear regression model which uses a pyramid of image patches extracted at multiple scales to perform the final density prediction. We report an extensive experimental evaluation, using up to three different object counting benchmarks, where we show how our solutions achieve state-of-the-art performance. |
Tasks | Object Counting |
Published | 2016-01-01 |
URL | http://agamenon.tsc.uah.es/Investigacion/gram/publications/eccv2016-onoro.pdf |
PWC | https://paperswithcode.com/paper/towards-perspective-free-object-counting-with |
Repo | https://github.com/gramuah/ccnn |
Framework | none |
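Density-map counting is easy to illustrate: the network regresses a per-pixel density whose integral over the patch is the object count. The toy network below, assuming PyTorch, shows only the input/output contract; it is a stand-in, not the CCNN architecture from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of density-map regression for counting. Layer sizes are
# illustrative assumptions, not the CCNN architecture described above.

class TinyDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # one-channel density map
        )

    def forward(self, patch):
        return self.features(patch)

net = TinyDensityNet()
patch = torch.rand(1, 3, 72, 72)   # one RGB patch
density = net(patch)               # (1, 1, 72, 72)
print("estimated count:", density.sum().item())
# Training would regress `density` against ground-truth maps built by
# placing a Gaussian at every annotated object location.
```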
Improved Variational Inference with Inverse Autoregressive Flow
Title | Improved Variational Inference with Inverse Autoregressive Flow |
Authors | Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling |
Abstract | The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis. |
Tasks | Image Generation |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow |
PDF | http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow.pdf |
PWC | https://paperswithcode.com/paper/improved-variational-inference-with-inverse |
Repo | https://github.com/openai/iaf |
Framework | tf |
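The flow step itself is compact. The sketch below implements the numerically stable IAF update described in the paper, z_new = σ ⊙ z + (1 − σ) ⊙ m, with random tensors standing in for the autoregressive network's outputs; a real implementation would produce (m, s) with a MADE-style masked network, as in the authors' TensorFlow repo above. PyTorch is used here as an assumption.

```python
import torch

# Minimal sketch of one inverse autoregressive flow (IAF) step. The
# (m, s) inputs are stand-ins for the outputs of an autoregressive
# network over z; with that autoregressive structure the Jacobian is
# triangular, so the log-determinant is just sum(log sigma).

def iaf_step(z, m, s):
    """One IAF transformation; returns z_new and the log-det-Jacobian."""
    sigma = torch.sigmoid(s)                 # numerically stable gate
    z_new = sigma * z + (1 - sigma) * m
    log_det = torch.log(sigma).sum(dim=-1)
    return z_new, log_det

z = torch.randn(4, 8)                        # batch of latent samples
m, s = torch.randn(4, 8), torch.randn(4, 8)  # stand-ins for AR-net outputs
z_new, log_det = iaf_step(z, m, s)
print(z_new.shape, log_det.shape)  # torch.Size([4, 8]) torch.Size([4])
```

Stacking several such steps (with the variable ordering reversed between steps) yields the chain of invertible transformations the abstract describes.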
A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis
Title | A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis |
Authors | Yanping Chen, Qinghua Zheng, Feng Tian, Deli Zheng |
Abstract | |
Tasks | Chinese Word Segmentation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/O16-2001/ |
PWC | https://paperswithcode.com/paper/a-segmentation-matrix-method-for-chinese |
Repo | https://github.com/YPench/SMatrix |
Framework | none |
A Gold Standard for Scalar Adjectives
Title | A Gold Standard for Scalar Adjectives |
Authors | Bryan Wilkinson, Tim Oates |
Abstract | We present a gold standard for evaluating scale membership and the order of scalar adjectives. In addition to evaluating existing methods of ordering adjectives, this knowledge will aid in studying the organization of adjectives in the lexicon. This resource is the result of two elicitation tasks conducted with informants from Amazon Mechanical Turk. The first task is notable for gathering open-ended lexical data from informants. The data is analyzed using Cultural Consensus Theory, a framework from anthropology, to determine not only scale membership but also the level of consensus among the informants (Romney et al., 1986). The second task gathers a culturally salient ordering of the words determined to be members. We use this method to produce 12 scales of adjectives for use in evaluation. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1424/ |
PWC | https://paperswithcode.com/paper/a-gold-standard-for-scalar-adjectives |
Repo | https://github.com/Coral-Lab/scales |
Framework | none |
Data-Driven Spelling Correction using Weighted Finite-State Methods
Title | Data-Driven Spelling Correction using Weighted Finite-State Methods |
Authors | Miikka Silfverberg, Pekka Kauppinen, Krister Lindén |
Abstract | |
Tasks | Optical Character Recognition, Spelling Correction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2406/ |
PWC | https://paperswithcode.com/paper/data-driven-spelling-correction-using |
Repo | https://github.com/mpsilfve/ocrpp |
Framework | none |
HeLI, a Word-Based Backoff Method for Language Identification
Title | HeLI, a Word-Based Backoff Method for Language Identification |
Authors | Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen |
Abstract | In this paper we describe the Helsinki language identification method, HeLI, and the resources we created for and used in the 3rd edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2016 workshop. The shared task comprised a total of 8 tracks, of which we participated in 7. The shared task had a record number of participants, with 17 teams providing results for the closed track of test set A. Our system reached 2nd position in 4 tracks (A closed and open, B1 open and B2 open), and in this paper we focus on the methods and data used for those tracks. We describe our word-based backoff method in mathematical notation. We also describe how we selected the corpus we used in the open tracks. |
Tasks | Language Identification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4820/ |
PWC | https://paperswithcode.com/paper/heli-a-word-based-backoff-method-for-language |
Repo | https://github.com/tosaja/HeLI |
Framework | none |
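The backoff structure behind a method like HeLI is simple to sketch: score each word with a per-language word model and fall back to character n-grams for out-of-vocabulary words. The toy models, the penalty value, and the exact scoring below are illustrative assumptions, not HeLI's estimated parameters or its precise formulation (see the paper for the mathematical notation).

```python
# Minimal sketch of word-based backoff language identification.
# Toy log-probability models and penalty; not HeLI's actual parameters.

def score(text, word_lp, char_lp, n=3, penalty=-12.0):
    """Average log-probability of `text` under one language's models."""
    total, count = 0.0, 0
    for word in text.lower().split():
        if word in word_lp:                   # word model
            total += word_lp[word]
        else:                                 # back off to char n-grams
            padded = f" {word} "
            grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
            total += sum(char_lp.get(g, penalty) for g in grams) / len(grams)
        count += 1
    return total / count

def identify(text, models):
    """models: {language: (word_lp, char_lp)}; pick the best-scoring one."""
    return max(models, key=lambda lang: score(text, *models[lang]))

models = {
    "en": ({"the": -2.0, "cat": -5.0}, {" th": -3.0, "the": -3.0}),
    "fi": ({"kissa": -5.0}, {" ki": -3.0, "kis": -3.0}),
}
print(identify("the cat", models))  # 'en'
```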