January 24, 2020

2697 words 13 mins read

Paper Group NANR 115

Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification. Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning. Multilingual Language Models for Named Entity Recognition in German and English. Choosing between Long and Short Word Forms in Mandarin. Extracting relations between outcomes and significance …

Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification

Title Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification
Authors Xiang Jiang, Mohammad Havaei, Gabriel Chartrand, Hassan Chouaib, Thomas Vincent, Andrew Jesson, Nicolas Chapados, Stan Matwin
Abstract Current deep learning based text classification methods are limited in their ability to achieve fast learning and generalization when data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Inspired by the Model-Agnostic Meta-Learning framework (MAML), we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The proposed ATAML is designed to encourage task-agnostic representation learning by way of task-agnostic parameterization and to facilitate task-specific adaptation via attention mechanisms. We provide evidence to show that the attention mechanism in ATAML has a synergistic effect on learning performance. Our experimental results reveal that, for few-shot text classification tasks, gradient-based meta-learning approaches outperform popular transfer learning methods. In comparisons with models trained from random initialization, pretrained models and meta-trained MAML, our proposed ATAML method generalizes better on single-label and multi-label classification tasks on the miniRCV1 and miniReuters-21578 datasets.
Tasks Meta-Learning, Multi-Label Classification, Representation Learning, Text Classification, Transfer Learning
Published 2019-05-01
URL https://openreview.net/forum?id=SyxMWh09KX
PDF https://openreview.net/pdf?id=SyxMWh09KX
PWC https://paperswithcode.com/paper/attentive-task-agnostic-meta-learning-for-few
Repo
Framework
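
The core mechanism ATAML extends is MAML's bi-level optimization. Below is a minimal sketch of one such gradient-based meta-learning step in PyTorch; it is a generic MAML loop under our own assumptions (task format, single inner step), not the authors' code, and it omits ATAML's specific split between frozen task-agnostic parameters and attention parameters adapted in the inner loop.

```python
import torch
import torch.nn.functional as F

def maml_meta_loss(model, tasks, inner_lr=0.01):
    """One MAML-style meta-training step over a batch of few-shot tasks.

    Each task is ((x_support, y_support), (x_query, y_query)).
    ATAML would additionally freeze the shared representation here and
    adapt only the attention parameters in the inner loop (not shown).
    """
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        params = dict(model.named_parameters())
        # Inner loop: one gradient step on the support set, keeping the
        # graph so the meta-gradient can flow through the update.
        loss = F.cross_entropy(torch.func.functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        fast = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
        # Outer objective: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + F.cross_entropy(
            torch.func.functional_call(model, fast, (x_q,)), y_q)
    return meta_loss / len(tasks)

# Usage: loss = maml_meta_loss(model, task_batch); loss.backward(); meta_opt.step()
```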

Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning

Title Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning
Authors Nhat Ho, Tan Nguyen, Ankit B. Patel, Anima Anandkumar, Michael I. Jordan, Richard G. Baraniuk
Abstract Unsupervised and semi-supervised learning are important problems that are especially challenging with complex data like natural images. Progress on these problems would accelerate if we had access to appropriate generative models under which to pose the associated inference tasks. Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Neural Rendering Model (NRM), a new hierarchical probabilistic generative model whose inference calculations correspond to those in a CNN. The NRM introduces a small set of latent variables at each level of the model and enforces dependencies among all the latent variables via a conjugate prior distribution. The conjugate prior yields a new regularizer for training CNNs based on the paths rendered in the generative model: the Rendering Path Normalization (RPN). We demonstrate that this regularizer improves generalization both in theory and in practice. Likelihood estimation in the NRM yields the new Max-Min cross-entropy training loss, which suggests a new deep network architecture, the Max-Min network, which exceeds or matches the state of the art for semi-supervised and supervised learning on SVHN, CIFAR10, and CIFAR100.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=B1lx42A9Ym
PDF https://openreview.net/pdf?id=B1lx42A9Ym
PWC https://paperswithcode.com/paper/neural-rendering-model-joint-generation-and
Repo
Framework

Multilingual Language Models for Named Entity Recognition in German and English

Title Multilingual Language Models for Named Entity Recognition in German and English
Authors Antonia Baumann
Abstract We assess the language specificity of recent language models by exploring the potential of a multilingual language model. In particular, we evaluate Google's multilingual BERT (mBERT) model on Named Entity Recognition (NER) in German and English. We expand the work on language model fine-tuning by Howard and Ruder (2018), applying it to the BERT architecture. We successfully reproduce the NER results published by Devlin et al. (2019). Our results show that the multilingual language model generalises well for NER in the chosen languages, matching the native model in English and comparing well with recent approaches for German. However, it does not benefit from the added fine-tuning methods.
Tasks Language Modelling, Named Entity Recognition
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-2004/
PDF https://www.aclweb.org/anthology/R19-2004
PWC https://paperswithcode.com/paper/multilingual-language-models-for-named-entity
Repo
Framework
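
For readers who want to reproduce the basic setup, the sketch below loads mBERT with a token-classification head via Hugging Face Transformers. The checkpoint name is the standard public mBERT release; the label count and example sentence are our assumptions, and the head is untrained until fine-tuned on NER data (e.g., CoNLL-2003 for English).

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Standard public multilingual BERT checkpoint; num_labels=9 assumes a
# CoNLL-style BIO tag set (adjust to your annotation scheme).
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

# One German sentence; the same model handles English input unchanged.
inputs = tokenizer("Angela Merkel besuchte gestern Berlin.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, num_subwords, num_labels)
predicted_tag_ids = logits.argmax(dim=-1)  # per-subword tags; random until fine-tuned
```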

Choosing between Long and Short Word Forms in Mandarin

Title Choosing between Long and Short Word Forms in Mandarin
Authors Lin Li, Kees van Deemter, Denis Paperno, Jingyu Fan
Abstract Between 80% and 90% of all Chinese words have long and short forms, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to these questions, using both a behavioral and a corpus-based approach. We hypothesized that, in Mandarin, there is a higher probability of choosing the short form in a supportive context than in a neutral context. Consistent with our prediction, our findings revealed that the predictability of the context affects speakers' choice between long and short forms.
Tasks
Published 2019-10-01
URL https://www.aclweb.org/anthology/W19-8605/
PDF https://www.aclweb.org/anthology/W19-8605
PWC https://paperswithcode.com/paper/choosing-between-long-and-short-word-forms-in
Repo
Framework
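
The corpus side of this study reduces to comparing conditional probabilities. As a toy illustration (the data format and counts here are ours, not the paper's), one can estimate P(short form | context type) from labelled occurrences:

```python
from collections import Counter

# Hypothetical observations: (context_type, form) for each occurrence of a
# word with a long/short alternation such as 老虎/虎 (lao-hu/hu, "tiger").
observations = [
    ("supportive", "short"), ("supportive", "short"), ("supportive", "long"),
    ("neutral", "long"), ("neutral", "long"), ("neutral", "short"),
]

counts = Counter(observations)

def p_short(context):
    """Estimate P(short form | context) by relative frequency."""
    short = counts[(context, "short")]
    total = short + counts[(context, "long")]
    return short / total if total else float("nan")

# The paper's hypothesis predicts p_short("supportive") > p_short("neutral").
print(p_short("supportive"), p_short("neutral"))  # 0.667 0.333 on the toy data
```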

Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications

Title Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications
Authors Anna Koroleva, Patrick Paroubek
Abstract Randomized controlled trials assess the effects of an experimental intervention by comparing it to a control intervention with regard to some variables, the trial outcomes. Statistical hypothesis testing is used to test if the experimental intervention is superior to the control. Statistical significance is typically reported for the measured outcomes and is an important characteristic of the results. We propose a machine learning approach to automatically extract reported outcomes, significance levels and the relations between them. We annotated a corpus of 663 sentences with 2,552 outcome–significance level relations (1,372 positive and 1,180 negative relations). We compared several classifiers using a manually crafted feature set, as well as a number of deep learning models. The best performance (F-measure of 94%) was shown by the fine-tuned BioBERT model.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5038/
PDF https://www.aclweb.org/anthology/W19-5038
PWC https://paperswithcode.com/paper/extracting-relations-between-outcomes-and
Repo
Framework
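
A common recipe for this kind of binary relation classification is to mark the two candidate arguments in the sentence and fine-tune a BERT-style encoder on the labelled pairs. The sketch below follows that generic recipe; the checkpoint name, marker scheme, and example sentence are our assumptions, not the authors' exact setup.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Public BioBERT checkpoint with a 2-way classification head
# (relation holds / does not hold between the marked spans).
name = "dmis-lab/biobert-v1.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Hypothetical entity markers around the outcome and significance level.
tokenizer.add_tokens(["[OUT]", "[/OUT]", "[SIG]", "[/SIG]"])
model.resize_token_embeddings(len(tokenizer))

sent = ("[OUT] Overall survival [/OUT] was significantly improved "
        "( [SIG] p = 0.03 [/SIG] ).")
inputs = tokenizer(sent, return_tensors="pt")
logits = model(**inputs).logits  # fine-tune on the annotated relation pairs
```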

The Rhetorical Structure of Attribution

Title The Rhetorical Structure of Attribution
Authors Andrew Potter
Abstract The relational status of Attribution in Rhetorical Structure Theory has been a matter of ongoing debate. Although several researchers have weighed in on the topic, and although numerous studies have relied upon attributional structures for their analyses, nothing approaching consensus has emerged. This paper identifies three basic issues that must be resolved to determine the relational status of attributions: the Discourse Units Issue, the Nuclearity Issue, and the Relation Identification Issue. These three issues are analyzed from the perspective of classical RST. The analysis finds that the nuclearity and the relational identification of attribution structures depend on the writer's intended effect, such that attribution cannot be treated as a single relation; rather, attributional relations are instances of other RST relations.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2706/
PDF https://www.aclweb.org/anthology/W19-2706
PWC https://paperswithcode.com/paper/the-rhetorical-structure-of-attribution
Repo
Framework

Order-Preserving Wasserstein Discriminant Analysis

Title Order-Preserving Wasserstein Discriminant Analysis
Authors Bing Su, Jiahuan Zhou, Ying Wu
Abstract Supervised dimensionality reduction for sequence data projects the observations in sequences onto a low-dimensional subspace to better separate different sequence classes. It is typically more challenging than conventional dimensionality reduction for static data, because measuring the separability of sequences involves non-linear procedures to manipulate the temporal structures. This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. For each class, OWDA extracts the order-preserving Wasserstein barycenter and constructs the intra-class scatter as the dispersion of the training sequences around the barycenter. The inter-class distance is measured as the order-preserving Wasserstein distance between the corresponding barycenters. OWDA is able to concentrate on the distinctive differences among classes by lifting the geometric relations with temporal constraints. Experiments show that OWDA achieves competitive results on three 3D action recognition datasets.
Tasks 3D Human Action Recognition, Dimensionality Reduction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Su_Order-Preserving_Wasserstein_Discriminant_Analysis_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Su_Order-Preserving_Wasserstein_Discriminant_Analysis_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/order-preserving-wasserstein-discriminant
Repo
Framework
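
In symbols, the abstract's description corresponds to a discriminant, trace-ratio-style objective over order-preserving Wasserstein (OPW) distances. The schematic formulation below uses our own notation and is inferred from the abstract, not taken from the paper:

```latex
% U: projection onto the low-dimensional subspace (columns orthonormal)
% b_c(U): OPW barycenter of the projected training sequences of class c
% \mathcal{C}_c: set of training sequences X in class c
\max_{U^\top U = I}\;
\frac{\displaystyle\sum_{c \neq c'} d_{\mathrm{OPW}}\big(b_c(U),\, b_{c'}(U)\big)}
     {\displaystyle\sum_{c}\sum_{X \in \mathcal{C}_c} d_{\mathrm{OPW}}\big(U^\top X,\, b_c(U)\big)}
```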

Learning Partially Observed PDE Dynamics with Neural Networks

Title Learning Partially Observed PDE Dynamics with Neural Networks
Authors Ibrahim Ayed, Emmanuel De Bézenac, Arthur Pajot, Patrick Gallinari
Abstract Spatio-temporal processes are of central importance in many applied scientific fields. Generally, differential equations are used to describe these processes. In this work, we address the problem of learning spatio-temporal dynamics with neural networks when only partial information on the system's state is available. Taking inspiration from the dynamical-systems approach, we outline a general framework in which complex dynamics generated by families of differential equations can be learned in a principled way. Two models are derived from this framework. We demonstrate how they can be applied in practice by considering the problem of forecasting fluid flows. We show how the underlying equations fit into our formalism and evaluate our method by comparing it with standard baselines.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HyefgnCqFm
PDF https://openreview.net/pdf?id=HyefgnCqFm
PWC https://paperswithcode.com/paper/learning-partially-observed-pde-dynamics-with
Repo
Framework
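
To make the dynamical-systems framing concrete, here is a generic sketch (ours, not either of the paper's two derived models) of fitting a network to the time derivative of a latent state and rolling it out with an Euler step. Under partial observation, training would penalize the mismatch on the observed components only, leaving unobserved state dimensions free.

```python
import torch
import torch.nn as nn

# f predicts dx/dt for a 64-dimensional latent state (dimensions assumed).
f = nn.Sequential(nn.Linear(64, 128), nn.Tanh(), nn.Linear(128, 64))

def rollout(x0, steps, dt=0.1):
    """Integrate dx/dt = f(x) with explicit Euler steps from initial state x0."""
    xs, x = [x0], x0
    for _ in range(steps):
        x = x + dt * f(x)
        xs.append(x)
    return torch.stack(xs)

# Training loop (not shown) would minimize, e.g., MSE between the observed
# components of the rollout and the measured trajectory.
```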

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling

Title Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Authors Vighnesh Leonardo Shiv, Chris Quirk, Anshuman Suri, Xiang Gao, Khuram Shahid, Nithya Govindarajan, Yizhe Zhang, Jianfeng Gao, Michel Galley, Chris Brockett, Tulasi Menon, Bill Dolan
Abstract The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.
Tasks Language Modelling
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-3021/
PDF https://www.aclweb.org/anthology/P19-3021
PWC https://paperswithcode.com/paper/microsoft-icecaps-an-open-source-toolkit-for
Repo
Framework

Using GANs for Generation of Realistic City-Scale Ride Sharing/Hailing Data Sets

Title Using GANs for Generation of Realistic City-Scale Ride Sharing/Hailing Data Sets
Authors Abhinav Jauhri, Brad Stocks, Jian Hui Li, Koichi Yamada, John Paul Shen
Abstract This paper focuses on the synthetic generation of human mobility data in urban areas. We present a novel and scalable application of Generative Adversarial Networks (GANs) for modeling and generating human mobility data. We leverage actual ride requests from ride sharing/hailing services in four major US cities to train our GAN model. Our model captures the spatial and temporal variability of the ride-request patterns observed for all four cities on any typical day and over any typical week. Previous works have succinctly characterized the spatial and temporal properties of human mobility data sets using the fractal dimensionality and the densification power law, respectively, which we utilize to validate our GAN-generated synthetic data sets. Such synthetic data sets can avoid privacy concerns and be extremely useful for researchers and policy makers working on urban mobility and intelligent transportation.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=H1eMBn09Km
PDF https://openreview.net/pdf?id=H1eMBn09Km
PWC https://paperswithcode.com/paper/using-gans-for-generation-of-realistic-city
Repo
Framework
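
The validation statistics mentioned at the end are computable with textbook estimators. Below is a box-counting sketch for the fractal dimension of a set of 2-D ride-request locations; the estimator is a standard method, not necessarily the authors' exact procedure.

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal (box-counting) dimension of 2-D points.

    points: (n, 2) array of, e.g., ride-request pickup locations.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    pts = (points - lo) / (hi - lo + 1e-12)           # normalize to [0, 1]^2
    counts = []
    for s in scales:
        cells = np.floor(pts * s).astype(int)
        counts.append(len(np.unique(cells, axis=0)))  # occupied boxes at scale s
    # Fractal dimension = slope of log(#occupied boxes) against log(scale).
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope
```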

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT

Title ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT
Authors John Pavlopoulos, Nithum Thain, Lucas Dixon, Ion Androutsopoulos
Abstract This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API that serves multiple machine learning models for improving online conversations, including a toxicity detection system trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine-tuned per task and achieving state-of-the-art performance in multiple NLP tasks. PERSPECTIVE performed better than BERT in detecting toxicity, but BERT was much better in categorizing the offensive type. Both baselines ranked surprisingly high in the SemEval-2019 OffensEval competition: PERSPECTIVE in detecting an offensive post (12th) and BERT in categorizing it (11th). The main contribution of this paper is the assessment of two strong baselines for the identification (PERSPECTIVE) and the categorization (BERT) of offensive language with little or no additional training data.
Tasks Language Identification
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2102/
PDF https://www.aclweb.org/anthology/S19-2102
PWC https://paperswithcode.com/paper/convai-at-semeval-2019-task-6-offensive
Repo
Framework

Complexity of Training ReLU Neural Networks

Title Complexity of Training ReLU Neural Networks
Authors Digvijay Boob, Santanu S. Dey, Guanghui Lan
Abstract In this paper, we explore some basic questions on the complexity of training neural networks with the ReLU activation function. We show that it is NP-hard to train a two-hidden-layer feedforward ReLU neural network. If the dimension d of the data is fixed, we show that there exists a polynomial-time algorithm for the same training problem. We also show that if sufficient over-parameterization is provided in the first hidden layer of the ReLU neural network, then there is a polynomial-time algorithm that finds weights such that the output of the over-parameterized network matches the outputs in the given data.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HyVbhi0cYX
PDF https://openreview.net/pdf?id=HyVbhi0cYX
PWC https://paperswithcode.com/paper/complexity-of-training-relu-neural-networks
Repo
Framework
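
As a schematic of the decision problem behind the NP-hardness claim (our notation; the paper's exact architecture and reduction are more specific), the question is whether weights exist that fit the data exactly:

```latex
% Given data \{(x_i, y_i)\}_{i=1}^n, decide whether there exist weights
% W_1, W_2, v (two hidden ReLU layers, linear output) such that
v^\top \sigma\!\big(W_2\, \sigma(W_1 x_i)\big) = y_i
\quad \text{for all } i, \qquad \sigma(t) = \max(t, 0) \text{ componentwise}
```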

Dense Procedure Captioning in Narrated Instructional Videos

Title Dense Procedure Captioning in Narrated Instructional Videos
Authors Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou
Abstract Understanding narrated instructional videos is important for both research and real-world web applications. Motivated by video dense captioning, we propose a model to generate procedure captions from narrated instructional videos, which are sequences of step-wise clips with descriptions. Previous works on video dense captioning learn video segments and generate captions without considering transcripts. We argue that transcripts in narrated instructional videos can enhance video representation by providing fine-grained, complementary and semantic textual information. In this paper, we introduce a framework to (1) extract procedures by a cross-modality module, which fuses video content with the entire transcript; and (2) generate captions by encoding video frames as well as a snippet of the transcript within each extracted procedure. Experiments show that our model can achieve state-of-the-art performance in procedure extraction and captioning, and the ablation studies demonstrate that both the video frames and the transcripts are important for the task.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1641/
PDF https://www.aclweb.org/anthology/P19-1641
PWC https://paperswithcode.com/paper/dense-procedure-captioning-in-narrated
Repo
Framework

ISA-VAE: Independent Subspace Analysis with Variational Autoencoders

Title ISA-VAE: Independent Subspace Analysis with Variational Autoencoders
Authors Jan Stühmer, Richard Turner, Sebastian Nowozin
Abstract Recent work has shown increased interest in using the Variational Autoencoder (VAE) framework to discover interpretable representations of data in an unsupervised way. These methods have focussed largely on modifying the variational cost function to achieve this goal. However, we show that methods like beta-VAE amplify the tendency of variational inference to underfit, causing pathological over-pruning and over-orthogonalization of learned components. In this paper we take a complementary approach: we modify the probabilistic model to encourage structured latent-variable representations to be discovered. Specifically, the standard VAE probabilistic model is unidentifiable: the likelihood of the parameters is invariant under rotations of the latent space. This means there is no pressure to identify each true factor of variation with a latent variable. We therefore employ a rich prior distribution, akin to the ICA model, that breaks the rotational symmetry. Extensive quantitative and qualitative experiments demonstrate that the proposed prior mitigates the trade-off between reconstruction loss and disentanglement introduced by modified cost functions like beta-VAE and TCVAE. The proposed prior allows these approaches to be improved significantly over the state of the art with respect to both disentanglement and reconstruction quality.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=rJl_NhR9K7
PDF https://openreview.net/pdf?id=rJl_NhR9K7
PWC https://paperswithcode.com/paper/isa-vae-independent-subspace-analysis-with
Repo
Framework
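
The unidentifiability claim has a one-line justification worth spelling out: the isotropic Gaussian prior is rotation-invariant, so rotating the latent space and compensating in the decoder leaves the marginal likelihood unchanged. Schematically (our notation):

```latex
% For any rotation R (R^\top R = I) and decoder g, substituting z = Ru:
\mathcal{N}(z \mid 0, I) = \mathcal{N}(Rz \mid 0, I)
\;\Longrightarrow\;
p_{g}(x) = \int p(x \mid g(z))\,\mathcal{N}(z \mid 0, I)\,dz = p_{g \circ R}(x)
% The pair (g, z) and the rotated pair (g \circ R, R^\top z) are therefore
% indistinguishable from data alone; an ICA-like, non-Gaussian prior on
% latent subspaces breaks this rotational symmetry.
```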

K-Best Transformation Synchronization

Title K-Best Transformation Synchronization
Authors Yifan Sun, Jiacheng Zhuo, Arnav Mohan, Qixing Huang
Abstract In this paper, we introduce the problem of K-best transformation synchronization for the purpose of multiple scan matching. Given noisy pairwise transformations computed between a subset of depth scan pairs, K-best transformation synchronization seeks to output multiple consistent relative transformations. This problem naturally arises in many geometry reconstruction applications where the underlying object possesses self-symmetry. For approximately symmetric or even non-symmetric objects, K-best solutions offer an intermediate representation for recovering the underlying single-best solution. We introduce a simple yet robust iterative algorithm for K-best transformation synchronization, which alternates between transformation propagation and transformation clustering. We present theoretical guarantees on the robust and exact recovery properties of our algorithm. Experimental results demonstrate the advantage of our approach over state-of-the-art transformation synchronization techniques on both synthetic and real datasets.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Sun_K-Best_Transformation_Synchronization_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Sun_K-Best_Transformation_Synchronization_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/k-best-transformation-synchronization
Repo
Framework
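
The alternation the abstract describes can be sketched in a few lines. Everything below is schematic and rests on our own simplifying assumptions: composition over two-hop paths as the propagation rule, and a greedy Frobenius-distance pass standing in for proper mode clustering.

```python
import numpy as np
from itertools import product

def reduce_to_k_modes(mats, K, tol=1e-3):
    """Greedy stand-in for mode clustering: keep up to K transforms that
    are mutually distinct in Frobenius distance."""
    kept = []
    for M in mats:
        if len(kept) == K:
            break
        if all(np.linalg.norm(M - P) > tol for P in kept):
            kept.append(M)
    return kept or mats[:1]

def k_best_sync(candidates, K, iters=5):
    """Alternate transformation propagation and clustering (schematic).

    candidates: dict mapping directed edges (i, j) to lists of 4x4
    relative-transform matrices from pairwise scan matching.
    """
    nodes = {i for e in candidates for i in e}
    for _ in range(iters):
        # Propagation: compose candidate transforms along two-hop paths i->k->j.
        for (i, j) in list(candidates):
            for k in nodes - {i, j}:
                if (i, k) in candidates and (k, j) in candidates:
                    for A, B in product(candidates[(i, k)], candidates[(k, j)]):
                        candidates[(i, j)].append(B @ A)
        # Clustering: keep K consistent hypotheses per edge.
        for e in candidates:
            candidates[e] = reduce_to_k_modes(candidates[e], K)
    return candidates
```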