January 24, 2020

2697 words 13 mins read

Paper Group NANR 115

Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification. Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning. Multilingual Language Models for Named Entity Recognition in German and English. Choosing between Long and Short Word Forms in Mandarin. Extracting relations between outcomes and significance …

Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification

Title Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification
Authors Xiang Jiang, Mohammad Havaei, Gabriel Chartrand, Hassan Chouaib, Thomas Vincent, Andrew Jesson, Nicolas Chapados, Stan Matwin
Abstract Current deep learning based text classification methods are limited in their ability to achieve fast learning and generalization when data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Inspired by the Model-Agnostic Meta-Learning framework (MAML), we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The proposed ATAML is designed to encourage task-agnostic representation learning by way of task-agnostic parameterization and to facilitate task-specific adaptation via attention mechanisms. We provide evidence to show that the attention mechanism in ATAML has a synergistic effect on learning performance. Our experimental results reveal that, for few-shot text classification tasks, gradient-based meta-learning approaches outperform popular transfer learning methods. In comparisons with models trained from random initialization, pretrained models and meta-trained MAML, our proposed ATAML method generalizes better on single-label and multi-label classification tasks on the miniRCV1 and miniReuters-21578 datasets.
Tasks Meta-Learning, Multi-Label Classification, Representation Learning, Text Classification, Transfer Learning
Published 2019-05-01
URL https://openreview.net/forum?id=SyxMWh09KX
PDF https://openreview.net/pdf?id=SyxMWh09KX
PWC https://paperswithcode.com/paper/attentive-task-agnostic-meta-learning-for-few
Repo
Framework
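
The core mechanism ATAML extends is MAML's bi-level optimization. Below is a minimal sketch of one such gradient-based meta-learning step in PyTorch; it is a generic MAML loop under our own assumptions (task format, single inner step), not the authors' code, and it omits ATAML's specific split between frozen task-agnostic parameters and attention parameters adapted in the inner loop.

```python
import torch
import torch.nn.functional as F

def maml_meta_loss(model, tasks, inner_lr=0.01):
    """One MAML-style meta-training step over a batch of few-shot tasks.

    Each task is ((x_support, y_support), (x_query, y_query)).
    ATAML would additionally freeze the shared representation here and
    adapt only the attention parameters in the inner loop (not shown).
    """
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        params = dict(model.named_parameters())
        # Inner loop: one gradient step on the support set, keeping the
        # graph so the meta-gradient can flow through the update.
        loss = F.cross_entropy(torch.func.functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        fast = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
        # Outer objective: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + F.cross_entropy(
            torch.func.functional_call(model, fast, (x_q,)), y_q)
    return meta_loss / len(tasks)

# Usage: loss = maml_meta_loss(model, task_batch); loss.backward(); meta_opt.step()
```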

Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning

Title Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning
Authors Nhat Ho, Tan Nguyen, Ankit B. Patel, Anima Anandkumar, Michael I. Jordan, Richard G. Baraniuk
Abstract Unsupervised and semi-supervised learning are important problems that are especially challenging with complex data like natural images. Progress on these problems would accelerate if we had access to appropriate generative models under which to pose the associated inference tasks. Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Neural Rendering Model (NRM), a new hierarchical probabilistic generative model whose inference calculations correspond to those in a CNN. The NRM introduces a small set of latent variables at each level of the model and enforces dependencies among all the latent variables via a conjugate prior distribution. The conjugate prior yields a new regularizer for training CNNs based on the paths rendered in the generative model: the Rendering Path Normalization (RPN). We demonstrate that this regularizer improves generalization both in theory and in practice. Likelihood estimation in the NRM yields the new Max-Min cross-entropy training loss, which suggests a new deep network architecture, the Max-Min network, which exceeds or matches the state of the art for semi-supervised and supervised learning on SVHN, CIFAR10, and CIFAR100.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=B1lx42A9Ym
PDF https://openreview.net/pdf?id=B1lx42A9Ym
PWC https://paperswithcode.com/paper/neural-rendering-model-joint-generation-and
Repo
Framework

Multilingual Language Models for Named Entity Recognition in German and English

Title Multilingual Language Models for Named Entity Recognition in German and English
Authors Antonia Baumann
Abstract We assess the language specificity of recent language models by exploring the potential of a multilingual language model. In particular, we evaluate Google's multilingual BERT (mBERT) model on Named Entity Recognition (NER) in German and English. We expand the work on language model fine-tuning by Howard and Ruder (2018), applying it to the BERT architecture. We successfully reproduce the NER results published by Devlin et al. (2019). Our results show that the multilingual language model generalises well for NER in the chosen languages, matching the native model in English and comparing well with recent approaches for German. However, it does not benefit from the added fine-tuning methods.
Tasks Language Modelling, Named Entity Recognition
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-2004/
PDF https://www.aclweb.org/anthology/R19-2004
PWC https://paperswithcode.com/paper/multilingual-language-models-for-named-entity
Repo
Framework
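
For readers who want to reproduce the basic setup, the sketch below loads mBERT with a token-classification head via Hugging Face Transformers. The checkpoint name is the standard public mBERT release; the label count and example sentence are our assumptions, and the head is untrained until fine-tuned on NER data (e.g., CoNLL-2003 for English).

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Standard public multilingual BERT checkpoint; num_labels=9 assumes a
# CoNLL-style BIO tag set (adjust to your annotation scheme).
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

# One German sentence; the same model handles English input unchanged.
inputs = tokenizer("Angela Merkel besuchte gestern Berlin.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, num_subwords, num_labels)
predicted_tag_ids = logits.argmax(dim=-1)  # per-subword tags; random until fine-tuned
```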

Choosing between Long and Short Word Forms in Mandarin

Title Choosing between Long and Short Word Forms in Mandarin
Authors Lin Li, Kees van Deemter, Denis Paperno, Jingyu Fan
Abstract Between 80% and 90% of all Chinese words have long and short forms, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to these questions, using both a behavioral and a corpus-based approach. We hypothesized that, in Mandarin, there is a higher probability of choosing the short form in a supportive context than in a neutral context. Consistent with our prediction, our findings revealed that the predictability of the context affects speakers' choice between long and short forms.
Tasks
Published 2019-10-01
URL https://www.aclweb.org/anthology/W19-8605/
PDF https://www.aclweb.org/anthology/W19-8605
PWC https://paperswithcode.com/paper/choosing-between-long-and-short-word-forms-in
Repo
Framework
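
The corpus side of this study reduces to comparing conditional probabilities. As a toy illustration (the data format and counts here are ours, not the paper's), one can estimate P(short form | context type) from labelled occurrences:

```python
from collections import Counter

# Hypothetical observations: (context_type, form) for each occurrence of a
# word with a long/short alternation such as 老虎/虎 (lao-hu/hu, "tiger").
observations = [
    ("supportive", "short"), ("supportive", "short"), ("supportive", "long"),
    ("neutral", "long"), ("neutral", "long"), ("neutral", "short"),
]

counts = Counter(observations)

def p_short(context):
    """Estimate P(short form | context) by relative frequency."""
    short = counts[(context, "short")]
    total = short + counts[(context, "long")]
    return short / total if total else float("nan")

# The paper's hypothesis predicts p_short("supportive") > p_short("neutral").
print(p_short("supportive"), p_short("neutral"))  # 0.667 0.333 on the toy data
```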

Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications

Title Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications
Authors Anna Koroleva, Patrick Paroubek
Abstract Randomized controlled trials assess the effects of an experimental intervention by comparing it to a control intervention with regard to some variables, the trial outcomes. Statistical hypothesis testing is used to test if the experimental intervention is superior to the control. Statistical significance is typically reported for the measured outcomes and is an important characteristic of the results. We propose a machine learning approach to automatically extract reported outcomes, significance levels and the relations between them. We annotated a corpus of 663 sentences with 2,552 outcome–significance level relations (1,372 positive and 1,180 negative relations). We compared several classifiers using a manually crafted feature set, as well as a number of deep learning models. The best performance (F-measure of 94%) was shown by the fine-tuned BioBERT model.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5038/
PDF https://www.aclweb.org/anthology/W19-5038
PWC https://paperswithcode.com/paper/extracting-relations-between-outcomes-and
Repo
Framework
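
A common recipe for this kind of binary relation classification is to mark the two candidate arguments in the sentence and fine-tune a BERT-style encoder on the labelled pairs. The sketch below follows that generic recipe; the checkpoint name, marker scheme, and example sentence are our assumptions, not the authors' exact setup.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Public BioBERT checkpoint with a 2-way classification head
# (relation holds / does not hold between the marked spans).
name = "dmis-lab/biobert-v1.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Hypothetical entity markers around the outcome and significance level.
tokenizer.add_tokens(["[OUT]", "[/OUT]", "[SIG]", "[/SIG]"])
model.resize_token_embeddings(len(tokenizer))

sent = ("[OUT] Overall survival [/OUT] was significantly improved "
        "( [SIG] p = 0.03 [/SIG] ).")
inputs = tokenizer(sent, return_tensors="pt")
logits = model(**inputs).logits  # fine-tune on the annotated relation pairs
```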

The Rhetorical Structure of Attribution

Title The Rhetorical Structure of Attribution
Authors Andrew Potter
Abstract The relational status of Attribution in Rhetorical Structure Theory has been a matter of ongoing debate. Although several researchers have weighed in on the topic, and although numerous studies have relied upon attributional structures for their analyses, nothing approaching consensus has emerged. This paper identifies three basic issues that must be resolved to determine the relational status of attributions: the Discourse Units Issue, the Nuclearity Issue, and the Relation Identification Issue. These three issues are analyzed from the perspective of classical RST. The analysis finds that the nuclearity and the relational identification of attribution structures depend on the writer's intended effect, such that attribution cannot be treated as a single relation; rather, attributional relations are instances of other RST relations.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2706/
PDF https://www.aclweb.org/anthology/W19-2706
PWC https://paperswithcode.com/paper/the-rhetorical-structure-of-attribution
Repo
Framework

Order-Preserving Wasserstein Discriminant Analysis

Title Order-Preserving Wasserstein Discriminant Analysis
Authors Bing Su, Jiahuan Zhou, Ying Wu
Abstract Supervised dimensionality reduction for sequence data projects the observations in sequences onto a low-dimensional subspace to better separate different sequence classes. It is typically more challenging than conventional dimensionality reduction for static data, because measuring the separability of sequences involves non-linear procedures to manipulate the temporal structures. This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. For each class, OWDA extracts the order-preserving Wasserstein barycenter and constructs the intra-class scatter as the dispersion of the training sequences around the barycenter. The inter-class distance is measured as the order-preserving Wasserstein distance between the corresponding barycenters. OWDA is able to concentrate on the distinctive differences among classes by lifting the geometric relations with temporal constraints. Experiments show that OWDA achieves competitive results on three 3D action recognition datasets.
Tasks 3D Human Action Recognition, Dimensionality Reduction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Su_Order-Preserving_Wasserstein_Discriminant_Analysis_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Su_Order-Preserving_Wasserstein_Discriminant_Analysis_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/order-preserving-wasserstein-discriminant
Repo
Framework
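
In symbols, the abstract's description corresponds to a discriminant, trace-ratio-style objective over order-preserving Wasserstein (OPW) distances. The schematic formulation below uses our own notation and is inferred from the abstract, not taken from the paper:

```latex
% U: projection onto the low-dimensional subspace (columns orthonormal)
% b_c(U): OPW barycenter of the projected training sequences of class c
% \mathcal{C}_c: set of training sequences X in class c
\max_{U^\top U = I}\;
\frac{\displaystyle\sum_{c \neq c'} d_{\mathrm{OPW}}\big(b_c(U),\, b_{c'}(U)\big)}
     {\displaystyle\sum_{c}\sum_{X \in \mathcal{C}_c} d_{\mathrm{OPW}}\big(U^\top X,\, b_c(U)\big)}
```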

Learning Partially Observed PDE Dynamics with Neural Networks

Title Learning Partially Observed PDE Dynamics with Neural Networks
Authors Ibrahim Ayed, Emmanuel De Bézenac, Arthur Pajot, Patrick Gallinari
Abstract Spatio-temporal processes are of central importance in many applied scientific fields. Generally, differential equations are used to describe these processes. In this work, we address the problem of learning spatio-temporal dynamics with neural networks when only partial information on the system's state is available. Taking inspiration from the dynamical-systems approach, we outline a general framework in which complex dynamics generated by families of differential equations can be learned in a principled way. Two models are derived from this framework. We demonstrate how they can be applied in practice by considering the problem of forecasting fluid flows. We show how the underlying equations fit into our formalism and evaluate our method by comparing it with standard baselines.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HyefgnCqFm
PDF https://openreview.net/pdf?id=HyefgnCqFm
PWC https://paperswithcode.com/paper/learning-partially-observed-pde-dynamics-with
Repo
Framework
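
To make the dynamical-systems framing concrete, here is a generic sketch (ours, not either of the paper's two derived models) of fitting a network to the time derivative of a latent state and rolling it out with an Euler step. Under partial observation, training would penalize the mismatch on the observed components only, leaving unobserved state dimensions free.

```python
import torch
import torch.nn as nn

# f predicts dx/dt for a 64-dimensional latent state (dimensions assumed).
f = nn.Sequential(nn.Linear(64, 128), nn.Tanh(), nn.Linear(128, 64))

def rollout(x0, steps, dt=0.1):
    """Integrate dx/dt = f(x) with explicit Euler steps from initial state x0."""
    xs, x = [x0], x0
    for _ in range(steps):
        x = x + dt * f(x)
        xs.append(x)
    return torch.stack(xs)

# Training loop (not shown) would minimize, e.g., MSE between the observed
# components of the rollout and the measured trajectory.
```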

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling

Title Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Authors Vighnesh Leonardo Shiv, Chris Quirk, Anshuman Suri, Xiang Gao, Khuram Shahid, Nithya Govindarajan, Yizhe Zhang, Jianfeng Gao, Michel Galley, Chris Brockett, Tulasi Menon, Bill Dolan
Abstract The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.
Tasks Language Modelling
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-3021/
PDF https://www.aclweb.org/anthology/P19-3021
PWC https://paperswithcode.com/paper/microsoft-icecaps-an-open-source-toolkit-for
Repo
Framework

Using GANs for Generation of Realistic City-Scale Ride Sharing/Hailing Data Sets

Title Using GANs for Generation of Realistic City-Scale Ride Sharing/Hailing Data Sets
Authors Abhinav Jauhri, Brad Stocks, Jian Hui Li, Koichi Yamada, John Paul Shen
Abstract This paper focuses on the synthetic generation of human mobility data in urban areas. We present a novel and scalable application of Generative Adversarial Networks (GANs) for modeling and generating human mobility data. We leverage actual ride requests from ride sharing/hailing services in four major US cities to train our GAN model. Our model captures the spatial and temporal variability of the ride-request patterns observed for all four cities on any typical day and over any typical week. Previous works have succinctly characterized the spatial and temporal properties of human mobility data sets using the fractal dimensionality and the densification power law, respectively, which we utilize to validate our GAN-generated synthetic data sets. Such synthetic data sets can avoid privacy concerns and be extremely useful for researchers and policy makers working on urban mobility and intelligent transportation.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=H1eMBn09Km
PDF https://openreview.net/pdf?id=H1eMBn09Km
PWC https://paperswithcode.com/paper/using-gans-for-generation-of-realistic-city
Repo
Framework
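
The validation statistics mentioned at the end are computable with textbook estimators. Below is a box-counting sketch for the fractal dimension of a set of 2-D ride-request locations; the estimator is a standard method, not necessarily the authors' exact procedure.

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal (box-counting) dimension of 2-D points.

    points: (n, 2) array of, e.g., ride-request pickup locations.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    pts = (points - lo) / (hi - lo + 1e-12)           # normalize to [0, 1]^2
    counts = []
    for s in scales:
        cells = np.floor(pts * s).astype(int)
        counts.append(len(np.unique(cells, axis=0)))  # occupied boxes at scale s
    # Fractal dimension = slope of log(#occupied boxes) against log(scale).
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope
```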

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT

Title ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT
Authors John Pavlopoulos, Nithum Thain, Lucas Dixon, Ion Androutsopoulos
Abstract This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API that serves multiple machine learning models for improving online conversations, including a toxicity detection system trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine-tuned per task and achieving state-of-the-art performance in multiple NLP tasks. PERSPECTIVE performed better than BERT in detecting toxicity, but BERT was much better in categorizing the offensive type. Both baselines ranked surprisingly high in the SemEval-2019 OffensEval competition: PERSPECTIVE in detecting an offensive post (12th) and BERT in categorizing it (11th). The main contribution of this paper is the assessment of two strong baselines for the identification (PERSPECTIVE) and the categorization (BERT) of offensive language with little or no additional training data.
Tasks Language Identification
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2102/
PDF https://www.aclweb.org/anthology/S19-2102
PWC https://paperswithcode.com/paper/convai-at-semeval-2019-task-6-offensive
Repo
Framework

Complexity of Training ReLU Neural Networks

Title Complexity of Training ReLU Neural Networks
Authors Digvijay Boob, Santanu S. Dey, Guanghui Lan
Abstract In this paper, we explore some basic questions on the complexity of training neural networks with the ReLU activation function. We show that it is NP-hard to train a two-hidden-layer feedforward ReLU neural network. If the dimension d of the data is fixed, we show that there exists a polynomial-time algorithm for the same training problem. We also show that if sufficient over-parameterization is provided in the first hidden layer of the ReLU neural network, then there is a polynomial-time algorithm that finds weights such that the output of the over-parameterized network matches the outputs in the given data.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HyVbhi0cYX
PDF https://openreview.net/pdf?id=HyVbhi0cYX
PWC https://paperswithcode.com/paper/complexity-of-training-relu-neural-networks
Repo
Framework
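
As a schematic of the decision problem behind the NP-hardness claim (our notation; the paper's exact architecture and reduction are more specific), the question is whether weights exist that fit the data exactly:

```latex
% Given data \{(x_i, y_i)\}_{i=1}^n, decide whether there exist weights
% W_1, W_2, v (two hidden ReLU layers, linear output) such that
v^\top \sigma\!\big(W_2\, \sigma(W_1 x_i)\big) = y_i
\quad \text{for all } i, \qquad \sigma(t) = \max(t, 0) \text{ componentwise}
```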

Dense Procedure Captioning in Narrated Instructional Videos

Title Dense Procedure Captioning in Narrated Instructional Videos
Authors Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou
Abstract Understanding narrated instructional videos is important for both research and real-world web applications. Motivated by video dense captioning, we propose a model to generate procedure captions from narrated instructional videos, which are sequences of step-wise clips with descriptions. Previous works on video dense captioning learn video segments and generate captions without considering transcripts. We argue that transcripts in narrated instructional videos can enhance video representation by providing fine-grained, complementary and semantic textual information. In this paper, we introduce a framework to (1) extract procedures by a cross-modality module, which fuses video content with the entire transcript; and (2) generate captions by encoding video frames as well as a snippet of the transcript within each extracted procedure. Experiments show that our model can achieve state-of-the-art performance in procedure extraction and captioning, and the ablation studies demonstrate that both the video frames and the transcripts are important for the task.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1641/
PDF https://www.aclweb.org/anthology/P19-1641
PWC https://paperswithcode.com/paper/dense-procedure-captioning-in-narrated
Repo
Framework

ISA-VAE: Independent Subspace Analysis with Variational Autoencoders

Title ISA-VAE: Independent Subspace Analysis with Variational Autoencoders
Authors Jan Stühmer, Richard Turner, Sebastian Nowozin
Abstract Recent work has shown increased interest in using the Variational Autoencoder (VAE) framework to discover interpretable representations of data in an unsupervised way. These methods have focussed largely on modifying the variational cost function to achieve this goal. However, we show that methods like beta-VAE amplify the tendency of variational inference to underfit, causing pathological over-pruning and over-orthogonalization of learned components. In this paper we take a complementary approach: we modify the probabilistic model to encourage structured latent-variable representations to be discovered. Specifically, the standard VAE probabilistic model is unidentifiable: the likelihood of the parameters is invariant under rotations of the latent space. This means there is no pressure to identify each true factor of variation with a latent variable. We therefore employ a rich prior distribution, akin to the ICA model, that breaks the rotational symmetry. Extensive quantitative and qualitative experiments demonstrate that the proposed prior mitigates the trade-off between reconstruction loss and disentanglement introduced by modified cost functions like beta-VAE and TCVAE. The proposed prior allows these approaches to be improved significantly over the state of the art with respect to both disentanglement and reconstruction quality.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=rJl_NhR9K7
PDF https://openreview.net/pdf?id=rJl_NhR9K7
PWC https://paperswithcode.com/paper/isa-vae-independent-subspace-analysis-with
Repo
Framework
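
The unidentifiability claim has a one-line justification worth spelling out: the isotropic Gaussian prior is rotation-invariant, so rotating the latent space and compensating in the decoder leaves the marginal likelihood unchanged. Schematically (our notation):

```latex
% For any rotation R (R^\top R = I) and decoder g, substituting z = Ru:
\mathcal{N}(z \mid 0, I) = \mathcal{N}(Rz \mid 0, I)
\;\Longrightarrow\;
p_{g}(x) = \int p(x \mid g(z))\,\mathcal{N}(z \mid 0, I)\,dz = p_{g \circ R}(x)
% The pair (g, z) and the rotated pair (g \circ R, R^\top z) are therefore
% indistinguishable from data alone; an ICA-like, non-Gaussian prior on
% latent subspaces breaks this rotational symmetry.
```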

K-Best Transformation Synchronization

Title K-Best Transformation Synchronization
Authors Yifan Sun, Jiacheng Zhuo, Arnav Mohan, Qixing Huang
Abstract In this paper, we introduce the problem of K-best transformation synchronization for the purpose of multiple scan matching. Given noisy pairwise transformations computed between a subset of depth scan pairs, K-best transformation synchronization seeks to output multiple consistent relative transformations. This problem naturally arises in many geometry reconstruction applications where the underlying object possesses self-symmetry. For approximately symmetric or even non-symmetric objects, K-best solutions offer an intermediate representation for recovering the underlying single-best solution. We introduce a simple yet robust iterative algorithm for K-best transformation synchronization, which alternates between transformation propagation and transformation clustering. We present theoretical guarantees on the robust and exact recovery properties of our algorithm. Experimental results demonstrate the advantage of our approach over state-of-the-art transformation synchronization techniques on both synthetic and real datasets.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Sun_K-Best_Transformation_Synchronization_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Sun_K-Best_Transformation_Synchronization_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/k-best-transformation-synchronization
Repo
Framework
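
The alternation the abstract describes can be sketched in a few lines. Everything below is schematic and rests on our own simplifying assumptions: composition over two-hop paths as the propagation rule, and a greedy Frobenius-distance pass standing in for proper mode clustering.

```python
import numpy as np
from itertools import product

def reduce_to_k_modes(mats, K, tol=1e-3):
    """Greedy stand-in for mode clustering: keep up to K transforms that
    are mutually distinct in Frobenius distance."""
    kept = []
    for M in mats:
        if len(kept) == K:
            break
        if all(np.linalg.norm(M - P) > tol for P in kept):
            kept.append(M)
    return kept or mats[:1]

def k_best_sync(candidates, K, iters=5):
    """Alternate transformation propagation and clustering (schematic).

    candidates: dict mapping directed edges (i, j) to lists of 4x4
    relative-transform matrices from pairwise scan matching.
    """
    nodes = {i for e in candidates for i in e}
    for _ in range(iters):
        # Propagation: compose candidate transforms along two-hop paths i->k->j.
        for (i, j) in list(candidates):
            for k in nodes - {i, j}:
                if (i, k) in candidates and (k, j) in candidates:
                    for A, B in product(candidates[(i, k)], candidates[(k, j)]):
                        candidates[(i, j)].append(B @ A)
        # Clustering: keep K consistent hypotheses per edge.
        for e in candidates:
            candidates[e] = reduce_to_k_modes(candidates[e], K)
    return candidates
```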