Paper Group NANR 60
LAVAE: Disentangling Location and Appearance
Title | LAVAE: Disentangling Location and Appearance |
Authors | Anonymous |
Abstract | We propose a probabilistic generative model for unsupervised learning of structured, interpretable, object-based representations of visual scenes. We use amortized variational inference to train the generative model end-to-end. The learned representations of object location and appearance are fully disentangled, and objects are represented independently of each other in the latent space. Unlike previous approaches that disentangle location and appearance, ours generalizes seamlessly to scenes with many more objects than encountered in the training regime. We evaluate the proposed model on multi-MNIST and multi-dSprites data sets. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxDheHFDr |
https://openreview.net/pdf?id=HkxDheHFDr | |
PWC | https://paperswithcode.com/paper/lavae-disentangling-location-and-appearance |
Repo | |
Framework | |
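A minimal sketch of the kind of location/appearance factorization the abstract describes: each object gets a separate `z_where` (position) and `z_what` (appearance) latent, objects are decoded independently, and the scene is composed by pasting glimpses onto a canvas. The module names, sizes, and the naive (non-differentiable) placement below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyLocAppDecoder(nn.Module):
    def __init__(self, z_what_dim=8, glimpse=12, canvas=48):
        super().__init__()
        self.glimpse, self.canvas = glimpse, canvas
        # Appearance decoder: z_what -> a small object glimpse, independent of location.
        self.what_dec = nn.Sequential(
            nn.Linear(z_what_dim, 64), nn.ReLU(),
            nn.Linear(64, glimpse * glimpse), nn.Sigmoid())

    def forward(self, z_where, z_what):
        # z_where: (B, K, 2) positions in [0, 1]; z_what: (B, K, z_what_dim).
        B, K, _ = z_where.shape
        scene = torch.zeros(B, self.canvas, self.canvas)
        glimpses = self.what_dec(z_what).view(B, K, self.glimpse, self.glimpse)
        for b in range(B):
            for k in range(K):
                # Paste object k's glimpse at its (top-left) location on the canvas.
                x, y = (z_where[b, k] * (self.canvas - self.glimpse)).long()
                scene[b, y:y + self.glimpse, x:x + self.glimpse] += glimpses[b, k]
        return scene.clamp(0, 1)

dec = TinyLocAppDecoder()
img = dec(torch.rand(2, 3, 2), torch.randn(2, 3, 8))  # 2 scenes, 3 objects each
print(img.shape)  # torch.Size([2, 48, 48])
```

Because each object's appearance is decoded without knowing its location, adding more objects at test time only means sampling more (z_where, z_what) pairs, which is one way to read the abstract's claim about generalizing to more objects than seen in training.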
Proactive Sequence Generator via Knowledge Acquisition
Title | Proactive Sequence Generator via Knowledge Acquisition |
Authors | Anonymous |
Abstract | Sequence-to-sequence models such as transformers, which are now being used in a wide variety of NLP tasks, typically need to have very high capacity in order to perform well. Unfortunately, in production, memory size and inference speed are both strictly constrained. To address this problem, Knowledge Distillation (KD), a technique for training small models to mimic larger pre-trained models, has drawn considerable attention. The KD approach essentially attempts to maximize recall, i.e., to rank the teacher model's top-k tokens as high as possible, whereas precision is more important for sequence generation because of exposure bias. Motivated by this, we develop Knowledge Acquisition (KA), where student models receive log q(y_t | y_{<t}, x) as a reward when producing the next token y_t given the previous tokens y_{<t} and the source sentence x. We demonstrate the effectiveness of our approach on the WMT’17 De-En and IWSLT’15 Th-En translation tasks, with experimental results showing that our approach gains +0.7 to +1.1 BLEU over token-level knowledge distillation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJehf0VKwS |
https://openreview.net/pdf?id=rJehf0VKwS | |
PWC | https://paperswithcode.com/paper/proactive-sequence-generator-via-knowledge |
Repo | |
Framework | |
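A hedged sketch of the reward signal described above, read as a REINFORCE-style objective: the student samples token y_t and is rewarded with the teacher's log q(y_t | y_{<t}, x), contrasted with conventional token-level KD that matches the full teacher distribution. The function names and the exact surrogate loss are illustrative assumptions, not the authors' training recipe.

```python
import torch
import torch.nn.functional as F

def ka_policy_gradient_loss(student_logits, teacher_logits, sampled_tokens):
    """student_logits, teacher_logits: (B, T, V); sampled_tokens: (B, T) tokens
    sampled from the student itself (exposure to its own predictions)."""
    log_p = F.log_softmax(student_logits, dim=-1)          # student policy
    log_q = F.log_softmax(teacher_logits, dim=-1)          # teacher scorer
    # Reward = teacher log-probability of the student's own token choices.
    reward = log_q.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1).detach()
    # REINFORCE-style surrogate: push up student log-prob of well-rewarded tokens.
    chosen_log_p = log_p.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    return -(reward * chosen_log_p).mean()

def token_level_kd_loss(student_logits, teacher_logits):
    # Conventional token-level KD: match the full teacher distribution at every position.
    log_p = F.log_softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")
```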
Provable Representation Learning for Imitation Learning via Bi-level Optimization
Title | Provable Representation Learning for Imitation Learning via Bi-level Optimization |
Authors | Anonymous |
Abstract | A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting, where trajectories from multiple experts are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization learns the joint representation and the “inner” optimization encodes the imitation learning setup and learns the task-specific parameters. We instantiate this framework for two imitation settings: behavior cloning and learning from observations alone. Theoretically, we show within our framework that representation learning can provably reduce the sample complexity of imitation learning in both settings. We also provide proof-of-concept experiments to verify our theoretical findings. |
Tasks | Imitation Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxnclHKDr |
https://openreview.net/pdf?id=HkxnclHKDr | |
PWC | https://paperswithcode.com/paper/provable-representation-learning-for |
Repo | |
Framework | |
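A toy sketch of the bi-level structure: the "inner" problem fits task-specific heads by behavior cloning on each expert's trajectories, and the "outer" problem updates the shared representation against the resulting losses. This first-order approximation, and all names and sizes, are illustrative assumptions rather than the paper's algorithm or analysis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = nn.Sequential(nn.Linear(10, 32), nn.ReLU())            # shared representation (outer)
heads = [nn.Linear(32, 4) for _ in range(3)]                    # one head per expert/task (inner)
outer_opt = torch.optim.Adam(feat.parameters(), lr=1e-3)

def inner_fit(head, states, actions, steps=20):
    # Inner problem: behavior cloning of this expert with the representation frozen.
    opt = torch.optim.SGD(head.parameters(), lr=1e-1)
    for _ in range(steps):
        loss = F.cross_entropy(head(feat(states).detach()), actions)
        opt.zero_grad(); loss.backward(); opt.step()

for epoch in range(5):
    outer_loss = 0.0
    for head in heads:
        states = torch.randn(64, 10)                             # toy expert trajectories
        actions = torch.randint(0, 4, (64,))
        inner_fit(head, states, actions)                         # learn task-specific parameters
        outer_loss = outer_loss + F.cross_entropy(head(feat(states)), actions)
    # Outer problem: improve the shared representation across all experts.
    outer_opt.zero_grad(); outer_loss.backward(); outer_opt.step()
```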
A Perturbation Analysis of Input Transformations for Adversarial Attacks
Title | A Perturbation Analysis of Input Transformations for Adversarial Attacks |
Authors | Anonymous |
Abstract | The existence of adversarial examples, i.e., inputs constructed from small changes to correctly predicted examples so as to intentionally cause mis-predictions, is one of the most significant challenges in neural network research today. Ironically, many new defenses are based on a simple observation: the adversarial inputs themselves are not robust, and small perturbations to the attacking input often recover the desired prediction. While the intuition is somewhat clear, a detailed understanding of this phenomenon is missing from the research literature. This paper presents a comprehensive experimental analysis of when and why perturbation defenses work, and of potential mechanisms that could explain their effectiveness (or ineffectiveness) in different settings. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkePU0VYDr |
https://openreview.net/pdf?id=rkePU0VYDr | |
PWC | https://paperswithcode.com/paper/a-perturbation-analysis-of-input |
Repo | |
Framework | |
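A small sketch of the observation the abstract starts from: apply small random perturbations to an adversarial input and measure how often the original prediction is recovered. The perturbation budget, number of trials, and model interface are assumptions for illustration.

```python
import torch

def recovery_rate(model, x_adv, y_true, eps=0.03, trials=50):
    """x_adv: (1, C, H, W) adversarial image; y_true: int, the correct label.
    Returns the fraction of small random perturbations that restore y_true."""
    model.eval()
    recovered = 0
    with torch.no_grad():
        for _ in range(trials):
            noise = torch.empty_like(x_adv).uniform_(-eps, eps)   # small random perturbation
            pred = model((x_adv + noise).clamp(0, 1)).argmax(dim=-1).item()
            recovered += int(pred == y_true)
    return recovered / trials
```

A recovery rate well above zero is exactly the brittleness of adversarial inputs that perturbation-based defenses exploit; the paper's analysis asks when and why this holds.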
Robustified Importance Sampling for Covariate Shift
Title | Robustified Importance Sampling for Covariate Shift |
Authors | Anonymous |
Abstract | In many learning problems, the training and testing data follow different distributions, and a particularly common situation is covariate shift. To correct for sampling biases, most approaches, including the popular kernel mean matching (KMM), focus on estimating the importance weights between the two distributions. Reweighting-based methods, however, are exposed to high variance when the distributional discrepancy is large. On the other hand, the alternative approach of using nonparametric regression (NR) incurs high bias when the training size is limited. In this paper, we propose and analyze a new estimator that systematically integrates the residuals of NR with KMM reweighting, based on a control-variate perspective. The proposed estimator is shown to either outperform or match the best-known existing rates for both KMM and NR, and thus is a robust combination of both estimators. Experiments show that our estimator works well in practice. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe0rkBYvS |
https://openreview.net/pdf?id=rJe0rkBYvS | |
PWC | https://paperswithcode.com/paper/robustified-importance-sampling-for-covariate |
Repo | |
Framework | |
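A rough sketch of the control-variate combination: predict on the test covariates with a regression fit on the training data, then correct with importance-weighted training residuals. Here a scikit-learn kernel ridge fit stands in for the paper's NR estimator, and the weights are assumed to come from KMM (a closed-form toy density ratio is used below); none of this is the paper's exact construction.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def robustified_estimate(x_train, y_train, x_test, weights):
    """Estimate E_test[Y] under covariate shift.
    weights: importance weights w(x_i) ~ p_test(x_i) / p_train(x_i) on training points."""
    m = KernelRidge(kernel="rbf", alpha=1.0).fit(x_train, y_train)
    nr_term = m.predict(x_test).mean()                                  # plug-in NR estimate
    residual_term = np.mean(weights * (y_train - m.predict(x_train)))   # reweighted residuals
    return nr_term + residual_term

rng = np.random.default_rng(0)
x_tr = rng.normal(0, 1, (200, 1)); y_tr = np.sin(x_tr[:, 0]) + 0.1 * rng.normal(size=200)
x_te = rng.normal(0.5, 1, (500, 1))                                      # shifted covariates
w = np.exp(-((x_tr[:, 0] - 0.5) ** 2 - x_tr[:, 0] ** 2) / 2)             # true density ratio for this toy shift
print(robustified_estimate(x_tr, y_tr, x_te, w))
```

Intuitively, the regression term carries most of the estimate (keeping variance low), while the reweighted residual term corrects whatever bias the regression has, which is the control-variate reading given in the abstract.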
Learning Space Partitions for Nearest Neighbor Search
Title | Learning Space Partitions for Nearest Neighbor Search |
Authors | Anonymous |
Abstract | Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces (Andoni et al. 2018b,c), we develop a new framework for building space partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner (Sanders and Schulz 2013) and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS (Aumuller et al. 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH. |
Tasks | graph partitioning, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkenmREFDr |
https://openreview.net/pdf?id=rkenmREFDr | |
PWC | https://paperswithcode.com/paper/learning-space-partitions-for-nearest |
Repo | |
Framework | |
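A compact sketch of the two-stage recipe: partition the dataset (k-means stands in here for the balanced KaHIP graph partition), train a classifier to predict a point's part, and at query time probe only the top-scoring parts. The cluster count, probe count, and MLP are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 32)).astype(np.float32)

# Stage 1: partition the dataset (stand-in for balanced graph partitioning with KaHIP).
parts = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(data)
# Stage 2: learn to map points (and future queries) to parts.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200).fit(data, parts)

def query(q, n_probes=3, k=10):
    # Stage 3: rank parts by classifier score and search only the best few bins.
    probs = clf.predict_proba(q.reshape(1, -1))[0]
    candidate_bins = clf.classes_[np.argsort(probs)[::-1][:n_probes]]
    cand = np.where(np.isin(parts, candidate_bins))[0]
    d = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]            # indices of approximate nearest neighbors

print(query(rng.normal(size=32).astype(np.float32)))
```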
Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces
Title | Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces |
Authors | Anonymous |
Abstract | We present an approach for efficiently training Gaussian Mixture Models (GMMs) with Stochastic Gradient Descent (SGD) on large amounts of high-dimensional data (e.g., images). In such a scenario, SGD is strongly superior in terms of execution time and memory usage, although it is conceptually more complex than the traditional Expectation-Maximization (EM) algorithm. For enabling SGD training, we propose three novel ideas: First, we show that minimizing an upper bound to the GMM log-likelihood instead of the full one is feasible and numerically much more stable in high-dimensional spaces. Second, we propose a new regularizer that prevents SGD from converging to pathological local minima. Lastly, we present a simple method for enforcing the constraints inherent to GMM training when using SGD. We also propose an SGD-compatible simplification of the full GMM model based on local principal directions, which avoids excessive memory use in high-dimensional spaces due to the quadratic growth of covariance matrices. Experiments on several standard image datasets show the validity of our approach, and we provide a publicly available TensorFlow implementation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeKcgHFvS |
https://openreview.net/pdf?id=HyeKcgHFvS | |
PWC | https://paperswithcode.com/paper/gradient-based-training-of-gaussian-mixture |
Repo | |
Framework | |
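A minimal sketch of SGD-friendly GMM training in the spirit of the abstract: the log-sum-exp over components is replaced by the max component (numerically stable in high dimensions), and the constraints are kept by construction through softplus/softmax re-parameterization. Diagonal covariances only; the paper's regularizer and local-principal-direction simplification are omitted, and all hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

K, D = 8, 784
mu = torch.randn(K, D, requires_grad=True)
log_var = torch.zeros(K, D, requires_grad=True)
pi_logits = torch.zeros(K, requires_grad=True)
opt = torch.optim.SGD([mu, log_var, pi_logits], lr=1e-2)

def max_component_nll(x):
    var = F.softplus(log_var) + 1e-6                      # positive variances by construction
    log_pi = F.log_softmax(pi_logits, dim=0)              # mixture weights on the simplex by construction
    # Per-component diagonal-Gaussian log densities: shape (B, K).
    diff = x.unsqueeze(1) - mu.unsqueeze(0)
    log_comp = -0.5 * ((diff ** 2 / var).sum(-1) + torch.log(var).sum(-1)
                       + D * torch.log(torch.tensor(2 * torch.pi)))
    # Max-component surrogate instead of logsumexp over components.
    return -(log_pi + log_comp).max(dim=1).values.mean()

for step in range(100):
    x = torch.rand(128, D)                                 # stand-in for batches of images
    loss = max_component_nll(x)
    opt.zero_grad(); loss.backward(); opt.step()
```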
Text Embedding Bank Module for Detailed Image Paragraph Caption
Title | Text Embedding Bank Module for Detailed Image Paragraph Caption |
Authors | Anonymous |
Abstract | Image paragraph captioning is the task of automatically generating multiple sentences that describe an image in fine-grained and coherent text. Existing deep learning-based models for image captioning typically consist of an image encoder that extracts visual features and a language-model decoder, and have shown promising results in generating single high-level sentences. However, only a word-level scalar guiding signal is available when the image encoder is optimized to extract visual features. The inconsistency between the parallel extraction of visual features and the sequential text supervision limits success when the generated text is long (more than 50 words). In this paper, we propose a new module, called the Text Embedding Bank (TEB) module, to address this problem for image paragraph captioning. This module uses the paragraph vector model to learn fixed-length feature representations from a variable-length paragraph. We refer to the fixed-length feature as the TEB. The TEB module plays two roles that benefit paragraph captioning performance. First, it acts as a form of global and coherent deep supervision to regularize visual feature extraction in the image encoder. Second, it acts as a distributed memory that provides features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to two existing state-of-the-art methods achieves a new state-of-the-art result by a large margin on the Visual Genome paragraph captioning dataset. |
Tasks | Image Captioning, Image Paragraph Captioning, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sygt9yBtPS |
https://openreview.net/pdf?id=Sygt9yBtPS | |
PWC | https://paperswithcode.com/paper/text-embedding-bank-module-for-detailed-image |
Repo | |
Framework | |
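A hedged sketch of how a fixed-length paragraph embedding (the "TEB") can play the two roles described above: an auxiliary loss supervising the image encoder's global feature, and extra context concatenated to every decoder step. A mean of word embeddings stands in for the paragraph vector model, and all module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, emb_dim, feat_dim = 1000, 128, 256
word_emb = nn.Embedding(vocab, emb_dim)
img_encoder = nn.Linear(2048, feat_dim)            # stand-in for a CNN global feature
teb_proj = nn.Linear(emb_dim, feat_dim)
decoder = nn.GRU(emb_dim + feat_dim, 256, batch_first=True)

def step(cnn_feat, paragraph_tokens, caption_tokens):
    teb = word_emb(paragraph_tokens).mean(dim=1)            # fixed-length paragraph feature (stand-in)
    v = img_encoder(cnn_feat)
    aux_loss = F.mse_loss(v, teb_proj(teb))                  # role 1: TEB as global deep supervision
    # Role 2: TEB as distributed memory, concatenated to every decoder input step.
    dec_in = torch.cat([word_emb(caption_tokens),
                        teb_proj(teb).unsqueeze(1).expand(-1, caption_tokens.size(1), -1)], dim=-1)
    out, _ = decoder(dec_in)
    return out, aux_loss

out, aux = step(torch.randn(4, 2048), torch.randint(0, vocab, (4, 60)),
                torch.randint(0, vocab, (4, 20)))
print(out.shape, aux.item())
```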
Generative Imputation and Stochastic Prediction
Title | Generative Imputation and Stochastic Prediction |
Authors | Anonymous |
Abstract | In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have mostly been concerned with filling in missing values. However, the existence of missing values is synonymous with uncertainties not only over the distribution of missing values but also over target class assignments, which require careful consideration. In this paper, we propose a simple and effective method for imputing missing features and estimating the distribution of target assignments given incomplete data. To make imputations, we train a generator network to produce imputations that a discriminator network is tasked to distinguish. Following this, a predictor network is trained using the imputed samples from the generator network to capture the classification uncertainties and make predictions accordingly. The proposed method is evaluated on the CIFAR-10 image dataset as well as three real-world tabular classification datasets, under different missingness rates and structures. Our experimental results show the effectiveness of the proposed method in generating imputations as well as providing estimates of class uncertainties in a classification task when faced with missing values. |
Tasks | Imputation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1em9h4KDS |
https://openreview.net/pdf?id=B1em9h4KDS | |
PWC | https://paperswithcode.com/paper/generative-imputation-and-stochastic-1 |
Repo | |
Framework | |
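A small sketch of the inference-time use of such a model: a noise-conditioned generator fills the missing entries while observed values are kept, and a classifier is evaluated over several imputations to expose class uncertainty. Adversarial training of the generator is omitted here, and the network shapes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

d = 16
gen = nn.Sequential(nn.Linear(3 * d, 64), nn.ReLU(), nn.Linear(64, d))   # input: (x, mask, z)
clf = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 3))

def impute(x, mask):
    """x: (B, d) with zeros at missing positions; mask: (B, d), 1 = observed."""
    z = torch.randn_like(x)                                  # source of imputation randomness
    filled = gen(torch.cat([x, mask, z], dim=-1))
    return mask * x + (1 - mask) * filled                    # keep observed values untouched

def predictive_distribution(x, mask, samples=20):
    # Evaluate the classifier across several imputations to expose class uncertainty.
    with torch.no_grad():
        probs = torch.stack([clf(impute(x, mask)).softmax(-1) for _ in range(samples)])
    return probs.mean(0), probs.std(0)                       # mean prediction and its spread

x = torch.randn(5, d); mask = (torch.rand(5, d) > 0.3).float()
mean_p, std_p = predictive_distribution(x * mask, mask)
print(mean_p.shape, std_p.shape)
```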
On The Difficulty of Warm-Starting Neural Network Training
Title | On The Difficulty of Warm-Starting Neural Network Training |
Authors | Anonymous |
Abstract | In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate—to “warm start” the optimization rather than initialize from scratch—and see reductions in wall-clock time. However, in practice this warm-starting seems to yield poorer generalization performance than models that have fresh random initializations, even though the final training losses are similar. While it appears that some hyperparameter settings allow a practitioner to close this generalization gap, they seem to only do so in regimes that damage the wall-clock gains of the warm start. Nevertheless, it is highly desirable to be able to warm-start neural network training, as it would dramatically reduce the resource usage associated with the construction of performant deep learning systems. In this work, we take a closer look at this empirical phenomenon and try to understand when and how it occurs. Although the present investigation did not lead to a solution, we hope that a thorough articulation of the problem will spur new research that may lead to improved methods that consume fewer resources during training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryl0cAVtPH |
https://openreview.net/pdf?id=ryl0cAVtPH | |
PWC | https://paperswithcode.com/paper/on-the-difficulty-of-warm-starting-neural-1 |
Repo | |
Framework | |
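A toy sketch of the comparison the abstract describes: when a second batch of data arrives, either continue from the previous solution (warm start) or re-initialize and train from scratch, then compare generalization. The toy data, model, and hyperparameters are placeholders; the point is the protocol, not the numbers.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(model, xs, ys, epochs=50, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(model(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.randn(1000, 20); y = (x[:, 0] > 0).long()           # toy data stream
first, second = (x[:500], y[:500]), (x, y)                     # data arrive in two stages

warm = fit(make_model(), *first)                               # stage 1
warm = fit(copy.deepcopy(warm), *second)                       # warm start on stage 2
fresh = fit(make_model(), *second)                             # fresh initialization on all data

x_te = torch.randn(2000, 20); y_te = (x_te[:, 0] > 0).long()
for name, m in [("warm", warm), ("fresh", fresh)]:
    acc = (m(x_te).argmax(-1) == y_te).float().mean().item()
    print(name, round(acc, 3))
```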
Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget
Title | Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget |
Authors | Anonymous |
Abstract | Active learning (AL) aims to integrate data labeling and model training in a unified way, and to minimize the labeling budget by prioritizing the selection of high-value data that can best improve model performance. Readily available unlabeled data are used to evaluate selection mechanisms, but are not used for model training in conventional pool-based AL. To minimize the labeling budget, we unify unlabeled sample selection and model training based on two principles. First, we exploit both labeled and unlabeled data using semi-supervised learning (SSL) to distill information from unlabeled data that improves representation learning and sample selection. Second, we propose a simple yet effective selection metric that is coherent with the training objective, such that the selected samples are effective at improving model performance. Our experimental results demonstrate superior performance with our proposed principles for limited labeled data compared to alternative AL and SSL combinations. In addition, we study the AL phenomenon of “cold start”, which is becoming an increasingly important factor in enabling optimal unification of data labeling, model training, and labeling budget minimization. We propose a measure that is found to be empirically correlated with the AL target loss. This measure can be used to assist in determining the proper start size. |
Tasks | Active Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJl8SkBYPr |
https://openreview.net/pdf?id=HJl8SkBYPr | |
PWC | https://paperswithcode.com/paper/consistency-based-semi-supervised-active |
Repo | |
Framework | |
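A sketch of one consistency-based acquisition score in the spirit of the abstract: unlabeled samples whose predictions vary most across random augmentations are prioritized for labeling, so the selection signal is coherent with a consistency-style SSL training objective. The exact metric in the paper may differ; the interface and variance-based score here are assumptions.

```python
import torch

def consistency_scores(model, unlabeled, augment, n_views=8):
    """unlabeled: (N, ...) batch; augment: stochastic augmentation function.
    Higher score = less consistent predictions = more valuable to label (under this heuristic)."""
    model.eval()
    with torch.no_grad():
        views = torch.stack([model(augment(unlabeled)).softmax(-1) for _ in range(n_views)])
    return views.var(dim=0).sum(dim=-1)              # per-sample predictive variance across views

def select_batch(model, unlabeled, augment, budget=64):
    scores = consistency_scores(model, unlabeled, augment)
    return torch.topk(scores, budget).indices        # indices to send for annotation
```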
Multilingual Alignment of Contextual Word Representations
Title | Multilingual Alignment of Contextual Word Representations |
Authors | Anonymous |
Abstract | We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in understanding and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching fully-supervised models for Bulgarian and Greek. Further, using non-contextual and contextual versions of word retrieval, we show that BERT outperforms fastText while being able to distinguish between multiple uses of a word, suggesting that pre-training subsumes word vectors for learning cross-lingual signals. Finally, we use the contextual word retrieval task to gain a better understanding of the strengths and weaknesses of multilingual pre-training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xCMyBtPS |
https://openreview.net/pdf?id=r1xCMyBtPS | |
PWC | https://paperswithcode.com/paper/multilingual-alignment-of-contextual-word |
Repo | |
Framework | |
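A hedged sketch of one simple alignment recipe: given contextual embeddings of word-aligned source/target token pairs from parallel data, fit a map that pulls aligned pairs together before doing cross-lingual word retrieval. The paper's procedure may instead fine-tune the encoder itself; the linear map over precomputed vectors below is a simplifying assumption.

```python
import torch
import torch.nn as nn

dim = 768
src = torch.randn(10000, dim)        # contextual embeddings of aligned source-language tokens
tgt = torch.randn(10000, dim)        # embeddings of their aligned target-language tokens

W = nn.Linear(dim, dim, bias=False)
opt = torch.optim.Adam(W.parameters(), lr=1e-3)
for step in range(200):
    idx = torch.randint(0, src.size(0), (256,))
    loss = ((W(src[idx]) - tgt[idx]) ** 2).sum(-1).mean()    # pull aligned pairs together
    opt.zero_grad(); loss.backward(); opt.step()

# After fitting, W can be applied to source-language contextual vectors before
# nearest-neighbor word retrieval against target-language vectors.
```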
SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
Title | SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses |
Authors | Anonymous |
Abstract | Unsupervised domain adaptive object detection aims to learn a robust detector under domain shift, where the training (source) domain is label-rich with bounding box annotations, while the testing (target) domain is label-agnostic and the feature distributions between the training and testing domains are dissimilar or even totally different. In this paper, we propose a gradient detach based Stacked Complementary Losses (SCL) method that uses the detection objective (cross-entropy and smooth L1 regression) as the primary objective, and inserts several auxiliary losses at different network stages to utilize information from the complement data (target images), which can be effective in adapting model parameters to both the source and target domains. A gradient detach operation is applied between the detection and context sub-networks during training to force the networks to learn discriminative representations. We argue that conventional training with the primary objective mainly leverages information from the source domain for maximizing the likelihood and ignores the complement data in the shallow layers of the networks, which leads to insufficient integration between the domains. Thus, our proposed method yields a more syncretic adaptation learning process. We conduct comprehensive experiments on seven datasets, and the results demonstrate that our method outperforms state-of-the-art methods by a large margin. For instance, from Cityscapes to FoggyCityscapes, we achieve 37.9% mAP, outperforming the previous best method, Strong-Weak, by 3.6%. |
Tasks | Object Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx5_hNFwr |
https://openreview.net/pdf?id=rJx5_hNFwr | |
PWC | https://paperswithcode.com/paper/scl-towards-accurate-domain-adaptive-object-1 |
Repo | |
Framework | |
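A wiring-level sketch of the gradient-detach pattern: auxiliary (domain-classification) losses are attached at several backbone stages alongside the primary detection objective, and the context branch receives a detached copy of the features so its gradients do not flow back into the detector. The stand-in backbone, heads, and loss weights are assumptions, not the SCL architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

stage1 = nn.Conv2d(3, 16, 3, padding=1)
stage2 = nn.Conv2d(16, 32, 3, padding=1)
det_head = nn.Linear(32, 10)                                         # stand-in for the detection head
aux_heads = nn.ModuleList([nn.Linear(16, 2), nn.Linear(32, 2)])      # per-stage domain classifiers
context_head = nn.Linear(32, 8)                                      # context sub-network

def forward_losses(img, labels, domain):
    f1 = F.relu(stage1(img))
    f2 = F.relu(stage2(f1))
    pooled1, pooled2 = f1.mean(dim=(2, 3)), f2.mean(dim=(2, 3))
    det_loss = F.cross_entropy(det_head(pooled2), labels)            # primary objective
    aux_loss = (F.cross_entropy(aux_heads[0](pooled1), domain) +
                F.cross_entropy(aux_heads[1](pooled2), domain))      # stacked complementary losses
    # Gradient detach: the context branch sees the features but cannot back-propagate into them.
    ctx = context_head(pooled2.detach())
    # (In the full method, ctx would feed further context-based losses of its own.)
    return det_loss + 0.1 * aux_loss, ctx

loss, ctx = forward_losses(torch.randn(4, 3, 64, 64), torch.randint(0, 10, (4,)),
                           torch.randint(0, 2, (4,)))
loss.backward()
```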
Distributed Online Optimization with Long-Term Constraints
Title | Distributed Online Optimization with Long-Term Constraints |
Authors | Anonymous |
Abstract | We consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is to minimize the system-wide loss accumulated over time. We propose a decentralized algorithm with regret and cumulative constraint violation in ${\cal O}(T^{\max\{c,1-c\}})$ and ${\cal O}(T^{1-c/2})$, respectively, for any $c\in (0,1)$, where $T$ is the time horizon. When the loss functions are strongly convex, we establish improved regret and constraint violation upper bounds in ${\cal O}(\log(T))$ and ${\cal O}(\sqrt{T\log(T)})$. These regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions). In the case of bandit feedback, the proposed algorithms achieve a regret and constraint violation in ${\cal O}(T^{\max\{c,1-c/3\}})$ and ${\cal O}(T^{1-c/2})$ for any $c\in (0,1)$. We numerically illustrate the performance of our algorithms for the particular case of distributed online regularized linear regression problems. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syee1pVtDS |
https://openreview.net/pdf?id=Syee1pVtDS | |
PWC | https://paperswithcode.com/paper/distributed-online-optimization-with-long |
Repo | |
Framework | |
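A structural sketch of a decentralized online algorithm of this kind: each node mixes its iterate with its neighbors' (consensus over the communication graph), takes a gradient step on its current loss, and updates a dual variable for the long-term constraint. The step sizes, mixing matrix, and primal-dual update are illustrative assumptions, not the paper's algorithm or its guarantees.

```python
import numpy as np

n, d, T = 4, 5, 200
rng = np.random.default_rng(0)
X = np.zeros((n, d))                 # one iterate per node
lam = np.zeros(n)                    # dual variables for the long-term constraint

def g(x):                            # shared constraint: ||x||_1 - 1 <= 0
    return np.abs(x).sum() - 1.0

for t in range(1, T + 1):
    W = np.full((n, n), 1.0 / n)     # stand-in for a doubly stochastic, time-varying mixing matrix
    A = rng.normal(size=(n, d)); b = rng.normal(size=n)   # node-local losses f_i(x) = (a_i . x - b_i)^2
    eta = 1.0 / np.sqrt(t)
    X_mixed = W @ X                  # consensus step over the communication graph
    for i in range(n):
        grad_f = 2 * (A[i] @ X_mixed[i] - b[i]) * A[i]
        grad = grad_f + lam[i] * np.sign(X_mixed[i])       # subgradient of lam * g
        X[i] = X_mixed[i] - eta * grad
        lam[i] = max(0.0, lam[i] + eta * g(X[i]))          # dual ascent on the constraint violation
print(X.mean(axis=0))
```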
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
Title | Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents |
Authors | Anonymous |
Abstract | As deep reinforcement learning driven by visual perception becomes more widely used, there is a growing need to better understand and probe the learned agents. Understanding the decision-making process and its relationship to visual inputs can be very valuable for identifying problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting for understanding the situational awareness of the system, as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results on the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications. |
Tasks | Autonomous Driving, Decision Making |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylvYaNYDH |
https://openreview.net/pdf?id=rylvYaNYDH | |
PWC | https://paperswithcode.com/paper/finding-and-visualizing-weaknesses-of-deep-1 |
Repo | |
Framework | |
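A short sketch of the latent-space search described above: with a pre-trained generator over environment states and a frozen agent, optimize the latent code so the decoded state maximizes a chosen target, for example the probability of a specific action. The generator and policy interfaces, step counts, and objective choice are assumptions for illustration.

```python
import torch

def synthesize_state(generator, agent_policy, target_action, steps=200, lr=0.05, z_dim=64):
    """generator: z -> state image; agent_policy: state -> action logits (both frozen)."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        state = generator(z)
        logits = agent_policy(state)
        # Maximize the log-probability that the agent takes `target_action` in this state.
        loss = -torch.log_softmax(logits, dim=-1)[0, target_action]
        opt.zero_grad(); loss.backward(); opt.step()
    return generator(z).detach()      # a synthesized state of interest, ready for inspection
```

Swapping the objective (e.g., for an extreme value estimate instead of an action probability) yields the high-reward or low-reward "critical states" mentioned in the abstract.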