Paper Group NANR 60
LAVAE: Disentangling Location and Appearance
Title | LAVAE: Disentangling Location and Appearance |
Authors | Anonymous |
Abstract | We propose a probabilistic generative model for unsupervised learning of structured, interpretable, object-based representations of visual scenes. We use amortized variational inference to train the generative model end-to-end. The learned representations of object location and appearance are fully disentangled, and objects are represented independently of each other in the latent space. Unlike previous approaches that disentangle location and appearance, ours generalizes seamlessly to scenes with many more objects than encountered in the training regime. We evaluate the proposed model on multi-MNIST and multi-dSprites data sets. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxDheHFDr |
https://openreview.net/pdf?id=HkxDheHFDr | |
PWC | https://paperswithcode.com/paper/lavae-disentangling-location-and-appearance |
Repo | |
Framework | |
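A minimal sketch of the kind of location/appearance factorization the abstract describes: each object gets a separate `z_where` (position) and `z_what` (appearance) latent, objects are decoded independently, and the scene is composed by pasting glimpses onto a canvas. The module names, sizes, and the naive (non-differentiable) placement below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyLocAppDecoder(nn.Module):
    def __init__(self, z_what_dim=8, glimpse=12, canvas=48):
        super().__init__()
        self.glimpse, self.canvas = glimpse, canvas
        # Appearance decoder: z_what -> a small object glimpse, independent of location.
        self.what_dec = nn.Sequential(
            nn.Linear(z_what_dim, 64), nn.ReLU(),
            nn.Linear(64, glimpse * glimpse), nn.Sigmoid())

    def forward(self, z_where, z_what):
        # z_where: (B, K, 2) positions in [0, 1]; z_what: (B, K, z_what_dim).
        B, K, _ = z_where.shape
        scene = torch.zeros(B, self.canvas, self.canvas)
        glimpses = self.what_dec(z_what).view(B, K, self.glimpse, self.glimpse)
        for b in range(B):
            for k in range(K):
                # Paste object k's glimpse at its (top-left) location on the canvas.
                x, y = (z_where[b, k] * (self.canvas - self.glimpse)).long()
                scene[b, y:y + self.glimpse, x:x + self.glimpse] += glimpses[b, k]
        return scene.clamp(0, 1)

dec = TinyLocAppDecoder()
img = dec(torch.rand(2, 3, 2), torch.randn(2, 3, 8))  # 2 scenes, 3 objects each
print(img.shape)  # torch.Size([2, 48, 48])
```

Because each object's appearance is decoded without knowing its location, adding more objects at test time only means sampling more (z_where, z_what) pairs, which is one way to read the abstract's claim about generalizing to more objects than seen in training.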
Proactive Sequence Generator via Knowledge Acquisition
Title | Proactive Sequence Generator via Knowledge Acquisition |
Authors | Anonymous |
Abstract | Sequence-to-sequence models such as transformers, which are now being used in a wide variety of NLP tasks, typically need to have very high capacity in order to perform well. Unfortunately, in production, memory size and inference speed are both strictly constrained. To address this problem, Knowledge Distillation (KD), a technique for training small models to mimic larger pre-trained models, has drawn considerable attention. The KD approach essentially attempts to maximize recall, i.e., to rank the teacher model's top-k tokens as high as possible, whereas precision is more important for sequence generation because of exposure bias. Motivated by this, we develop Knowledge Acquisition (KA), where student models receive log q(y_t | y_{<t}, x) as a reward when producing the next token y_t given the previous tokens y_{<t} and the source sentence x. We demonstrate the effectiveness of our approach on the WMT’17 De-En and IWSLT’15 Th-En translation tasks, with experimental results showing that our approach gains +0.7 to +1.1 BLEU over token-level knowledge distillation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJehf0VKwS |
https://openreview.net/pdf?id=rJehf0VKwS | |
PWC | https://paperswithcode.com/paper/proactive-sequence-generator-via-knowledge |
Repo | |
Framework | |
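A hedged sketch of the reward signal described above, read as a REINFORCE-style objective: the student samples token y_t and is rewarded with the teacher's log q(y_t | y_{<t}, x), contrasted with conventional token-level KD that matches the full teacher distribution. The function names and the exact surrogate loss are illustrative assumptions, not the authors' training recipe.

```python
import torch
import torch.nn.functional as F

def ka_policy_gradient_loss(student_logits, teacher_logits, sampled_tokens):
    """student_logits, teacher_logits: (B, T, V); sampled_tokens: (B, T) tokens
    sampled from the student itself (exposure to its own predictions)."""
    log_p = F.log_softmax(student_logits, dim=-1)          # student policy
    log_q = F.log_softmax(teacher_logits, dim=-1)          # teacher scorer
    # Reward = teacher log-probability of the student's own token choices.
    reward = log_q.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1).detach()
    # REINFORCE-style surrogate: push up student log-prob of well-rewarded tokens.
    chosen_log_p = log_p.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    return -(reward * chosen_log_p).mean()

def token_level_kd_loss(student_logits, teacher_logits):
    # Conventional token-level KD: match the full teacher distribution at every position.
    log_p = F.log_softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")
```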
Provable Representation Learning for Imitation Learning via Bi-level Optimization
Title | Provable Representation Learning for Imitation Learning via Bi-level Optimization |
Authors | Anonymous |
Abstract | A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting, where trajectories from multiple experts are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization learns the joint representation and the “inner” optimization encodes the imitation learning setup and learns the task-specific parameters. We instantiate this framework for two imitation settings: behavior cloning and learning from observations alone. Theoretically, we show within our framework that representation learning can provably reduce the sample complexity of imitation learning in both settings. We also provide proof-of-concept experiments to verify our theoretical findings. |
Tasks | Imitation Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxnclHKDr |
https://openreview.net/pdf?id=HkxnclHKDr | |
PWC | https://paperswithcode.com/paper/provable-representation-learning-for |
Repo | |
Framework | |
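A toy sketch of the bi-level structure: the "inner" problem fits task-specific heads by behavior cloning on each expert's trajectories, and the "outer" problem updates the shared representation against the resulting losses. This first-order approximation, and all names and sizes, are illustrative assumptions rather than the paper's algorithm or analysis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = nn.Sequential(nn.Linear(10, 32), nn.ReLU())            # shared representation (outer)
heads = [nn.Linear(32, 4) for _ in range(3)]                    # one head per expert/task (inner)
outer_opt = torch.optim.Adam(feat.parameters(), lr=1e-3)

def inner_fit(head, states, actions, steps=20):
    # Inner problem: behavior cloning of this expert with the representation frozen.
    opt = torch.optim.SGD(head.parameters(), lr=1e-1)
    for _ in range(steps):
        loss = F.cross_entropy(head(feat(states).detach()), actions)
        opt.zero_grad(); loss.backward(); opt.step()

for epoch in range(5):
    outer_loss = 0.0
    for head in heads:
        states = torch.randn(64, 10)                             # toy expert trajectories
        actions = torch.randint(0, 4, (64,))
        inner_fit(head, states, actions)                         # learn task-specific parameters
        outer_loss = outer_loss + F.cross_entropy(head(feat(states)), actions)
    # Outer problem: improve the shared representation across all experts.
    outer_opt.zero_grad(); outer_loss.backward(); outer_opt.step()
```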
A Perturbation Analysis of Input Transformations for Adversarial Attacks
Title | A Perturbation Analysis of Input Transformations for Adversarial Attacks |
Authors | Anonymous |
Abstract | The existence of adversarial examples, i.e., inputs constructed from small changes to correctly predicted examples so as to intentionally cause mis-predictions, is one of the most significant challenges in neural network research today. Ironically, many new defenses are based on a simple observation: the adversarial inputs themselves are not robust, and small perturbations to the attacking input often recover the desired prediction. While the intuition is somewhat clear, a detailed understanding of this phenomenon is missing from the research literature. This paper presents a comprehensive experimental analysis of when and why perturbation defenses work, and of potential mechanisms that could explain their effectiveness (or ineffectiveness) in different settings. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkePU0VYDr |
https://openreview.net/pdf?id=rkePU0VYDr | |
PWC | https://paperswithcode.com/paper/a-perturbation-analysis-of-input |
Repo | |
Framework | |
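A small sketch of the observation the abstract starts from: apply small random perturbations to an adversarial input and measure how often the original prediction is recovered. The perturbation budget, number of trials, and model interface are assumptions for illustration.

```python
import torch

def recovery_rate(model, x_adv, y_true, eps=0.03, trials=50):
    """x_adv: (1, C, H, W) adversarial image; y_true: int, the correct label.
    Returns the fraction of small random perturbations that restore y_true."""
    model.eval()
    recovered = 0
    with torch.no_grad():
        for _ in range(trials):
            noise = torch.empty_like(x_adv).uniform_(-eps, eps)   # small random perturbation
            pred = model((x_adv + noise).clamp(0, 1)).argmax(dim=-1).item()
            recovered += int(pred == y_true)
    return recovered / trials
```

A recovery rate well above zero is exactly the brittleness of adversarial inputs that perturbation-based defenses exploit; the paper's analysis asks when and why this holds.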
Robustified Importance Sampling for Covariate Shift
Title | Robustified Importance Sampling for Covariate Shift |
Authors | Anonymous |
Abstract | In many learning problems, the training and testing data follow different distributions, and a particularly common situation is covariate shift. To correct for sampling biases, most approaches, including the popular kernel mean matching (KMM), focus on estimating the importance weights between the two distributions. Reweighting-based methods, however, are exposed to high variance when the distributional discrepancy is large. On the other hand, the alternative approach of using nonparametric regression (NR) incurs high bias when the training size is limited. In this paper, we propose and analyze a new estimator that systematically integrates the residuals of NR with KMM reweighting, based on a control-variate perspective. The proposed estimator is shown to either outperform or match the best-known existing rates for both KMM and NR, and thus is a robust combination of both estimators. Experiments show that our estimator works well in practice. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe0rkBYvS |
https://openreview.net/pdf?id=rJe0rkBYvS | |
PWC | https://paperswithcode.com/paper/robustified-importance-sampling-for-covariate |
Repo | |
Framework | |
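A rough sketch of the control-variate combination: predict on the test covariates with a regression fit on the training data, then correct with importance-weighted training residuals. Here a scikit-learn kernel ridge fit stands in for the paper's NR estimator, and the weights are assumed to come from KMM (a closed-form toy density ratio is used below); none of this is the paper's exact construction.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def robustified_estimate(x_train, y_train, x_test, weights):
    """Estimate E_test[Y] under covariate shift.
    weights: importance weights w(x_i) ~ p_test(x_i) / p_train(x_i) on training points."""
    m = KernelRidge(kernel="rbf", alpha=1.0).fit(x_train, y_train)
    nr_term = m.predict(x_test).mean()                                  # plug-in NR estimate
    residual_term = np.mean(weights * (y_train - m.predict(x_train)))   # reweighted residuals
    return nr_term + residual_term

rng = np.random.default_rng(0)
x_tr = rng.normal(0, 1, (200, 1)); y_tr = np.sin(x_tr[:, 0]) + 0.1 * rng.normal(size=200)
x_te = rng.normal(0.5, 1, (500, 1))                                      # shifted covariates
w = np.exp(-((x_tr[:, 0] - 0.5) ** 2 - x_tr[:, 0] ** 2) / 2)             # true density ratio for this toy shift
print(robustified_estimate(x_tr, y_tr, x_te, w))
```

Intuitively, the regression term carries most of the estimate (keeping variance low), while the reweighted residual term corrects whatever bias the regression has, which is the control-variate reading given in the abstract.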
Learning Space Partitions for Nearest Neighbor Search
Title | Learning Space Partitions for Nearest Neighbor Search |
Authors | Anonymous |
Abstract | Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces (Andoni et al. 2018b,c), we develop a new framework for building space partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner (Sanders and Schulz 2013) and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS (Aumuller et al. 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH. |
Tasks | graph partitioning, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkenmREFDr |
https://openreview.net/pdf?id=rkenmREFDr | |
PWC | https://paperswithcode.com/paper/learning-space-partitions-for-nearest |
Repo | |
Framework | |
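A compact sketch of the two-stage recipe: partition the dataset (k-means stands in here for the balanced KaHIP graph partition), train a classifier to predict a point's part, and at query time probe only the top-scoring parts. The cluster count, probe count, and MLP are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 32)).astype(np.float32)

# Stage 1: partition the dataset (stand-in for balanced graph partitioning with KaHIP).
parts = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(data)
# Stage 2: learn to map points (and future queries) to parts.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200).fit(data, parts)

def query(q, n_probes=3, k=10):
    # Stage 3: rank parts by classifier score and search only the best few bins.
    probs = clf.predict_proba(q.reshape(1, -1))[0]
    candidate_bins = clf.classes_[np.argsort(probs)[::-1][:n_probes]]
    cand = np.where(np.isin(parts, candidate_bins))[0]
    d = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]            # indices of approximate nearest neighbors

print(query(rng.normal(size=32).astype(np.float32)))
```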
Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces
Title | Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces |
Authors | Anonymous |
Abstract | We present an approach for efficiently training Gaussian Mixture Models (GMMs) with Stochastic Gradient Descent (SGD) on large amounts of high-dimensional data (e.g., images). In such a scenario, SGD is strongly superior in terms of execution time and memory usage, although it is conceptually more complex than the traditional Expectation-Maximization (EM) algorithm. For enabling SGD training, we propose three novel ideas: First, we show that minimizing an upper bound to the GMM log-likelihood instead of the full one is feasible and numerically much more stable in high-dimensional spaces. Second, we propose a new regularizer that prevents SGD from converging to pathological local minima. Lastly, we present a simple method for enforcing the constraints inherent to GMM training when using SGD. We also propose an SGD-compatible simplification of the full GMM model based on local principal directions, which avoids excessive memory use in high-dimensional spaces due to the quadratic growth of covariance matrices. Experiments on several standard image datasets show the validity of our approach, and we provide a publicly available TensorFlow implementation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeKcgHFvS |
https://openreview.net/pdf?id=HyeKcgHFvS | |
PWC | https://paperswithcode.com/paper/gradient-based-training-of-gaussian-mixture |
Repo | |
Framework | |
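A minimal sketch of SGD-friendly GMM training in the spirit of the abstract: the log-sum-exp over components is replaced by the max component (numerically stable in high dimensions), and the constraints are kept by construction through softplus/softmax re-parameterization. Diagonal covariances only; the paper's regularizer and local-principal-direction simplification are omitted, and all hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

K, D = 8, 784
mu = torch.randn(K, D, requires_grad=True)
log_var = torch.zeros(K, D, requires_grad=True)
pi_logits = torch.zeros(K, requires_grad=True)
opt = torch.optim.SGD([mu, log_var, pi_logits], lr=1e-2)

def max_component_nll(x):
    var = F.softplus(log_var) + 1e-6                      # positive variances by construction
    log_pi = F.log_softmax(pi_logits, dim=0)              # mixture weights on the simplex by construction
    # Per-component diagonal-Gaussian log densities: shape (B, K).
    diff = x.unsqueeze(1) - mu.unsqueeze(0)
    log_comp = -0.5 * ((diff ** 2 / var).sum(-1) + torch.log(var).sum(-1)
                       + D * torch.log(torch.tensor(2 * torch.pi)))
    # Max-component surrogate instead of logsumexp over components.
    return -(log_pi + log_comp).max(dim=1).values.mean()

for step in range(100):
    x = torch.rand(128, D)                                 # stand-in for batches of images
    loss = max_component_nll(x)
    opt.zero_grad(); loss.backward(); opt.step()
```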
Text Embedding Bank Module for Detailed Image Paragraph Caption
Title | Text Embedding Bank Module for Detailed Image Paragraph Caption |
Authors | Anonymous |
Abstract | Image paragraph captioning is the task of automatically generating multiple sentences that describe an image in fine-grained and coherent text. Existing deep learning-based models for image captioning typically consist of an image encoder that extracts visual features and a language-model decoder, and have shown promising results in generating single high-level sentences. However, only a word-level scalar guiding signal is available when the image encoder is optimized to extract visual features. The inconsistency between the parallel extraction of visual features and the sequential text supervision limits success when the generated text is long (more than 50 words). In this paper, we propose a new module, called the Text Embedding Bank (TEB) module, to address this problem for image paragraph captioning. This module uses the paragraph vector model to learn fixed-length feature representations from a variable-length paragraph. We refer to the fixed-length feature as the TEB. The TEB module plays two roles that benefit paragraph captioning performance. First, it acts as a form of global and coherent deep supervision to regularize visual feature extraction in the image encoder. Second, it acts as a distributed memory that provides features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to two existing state-of-the-art methods achieves a new state-of-the-art result by a large margin on the Visual Genome paragraph captioning dataset. |
Tasks | Image Captioning, Image Paragraph Captioning, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sygt9yBtPS |
https://openreview.net/pdf?id=Sygt9yBtPS | |
PWC | https://paperswithcode.com/paper/text-embedding-bank-module-for-detailed-image |
Repo | |
Framework | |
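A hedged sketch of how a fixed-length paragraph embedding (the "TEB") can play the two roles described above: an auxiliary loss supervising the image encoder's global feature, and extra context concatenated to every decoder step. A mean of word embeddings stands in for the paragraph vector model, and all module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, emb_dim, feat_dim = 1000, 128, 256
word_emb = nn.Embedding(vocab, emb_dim)
img_encoder = nn.Linear(2048, feat_dim)            # stand-in for a CNN global feature
teb_proj = nn.Linear(emb_dim, feat_dim)
decoder = nn.GRU(emb_dim + feat_dim, 256, batch_first=True)

def step(cnn_feat, paragraph_tokens, caption_tokens):
    teb = word_emb(paragraph_tokens).mean(dim=1)            # fixed-length paragraph feature (stand-in)
    v = img_encoder(cnn_feat)
    aux_loss = F.mse_loss(v, teb_proj(teb))                  # role 1: TEB as global deep supervision
    # Role 2: TEB as distributed memory, concatenated to every decoder input step.
    dec_in = torch.cat([word_emb(caption_tokens),
                        teb_proj(teb).unsqueeze(1).expand(-1, caption_tokens.size(1), -1)], dim=-1)
    out, _ = decoder(dec_in)
    return out, aux_loss

out, aux = step(torch.randn(4, 2048), torch.randint(0, vocab, (4, 60)),
                torch.randint(0, vocab, (4, 20)))
print(out.shape, aux.item())
```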
Generative Imputation and Stochastic Prediction
Title | Generative Imputation and Stochastic Prediction |
Authors | Anonymous |
Abstract | In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have mostly been concerned with filling in missing values. However, the existence of missing values is synonymous with uncertainties not only over the distribution of missing values but also over target class assignments, which require careful consideration. In this paper, we propose a simple and effective method for imputing missing features and estimating the distribution of target assignments given incomplete data. To make imputations, we train a generator network to produce imputations that a discriminator network is tasked to distinguish. Following this, a predictor network is trained using the imputed samples from the generator network to capture the classification uncertainties and make predictions accordingly. The proposed method is evaluated on the CIFAR-10 image dataset as well as three real-world tabular classification datasets, under different missingness rates and structures. Our experimental results show the effectiveness of the proposed method in generating imputations as well as providing estimates of class uncertainties in a classification task when faced with missing values. |
Tasks | Imputation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1em9h4KDS |
https://openreview.net/pdf?id=B1em9h4KDS | |
PWC | https://paperswithcode.com/paper/generative-imputation-and-stochastic-1 |
Repo | |
Framework | |
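A small sketch of the inference-time use of such a model: a noise-conditioned generator fills the missing entries while observed values are kept, and a classifier is evaluated over several imputations to expose class uncertainty. Adversarial training of the generator is omitted here, and the network shapes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

d = 16
gen = nn.Sequential(nn.Linear(3 * d, 64), nn.ReLU(), nn.Linear(64, d))   # input: (x, mask, z)
clf = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 3))

def impute(x, mask):
    """x: (B, d) with zeros at missing positions; mask: (B, d), 1 = observed."""
    z = torch.randn_like(x)                                  # source of imputation randomness
    filled = gen(torch.cat([x, mask, z], dim=-1))
    return mask * x + (1 - mask) * filled                    # keep observed values untouched

def predictive_distribution(x, mask, samples=20):
    # Evaluate the classifier across several imputations to expose class uncertainty.
    with torch.no_grad():
        probs = torch.stack([clf(impute(x, mask)).softmax(-1) for _ in range(samples)])
    return probs.mean(0), probs.std(0)                       # mean prediction and its spread

x = torch.randn(5, d); mask = (torch.rand(5, d) > 0.3).float()
mean_p, std_p = predictive_distribution(x * mask, mask)
print(mean_p.shape, std_p.shape)
```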
On The Difficulty of Warm-Starting Neural Network Training
Title | On The Difficulty of Warm-Starting Neural Network Training |
Authors | Anonymous |
Abstract | In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate—to “warm start” the optimization rather than initialize from scratch—and see reductions in wall-clock time. However, in practice this warm-starting seems to yield poorer generalization performance than models that have fresh random initializations, even though the final training losses are similar. While it appears that some hyperparameter settings allow a practitioner to close this generalization gap, they seem to only do so in regimes that damage the wall-clock gains of the warm start. Nevertheless, it is highly desirable to be able to warm-start neural network training, as it would dramatically reduce the resource usage associated with the construction of performant deep learning systems. In this work, we take a closer look at this empirical phenomenon and try to understand when and how it occurs. Although the present investigation did not lead to a solution, we hope that a thorough articulation of the problem will spur new research that may lead to improved methods that consume fewer resources during training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryl0cAVtPH |
https://openreview.net/pdf?id=ryl0cAVtPH | |
PWC | https://paperswithcode.com/paper/on-the-difficulty-of-warm-starting-neural-1 |
Repo | |
Framework | |
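A toy sketch of the comparison the abstract describes: when a second batch of data arrives, either continue from the previous solution (warm start) or re-initialize and train from scratch, then compare generalization. The toy data, model, and hyperparameters are placeholders; the point is the protocol, not the numbers.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(model, xs, ys, epochs=50, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(model(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.randn(1000, 20); y = (x[:, 0] > 0).long()           # toy data stream
first, second = (x[:500], y[:500]), (x, y)                     # data arrive in two stages

warm = fit(make_model(), *first)                               # stage 1
warm = fit(copy.deepcopy(warm), *second)                       # warm start on stage 2
fresh = fit(make_model(), *second)                             # fresh initialization on all data

x_te = torch.randn(2000, 20); y_te = (x_te[:, 0] > 0).long()
for name, m in [("warm", warm), ("fresh", fresh)]:
    acc = (m(x_te).argmax(-1) == y_te).float().mean().item()
    print(name, round(acc, 3))
```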
Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget
Title | Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget |
Authors | Anonymous |
Abstract | Active learning (AL) aims to integrate data labeling and model training in a unified way, and to minimize the labeling budget by prioritizing the selection of high-value data that can best improve model performance. Readily available unlabeled data are used to evaluate selection mechanisms, but are not used for model training in conventional pool-based AL. To minimize the labeling budget, we unify unlabeled sample selection and model training based on two principles. First, we exploit both labeled and unlabeled data using semi-supervised learning (SSL) to distill information from unlabeled data that improves representation learning and sample selection. Second, we propose a simple yet effective selection metric that is coherent with the training objective, such that the selected samples are effective at improving model performance. Our experimental results demonstrate superior performance with our proposed principles for limited labeled data compared to alternative AL and SSL combinations. In addition, we study the AL phenomenon of “cold start”, which is becoming an increasingly important factor in enabling optimal unification of data labeling, model training, and labeling budget minimization. We propose a measure that is found to be empirically correlated with the AL target loss. This measure can be used to assist in determining the proper start size. |
Tasks | Active Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJl8SkBYPr |
https://openreview.net/pdf?id=HJl8SkBYPr | |
PWC | https://paperswithcode.com/paper/consistency-based-semi-supervised-active |
Repo | |
Framework | |
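A sketch of one consistency-based acquisition score in the spirit of the abstract: unlabeled samples whose predictions vary most across random augmentations are prioritized for labeling, so the selection signal is coherent with a consistency-style SSL training objective. The exact metric in the paper may differ; the interface and variance-based score here are assumptions.

```python
import torch

def consistency_scores(model, unlabeled, augment, n_views=8):
    """unlabeled: (N, ...) batch; augment: stochastic augmentation function.
    Higher score = less consistent predictions = more valuable to label (under this heuristic)."""
    model.eval()
    with torch.no_grad():
        views = torch.stack([model(augment(unlabeled)).softmax(-1) for _ in range(n_views)])
    return views.var(dim=0).sum(dim=-1)              # per-sample predictive variance across views

def select_batch(model, unlabeled, augment, budget=64):
    scores = consistency_scores(model, unlabeled, augment)
    return torch.topk(scores, budget).indices        # indices to send for annotation
```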
Multilingual Alignment of Contextual Word Representations
Title | Multilingual Alignment of Contextual Word Representations |
Authors | Anonymous |
Abstract | We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in understanding and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching fully-supervised models for Bulgarian and Greek. Further, using non-contextual and contextual versions of word retrieval, we show that BERT outperforms fastText while being able to distinguish between multiple uses of a word, suggesting that pre-training subsumes word vectors for learning cross-lingual signals. Finally, we use the contextual word retrieval task to gain a better understanding of the strengths and weaknesses of multilingual pre-training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xCMyBtPS |
https://openreview.net/pdf?id=r1xCMyBtPS | |
PWC | https://paperswithcode.com/paper/multilingual-alignment-of-contextual-word |
Repo | |
Framework | |
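A hedged sketch of one simple alignment recipe: given contextual embeddings of word-aligned source/target token pairs from parallel data, fit a map that pulls aligned pairs together before doing cross-lingual word retrieval. The paper's procedure may instead fine-tune the encoder itself; the linear map over precomputed vectors below is a simplifying assumption.

```python
import torch
import torch.nn as nn

dim = 768
src = torch.randn(10000, dim)        # contextual embeddings of aligned source-language tokens
tgt = torch.randn(10000, dim)        # embeddings of their aligned target-language tokens

W = nn.Linear(dim, dim, bias=False)
opt = torch.optim.Adam(W.parameters(), lr=1e-3)
for step in range(200):
    idx = torch.randint(0, src.size(0), (256,))
    loss = ((W(src[idx]) - tgt[idx]) ** 2).sum(-1).mean()    # pull aligned pairs together
    opt.zero_grad(); loss.backward(); opt.step()

# After fitting, W can be applied to source-language contextual vectors before
# nearest-neighbor word retrieval against target-language vectors.
```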
SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
Title | SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses |
Authors | Anonymous |
Abstract | Unsupervised domain adaptive object detection aims to learn a robust detector under domain shift, where the training (source) domain is label-rich with bounding box annotations, while the testing (target) domain is label-agnostic and the feature distributions between the training and testing domains are dissimilar or even totally different. In this paper, we propose a gradient detach based Stacked Complementary Losses (SCL) method that uses the detection objective (cross-entropy and smooth L1 regression) as the primary objective, and inserts several auxiliary losses at different network stages to utilize information from the complement data (target images), which can be effective in adapting model parameters to both the source and target domains. A gradient detach operation is applied between the detection and context sub-networks during training to force the networks to learn discriminative representations. We argue that conventional training with the primary objective mainly leverages information from the source domain for maximizing the likelihood and ignores the complement data in the shallow layers of the networks, which leads to insufficient integration between the domains. Thus, our proposed method yields a more syncretic adaptation learning process. We conduct comprehensive experiments on seven datasets, and the results demonstrate that our method outperforms state-of-the-art methods by a large margin. For instance, from Cityscapes to FoggyCityscapes, we achieve 37.9% mAP, outperforming the previous best method, Strong-Weak, by 3.6%. |
Tasks | Object Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx5_hNFwr |
https://openreview.net/pdf?id=rJx5_hNFwr | |
PWC | https://paperswithcode.com/paper/scl-towards-accurate-domain-adaptive-object-1 |
Repo | |
Framework | |
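A wiring-level sketch of the gradient-detach pattern: auxiliary (domain-classification) losses are attached at several backbone stages alongside the primary detection objective, and the context branch receives a detached copy of the features so its gradients do not flow back into the detector. The stand-in backbone, heads, and loss weights are assumptions, not the SCL architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

stage1 = nn.Conv2d(3, 16, 3, padding=1)
stage2 = nn.Conv2d(16, 32, 3, padding=1)
det_head = nn.Linear(32, 10)                                         # stand-in for the detection head
aux_heads = nn.ModuleList([nn.Linear(16, 2), nn.Linear(32, 2)])      # per-stage domain classifiers
context_head = nn.Linear(32, 8)                                      # context sub-network

def forward_losses(img, labels, domain):
    f1 = F.relu(stage1(img))
    f2 = F.relu(stage2(f1))
    pooled1, pooled2 = f1.mean(dim=(2, 3)), f2.mean(dim=(2, 3))
    det_loss = F.cross_entropy(det_head(pooled2), labels)            # primary objective
    aux_loss = (F.cross_entropy(aux_heads[0](pooled1), domain) +
                F.cross_entropy(aux_heads[1](pooled2), domain))      # stacked complementary losses
    # Gradient detach: the context branch sees the features but cannot back-propagate into them.
    ctx = context_head(pooled2.detach())
    # (In the full method, ctx would feed further context-based losses of its own.)
    return det_loss + 0.1 * aux_loss, ctx

loss, ctx = forward_losses(torch.randn(4, 3, 64, 64), torch.randint(0, 10, (4,)),
                           torch.randint(0, 2, (4,)))
loss.backward()
```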
Distributed Online Optimization with Long-Term Constraints
Title | Distributed Online Optimization with Long-Term Constraints |
Authors | Anonymous |
Abstract | We consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is to minimize the system-wide loss accumulated over time. We propose a decentralized algorithm with regret and cumulative constraint violation in ${\cal O}(T^{\max\{c,1-c\}})$ and ${\cal O}(T^{1-c/2})$, respectively, for any $c\in (0,1)$, where $T$ is the time horizon. When the loss functions are strongly convex, we establish improved regret and constraint violation upper bounds in ${\cal O}(\log(T))$ and ${\cal O}(\sqrt{T\log(T)})$. These regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions). In the case of bandit feedback, the proposed algorithms achieve a regret and constraint violation in ${\cal O}(T^{\max\{c,1-c/3\}})$ and ${\cal O}(T^{1-c/2})$ for any $c\in (0,1)$. We numerically illustrate the performance of our algorithms for the particular case of distributed online regularized linear regression problems. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syee1pVtDS |
https://openreview.net/pdf?id=Syee1pVtDS | |
PWC | https://paperswithcode.com/paper/distributed-online-optimization-with-long |
Repo | |
Framework | |
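A structural sketch of a decentralized online algorithm of this kind: each node mixes its iterate with its neighbors' (consensus over the communication graph), takes a gradient step on its current loss, and updates a dual variable for the long-term constraint. The step sizes, mixing matrix, and primal-dual update are illustrative assumptions, not the paper's algorithm or its guarantees.

```python
import numpy as np

n, d, T = 4, 5, 200
rng = np.random.default_rng(0)
X = np.zeros((n, d))                 # one iterate per node
lam = np.zeros(n)                    # dual variables for the long-term constraint

def g(x):                            # shared constraint: ||x||_1 - 1 <= 0
    return np.abs(x).sum() - 1.0

for t in range(1, T + 1):
    W = np.full((n, n), 1.0 / n)     # stand-in for a doubly stochastic, time-varying mixing matrix
    A = rng.normal(size=(n, d)); b = rng.normal(size=n)   # node-local losses f_i(x) = (a_i . x - b_i)^2
    eta = 1.0 / np.sqrt(t)
    X_mixed = W @ X                  # consensus step over the communication graph
    for i in range(n):
        grad_f = 2 * (A[i] @ X_mixed[i] - b[i]) * A[i]
        grad = grad_f + lam[i] * np.sign(X_mixed[i])       # subgradient of lam * g
        X[i] = X_mixed[i] - eta * grad
        lam[i] = max(0.0, lam[i] + eta * g(X[i]))          # dual ascent on the constraint violation
print(X.mean(axis=0))
```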
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
Title | Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents |
Authors | Anonymous |
Abstract | As deep reinforcement learning driven by visual perception becomes more widely used, there is a growing need to better understand and probe the learned agents. Understanding the decision-making process and its relationship to visual inputs can be very valuable for identifying problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting for understanding the situational awareness of the system, as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results on the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications. |
Tasks | Autonomous Driving, Decision Making |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylvYaNYDH |
https://openreview.net/pdf?id=rylvYaNYDH | |
PWC | https://paperswithcode.com/paper/finding-and-visualizing-weaknesses-of-deep-1 |
Repo | |
Framework | |
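A short sketch of the latent-space search described above: with a pre-trained generator over environment states and a frozen agent, optimize the latent code so the decoded state maximizes a chosen target, for example the probability of a specific action. The generator and policy interfaces, step counts, and objective choice are assumptions for illustration.

```python
import torch

def synthesize_state(generator, agent_policy, target_action, steps=200, lr=0.05, z_dim=64):
    """generator: z -> state image; agent_policy: state -> action logits (both frozen)."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        state = generator(z)
        logits = agent_policy(state)
        # Maximize the log-probability that the agent takes `target_action` in this state.
        loss = -torch.log_softmax(logits, dim=-1)[0, target_action]
        opt.zero_grad(); loss.backward(); opt.step()
    return generator(z).detach()      # a synthesized state of interest, ready for inspection
```

Swapping the objective (e.g., for an extreme value estimate instead of an action probability) yields the high-reward or low-reward "critical states" mentioned in the abstract.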