Paper Group NANR 98
Deep automodulators. GLAD: Learning Sparse Graph Recovery. Regional based query in graph active learning. A critical analysis of self-supervision, or what we can learn from a single image. FLAT MANIFOLD VAES. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. Detecting Extrapolation with Local Ensembles. DBA: Distributed Backdoor Attacks against Federated Learning. Reformer: The Efficient Transformer. On summarized validation curves and generalization. BEYOND SUPERVISED LEARNING: RECOGNIZING UNSEEN ATTRIBUTE-OBJECT PAIRS WITH VISION-LANGUAGE FUSION AND ATTRACTOR NETWORKS. Escaping Saddle Points Faster with Stochastic Momentum. SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models. Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems.
Deep automodulators
Title | Deep automodulators |
Authors | Anonymous |
Abstract | We introduce a novel autoencoder model that deviates from traditional autoencoders by using the full latent vector to independently modulate each layer in the decoder. We demonstrate how such an ‘automodulator’ allows for a principled approach to enforce latent space disentanglement, mixing of latent codes, and a straightforward way to utilize prior information that can be construed as a scale-specific invariance. Unlike GANs, autoencoder models can directly operate on new real input samples. This makes our model directly suitable for applications involving real-world inputs. As the architectural backbone, we extend recent generative autoencoder models that retain input identity and image sharpness at high resolutions better than VAEs. We show that our model achieves state-of-the-art latent space disentanglement and achieves high quality and diversity of output samples, as well as faithfulness of reconstructions. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeNqkBFPB |
https://openreview.net/pdf?id=rkeNqkBFPB | |
PWC | https://paperswithcode.com/paper/deep-automodulators |
Repo | |
Framework | |
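To make the modulation mechanism above concrete, here is a minimal, hedged PyTorch sketch of one decoder block whose activations are instance-normalized and then rescaled and shifted by projections of the latent code (AdaIN-style). The block structure, layer sizes, and names are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch: one decoder block modulated by the latent code z.
import torch
import torch.nn as nn

class ModulatedBlock(nn.Module):
    def __init__(self, in_ch, out_ch, z_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch, affine=False)
        self.to_scale = nn.Linear(z_dim, out_ch)   # latent -> per-channel gain
        self.to_shift = nn.Linear(z_dim, out_ch)   # latent -> per-channel bias

    def forward(self, x, z):
        h = self.norm(self.conv(x))
        s = self.to_scale(z).unsqueeze(-1).unsqueeze(-1)
        b = self.to_shift(z).unsqueeze(-1).unsqueeze(-1)
        return torch.relu((1 + s) * h + b)

# Mixing latent codes amounts to feeding z_a to coarse blocks and z_b to fine ones.
block = ModulatedBlock(64, 64, 512)
y = block(torch.randn(2, 64, 8, 8), torch.randn(2, 512))
```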
GLAD: Learning Sparse Graph Recovery
Title | GLAD: Learning Sparse Graph Recovery |
Authors | Anonymous |
Abstract | Recovering sparse conditional independence graphs from data is a fundamental problem in machine learning with wide applications. A popular formulation of the problem is an $\ell_1$ regularized maximum likelihood estimation. Many convex optimization algorithms have been designed to solve this formulation and recover the graph structure. Recently, there has been a surge of interest in learning algorithms directly from data, in this case learning to map the empirical covariance to the sparse precision matrix. However, this is a challenging task, since the symmetric positive definiteness (SPD) and sparsity of the matrix are not easy to enforce in learned algorithms, and a direct mapping from data to the precision matrix may require many parameters. We propose a deep learning architecture, GLAD, which uses an Alternating Minimization (AM) algorithm as its inductive bias and learns the model parameters via supervised learning. We show that GLAD learns a very compact and effective model for recovering sparse graphs from data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxpMTEtPB |
https://openreview.net/pdf?id=BkxpMTEtPB | |
PWC | https://paperswithcode.com/paper/glad-learning-sparse-graph-recovery |
Repo | |
Framework | |
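The alternating-minimization structure that GLAD unrolls can be sketched in NumPy as follows. This is a hedged illustration of a standard quadratic-penalty AM iterate for the $\ell_1$-regularized Gaussian MLE; in GLAD the step parameters (here the fixed constants rho and lam) are predicted by small learned networks, and the exact update may differ.

```python
# Hedged sketch of one alternating-minimization (AM) iterate of the kind GLAD unrolls.
import numpy as np

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def am_step(S, Z, rho=0.1, lam=1.0):
    # Theta-step: closed form via eigendecomposition of (S - Z / lam); SPD by construction.
    d, U = np.linalg.eigh(S - Z / lam)
    theta_eig = 0.5 * lam * (-d + np.sqrt(d ** 2 + 4.0 / lam))
    Theta = (U * theta_eig) @ U.T
    # Z-step: elementwise soft-thresholding enforces sparsity.
    return Theta, soft_threshold(Theta, rho * lam)

# Usage: unroll a fixed number of steps starting from Z = identity.
S = np.cov(np.random.randn(500, 10), rowvar=False)
Z = np.eye(10)
for _ in range(30):
    Theta, Z = am_step(S, Z)
```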
Regional based query in graph active learning
Title | Regional based query in graph active learning |
Authors | Anonymous |
Abstract | Graph convolution networks (GCNs) have emerged as a leading method to classify nodes and graphs. These GCNs have been combined with active learning (AL) methods for settings where only a small set of tagged examples can be chosen. Most AL-GCN methods use the sample class uncertainty as the selection criterion, not the graph. In contrast, representative sampling uses the graph, but not the prediction. We propose to combine the two and query nodes based on the uncertainty of the graph around them. Specifically, we propose two novel methods for AL-GCN that explicitly use the graph information to query for optimal nodes. The first method, named regional uncertainty, is an extension of the classical entropy measure, but instead of sampling nodes with high entropy, we propose to sample nodes surrounded by nodes of different classes, or nodes with high ambiguity. The second method, called Adaptive PageRank, is an extension of the PageRank algorithm, where nodes that have a low probability of being reached by random walks from tagged nodes are selected. We show that the latter is optimal when the fraction of tagged nodes is low, and when this fraction grows to one over the average degree, regional uncertainty performs better than all existing methods. While we have tested these methods on graphs, they can be extended to any classification problem where a distance can be defined between the input samples. |
Tasks | Active Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeHt34Fwr |
https://openreview.net/pdf?id=BkeHt34Fwr | |
PWC | https://paperswithcode.com/paper/regional-based-query-in-graph-active-learning-1 |
Repo | |
Framework | |
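The regional-uncertainty criterion can be pictured with a short, hedged NumPy/networkx sketch: instead of a node's own predictive entropy, score the entropy of the class mix predicted over its neighborhood and query the highest-scoring node. Function names, the neighborhood definition, and integer node labels are assumptions for illustration.

```python
# Hedged sketch of a "regional uncertainty" acquisition score.
import numpy as np
import networkx as nx

def regional_uncertainty(graph, probs):
    """probs: (n_nodes, n_classes) softmax outputs of the current GCN."""
    scores = np.zeros(len(graph))
    for v in graph.nodes:
        nbrs = list(graph.neighbors(v)) + [v]
        region = probs[nbrs].mean(axis=0)                    # class mix around v
        scores[v] = -(region * np.log(region + 1e-12)).sum() # neighborhood entropy
    return scores

g = nx.karate_club_graph()
probs = np.random.dirichlet(np.ones(3), size=len(g))
query_node = int(np.argmax(regional_uncertainty(g, probs)))
```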
A critical analysis of self-supervision, or what we can learn from a single image
Title | A critical analysis of self-supervision, or what we can learn from a single image |
Authors | Anonymous |
Abstract | We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset. |
Tasks | Data Augmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1esx6EYvr |
https://openreview.net/pdf?id=B1esx6EYvr | |
PWC | https://paperswithcode.com/paper/a-critical-analysis-of-self-supervision-or |
Repo | |
Framework | |
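The "strong data augmentation" the abstract leans on can be pictured with a generic torchvision pipeline that turns a single image into an effectively unlimited stream of training patches. This is an illustrative stand-in, not the paper's exact transform set.

```python
# Hedged sketch: aggressive augmentation that manufactures a surrogate dataset
# of crops from one source image (transform choices are illustrative).
from torchvision import transforms

single_image_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.05, 1.0)),  # very small crops allowed
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])
# Applying this transform repeatedly to the same PIL image yields a stream of
# distinct training patches for self-supervised pre-training.
```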
FLAT MANIFOLD VAES
Title | FLAT MANIFOLD VAES |
Authors | Anonymous |
Abstract | Latent-variable models represent observed data by mapping a prior distribution over some latent space to an observed space. Often, the prior distribution is specified by the user to be very simple, effectively shifting the burden of a learning algorithm to the estimation of a highly non-linear likelihood function. This poses a problem for the calculation of a popular distance function, the geodesic between data points in the latent space, as this is often solved iteratively via numerical methods. These are less effective if the problem at hand is not well captured by first or second-order approximations. In this work, we propose less complex likelihood functions by allowing complex distributions and explicitly penalising the curvature of the decoder. This results in geodesics which are approximated well by the Euclidean distance in latent space, decreasing the runtime by a factor of 1,000 with little loss in accuracy. |
Tasks | Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgWIxSFvr |
https://openreview.net/pdf?id=SkgWIxSFvr | |
PWC | https://paperswithcode.com/paper/flat-manifold-vaes |
Repo | |
Framework | |
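One simple way to picture a decoder-curvature penalty is a stochastic second finite difference along random latent directions, which vanishes when the decoder is locally affine. The sketch below is a hedged stand-in for the paper's regularizer, not its exact form.

```python
# Hedged sketch: stochastic penalty on decoder curvature via second finite differences.
import torch

def curvature_penalty(decoder, z, eps=1e-2):
    v = torch.randn_like(z)
    v = eps * v / v.norm(dim=-1, keepdim=True)              # small random latent step
    second_diff = decoder(z + v) - 2.0 * decoder(z) + decoder(z - v)
    return (second_diff ** 2).sum(dim=-1).mean() / eps ** 4  # normalise out the step size

decoder = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 64))
penalty = curvature_penalty(decoder, torch.randn(16, 8))
# total_loss = elbo_loss + beta * penalty  (beta is a tuning weight, an assumption here)
```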
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search
Title | PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search |
Authors | Anonymous |
Abstract | Differentiable architecture search (DARTS) provided a fast solution for finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-net and searching for an optimal architecture. In this paper, we present a novel approach, namely Partially-Connected DARTS, which samples a small part of the super-net to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising performance. In particular, we perform operation search in a subset of channels while bypassing the held-out part in a shortcut. This strategy may suffer from an undesired inconsistency in selecting the edges of the super-net caused by sampling different channels. We solve it by introducing edge normalization, which adds a new set of edge-level hyper-parameters to reduce uncertainty in the search. Thanks to the reduced memory cost, PC-DARTS can be trained with a larger batch size and, consequently, enjoys both faster speed and higher training stability. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 within merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) within 3.8 GPU-days for search. Our code has been made available at https://www.dropbox.com/sh/on9lg3rpx1r6dkf/AABG5mt0sMHjnEJyoRnLEYW4a?dl=0. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlS634tPr |
https://openreview.net/pdf?id=BJlS634tPr | |
PWC | https://paperswithcode.com/paper/pc-darts-partial-channel-connections-for-1 |
Repo | |
Framework | |
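The partial-channel idea can be sketched in a few lines of PyTorch: only 1/K of the channels pass through the (expensive) mixed operation, the rest bypass it, and a channel shuffle ensures different subsets get sampled over time. The split ratio, shuffle layout, and the assumption that mixed_op preserves channel count are illustrative.

```python
# Hedged sketch of partial channel connections with a channel shuffle.
import torch

def partial_channel_forward(x, mixed_op, K=4):
    """x: (B, C, H, W); mixed_op is assumed to map k channels to k channels."""
    k = x.size(1) // K
    x_op, x_skip = x[:, :k], x[:, k:]                 # sampled vs. bypassed channels
    out = torch.cat([mixed_op(x_op), x_skip], dim=1)
    b, c, h, w = out.shape                            # channel shuffle so other
    out = out.view(b, K, c // K, h, w)                # subsets are sampled next time
    return out.transpose(1, 2).reshape(b, c, h, w)

y = partial_channel_forward(torch.randn(2, 16, 8, 8), mixed_op=torch.nn.ReLU())
```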
Detecting Extrapolation with Local Ensembles
Title | Detecting Extrapolation with Local Ensembles |
Authors | Anonymous |
Abstract | We present local ensembles, a method for detecting extrapolation at test time in a pre-trained model. We focus on underdetermination as a key component of extrapolation: we aim to detect when many possible predictions are consistent with the training data and model class. Our method uses local second-order information to approximate the variance of predictions across an ensemble of models from the same class. We compute this approximation by estimating the norm of the component of a test point’s gradient that aligns with the low-curvature directions of the Hessian, and provide a tractable method for estimating this quantity. Experimentally, we show that our method is capable of detecting when a pre-trained model is extrapolating on test data, with applications to out-of-distribution detection, detecting spurious correlates, and active learning. |
Tasks | Active Learning, Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl6bANtwH |
https://openreview.net/pdf?id=BJl6bANtwH | |
PWC | https://paperswithcode.com/paper/detecting-extrapolation-with-local-ensembles-1 |
Repo | |
Framework | |
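The extrapolation score the abstract describes, the norm of the component of a test point's gradient lying in the low-curvature subspace of the training-loss Hessian, reduces to a short projection once the dominant Hessian eigenvectors are available (in practice they are estimated, e.g. with an iterative eigensolver). A hedged NumPy sketch:

```python
# Hedged sketch of the local-ensemble extrapolation score.
import numpy as np

def extrapolation_score(grad_pred, top_eigvecs):
    """grad_pred: (p,) gradient of the test prediction w.r.t. parameters.
    top_eigvecs: (p, m) dominant (high-curvature) Hessian eigenvectors, assumed given."""
    coeffs = top_eigvecs.T @ grad_pred
    residual = grad_pred - top_eigvecs @ coeffs       # component in the low-curvature subspace
    return np.linalg.norm(residual)

score = extrapolation_score(np.random.randn(100), np.linalg.qr(np.random.randn(100, 10))[0])
```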
DBA: Distributed Backdoor Attacks against Federated Learning
Title | DBA: Distributed Backdoor Attacks against Federated Learning |
Authors | Anonymous |
Abstract | Backdoor attacks aim to manipulate a subset of training data by injecting adversarial triggers such that machine learning models trained on the tampered dataset will make arbitrary (targeted) incorrect predictions on test samples containing the same trigger. While federated learning (FL) is capable of aggregating information provided by different parties to train a better model, its distributed learning methodology and inherently heterogeneous data distribution across parties may bring new vulnerabilities. In addition to recent centralized backdoor attacks on FL, where each party embeds the same global trigger during training, we propose the distributed backdoor attack (DBA), a novel threat assessment framework developed by fully exploiting the distributed nature of FL. DBA decomposes a global trigger pattern into separate local patterns and embeds them into the training sets of different adversarial parties. Compared to standard centralized backdoors, we show that DBA is substantially more persistent and stealthy against FL on diverse datasets such as finance and image data. We conduct extensive experiments to show that the attack success rate of DBA is significantly higher than that of centralized backdoors under different settings. Moreover, we find that distributed attacks are indeed more insidious, as DBA can evade two state-of-the-art robust FL algorithms designed against centralized backdoors. We also provide explanations for the effectiveness of DBA via feature visual interpretation and feature importance ranking. To further explore the properties of DBA, we test the attack performance by varying different trigger factors, including local trigger variations (size, gap, and location), the scaling factor in FL, data distribution, and poison ratio and interval. Our proposed DBA and thorough evaluation results shed light on characterizing the robustness of FL. |
Tasks | Feature Importance |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgyS0VFvr |
https://openreview.net/pdf?id=rkgyS0VFvr | |
PWC | https://paperswithcode.com/paper/dba-distributed-backdoor-attacks-against |
Repo | |
Framework | |
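A hedged sketch of the local-trigger decomposition: the global trigger is a small grid of patches, and each malicious party stamps only its own patch onto its poisoned samples. Patch sizes, positions, and the 2x2 layout are illustrative assumptions.

```python
# Hedged sketch: each adversarial party embeds one local piece of a global trigger.
import numpy as np

def stamp_local_trigger(images, party_id, patch=2, gap=1, origin=(0, 0)):
    """images: (n, H, W) batch in [0, 1]; party_id in {0,1,2,3} selects one
    quadrant of a 2x2 grid of small bright patches (the global trigger)."""
    r, c = divmod(party_id, 2)
    y = origin[0] + r * (patch + gap)
    x = origin[1] + c * (patch + gap)
    poisoned = images.copy()
    poisoned[:, y:y + patch, x:x + patch] = 1.0       # local trigger pixels
    return poisoned

poisoned_batch = stamp_local_trigger(np.zeros((8, 28, 28)), party_id=3)
```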
Reformer: The Efficient Transformer
Title | Reformer: The Efficient Transformer |
Authors | Anonymous |
Abstract | Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. First, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgNKkHtvB |
https://openreview.net/pdf?id=rkgNKkHtvB | |
PWC | https://paperswithcode.com/paper/reformer-the-efficient-transformer |
Repo | |
Framework | |
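The LSH step behind Reformer's attention can be sketched with random rotations: each query/key vector is hashed to a bucket, and attention is then restricted to vectors sharing a bucket. The sketch below shows only the angular-LSH bucketing; sorting, chunking, and multi-round hashing are omitted.

```python
# Hedged sketch of angular LSH bucketing for shared query/key vectors.
import numpy as np

def lsh_buckets(qk, n_buckets, rng):
    """qk: (seq_len, d) shared query/key vectors; returns one bucket id per position."""
    d = qk.shape[1]
    rotations = rng.standard_normal((d, n_buckets // 2))
    rotated = qk @ rotations                                  # (seq_len, n_buckets/2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=1), axis=1)

rng = np.random.default_rng(0)
buckets = lsh_buckets(rng.standard_normal((1024, 64)), n_buckets=32, rng=rng)
# Attention is then computed only within (sorted, chunked) groups of equal bucket id.
```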
On summarized validation curves and generalization
Title | On summarized validation curves and generalization |
Authors | Anonymous |
Abstract | The validation curve is widely used for model selection and hyper-parameter search, with the curve usually summarized over all the training tasks. However, this summarization tends to lose the intricacies of the per-task curves and cannot reflect whether all the tasks are at their validation optimum even if the summarized curve is. In this work, we explore this loss of information, how it affects the model at testing, and how to detect it using interval plots. We propose two techniques as a proof of concept of the potential gain in test performance when per-task validation curves are accounted for. Our experiments on three large datasets show up to a 2.5% increase (averaged over multiple trials) in test accuracy when model selection uses the per-task validation maxima instead of the summarized validation maximum. This potential increase is not a result of any modification to the model but rather of the point in training from which the weights are selected. This presents an exciting direction for new training and model selection techniques that rely on more than just averaged metrics. |
Tasks | Model Selection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lRg0VKDr |
https://openreview.net/pdf?id=S1lRg0VKDr | |
PWC | https://paperswithcode.com/paper/on-summarized-validation-curves-and |
Repo | |
Framework | |
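The contrast the paper draws can be stated in a few lines: pick one checkpoint from the task-averaged validation curve, or pick a per-task checkpoint where each task's own curve peaks. The arrays below are synthetic placeholders, not the paper's data.

```python
# Hedged sketch: summarized vs. per-task checkpoint selection on synthetic curves.
import numpy as np

val = np.random.rand(5, 100)                        # (n_tasks, n_checkpoints) validation accuracy
summarized_pick = int(val.mean(axis=0).argmax())    # one checkpoint shared by all tasks
per_task_pick = val.argmax(axis=1)                  # one checkpoint per task

print("summarized:", val[np.arange(5), summarized_pick].mean())
print("per-task  :", val[np.arange(5), per_task_pick].mean())
```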
BEYOND SUPERVISED LEARNING: RECOGNIZING UNSEEN ATTRIBUTE-OBJECT PAIRS WITH VISION-LANGUAGE FUSION AND ATTRACTOR NETWORKS
Title | BEYOND SUPERVISED LEARNING: RECOGNIZING UNSEEN ATTRIBUTE-OBJECT PAIRS WITH VISION-LANGUAGE FUSION AND ATTRACTOR NETWORKS |
Authors | Hui Chen, Zhixiong Nan, Nanning Zheng |
Abstract | This paper handles a challenging problem, unseen attribute-object pair recognition, which asks a model to simultaneously recognize the attribute type and the object type of a given image even though this attribute-object pair is not included in the training set. In past years, conventional classifier-based methods, which recognize unseen attribute-object pairs by composing separately trained attribute classifiers and object classifiers, have met with limited success. Different from conventional methods, we propose a generative model with a visual pathway and a linguistic pathway. In each pathway, an attractor network is used to learn an intrinsic feature representation that explores the inner relationship between the attribute and the object. With the learned features in both pathways, the unseen attribute-object pair is recognized by finding the pair whose linguistic feature most closely matches the visual feature of the given image. On two public datasets, our model achieves impressive experimental results, notably outperforming state-of-the-art methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byx0PREtDH |
https://openreview.net/pdf?id=Byx0PREtDH | |
PWC | https://paperswithcode.com/paper/beyond-supervised-learning-recognizing-unseen |
Repo | |
Framework | |
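The recognition step described in the abstract, matching the image's visual-pathway feature against linguistic-pathway features of candidate attribute-object pairs, amounts to a nearest-neighbor search. The cosine-similarity sketch below is a hedged simplification, and the feature extractors are assumed given.

```python
# Hedged sketch: pick the attribute-object pair whose linguistic feature best matches the image.
import numpy as np

def recognize_pair(visual_feat, pair_feats, pair_names):
    """visual_feat: (d,); pair_feats: (n_pairs, d) linguistic features; pair_names: (n_pairs,)."""
    v = visual_feat / np.linalg.norm(visual_feat)
    p = pair_feats / np.linalg.norm(pair_feats, axis=1, keepdims=True)
    return pair_names[int(np.argmax(p @ v))]

names = np.array(["wet dog", "old car"])
print(recognize_pair(np.random.randn(8), np.random.randn(2, 8), names))
```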
Escaping Saddle Points Faster with Stochastic Momentum
Title | Escaping Saddle Points Faster with Stochastic Momentum |
Authors | Anonymous |
Abstract | Stochastic gradient descent (SGD) with stochastic momentum is popular in nonconvex stochastic optimization and particularly for the training of deep neural networks. In standard SGD, parameters are updated by improving along the path of the gradient at the current iterate on a batch of examples, where the addition of a "momentum" term biases the update in the direction of the previous change in parameters. In non-stochastic convex optimization one can show that a momentum adjustment provably reduces convergence time in many settings, yet such results have been elusive in the stochastic and non-convex settings. At the same time, a widely observed empirical phenomenon is that in training deep networks stochastic momentum appears to significantly improve convergence time, and variants of it have flourished in the development of other popular update methods, e.g. ADAM, AMSGrad, etc. Yet theoretical justification for the use of stochastic momentum has remained a significant open question. In this paper we propose an answer: stochastic momentum improves deep network training because it modifies SGD to escape saddle points faster and, consequently, to more quickly find a second-order stationary point. Our theoretical results also shed light on the related question of how to choose the ideal momentum parameter: our analysis suggests that $\beta \in [0,1)$ should be large (close to 1), which comports with empirical findings. We also provide experimental findings that further validate these conclusions. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeNfp4tPr |
https://openreview.net/pdf?id=rkeNfp4tPr | |
PWC | https://paperswithcode.com/paper/escaping-saddle-points-faster-with-stochastic |
Repo | |
Framework | |
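For reference, the update rule under analysis is plain SGD with a heavy-ball momentum buffer; the toy objective below exists only to make the sketch runnable.

```python
# SGD with (heavy-ball) stochastic momentum; beta close to 1 is the regime the paper recommends.
import numpy as np

def sgd_momentum_step(w, buf, stochastic_grad, lr=0.01, beta=0.9):
    buf = beta * buf + stochastic_grad(w)     # momentum buffer
    return w - lr * buf, buf

w, buf = np.ones(10), np.zeros(10)
grad = lambda w: 2 * w + 0.1 * np.random.randn(10)   # noisy gradient of a toy quadratic
for _ in range(100):
    w, buf = sgd_momentum_step(w, buf, grad)
```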
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
Title | SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models |
Authors | Anonymous |
Abstract | The standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models, based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize the variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions. |
Tasks | Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SylkYeHtwr |
https://openreview.net/pdf?id=SylkYeHtwr | |
PWC | https://paperswithcode.com/paper/sumo-unbiased-estimation-of-log-marginal |
Repo | |
Framework | |
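The randomized-truncation trick underlying SUMO can be sketched generically: an infinite series is estimated without bias by drawing a random truncation level and reweighting each kept term by the probability of reaching it. The delta() term below is a placeholder for the paper's IWAE-style correction terms, and the geometric truncation distribution is an assumption.

```python
# Hedged sketch of a Russian-roulette (randomized truncation) series estimator.
import numpy as np

def russian_roulette_estimate(delta, rng, stop_prob=0.4):
    K = rng.geometric(stop_prob)                     # random truncation level, K >= 1
    est, survive = 0.0, 1.0                          # survive = P(K >= k)
    for k in range(1, K + 1):
        est += delta(k) / survive                    # reweight so the estimate is unbiased
        survive *= 1.0 - stop_prob
    return est

rng = np.random.default_rng(0)
# Toy series: sum_k 2^-k = 1; the estimator averages to 1 over many draws.
vals = [russian_roulette_estimate(lambda k: 0.5 ** k, rng) for _ in range(20000)]
print(np.mean(vals))
```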
Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control
Title | Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control |
Authors | Anonymous |
Abstract | In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxmb1rKDS |
https://openreview.net/pdf?id=ryxmb1rKDS | |
PWC | https://paperswithcode.com/paper/symplectic-ode-net-learning-hamiltonian-1 |
Repo | |
Framework | |
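A hedged PyTorch sketch of the structure SymODEN enforces: the vector field comes from a learned Hamiltonian H(q, p) and an input network g(q), giving dq/dt = dH/dp and dp/dt = -dH/dq + g(q)u. Network sizes are illustrative, and g(q)u is simplified to an elementwise product.

```python
# Hedged sketch of a Hamiltonian-structured vector field with control.
import torch
import torch.nn as nn

class HamiltonianDynamics(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.g = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, q, p, u):
        # Gradients of H are taken w.r.t. the state only (a simplification in this sketch).
        qp = torch.cat([q, p], dim=-1).detach().requires_grad_(True)
        dH = torch.autograd.grad(self.H(qp).sum(), qp, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        return dHdp, -dHdq + self.g(q) * u           # (dq/dt, dp/dt)

dyn = HamiltonianDynamics(dim=2)
dq, dp = dyn(torch.randn(8, 2), torch.randn(8, 2), torch.randn(8, 2))
```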
Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems
Title | Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems |
Authors | Anonymous |
Abstract | In many complex dynamical systems, artificial or natural, one can observe self-organization of patterns emerging from local rules. Cellular automata, like the Game of Life (GOL), have been widely used as abstract models enabling the study of various aspects of self-organization and morphogenesis, such as the emergence of spatially localized patterns. However, findings of self-organized patterns in such models have so far relied on manual tuning of parameters and initial states, and on the human eye to identify interesting patterns. In this paper, we formulate the problem of automated discovery of diverse self-organized patterns in such high-dimensional complex dynamical systems, as well as a framework for experimentation and evaluation. Using a continuous GOL as a testbed, we show that recent intrinsically motivated machine learning algorithms (POP-IMGEPs), initially developed for learning inverse models in robotics, can be transposed and used in this novel application area. These algorithms combine intrinsically motivated goal exploration and unsupervised learning of goal space representations. Goal space representations describe the interesting features of patterns for which diverse variations should be discovered. In particular, we compare various approaches to define and learn goal space representations from the perspective of discovering diverse spatially localized patterns. Moreover, we introduce an extension of a state-of-the-art POP-IMGEP algorithm which incrementally learns a goal representation using a deep auto-encoder, and the use of CPPN primitives for generating initialization parameters. We show that it is more efficient than several baselines and as efficient as a system pre-trained on a hand-made database of patterns identified by human experts. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg6sJHYDr |
https://openreview.net/pdf?id=rkg6sJHYDr | |
PWC | https://paperswithcode.com/paper/intrinsically-motivated-discovery-of-diverse |
Repo | |
Framework | |
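A hedged sketch of the population-based goal-exploration loop (POP-IMGEP) the abstract refers to: sample a goal in the goal space, find the closest previously discovered outcome, perturb the parameters that produced it, run the system, and record the new outcome. The system, goal encoder, and parameter space below are dummy placeholders, not the paper's components.

```python
# Hedged sketch of an intrinsically motivated goal-exploration loop.
import numpy as np

def imgep_explore(run_system, encode_goal, sample_params, n_iters=200, rng=None):
    rng = rng or np.random.default_rng()
    history = []                                    # (params, goal-space point) pairs
    for i in range(n_iters):
        params = sample_params(rng)                 # random bootstrap by default
        if history and i > 10:
            target = rng.uniform(-1, 1, size=len(history[0][1]))   # sampled goal
            nearest = min(history, key=lambda h: np.linalg.norm(h[1] - target))
            params = nearest[0] + 0.1 * rng.standard_normal(nearest[0].shape)
        pattern = run_system(params)
        history.append((params, encode_goal(pattern)))
    return history

hist = imgep_explore(run_system=lambda p: p,            # dummy system
                     encode_goal=lambda x: x[:2],       # dummy goal-space encoder
                     sample_params=lambda r: r.uniform(-1, 1, 8))
```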