April 1, 2020

2855 words 14 mins read

Paper Group NANR 27

BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation. WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS. Siamese Attention Networks. Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression. MULTI-STAGE INFLUENCE FUNCTION. GraphMix: Regularized Tr …

BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation

Title BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation
Authors Anonymous
Abstract The variational autoencoder (VAE), a widely used generative model, defines a latent space for data representation and uses variational inference to approximate the posterior probability. Several methods have been devised to disentangle the latent space so that the generative model can be controlled easily. However, because of the excessive constraints involved, the more disentangled the latent space is, the lower the quality of the generative model becomes. A disentangled generative model would allocate each feature of the generated data to only a single latent variable. In this paper, we propose a method that decomposes the latent space into a set of basis vectors and reconstructs it as a linear combination of these latent bases. The proposed model, called BasisVAE, consists of an encoder that extracts data features and estimates the coefficients for the linear combination of the latent bases, and a decoder that reconstructs the data from the combined latent code. In this method, a single latent basis responds to changes in a single generative factor and remains relatively invariant to changes in the other factors. The model maintains generation quality while relaxing the per-basis disentanglement constraint, since the latent space no longer needs to be decomposed on a standard basis. Experiments on the well-known benchmark datasets MNIST, 3DFaces and CelebA demonstrate the efficacy of the proposed method compared to other state-of-the-art methods. The proposed model not only defines a latent space separated by generative factors, but also produces generated and reconstructed images of higher quality. The disentangled representation is verified with the generated images and with a simple classifier trained on the output of the encoder.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1gEFkrtvH
PDF https://openreview.net/pdf?id=S1gEFkrtvH
PWC https://paperswithcode.com/paper/basisvae-orthogonal-latent-space-for-deep
Repo
Framework
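
The core mechanism the abstract describes, reconstructing the latent code as a linear combination of learned basis vectors whose coefficients come from the encoder, can be sketched roughly as below. This is a minimal illustration under assumed shapes and an assumed orthogonality penalty, not the authors' implementation.

```python
# Hypothetical sketch of the basis-combination idea: the encoder predicts
# coefficients that linearly combine a set of learned latent basis vectors,
# and the decoder reconstructs from the combined code. Shapes, layer sizes,
# and the orthogonality penalty are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class BasisDecoder(nn.Module):
    def __init__(self, n_bases=10, latent_dim=32, out_dim=784):
        super().__init__()
        # Learnable basis matrix: each row is one latent basis vector.
        self.bases = nn.Parameter(torch.randn(n_bases, latent_dim) * 0.01)
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Sigmoid())

    def forward(self, coeffs):
        # coeffs: (batch, n_bases) coefficients predicted by the encoder.
        z = coeffs @ self.bases          # linear combination of the latent bases
        return self.net(z)

    def orthogonality_penalty(self):
        # Encourage the bases to stay (near-)orthogonal.
        gram = self.bases @ self.bases.t()
        eye = torch.eye(gram.size(0), device=gram.device)
        return ((gram - eye) ** 2).sum()
```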

WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS

Title WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS
Authors Anonymous
Abstract Transfer learning has proven to be a successful way to train high-performing deep learning models in various applications for which little labeled data is available. In transfer learning, one pre-trains the model on a large dataset such as ImageNet or MS-COCO, and fine-tunes its weights on the target domain. In our work, we claim that in the new era of an ever-increasing number of massive datasets, selecting the relevant pre-training data itself is a critical issue. We introduce a new problem in which available datasets are stored in one centralized location, i.e., a dataserver. We assume that a client, a target application with its own small labeled dataset, is only interested in fetching the subset of the server’s data that is most relevant to its own target domain. We propose a novel method that aims to optimally select subsets of data from the dataserver given a particular target client. We perform data selection by employing a mixture-of-experts model in a series of dataserver-client transactions with a small computational cost. We show the effectiveness of our work in several transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets and tasks such as image classification, object detection and instance segmentation. We will make our framework available as a web service, serving data to users trying to improve performance in their AI application.
Tasks Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1lI3ertwH
PDF https://openreview.net/pdf?id=r1lI3ertwH
PWC https://paperswithcode.com/paper/what-data-is-useful-for-my-data-transfer
Repo
Framework
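
The paper's mixture-of-experts transaction protocol is not reproduced here, but the underlying goal, ranking server examples by relevance to a small client dataset, can be illustrated with a much simpler stand-in: score each server example by the log-likelihood of its feature vector under a Gaussian fit to the client's features. The feature dimensions, data, and scoring rule below are assumptions for illustration only and are not the paper's method.

```python
# Hypothetical stand-in for relevance-based data selection: fit a Gaussian to the
# client's features and rank dataserver examples by log-likelihood under it.
# This is NOT the paper's mixture-of-experts method, just the selection idea.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
client_feats = rng.normal(loc=1.0, size=(200, 16))      # features of the client's small dataset
server_feats = rng.normal(loc=0.0, size=(10000, 16))    # features of candidate dataserver examples

mu = client_feats.mean(axis=0)
cov = np.cov(client_feats, rowvar=False) + 1e-3 * np.eye(16)   # regularize for stability
scores = multivariate_normal(mean=mu, cov=cov).logpdf(server_feats)

k = 1000
selected = np.argsort(scores)[-k:]      # indices of the k most client-relevant server examples
print(selected.shape)
```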

Siamese Attention Networks

Title Siamese Attention Networks
Authors Anonymous
Abstract Attention operators have been widely applied on data of various orders and dimensions such as texts, images, and videos. One challenge of applying attention operators is the excessive usage of computational resources. This is due to the use of the dot-product and softmax operators when computing similarity scores. In this work, we propose the Siamese similarity function, which uses a feed-forward network to compute similarity scores. This results in the Siamese attention operator (SAO). In particular, SAO leads to a dramatic reduction in the requirement of computational resources. Experimental results show that our SAO can save 94% of memory usage and speed up the computation by a factor of 58 compared to the regular attention operator. The computational advantage of SAO is even larger on higher-order and higher-dimensional data. Results on image classification and restoration tasks demonstrate that networks with SAOs are as effective as models with the regular attention operator, while significantly outperforming those without attention operators.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJglA3NKwS
PDF https://openreview.net/pdf?id=BJglA3NKwS
PWC https://paperswithcode.com/paper/siamese-attention-networks
Repo
Framework
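
For comparison with the abstract's claim, here is a small sketch of attention whose similarity score comes from a feed-forward (additive, Bahdanau-style) function rather than a scaled dot product. It is a generic stand-in, not the paper's SAO: it still uses softmax for normalization and does not reproduce the reported memory savings, and the dimensions are assumptions.

```python
# Attention with a feed-forward (additive) similarity score instead of dot product.
# Generic sketch only; not the paper's Siamese attention operator.
import torch
import torch.nn as nn

class FeedForwardAttention(nn.Module):
    def __init__(self, d_model=64, d_hidden=32):
        super().__init__()
        self.wq = nn.Linear(d_model, d_hidden, bias=False)
        self.wk = nn.Linear(d_model, d_hidden, bias=False)
        self.v = nn.Linear(d_hidden, 1, bias=False)    # scores the combined representation

    def forward(self, q, k, val):
        # q: (B, Lq, d); k, val: (B, Lk, d)
        scores = self.v(torch.tanh(self.wq(q).unsqueeze(2) + self.wk(k).unsqueeze(1))).squeeze(-1)
        attn = torch.softmax(scores, dim=-1)           # (B, Lq, Lk)
        return attn @ val                              # (B, Lq, d)

q = torch.randn(2, 5, 64); kv = torch.randn(2, 7, 64)
out = FeedForwardAttention()(q, kv, kv)
print(out.shape)  # torch.Size([2, 5, 64])
```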

Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression

Title Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression
Authors Anonymous
Abstract Offset regression is a standard method for spatial localization in many vision tasks, including human pose estimation, object detection, and instance segmentation. However, if high localization accuracy is crucial for a task, convolutional neural networks with offset regression usually struggle to deliver it. This can be attributed to the locality of the convolution operation, exacerbated by variance in scale, clutter, and viewpoint. An even more fundamental issue is the multi-modality of real-world images; as a consequence, they cannot be approximated adequately with a single-mode model. Instead, we propose to use mixture density networks (MDN) for offset regression, allowing the model to manage the various modes efficiently and to learn to predict the full conditional density of the outputs given the input. On 2D human pose estimation in the wild, which requires accurate localization of body keypoints, we show that this yields a significant improvement in localization accuracy. In particular, our experiments reveal viewpoint variation as the dominant multi-modal factor. Further, by carefully initializing the MDN parameters, we do not face any instabilities in training, which are known to be a big obstacle to widespread deployment of MDNs. The method can be readily applied to any task with a spatial regression component. Our findings highlight the multi-modal nature of real-world vision and the significance of explicitly accounting for viewpoint variation, at least where spatial localization is concerned.
Tasks Instance Segmentation, Object Detection, Pose Estimation, Semantic Segmentation
Published 2020-01-01
URL https://openreview.net/forum?id=ByeYOerFvr
PDF https://openreview.net/pdf?id=ByeYOerFvr
PWC https://paperswithcode.com/paper/mixture-density-networks-find-viewpoint-the
Repo
Framework
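
Below is a minimal mixture-density head for 2D offsets, showing how a network can output a full conditional density (mixture weights, means, and scales) instead of a single offset. The layer sizes, the diagonal-Gaussian assumption, and the loss are generic MDN choices, not the paper's architecture or its initialization scheme.

```python
# Minimal mixture density network (MDN) head for 2D offset regression.
# Generic MDN sketch; not the paper's model or initialization scheme.
import torch
import torch.nn as nn

class OffsetMDN(nn.Module):
    def __init__(self, in_dim=128, n_components=5):
        super().__init__()
        self.k = n_components
        self.pi = nn.Linear(in_dim, self.k)             # mixture logits
        self.mu = nn.Linear(in_dim, self.k * 2)         # 2D offset means
        self.log_sigma = nn.Linear(in_dim, self.k * 2)  # per-dimension log std-devs

    def forward(self, feats):
        B = feats.size(0)
        log_pi = torch.log_softmax(self.pi(feats), dim=-1)        # (B, K)
        mu = self.mu(feats).view(B, self.k, 2)                    # (B, K, 2)
        sigma = self.log_sigma(feats).view(B, self.k, 2).exp()    # (B, K, 2)
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    # Negative log-likelihood of target offsets under a diagonal-Gaussian mixture.
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1)         # (B, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

feats, target = torch.randn(8, 128), torch.randn(8, 2)
print(mdn_nll(*OffsetMDN()(feats), target))
```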

MULTI-STAGE INFLUENCE FUNCTION

Title MULTI-STAGE INFLUENCE FUNCTION
Authors Anonymous
Abstract Multi-stage training and knowledge transfer, from a large-scale pre-training task to various fine-tuned end tasks, have revolutionized natural language processing (NLP) and computer vision (CV), with state-of-the-art performance constantly being improved. In this paper, we develop a multi-stage influence function score to track predictions from a fine-tuned model all the way back to the pre-training data. With this score, we can identify the examples in the pre-training task that contribute most to a prediction in the fine-tuning task. The proposed multi-stage influence function generalizes the original influence function for a single model in Koh et al. (2017), thereby enabling influence computation through both pre-trained and fine-tuned models. We test our proposed method in various experiments to show its effectiveness and potential applications.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1geR1BKPr
PDF https://openreview.net/pdf?id=r1geR1BKPr
PWC https://paperswithcode.com/paper/multi-stage-influence-function
Repo
Framework
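
For reference, the single-model influence function that the paper generalizes can be computed exactly on a toy ridge-regression problem, where the Hessian is available in closed form. The data and regularization below are purely illustrative, and this sketch does not include the multi-stage (pretrain-to-finetune) extension described in the abstract.

```python
# Toy single-model influence function (Koh & Liang 2017 style) on ridge regression:
# influence(z_i, z_test) = -grad_test^T H^{-1} grad_i, with an explicit Hessian.
# Illustrative only; the paper extends this through pretrain + fine-tune stages.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam, n = 1e-2, len(X)

# Objective: (1/n) * ||X theta - y||^2 + lam * ||theta||^2
theta = np.linalg.solve(X.T @ X + n * lam * np.eye(3), X.T @ y)
H = 2.0 / n * X.T @ X + 2.0 * lam * np.eye(3)            # Hessian of the objective

x_test, y_test = rng.normal(size=3), 0.7
grad_test = 2.0 * (x_test @ theta - y_test) * x_test      # gradient of the test loss

influences = np.empty(n)
for i in range(n):
    grad_i = 2.0 * (X[i] @ theta - y[i]) * X[i]           # gradient of the i-th training loss
    influences[i] = -grad_test @ np.linalg.solve(H, grad_i)

print("most influential training points:", np.argsort(-np.abs(influences))[:5])
```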

GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning

Title GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning
Authors Anonymous
Abstract We present GraphMix, a regularization technique for Graph Neural Network-based semi-supervised object classification, leveraging recent advances in the regularization of classical deep neural networks. Specifically, we propose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted targets. Our proposed method is architecture-agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
Tasks Object Classification
Published 2020-01-01
URL https://openreview.net/forum?id=SkxG-CVFDH
PDF https://openreview.net/pdf?id=SkxG-CVFDH
PWC https://paperswithcode.com/paper/graphmix-regularized-training-of-graph-neural-1
Repo
Framework
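
The interpolation-based regularization mentioned in the abstract can be illustrated with a plain mixup loss on node features fed to a fully-connected network. The parameter sharing with the GNN and the self-predicted targets are omitted here, and all shapes and hyperparameters are assumptions, so treat this as one ingredient of the recipe rather than the authors' code.

```python
# Mixup-style interpolation regularization on node features (FCN branch only).
# Parameter sharing with the GNN and self-predicted targets are omitted; sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

fcn = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 7))

def mixup_loss(feats, labels, alpha=1.0, num_classes=7):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(feats.size(0))
    mixed_x = lam * feats + (1 - lam) * feats[perm]       # interpolate inputs
    y1 = F.one_hot(labels, num_classes).float()
    mixed_y = lam * y1 + (1 - lam) * y1[perm]             # interpolate targets
    logits = fcn(mixed_x)
    return -(mixed_y * F.log_softmax(logits, dim=-1)).sum(-1).mean()

feats = torch.randn(140, 64)                  # features of the labeled nodes
labels = torch.randint(0, 7, (140,))
print(mixup_loss(feats, labels))
```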

TriMap: Large-scale Dimensionality Reduction Using Triplets

Title TriMap: Large-scale Dimensionality Reduction Using Triplets
Authors Anonymous
Abstract We introduce "TriMap", a dimensionality reduction technique based on triplet constraints that preserves the global accuracy of the data better than other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify global accuracy, we introduce a score which roughly reflects the relative placement of the clusters rather than the individual points. We empirically show the excellent performance of TriMap on a large variety of datasets in terms of both the quality of the embedding and the runtime. On our performance benchmarks, TriMap easily scales to millions of points without depleting memory and clearly outperforms t-SNE, LargeVis, and UMAP in terms of runtime.
Tasks Dimensionality Reduction
Published 2020-01-01
URL https://openreview.net/forum?id=BkeOp6EKDH
PDF https://openreview.net/pdf?id=BkeOp6EKDH
PWC https://paperswithcode.com/paper/trimap-large-scale-dimensionality-reduction-1
Repo
Framework
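
The triplet-constraint idea can be illustrated with a toy embedding optimized so that each anchor lies closer to its "in" point than to its "out" point. This uses random triplets and a plain margin loss, unlike TriMap's weighted triplets and specific loss, so it only sketches the kind of constraint being optimized.

```python
# Toy triplet-based embedding: push d(anchor, in) below d(anchor, out).
# Plain margin loss on random triplets; TriMap's actual triplet weighting and
# loss differ, so this only illustrates the triplet-constraint idea.
import torch

torch.manual_seed(0)
n, d_high, d_low = 300, 20, 2
X = torch.randn(n, d_high)
Y = torch.randn(n, d_low, requires_grad=True)          # low-dimensional embedding

# Random triplets (anchor, in, out); real methods pick neighbors vs. non-neighbors.
idx = torch.randint(0, n, (5000, 3))
# Reorder so the "in" point is the one closer to the anchor in the original space.
d_ab = (X[idx[:, 0]] - X[idx[:, 1]]).pow(2).sum(-1)
d_ac = (X[idx[:, 0]] - X[idx[:, 2]]).pow(2).sum(-1)
swap = d_ab > d_ac
idx[swap, 1], idx[swap, 2] = idx[swap, 2], idx[swap, 1]

opt = torch.optim.Adam([Y], lr=0.05)
for _ in range(200):
    d_in = (Y[idx[:, 0]] - Y[idx[:, 1]]).pow(2).sum(-1)
    d_out = (Y[idx[:, 0]] - Y[idx[:, 2]]).pow(2).sum(-1)
    loss = torch.relu(d_in - d_out + 1.0).mean()        # margin triplet loss
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```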

A Signal Propagation Perspective for Pruning Neural Networks at Initialization

Title A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Authors Anonymous
Abstract Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a pruning criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by interpreting connection sensitivity as a form of gradient, we formally characterize initialization conditions that ensure reliable connection-sensitivity measurements, which in turn yield effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning-at-initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision on pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard, arbitrarily designed architectures.
Tasks Image Classification, Network Pruning
Published 2020-01-01
URL https://openreview.net/forum?id=HJeTo2VFwH
PDF https://openreview.net/pdf?id=HJeTo2VFwH
PWC https://paperswithcode.com/paper/a-signal-propagation-perspective-for-pruning-1
Repo
Framework
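
The connection-sensitivity criterion the abstract builds on (from the earlier pruning-at-initialization work it refers to) can be sketched as the magnitude of gradient times weight on a single mini-batch, keeping only the top fraction of connections. The model, data, and sparsity level below are placeholders, and none of the paper's own modifications are shown.

```python
# Connection-sensitivity pruning at initialization (SNIP-style sketch):
# score each weight by |dL/dw * w| on one mini-batch and keep the top fraction.
# Model, batch, and sparsity are placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))

loss = F.cross_entropy(model(x), y)
weights = [p for p in model.parameters() if p.dim() > 1]
grads = torch.autograd.grad(loss, weights)

scores = torch.cat([(g * w).abs().flatten() for g, w in zip(grads, weights)])
keep = int(0.1 * scores.numel())                        # keep 10% of connections
threshold = torch.topk(scores, keep).values.min()

masks = [((g * w).abs() >= threshold).float() for g, w in zip(grads, weights)]
with torch.no_grad():
    for w, m in zip(weights, masks):
        w.mul_(m)                                       # zero out pruned connections
print([round(m.mean().item(), 3) for m in masks])       # fraction kept per layer
```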

Low Rank Training of Deep Neural Networks for Emerging Memory Technology

Title Low Rank Training of Deep Neural Networks for Emerging Memory Technology
Authors Anonymous
Abstract The recent success of neural networks for solving difficult decision tasks has incentivized incorporating smart decision making “at the edge.” However, this work has traditionally focused on neural network inference, rather than training, due to memory and compute limitations, especially in emerging non-volatile memory systems, where writes are energetically costly and reduce lifespan. Yet, the ability to train at the edge is becoming increasingly important, as it enables applications such as real-time adaptability to device drift and environmental variation, user customization, and federated learning across devices. In this work, we address four key challenges for training on edge devices with non-volatile memory: low weight update density, weight quantization, low auxiliary memory, and online learning. We present a low-rank training scheme that addresses these four challenges while maintaining computational efficiency. We then demonstrate the technique on a representative convolutional neural network across several adaptation problems, where it outperforms standard SGD both in accuracy and in the number of weight updates.
Tasks Decision Making, Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=SkeXL0NKwH
PDF https://openreview.net/pdf?id=SkeXL0NKwH
PWC https://paperswithcode.com/paper/low-rank-training-of-deep-neural-networks-for
Repo
Framework
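
One simple way to see what a "low-rank" weight update looks like, writing the weight change as thin factors instead of a dense matrix, is to truncate a gradient step with an SVD before applying it. The rank, model, and data below are assumptions, and this is not the paper's training scheme; it only illustrates why such an update is cheap to write to memory.

```python
# Rank-r weight update: replace a dense gradient step with its best rank-r
# approximation (truncated SVD), so the written update is a product of thin factors.
# Illustrative only; the paper's low-rank training scheme is more involved.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 128))      # weight matrix stored in NVM
X = rng.normal(size=(32, 128))                   # mini-batch of activations
T = rng.normal(size=(32, 256))                   # targets for a toy linear layer

G = (X.T @ (X @ W.T - T)).T / len(X)             # dense gradient of 0.5*||X W^T - T||^2 / n

r, lr = 4, 0.1
U, S, Vt = np.linalg.svd(G, full_matrices=False)
dW_lowrank = (U[:, :r] * S[:r]) @ Vt[:r]         # rank-r approximation of the gradient
W -= lr * dW_lowrank                             # only a rank-r update is ever written
print(np.linalg.matrix_rank(dW_lowrank))         # -> 4
```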

Measuring causal influence with back-to-back regression: the linear case

Title Measuring causal influence with back-to-back regression: the linear case
Authors Anonymous
Abstract Identifying causes from observations can be particularly challenging when i) potential factors are difficult to manipulate individually and ii) observations are complex and multi-dimensional. To address this issue, we introduce “Back-to-Back” regression (B2B), a method designed to efficiently measure, from a set of co-varying factors, the causal influences that most plausibly account for multidimensional observations. After proving the consistency of B2B and its links to other linear approaches, we show that our method outperforms least-squares regression and cross-decomposition techniques (e.g. canonical correlation analysis and partial least squares) on causal identification. Finally, we apply B2B to neuroimaging recordings of 102 subjects reading word sequences. The results show that the early and late brain representations, caused by low- and high-level word features respectively, are more reliably detected with B2B than with other standard techniques.
Tasks Causal Identification
Published 2020-01-01
URL https://openreview.net/forum?id=B1lKDlHtwS
PDF https://openreview.net/pdf?id=B1lKDlHtwS
PWC https://paperswithcode.com/paper/measuring-causal-influence-with-back-to-back
Repo
Framework
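
One minimal reading of back-to-back regression is two ridge regressions on disjoint splits: first decode the factors from the observations, then regress the decoded factors onto the true factors and read off the diagonal. Treat the sketch below as an illustration of that two-step structure on invented synthetic data, not as a faithful reproduction of the paper's estimator or its consistency guarantees.

```python
# Minimal two-step "back-to-back"-style estimate on synthetic data: (1) regress the
# factors X from the observations Y on one split, (2) regress the decoded factors
# onto X on the other split; the diagonal flags which factors influence Y.
# Sketch of the two-regression structure only, not the paper's exact estimator.
import numpy as np

rng = np.random.default_rng(0)
n, f, d = 2000, 5, 40
X = rng.normal(size=(n, f))                        # candidate factors (independent here, for simplicity)
E = np.diag([1, 1, 0, 0, 0]).astype(float)         # only the first two factors are causal
F_mix = rng.normal(size=(f, d))
Y = X @ E @ F_mix + 0.5 * rng.normal(size=(n, d))  # multidimensional observations

def ridge(A, B, lam=1e-2):
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

half = n // 2
G = ridge(Y[:half], X[:half])                      # step 1: decode X from Y
H = ridge(X[half:], Y[half:] @ G)                  # step 2: regress decoded X onto X
print(np.round(np.diag(H), 2))                     # large for causal factors, near 0 otherwise
```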

Interactive Classification by Asking Informative Questions

Title Interactive Classification by Asking Informative Questions
Authors Anonymous
Abstract We propose an interactive classification approach for natural language queries. Instead of classifying based on the natural language query alone, we ask the user for additional information using a sequence of binary and multiple-choice questions. At each turn, we use a policy controller to decide whether to present a question or provide the user with the final answer, and we select the best question to ask by maximizing the system's information gain. Our formulation enables bootstrapping the system without any interaction data, relying instead on non-interactive crowdsourced annotation tasks. Our evaluation shows that the interaction helps the system increase its accuracy and handle ambiguous queries, while our approach effectively balances the number of questions against the final accuracy.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkx9_gHtvS
PDF https://openreview.net/pdf?id=rkx9_gHtvS
PWC https://paperswithcode.com/paper/interactive-classification-by-asking
Repo
Framework
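
The question-selection criterion the abstract describes (maximizing information gain) can be sketched with a toy belief over classes and a table of per-class answer probabilities for each candidate question. The numbers and the answer model below are invented for illustration; the paper's policy controller and crowdsourced bootstrapping are not shown.

```python
# Picking the binary question with maximal expected information gain, given a
# belief over classes and P(answer = yes | class, question). Toy numbers only.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

belief = np.array([0.4, 0.3, 0.2, 0.1])              # current posterior over 4 classes
# p_yes[q, c] = probability that class c answers "yes" to question q (assumed/learned).
p_yes = np.array([[0.90, 0.80, 0.10, 0.10],
                  [0.50, 0.50, 0.50, 0.50],
                  [0.95, 0.10, 0.90, 0.05]])

gains = []
for q in range(len(p_yes)):
    p_ans_yes = (belief * p_yes[q]).sum()
    post_yes = belief * p_yes[q] / p_ans_yes
    post_no = belief * (1 - p_yes[q]) / (1 - p_ans_yes)
    expected_h = p_ans_yes * entropy(post_yes) + (1 - p_ans_yes) * entropy(post_no)
    gains.append(entropy(belief) - expected_h)        # information gain of asking q

print("ask question", int(np.argmax(gains)))
```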

Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks

Title Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks
Authors Anonymous
Abstract We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth-order oracle providing us with loss function evaluations. We employ Markov Random Fields (MRF) to exploit the structure of the input data and systematically model the covariance structure of the gradients. The MRF structure, together with Bayesian inference over the gradients, facilitates one-step attacks akin to the Fast Gradient Sign Method (FGSM), albeit in the black-box setting. The resulting method uses fewer queries than the current state of the art to achieve comparable performance. In particular, in the regime of lower query budgets, we show that our method is especially effective, requiring fewer average queries while maintaining high attack accuracy with one-step attacks.
Tasks Bayesian Inference
Published 2020-01-01
URL https://openreview.net/forum?id=H1g4M0EtPS
PDF https://openreview.net/pdf?id=H1g4M0EtPS
PWC https://paperswithcode.com/paper/gaussian-mrf-covariance-modeling-for
Repo
Framework
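
For context, a bare-bones black-box one-step attack estimates the gradient from loss queries with random finite differences and then takes an FGSM-like sign step. The generic baseline below omits the MRF covariance modeling and Bayesian gradient inference that the abstract describes, and the toy loss oracle is invented.

```python
# Generic zeroth-order one-step attack: estimate the gradient from loss-only
# queries via random finite differences, then take an FGSM-like sign step.
# Baseline sketch only; the paper's MRF covariance modeling is not included.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)
loss = lambda x: float(np.tanh(x @ w))        # stand-in for the black-box loss oracle

x = rng.normal(size=64)                        # input to attack
n_queries, mu, eps = 100, 1e-3, 0.05

grad_est = np.zeros_like(x)
for _ in range(n_queries):
    u = rng.normal(size=x.shape)               # random probe direction
    grad_est += (loss(x + mu * u) - loss(x)) / mu * u
grad_est /= n_queries

x_adv = x + eps * np.sign(grad_est)            # one-step sign attack to increase the loss
print(loss(x), "->", loss(x_adv))
```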

MIST: Multiple Instance Spatial Transformer Networks

Title MIST: Multiple Instance Spatial Transformer Networks
Authors Anonymous
Abstract We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts. The network learns to extract the most significant top-K patches, and feeds these patches to a task-specific network – e.g., auto-encoder or classifier – to solve a domain specific problem. The challenge in training such a network is the non-differentiable top-K selection process. To address this issue, we lift the training optimization problem by treating the result of top-K selection as a slack variable, resulting in a simple, yet effective, multi-stage training. Our method is able to learn to detect recurrent structures in the training dataset by learning to reconstruct images. It can also learn to localize structures when only knowledge on the occurrence of the object is provided, and in doing so it outperforms the state-of-the-art.
Tasks Image Reconstruction
Published 2020-01-01
URL https://openreview.net/forum?id=rJeGJaEtPH
PDF https://openreview.net/pdf?id=rJeGJaEtPH
PWC https://paperswithcode.com/paper/mist-multiple-instance-spatial-transformer-1
Repo
Framework
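
The non-differentiable top-K patch selection at the heart of the abstract can be sketched as: score locations with a small network, pick the K highest-scoring positions, and crop patches there. The scorer, patch size, and K are placeholders, and the paper's multi-stage training trick for handling the missing gradients is not reproduced.

```python
# Selecting the top-K highest-scoring patches from an image. The selection itself
# is non-differentiable (the crux the abstract addresses); the scorer, patch size
# and K are placeholders, and the paper's multi-stage training is not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.randn(1, 1, 64, 64)
scorer = nn.Conv2d(1, 1, kernel_size=5, padding=2)      # tiny stand-in score network
K, P = 4, 16                                             # number of patches, patch size

heat = scorer(img)[0, 0]                                 # (64, 64) score map
flat_idx = torch.topk(heat.flatten(), K).indices         # non-differentiable top-K
ys, xs = flat_idx // heat.size(1), flat_idx % heat.size(1)

padded = F.pad(img, (P // 2, P // 2, P // 2, P // 2))
patches = []
for y, x in zip(ys.tolist(), xs.tolist()):
    patches.append(padded[0, :, y:y + P, x:x + P])       # crop a P x P patch at each peak
patches = torch.stack(patches)
print(patches.shape)                                     # torch.Size([4, 1, 16, 16])
```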

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Title Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Authors Anonymous
Abstract The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes and smaller steps characterize each mode. We establish a non-asymptotic convergence theory for our proposed algorithm. Moreover, we provide extensive experimental results, including ImageNet, to demonstrate the effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.
Tasks Bayesian Inference
Published 2020-01-01
URL https://openreview.net/forum?id=rkeS1RVtPS
PDF https://openreview.net/pdf?id=rkeS1RVtPS
PWC https://paperswithcode.com/paper/cyclical-stochastic-gradient-mcmc-for-1
Repo
Framework
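
The cyclical stepsize schedule is the concrete, easily reproduced piece: the stepsize follows a cosine that restarts each cycle, with large early-cycle steps for exploring new modes and small late-cycle steps for sampling within a mode. The number of cycles, base stepsize, and the exploration/sampling split below are illustrative choices.

```python
# Cyclical cosine stepsize schedule for SG-MCMC: the stepsize restarts at alpha0
# each cycle and decays toward 0; big steps explore new modes, small steps sample.
# Cycle count, base stepsize, and phase split are illustrative assumptions.
import math

def cyclical_stepsize(k, total_iters, n_cycles, alpha0):
    """Cosine stepsize that restarts every cycle: alpha0 at the start, ~0 at the end."""
    cycle_len = math.ceil(total_iters / n_cycles)
    t = (k % cycle_len) / cycle_len           # position within the current cycle
    return alpha0 / 2.0 * (math.cos(math.pi * t) + 1.0)

total_iters, n_cycles, alpha0 = 1000, 4, 0.1
schedule = [cyclical_stepsize(k, total_iters, n_cycles, alpha0) for k in range(total_iters)]
# Large early-cycle steps explore new modes; small late-cycle steps are used for sampling.
print(round(schedule[0], 4), round(schedule[200], 4), round(schedule[249], 4))
```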

Temporal Difference Weighted Ensemble For Reinforcement Learning

Title Temporal Difference Weighted Ensemble For Reinforcement Learning
Authors Anonymous
Abstract Combining multiple function approximators in machine learning models typically leads to better performance and robustness compared with a single function. In reinforcement learning, ensemble algorithms such as an averaging method and a majority voting method are not always optimal, because each function can learn fundamentally different optimal trajectories from exploration. In this paper, we propose a Temporal Difference Weighted (TDW) algorithm, an ensemble method that adjusts weights of each contribution based on accumulated temporal difference errors. The advantage of this algorithm is that it improves ensemble performance by reducing weights of Q-functions unfamiliar with current trajectories. We provide experimental results for Gridworld tasks and Atari tasks that show significant performance improvements compared with baseline algorithms.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkej86VYvB
PDF https://openreview.net/pdf?id=rkej86VYvB
PWC https://paperswithcode.com/paper/temporal-difference-weighted-ensemble-for
Repo
Framework
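
One plausible form of the TD-error-based weighting the abstract describes is a softmax over the negative accumulated TD errors of each ensemble member, so Q-functions that track the current trajectory poorly are down-weighted. The softmax form, temperature, and toy numbers are assumptions for illustration, not the paper's exact rule.

```python
# A plausible TD-error-weighted ensemble of Q-functions: members with larger
# accumulated TD error on recent transitions get smaller softmax weights.
# The softmax form, temperature, and numbers are assumptions for illustration.
import numpy as np

def ensemble_q(q_values, accumulated_td_errors, temperature=1.0):
    # q_values: (n_members, n_actions); accumulated_td_errors: (n_members,)
    logits = -np.asarray(accumulated_td_errors) / temperature
    w = np.exp(logits - logits.max())
    w /= w.sum()                                     # softmax weights over members
    return w @ np.asarray(q_values)                  # weighted-average Q over actions

q_values = [[1.0, 2.0, 0.5],                         # member 0 (low TD error -> trusted)
            [0.0, 5.0, 0.0],                         # member 1 (high TD error -> discounted)
            [1.2, 1.8, 0.7]]
td_errors = [0.2, 3.0, 0.4]
q = ensemble_q(q_values, td_errors)
print("greedy action:", int(np.argmax(q)), q.round(2))
```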