Paper Group NANR 27
BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation. WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS. Siamese Attention Networks. Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression. MULTI-STAGE INFLUENCE FUNCTION. GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning. TriMap: Large-scale Dimensionality Reduction Using Triplets. A Signal Propagation Perspective for Pruning Neural Networks at Initialization. Low Rank Training of Deep Neural Networks for Emerging Memory Technology. Measuring causal influence with back-to-back regression: the linear case. Interactive Classification by Asking Informative Questions. Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks. MIST: Multiple Instance Spatial Transformer Networks. Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning. Temporal Difference Weighted Ensemble For Reinforcement Learning.
BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation
Title | BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation |
Authors | Anonymous |
Abstract | The variational autoencoder, a generative model, defines a latent space for data representation and uses variational inference to infer the posterior probability. Several methods have been devised to disentangle the latent space so that the generative model can be controlled easily. However, due to the excessive constraints, the more disentangled the latent space is, the lower the quality of the generative model. A disentangled generative model would allocate a single feature of the generated data to only a single latent variable. In this paper, we propose a method that decomposes the latent space into bases and reconstructs it as a linear combination of those latent bases. The proposed model, called BasisVAE, consists of an encoder that extracts the features of the data and estimates the coefficients for the linear combination of the latent bases, and a decoder that reconstructs the data from the combined latent bases. In this method, a single latent basis is subject to change in a single generative factor and remains relatively invariant to changes in other factors. The model maintains performance while relaxing the per-basis disentanglement constraint, since the latent space no longer needs to be decomposed on a standard basis. Experiments on the well-known benchmark datasets MNIST, 3DFaces and CelebA demonstrate the efficacy of the proposed method compared to other state-of-the-art methods. The proposed model not only defines a latent space separated by the generative factors, but also produces generated and reconstructed images of better quality. The disentangled representation is verified with the generated images and a simple classifier trained on the output of the encoder. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gEFkrtvH |
https://openreview.net/pdf?id=S1gEFkrtvH | |
PWC | https://paperswithcode.com/paper/basisvae-orthogonal-latent-space-for-deep |
Repo | |
Framework | |
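The abstract describes the core mechanism concretely enough for a toy illustration: the encoder predicts coefficients, the latent code is a linear combination of learned basis vectors, and the decoder reconstructs from that combination. The sketch below is our own minimal PyTorch rendering of that idea; layer sizes, names, and the reparameterization details are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BasisVAESketch(nn.Module):
    """Toy sketch: latent code = coefficients @ learned basis (not the authors' code)."""
    def __init__(self, x_dim=784, h_dim=256, n_basis=10, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.coeff_mu = nn.Linear(h_dim, n_basis)       # coefficients for the basis
        self.coeff_logvar = nn.Linear(h_dim, n_basis)
        self.basis = nn.Parameter(torch.randn(n_basis, z_dim))  # learned latent bases
        self.decoder = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.coeff_mu(h), self.coeff_logvar(h)
        coeff = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        z = coeff @ self.basis        # linear combination of the latent bases
        return self.decoder(z), mu, logvar

model = BasisVAESketch()
recon, mu, logvar = model(torch.rand(8, 784))
print(recon.shape)  # torch.Size([8, 784])
```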
WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS
Title | WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS |
Authors | Anonymous |
Abstract | Transfer learning has proven to be a successful way to train high-performing deep learning models in various applications for which little labeled data is available. In transfer learning, one pre-trains the model on a large dataset such as Imagenet or MS-COCO, and fine-tunes its weights on the target domain. In our work, we claim that in the new era of an ever-increasing number of massive datasets, selecting the relevant pre-training data itself is a critical issue. We introduce a new problem in which available datasets are stored in one centralized location, i.e., a dataserver. We assume that a client, a target application with its own small labeled dataset, is only interested in fetching a subset of the server’s data that is most relevant to its own target domain. We propose a novel method that aims to optimally select subsets of data from the dataserver given a particular target client. We perform data selection by employing a mixture of experts model in a series of dataserver-client transactions with a small computational cost. We show the effectiveness of our work in several transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets and tasks such as image classification, object detection and instance segmentation. We will make our framework available as a web-service, serving data to users trying to improve performance in their A.I. application. |
Tasks | Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lI3ertwH |
https://openreview.net/pdf?id=r1lI3ertwH | |
PWC | https://paperswithcode.com/paper/what-data-is-useful-for-my-data-transfer |
Repo | |
Framework | |
Siamese Attention Networks
Title | Siamese Attention Networks |
Authors | Anonymous |
Abstract | Attention operators have been widely applied to data of various orders and dimensions such as texts, images, and videos. One challenge of applying attention operators is the excessive usage of computational resources, due to the dot product and softmax operator used when computing similarity scores. In this work, we propose the Siamese similarity function, which uses a feed-forward network to compute similarity scores. This results in the Siamese attention operator (SAO). In particular, SAO leads to a dramatic reduction in the requirement of computational resources. Experimental results show that our SAO can save 94% memory usage and speed up the computation by a factor of 58 compared to the regular attention operator. The computational advantage of SAO is even larger on higher-order and higher-dimensional data. Results on image classification and restoration tasks demonstrate that networks with SAOs are as effective as models with the regular attention operator, while significantly outperforming those without attention operators. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJglA3NKwS |
https://openreview.net/pdf?id=BJglA3NKwS | |
PWC | https://paperswithcode.com/paper/siamese-attention-networks |
Repo | |
Framework | |
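The abstract only states that similarity comes from a feed-forward network with shared (Siamese) weights instead of a dot product and softmax; the exact SAO formulation is not given. The sketch below is an illustrative stand-in in that spirit (it resembles additive attention) and does not reproduce the claimed memory savings; every name and shape here is our assumption.

```python
import torch
import torch.nn as nn

class SiameseAttentionSketch(nn.Module):
    """Stand-in for a feed-forward (Siamese) similarity instead of dot-product + softmax.
    Illustrative only; not the SAO formulation from the paper."""
    def __init__(self, d_model=64, d_sim=32):
        super().__init__()
        self.branch = nn.Linear(d_model, d_sim)   # Siamese branch shared by queries and keys
        self.score = nn.Linear(d_sim, 1)

    def forward(self, q, k, v):
        # q: (B, Lq, d), k and v: (B, Lk, d)
        qb = self.branch(q).unsqueeze(2)          # (B, Lq, 1, d_sim)
        kb = self.branch(k).unsqueeze(1)          # (B, 1, Lk, d_sim)
        s = self.score(torch.tanh(qb + kb)).squeeze(-1)   # feed-forward similarity scores
        w = torch.sigmoid(s)                      # cheap normalization, no softmax
        return w @ v / (w.sum(-1, keepdim=True) + 1e-6)

attn = SiameseAttentionSketch()
out = attn(torch.randn(2, 5, 64), torch.randn(2, 7, 64), torch.randn(2, 7, 64))
print(out.shape)  # torch.Size([2, 5, 64])
```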
Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression
Title | Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression |
Authors | Anonymous |
Abstract | Offset regression is a standard method for spatial localization in many vision tasks, including human pose estimation, object detection, and instance segmentation. However, if high localization accuracy is crucial for a task, convolutional neural networks with offset regression usually struggle to deliver. This can be attributed to the locality of the convolution operation, exacerbated by variance in scale, clutter, and viewpoint. An even more fundamental issue is the multi-modality of real-world images; as a consequence, they cannot be approximated adequately with a single-mode model. Instead, we propose to use mixture density networks (MDN) for offset regression, allowing the model to manage the various modes efficiently and to learn to predict the full conditional density of the outputs given the input. On 2D human pose estimation in the wild, which requires accurate localization of body keypoints, we show that this yields a significant improvement in localization accuracy. In particular, our experiments reveal viewpoint variation as the dominant multi-modal factor. Further, by carefully initializing the MDN parameters, we do not face any instabilities in training, which is known to be a big obstacle to widespread deployment of MDN. The method can be readily applied to any task with a spatial regression component. Our findings highlight the multi-modal nature of real-world vision, and the significance of explicitly accounting for viewpoint variation, at least where spatial localization is concerned. |
Tasks | Instance Segmentation, Object Detection, Pose Estimation, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByeYOerFvr |
https://openreview.net/pdf?id=ByeYOerFvr | |
PWC | https://paperswithcode.com/paper/mixture-density-networks-find-viewpoint-the |
Repo | |
Framework | |
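As a rough illustration of what an MDN head for 2D offsets looks like: it predicts mixture weights, means and variances for a small number of Gaussian modes, and is trained with the mixture negative log-likelihood. The sketch below is a generic MDN head (isotropic Gaussians, made-up dimensions), not the paper's architecture.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetMDNHead(nn.Module):
    """Generic mixture density head for 2-D spatial offsets (illustrative, not the paper's code):
    predicts mixture weights, means and isotropic variances for M Gaussian modes."""
    def __init__(self, in_dim=256, n_modes=3):
        super().__init__()
        self.n_modes = n_modes
        self.fc = nn.Linear(in_dim, n_modes * 4)  # per mode: logit, mu_x, mu_y, log_sigma

    def forward(self, feat):
        p = self.fc(feat).view(-1, self.n_modes, 4)
        return p[..., 0], p[..., 1:3], p[..., 3]   # logits, means, log std-devs

def mdn_nll(logit, mu, log_sigma, target):
    """Negative log-likelihood of the target 2-D offset under the Gaussian mixture."""
    var = (2 * log_sigma).exp()
    sq = ((target.unsqueeze(1) - mu) ** 2).sum(-1)
    log_prob = -0.5 * sq / var - 2 * log_sigma - math.log(2 * math.pi)
    log_mix = torch.logsumexp(F.log_softmax(logit, dim=-1) + log_prob, dim=-1)
    return -log_mix.mean()

head = OffsetMDNHead()
logit, mu, log_sigma = head(torch.randn(16, 256))
loss = mdn_nll(logit, mu, log_sigma, torch.randn(16, 2))
loss.backward()
```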
MULTI-STAGE INFLUENCE FUNCTION
Title | MULTI-STAGE INFLUENCE FUNCTION |
Authors | Anonymous |
Abstract | Multi-stage training and knowledge transfer from a large-scale pretrain task to various fine-tune end tasks have revolutionized natural language processing (NLP) and computer vision (CV), with state-of-the-art performance constantly being improved. In this paper, we develop a multi-stage influence function score to track predictions from a fine-tuned model all the way back to the pretrain data. With this score, we can identify the pretrain examples in the pretrain task that contribute most to a prediction in the fine-tune task. The proposed multi-stage influence function generalizes the original influence function for a single model in Koh et al. (2017), thereby enabling influence computation through both pretrain and fine-tune models. We test our proposed method in various experiments to show its effectiveness and potential applications. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1geR1BKPr |
https://openreview.net/pdf?id=r1geR1BKPr | |
PWC | https://paperswithcode.com/paper/multi-stage-influence-function |
Repo | |
Framework | |
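For reference, the single-model influence function of Koh et al. (2017) that this paper generalizes scores a training point z against a test point z_test as follows; the multi-stage extension chains this kind of computation through both the pretrain and fine-tune models (the exact form is in the paper, not reproduced here).

```latex
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}\,
     \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})
```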
GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning
Title | GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning |
Authors | Anonymous |
Abstract | We present GraphMix, a regularization technique for Graph Neural Network-based semi-supervised object classification, leveraging the recent advances in the regularization of classical deep neural networks. Specifically, we propose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics. |
Tasks | Object Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxG-CVFDH |
https://openreview.net/pdf?id=SkxG-CVFDH | |
PWC | https://paperswithcode.com/paper/graphmix-regularized-training-of-graph-neural-1 |
Repo | |
Framework | |
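The abstract names the two ingredients clearly: a fully-connected network that shares parameters with the GNN, and interpolation-based (mixup-style) regularization on it. The sketch below shows one plausible instantiation of those two ideas; layer sizes, the sharing scheme and the loss weighting are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One plausible instantiation of the GraphMix ingredients described in the abstract:
# an FCN sharing its weight matrices with a GCN layer, plus mixup on the labeled nodes.
shared_w1, shared_w2 = nn.Linear(1433, 64), nn.Linear(64, 7)   # shared by FCN and GCN

def fcn(x):                         # fully-connected branch (ignores graph structure)
    return shared_w2(F.relu(shared_w1(x)))

def gcn(x, adj_norm):               # graph branch reusing the same parameters
    return adj_norm @ shared_w2(F.relu(adj_norm @ shared_w1(x)))

def mixup_loss(x, y_onehot, alpha=1.0):
    """Input/label mixup on labeled nodes, applied to the shared FCN branch."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    logp = F.log_softmax(fcn(x_mix), dim=-1)
    return -(y_mix * logp).sum(-1).mean()

# usage: total loss = GCN cross-entropy on labeled nodes + mixup regularizer on the FCN
x = torch.randn(100, 1433)
y = F.one_hot(torch.randint(0, 7, (100,)), 7).float()
adj = torch.eye(100)                # stand-in normalized adjacency
loss = F.cross_entropy(gcn(x, adj)[:20], y[:20].argmax(-1)) + mixup_loss(x[:20], y[:20])
loss.backward()
```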
TriMap: Large-scale Dimensionality Reduction Using Triplets
Title | TriMap: Large-scale Dimensionality Reduction Using Triplets |
Authors | Anonymous |
Abstract | We introduce “TriMap”, a dimensionality reduction technique based on triplet constraints that preserves the global accuracy of the data better than other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify the global accuracy, we introduce a score which roughly reflects the relative placement of the clusters rather than the individual points. We empirically show the excellent performance of TriMap on a large variety of datasets in terms of the quality of the embedding as well as the runtime. On our performance benchmarks, TriMap easily scales to millions of points without depleting the memory and clearly outperforms t-SNE, LargeVis, and UMAP in terms of runtime. |
Tasks | Dimensionality Reduction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeOp6EKDH |
https://openreview.net/pdf?id=BkeOp6EKDH | |
PWC | https://paperswithcode.com/paper/trimap-large-scale-dimensionality-reduction-1 |
Repo | |
Framework | |
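To make the triplet-constraint idea concrete: each constraint says "point i should stay closer to j than to k in the embedding", and the embedding is optimized to satisfy as many (weighted) constraints as possible. The toy sketch below shows that mechanism; TriMap's actual triplet sampling, weighting and loss are more elaborate than this.

```python
import torch

# Toy sketch of optimizing a 2-D embedding from triplet constraints (i closer to j than to k).
torch.manual_seed(0)
X = torch.randn(200, 50)                        # high-dimensional input
Y = torch.randn(200, 2, requires_grad=True)     # 2-D embedding being optimized

# crude triplet sampling: j = nearest neighbour of i, k = random "far" point
d = torch.cdist(X, X)
i = torch.arange(200)
j = d.topk(6, largest=False).indices[:, 1]      # nearest non-self neighbour
k = torch.randint(0, 200, (200,))

opt = torch.optim.Adam([Y], lr=0.1)
for step in range(200):
    d_ij = ((Y[i] - Y[j]) ** 2).sum(-1)
    d_ik = ((Y[i] - Y[k]) ** 2).sum(-1)
    loss = (d_ij / (d_ij + d_ik + 1e-8)).mean() # push d_ij small relative to d_ik
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```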
A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Title | A Signal Propagation Perspective for Pruning Neural Networks at Initialization |
Authors | Anonymous |
Abstract | Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a pruning criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by noting connection sensitivity as a form of gradients, we formally characterize initialization conditions to ensure reliable connection sensitivity measurements, which in turn yields effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures. |
Tasks | Image Classification, Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeTo2VFwH |
https://openreview.net/pdf?id=HJeTo2VFwH | |
PWC | https://paperswithcode.com/paper/a-signal-propagation-perspective-for-pruning-1 |
Repo | |
Framework | |
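The connection-sensitivity criterion the abstract builds on (pruning at initialization in the style of SNIP) can be illustrated in a few lines: attach a multiplicative mask to every connection, measure the gradient of the loss with respect to the mask on one mini-batch, and keep the most sensitive connections. The sketch below shows that criterion only; the signal-propagation-aware initialization proposed in the paper is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Connection sensitivity = |d loss / d mask| on a single mini-batch; keep the top-k.
torch.manual_seed(0)
w1 = nn.Parameter(torch.randn(300, 784) * 0.05)
w2 = nn.Parameter(torch.randn(10, 300) * 0.05)
m1 = torch.ones_like(w1, requires_grad=True)    # multiplicative connection masks
m2 = torch.ones_like(w2, requires_grad=True)

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
logits = F.linear(F.relu(F.linear(x, w1 * m1)), w2 * m2)
loss = F.cross_entropy(logits, y)
g1, g2 = torch.autograd.grad(loss, [m1, m2])    # connection sensitivities

scores = torch.cat([g1.abs().flatten(), g2.abs().flatten()])
keep_k = int(0.1 * scores.numel())              # e.g. keep the top 10% of connections
threshold = scores.topk(keep_k).values.min()
mask1 = (g1.abs() >= threshold).float()
mask2 = (g2.abs() >= threshold).float()
print("kept:", int(mask1.sum() + mask2.sum()), "of", scores.numel())
# the pruned network is then trained with w1*mask1 and w2*mask2 fixed
```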
Low Rank Training of Deep Neural Networks for Emerging Memory Technology
Title | Low Rank Training of Deep Neural Networks for Emerging Memory Technology |
Authors | Anonymous |
Abstract | The recent success of neural networks for solving difficult decision tasks has incentivized incorporating smart decision making “at the edge.” However, this work has traditionally focused on neural network inference, rather than training, due to memory and compute limitations, especially in emerging non-volatile memory systems, where writes are energetically costly and reduce lifespan. Yet, the ability to train at the edge is becoming increasingly important as it enables applications such as real-time adaptability to device drift and environmental variation, user customization, and federated learning across devices. In this work, we address four key challenges for training on edge devices with non-volatile memory: low weight update density, weight quantization, low auxiliary memory, and online learning. We present a low-rank training scheme that addresses these four challenges while maintaining computational efficiency. We then demonstrate the technique on a representative convolutional neural network across several adaptation problems, where it outperforms standard SGD both in accuracy and in number of weight updates. |
Tasks | Decision Making, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeXL0NKwH |
https://openreview.net/pdf?id=SkeXL0NKwH | |
PWC | https://paperswithcode.com/paper/low-rank-training-of-deep-neural-networks-for |
Repo | |
Framework | |
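The general idea stated in the abstract — keep weight updates in small low-rank factors held in cheap auxiliary memory and only occasionally commit a dense update to the costly non-volatile array — can be illustrated with a toy loop. The sketch below is our own rough illustration of that idea (the rank-truncation step here is a naive SVD), not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 128, 256, 4

W_nvm = rng.standard_normal((out_dim, in_dim)) * 0.05  # weights stored in non-volatile memory
L = np.zeros((out_dim, rank))                           # low-rank update factors kept in
R = np.zeros((in_dim, rank))                            # small auxiliary memory

def accumulate(grad_out, act_in, lr=0.01):
    """Fold a rank-1 SGD step (outer product of error and activation) into L, R
    via a truncated SVD of the running low-rank buffer (naive, for illustration)."""
    global L, R
    U, s, Vt = np.linalg.svd(L @ R.T - lr * np.outer(grad_out, act_in),
                             full_matrices=False)
    L, R = U[:, :rank] * s[:rank], Vt[:rank].T

for step in range(100):
    accumulate(rng.standard_normal(out_dim), rng.standard_normal(in_dim))
    if (step + 1) % 50 == 0:            # infrequent, energetically costly NVM write
        W_nvm += L @ R.T
        L[:], R[:] = 0.0, 0.0
print(W_nvm.shape)
```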
Measuring causal influence with back-to-back regression: the linear case
Title | Measuring causal influence with back-to-back regression: the linear case |
Authors | Anonymous |
Abstract | Identifying causes from observations can be particularly challenging when i) potential factors are difficult to manipulate individually and ii) observations are complex and multi-dimensional. To address this issue, we introduce “Back-to-Back” regression (B2B), a method designed to efficiently measure, from a set of co-varying factors, the causal influences that most plausibly account for multidimensional observations. After proving the consistency of B2B and its links to other linear approaches, we show that our method outperforms least-squares regression and cross-decomposition techniques (e.g. canonical correlation analysis and partial least squares) on causal identification. Finally, we apply B2B to neuroimaging recordings of 102 subjects reading word sequences. The results show that the early and late brain representations, caused by low- and high-level word features respectively, are more reliably detected with B2B than with other standard techniques. |
Tasks | Causal Identification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lKDlHtwS |
https://openreview.net/pdf?id=B1lKDlHtwS | |
PWC | https://paperswithcode.com/paper/measuring-causal-influence-with-back-to-back |
Repo | |
Framework | |
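A rough numpy sketch of the back-to-back idea as we read it from the abstract: first decode the candidate factors X from the observations Y, then regress those decoded factors back onto the true factors; the diagonal of the second regression indicates which factors causally influence Y. The paper's exact estimator (data splitting, regularization, consistency analysis) differs from this toy version.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_factors, n_obs = 2000, 4, 30

X = rng.standard_normal((n, n_factors))
E = np.diag([1.0, 1.0, 0.0, 0.0])                 # only the first two factors are causal
F_mix = rng.standard_normal((n_factors, n_obs))
Y = X @ E @ F_mix + 0.5 * rng.standard_normal((n, n_obs))

G, *_ = np.linalg.lstsq(Y, X, rcond=None)         # step 1: decode factors from observations
H, *_ = np.linalg.lstsq(X, Y @ G, rcond=None)     # step 2: regress decoded factors on X
print(np.round(np.diag(H), 2))                    # large for causal factors, near zero otherwise
```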
Interactive Classification by Asking Informative Questions
Title | Interactive Classification by Asking Informative Questions |
Authors | Anonymous |
Abstract | We propose an interactive classification approach for natural language queries. Instead of classifying given only the natural language query, we ask the user for additional information using a sequence of binary and multiple-choice questions. At each turn, we use a policy controller to decide whether to present a question or provide the user with the final answer, and select the best question to ask by maximizing the system information gain. Our formulation enables bootstrapping the system without any interaction data, instead relying on non-interactive crowdsourcing annotation tasks. Our evaluation shows that the interaction helps the system increase its accuracy and handle ambiguous queries, while our approach effectively balances the number of questions and the final accuracy. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkx9_gHtvS |
https://openreview.net/pdf?id=rkx9_gHtvS | |
PWC | https://paperswithcode.com/paper/interactive-classification-by-asking |
Repo | |
Framework | |
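Selecting "the best question by maximizing information gain" has a standard form: for each candidate question, compute the expected drop in entropy of the belief over classes after observing the answer. The toy sketch below shows that computation for binary questions; the question model and numbers are made up, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_questions = 6, 10
belief = np.full(n_classes, 1.0 / n_classes)               # current posterior over classes
p_yes = rng.uniform(0.05, 0.95, (n_questions, n_classes))  # p(answer=yes | class, question)

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def info_gain(q):
    p_ans_yes = (p_yes[q] * belief).sum()                  # marginal probability of "yes"
    post_yes = p_yes[q] * belief / p_ans_yes
    post_no = (1 - p_yes[q]) * belief / (1 - p_ans_yes)
    expected_post_entropy = (p_ans_yes * entropy(post_yes)
                             + (1 - p_ans_yes) * entropy(post_no))
    return entropy(belief) - expected_post_entropy

best_q = max(range(n_questions), key=info_gain)
print("ask question", best_q, "gain =", round(info_gain(best_q), 3))
```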
Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks
Title | Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks |
Authors | Anonymous |
Abstract | We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle, providing us with loss function evaluations. We employ Markov Random Fields (MRF) to exploit the structure of input data to systematically model the covariance structure of the gradients. The MRF structure in addition to Bayesian inference for the gradients facilitates one-step attacks akin to Fast Gradient Sign Method (FGSM) albeit in the black-box setting. The resulting method uses fewer queries than the current state of the art to achieve comparable performance. In particular, in the regime of lower query budgets, we show that our method is particularly effective in terms of fewer average queries with high attack accuracy while employing one-step attacks. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1g4M0EtPS |
https://openreview.net/pdf?id=H1g4M0EtPS | |
PWC | https://paperswithcode.com/paper/gaussian-mrf-covariance-modeling-for |
Repo | |
Framework | |
MIST: Multiple Instance Spatial Transformer Networks
Title | MIST: Multiple Instance Spatial Transformer Networks |
Authors | Anonymous |
Abstract | We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts. The network learns to extract the most significant top-K patches, and feeds these patches to a task-specific network – e.g., auto-encoder or classifier – to solve a domain specific problem. The challenge in training such a network is the non-differentiable top-K selection process. To address this issue, we lift the training optimization problem by treating the result of top-K selection as a slack variable, resulting in a simple, yet effective, multi-stage training. Our method is able to learn to detect recurrent structures in the training dataset by learning to reconstruct images. It can also learn to localize structures when only knowledge on the occurrence of the object is provided, and in doing so it outperforms the state-of-the-art. |
Tasks | Image Reconstruction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeGJaEtPH |
https://openreview.net/pdf?id=rJeGJaEtPH | |
PWC | https://paperswithcode.com/paper/mist-multiple-instance-spatial-transformer-1 |
Repo | |
Framework | |
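The non-differentiable step the abstract discusses — scoring locations, keeping the top-K peaks, and cropping patches for a task network — can be shown in isolation. The sketch below illustrates only that selection step with a stand-in scoring network; the slack-variable multi-stage training that makes it learnable is not reproduced here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.rand(1, 1, 64, 64)
heat = F.conv2d(img, torch.randn(1, 1, 5, 5), padding=2)   # stand-in scoring network
K, patch = 3, 16

topk = heat.flatten().topk(K).indices                      # non-differentiable top-K selection
ys, xs = topk // 64, topk % 64
patches = []
for y, x in zip(ys, xs):
    y0 = int(y.clamp(patch // 2, 64 - patch // 2)) - patch // 2
    x0 = int(x.clamp(patch // 2, 64 - patch // 2)) - patch // 2
    patches.append(img[0, 0, y0:y0 + patch, x0:x0 + patch])
patches = torch.stack(patches)       # (K, 16, 16), fed to a task network (e.g. auto-encoder)
print(patches.shape)
```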
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Title | Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning |
Authors | Anonymous |
Abstract | The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We prove non-asymptotic convergence theory of our proposed algorithm. Moreover, we provide extensive experimental results, including ImageNet, to demonstrate the effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeS1RVtPS |
https://openreview.net/pdf?id=rkeS1RVtPS | |
PWC | https://paperswithcode.com/paper/cyclical-stochastic-gradient-mcmc-for-1 |
Repo | |
Framework | |
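The cyclical stepsize schedule described in the abstract decays the stepsize from a large value to near zero within each cycle, so that large early steps jump between modes and small late steps (where SG-MCMC noise is injected and samples are collected) characterize the current mode. The sketch below shows a cosine-shaped cyclical schedule of that form; the constants and the exploration/sampling split are illustrative, not the paper's exact settings.

```python
import math

def cyclical_stepsize(k, total_iters, n_cycles, alpha_0=0.1):
    """Cosine-decayed stepsize within each of n_cycles cycles, restarting at alpha_0."""
    cycle_len = math.ceil(total_iters / n_cycles)
    pos = (k % cycle_len) / cycle_len            # position within the current cycle
    return (alpha_0 / 2.0) * (math.cos(math.pi * pos) + 1.0)

total, cycles = 1000, 4
for k in range(total):
    alpha = cyclical_stepsize(k, total, cycles)
    # large-alpha phase: plain SGD-like exploration steps
    # small-alpha phase: add sqrt(2*alpha)*N(0,1) Langevin noise and collect samples
print(round(cyclical_stepsize(0, total, cycles), 3),
      round(cyclical_stepsize(124, total, cycles), 3))
```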
Temporal Difference Weighted Ensemble For Reinforcement Learning
Title | Temporal Difference Weighted Ensemble For Reinforcement Learning |
Authors | Anonymous |
Abstract | Combining multiple function approximators in machine learning models typically leads to better performance and robustness compared with a single function. In reinforcement learning, ensemble algorithms such as an averaging method and a majority voting method are not always optimal, because each function can learn fundamentally different optimal trajectories from exploration. In this paper, we propose a Temporal Difference Weighted (TDW) algorithm, an ensemble method that adjusts weights of each contribution based on accumulated temporal difference errors. The advantage of this algorithm is that it improves ensemble performance by reducing weights of Q-functions unfamiliar with current trajectories. We provide experimental results for Gridworld tasks and Atari tasks that show significant performance improvements compared with baseline algorithms. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkej86VYvB |
https://openreview.net/pdf?id=rkej86VYvB | |
PWC | https://paperswithcode.com/paper/temporal-difference-weighted-ensemble-for |
Repo | |
Framework | |
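The weighting rule described in the abstract — down-weight ensemble members whose Q-functions are unfamiliar with the current trajectory, as measured by accumulated TD error — can be sketched in a few lines. The softmax-style weighting and the decay constant below are our assumptions, not the exact TDW update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, n_actions = 4, 3
accumulated_td_error = np.zeros(n_members)

def act(q_values):
    """q_values: (n_members, n_actions) Q estimates for the current state."""
    w = np.exp(-accumulated_td_error)            # smaller accumulated error -> larger weight
    w /= w.sum()
    return int(np.argmax(w @ q_values))          # weighted ensemble greedy action

def observe_td_errors(td_errors, decay=0.99):
    """Accumulate each member's absolute TD error along the trajectory."""
    global accumulated_td_error
    accumulated_td_error = decay * accumulated_td_error + np.abs(td_errors)

for _ in range(20):                              # dummy interaction loop
    q = rng.standard_normal((n_members, n_actions))
    a = act(q)
    observe_td_errors(rng.standard_normal(n_members))
w = np.exp(-accumulated_td_error)
print("weights:", np.round(w / w.sum(), 3))
```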