Paper Group NANR 102
Multitask Soft Option Learning
Title | Multitask Soft Option Learning |
Authors | Anonymous |
Abstract | We present Multitask Soft Option Learning (MSOL), a hierarchical multi-task framework based on Planning-as-Inference. MSOL extends the concept of Options, using separate variational posteriors for each task, regularized by a shared prior. The learned soft-options are temporally extended, allowing a higher-level master policy to train faster on new tasks by making decisions with lower frequency. Additionally, MSOL allows fine-tuning of soft-options for new tasks without unlearning previously useful behavior, and avoids problems with local minima in multitask training. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines in challenging multi-task environments. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeDGJBKvB |
https://openreview.net/pdf?id=BkeDGJBKvB | |
PWC | https://paperswithcode.com/paper/multitask-soft-option-learning-1 |
Repo | |
Framework | |
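
For the MSOL entry above, a schematic of the KL-regularized multitask objective suggested by the abstract (notation assumed here, not taken from the paper): each task $t$ learns its own soft-option posterior $q_t$, and all tasks are tied together through a shared, jointly learned prior $p$.

$$
\max_{\{q_t\},\,p} \;\; \sum_{t=1}^{T} \Big( \mathbb{E}_{q_t}\!\left[ R_t \right] \;-\; \beta \, \mathrm{KL}\!\big( q_t(o \mid s) \,\big\|\, p(o \mid s) \big) \Big)
$$

Because the posteriors are only regularized toward the prior rather than forced to equal it, a soft option can be fine-tuned on a new task without overwriting the shared behavior.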
FACE SUPER-RESOLUTION GUIDED BY 3D FACIAL PRIORS
Title | FACE SUPER-RESOLUTION GUIDED BY 3D FACIAL PRIORS |
Authors | Anonymous |
Abstract | State-of-the-art face super-resolution methods employ deep convolutional neural networks to learn a mapping between low- and high-resolution facial patterns by exploring local appearance knowledge. However, most of these methods do not fully exploit facial structures and identity information, and struggle to deal with facial images that exhibit large pose variation and misalignment. In this paper, we propose a novel face super-resolution method that explicitly incorporates 3D facial priors which capture sharp facial structures. First, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Second, a spatial attention mechanism is used to better exploit this hierarchical information (i.e., intensity similarity, 3D facial structure, identity content) for the super-resolution problem. Extensive experiments demonstrate that the proposed algorithm achieves superior face super-resolution results and outperforms the state of the art. |
Tasks | Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeOHJHFPH |
https://openreview.net/pdf?id=HJeOHJHFPH | |
PWC | https://paperswithcode.com/paper/face-super-resolution-guided-by-3d-facial |
Repo | |
Framework | |
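
A hedged architectural sketch of the fusion step described in the entry above; the module name, channel counts, and exact wiring are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch (assumed names/shapes): fuse super-resolution image features with
# rendered 3D-prior maps through a learned spatial attention mask.
import torch
import torch.nn as nn

class PriorAttentionFusion(nn.Module):
    def __init__(self, c_img=64, c_prior=8):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(c_img + c_prior, 1, 3, padding=1), nn.Sigmoid())
        self.proj = nn.Conv2d(c_prior, c_img, 1)   # lift prior maps to the feature width

    def forward(self, feat, prior):
        mask = self.attn(torch.cat([feat, prior], dim=1))   # where to trust the 3D prior
        return feat + mask * self.proj(prior)

fusion = PriorAttentionFusion()
feat = torch.randn(1, 64, 32, 32)    # low-resolution image features
prior = torch.randn(1, 8, 32, 32)    # rendered 3D facial prior maps (shape/identity channels)
print(fusion(feat, prior).shape)     # torch.Size([1, 64, 32, 32])
```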
Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features
Title | Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features |
Authors | Anonymous |
Abstract | Pre-trained deep convolutional neural network (CNN) features have been widely used as full-reference perceptual quality features for CNN-based image quality assessment, super-resolution, image restoration, and a variety of image-to-image translation problems. In this paper, we link basic human visual perception to characteristics of learned deep CNN representations, in a first attempt to interpret them. We characterize the frequency and orientation tuning of channels in trained object detection deep CNNs (e.g., VGG-16) by applying grating stimuli of different spatial frequencies and orientations as input. We observe that the behavior of CNN channels as spatial-frequency- and orientation-selective filters can be used to link basic human visual perception models to their characteristics. Doing so, we develop a theory that provides more insight into deep CNN representations as perceptual quality features. We conclude that sensitivity to spatial frequencies that have lower contrast masking thresholds in human visual perception, and a definite and strong orientation selectivity, are important attributes of deep CNN channels that deliver better perceptual quality features. |
Tasks | Image Quality Assessment, Image Restoration, Image-to-Image Translation, Object Detection, Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlLvnEtDB |
https://openreview.net/pdf?id=BJlLvnEtDB | |
PWC | https://paperswithcode.com/paper/analysis-and-interpretation-of-deep-cnn |
Repo | |
Framework | |
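
A minimal sketch of the probing procedure described in the entry above, under assumptions: grayscale sinusoidal gratings, a torchvision VGG-16, and mean channel activation as the response measure. The paper's exact stimuli and readout may differ.

```python
# A minimal sketch (not the paper's code): probe VGG-16 conv channels with sinusoidal
# gratings of varying spatial frequency and orientation and record their mean responses.
import numpy as np
import torch
import torchvision.models as models

def grating(size=224, cycles=8, theta=0.0):
    """Sinusoidal grating with `cycles` periods across the image, oriented at `theta` radians."""
    xs = np.linspace(-0.5, 0.5, size)
    xx, yy = np.meshgrid(xs, xs)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    img = 0.5 + 0.5 * np.sin(2 * np.pi * cycles * u)           # pixel values in [0, 1]
    return torch.tensor(img, dtype=torch.float32).expand(1, 3, size, size)

# For older torchvision, `models.vgg16(pretrained=True)` is the equivalent call.
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()    # convolutional part only

def channel_responses(x, layer_idx=10):
    """Mean activation per channel at the chosen conv layer (ImageNet normalization omitted)."""
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i == layer_idx:
                return x.mean(dim=(0, 2, 3))                   # one response value per channel

freqs = [2, 4, 8, 16, 32]
thetas = np.linspace(0, np.pi, 8, endpoint=False)
tuning = np.array([[channel_responses(grating(cycles=f, theta=t)).numpy()
                    for t in thetas] for f in freqs])          # (freq, orientation, channel)
print("tuning surface shape:", tuning.shape)
```

Plotting `tuning` for a single channel over frequency and orientation gives the kind of tuning surface the abstract refers to.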
Manifold Modeling in Embedded Space: A Perspective for Interpreting “Deep Image Prior”
Title | Manifold Modeling in Embedded Space: A Perspective for Interpreting “Deep Image Prior” |
Authors | Anonymous |
Abstract | Deep image prior (DIP), which utilizes a deep convolutional network (ConvNet) structure itself as an image prior, has attracted considerable attention in the computer vision community. It empirically shows the effectiveness of the ConvNet structure for various image restoration applications. However, why DIP works so well is still unknown, and why the convolution operation is essential for image reconstruction or enhancement is not very clear. In this study, we tackle these questions. The proposed approach divides the convolution into "delay-embedding" and "transformation (i.e., encoder-decoder)", and proposes a simple but essential image/tensor modeling method which is closely related to dynamical systems and self-similarity. The proposed method, named manifold modeling in embedded space (MMES), is implemented using a novel denoising auto-encoder in combination with a multi-way delay-embedding transform. In spite of its simplicity, the image/tensor completion and super-resolution results of MMES are quite similar, even competitive, to DIP in our extensive experiments, and these results help reinterpret/characterize DIP from the perspective of a "low-dimensional patch-manifold prior". |
Tasks | Denoising, Image Reconstruction, Image Restoration, Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgBra4YDS |
https://openreview.net/pdf?id=SJgBra4YDS | |
PWC | https://paperswithcode.com/paper/manifold-modeling-in-embedded-space-a-1 |
Repo | |
Framework | |
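
A minimal sketch of a 2-D delay embedding (Hankelization by overlapping patches), the first half of the MMES pipeline described in the entry above; the denoising auto-encoder applied in embedded space is omitted, and the patch size is an assumption.

```python
# A minimal delay-embedding sketch: stack overlapping patches as columns, and invert
# by folding the patches back and averaging overlaps.
import numpy as np

def delay_embed(img, tau=8):
    """Stack all tau x tau overlapping patches as columns of a matrix."""
    H, W = img.shape
    cols = [img[i:i + tau, j:j + tau].ravel()
            for i in range(H - tau + 1) for j in range(W - tau + 1)]
    return np.stack(cols, axis=1)                        # (tau*tau, num_patches)

def inverse_embed(mat, shape, tau=8):
    """Fold patches back into an image, averaging overlapping pixels."""
    H, W = shape
    out, cnt = np.zeros(shape), np.zeros(shape)
    k = 0
    for i in range(H - tau + 1):
        for j in range(W - tau + 1):
            out[i:i + tau, j:j + tau] += mat[:, k].reshape(tau, tau)
            cnt[i:i + tau, j:j + tau] += 1
            k += 1
    return out / cnt

img = np.random.rand(64, 64)
emb = delay_embed(img)
print(np.allclose(inverse_embed(emb, img.shape), img))   # True: the embedding is invertible
```

In MMES, the low-dimensional patch-manifold assumption is imposed on the columns of this embedded matrix rather than on the image pixels directly.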
Intriguing Properties of Adversarial Training at Scale
Title | Intriguing Properties of Adversarial Training at Scale |
Authors | Anonymous |
Abstract | Adversarial training is one of the main defenses against adversarial attacks. In this paper, we provide the first rigorous study on diagnosing elements of large-scale adversarial training on ImageNet, which reveals two intriguing properties. First, we study the role of normalization. Batch normalization (BN) is a crucial element for achieving state-of-the-art performance on many vision tasks, but we show it may prevent networks from obtaining strong robustness in adversarial training. One unexpected observation is that, for models trained with BN, simply removing clean images from training data largely boosts adversarial robustness, i.e., 18.3%. We relate this phenomenon to the hypothesis that clean images and adversarial images are drawn from two different domains. This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging. Guided by this two-domain hypothesis, we show disentangling the mixture distribution for normalization, i.e., applying separate BNs to clean and adversarial images for statistics estimation, achieves much stronger robustness. Additionally, we find that enforcing BNs to behave consistently at training and testing can further enhance robustness. Second, we study the role of network capacity. We find our so-called “deep” networks are still shallow for the task of adversarial learning. Unlike traditional classification tasks where accuracy is only marginally improved by adding more layers to “deep” networks (e.g., ResNet-152), adversarial training exhibits a much stronger demand on deeper networks to achieve higher adversarial robustness. This robustness improvement can be observed substantially and consistently even by pushing the network capacity to an unprecedented scale, i.e., ResNet-638. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxJhCEFDS |
https://openreview.net/pdf?id=HyxJhCEFDS | |
PWC | https://paperswithcode.com/paper/intriguing-properties-of-adversarial-training-1 |
Repo | |
Framework | |
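
A hedged sketch of the "separate BNs for clean and adversarial images" idea from the abstract above: one convolution with shared weights, two BatchNorm layers with separate statistics. The block structure and sizes are illustrative assumptions, not the paper's architecture.

```python
# Route clean and adversarial mini-batches through different BatchNorm layers that
# share the same convolutional weights.
import torch
import torch.nn as nn

class DualBNConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)  # shared weights
        self.bn_clean = nn.BatchNorm2d(c_out)   # statistics of the clean domain
        self.bn_adv = nn.BatchNorm2d(c_out)     # statistics of the adversarial domain

    def forward(self, x, adversarial=False):
        bn = self.bn_adv if adversarial else self.bn_clean
        return torch.relu(bn(self.conv(x)))

block = DualBNConv(3, 16)
clean = torch.randn(8, 3, 32, 32)
adv = clean + 0.03 * torch.randn_like(clean)    # stand-in for a real attack such as PGD
print(block(clean).shape, block(adv, adversarial=True).shape)
```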
Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds
Title | Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds |
Authors | Anonymous |
Abstract | Point clouds, as a form of Lagrangian representation, allow for powerful and flexible applications in a large number of computational disciplines. We propose a novel deep-learning method to learn stable and temporally coherent feature spaces for point clouds that change over time. We identify a set of inherent problems with existing approaches: without knowledge of the time dimension, the inferred solutions can exhibit strong flickering, and easy solutions to suppress this flickering can result in undesirable local minima that manifest themselves as halo structures. We propose a novel temporal loss function that takes into account higher time derivatives of the point positions and encourages mingling, i.e., prevents the aforementioned halos. We combine these techniques in a super-resolution method with a truncation approach to flexibly adapt the size of the generated positions. We show that our method works for large, deforming point sets from different sources, demonstrating the flexibility of our approach. |
Tasks | Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeKh3VYDH |
https://openreview.net/pdf?id=BJeKh3VYDH | |
PWC | https://paperswithcode.com/paper/tranquil-clouds-neural-networks-for-learning-1 |
Repo | |
Framework | |
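
A hedged sketch of a temporal-coherence penalty built from finite differences of point positions across frames, as one plausible reading of the "higher time derivatives" loss mentioned in the entry above; the paper's actual loss, weighting, and point correspondences may differ.

```python
# Penalize finite-difference approximations of acceleration and jerk of corresponded
# point positions across consecutive frames (a plausible form, not the paper's exact loss).
import torch

def temporal_smoothness(frames, w_acc=1.0, w_jerk=1.0):
    """frames: list of (N, 3) tensors of corresponded point positions at consecutive times."""
    p = torch.stack(frames)                    # (T, N, 3)
    vel = p[1:] - p[:-1]                       # first differences  ~ velocity
    acc = vel[1:] - vel[:-1]                   # second differences ~ acceleration
    jerk = acc[1:] - acc[:-1]                  # third differences
    return w_acc * acc.pow(2).mean() + w_jerk * jerk.pow(2).mean()

frames = [torch.randn(1024, 3) for _ in range(4)]
print(temporal_smoothness(frames))
```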
Neural Embeddings for Nearest Neighbor Search Under Edit Distance
Title | Neural Embeddings for Nearest Neighbor Search Under Edit Distance |
Authors | Anonymous |
Abstract | The edit distance between two sequences is an important metric with many applications. The drawback, however, is the high computational cost of many basic problems involving this notion, such as the nearest neighbor search. A natural approach to overcoming this issue is to embed the sequences into a vector space such that the geometric distance in the target space approximates the edit distance in the original space. However, the known edit distance embedding algorithms, such as Chakraborty et al. (2016), construct embeddings that are data-independent, i.e., do not exploit any structure of embedded sets of strings. In this paper, we propose an alternative approach, which learns the embedding function according to the data distribution. Our experiments show that the new algorithm has much better empirical performance than prior data-independent methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlWIANtPH |
https://openreview.net/pdf?id=HJlWIANtPH | |
PWC | https://paperswithcode.com/paper/neural-embeddings-for-nearest-neighbor-search |
Repo | |
Framework | |
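
A toy sketch of the learned-embedding idea described in the entry above, under assumptions: a character-level GRU encoder, a squared regression loss between Euclidean distance and edit distance, and random DNA-like strings. The paper's architecture, loss, and data differ from this setup.

```python
# Train an encoder so that Euclidean distance between embeddings approximates edit distance.
import random
import torch
import torch.nn as nn

ALPHABET = "ACGT"

def edit_distance(a, b):
    """Classic O(|a||b|) dynamic program (Levenshtein distance)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

class SeqEmbedder(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.emb = nn.Embedding(len(ALPHABET), 16)
        self.gru = nn.GRU(16, dim, batch_first=True)

    def forward(self, s):
        idx = torch.tensor([[ALPHABET.index(c) for c in s]])
        _, h = self.gru(self.emb(idx))
        return h[-1, 0]                                  # fixed-size vector for the string

model = SeqEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    a = "".join(random.choices(ALPHABET, k=20))
    b = "".join(random.choices(ALPHABET, k=20))
    loss = (torch.dist(model(a), model(b)) - edit_distance(a, b)) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, approximate nearest neighbor search under edit distance reduces to standard vector search over the embeddings.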
On importance-weighted autoencoders
Title | On importance-weighted autoencoders |
Authors | Anonymous |
Abstract | The importance weighted autoencoder (IWAE) (Burda et al., 2016) is a popular variational-inference method which achieves a tighter evidence bound (and hence a lower bias) than standard variational autoencoders by optimising a multi-sample objective, i.e. an objective that is expressible as an integral over $K > 1$ Monte Carlo samples. Unfortunately, IWAE crucially relies on the availability of reparametrisations and even if these exist, the multi-sample objective leads to inference-network gradients which break down as $K$ is increased (Rainforth et al., 2018). This breakdown can only be circumvented by removing high-variance score-function terms, either by heuristically ignoring them (which yields the ‘sticking-the-landing’ IWAE (IWAE-STL) gradient from Roeder et al. (2017)) or through an identity from Tucker et al. (2019) (which yields the ‘doubly-reparametrised’ IWAE (IWAE-DREG) gradient). In this work, we argue that directly optimising the proposal distribution in importance sampling as in the reweighted wake-sleep (RWS) algorithm from Bornschein & Bengio (2015) is preferable to optimising IWAE-type multi-sample objectives. To formalise this argument, we introduce an adaptive-importance sampling framework termed adaptive importance sampling for learning (AISLE) which slightly generalises the RWS algorithm. We then show that AISLE admits IWAE-STL and IWAE-DREG (i.e. the IWAE-gradients which avoid breakdown) as special cases. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg7jhEtPB |
https://openreview.net/pdf?id=ryg7jhEtPB | |
PWC | https://paperswithcode.com/paper/on-importance-weighted-autoencoders |
Repo | |
Framework | |
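
For reference, the multi-sample objective discussed in the entry above is the IWAE bound of Burda et al. (2016):

$$
\mathcal{L}_K(\theta, \phi) \;=\; \mathbb{E}_{z_1,\dots,z_K \sim q_\phi(\cdot \mid x)} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)} \right] \;\le\; \log p_\theta(x),
$$

which tightens as $K$ grows; the paper's argument concerns how the gradients of this objective with respect to the inference network $\phi$ degrade for large $K$, and how RWS/AISLE-style proposal optimisation recovers the IWAE-STL and IWAE-DREG gradients as special cases.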
HOW THE CHOICE OF ACTIVATION AFFECTS TRAINING OF OVERPARAMETRIZED NEURAL NETS
Title | HOW THE CHOICE OF ACTIVATION AFFECTS TRAINING OF OVERPARAMETRIZED NEURAL NETS |
Authors | Anonymous |
Abstract | It is well-known that overparametrized neural networks trained using gradient based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. These results either assume that the activation function is ReLU or they depend on the minimum eigenvalue of a certain Gram matrix. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds which require that this eigenvalue be large. Empirically, a number of alternative activation functions have been proposed which tend to perform better than ReLU at least in some settings but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of activation functions on training. In the present paper, we provide theoretical results about the effect of activation function on the training of highly overparametrized 2-layer neural networks. A crucial property that governs the performance of an activation is whether or not it is smooth: • For non-smooth activations such as ReLU, SELU, ELU, which are not smooth because there is a point where either the first order or second order derivative is discontinuous, all eigenvalues of the associated Gram matrix are large under minimal assumptions on the data. • For smooth activations such as tanh, swish, polynomial, which have derivatives of all orders at all points, the situation is more complex: if the subspace spanned by the data has small dimension then the minimum eigenvalue of the Gram matrix can be small leading to slow training. But if the dimension is large and the data satisfies another mild condition, then the eigenvalues are large. If we allow deep networks, then the small data dimension is not a limitation provided that the depth is sufficient. We discuss a number of extensions and applications of these results. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgfdeBYvH |
https://openreview.net/pdf?id=rkgfdeBYvH | |
PWC | https://paperswithcode.com/paper/how-the-choice-of-activation-affects-training |
Repo | |
Framework | |
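
A hedged numerical sketch of the kind of Gram matrix whose minimum eigenvalue is discussed in the entry above, estimated by Monte Carlo over random first-layer weights; the exact matrix and assumptions in the paper may differ.

```python
# Estimate G_{ij} = E_w[ sigma'(w.x_i) sigma'(w.x_j) ] x_i.x_j and its minimum eigenvalue
# for different activation derivatives.
import numpy as np

def gram_min_eig(X, act_deriv, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_samples, d))          # random weight vectors w ~ N(0, I)
    S = act_deriv(W @ X.T)                           # sigma'(w^T x_i), shape (samples, n)
    H = (S.T @ S / n_samples) * (X @ X.T)            # Monte Carlo Gram matrix estimate
    return np.linalg.eigvalsh(H).min()

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs, a common assumption

relu_deriv = lambda z: (z > 0).astype(float)
tanh_deriv = lambda z: 1.0 - np.tanh(z) ** 2
print("ReLU min eig:", gram_min_eig(X, relu_deriv))
print("tanh min eig:", gram_min_eig(X, tanh_deriv))
```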
Differentiation of Blackbox Combinatorial Solvers
Title | Differentiation of Blackbox Combinatorial Solvers |
Authors | Anonymous |
Abstract | Achieving fusion of deep learning with combinatorial algorithms promises transformative changes to artificial intelligence. One possible approach is to introduce combinatorial building blocks into neural networks. Such end-to-end architectures have the potential to tackle combinatorial problems on raw input data such as ensuring global consistency in multi-object tracking or route planning on maps in robotics. In this work, we present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. We provide both theoretical and experimental backing. In particular, we incorporate the Gurobi MIP solver, Blossom V algorithm, and Dijkstra’s algorithm into architectures that extract suitable features from raw inputs for the traveling salesman problem, the min-cost perfect matching problem and the shortest path problem. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkevoJSYPB |
https://openreview.net/pdf?id=BkevoJSYPB | |
PWC | https://paperswithcode.com/paper/differentiation-of-blackbox-combinatorial |
Repo | |
Framework | |
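
A hedged sketch of an interpolation-style backward pass through a blackbox solver with a linear objective, in the spirit of the method in the entry above; the toy solver (argmin as a one-hot selection) and the hyperparameter `lam` are illustrative assumptions rather than the paper's experimental setup.

```python
# Forward: call the blackbox solver on the predicted costs. Backward: call the solver once
# more on costs perturbed by the incoming gradient, and return a finite-difference gradient.
import torch

def solver(costs):
    """Blackbox combinatorial solver: here, pick the single cheapest item (argmin as one-hot)."""
    y = torch.zeros_like(costs)
    y[costs.argmin()] = 1.0
    return y

class BlackboxSolve(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, lam=20.0):
        y = solver(costs)
        ctx.lam = lam
        ctx.save_for_backward(costs, y)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        costs, y = ctx.saved_tensors
        perturbed = costs + ctx.lam * grad_y          # perturb costs with the incoming gradient
        y_lam = solver(perturbed)                     # one extra solver call
        return -(y - y_lam) / ctx.lam, None           # interpolation-based gradient w.r.t. costs

costs = torch.randn(5, requires_grad=True)
y = BlackboxSolve.apply(costs)
loss = ((y - torch.tensor([0., 0., 1., 0., 0.])) ** 2).sum()   # match a target selection
loss.backward()
print(costs.grad)
```

The same wrapper pattern applies when `solver` is Dijkstra, Blossom V, or a MIP solver, as long as the solver's objective is linear in the predicted costs.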
A Group-Theoretic Framework for Knowledge Graph Embedding
Title | A Group-Theoretic Framework for Knowledge Graph Embedding |
Authors | Anonymous |
Abstract | We have rigorously proved the existence of a group algebraic structure hidden in relational knowledge embedding problems, which suggests that a group-based embedding framework is essential for model design. Our theoretical analysis explores merely the intrinsic property of the embedding problem itself without introducing extra designs. Using the proposed framework, one could construct embedding models that naturally accommodate all possible local graph patterns, which are necessary for reproducing a complete graph from atomic knowledge triplets. We reconstruct many state-of-the-art models from the framework and re-interpret them as embeddings with different groups. Moreover, we also propose new instantiation models using simple continuous non-abelian groups. |
Tasks | Graph Embedding, Knowledge Graph Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1e30AEKPr |
https://openreview.net/pdf?id=r1e30AEKPr | |
PWC | https://paperswithcode.com/paper/a-group-theoretic-framework-for-knowledge |
Repo | |
Framework | |
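
One way to read the framework in the entry above (notation assumed here): each relation $r$ is represented by a group element $g_r$ acting on entity embeddings, so a triple $(h, r, t)$ is scored by how closely the action of $g_r$ maps $h$ to $t$, and relation composition becomes group multiplication.

$$
s(h, r, t) \;=\; -\,d\big(g_r \cdot e_h,\; e_t\big), \qquad g_{r_1 \circ r_2} \;=\; g_{r_2}\, g_{r_1} \quad (\text{apply } r_1 \text{ first}),
$$

Under this reading, RotatE corresponds to the abelian group $U(1)$ acting on complex embeddings, and the paper's new instantiations replace it with continuous non-abelian groups.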
Global graph curvature
Title | Global graph curvature |
Authors | Anonymous |
Abstract | Recently, non-Euclidean spaces have become popular for embedding structured data. However, determining suitable geometry and, in particular, curvature for a given dataset is still an open problem. In this paper, we define a notion of global graph curvature, specifically catered to the problem of embedding graphs, and analyze the problem of estimating this curvature using only graph-based characteristics (without actual graph embedding). We show that the optimal curvature essentially depends on the dimensionality of the embedding space and the loss function one aims to minimize via embedding. We review the existing notions of local curvature (e.g., Ollivier-Ricci curvature) and analyze their properties theoretically and empirically. In particular, we show that such curvatures are often unable to properly estimate the global one. Hence, we propose a new estimator of global graph curvature specifically designed for the zero-one loss function. |
Tasks | Graph Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByeDl1BYvH |
https://openreview.net/pdf?id=ByeDl1BYvH | |
PWC | https://paperswithcode.com/paper/global-graph-curvature |
Repo | |
Framework | |
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Title | Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform |
Authors | Anonymous |
Abstract | Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of the Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches which also enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the CNN’s performance. The Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxV-ANKDH |
https://openreview.net/pdf?id=HJxV-ANKDH | |
PWC | https://paperswithcode.com/paper/efficient-riemannian-optimization-on-the |
Repo | |
Framework | |
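
A hedged numerical sketch of a Cayley-transform retraction on the Stiefel manifold, related to the entry above; this is the closed-form version rather than the paper's iterative approximation, and omits the momentum/vector-transport machinery.

```python
# One Cayley update: build a skew-symmetric matrix from the gradient, apply its Cayley
# transform (an orthogonal matrix), and the result keeps X^T X = I exactly.
import numpy as np

def cayley_retract(X, G, lr=0.1):
    """X: point on the Stiefel manifold (orthonormal columns); G: Euclidean gradient.
    For a descent step one would pass -G; the sign does not affect feasibility."""
    n, _ = X.shape
    A = G @ X.T - X @ G.T                           # skew-symmetric: A^T = -A
    I = np.eye(n)
    Q = np.linalg.solve(I + (lr / 2) * A, I - (lr / 2) * A)   # Cayley transform of A
    return Q @ X

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))    # random point on the Stiefel manifold
G = rng.standard_normal((6, 3))                     # random Euclidean gradient
X_new = cayley_retract(X, G)
print(np.allclose(X_new.T @ X_new, np.eye(3)))      # True: orthonormality preserved
```

The paper's efficiency gain comes from replacing the matrix inverse above with a cheap fixed-point iteration.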
Towards Understanding the Regularization of Adversarial Robustness on Neural Networks
Title | Towards Understanding the Regularization of Adversarial Robustness on Neural Networks |
Authors | Anonymous |
Abstract | The problem of adversarial examples has shown that modern Neural Network (NN) models could be rather fragile. Among the most promising techniques to solve the problem, one is to require the model to be $\epsilon$-adversarially robust (AR); that is, to require the model not to change predicted labels when any given input examples are perturbed within a certain range. However, it is widely observed that such methods would lead to standard performance degradation, i.e., the degradation on natural examples. In this work, we study the degradation through the regularization perspective. We identify quantities from generalization analysis of NNs; with the identified quantities we empirically find that AR is achieved by regularizing/biasing NNs towards less confident solutions by making the changes in the feature space (induced by changes in the instance space) of most layers smoother uniformly in all directions; so to a certain extent, it prevents sudden change in prediction w.r.t. perturbations. However, the end result of such smoothing concentrates samples around decision boundaries, resulting in less confident solutions, and leads to worse standard performance. Our studies suggest that one might consider ways that build AR into NNs in a gentler way to avoid the problematic regularization. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlkgaNKvr |
https://openreview.net/pdf?id=BJlkgaNKvr | |
PWC | https://paperswithcode.com/paper/towards-understanding-the-regularization-of |
Repo | |
Framework | |
Neural tangent kernels, transportation mappings, and universal approximation
Title | Neural tangent kernels, transportation mappings, and universal approximation |
Authors | Anonymous |
Abstract | This paper establishes rates of universal approximation for the neural tangent kernel (NTK) in the standard setting of microscopic changes to initial weights. Concretely, given a target function $f$, a target width $m$, and a target approximation error $\epsilon > 0$, then with high probability, moving the initial weight vectors a distance $B/(\epsilon \sqrt{m})$ will give a linearized finite-width NTK which is $(\sqrt{\epsilon} + B/\sqrt{\epsilon m})^2$-close to both the target function $f$, and also the shallow network which this NTK linearized. The constant $B$ can be independent of $\epsilon$ (particular cases studied here include $f$ having a good Fourier transform or RKHS norm), though in the worst case it scales roughly as $1/\epsilon^d$ for general continuous functions. The method of proof is to rewrite $f$ with equality as an infinite-width linearized network whose weights are a transport mapping applied to random initialization, and to then sample from this transport mapping. This proof therefore provides another perspective on the scaling behavior of the NTK: redundancy in the weights due to resampling allows weights to be scaled down. Since the approximation rates match those in the literature for shallow networks, this work implies that universal approximation is not reliant upon any behavior outside the NTK regime. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklQYxBKwS |
https://openreview.net/pdf?id=HklQYxBKwS | |
PWC | https://paperswithcode.com/paper/neural-tangent-kernels-transportation |
Repo | |
Framework | |
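
For context on the entry above, the "linearized finite-width NTK" is the first-order Taylor expansion of the network around its random initialization $w_0$ (standard NTK notation, assumed here):

$$
f_{\mathrm{lin}}(x; w) \;=\; f(x; w_0) \;+\; \big\langle \nabla_w f(x; w_0),\; w - w_0 \big\rangle,
$$

and the stated result says there is a $w$ within distance $B/(\epsilon\sqrt{m})$ of $w_0$ for which $f_{\mathrm{lin}}$ is close both to the target $f$ and to the corresponding shallow network.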