Paper Group NANR 102
Multitask Soft Option Learning
Title | Multitask Soft Option Learning |
Authors | Anonymous |
Abstract | We present Multitask Soft Option Learning (MSOL), a hierarchical multi-task framework based on Planning-as-Inference. MSOL extends the concept of Options, using separate variational posteriors for each task, regularized by a shared prior. The learned soft-options are temporally extended, allowing a higher-level master policy to train faster on new tasks by making decisions with lower frequency. Additionally, MSOL allows fine-tuning of soft-options for new tasks without unlearning previously useful behavior, and avoids problems with local minima in multitask training. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines in challenging multi-task environments. |
Tasks | Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeDGJBKvB |
https://openreview.net/pdf?id=BkeDGJBKvB | |
PWC | https://paperswithcode.com/paper/multitask-soft-option-learning-1 |
Repo | |
Framework | |
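
For the MSOL entry above, a schematic of the KL-regularized multitask objective suggested by the abstract (notation assumed here, not taken from the paper): each task $t$ learns its own soft-option posterior $q_t$, and all tasks are tied together through a shared, jointly learned prior $p$.

$$
\max_{\{q_t\},\,p} \;\; \sum_{t=1}^{T} \Big( \mathbb{E}_{q_t}\!\left[ R_t \right] \;-\; \beta \, \mathrm{KL}\!\big( q_t(o \mid s) \,\big\|\, p(o \mid s) \big) \Big)
$$

Because the posteriors are only regularized toward the prior rather than forced to equal it, a soft option can be fine-tuned on a new task without overwriting the shared behavior.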
FACE SUPER-RESOLUTION GUIDED BY 3D FACIAL PRIORS
Title | FACE SUPER-RESOLUTION GUIDED BY 3D FACIAL PRIORS |
Authors | Anonymous |
Abstract | State-of-the-art face super-resolution methods employ deep convolutional neural networks to learn a mapping between low- and high-resolution facial patterns by exploring local appearance knowledge. However, most of these methods do not fully exploit facial structures and identity information, and struggle to deal with facial images that exhibit large pose variation and misalignment. In this paper, we propose a novel face super-resolution method that explicitly incorporates 3D facial priors which capture sharp facial structures. First, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Second, a spatial attention mechanism is used to better exploit this hierarchical information (i.e., intensity similarity, 3D facial structure, identity content) for the super-resolution problem. Extensive experiments demonstrate that the proposed algorithm achieves superior face super-resolution results and outperforms the state of the art. |
Tasks | Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeOHJHFPH |
https://openreview.net/pdf?id=HJeOHJHFPH | |
PWC | https://paperswithcode.com/paper/face-super-resolution-guided-by-3d-facial |
Repo | |
Framework | |
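
A hedged architectural sketch of the fusion step described in the entry above; the module name, channel counts, and exact wiring are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch (assumed names/shapes): fuse super-resolution image features with
# rendered 3D-prior maps through a learned spatial attention mask.
import torch
import torch.nn as nn

class PriorAttentionFusion(nn.Module):
    def __init__(self, c_img=64, c_prior=8):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(c_img + c_prior, 1, 3, padding=1), nn.Sigmoid())
        self.proj = nn.Conv2d(c_prior, c_img, 1)   # lift prior maps to the feature width

    def forward(self, feat, prior):
        mask = self.attn(torch.cat([feat, prior], dim=1))   # where to trust the 3D prior
        return feat + mask * self.proj(prior)

fusion = PriorAttentionFusion()
feat = torch.randn(1, 64, 32, 32)    # low-resolution image features
prior = torch.randn(1, 8, 32, 32)    # rendered 3D facial prior maps (shape/identity channels)
print(fusion(feat, prior).shape)     # torch.Size([1, 64, 32, 32])
```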
Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features
Title | Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features |
Authors | Anonymous |
Abstract | Pre-trained deep convolutional neural network (CNN) features have been widely used as full-reference perceptual quality features for CNN-based image quality assessment, super-resolution, image restoration, and a variety of image-to-image translation problems. In this paper, we link basic human visual perception to characteristics of learned deep CNN representations, in a first attempt to interpret them. We characterize the frequency and orientation tuning of channels in trained object detection deep CNNs (e.g., VGG-16) by applying grating stimuli of different spatial frequencies and orientations as input. We observe that the behavior of CNN channels as spatial-frequency- and orientation-selective filters can be used to link basic human visual perception models to their characteristics. Doing so, we develop a theory that provides more insight into deep CNN representations as perceptual quality features. We conclude that sensitivity to spatial frequencies that have lower contrast masking thresholds in human visual perception, and a definite and strong orientation selectivity, are important attributes of deep CNN channels that deliver better perceptual quality features. |
Tasks | Image Quality Assessment, Image Restoration, Image-to-Image Translation, Object Detection, Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlLvnEtDB |
https://openreview.net/pdf?id=BJlLvnEtDB | |
PWC | https://paperswithcode.com/paper/analysis-and-interpretation-of-deep-cnn |
Repo | |
Framework | |
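
A minimal sketch of the probing procedure described in the entry above, under assumptions: grayscale sinusoidal gratings, a torchvision VGG-16, and mean channel activation as the response measure. The paper's exact stimuli and readout may differ.

```python
# A minimal sketch (not the paper's code): probe VGG-16 conv channels with sinusoidal
# gratings of varying spatial frequency and orientation and record their mean responses.
import numpy as np
import torch
import torchvision.models as models

def grating(size=224, cycles=8, theta=0.0):
    """Sinusoidal grating with `cycles` periods across the image, oriented at `theta` radians."""
    xs = np.linspace(-0.5, 0.5, size)
    xx, yy = np.meshgrid(xs, xs)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    img = 0.5 + 0.5 * np.sin(2 * np.pi * cycles * u)           # pixel values in [0, 1]
    return torch.tensor(img, dtype=torch.float32).expand(1, 3, size, size)

# For older torchvision, `models.vgg16(pretrained=True)` is the equivalent call.
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()    # convolutional part only

def channel_responses(x, layer_idx=10):
    """Mean activation per channel at the chosen conv layer (ImageNet normalization omitted)."""
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i == layer_idx:
                return x.mean(dim=(0, 2, 3))                   # one response value per channel

freqs = [2, 4, 8, 16, 32]
thetas = np.linspace(0, np.pi, 8, endpoint=False)
tuning = np.array([[channel_responses(grating(cycles=f, theta=t)).numpy()
                    for t in thetas] for f in freqs])          # (freq, orientation, channel)
print("tuning surface shape:", tuning.shape)
```

Plotting `tuning` for a single channel over frequency and orientation gives the kind of tuning surface the abstract refers to.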
Manifold Modeling in Embedded Space: A Perspective for Interpreting “Deep Image Prior”
Title | Manifold Modeling in Embedded Space: A Perspective for Interpreting “Deep Image Prior” |
Authors | Anonymous |
Abstract | Deep image prior (DIP), which utilizes a deep convolutional network (ConvNet) structure itself as an image prior, has attracted considerable attention in the computer vision community. It empirically shows the effectiveness of the ConvNet structure for various image restoration applications. However, why DIP works so well is still unknown, and why the convolution operation is essential for image reconstruction or enhancement is not very clear. In this study, we tackle these questions. The proposed approach divides the convolution into "delay-embedding" and "transformation (i.e., encoder-decoder)", and proposes a simple but essential image/tensor modeling method which is closely related to dynamical systems and self-similarity. The proposed method, named manifold modeling in embedded space (MMES), is implemented using a novel denoising auto-encoder in combination with a multi-way delay-embedding transform. In spite of its simplicity, the image/tensor completion and super-resolution results of MMES are quite similar, even competitive, to DIP in our extensive experiments, and these results help reinterpret/characterize DIP from the perspective of a "low-dimensional patch-manifold prior". |
Tasks | Denoising, Image Reconstruction, Image Restoration, Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgBra4YDS |
https://openreview.net/pdf?id=SJgBra4YDS | |
PWC | https://paperswithcode.com/paper/manifold-modeling-in-embedded-space-a-1 |
Repo | |
Framework | |
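
A minimal sketch of a 2-D delay embedding (Hankelization by overlapping patches), the first half of the MMES pipeline described in the entry above; the denoising auto-encoder applied in embedded space is omitted, and the patch size is an assumption.

```python
# A minimal delay-embedding sketch: stack overlapping patches as columns, and invert
# by folding the patches back and averaging overlaps.
import numpy as np

def delay_embed(img, tau=8):
    """Stack all tau x tau overlapping patches as columns of a matrix."""
    H, W = img.shape
    cols = [img[i:i + tau, j:j + tau].ravel()
            for i in range(H - tau + 1) for j in range(W - tau + 1)]
    return np.stack(cols, axis=1)                        # (tau*tau, num_patches)

def inverse_embed(mat, shape, tau=8):
    """Fold patches back into an image, averaging overlapping pixels."""
    H, W = shape
    out, cnt = np.zeros(shape), np.zeros(shape)
    k = 0
    for i in range(H - tau + 1):
        for j in range(W - tau + 1):
            out[i:i + tau, j:j + tau] += mat[:, k].reshape(tau, tau)
            cnt[i:i + tau, j:j + tau] += 1
            k += 1
    return out / cnt

img = np.random.rand(64, 64)
emb = delay_embed(img)
print(np.allclose(inverse_embed(emb, img.shape), img))   # True: the embedding is invertible
```

In MMES, the low-dimensional patch-manifold assumption is imposed on the columns of this embedded matrix rather than on the image pixels directly.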
Intriguing Properties of Adversarial Training at Scale
Title | Intriguing Properties of Adversarial Training at Scale |
Authors | Anonymous |
Abstract | Adversarial training is one of the main defenses against adversarial attacks. In this paper, we provide the first rigorous study on diagnosing elements of large-scale adversarial training on ImageNet, which reveals two intriguing properties. First, we study the role of normalization. Batch normalization (BN) is a crucial element for achieving state-of-the-art performance on many vision tasks, but we show it may prevent networks from obtaining strong robustness in adversarial training. One unexpected observation is that, for models trained with BN, simply removing clean images from training data largely boosts adversarial robustness, i.e., 18.3%. We relate this phenomenon to the hypothesis that clean images and adversarial images are drawn from two different domains. This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging. Guided by this two-domain hypothesis, we show disentangling the mixture distribution for normalization, i.e., applying separate BNs to clean and adversarial images for statistics estimation, achieves much stronger robustness. Additionally, we find that enforcing BNs to behave consistently at training and testing can further enhance robustness. Second, we study the role of network capacity. We find our so-called “deep” networks are still shallow for the task of adversarial learning. Unlike traditional classification tasks where accuracy is only marginally improved by adding more layers to “deep” networks (e.g., ResNet-152), adversarial training exhibits a much stronger demand on deeper networks to achieve higher adversarial robustness. This robustness improvement can be observed substantially and consistently even by pushing the network capacity to an unprecedented scale, i.e., ResNet-638. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxJhCEFDS |
https://openreview.net/pdf?id=HyxJhCEFDS | |
PWC | https://paperswithcode.com/paper/intriguing-properties-of-adversarial-training-1 |
Repo | |
Framework | |
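
A hedged sketch of the "separate BNs for clean and adversarial images" idea from the abstract above: one convolution with shared weights, two BatchNorm layers with separate statistics. The block structure and sizes are illustrative assumptions, not the paper's architecture.

```python
# Route clean and adversarial mini-batches through different BatchNorm layers that
# share the same convolutional weights.
import torch
import torch.nn as nn

class DualBNConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)  # shared weights
        self.bn_clean = nn.BatchNorm2d(c_out)   # statistics of the clean domain
        self.bn_adv = nn.BatchNorm2d(c_out)     # statistics of the adversarial domain

    def forward(self, x, adversarial=False):
        bn = self.bn_adv if adversarial else self.bn_clean
        return torch.relu(bn(self.conv(x)))

block = DualBNConv(3, 16)
clean = torch.randn(8, 3, 32, 32)
adv = clean + 0.03 * torch.randn_like(clean)    # stand-in for a real attack such as PGD
print(block(clean).shape, block(adv, adversarial=True).shape)
```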
Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds
Title | Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds |
Authors | Anonymous |
Abstract | Point clouds, as a form of Lagrangian representation, allow for powerful and flexible applications in a large number of computational disciplines. We propose a novel deep-learning method to learn stable and temporally coherent feature spaces for point clouds that change over time. We identify a set of inherent problems with existing approaches: without knowledge of the time dimension, the inferred solutions can exhibit strong flickering, and easy solutions to suppress this flickering can result in undesirable local minima that manifest themselves as halo structures. We propose a novel temporal loss function that takes into account higher time derivatives of the point positions and encourages mingling, i.e., prevents the aforementioned halos. We combine these techniques in a super-resolution method with a truncation approach to flexibly adapt the size of the generated positions. We show that our method works for large, deforming point sets from different sources, demonstrating the flexibility of our approach. |
Tasks | Super-Resolution |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeKh3VYDH |
https://openreview.net/pdf?id=BJeKh3VYDH | |
PWC | https://paperswithcode.com/paper/tranquil-clouds-neural-networks-for-learning-1 |
Repo | |
Framework | |
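
A hedged sketch of a temporal-coherence penalty built from finite differences of point positions across frames, as one plausible reading of the "higher time derivatives" loss mentioned in the entry above; the paper's actual loss, weighting, and point correspondences may differ.

```python
# Penalize finite-difference approximations of acceleration and jerk of corresponded
# point positions across consecutive frames (a plausible form, not the paper's exact loss).
import torch

def temporal_smoothness(frames, w_acc=1.0, w_jerk=1.0):
    """frames: list of (N, 3) tensors of corresponded point positions at consecutive times."""
    p = torch.stack(frames)                    # (T, N, 3)
    vel = p[1:] - p[:-1]                       # first differences  ~ velocity
    acc = vel[1:] - vel[:-1]                   # second differences ~ acceleration
    jerk = acc[1:] - acc[:-1]                  # third differences
    return w_acc * acc.pow(2).mean() + w_jerk * jerk.pow(2).mean()

frames = [torch.randn(1024, 3) for _ in range(4)]
print(temporal_smoothness(frames))
```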
Neural Embeddings for Nearest Neighbor Search Under Edit Distance
Title | Neural Embeddings for Nearest Neighbor Search Under Edit Distance |
Authors | Anonymous |
Abstract | The edit distance between two sequences is an important metric with many applications. The drawback, however, is the high computational cost of many basic problems involving this notion, such as the nearest neighbor search. A natural approach to overcoming this issue is to embed the sequences into a vector space such that the geometric distance in the target space approximates the edit distance in the original space. However, the known edit distance embedding algorithms, such as Chakraborty et al. (2016), construct embeddings that are data-independent, i.e., do not exploit any structure of embedded sets of strings. In this paper, we propose an alternative approach, which learns the embedding function according to the data distribution. Our experiments show that the new algorithm has much better empirical performance than prior data-independent methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlWIANtPH |
https://openreview.net/pdf?id=HJlWIANtPH | |
PWC | https://paperswithcode.com/paper/neural-embeddings-for-nearest-neighbor-search |
Repo | |
Framework | |
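
A toy sketch of the learned-embedding idea described in the entry above, under assumptions: a character-level GRU encoder, a squared regression loss between Euclidean distance and edit distance, and random DNA-like strings. The paper's architecture, loss, and data differ from this setup.

```python
# Train an encoder so that Euclidean distance between embeddings approximates edit distance.
import random
import torch
import torch.nn as nn

ALPHABET = "ACGT"

def edit_distance(a, b):
    """Classic O(|a||b|) dynamic program (Levenshtein distance)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

class SeqEmbedder(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.emb = nn.Embedding(len(ALPHABET), 16)
        self.gru = nn.GRU(16, dim, batch_first=True)

    def forward(self, s):
        idx = torch.tensor([[ALPHABET.index(c) for c in s]])
        _, h = self.gru(self.emb(idx))
        return h[-1, 0]                                  # fixed-size vector for the string

model = SeqEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    a = "".join(random.choices(ALPHABET, k=20))
    b = "".join(random.choices(ALPHABET, k=20))
    loss = (torch.dist(model(a), model(b)) - edit_distance(a, b)) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, approximate nearest neighbor search under edit distance reduces to standard vector search over the embeddings.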
On importance-weighted autoencoders
Title | On importance-weighted autoencoders |
Authors | Anonymous |
Abstract | The importance weighted autoencoder (IWAE) (Burda et al., 2016) is a popular variational-inference method which achieves a tighter evidence bound (and hence a lower bias) than standard variational autoencoders by optimising a multi-sample objective, i.e. an objective that is expressible as an integral over $K > 1$ Monte Carlo samples. Unfortunately, IWAE crucially relies on the availability of reparametrisations and even if these exist, the multi-sample objective leads to inference-network gradients which break down as $K$ is increased (Rainforth et al., 2018). This breakdown can only be circumvented by removing high-variance score-function terms, either by heuristically ignoring them (which yields the ‘sticking-the-landing’ IWAE (IWAE-STL) gradient from Roeder et al. (2017)) or through an identity from Tucker et al. (2019) (which yields the ‘doubly-reparametrised’ IWAE (IWAE-DREG) gradient). In this work, we argue that directly optimising the proposal distribution in importance sampling as in the reweighted wake-sleep (RWS) algorithm from Bornschein & Bengio (2015) is preferable to optimising IWAE-type multi-sample objectives. To formalise this argument, we introduce an adaptive-importance sampling framework termed adaptive importance sampling for learning (AISLE) which slightly generalises the RWS algorithm. We then show that AISLE admits IWAE-STL and IWAE-DREG (i.e. the IWAE-gradients which avoid breakdown) as special cases. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg7jhEtPB |
https://openreview.net/pdf?id=ryg7jhEtPB | |
PWC | https://paperswithcode.com/paper/on-importance-weighted-autoencoders |
Repo | |
Framework | |
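
For reference, the multi-sample objective discussed in the entry above is the IWAE bound of Burda et al. (2016):

$$
\mathcal{L}_K(\theta, \phi) \;=\; \mathbb{E}_{z_1,\dots,z_K \sim q_\phi(\cdot \mid x)} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)} \right] \;\le\; \log p_\theta(x),
$$

which tightens as $K$ grows; the paper's argument concerns how the gradients of this objective with respect to the inference network $\phi$ degrade for large $K$, and how RWS/AISLE-style proposal optimisation recovers the IWAE-STL and IWAE-DREG gradients as special cases.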
HOW THE CHOICE OF ACTIVATION AFFECTS TRAINING OF OVERPARAMETRIZED NEURAL NETS
Title | HOW THE CHOICE OF ACTIVATION AFFECTS TRAINING OF OVERPARAMETRIZED NEURAL NETS |
Authors | Anonymous |
Abstract | It is well-known that overparametrized neural networks trained using gradient based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. These results either assume that the activation function is ReLU or they depend on the minimum eigenvalue of a certain Gram matrix. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds which require that this eigenvalue be large. Empirically, a number of alternative activation functions have been proposed which tend to perform better than ReLU at least in some settings but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of activation functions on training. In the present paper, we provide theoretical results about the effect of activation function on the training of highly overparametrized 2-layer neural networks. A crucial property that governs the performance of an activation is whether or not it is smooth: • For non-smooth activations such as ReLU, SELU, ELU, which are not smooth because there is a point where either the first order or second order derivative is discontinuous, all eigenvalues of the associated Gram matrix are large under minimal assumptions on the data. • For smooth activations such as tanh, swish, polynomial, which have derivatives of all orders at all points, the situation is more complex: if the subspace spanned by the data has small dimension then the minimum eigenvalue of the Gram matrix can be small leading to slow training. But if the dimension is large and the data satisfies another mild condition, then the eigenvalues are large. If we allow deep networks, then the small data dimension is not a limitation provided that the depth is sufficient. We discuss a number of extensions and applications of these results. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkgfdeBYvH |
https://openreview.net/pdf?id=rkgfdeBYvH | |
PWC | https://paperswithcode.com/paper/how-the-choice-of-activation-affects-training |
Repo | |
Framework | |
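
A hedged numerical sketch of the kind of Gram matrix whose minimum eigenvalue is discussed in the entry above, estimated by Monte Carlo over random first-layer weights; the exact matrix and assumptions in the paper may differ.

```python
# Estimate G_{ij} = E_w[ sigma'(w.x_i) sigma'(w.x_j) ] x_i.x_j and its minimum eigenvalue
# for different activation derivatives.
import numpy as np

def gram_min_eig(X, act_deriv, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_samples, d))          # random weight vectors w ~ N(0, I)
    S = act_deriv(W @ X.T)                           # sigma'(w^T x_i), shape (samples, n)
    H = (S.T @ S / n_samples) * (X @ X.T)            # Monte Carlo Gram matrix estimate
    return np.linalg.eigvalsh(H).min()

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs, a common assumption

relu_deriv = lambda z: (z > 0).astype(float)
tanh_deriv = lambda z: 1.0 - np.tanh(z) ** 2
print("ReLU min eig:", gram_min_eig(X, relu_deriv))
print("tanh min eig:", gram_min_eig(X, tanh_deriv))
```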
Differentiation of Blackbox Combinatorial Solvers
Title | Differentiation of Blackbox Combinatorial Solvers |
Authors | Anonymous |
Abstract | Achieving fusion of deep learning with combinatorial algorithms promises transformative changes to artificial intelligence. One possible approach is to introduce combinatorial building blocks into neural networks. Such end-to-end architectures have the potential to tackle combinatorial problems on raw input data such as ensuring global consistency in multi-object tracking or route planning on maps in robotics. In this work, we present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. We provide both theoretical and experimental backing. In particular, we incorporate the Gurobi MIP solver, Blossom V algorithm, and Dijkstra’s algorithm into architectures that extract suitable features from raw inputs for the traveling salesman problem, the min-cost perfect matching problem and the shortest path problem. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkevoJSYPB |
https://openreview.net/pdf?id=BkevoJSYPB | |
PWC | https://paperswithcode.com/paper/differentiation-of-blackbox-combinatorial |
Repo | |
Framework | |
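
A hedged sketch of an interpolation-style backward pass through a blackbox solver with a linear objective, in the spirit of the method in the entry above; the toy solver (argmin as a one-hot selection) and the hyperparameter `lam` are illustrative assumptions rather than the paper's experimental setup.

```python
# Forward: call the blackbox solver on the predicted costs. Backward: call the solver once
# more on costs perturbed by the incoming gradient, and return a finite-difference gradient.
import torch

def solver(costs):
    """Blackbox combinatorial solver: here, pick the single cheapest item (argmin as one-hot)."""
    y = torch.zeros_like(costs)
    y[costs.argmin()] = 1.0
    return y

class BlackboxSolve(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, lam=20.0):
        y = solver(costs)
        ctx.lam = lam
        ctx.save_for_backward(costs, y)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        costs, y = ctx.saved_tensors
        perturbed = costs + ctx.lam * grad_y          # perturb costs with the incoming gradient
        y_lam = solver(perturbed)                     # one extra solver call
        return -(y - y_lam) / ctx.lam, None           # interpolation-based gradient w.r.t. costs

costs = torch.randn(5, requires_grad=True)
y = BlackboxSolve.apply(costs)
loss = ((y - torch.tensor([0., 0., 1., 0., 0.])) ** 2).sum()   # match a target selection
loss.backward()
print(costs.grad)
```

The same wrapper pattern applies when `solver` is Dijkstra, Blossom V, or a MIP solver, as long as the solver's objective is linear in the predicted costs.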
A Group-Theoretic Framework for Knowledge Graph Embedding
Title | A Group-Theoretic Framework for Knowledge Graph Embedding |
Authors | Anonymous |
Abstract | We have rigorously proved the existence of a group algebraic structure hidden in relational knowledge embedding problems, which suggests that a group-based embedding framework is essential for model design. Our theoretical analysis explores merely the intrinsic property of the embedding problem itself without introducing extra designs. Using the proposed framework, one could construct embedding models that naturally accommodate all possible local graph patterns, which are necessary for reproducing a complete graph from atomic knowledge triplets. We reconstruct many state-of-the-art models from the framework and re-interpret them as embeddings with different groups. Moreover, we also propose new instantiation models using simple continuous non-abelian groups. |
Tasks | Graph Embedding, Knowledge Graph Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1e30AEKPr |
https://openreview.net/pdf?id=r1e30AEKPr | |
PWC | https://paperswithcode.com/paper/a-group-theoretic-framework-for-knowledge |
Repo | |
Framework | |
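
One way to read the framework in the entry above (notation assumed here): each relation $r$ is represented by a group element $g_r$ acting on entity embeddings, so a triple $(h, r, t)$ is scored by how closely the action of $g_r$ maps $h$ to $t$, and relation composition becomes group multiplication.

$$
s(h, r, t) \;=\; -\,d\big(g_r \cdot e_h,\; e_t\big), \qquad g_{r_1 \circ r_2} \;=\; g_{r_2}\, g_{r_1} \quad (\text{apply } r_1 \text{ first}),
$$

Under this reading, RotatE corresponds to the abelian group $U(1)$ acting on complex embeddings, and the paper's new instantiations replace it with continuous non-abelian groups.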
Global graph curvature
Title | Global graph curvature |
Authors | Anonymous |
Abstract | Recently, non-Euclidean spaces have become popular for embedding structured data. However, determining suitable geometry and, in particular, curvature for a given dataset is still an open problem. In this paper, we define a notion of global graph curvature, specifically catered to the problem of embedding graphs, and analyze the problem of estimating this curvature using only graph-based characteristics (without actual graph embedding). We show that the optimal curvature essentially depends on the dimensionality of the embedding space and the loss function one aims to minimize via embedding. We review the existing notions of local curvature (e.g., Ollivier-Ricci curvature) and analyze their properties theoretically and empirically. In particular, we show that such curvatures are often unable to properly estimate the global one. Hence, we propose a new estimator of global graph curvature specifically designed for the zero-one loss function. |
Tasks | Graph Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByeDl1BYvH |
https://openreview.net/pdf?id=ByeDl1BYvH | |
PWC | https://paperswithcode.com/paper/global-graph-curvature |
Repo | |
Framework | |
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Title | Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform |
Authors | Anonymous |
Abstract | Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of the Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches which also enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the CNN’s performance. The Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxV-ANKDH |
https://openreview.net/pdf?id=HJxV-ANKDH | |
PWC | https://paperswithcode.com/paper/efficient-riemannian-optimization-on-the |
Repo | |
Framework | |
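
A hedged numerical sketch of a Cayley-transform retraction on the Stiefel manifold, related to the entry above; this is the closed-form version rather than the paper's iterative approximation, and omits the momentum/vector-transport machinery.

```python
# One Cayley update: build a skew-symmetric matrix from the gradient, apply its Cayley
# transform (an orthogonal matrix), and the result keeps X^T X = I exactly.
import numpy as np

def cayley_retract(X, G, lr=0.1):
    """X: point on the Stiefel manifold (orthonormal columns); G: Euclidean gradient.
    For a descent step one would pass -G; the sign does not affect feasibility."""
    n, _ = X.shape
    A = G @ X.T - X @ G.T                           # skew-symmetric: A^T = -A
    I = np.eye(n)
    Q = np.linalg.solve(I + (lr / 2) * A, I - (lr / 2) * A)   # Cayley transform of A
    return Q @ X

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))    # random point on the Stiefel manifold
G = rng.standard_normal((6, 3))                     # random Euclidean gradient
X_new = cayley_retract(X, G)
print(np.allclose(X_new.T @ X_new, np.eye(3)))      # True: orthonormality preserved
```

The paper's efficiency gain comes from replacing the matrix inverse above with a cheap fixed-point iteration.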
Towards Understanding the Regularization of Adversarial Robustness on Neural Networks
Title | Towards Understanding the Regularization of Adversarial Robustness on Neural Networks |
Authors | Anonymous |
Abstract | The problem of adversarial examples has shown that modern Neural Network (NN) models could be rather fragile. Among the most promising techniques to solve the problem, one is to require the model to be $\epsilon$-adversarially robust (AR); that is, to require the model not to change predicted labels when any given input examples are perturbed within a certain range. However, it is widely observed that such methods would lead to standard performance degradation, i.e., the degradation on natural examples. In this work, we study the degradation through the regularization perspective. We identify quantities from generalization analysis of NNs; with the identified quantities we empirically find that AR is achieved by regularizing/biasing NNs towards less confident solutions by making the changes in the feature space (induced by changes in the instance space) of most layers smoother uniformly in all directions; so to a certain extent, it prevents sudden change in prediction w.r.t. perturbations. However, the end result of such smoothing concentrates samples around decision boundaries, resulting in less confident solutions, and leads to worse standard performance. Our studies suggest that one might consider ways that build AR into NNs in a gentler way to avoid the problematic regularization. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlkgaNKvr |
https://openreview.net/pdf?id=BJlkgaNKvr | |
PWC | https://paperswithcode.com/paper/towards-understanding-the-regularization-of |
Repo | |
Framework | |
Neural tangent kernels, transportation mappings, and universal approximation
Title | Neural tangent kernels, transportation mappings, and universal approximation |
Authors | Anonymous |
Abstract | This paper establishes rates of universal approximation for the neural tangent kernel (NTK) in the standard setting of microscopic changes to initial weights. Concretely, given a target function $f$, a target width $m$, and a target approximation error $\epsilon > 0$, then with high probability, moving the initial weight vectors a distance $B/(\epsilon \sqrt{m})$ will give a linearized finite-width NTK which is $(\sqrt{\epsilon} + B/\sqrt{\epsilon m})^2$-close to both the target function $f$, and also the shallow network which this NTK linearized. The constant $B$ can be independent of $\epsilon$ (particular cases studied here include $f$ having a good Fourier transform or RKHS norm), though in the worst case it scales roughly as $1/\epsilon^d$ for general continuous functions. The method of proof is to rewrite $f$ with equality as an infinite-width linearized network whose weights are a transport mapping applied to random initialization, and to then sample from this transport mapping. This proof therefore provides another perspective on the scaling behavior of the NTK: redundancy in the weights due to resampling allows weights to be scaled down. Since the approximation rates match those in the literature for shallow networks, this work implies that universal approximation is not reliant upon any behavior outside the NTK regime. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklQYxBKwS |
https://openreview.net/pdf?id=HklQYxBKwS | |
PWC | https://paperswithcode.com/paper/neural-tangent-kernels-transportation |
Repo | |
Framework | |
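
For context on the entry above, the "linearized finite-width NTK" is the first-order Taylor expansion of the network around its random initialization $w_0$ (standard NTK notation, assumed here):

$$
f_{\mathrm{lin}}(x; w) \;=\; f(x; w_0) \;+\; \big\langle \nabla_w f(x; w_0),\; w - w_0 \big\rangle,
$$

and the stated result says there is a $w$ within distance $B/(\epsilon\sqrt{m})$ of $w_0$ for which $f_{\mathrm{lin}}$ is close both to the target $f$ and to the corresponding shallow network.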