April 1, 2020

3009 words 15 mins read

Paper Group NANR 109

BOOSTING ENCODER-DECODER CNN FOR INVERSE PROBLEMS. On Layer Normalization in the Transformer Architecture. Learning RNNs with Commutative State Transitions. Active Learning Graph Neural Networks via Node Feature Propagation. JAUNE: Justified And Unified Neural language Evaluation. Differential Privacy in Adversarial Learning with Provable Robustnes …

BOOSTING ENCODER-DECODER CNN FOR INVERSE PROBLEMS

Title BOOSTING ENCODER-DECODER CNN FOR INVERSE PROBLEMS
Authors Anonymous
Abstract Encoder-decoder convolutional neural networks (CNN) have been extensively used for various inverse problems. However, their prediction error for unseen test data is difficult to estimate a priori, since the neural networks are trained using only selected data and their architectures are largely treated as black boxes. This poses a fundamental challenge to improving the performance of neural networks. Recently, it was shown that Stein’s unbiased risk estimator (SURE) can be used as an unbiased estimator of the prediction error for denoising problems. However, the computation of the divergence term in SURE is difficult to implement in a neural network framework, and the condition to avoid a trivial identity mapping is not well defined. In this paper, inspired by the finding that an encoder-decoder CNN can be expressed as a piecewise linear representation, we provide a closed-form expression of the unbiased estimator for the prediction error. The closed-form representation leads to a novel boosting scheme that prevents a neural network from converging to an identity mapping, thereby enhancing performance. Experimental results show that the proposed algorithm provides consistent improvement in various inverse problems.
Tasks Denoising
Published 2020-01-01
URL https://openreview.net/forum?id=BJevihVtwB
PDF https://openreview.net/pdf?id=BJevihVtwB
PWC https://paperswithcode.com/paper/boosting-encoder-decoder-cnn-for-inverse
Repo
Framework
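The divergence term in SURE mentioned in the entry above can, in general, be approximated with the standard Monte Carlo perturbation trick (Ramani et al.). The sketch below illustrates that generic estimator in NumPy for an arbitrary denoiser; it is not the paper's closed-form expression, and the soft-threshold "denoiser" is purely illustrative.

```python
import numpy as np

def monte_carlo_sure(denoiser, y, sigma, eps=1e-3, rng=None):
    """Monte Carlo SURE estimate of the MSE of `denoiser` at noisy input y.

    Uses a random-perturbation estimate of the divergence term; this is a
    generic approximation, not the closed-form expression derived in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    fy = denoiser(y)
    # Divergence term: b^T (f(y + eps*b) - f(y)) / eps with b ~ N(0, I)
    b = rng.standard_normal(y.shape)
    div = np.sum(b * (denoiser(y + eps * b) - fy)) / eps
    return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2.0 * sigma ** 2 * div / n

# Toy usage with a soft-threshold "denoiser" (purely illustrative)
rng = np.random.default_rng(0)
x = np.zeros(1000); x[:50] = 5.0
sigma = 1.0
y = x + sigma * rng.standard_normal(x.shape)
soft = lambda z: np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)
print("SURE estimate:", monte_carlo_sure(soft, y, sigma, rng=rng))
print("true MSE:     ", np.mean((soft(y) - x) ** 2))
```

The SURE value tracks the true MSE without access to the clean signal, which is what makes it usable as a training or boosting criterion.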

On Layer Normalization in the Transformer Architecture

Title On Layer Normalization in the Transformer Architecture
Authors Anonymous
Abstract The Transformer architecture is widely used in natural language processing tasks. To train a Transformer model, a carefully designed learning rate warm-up stage is usually needed: the learning rate has to be set to an extremely small value at the beginning of the optimization and then gradually increased over a given number of iterations. Such a stage is shown to be crucial to the final performance and introduces additional hyper-parameter tuning. In this paper, we study why the learning rate warm-up stage is important in training the Transformer and theoretically show that the location of layer normalization matters. It can be proved that, at the beginning of the optimization, for the original Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Using a large learning rate on those gradients then makes training unstable; in practice, the warm-up stage helps avoid this problem. This analysis motivates us to investigate a slightly modified Transformer architecture that locates the layer normalization inside the residual blocks. We show that the gradients in this Transformer architecture are well-behaved at initialization. Given these findings, we are the first to show that this Transformer variant is easier and faster to train: the learning rate warm-up stage can be safely removed, and the training time can be largely reduced on a wide range of applications.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1x8anVFPr
PDF https://openreview.net/pdf?id=B1x8anVFPr
PWC https://paperswithcode.com/paper/on-layer-normalization-in-the-transformer
Repo
Framework
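The architectural difference discussed in this entry comes down to where LayerNorm sits relative to the residual connection. Below is a minimal PyTorch sketch of the two placements for a single feed-forward sublayer; the dimensions are arbitrary, and the paper's full model applies the same pattern to the attention sublayers as well.

```python
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Original Transformer ordering: sublayer -> residual add -> LayerNorm."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreLNBlock(nn.Module):
    """Modified ordering: LayerNorm inside the residual branch, before the sublayer."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

# Example: wrap a feed-forward sublayer both ways (hypothetical dimensions)
d_model = 512
post_ln = PostLNBlock(d_model, nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model)))
pre_ln = PreLNBlock(d_model, nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model)))
x = torch.randn(8, 16, d_model)          # (batch, sequence, features)
print(post_ln(x).shape, pre_ln(x).shape)
```

In the Pre-LN variant the residual path carries the raw signal untouched, which is the property the paper connects to well-behaved gradients at initialization.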

Learning RNNs with Commutative State Transitions

Title Learning RNNs with Commutative State Transitions
Authors Anonymous
Abstract Many machine learning tasks involve analysis of set-valued inputs, and thus the learned functions are expected to be permutation invariant. Recent works (e.g., Deep Sets) have sought to characterize the neural architectures which result in permutation invariance. These typically correspond to applying the same pointwise function to all set components, followed by sum aggregation. Here we take a different approach to such architectures and focus on recursive architectures such as RNNs, which are not permutation invariant in general, but can implement permutation invariant functions in a very compact manner. We first show that commutativity and associativity of the state transition function result in permutation invariance. Next, we derive a regularizer that minimizes the degree of non-commutativity in the transitions. Finally, we demonstrate that the resulting method outperforms other methods for learning permutation invariant models, due to its use of recursive computation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bklu2grKwB
PDF https://openreview.net/pdf?id=Bklu2grKwB
PWC https://paperswithcode.com/paper/learning-rnns-with-commutative-state
Repo
Framework
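A rough sketch of the quantity this entry proposes to minimize: penalize the state-transition function whenever processing two inputs in either order from the same state leads to different states. The exact regularizer in the paper may differ; the penalty and the RNN cell below are just illustrative.

```python
import torch
import torch.nn as nn

def commutativity_penalty(cell, h, x1, x2):
    """Compare applying inputs x1 then x2 against x2 then x1 from the same
    initial state h, and penalize the mismatch (a sketch of the idea in the
    abstract, not necessarily the paper's exact regularizer)."""
    h12 = cell(x2, cell(x1, h))   # state after x1, then x2
    h21 = cell(x1, cell(x2, h))   # state after x2, then x1
    return ((h12 - h21) ** 2).mean()

cell = nn.RNNCell(input_size=10, hidden_size=32)
h = torch.zeros(4, 32)                         # batch of 4 initial states
x1, x2 = torch.randn(4, 10), torch.randn(4, 10)
penalty = commutativity_penalty(cell, h, x1, x2)
# In training this would be added to the task loss with some weight.
print(penalty.item())
```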

Active Learning Graph Neural Networks via Node Feature Propagation

Title Active Learning Graph Neural Networks via Node Feature Propagation
Authors Anonymous
Abstract Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning research on graph-structured data. However, a large quantity of labeled graph data is difficult to obtain, which significantly limits the practical success of GNNs. Although active learning has been widely studied for addressing label-sparsity issues with other data types like text and images, how to make it effective over graphs remains an open research question. In this paper, we investigate active learning with GNNs for node classification tasks. Specifically, we propose a new method that uses node feature propagation followed by K-Medoids clustering of the nodes for instance selection in active learning. We justify the design choices of our approach with a theoretical bound analysis. In our experiments on four benchmark datasets, the proposed method outperforms other representative baseline methods consistently and significantly.
Tasks Active Learning, Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=HylwpREtDr
PDF https://openreview.net/pdf?id=HylwpREtDr
PWC https://paperswithcode.com/paper/active-learning-graph-neural-networks-via
Repo
Framework
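A rough sketch of the selection step described in this entry: propagate node features over the normalized adjacency matrix, cluster the propagated features, and query the node closest to each cluster centre. K-Means with a nearest-node pick is used here as a simple stand-in for K-Medoids, and the graph and features are toy data.

```python
import numpy as np
from sklearn.cluster import KMeans

def propagate_features(adj, X, k=2):
    """Symmetrically normalized propagation: X_k = (D^-1/2 (A+I) D^-1/2)^k X."""
    A = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        X = A_hat @ X
    return X

def select_nodes(adj, X, budget, seed=0):
    """Pick `budget` nodes to label: cluster propagated features and take the
    node nearest each cluster centre (a simple stand-in for K-Medoids)."""
    Z = propagate_features(adj, X)
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(Z)
    chosen = [int(np.argmin(((Z - c) ** 2).sum(1))) for c in km.cluster_centers_]
    return sorted(set(chosen))

# Toy graph: two 3-node cliques with well-separated features (hypothetical data)
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
adj[:3, :3] = 1; adj[3:, 3:] = 1; np.fill_diagonal(adj, 0)
X = np.vstack([rng.standard_normal((3, 3)) + 5, rng.standard_normal((3, 3)) - 5])
print(select_nodes(adj, X, budget=2))
```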

JAUNE: Justified And Unified Neural language Evaluation

Title JAUNE: Justified And Unified Neural language Evaluation
Authors Anonymous
Abstract We review the limitations of BLEU and ROUGE, the most popular metrics used to assess reference summaries against hypothesis summaries, introduce JAUNE, a set of criteria for how a good metric should behave, and propose concrete ways to use recent Transformer-based language models to assess reference summaries against hypothesis summaries.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1gx60NKPS
PDF https://openreview.net/pdf?id=r1gx60NKPS
PWC https://paperswithcode.com/paper/jaune-justified-and-unified-neural-language
Repo
Framework
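The entry above proposes scoring hypothesis summaries against references with Transformer-based language models. As one simple illustration of that general idea (not the JAUNE metric itself), the snippet below scores a pair by cosine similarity of sentence embeddings; it assumes the sentence-transformers package is installed, and the model name is only an example.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence encoder would do; this model name is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embedding_score(reference: str, hypothesis: str) -> float:
    """Cosine similarity between the embeddings of reference and hypothesis.
    Illustrative only: JAUNE itself is a set of criteria, not this score."""
    emb = model.encode([reference, hypothesis], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

print(embedding_score(
    "The cat sat on the mat.",
    "A cat was sitting on the mat.",
))
```

Unlike BLEU/ROUGE n-gram overlap, such embedding-based scores can reward paraphrases that share no surface tokens, which is the kind of behavior the proposed criteria are meant to capture.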

Differential Privacy in Adversarial Learning with Provable Robustness

Title Differential Privacy in Adversarial Learning with Provable Robustness
Authors Anonymous
Abstract In this paper, we aim to develop a novel mechanism to preserve differential privacy (DP) in adversarial learning for deep neural networks, with provable robustness to adversarial examples. We leverage the sequential composition theory in DP to establish a new connection between DP preservation and provable robustness. To address the trade-off among model utility, privacy loss, and robustness, we design an original, differentially private, adversarial objective function, based on the post-processing property in DP, to tighten the sensitivity of our model. An end-to-end theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of DP deep neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Byg-An4tPr
PDF https://openreview.net/pdf?id=Byg-An4tPr
PWC https://paperswithcode.com/paper/differential-privacy-in-adversarial-learning
Repo
Framework
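The abstract does not spell out the mechanism, so as background only: the standard way to make a gradient step differentially private is to clip per-example gradients and add calibrated Gaussian noise, and adversarial training supplies the gradients on perturbed inputs. The sketch below shows that generic DP-SGD-style sanitization step and should not be read as the paper's construction.

```python
import numpy as np

def dp_sanitize(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip each per-example gradient to L2 norm <= clip_norm, average, and add
    Gaussian noise scaled to the clipping bound (the standard Gaussian
    mechanism used in DP-SGD). A generic ingredient, not the paper's mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(per_example_grads)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / n, size=mean.shape)
    return mean + noise

# Toy usage: per-example gradients computed on (possibly adversarially
# perturbed) inputs would be fed through the sanitizer before the update.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(5) for _ in range(32)]
print(dp_sanitize(grads, rng=rng))
```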

Benefit of Interpolation in Nearest Neighbor Algorithms

Title Benefit of Interpolation in Nearest Neighbor Algorithms
Authors Anonymous
Abstract Over-parameterized models attract much attention in the era of data science and deep learning. It is empirically observed that although these models, e.g. deep neural networks, over-fit the training data, they can still achieve small testing error, and sometimes even outperform traditional algorithms that are designed to avoid over-fitting. The major goal of this work is to sharply quantify the benefit of data interpolation in the context of nearest neighbor (NN) algorithms. Specifically, we consider a class of interpolated weighting schemes and carefully characterize their asymptotic performance. Our analysis reveals a U-shaped performance curve with respect to the level of data interpolation, and proves that a mild degree of data interpolation strictly improves the prediction accuracy and statistical stability over those of the (un-interpolated) optimal $k$NN algorithm. This theoretically justifies (predicts) the existence of the second U-shaped curve in the recently discovered double descent phenomenon. Note that our goal in this study is not to promote the use of the interpolated-NN method, but to obtain theoretical insights on data interpolation inspired by the aforementioned phenomenon.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Ske5UANYDB
PDF https://openreview.net/pdf?id=Ske5UANYDB
PWC https://paperswithcode.com/paper/benefit-of-interpolation-in-nearest-neighbor-1
Repo
Framework
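To make "interpolated weighting" concrete, here is one common interpolating scheme for k-NN regression: neighbor weights proportional to an inverse power of the distance, so the predictor reproduces training labels exactly at training points. This is a generic example, not necessarily the exact weighting family analyzed in the paper.

```python
import numpy as np

def interpolated_knn_predict(X_train, y_train, x, k=10, gamma=2.0):
    """k-NN regression with interpolating weights w_i proportional to dist_i**(-gamma):
    as the query approaches a training point, that point's weight dominates, so
    the predictor interpolates the training data."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    dk = d[idx]
    if dk[0] == 0.0:                      # exact hit: return that training label
        return float(y_train[idx[0]])
    w = dk ** (-gamma)
    return float(np.dot(w, y_train[idx]) / w.sum())

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
print(interpolated_knn_predict(X, y, X[0]))             # reproduces y[0] exactly
print(interpolated_knn_predict(X, y, np.array([0.1, -0.2])))
```

The parameter gamma controls the "level of interpolation" that the paper's U-shaped performance curve is plotted against.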

Likelihood Contribution based Multi-scale Architecture for Generative Flows

Title Likelihood Contribution based Multi-scale Architecture for Generative Flows
Authors Anonymous
Abstract Deep generative modeling using flows has gained popularity owing to tractable exact log-likelihood estimation with an efficient training and synthesis process. However, flow models suffer from the challenge of a high-dimensional latent space, equal in dimension to the input space. An effective solution to this challenge, proposed by Dinh et al. (2016), is a multi-scale architecture, which is based on iterative early factorization of a part of the total dimensions at regular intervals. Prior works on generative flows involving a multi-scale architecture perform the dimension factorization based on static masking. We propose a novel multi-scale architecture that performs data-dependent factorization to decide which dimensions should pass through more flow layers. To facilitate this, we introduce a heuristic based on the contribution of each dimension to the total log-likelihood, which encodes the importance of the dimensions. Our proposed heuristic is readily obtained as part of the flow training process, enabling versatile implementation of our likelihood-contribution-based multi-scale architecture for generic flow models. We present such an implementation for the original flow introduced in Dinh et al. (2016), and demonstrate improvements in log-likelihood score and sampling quality on standard image benchmarks. We also conduct ablation studies to compare the proposed method with other options for dimension factorization.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1eRI04KPB
PDF https://openreview.net/pdf?id=H1eRI04KPB
PWC https://paperswithcode.com/paper/likelihood-contribution-based-multi-scale
Repo
Framework
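A minimal sketch of the factorization heuristic described in this entry, stripped of the flow itself: rank latent dimensions by their (average) contribution to the total log-likelihood, keep the top fraction for further flow layers, and factor out the rest early. This only illustrates the ranking-and-split step, not the full multi-scale architecture.

```python
import numpy as np

def split_by_contribution(z, loglik_per_dim, keep_frac=0.5):
    """Data-dependent factorization step.
    z: (batch, dim) latent; loglik_per_dim: (batch, dim) per-dimension
    log-likelihood contributions accumulated during the flow pass."""
    mean_contrib = loglik_per_dim.mean(axis=0)
    order = np.argsort(mean_contrib)                  # lowest contribution first
    n_keep = int(round(keep_frac * z.shape[1]))
    factored_idx = order[: z.shape[1] - n_keep]       # factored out early
    kept_idx = order[z.shape[1] - n_keep:]            # passed to deeper flow layers
    return z[:, kept_idx], z[:, factored_idx], kept_idx, factored_idx

rng = np.random.default_rng(0)
z = rng.standard_normal((32, 8))
contrib = -0.5 * z ** 2            # e.g., the standard-normal log-density term per dimension
z_keep, z_out, keep_idx, out_idx = split_by_contribution(z, contrib)
print("kept dims:", keep_idx, "factored dims:", out_idx)
```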

Acutum: When Generalization Meets Adaptability

Title Acutum: When Generalization Meets Adaptability
Authors Anonymous
Abstract Despite its slow convergence, stochastic gradient descent (SGD) is still the most practical optimization method owing to its outstanding generalization ability and simplicity. On the other hand, adaptive methods have attracted much attention from the optimization and machine learning communities, both for their leverage of life-long gradient information and for the deep and fundamental mathematical theory behind them. Taking the best of both worlds is one of the most exciting and challenging questions in optimization for machine learning. In this paper, we take a small step towards this goal. We revisit existing adaptive methods from a novel point of view, which reveals a fresh understanding of momentum. This new intuition empowers us to remove the second moments in Adam without loss of performance. Based on this view, we propose a new method, named acute adaptive momentum (Acutum). To the best of our knowledge, Acutum is the first adaptive gradient method without second moments. Experimentally, we demonstrate that our method converges faster than Adam/AMSGrad and generalizes as well as SGD with momentum. We also provide a convergence analysis of the proposed method to complement our intuition.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1xJ4JHFvS
PDF https://openreview.net/pdf?id=S1xJ4JHFvS
PWC https://paperswithcode.com/paper/acutum-when-generalization-meets-adaptability
Repo
Framework
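The abstract does not give the Acutum update rule, so the sketch below only makes the design axis concrete: a standard Adam step next to the simplest reading of "removing the second moments", i.e. keeping the bias-corrected first moment and dropping the per-coordinate normalization, which reduces to momentum-style SGD. The actual Acutum update is presumably different.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam step (first and second moments, bias-corrected)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def momentum_only_step(theta, g, m, t, lr=1e-3, b1=0.9):
    """Same first-moment bookkeeping with the second moment removed.
    This is NOT the Acutum update, only a naive 'no second moment' baseline."""
    m = b1 * m + (1 - b1) * g
    return theta - lr * (m / (1 - b1 ** t)), m

# Toy quadratic f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta_a = theta_m = np.array([1.0, -2.0])
m_a = v_a = m_m = np.zeros(2)
for t in range(1, 101):
    theta_a, m_a, v_a = adam_step(theta_a, theta_a, m_a, v_a, t, lr=0.05)
    theta_m, m_m = momentum_only_step(theta_m, theta_m, m_m, t, lr=0.05)
print("Adam:", theta_a, " momentum-only:", theta_m)
```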

Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning

Title Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning
Authors Anonymous
Abstract Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role in relaxing human annotation constraints. In the context of image classification, recent advances in learning from unlabeled samples are mainly focused on consistency regularization methods that encourage invariant predictions for different perturbations of unlabeled samples. We, conversely, propose to learn from unlabeled data by generating soft pseudo-labels using the network predictions. We show that naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias, and demonstrate that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it. The proposed approach achieves state-of-the-art results on CIFAR-10/100 and Mini-ImageNet despite being much simpler than other state-of-the-art methods. These results demonstrate that pseudo-labeling can outperform consistency regularization methods, whereas previous work suggested the opposite. Code will be made available.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=rJel41BtDH
PDF https://openreview.net/pdf?id=rJel41BtDH
PWC https://paperswithcode.com/paper/pseudo-labeling-and-confirmation-bias-in-deep-1
Repo
Framework
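A simplified PyTorch sketch of the two ingredients named in this entry: soft pseudo-labels from the network for unlabeled data, and mixup over the combined batch with a cross-entropy on soft targets. It is a sketch of the idea, not the paper's full training recipe (which also enforces a minimum number of labeled samples per mini-batch, among other details).

```python
import torch
import torch.nn.functional as F

def pseudo_label_mixup_loss(model, x_lab, y_lab, x_unlab, alpha=1.0, num_classes=10):
    """One training step: soft pseudo-labels for unlabeled data, then mixup
    over the combined batch with a soft-target cross-entropy."""
    with torch.no_grad():
        pseudo = F.softmax(model(x_unlab), dim=1)          # soft pseudo-labels
    y_onehot = F.one_hot(y_lab, num_classes).float()
    x = torch.cat([x_lab, x_unlab])
    y = torch.cat([y_onehot, pseudo])
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]                  # mixup on inputs
    y_mix = lam * y + (1 - lam) * y[perm]                  # and on soft targets
    log_p = F.log_softmax(model(x_mix), dim=1)
    return -(y_mix * log_p).sum(dim=1).mean()

# Hypothetical usage with any classifier `model` mapping inputs to logits:
# loss = pseudo_label_mixup_loss(model, x_lab, y_lab, x_unlab)
# loss.backward(); optimizer.step()
```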

Differentiable learning of numerical rules in knowledge graphs

Title Differentiable learning of numerical rules in knowledge graphs
Authors Anonymous
Abstract Rules over a knowledge graph (KG) capture interpretable patterns in data and can be used for KG cleaning and completion. Inspired by the TensorLog differentiable logic framework, which compiles rule inference into a sequence of differentiable operations, a method called Neural LP has recently been proposed for learning the parameters as well as the structure of rules. However, it is limited with respect to the treatment of numerical features like age, weight or scientific measurements. We address this limitation by extending Neural LP to learn rules with numerical values, e.g., “People younger than 18 typically live with their parents”. We demonstrate how dynamic programming and cumulative sum operations can be exploited to ensure the efficiency of this extension. Our novel approach allows us to extract more expressive rules with aggregates, which are of higher quality and yield more accurate predictions compared to rules learned by state-of-the-art methods, as shown by our experiments on synthetic and real-world datasets.
Tasks Knowledge Graphs
Published 2020-01-01
URL https://openreview.net/forum?id=rJleKgrKwS
PDF https://openreview.net/pdf?id=rJleKgrKwS
PWC https://paperswithcode.com/paper/differentiable-learning-of-numerical-rules-in
Repo
Framework
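To see why cumulative sums help with numerical predicates, consider applying a dense "less-than-or-equal" comparison operator between entity values to a vector: instead of materializing the n-by-n comparison matrix, one sort plus a reverse cumulative sum gives the same result in O(n log n). This is only an illustration of the cumsum trick; the paper's actual operators live inside the Neural LP framework.

```python
import numpy as np

def apply_leq_operator(values, u):
    """Compute y[i] = sum_j [values[i] <= values[j]] * u[j] without forming the
    dense comparison matrix. Assumes distinct values; ties would need an extra
    grouping step."""
    order = np.argsort(values)                     # ascending by value
    u_sorted = u[order]
    suffix = np.cumsum(u_sorted[::-1])[::-1]       # reverse cumulative sum
    y = np.empty_like(u, dtype=float)
    y[order] = suffix
    return y

values = np.array([25.0, 17.0, 40.0, 12.0, 30.0])  # e.g., ages of five entities
u = np.ones(5)
print(apply_leq_operator(values, u))               # counts of entities with value >= each entity

# Brute-force check against the explicit comparison matrix
M = (values[:, None] <= values[None, :]).astype(float)
print(M @ u)
```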

Weight-space symmetry in neural network loss landscapes revisited

Title Weight-space symmetry in neural network loss landscapes revisited
Authors Anonymous
Abstract Neural network training depends on the structure of the underlying loss landscape, i.e. local minima, saddle points, flat plateaus, and loss barriers. In relation to the structure of the landscape, we study the permutation symmetry of neurons in each layer of a deep neural network, which gives rise not only to multiple equivalent global minima of the loss function but also to critical points in between partner minima. In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct continuous paths between equivalent global minima that lead through a “permutation point” where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points which lie inside high-dimensional subspaces of equal loss, contributing to the global flatness of the landscape. We also find that a permutation point for the exchange of neurons $i$ and $j$ transits into a flat high-dimensional plateau that enables all $n_k!$ permutations of neurons in a given layer $k$ at the same loss value. Moreover, we introduce higher-order permutation points by exploiting the hierarchical structure in the loss landscapes of neural networks, and find that the number of $K$-th order permutation points is much larger than the (already huge) number of equivalent global minima – at least by a polynomial factor of order $K$. In two tasks, we demonstrate numerically with our path finding method that continuous paths between partner minima exist: first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous theoretical results and numerical observations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkxmPgrKwB
PDF https://openreview.net/pdf?id=rkxmPgrKwB
PWC https://paperswithcode.com/paper/weight-space-symmetry-in-neural-network-loss
Repo
Framework
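The permutation symmetry at the heart of this entry is easy to verify numerically: permuting the hidden neurons of a one-hidden-layer network (columns of the input weights, the hidden biases, and the rows of the output weights) leaves the computed function, and hence the loss, unchanged. A minimal NumPy check:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer network with ReLU activations."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def permute_hidden(W1, b1, W2, perm):
    """Permute hidden neurons: reorder columns of W1, entries of b1, rows of W2.
    This maps one global minimum onto an equivalent 'partner' minimum."""
    return W1[:, perm], b1[perm], W2[perm, :]

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 16)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 3)), rng.standard_normal(3)
x = rng.standard_normal((8, 4))

perm = rng.permutation(16)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)
# The permuted network computes exactly the same function, hence the same loss.
print(np.allclose(mlp_forward(x, W1, b1, W2, b2), mlp_forward(x, W1p, b1p, W2p, b2)))
```

The paper's contribution is to study the continuous paths and critical points connecting such permuted copies, which this snippet does not construct.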

Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models

Title Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models
Authors Yiding Feng, Ekaterina Khmelnitskaya, Denis Nekipelov
Abstract Discrete choice models with unobserved heterogeneity are commonly used econometric models of dynamic economic behavior, adopted in practice to predict the behavior of individuals and firms, from schooling and job choices to strategic decisions in market competition. These models feature optimizing agents who choose among a finite set of options in a sequence of periods and receive choice-specific payoffs that depend both on variables that are observed by the agent and recorded in the data and on variables that are observed by the agent but not recorded. Existing work in econometrics assumes that optimizing agents are fully rational and requires finding a functional fixed point to compute the optimal policy. We show that in an important class of discrete choice models the value function is globally concave in the policy. This means that simple algorithms that do not require fixed-point computation, such as the policy gradient algorithm, globally converge to the optimal policy. This finding can be used both to relax behavioral assumptions regarding the optimizing agents and to facilitate econometric analysis of dynamic behavior. In particular, we demonstrate significant computational advantages of a simple implementation of the policy gradient algorithm over the existing “nested fixed point” algorithms used in econometrics.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1efEp4Yvr
PDF https://openreview.net/pdf?id=H1efEp4Yvr
PWC https://paperswithcode.com/paper/global-concavity-and-optimization-in-a-class
Repo
Framework
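For readers unfamiliar with policy gradients, here is a toy REINFORCE-style sketch for a dynamic discrete choice problem with two options per period and a softmax choice policy. The payoffs and state transition are invented for illustration; the paper's contribution concerns when such gradient methods provably reach the optimal policy, not this particular toy.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                       # one utility weight per choice

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def policy_gradient_estimate(theta, T=20, beta=0.95):
    """One rollout's REINFORCE gradient estimate for a softmax choice policy."""
    grads, rewards = [], []
    state = 1.0
    for _ in range(T):
        p = softmax(theta * state)                   # choice probabilities given the state
        a = rng.choice(2, p=p)
        grads.append((np.eye(2)[a] - p) * state)     # d log pi(a | state) / d theta
        rewards.append((1.0 if a == 1 else 0.3) * state)
        state = 0.9 * state + 0.1 * a                # toy state transition
    returns = [sum(beta ** (k - t) * rewards[k] for k in range(t, T))
               for t in range(T)]
    return sum(g * R for g, R in zip(grads, returns))

for _ in range(200):
    theta += 0.01 * policy_gradient_estimate(theta)
print("learned utility weights:", theta)   # should favour the higher-payoff choice
```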

REFINING MONTE CARLO TREE SEARCH AGENTS BY MONTE CARLO TREE SEARCH

Title REFINING MONTE CARLO TREE SEARCH AGENTS BY MONTE CARLO TREE SEARCH
Authors Anonymous
Abstract Reinforcement learning methods that continuously train neural networks through episode generation with game tree search have been successful in two-player, complete-information, deterministic games such as chess, shogi, and Go. However, there are only reports of practical cases, and there is little evidence to guarantee the stability and final performance of the learning process. In this research, we focus on the coordination of episode generation. By regarding the entire system as a game tree search, the new method can handle the trade-off between exploitation and exploration during episode generation. Experiments on a small problem showed that it has robust performance compared to the existing method, AlphaZero.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgU3xBtDS
PDF https://openreview.net/pdf?id=HkgU3xBtDS
PWC https://paperswithcode.com/paper/refining-monte-carlo-tree-search-agents-by
Repo
Framework
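The entry above reuses the exploitation-exploration machinery of tree search for episode generation itself. As background, the selection rule at the heart of Monte Carlo tree search is the UCB score over child statistics; below is a minimal, generic UCT sketch, not the paper's episode-generation scheme.

```python
import math
import random

class Node:
    """Minimal MCTS node holding visit counts and value sums for its actions."""
    def __init__(self, actions):
        self.visits = {a: 0 for a in actions}
        self.value_sum = {a: 0.0 for a in actions}

    def select(self, c=1.4):
        """UCT: balance exploitation (mean value) against exploration (visit counts)."""
        total = sum(self.visits.values()) + 1
        def ucb(a):
            n = self.visits[a]
            if n == 0:
                return float("inf")          # try unvisited actions first
            return self.value_sum[a] / n + c * math.sqrt(math.log(total) / n)
        return max(self.visits, key=ucb)

    def update(self, action, value):
        self.visits[action] += 1
        self.value_sum[action] += value

# Toy usage: action "b" has the higher expected reward and ends up selected most.
root = Node(actions=["a", "b"])
for _ in range(500):
    a = root.select()
    reward = random.gauss(0.6 if a == "b" else 0.4, 0.1)
    root.update(a, reward)
print(root.visits)
```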

Training Provably Robust Models by Polyhedral Envelope Regularization

Title Training Provably Robust Models by Polyhedral Envelope Regularization
Authors Anonymous
Abstract Training certifiable neural networks enables one to obtain models with robustness guarantees against adversarial attacks. In this work, we use a linear approximation to bound the model’s output given an input adversarial budget. This allows us to bound the adversary-free region in the data neighborhood by a polyhedral envelope and yields finer-grained certified robustness than existing methods. We further exploit this certifier to introduce a framework called polyhedral envelope regularization (PER), which encourages larger polyhedral envelopes and thus improves the provable robustness of the models. We demonstrate the flexibility and effectiveness of our framework on standard benchmarks; it applies to networks with general activation functions and obtains comparable or better robustness guarantees than state-of-the-art methods, with very little cost in clean accuracy, i.e., without over-regularizing the model.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bkg75aVKDH
PDF https://openreview.net/pdf?id=Bkg75aVKDH
PWC https://paperswithcode.com/paper/training-provably-robust-models-by-polyhedral
Repo
Framework
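For a purely linear classifier the adversary-free region has a simple closed form: the worst-case margin of the true class over class k under an l-infinity budget eps is the clean margin minus eps times the l1 norm of the weight difference. The sketch below turns that certified margin into a hinge-style penalty as a toy analogue of encouraging larger polyhedral envelopes; the paper works with linear approximations of general networks, which this does not implement.

```python
import numpy as np

def certified_margins_linear(W, b, x, y, eps):
    """For a linear classifier f(x) = W x + b, the worst-case margin of the true
    class y over class k under an l_inf perturbation of size eps is
    (w_y - w_k) . x + (b_y - b_k) - eps * ||w_y - w_k||_1."""
    diff_W = W[y] - W                       # (num_classes, dim)
    diff_b = b[y] - b
    margins = diff_W @ x + diff_b - eps * np.abs(diff_W).sum(axis=1)
    return np.delete(margins, y)            # drop the trivial y-vs-y entry

def envelope_penalty(W, b, x, y, eps, target=1.0):
    """Hinge penalty that vanishes once every certified margin exceeds `target`,
    i.e. once the region around x is comfortably adversary-free."""
    m = certified_margins_linear(W, b, x, y, eps)
    return np.maximum(0.0, target - m).sum()

rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 4)), np.zeros(3)
x = rng.standard_normal(4)
print(envelope_penalty(W, b, x, y=0, eps=0.1))
```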