July 29, 2019

3005 words 15 mins read

Paper Group AWR 120

Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

Title Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Authors Sébastien Forestier, Yoan Mollard, Pierre-Yves Oudeyer
Abstract Intrinsically motivated spontaneous exploration is a key enabler of autonomous lifelong learning in human children. It allows them to discover and acquire large repertoires of skills through self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present the formal framework of unsupervised multi-goal reinforcement learning, as well as an algorithmic approach called intrinsically motivated goal exploration processes (IMGEP), to enable similar properties of autonomous learning in machines. The IMGEP algorithmic architecture relies on several principles: 1) self-generation of goals as parameterized reinforcement learning problems; 2) selection of goals based on intrinsic rewards; 3) exploration with parameterized time-bounded policies and fast incremental goal-parameterized policy search; 4) systematic reuse of information acquired when targeting a goal for improving other goals. We present a particularly efficient form of IMGEP that uses a modular representation of goal spaces as well as intrinsic rewards based on learning progress. We show how IMGEPs automatically generate a learning curriculum within an experimental setup where a real humanoid robot can explore multiple spaces of goals with several hundred continuous dimensions. While no particular target goal is provided to the system beforehand, this curriculum allows the discovery of skills of increasing complexity that act as stepping stones for learning more complex skills (like nested tool use). We show that learning several spaces of diverse problems can be more efficient for learning complex skills than only trying to learn these complex skills directly. We illustrate the computational efficiency of IMGEPs: these robotic experiments use a simple memory-based low-level policy representation and search algorithm, enabling the whole system to learn online and incrementally on a Raspberry Pi 3.
Tasks Multi-Goal Reinforcement Learning
Published 2017-08-07
URL http://arxiv.org/abs/1708.02190v1
PDF http://arxiv.org/pdf/1708.02190v1.pdf
PWC https://paperswithcode.com/paper/intrinsically-motivated-goal-exploration
Repo https://github.com/flowersteam/geppg
Framework none
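
The abstract describes the IMGEP loop in words; the sketch below is a toy, illustrative rendering of that loop (goal-space selection by learning progress, goal sampling, memory-based policy reuse), not the authors' implementation in the linked repo, and every name and dimension in it is hypothetical.

```python
# Illustrative sketch of an IMGEP-style loop (hypothetical names, not the repo's API):
# sample a goal module by learning progress, sample a goal, search for a policy, store outcome.
import numpy as np

rng = np.random.default_rng(0)
n_modules, n_iters = 3, 200
progress = np.ones(n_modules)              # intrinsic reward: recent learning progress per module
memory = [[] for _ in range(n_modules)]    # (policy_params, outcome) pairs per goal space

def execute(policy_params):
    """Stand-in for running a parameterized, time-bounded policy on the robot/environment."""
    return np.tanh(policy_params)          # toy outcome observed in the goal spaces

for t in range(n_iters):
    probs = progress / progress.sum()
    m = rng.choice(n_modules, p=probs)     # choose a goal space proportionally to progress
    goal = rng.uniform(-1, 1, size=4)      # self-generated goal in that space
    if memory[m]:                          # reuse: perturb the policy whose outcome was closest
        dists = [np.linalg.norm(o - goal) for _, o in memory[m]]
        params = memory[m][int(np.argmin(dists))][0] + 0.1 * rng.normal(size=4)
        old_best = min(dists)
    else:
        params, old_best = rng.normal(size=4), np.inf
    outcome = execute(params)
    new_dist = np.linalg.norm(outcome - goal)
    if np.isfinite(old_best):              # crude learning-progress estimate for this module
        progress[m] = 0.9 * progress[m] + 0.1 * max(old_best - new_dist, 0.0)
    for mm in range(n_modules):            # systematic reuse: store the outcome in every goal space
        memory[mm].append((params, outcome))
```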

Stochastic Gradient Descent as Approximate Bayesian Inference

Title Stochastic Gradient Descent as Approximate Bayesian Inference
Authors Stephan Mandt, Matthew D. Hoffman, David M. Blei
Abstract Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.
Tasks Bayesian Inference
Published 2017-04-13
URL http://arxiv.org/abs/1704.04289v2
PDF http://arxiv.org/pdf/1704.04289v2.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-descent-as-approximate
Repo https://github.com/taohu88/BayesianML
Framework none
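
As a concrete, hedged illustration of point (1), the snippet below runs constant-learning-rate SGD on a toy Bayesian linear-regression objective and treats the post-burn-in iterates as approximate posterior samples, with their average playing the role of the Polyak estimate. The step size, batch size, and prior strength are arbitrary choices, not the paper's tuning rule.

```python
# Minimal sketch (assumptions, not the paper's code): constant-step-size SGD on a
# Bayesian linear-regression loss; iterates after burn-in are treated as approximate
# posterior samples, and their average is the Polyak-averaged point estimate.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch):
    """Stochastic gradient of the negative log-posterior (Gaussian likelihood and prior)."""
    Xb, yb = X[batch], y[batch]
    return Xb.T @ (Xb @ w - yb) / len(batch) + 1e-2 * w

w = np.zeros(d)
lr, batch_size, burn_in, n_steps = 0.05, 10, 2000, 10000
samples = []
for t in range(n_steps):
    batch = rng.choice(n, size=batch_size, replace=False)
    w = w - lr * minibatch_grad(w, batch)       # constant learning rate: a Markov chain
    if t >= burn_in:
        samples.append(w.copy())

samples = np.array(samples)
posterior_mean_estimate = samples.mean(axis=0)  # Polyak / iterate average
posterior_spread = samples.std(axis=0)          # spread reflects lr, batch size, and curvature
```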

Bayesian Optimization with Gradients

Title Bayesian Optimization with Gradients
Authors Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier
Abstract Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
Tasks
Published 2017-03-13
URL http://arxiv.org/abs/1703.04389v3
PDF http://arxiv.org/pdf/1703.04389v3.pdf
PWC https://paperswithcode.com/paper/bayesian-optimization-with-gradients
Repo https://github.com/wujian16/Cornell-MOE
Framework none
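
The key modelling ingredient, a Gaussian process that jointly covers function values and derivatives, can be sketched compactly; the snippet below builds that joint covariance for a 1-D squared-exponential kernel. It is illustrative only and does not implement the d-KG acquisition function itself (see the Cornell-MOE repo for that).

```python
# Sketch of the modelling step behind derivative-enabled BO: a GP prior whose covariance
# jointly describes function values and gradients (1-D squared-exponential case).
import numpy as np

def joint_covariance(x, sigma2=1.0, ell=1.0):
    """Covariance of the joint vector [f(x); f'(x)] under a squared-exponential GP prior."""
    d = x[:, None] - x[None, :]                   # pairwise differences x_i - x_j
    K = sigma2 * np.exp(-0.5 * d**2 / ell**2)     # cov(f(x_i), f(x_j))
    K_fd = K * d / ell**2                         # cov(f(x_i), f'(x_j)) = dk/dx_j
    K_dd = K * (1.0 / ell**2 - d**2 / ell**4)     # cov(f'(x_i), f'(x_j))
    top = np.hstack([K, K_fd])
    bottom = np.hstack([K_fd.T, K_dd])
    return np.vstack([top, bottom])

x = np.linspace(0.0, 1.0, 5)
C = joint_covariance(x)
# A valid joint prior must be positive semi-definite; the Cholesky factor (with jitter) confirms it.
chol = np.linalg.cholesky(C + 1e-9 * np.eye(len(C)))
```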

Riemannian approach to batch normalization

Title Riemannian approach to batch normalization
Authors Minhyung Cho, Jaehyung Lee
Abstract Batch Normalization (BN) has proven to be an effective algorithm for deep neural network training by normalizing the input to each neuron and reducing the internal covariate shift. The space of weight vectors in the BN layer can be naturally interpreted as a Riemannian manifold, which is invariant to linear scaling of weights. Following the intrinsic geometry of this manifold provides a new learning rule that is more efficient and easier to analyze. We also propose intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the manifold. The resulting algorithm consistently outperforms the original BN on various types of network architectures and datasets.
Tasks
Published 2017-09-27
URL http://arxiv.org/abs/1709.09603v3
PDF http://arxiv.org/pdf/1709.09603v3.pdf
PWC https://paperswithcode.com/paper/riemannian-approach-to-batch-normalization
Repo https://github.com/MinhyungCho/riemannian-batch-normalization
Framework tf
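
As a simplified analogue of the update rule described above, the sketch below performs one Riemannian SGD step for a scale-invariant weight vector constrained to the unit sphere: project the Euclidean gradient onto the tangent space, clip it using the manifold norm, step, and retract by renormalizing. This is a toy illustration, not the exact update or manifold used in the linked repository.

```python
# Illustrative Riemannian SGD step for a weight vector that is invariant to scaling,
# treated as a point on the unit sphere (a simplified analogue of the paper's update).
import numpy as np

def riemannian_sgd_step(w, euclidean_grad, lr=0.1, clip=1.0):
    """One step: project the gradient to the tangent space at w, clip its norm, step, retract."""
    g = euclidean_grad - np.dot(euclidean_grad, w) * w   # tangent-space projection
    g_norm = np.linalg.norm(g)
    if g_norm > clip:                                    # geometry-aware gradient clipping
        g = g * (clip / g_norm)
    w_new = w - lr * g
    return w_new / np.linalg.norm(w_new)                 # retraction back onto the sphere

w = np.array([0.6, 0.8, 0.0])                            # unit-norm weight vector
grad = np.array([0.3, -0.1, 0.2])                        # Euclidean gradient from backprop
w = riemannian_sgd_step(w, grad)
```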

Match-Tensor: a Deep Relevance Model for Search

Title Match-Tensor: a Deep Relevance Model for Search
Authors Aaron Jaech, Hetunandan Kamisetty, Eric Ringger, Charlie Clarke
Abstract The application of Deep Neural Networks for ranking in search engines may obviate the need for the extensive feature engineering common to current learning-to-rank methods. However, we show that combining simple relevance matching features like BM25 with existing Deep Neural Net models often substantially improves the accuracy of these models, indicating that they do not capture essential local relevance matching signals. We describe a novel deep Recurrent Neural Net-based model that we call Match-Tensor. The architecture of the Match-Tensor model simultaneously accounts for both local relevance matching and global topicality signals allowing for a rich interplay between them when computing the relevance of a document to a query. On a large held-out test set consisting of social media documents, we demonstrate not only that Match-Tensor outperforms BM25 and other classes of DNNs but also that it largely subsumes signals present in these models.
Tasks Feature Engineering, Learning-To-Rank
Published 2017-01-26
URL http://arxiv.org/abs/1701.07795v1
PDF http://arxiv.org/pdf/1701.07795v1.pdf
PWC https://paperswithcode.com/paper/match-tensor-a-deep-relevance-model-for
Repo https://github.com/cspoh/IRDM2017
Framework tf

DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)

Title DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)
Authors Esben Jannik Bjerrum
Abstract The isoelectric point (IEP or pI) is the pH where the net charge on the molecular ensemble of peptides and proteins is zero. This physical-chemical property depends on protonatable/deprotonatable sidechains and their pKa values. Here a pI prediction model is trained from a database of peptide sequences and pIs using a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The trained model obtains an RMSE and R$^2$ of 0.28 and 0.95 on the external test set. The model is not based on pKa values, but predictions for constructed test sequences show rankings similar to the known pKa values. The prediction depends mostly on the presence of known acidic and basic amino acids, with fine adjustments based on the neighboring sequence and the position of the charged amino acids in the peptide chain.
Tasks
Published 2017-12-27
URL http://arxiv.org/abs/1712.09553v1
PDF http://arxiv.org/pdf/1712.09553v1.pdf
PWC https://paperswithcode.com/paper/deepiep-a-peptide-sequence-model-of
Repo https://github.com/EBjerrum/DeepIEP
Framework tf
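
A minimal sketch of the kind of model the abstract describes, an LSTM over one-letter amino-acid codes regressing to pI, is shown below in Keras. The vocabulary handling, layer sizes, and the toy sequences and target values are all assumptions for illustration, not the configuration in the DeepIEP repository.

```python
# Illustrative sketch (not the authors' code): a character-level LSTM regressor mapping
# a peptide sequence of one-letter amino-acid codes to a predicted isoelectric point.
import numpy as np
import tensorflow as tf

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
char_to_idx = {c: i + 1 for i, c in enumerate(AMINO_ACIDS)}  # 0 is reserved for padding
MAX_LEN = 50

def encode(seq, max_len=MAX_LEN):
    """Map a peptide string to a fixed-length integer vector (zero-padded)."""
    ids = [char_to_idx[c] for c in seq[:max_len]]
    return np.pad(ids, (0, max_len - len(ids)))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=len(AMINO_ACIDS) + 1, output_dim=32, mask_zero=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),            # scalar pI prediction
])
model.compile(optimizer="adam", loss="mse")

# Toy training call on hypothetical (sequence, pI) pairs; the targets are made up:
X = np.stack([encode("ACDKR"), encode("DDEEG")])
y = np.array([9.0, 3.5])
model.fit(X, y, epochs=2, verbose=0)
```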

TensorFlow Distributions

Title TensorFlow Distributions
Authors Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous
Abstract The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.
Tasks Probabilistic Programming
Published 2017-11-28
URL http://arxiv.org/abs/1711.10604v1
PDF http://arxiv.org/pdf/1711.10604v1.pdf
PWC https://paperswithcode.com/paper/tensorflow-distributions
Repo https://github.com/nicola-decao/s-vae
Framework tf
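
The two abstractions named above can be shown in a few lines. The sketch below uses the TensorFlow Probability package (the library's successor home; at publication time the code shipped inside TensorFlow itself), constructing a Distribution for sampling and log densities and a Bijector-transformed distribution whose log_prob accounts for the change of volume.

```python
# Minimal sketch of the Distribution and Bijector abstractions, using TensorFlow Probability.
import tensorflow_probability as tfp
tfd, tfb = tfp.distributions, tfp.bijectors

# A Distribution: fast sampling and numerically stable log densities.
normal = tfd.Normal(loc=0.0, scale=1.0)
x = normal.sample(5)
logp = normal.log_prob(x)

# A Bijector: a composable, volume-tracking transformation. Pushing the normal through
# exp() yields a log-normal whose log_prob is corrected by the log-det-Jacobian.
log_normal = tfd.TransformedDistribution(distribution=normal, bijector=tfb.Exp())
y = log_normal.sample(5)
logq = log_normal.log_prob(y)
```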

Long-term Forecasting using Higher Order Tensor RNNs

Title Long-term Forecasting using Higher Order Tensor RNNs
Authors Rose Yu, Stephan Zheng, Anima Anandkumar, Yisong Yue
Abstract We present Higher-Order Tensor RNN (HOT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher-order moments and higher-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation guarantees and the variance bound for HOT-RNN for general sequence inputs. We also demonstrate 5%-12% improvements for long-term prediction over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world time series data.
Tasks Time Series
Published 2017-10-31
URL https://arxiv.org/abs/1711.00073v3
PDF https://arxiv.org/pdf/1711.00073v3.pdf
PWC https://paperswithcode.com/paper/long-term-forecasting-using-tensor-train-rnns
Repo https://github.com/yuqirose/tensor_train_RNN
Framework tf
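
A toy numpy sketch of the two ideas in the abstract, a higher-order (here second-order) state transition and a tensor-train factorization of its parameter tensor, is given below. Dimensions, initialization, and the form of the lagged-state vector are placeholders and do not correspond to the repository's implementation.

```python
# Toy sketch (hypothetical dimensions, not the repo's code) of a second-order tensor RNN
# transition whose 3-way parameter tensor is kept in tensor-train form to save parameters.
import numpy as np

rng = np.random.default_rng(0)
H, D, R = 8, 16, 3           # hidden size, size of the lagged-state vector, TT rank

# Tensor-train cores of the transition tensor T with shape (H, D, D):
G1 = 0.1 * rng.normal(size=(H, R))
G2 = 0.1 * rng.normal(size=(R, D, R))
G3 = 0.1 * rng.normal(size=(R, D))
W_in = 0.1 * rng.normal(size=(H, 4))   # input-to-hidden weights (toy input dimension 4)

def step(x_t, s):
    """s summarizes recent hidden states; the update uses their pairwise products."""
    # Contract the TT cores with two copies of s: sum_{j,k} T[:, j, k] * s[j] * s[k]
    higher_order = np.einsum('hr,rjq,j,qk,k->h', G1, G2, s, G3, s)
    return np.tanh(W_in @ x_t + higher_order)

s = 0.1 * rng.normal(size=D)  # toy lagged-state vector (in practice, recent hidden states)
h = step(rng.normal(size=4), s)
```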

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

Title Differentiable Learning of Logical Rules for Knowledge Base Reasoning
Authors Fan Yang, Zhilin Yang, William W. Cohen
Abstract We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.
Tasks
Published 2017-02-27
URL http://arxiv.org/abs/1702.08367v3
PDF http://arxiv.org/pdf/1702.08367v3.pdf
PWC https://paperswithcode.com/paper/differentiable-learning-of-logical-rules-for
Repo https://github.com/fanyangxyz/Neural-LP
Framework tf
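
The differentiable operations being composed are TensorLog-style relation-matrix products; the toy sketch below shows a length-two rule (grandparent via two parent hops) executed as soft attention over relation matrices applied to a query entity's one-hot vector. The knowledge base, entities, and attention weights are made up for illustration.

```python
# Toy sketch of the TensorLog-style operations the model composes: each relation is an
# entity-by-entity adjacency matrix, and a rule is a soft mixture of relation hops.
import numpy as np

entities = ["alice", "bob", "carol"]
idx = {e: i for i, e in enumerate(entities)}
E = len(entities)

def relation_matrix(pairs):
    """M[i, j] = 1 if the relation holds from entity j to entity i (so M @ v follows one hop)."""
    M = np.zeros((E, E))
    for subj, obj in pairs:
        M[idx[obj], idx[subj]] = 1.0
    return M

# Tiny knowledge base: carol's parent is alice, alice's parent is bob.
has_parent = relation_matrix([("carol", "alice"), ("alice", "bob")])
sibling_of = relation_matrix([])                        # an irrelevant relation, for contrast
relations = np.stack([has_parent, sibling_of])

# Learnable soft attention over relations at each rule step (hand-set here for illustration):
attention = np.array([[0.95, 0.05],    # step 1 mostly selects has_parent
                      [0.95, 0.05]])   # step 2 mostly selects has_parent

v = np.zeros(E); v[idx["carol"]] = 1.0                  # query: who is carol's grandparent?
for step_weights in attention:
    M = np.einsum('r,rij->ij', step_weights, relations) # soft mixture of relation matrices
    v = M @ v                                           # one differentiable inference hop
print(entities[int(np.argmax(v))])                      # -> "bob"
```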

Regularizing Face Verification Nets For Pain Intensity Regression

Title Regularizing Face Verification Nets For Pain Intensity Regression
Authors Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
Abstract Limited labeled data are available for research on estimating facial expression intensities. For instance, the ability to train deep networks for automated pain assessment is limited by small datasets with labels of patient-reported pain intensities. Fortunately, fine-tuning from a data-extensive pre-trained domain, such as face verification, can alleviate this problem. In this paper, we propose a network that fine-tunes a state-of-the-art face verification network using a regularized regression loss and additional data with expression labels. In this way, the expression intensity regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed regularized deep regressor is applied to estimate the pain expression intensity and verified on the widely-used UNBC-McMaster Shoulder-Pain dataset, achieving state-of-the-art performance. A weighted evaluation metric is also proposed to address the imbalance issue of different pain intensities.
Tasks Face Verification, Pain Intensity Regression
Published 2017-02-22
URL http://arxiv.org/abs/1702.06925v3
PDF http://arxiv.org/pdf/1702.06925v3.pdf
PWC https://paperswithcode.com/paper/regularizing-face-verification-nets-for-pain
Repo https://github.com/happynear/PainRegression
Framework none

Efficient, sparse representation of manifold distance matrices for classical scaling

Title Efficient, sparse representation of manifold distance matrices for classical scaling
Authors Javier S. Turek, Alexander Huth
Abstract Geodesic distance matrices can reveal shape properties that are largely invariant to non-rigid deformations, and thus are often used to analyze and represent 3-D shapes. However, these matrices grow quadratically with the number of points. Thus for large point sets it is common to use a low-rank approximation to the distance matrix, which fits in memory and can be efficiently analyzed using methods such as multidimensional scaling (MDS). In this paper we present a novel sparse method for efficiently representing geodesic distance matrices using biharmonic interpolation. This method exploits knowledge of the data manifold to learn a sparse interpolation operator that approximates distances using a subset of points. We show that our method is 2x faster and uses 20x less memory than current leading methods for solving MDS on large point sets, with similar quality. This enables analyses of large point sets that were previously infeasible.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10887v2
PDF http://arxiv.org/pdf/1705.10887v2.pdf
PWC https://paperswithcode.com/paper/efficient-sparse-representation-of-manifold
Repo https://github.com/alexhuth/BHA
Framework none
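
For context, the step this sparse representation ultimately feeds is classical scaling (classical MDS) on a distance matrix; the snippet below shows that standard step on a small dense example. It is not the paper's biharmonic interpolation method, only the downstream computation it accelerates.

```python
# Standard classical MDS (classical scaling) on a small dense distance matrix; the paper's
# contribution is a sparse biharmonic approximation that makes this step feasible at scale.
import numpy as np

def classical_mds(D, k=2):
    """Embed points in k dimensions from a matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances (Gram matrix)
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]           # keep the k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Toy example: points on a line and their pairwise Euclidean distances.
pts = np.array([[0.0], [1.0], [3.0]])
D = np.abs(pts - pts.T)
X = classical_mds(D, k=1)                        # recovers the 1-D layout up to sign/translation
```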

Positive-Unlabeled Learning with Non-Negative Risk Estimator

Title Positive-Unlabeled Learning with Non-Negative Risk Estimator
Authors Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama
Abstract From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts.
Tasks
Published 2017-03-02
URL http://arxiv.org/abs/1703.00593v2
PDF http://arxiv.org/pdf/1703.00593v2.pdf
PWC https://paperswithcode.com/paper/positive-unlabeled-learning-with-non-negative
Repo https://github.com/kiryor/nnPUlearning
Framework none
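
The estimator itself is compact enough to sketch: with class prior pi_p, the unlabeled data stand in for negatives, and the negative-class term is clipped at zero so a flexible model cannot push the empirical risk negative. The snippet below writes this out with a sigmoid surrogate loss on toy scores; it is an illustration of the estimator, not the training code in the linked repo.

```python
# Minimal sketch of the non-negative PU risk estimator with a sigmoid surrogate loss.
import numpy as np

def sigmoid_loss(z):
    """Surrogate loss l(z) for the margin z = y * g(x)."""
    return 1.0 / (1.0 + np.exp(z))

def nn_pu_risk(g_pos, g_unl, pi_p):
    """g_pos, g_unl: classifier scores on positive and unlabeled data; pi_p: class prior."""
    risk_p_pos = sigmoid_loss(g_pos).mean()      # positives labeled as +1
    risk_p_neg = sigmoid_loss(-g_pos).mean()     # positives treated as -1
    risk_u_neg = sigmoid_loss(-g_unl).mean()     # unlabeled treated as -1
    # Clipping the bracketed term at zero is what prevents the empirical risk from going negative.
    return pi_p * risk_p_pos + max(0.0, risk_u_neg - pi_p * risk_p_neg)

rng = np.random.default_rng(0)
g_pos = rng.normal(1.0, 1.0, size=100)           # toy scores on positive examples
g_unl = rng.normal(0.0, 1.0, size=400)           # toy scores on unlabeled examples
risk = nn_pu_risk(g_pos, g_unl, pi_p=0.3)
```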

Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior

Title Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior
Authors Peng Liu, Ruogu Fang
Abstract We explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn similar pixel-distribution features from noisy images. Many types of image noise follow a common pixel distribution, such as additive white Gaussian noise (AWGN). By increasing a CNN's width with larger receptive fields and more channels in each layer, CNNs can extract more accurate pixel-distribution features. The key to our approach is the discovery that wider CNNs with more convolutions tend to learn similar pixel-distribution features, which suggests a new strategy for solving low-level vision problems effectively: let the inference mapping rely primarily on priors over the noise property rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on AWGN and demonstrate that by learning pixel-distribution features from images, the WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. Code and models are available at https://github.com/cswin/WIN.
Tasks Denoising, Image Denoising
Published 2017-07-17
URL http://arxiv.org/abs/1707.05414v5
PDF http://arxiv.org/pdf/1707.05414v5.pdf
PWC https://paperswithcode.com/paper/wide-inference-network-for-image-denoising
Repo https://github.com/shibuiwilliam/DeepLearningDenoise
Framework none
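
A hedged Keras sketch of the architectural idea, a shallow but wide denoising CNN with large kernels and many channels that predicts the noise residual, is shown below. The layer counts, kernel sizes, and residual wiring are placeholders rather than the WIN configuration released by the authors.

```python
# Illustrative Keras sketch of a "wide" denoising CNN: few layers, many channels, large
# receptive fields. Sizes here are placeholders, not the WIN configuration from the repo.
import tensorflow as tf

inp = tf.keras.Input(shape=(None, None, 1))                      # grayscale noisy image
x = tf.keras.layers.Conv2D(128, 7, padding="same", activation="relu")(inp)
x = tf.keras.layers.Conv2D(128, 7, padding="same", activation="relu")(x)
noise = tf.keras.layers.Conv2D(1, 7, padding="same")(x)          # predict the AWGN residual
out = tf.keras.layers.Subtract()([inp, noise])                   # denoised = noisy - predicted noise
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```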

Convolutional Dictionary Learning: Acceleration and Convergence

Title Convolutional Dictionary Learning: Acceleration and Convergence
Authors Il Yong Chun, Jeffrey A. Fessler
Abstract Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To mitigate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in a single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and the absence of polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.
Tasks Denoising, Dictionary Learning, Image Denoising
Published 2017-07-03
URL http://arxiv.org/abs/1707.00389v2
PDF http://arxiv.org/pdf/1707.00389v2.pdf
PWC https://paperswithcode.com/paper/convolutional-dictionary-learning
Repo https://github.com/mechatoz/convolt
Framework none
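
The full BPG-M algorithm is involved, but the proximal-gradient-with-majorizer idea it builds on can be illustrated with a plain ISTA-style sparse-coding loop for a fixed (non-convolutional) dictionary, where the Lipschitz constant acts as a scaled-identity majorizer. This is only an analogue of the method; block updates, sharper diagonal majorizers, momentum, and restarting are all omitted.

```python
# Minimal illustration of the proximal-gradient idea underlying BPG-M: ISTA-style sparse
# coding with a fixed dictionary and a scaled-identity majorizer (the Lipschitz constant).
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
x_true = np.zeros(64); x_true[[3, 17, 40]] = [1.0, -0.5, 2.0]
y = D @ x_true + 0.01 * rng.normal(size=32)

lam = 0.05
L = np.linalg.norm(D, 2) ** 2                  # Lipschitz constant of the data-fit gradient
x = np.zeros(64)
for _ in range(200):
    grad = D.T @ (D @ x - y)                   # gradient of 0.5 * ||y - D x||^2
    z = x - grad / L                           # majorized (surrogate) minimization step
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # proximal step: soft-thresholding
```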

Recurrent Inference Machines for Solving Inverse Problems

Title Recurrent Inference Machines for Solving Inverse Problems
Authors Patrick Putzky, Max Welling
Abstract Much of the recent research on solving iterative inference problems focuses on moving away from hand-chosen inference algorithms and towards learned inference. In the latter, the inference process is unrolled in time and interpreted as a recurrent neural network (RNN), which allows for joint learning of model and inference parameters with back-propagation through time. In this framework, the RNN architecture is directly derived from a hand-chosen inference algorithm, effectively limiting its capabilities. We propose a learning framework, called Recurrent Inference Machines (RIM), in which we turn algorithm construction the other way round: given data and a task, train an RNN to learn an inference algorithm. Because RNNs are Turing complete [1, 2], they are capable of implementing any inference algorithm. The framework allows for an abstraction which removes the need for domain knowledge. We demonstrate in several image restoration experiments that this abstraction is effective, allowing us to achieve state-of-the-art performance on image denoising and super-resolution tasks and superior across-task generalization.
Tasks Denoising, Image Denoising, Image Restoration, Super-Resolution
Published 2017-06-13
URL http://arxiv.org/abs/1706.04008v1
PDF http://arxiv.org/pdf/1706.04008v1.pdf
PWC https://paperswithcode.com/paper/recurrent-inference-machines-for-solving
Repo https://github.com/pputzky/invertible_rim
Framework pytorch
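
A minimal PyTorch sketch of the unrolled loop the abstract describes is given below for a toy denoising problem: at each step a recurrent cell sees the current estimate and the gradient of the data-fidelity term and emits an additive update. The cell type, sizes, and number of steps are assumptions for illustration, not the architecture in the linked repository.

```python
# Minimal sketch (assumptions, not the authors' code): a Recurrent Inference Machine-style
# update loop for denoising y = x + noise. The RNN consumes the current estimate and the
# gradient of the data-fidelity term, and outputs an additive refinement.
import torch
import torch.nn as nn

class TinyRIM(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.cell = nn.GRUCell(2 * dim, hidden)   # input: [estimate, likelihood gradient]
        self.out = nn.Linear(hidden, dim)         # maps the hidden state to an update step

    def forward(self, y, n_steps=5):
        x = y.clone()                              # initialise the estimate with the data
        h = torch.zeros(y.shape[0], self.cell.hidden_size, device=y.device)
        for _ in range(n_steps):
            grad = x - y                           # gradient of 0.5*||x - y||^2 (Gaussian likelihood)
            h = self.cell(torch.cat([x, grad], dim=-1), h)
            x = x + self.out(h)                    # learned additive refinement
        return x

y = torch.randn(4, 16)                             # toy noisy observations
rim = TinyRIM(dim=16)
x_hat = rim(y)                                     # would be trained end-to-end against clean targets
```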