Paper Group AWR 120
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. Stochastic Gradient Descent as Approximate Bayesian Inference. Bayesian Optimization with Gradients. Riemannian approach to batch normalization. Match-Tensor: a Deep Relevance Model for Search. DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using …
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Title | Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning |
Authors | Sébastien Forestier, Yoan Mollard, Pierre-Yves Oudeyer |
Abstract | Intrinsically motivated spontaneous exploration is a key enabler of autonomous lifelong learning in human children. It allows them to discover and acquire large repertoires of skills through self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present the unsupervised multi-goal reinforcement learning formal framework as well as an algorithmic approach called intrinsically motivated goal exploration processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP algorithmic architecture relies on several principles: 1) self-generation of goals as parameterized reinforcement learning problems; 2) selection of goals based on intrinsic rewards; 3) exploration with parameterized time-bounded policies and fast incremental goal-parameterized policy search; 4) systematic reuse of information acquired when targeting a goal for improving other goals. We present a particularly efficient form of IMGEP that uses a modular representation of goal spaces as well as intrinsic rewards based on learning progress. We show how IMGEPs automatically generate a learning curriculum within an experimental setup where a real humanoid robot can explore multiple spaces of goals with several hundred continuous dimensions. While no particular target goal is provided to the system beforehand, this curriculum allows the discovery of skills of increasing complexity that act as stepping stones for learning more complex skills (like nested tool use). We show that learning several spaces of diverse problems can be more efficient for learning complex skills than only trying to directly learn these complex skills. We illustrate the computational efficiency of IMGEPs as these robotic experiments use simple memory-based low-level policy representations and a search algorithm, enabling the whole system to learn online and incrementally on a Raspberry Pi 3. |
Tasks | Multi-Goal Reinforcement Learning |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02190v1 |
http://arxiv.org/pdf/1708.02190v1.pdf | |
PWC | https://paperswithcode.com/paper/intrinsically-motivated-goal-exploration |
Repo | https://github.com/flowersteam/geppg |
Framework | none |
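A minimal sketch of the IMGEP loop described in the abstract above: modular goal spaces, learning-progress-based goal-space selection, memory-based policy search, and reuse of every outcome across goal spaces. The class and function names, the `env_step` callable (assumed to roll out a time-bounded policy and return an outcome vector), and the progress update rule are illustrative assumptions, not the paper's or the repository's implementation.

```python
import numpy as np

# Sketch of an intrinsically motivated goal exploration process (IMGEP).
# Goal spaces, the environment, and the policy-parameter search are placeholders.

class GoalSpace:
    def __init__(self, dim):
        self.dim = dim
        self.memory = []            # (policy_params, outcome) pairs observed so far
        self.progress = 1e-3        # running intrinsic reward (crude learning-progress proxy)

    def sample_goal(self):
        return np.random.uniform(-1.0, 1.0, self.dim)

    def nearest_policy(self, goal):
        # Memory-based policy search: reuse the parameters whose outcome was closest to the goal.
        if not self.memory:
            return np.random.uniform(-1.0, 1.0, 4)
        params, _ = min(self.memory, key=lambda m: np.linalg.norm(m[1] - goal))
        return params + 0.05 * np.random.randn(*params.shape)   # small exploration noise

def run_imgep(env_step, goal_spaces, iterations=1000):
    for _ in range(iterations):
        # 1) Select a goal space proportionally to its estimated learning progress.
        weights = np.array([gs.progress for gs in goal_spaces])
        gs = goal_spaces[np.random.choice(len(goal_spaces), p=weights / weights.sum())]
        # 2) Self-generate a goal and search for policy parameters targeting it.
        goal = gs.sample_goal()
        params = gs.nearest_policy(goal)
        outcome = env_step(params)                  # roll out the time-bounded policy
        # 3) Reuse the outcome in every goal space, not only the one that was targeted.
        for other in goal_spaces:
            other.memory.append((params, outcome[:other.dim]))
        # 4) Update the intrinsic reward from how close the outcome came to the goal.
        dist = np.linalg.norm(outcome[:gs.dim] - goal)
        gs.progress = 0.9 * gs.progress + 0.1 * max(1e-3, 1.0 - dist)
```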
Stochastic Gradient Descent as Approximate Bayesian Inference
Title | Stochastic Gradient Descent as Approximate Bayesian Inference |
Authors | Stephan Mandt, Matthew D. Hoffman, David M. Blei |
Abstract | Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler. |
Tasks | Bayesian Inference |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04289v2 |
http://arxiv.org/pdf/1704.04289v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-as-approximate |
Repo | https://github.com/taohu88/BayesianML |
Framework | none |
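A toy illustration of the paper's starting point: constant-learning-rate SGD on a conjugate Gaussian model simulates a Markov chain whose iterates can be treated as approximate posterior samples. The constants and the small fixed learning rate are assumptions for the sketch; the paper derives how to tune that learning rate so the stationary distribution best matches the posterior.

```python
import numpy as np

# Constant-learning-rate SGD on the negative log posterior of a Gaussian mean,
# with the iterates collected as approximate posterior samples.
rng = np.random.default_rng(0)
N, sigma, mu_true = 1000, 1.0, 2.0
x = rng.normal(mu_true, sigma, size=N)

# Analytic posterior of mu under a N(0, tau^2) prior, for comparison.
tau2 = 10.0
post_var = 1.0 / (N / sigma**2 + 1.0 / tau2)
post_mean = post_var * x.sum() / sigma**2

S, lr, mu = 10, 1e-3, 0.0          # minibatch size and an (untuned) constant learning rate
samples = []
for t in range(20000):
    batch = rng.choice(x, size=S)
    grad = N * (mu - batch.mean()) / sigma**2 + mu / tau2   # stochastic gradient of -log p(mu | x)
    mu -= lr * grad
    if t > 2000:                                            # discard burn-in
        samples.append(mu)

print("analytic posterior:   ", post_mean, post_var)
print("constant-SGD iterates:", np.mean(samples), np.var(samples))
# The paper shows how to choose lr (and preconditioning) so the iterate distribution
# matches the posterior in KL divergence; here lr is just an arbitrary small constant.
```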
Bayesian Optimization with Gradients
Title | Bayesian Optimization with Gradients |
Authors | Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier |
Abstract | Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04389v3 |
http://arxiv.org/pdf/1703.04389v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-optimization-with-gradients |
Repo | https://github.com/wujian16/Cornell-MOE |
Framework | none |
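A sketch of the ingredient that lets Bayesian optimization exploit gradients: a 1-D Gaussian process conditioned jointly on function values and derivatives, using the RBF kernel and its derivatives. This is generic GP algebra under assumed toy data, not the paper's d-KG acquisition function or its batch variant.

```python
import numpy as np

def kernels(a, b, l=0.5):
    """RBF kernel k(a, b), dk/db, and d2k/(da db) for all pairs of inputs."""
    d = a[:, None] - b[None, :]
    base = np.exp(-d**2 / (2 * l**2))
    dk_db = base * d / l**2
    d2k = base * (1.0 / l**2 - d**2 / l**4)
    return base, dk_db, d2k

def f(x):  return np.sin(3 * x)
def df(x): return 3 * np.cos(3 * x)

X = np.array([-1.0, 0.2, 0.9])
y = np.concatenate([f(X), df(X)])                       # stacked observations [f(X), f'(X)]

Kff, Kfd, Kdd = kernels(X, X)
K = np.block([[Kff, Kfd], [Kfd.T, Kdd]]) + 1e-8 * np.eye(2 * len(X))

Xs = np.linspace(-1.0, 0.9, 100)
Ksf, Ksd, _ = kernels(Xs, X)
mean = np.hstack([Ksf, Ksd]) @ np.linalg.solve(K, y)    # posterior mean of f given values + gradients
print(float(np.max(np.abs(mean - f(Xs)))))              # interpolation error inside the data range
```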
Riemannian approach to batch normalization
Title | Riemannian approach to batch normalization |
Authors | Minhyung Cho, Jaehyung Lee |
Abstract | Batch Normalization (BN) has proven to be an effective algorithm for deep neural network training by normalizing the input to each neuron and reducing the internal covariate shift. The space of weight vectors in the BN layer can be naturally interpreted as a Riemannian manifold, which is invariant to linear scaling of weights. Following the intrinsic geometry of this manifold provides a new learning rule that is more efficient and easier to analyze. We also propose intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the manifold. The resulting algorithm consistently outperforms the original BN on various types of network architectures and datasets. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09603v3 |
http://arxiv.org/pdf/1709.09603v3.pdf | |
PWC | https://paperswithcode.com/paper/riemannian-approach-to-batch-normalization |
Repo | https://github.com/MinhyungCho/riemannian-batch-normalization |
Framework | tf |
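A minimal sketch of the geometric idea: a weight vector feeding a BN layer is scale-invariant, so it can be optimized on the unit sphere by projecting the Euclidean gradient onto the tangent space and retracting back to the sphere. The simple step-and-renormalize retraction below is an illustrative stand-in for the paper's exact update, clipping, and regularization rules.

```python
import numpy as np

def riemannian_step(w, grad, lr=0.1):
    """One Riemannian gradient step on the unit sphere for a scale-invariant weight vector."""
    w = w / np.linalg.norm(w)
    tangent = grad - np.dot(grad, w) * w      # project the Euclidean gradient onto the tangent space
    w_new = w - lr * tangent                  # take the step ...
    return w_new / np.linalg.norm(w_new)      # ... and retract back onto the sphere

w = np.random.randn(8)
g = np.random.randn(8)
w = riemannian_step(w, g)
print(np.linalg.norm(w))   # stays 1: the update never touches the (irrelevant) scale of w
```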
Match-Tensor: a Deep Relevance Model for Search
Title | Match-Tensor: a Deep Relevance Model for Search |
Authors | Aaron Jaech, Hetunandan Kamisetty, Eric Ringger, Charlie Clarke |
Abstract | The application of Deep Neural Networks for ranking in search engines may obviate the need for the extensive feature engineering common to current learning-to-rank methods. However, we show that combining simple relevance matching features like BM25 with existing Deep Neural Net models often substantially improves the accuracy of these models, indicating that they do not capture essential local relevance matching signals. We describe a novel deep Recurrent Neural Net-based model that we call Match-Tensor. The architecture of the Match-Tensor model simultaneously accounts for both local relevance matching and global topicality signals allowing for a rich interplay between them when computing the relevance of a document to a query. On a large held-out test set consisting of social media documents, we demonstrate not only that Match-Tensor outperforms BM25 and other classes of DNNs but also that it largely subsumes signals present in these models. |
Tasks | Feature Engineering, Learning-To-Rank |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07795v1 |
http://arxiv.org/pdf/1701.07795v1.pdf | |
PWC | https://paperswithcode.com/paper/match-tensor-a-deep-relevance-model-for |
Repo | https://github.com/cspoh/IRDM2017 |
Framework | tf |
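A sketch of the match-tensor construction the model is named after: a |query| x |document| x channels tensor of elementwise products of token representations, plus an exact-match channel. The shapes follow the paper's high-level description; the bi-LSTM encoder producing the token states and the downstream 2-D convolutional scorer are omitted, and the sizes below are arbitrary.

```python
import torch

def match_tensor(q_states, d_states, q_ids, d_ids):
    # q_states: (Lq, C), d_states: (Ld, C) token representations from some encoder
    cross = torch.einsum('ic,jc->ijc', q_states, d_states)            # (Lq, Ld, C) channel-wise products
    exact = (q_ids[:, None] == d_ids[None, :]).float().unsqueeze(-1)  # (Lq, Ld, 1) exact-match signal
    return torch.cat([cross, exact], dim=-1)                          # (Lq, Ld, C + 1)

q = torch.randn(4, 16); d = torch.randn(30, 16)
q_ids = torch.randint(0, 1000, (4,)); d_ids = torch.randint(0, 1000, (30,))
print(match_tensor(q, d, q_ids, d_ids).shape)   # torch.Size([4, 30, 17])
# A small 2-D convolutional network over this tensor then scores query-document relevance.
```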
DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)
Title | DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs) |
Authors | Esben Jannik Bjerrum |
Abstract | The isoelectric point (IEP or pI) is the pH where the net charge on the molecular ensemble of peptides and proteins is zero. This physical-chemical property is dependent on protonable/deprotonable sidechains and their pKa values. Here a pI prediction model is trained from a database of peptide sequences and pIs using a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The trained model obtains an RMSE and R$^2$ of 0.28 and 0.95 for the external test set. The model is not based on pKa values, but predictions on constructed test sequences show similar rankings to already known pKa values. The prediction depends mostly on the existence of known acidic and basic amino acids, with fine adjustments based on the neighboring sequence and the position of the charged amino acids in the peptide chain. |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09553v1 |
http://arxiv.org/pdf/1712.09553v1.pdf | |
PWC | https://paperswithcode.com/paper/deepiep-a-peptide-sequence-model-of |
Repo | https://github.com/EBjerrum/DeepIEP |
Framework | tf |
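A minimal sequence-to-scalar regressor in the spirit of the abstract: embed amino-acid tokens, run an LSTM, and regress the pI from the final hidden state. The hyperparameters, tokenization, and class names below are illustrative assumptions, not those of the DeepIEP repository (which uses Keras).

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOK = {a: i for i, a in enumerate(AMINO_ACIDS)}

class PIRegressor(nn.Module):
    def __init__(self, emb=16, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(len(AMINO_ACIDS), emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seqs):                      # seqs: (batch, length) amino-acid token indices
        _, (h, _) = self.lstm(self.emb(seqs))
        return self.head(h[-1]).squeeze(-1)       # one predicted pI per sequence

model = PIRegressor()
batch = torch.tensor([[TOK[a] for a in "ACDKRH"], [TOK[a] for a in "GGEEDD"]])
print(model(batch))                               # two (untrained) pI predictions
```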
TensorFlow Distributions
Title | TensorFlow Distributions |
Authors | Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous |
Abstract | The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community. |
Tasks | Probabilistic Programming |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10604v1 |
http://arxiv.org/pdf/1711.10604v1.pdf | |
PWC | https://paperswithcode.com/paper/tensorflow-distributions |
Repo | https://github.com/nicola-decao/s-vae |
Framework | tf |
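A short usage sketch of the two core abstractions, Distributions and Bijectors. The code described in the paper originally shipped inside TensorFlow (tf.contrib.distributions); this sketch assumes the current packaging in tensorflow_probability, where the same API lives today.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

normal = tfd.Normal(loc=0., scale=1.)
x = normal.sample(5)                      # fast, reparameterized sampling
print(normal.log_prob(x))                 # numerically stable log density

# A Bijector is a composable, volume-tracking transformation; applying Exp to a
# standard normal yields a log-normal distribution with correct log densities.
log_normal = tfd.TransformedDistribution(distribution=normal, bijector=tfb.Exp())
print(log_normal.log_prob(tf.constant([0.5, 1.0, 2.0])))
```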
Long-term Forecasting using Higher Order Tensor RNNs
Title | Long-term Forecasting using Higher Order Tensor RNNs |
Authors | Rose Yu, Stephan Zheng, Anima Anandkumar, Yisong Yue |
Abstract | We present Higher-Order Tensor RNN (HOT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher-order moments and higher-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation guarantees and the variance bound for HOT-RNN for general sequence inputs. We also demonstrate 5% to 12% improvements for long-term prediction over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world time series data. |
Tasks | Time Series |
Published | 2017-10-31 |
URL | https://arxiv.org/abs/1711.00073v3 |
https://arxiv.org/pdf/1711.00073v3.pdf | |
PWC | https://paperswithcode.com/paper/long-term-forecasting-using-tensor-train-rnns |
Repo | https://github.com/yuqirose/tensor_train_RNN |
Framework | tf |
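A simplified sketch of a higher-order recurrent cell: the transition uses products of the last two hidden states in addition to the usual linear terms. The actual HOT-RNN factorizes the resulting weight tensor with a tensor-train decomposition and handles arbitrary lag orders; both are omitted here, so treat this as an assumed, reduced illustration.

```python
import torch
import torch.nn as nn

class SecondOrderCell(nn.Module):
    def __init__(self, n_in, n_hid, lags=2):
        super().__init__()
        self.w_in = nn.Linear(n_in, n_hid)
        self.w_lin = nn.ModuleList([nn.Linear(n_hid, n_hid, bias=False) for _ in range(lags)])
        self.w_bilin = nn.Bilinear(n_hid, n_hid, n_hid, bias=False)   # second-order interaction term

    def forward(self, x, hist):                 # hist: [h_{t-1}, h_{t-2}]
        h = self.w_in(x)
        for w, hp in zip(self.w_lin, hist):
            h = h + w(hp)                       # linear dependence on each lagged state
        h = h + self.w_bilin(hist[0], hist[1])  # bilinear h_{t-1} x h_{t-2} dependence
        return torch.tanh(h)

n_hid = 8
cell = SecondOrderCell(3, n_hid)
hist = [torch.zeros(1, n_hid), torch.zeros(1, n_hid)]
for t in range(10):                             # roll the cell over a random sequence
    h = cell(torch.randn(1, 3), hist)
    hist = [h, hist[0]]
print(h.shape)                                  # torch.Size([1, 8])
```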
Differentiable Learning of Logical Rules for Knowledge Base Reasoning
Title | Differentiable Learning of Logical Rules for Knowledge Base Reasoning |
Authors | Fan Yang, Zhilin Yang, William W. Cohen |
Abstract | We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08367v3 |
http://arxiv.org/pdf/1702.08367v3.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-learning-of-logical-rules-for |
Repo | https://github.com/fanyangxyz/Neural-LP |
Framework | tf |
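A sketch of the TensorLog-style operators that Neural LP composes: each relation is a {0,1} adjacency matrix over entities, and applying a length-2 rule body to a query entity is just two matrix-vector products. The learned controller that emits soft attention over relations at each step is replaced here by fixed, hand-picked weights; the tiny knowledge base is invented for illustration.

```python
import numpy as np

n_entities = 5
born_in = np.zeros((n_entities, n_entities)); born_in[0, 3] = 1    # person 0 born in city 3
city_of = np.zeros((n_entities, n_entities)); city_of[3, 4] = 1    # city 3 located in country 4
relations = [born_in, city_of]

def apply_rule(tail, attention):
    """attention[t][r]: soft weight on relation r at step t of the rule body."""
    v = np.zeros(n_entities); v[tail] = 1.0
    for step in attention:
        v = sum(a * (M.T @ v) for a, M in zip(step, relations))   # differentiable relation hop
    return v                                   # scores over candidate head entities

# Soft rule ~ nationality(X, Y) <- born_in(X, Z), city_of(Z, Y)
scores = apply_rule(tail=0, attention=[[0.9, 0.1], [0.1, 0.9]])
print(scores.argmax())                         # entity 4 (the country) gets the highest score
```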
Regularizing Face Verification Nets For Pain Intensity Regression
Title | Regularizing Face Verification Nets For Pain Intensity Regression |
Authors | Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille |
Abstract | Limited labeled data are available for the research of estimating facial expression intensities. For instance, the ability to train deep networks for automated pain assessment is limited by small datasets with labels of patient-reported pain intensities. Fortunately, fine-tuning from a data-extensive pre-trained domain, such as face verification, can alleviate this problem. In this paper, we propose a network that fine-tunes a state-of-the-art face verification network using a regularized regression loss and additional data with expression labels. In this way, the expression intensity regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed regularized deep regressor is applied to estimate the pain expression intensity and verified on the widely-used UNBC-McMaster Shoulder-Pain dataset, achieving the state-of-the-art performance. A weighted evaluation metric is also proposed to address the imbalance issue of different pain intensities. |
Tasks | Face Verification, Pain Intensity Regression |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06925v3 |
http://arxiv.org/pdf/1702.06925v3.pdf | |
PWC | https://paperswithcode.com/paper/regularizing-face-verification-nets-for-pain |
Repo | https://github.com/happynear/PainRegression |
Framework | none |
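A fine-tuning sketch in the spirit of the abstract: a regression head on top of a pre-trained face-verification backbone, with an L2 penalty keeping the fine-tuned backbone close to its pre-trained weights. This is a generic regularized-fine-tuning recipe assumed for illustration; the paper's actual network and its regularized regression loss differ, and the tensors below are dummies.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(96 * 96, 256), nn.ReLU())   # stand-in for a face net
head = nn.Linear(256, 1)                                                     # pain-intensity regressor
pretrained = {k: v.clone().detach() for k, v in backbone.named_parameters()}

opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-4)
mse, lam = nn.MSELoss(), 1e-3

faces = torch.randn(8, 1, 96, 96)   # dummy batch of face crops
pain = torch.rand(8)                # dummy pain-intensity labels

pred = head(backbone(faces)).squeeze(-1)
reg = sum(((p - pretrained[k]) ** 2).sum() for k, p in backbone.named_parameters())
loss = mse(pred, pain) + lam * reg  # regression loss + stay-close-to-pretrained regularizer
loss.backward()
opt.step()
```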
Efficient, sparse representation of manifold distance matrices for classical scaling
Title | Efficient, sparse representation of manifold distance matrices for classical scaling |
Authors | Javier S. Turek, Alexander Huth |
Abstract | Geodesic distance matrices can reveal shape properties that are largely invariant to non-rigid deformations, and thus are often used to analyze and represent 3-D shapes. However, these matrices grow quadratically with the number of points. Thus for large point sets it is common to use a low-rank approximation to the distance matrix, which fits in memory and can be efficiently analyzed using methods such as multidimensional scaling (MDS). In this paper we present a novel sparse method for efficiently representing geodesic distance matrices using biharmonic interpolation. This method exploits knowledge of the data manifold to learn a sparse interpolation operator that approximates distances using a subset of points. We show that our method is 2x faster and uses 20x less memory than current leading methods for solving MDS on large point sets, with similar quality. This enables analyses of large point sets that were previously infeasible. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10887v2 |
http://arxiv.org/pdf/1705.10887v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-sparse-representation-of-manifold |
Repo | https://github.com/alexhuth/BHA |
Framework | none |
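For context, the classical-scaling step the paper feeds its sparse representation into: double-center the squared distance matrix and take the top eigenpairs. The small example below still forms the dense matrix; the paper's contribution is a sparse biharmonic interpolation that approximates it from a subset of points, which is not shown here.

```python
import numpy as np

def classical_mds(D2, k=2):
    """Classical scaling: embed points from a squared distance matrix D2 into k dimensions."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]              # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

pts = np.random.randn(50, 3)
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
X = classical_mds(D2, k=3)
print(np.allclose(((X[:, None] - X[None, :]) ** 2).sum(-1), D2))  # True: distances recovered
```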
Positive-Unlabeled Learning with Non-Negative Risk Estimator
Title | Positive-Unlabeled Learning with Non-Negative Risk Estimator |
Authors | Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama |
Abstract | From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00593v2 |
http://arxiv.org/pdf/1703.00593v2.pdf | |
PWC | https://paperswithcode.com/paper/positive-unlabeled-learning-with-non-negative |
Repo | https://github.com/kiryor/nnPUlearning |
Framework | none |
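A sketch of the non-negative PU risk estimator itself: the unlabeled-negative part of the risk is clamped at zero rather than being allowed to go negative. The sigmoid surrogate loss, the class prior `pi`, and the dummy scores are assumptions for illustration, and the paper's large-negative-risk training heuristic is omitted.

```python
import torch

def nnpu_risk(scores_p, scores_u, pi=0.3):
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    loss = lambda z: torch.sigmoid(-z)             # sigmoid surrogate loss for the positive class
    r_p_pos = loss(scores_p).mean()                # positives classified as positive
    r_p_neg = loss(-scores_p).mean()               # positives classified as negative
    r_u_neg = loss(-scores_u).mean()               # unlabeled classified as negative
    neg_part = r_u_neg - pi * r_p_neg
    return pi * r_p_pos + torch.clamp(neg_part, min=0.0)   # the non-negative correction

scores_p = torch.randn(64, requires_grad=True)     # classifier outputs on positive data
scores_u = torch.randn(256, requires_grad=True)    # classifier outputs on unlabeled data
nnpu_risk(scores_p, scores_u).backward()           # differentiable, so any flexible model can be trained
```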
Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior
Title | Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior |
Authors | Peng Liu, Ruogu Fang |
Abstract | We explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn similar pixel-distribution features from noisy images. Many types of image noise follow a certain pixel-distribution in common, such as additive white Gaussian noise (AWGN). By increasing the CNN's width with larger receptive fields and more channels in each layer, CNNs can extract more accurate pixel-distribution features. The key to our approach is a discovery that wider CNNs with more convolutions tend to learn similar pixel-distribution features, which suggests a new strategy for solving low-level vision problems effectively: the inference mapping primarily relies on the priors behind the noise property rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on AWGN and demonstrate that by learning pixel-distribution features from images, the WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. \textit{Code and models are available at \url{https://github.com/cswin/WIN}}. |
Tasks | Denoising, Image Denoising |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.05414v5 |
http://arxiv.org/pdf/1707.05414v5.pdf | |
PWC | https://paperswithcode.com/paper/wide-inference-network-for-image-denoising |
Repo | https://github.com/shibuiwilliam/DeepLearningDenoise |
Framework | none |
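A sketch of a wide-and-shallow denoising CNN in the spirit of WIN: few layers, large kernels, many channels, predicting the clean image directly. The layer count and sizes here are illustrative assumptions, not the exact WIN configuration from the repository.

```python
import torch
import torch.nn as nn

class WideDenoiser(nn.Module):
    def __init__(self, channels=128, kernel=7):
        super().__init__()
        pad = kernel // 2
        self.net = nn.Sequential(            # wide (many channels, large kernels) but shallow
            nn.Conv2d(1, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel, padding=pad),
        )

    def forward(self, noisy):
        return self.net(noisy)               # directly predicts the clean image

noisy = torch.randn(1, 1, 64, 64)            # an AWGN-corrupted grayscale patch
print(WideDenoiser()(noisy).shape)           # torch.Size([1, 1, 64, 64])
```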
Convolutional Dictionary Learning: Acceleration and Convergence
Title | Convolutional Dictionary Learning: Acceleration and Convergence |
Authors | Il Yong Chun, Jeffrey A. Fessler |
Abstract | Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in a single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach. |
Tasks | Denoising, Dictionary Learning, Image Denoising |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00389v2 |
http://arxiv.org/pdf/1707.00389v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-dictionary-learning |
Repo | https://github.com/mechatoz/convolt |
Framework | none |
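A simplified building block for context: one proximal-gradient (ISTA-style) solver for the convolutional sparse-coding subproblem that CDL alternates with dictionary updates. The scalar step size used here stands in for the paper's designed majorization matrices, and the block updates, momentum, and restarting of BPG-M are all omitted; the signal and filters are invented for the demo.

```python
import numpy as np

def soft(z, t):                                   # proximal operator of the l1 norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def csc_ista(x, filters, lam=0.05, iters=100):
    """min_z 0.5*||x - sum_k d_k * z_k||^2 + lam*||z||_1 via proximal gradient steps."""
    K, n = len(filters), len(x)
    z = np.zeros((K, n))
    L = sum(np.sum(np.abs(np.convolve(d, d[::-1]))) for d in filters)   # crude Lipschitz bound
    for _ in range(iters):
        recon = sum(np.convolve(z[k], filters[k], mode='same') for k in range(K))
        for k in range(K):
            grad = np.convolve(recon - x, filters[k][::-1], mode='same')  # correlate residual with d_k
            z[k] = soft(z[k] - grad / L, lam / L)
    return z

x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.05 * np.random.randn(256)
filters = [np.hanning(11), np.diff(np.hanning(12))]
codes = csc_ista(x, filters)
print([int((np.abs(c) > 1e-3).sum()) for c in codes])   # sparsity of each coefficient map
```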
Recurrent Inference Machines for Solving Inverse Problems
Title | Recurrent Inference Machines for Solving Inverse Problems |
Authors | Patrick Putzky, Max Welling |
Abstract | Much of the recent research on solving iterative inference problems focuses on moving away from hand-chosen inference algorithms and towards learned inference. In the latter, the inference process is unrolled in time and interpreted as a recurrent neural network (RNN) which allows for joint learning of model and inference parameters with back-propagation through time. In this framework, the RNN architecture is directly derived from a hand-chosen inference algorithm, effectively limiting its capabilities. We propose a learning framework, called Recurrent Inference Machines (RIM), in which we turn algorithm construction the other way round: Given data and a task, train an RNN to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they are capable of implementing any inference algorithm. The framework allows for an abstraction which removes the need for domain knowledge. We demonstrate in several image restoration experiments that this abstraction is effective, allowing us to achieve state-of-the-art performance on image denoising and super-resolution tasks and superior across-task generalization. |
Tasks | Denoising, Image Denoising, Image Restoration, Super-Resolution |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04008v1 |
http://arxiv.org/pdf/1706.04008v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-inference-machines-for-solving |
Repo | https://github.com/pputzky/invertible_rim |
Framework | pytorch |
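A sketch of a recurrent inference machine update loop for a toy denoising problem y = x + noise: at each step the RNN cell receives the current estimate and the gradient of the data log-likelihood and outputs an additive refinement. The GRU-based cell, the sizes, and the Gaussian likelihood are assumptions for the sketch, not the architecture from the paper or repository.

```python
import torch
import torch.nn as nn

class RIMCell(nn.Module):
    def __init__(self, n, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(2 * n, hidden)
        self.out = nn.Linear(hidden, n)

    def forward(self, x, grad, h):
        h = self.cell(torch.cat([x, grad], dim=-1), h)
        return x + self.out(h), h                    # additive refinement of the estimate

n, sigma = 32, 0.1
rim = RIMCell(n)
x_true = torch.randn(1, n)
y = x_true + sigma * torch.randn(1, n)               # noisy observation

x, h = torch.zeros(1, n), torch.zeros(1, 64)
for _ in range(8):                                   # unrolled inference steps
    grad = (y - x) / sigma**2                        # gradient of log p(y | x) for Gaussian noise
    x, h = rim(x, grad, h)
# Training would backpropagate a loss such as ||x - x_true||^2 through all unrolled steps.
```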