Paper Group AWR 120
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. Stochastic Gradient Descent as Approximate Bayesian Inference. Bayesian Optimization with Gradients. Riemannian approach to batch normalization. Match-Tensor: a Deep Relevance Model for Search. DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using …
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Title | Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning |
Authors | Sébastien Forestier, Yoan Mollard, Pierre-Yves Oudeyer |
Abstract | Intrinsically motivated spontaneous exploration is a key enabler of autonomous lifelong learning in human children. It allows them to discover and acquire large repertoires of skills through self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present the unsupervised multi-goal reinforcement learning formal framework as well as an algorithmic approach called intrinsically motivated goal exploration processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP algorithmic architecture relies on several principles: 1) self-generation of goals as parameterized reinforcement learning problems; 2) selection of goals based on intrinsic rewards; 3) exploration with parameterized time-bounded policies and fast incremental goal-parameterized policy search; 4) systematic reuse of information acquired when targeting a goal for improving other goals. We present a particularly efficient form of IMGEP that uses a modular representation of goal spaces as well as intrinsic rewards based on learning progress. We show how IMGEPs automatically generate a learning curriculum within an experimental setup where a real humanoid robot can explore multiple spaces of goals with several hundred continuous dimensions. While no particular target goal is provided to the system beforehand, this curriculum allows the discovery of skills of increasing complexity that act as stepping stones for learning more complex skills (like nested tool use). We show that learning several spaces of diverse problems can be more efficient for learning complex skills than only trying to directly learn these complex skills. We illustrate the computational efficiency of IMGEPs as these robotic experiments use simple memory-based low-level policy representations and a search algorithm, enabling the whole system to learn online and incrementally on a Raspberry Pi 3. |
Tasks | Multi-Goal Reinforcement Learning |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02190v1 |
http://arxiv.org/pdf/1708.02190v1.pdf | |
PWC | https://paperswithcode.com/paper/intrinsically-motivated-goal-exploration |
Repo | https://github.com/flowersteam/geppg |
Framework | none |
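A minimal sketch of the IMGEP loop described in the abstract above: modular goal spaces, learning-progress-based goal-space selection, memory-based policy search, and reuse of every outcome across goal spaces. The class and function names, the `env_step` callable (assumed to roll out a time-bounded policy and return an outcome vector), and the progress update rule are illustrative assumptions, not the paper's or the repository's implementation.

```python
import numpy as np

# Sketch of an intrinsically motivated goal exploration process (IMGEP).
# Goal spaces, the environment, and the policy-parameter search are placeholders.

class GoalSpace:
    def __init__(self, dim):
        self.dim = dim
        self.memory = []            # (policy_params, outcome) pairs observed so far
        self.progress = 1e-3        # running intrinsic reward (crude learning-progress proxy)

    def sample_goal(self):
        return np.random.uniform(-1.0, 1.0, self.dim)

    def nearest_policy(self, goal):
        # Memory-based policy search: reuse the parameters whose outcome was closest to the goal.
        if not self.memory:
            return np.random.uniform(-1.0, 1.0, 4)
        params, _ = min(self.memory, key=lambda m: np.linalg.norm(m[1] - goal))
        return params + 0.05 * np.random.randn(*params.shape)   # small exploration noise

def run_imgep(env_step, goal_spaces, iterations=1000):
    for _ in range(iterations):
        # 1) Select a goal space proportionally to its estimated learning progress.
        weights = np.array([gs.progress for gs in goal_spaces])
        gs = goal_spaces[np.random.choice(len(goal_spaces), p=weights / weights.sum())]
        # 2) Self-generate a goal and search for policy parameters targeting it.
        goal = gs.sample_goal()
        params = gs.nearest_policy(goal)
        outcome = env_step(params)                  # roll out the time-bounded policy
        # 3) Reuse the outcome in every goal space, not only the one that was targeted.
        for other in goal_spaces:
            other.memory.append((params, outcome[:other.dim]))
        # 4) Update the intrinsic reward from how close the outcome came to the goal.
        dist = np.linalg.norm(outcome[:gs.dim] - goal)
        gs.progress = 0.9 * gs.progress + 0.1 * max(1e-3, 1.0 - dist)
```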
Stochastic Gradient Descent as Approximate Bayesian Inference
Title | Stochastic Gradient Descent as Approximate Bayesian Inference |
Authors | Stephan Mandt, Matthew D. Hoffman, David M. Blei |
Abstract | Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler. |
Tasks | Bayesian Inference |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04289v2 |
http://arxiv.org/pdf/1704.04289v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-as-approximate |
Repo | https://github.com/taohu88/BayesianML |
Framework | none |
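A toy illustration of the paper's starting point: constant-learning-rate SGD on a conjugate Gaussian model simulates a Markov chain whose iterates can be treated as approximate posterior samples. The constants and the small fixed learning rate are assumptions for the sketch; the paper derives how to tune that learning rate so the stationary distribution best matches the posterior.

```python
import numpy as np

# Constant-learning-rate SGD on the negative log posterior of a Gaussian mean,
# with the iterates collected as approximate posterior samples.
rng = np.random.default_rng(0)
N, sigma, mu_true = 1000, 1.0, 2.0
x = rng.normal(mu_true, sigma, size=N)

# Analytic posterior of mu under a N(0, tau^2) prior, for comparison.
tau2 = 10.0
post_var = 1.0 / (N / sigma**2 + 1.0 / tau2)
post_mean = post_var * x.sum() / sigma**2

S, lr, mu = 10, 1e-3, 0.0          # minibatch size and an (untuned) constant learning rate
samples = []
for t in range(20000):
    batch = rng.choice(x, size=S)
    grad = N * (mu - batch.mean()) / sigma**2 + mu / tau2   # stochastic gradient of -log p(mu | x)
    mu -= lr * grad
    if t > 2000:                                            # discard burn-in
        samples.append(mu)

print("analytic posterior:   ", post_mean, post_var)
print("constant-SGD iterates:", np.mean(samples), np.var(samples))
# The paper shows how to choose lr (and preconditioning) so the iterate distribution
# matches the posterior in KL divergence; here lr is just an arbitrary small constant.
```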
Bayesian Optimization with Gradients
Title | Bayesian Optimization with Gradients |
Authors | Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier |
Abstract | Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04389v3 |
http://arxiv.org/pdf/1703.04389v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-optimization-with-gradients |
Repo | https://github.com/wujian16/Cornell-MOE |
Framework | none |
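A sketch of the ingredient that lets Bayesian optimization exploit gradients: a 1-D Gaussian process conditioned jointly on function values and derivatives, using the RBF kernel and its derivatives. This is generic GP algebra under assumed toy data, not the paper's d-KG acquisition function or its batch variant.

```python
import numpy as np

def kernels(a, b, l=0.5):
    """RBF kernel k(a, b), dk/db, and d2k/(da db) for all pairs of inputs."""
    d = a[:, None] - b[None, :]
    base = np.exp(-d**2 / (2 * l**2))
    dk_db = base * d / l**2
    d2k = base * (1.0 / l**2 - d**2 / l**4)
    return base, dk_db, d2k

def f(x):  return np.sin(3 * x)
def df(x): return 3 * np.cos(3 * x)

X = np.array([-1.0, 0.2, 0.9])
y = np.concatenate([f(X), df(X)])                       # stacked observations [f(X), f'(X)]

Kff, Kfd, Kdd = kernels(X, X)
K = np.block([[Kff, Kfd], [Kfd.T, Kdd]]) + 1e-8 * np.eye(2 * len(X))

Xs = np.linspace(-1.0, 0.9, 100)
Ksf, Ksd, _ = kernels(Xs, X)
mean = np.hstack([Ksf, Ksd]) @ np.linalg.solve(K, y)    # posterior mean of f given values + gradients
print(float(np.max(np.abs(mean - f(Xs)))))              # interpolation error inside the data range
```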
Riemannian approach to batch normalization
Title | Riemannian approach to batch normalization |
Authors | Minhyung Cho, Jaehyung Lee |
Abstract | Batch Normalization (BN) has proven to be an effective algorithm for deep neural network training by normalizing the input to each neuron and reducing the internal covariate shift. The space of weight vectors in the BN layer can be naturally interpreted as a Riemannian manifold, which is invariant to linear scaling of weights. Following the intrinsic geometry of this manifold provides a new learning rule that is more efficient and easier to analyze. We also propose intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the manifold. The resulting algorithm consistently outperforms the original BN on various types of network architectures and datasets. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09603v3 |
http://arxiv.org/pdf/1709.09603v3.pdf | |
PWC | https://paperswithcode.com/paper/riemannian-approach-to-batch-normalization |
Repo | https://github.com/MinhyungCho/riemannian-batch-normalization |
Framework | tf |
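A minimal sketch of the geometric idea: a weight vector feeding a BN layer is scale-invariant, so it can be optimized on the unit sphere by projecting the Euclidean gradient onto the tangent space and retracting back to the sphere. The simple step-and-renormalize retraction below is an illustrative stand-in for the paper's exact update, clipping, and regularization rules.

```python
import numpy as np

def riemannian_step(w, grad, lr=0.1):
    """One Riemannian gradient step on the unit sphere for a scale-invariant weight vector."""
    w = w / np.linalg.norm(w)
    tangent = grad - np.dot(grad, w) * w      # project the Euclidean gradient onto the tangent space
    w_new = w - lr * tangent                  # take the step ...
    return w_new / np.linalg.norm(w_new)      # ... and retract back onto the sphere

w = np.random.randn(8)
g = np.random.randn(8)
w = riemannian_step(w, g)
print(np.linalg.norm(w))   # stays 1: the update never touches the (irrelevant) scale of w
```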
Match-Tensor: a Deep Relevance Model for Search
Title | Match-Tensor: a Deep Relevance Model for Search |
Authors | Aaron Jaech, Hetunandan Kamisetty, Eric Ringger, Charlie Clarke |
Abstract | The application of Deep Neural Networks for ranking in search engines may obviate the need for the extensive feature engineering common to current learning-to-rank methods. However, we show that combining simple relevance matching features like BM25 with existing Deep Neural Net models often substantially improves the accuracy of these models, indicating that they do not capture essential local relevance matching signals. We describe a novel deep Recurrent Neural Net-based model that we call Match-Tensor. The architecture of the Match-Tensor model simultaneously accounts for both local relevance matching and global topicality signals allowing for a rich interplay between them when computing the relevance of a document to a query. On a large held-out test set consisting of social media documents, we demonstrate not only that Match-Tensor outperforms BM25 and other classes of DNNs but also that it largely subsumes signals present in these models. |
Tasks | Feature Engineering, Learning-To-Rank |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07795v1 |
http://arxiv.org/pdf/1701.07795v1.pdf | |
PWC | https://paperswithcode.com/paper/match-tensor-a-deep-relevance-model-for |
Repo | https://github.com/cspoh/IRDM2017 |
Framework | tf |
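A sketch of the match-tensor construction the model is named after: a |query| x |document| x channels tensor of elementwise products of token representations, plus an exact-match channel. The shapes follow the paper's high-level description; the bi-LSTM encoder producing the token states and the downstream 2-D convolutional scorer are omitted, and the sizes below are arbitrary.

```python
import torch

def match_tensor(q_states, d_states, q_ids, d_ids):
    # q_states: (Lq, C), d_states: (Ld, C) token representations from some encoder
    cross = torch.einsum('ic,jc->ijc', q_states, d_states)            # (Lq, Ld, C) channel-wise products
    exact = (q_ids[:, None] == d_ids[None, :]).float().unsqueeze(-1)  # (Lq, Ld, 1) exact-match signal
    return torch.cat([cross, exact], dim=-1)                          # (Lq, Ld, C + 1)

q = torch.randn(4, 16); d = torch.randn(30, 16)
q_ids = torch.randint(0, 1000, (4,)); d_ids = torch.randint(0, 1000, (30,))
print(match_tensor(q, d, q_ids, d_ids).shape)   # torch.Size([4, 30, 17])
# A small 2-D convolutional network over this tensor then scores query-document relevance.
```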
DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs)
Title | DeepIEP: a Peptide Sequence Model of Isoelectric Point (IEP/pI) using Recurrent Neural Networks (RNNs) |
Authors | Esben Jannik Bjerrum |
Abstract | The isoelectric point (IEP or pI) is the pH where the net charge on the molecular ensemble of peptides and proteins is zero. This physical-chemical property is dependent on protonable/deprotonable sidechains and their pKa values. Here a pI prediction model is trained from a database of peptide sequences and pIs using a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The trained model obtains an RMSE and R$^2$ of 0.28 and 0.95 for the external test set. The model is not based on pKa values, but predictions on constructed test sequences show similar rankings to already known pKa values. The prediction depends mostly on the existence of known acidic and basic amino acids, with fine adjustments based on the neighboring sequence and the position of the charged amino acids in the peptide chain. |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09553v1 |
http://arxiv.org/pdf/1712.09553v1.pdf | |
PWC | https://paperswithcode.com/paper/deepiep-a-peptide-sequence-model-of |
Repo | https://github.com/EBjerrum/DeepIEP |
Framework | tf |
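A minimal sequence-to-scalar regressor in the spirit of the abstract: embed amino-acid tokens, run an LSTM, and regress the pI from the final hidden state. The hyperparameters, tokenization, and class names below are illustrative assumptions, not those of the DeepIEP repository (which uses Keras).

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOK = {a: i for i, a in enumerate(AMINO_ACIDS)}

class PIRegressor(nn.Module):
    def __init__(self, emb=16, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(len(AMINO_ACIDS), emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seqs):                      # seqs: (batch, length) amino-acid token indices
        _, (h, _) = self.lstm(self.emb(seqs))
        return self.head(h[-1]).squeeze(-1)       # one predicted pI per sequence

model = PIRegressor()
batch = torch.tensor([[TOK[a] for a in "ACDKRH"], [TOK[a] for a in "GGEEDD"]])
print(model(batch))                               # two (untrained) pI predictions
```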
TensorFlow Distributions
Title | TensorFlow Distributions |
Authors | Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous |
Abstract | The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community. |
Tasks | Probabilistic Programming |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10604v1 |
http://arxiv.org/pdf/1711.10604v1.pdf | |
PWC | https://paperswithcode.com/paper/tensorflow-distributions |
Repo | https://github.com/nicola-decao/s-vae |
Framework | tf |
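A short usage sketch of the two core abstractions, Distributions and Bijectors. The code described in the paper originally shipped inside TensorFlow (tf.contrib.distributions); this sketch assumes the current packaging in tensorflow_probability, where the same API lives today.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

normal = tfd.Normal(loc=0., scale=1.)
x = normal.sample(5)                      # fast, reparameterized sampling
print(normal.log_prob(x))                 # numerically stable log density

# A Bijector is a composable, volume-tracking transformation; applying Exp to a
# standard normal yields a log-normal distribution with correct log densities.
log_normal = tfd.TransformedDistribution(distribution=normal, bijector=tfb.Exp())
print(log_normal.log_prob(tf.constant([0.5, 1.0, 2.0])))
```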
Long-term Forecasting using Higher Order Tensor RNNs
Title | Long-term Forecasting using Higher Order Tensor RNNs |
Authors | Rose Yu, Stephan Zheng, Anima Anandkumar, Yisong Yue |
Abstract | We present Higher-Order Tensor RNN (HOT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher-order moments and higher-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation guarantees and the variance bound for HOT-RNN for general sequence inputs. We also demonstrate 5% to 12% improvements for long-term prediction over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world time series data. |
Tasks | Time Series |
Published | 2017-10-31 |
URL | https://arxiv.org/abs/1711.00073v3 |
https://arxiv.org/pdf/1711.00073v3.pdf | |
PWC | https://paperswithcode.com/paper/long-term-forecasting-using-tensor-train-rnns |
Repo | https://github.com/yuqirose/tensor_train_RNN |
Framework | tf |
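A simplified sketch of a higher-order recurrent cell: the transition uses products of the last two hidden states in addition to the usual linear terms. The actual HOT-RNN factorizes the resulting weight tensor with a tensor-train decomposition and handles arbitrary lag orders; both are omitted here, so treat this as an assumed, reduced illustration.

```python
import torch
import torch.nn as nn

class SecondOrderCell(nn.Module):
    def __init__(self, n_in, n_hid, lags=2):
        super().__init__()
        self.w_in = nn.Linear(n_in, n_hid)
        self.w_lin = nn.ModuleList([nn.Linear(n_hid, n_hid, bias=False) for _ in range(lags)])
        self.w_bilin = nn.Bilinear(n_hid, n_hid, n_hid, bias=False)   # second-order interaction term

    def forward(self, x, hist):                 # hist: [h_{t-1}, h_{t-2}]
        h = self.w_in(x)
        for w, hp in zip(self.w_lin, hist):
            h = h + w(hp)                       # linear dependence on each lagged state
        h = h + self.w_bilin(hist[0], hist[1])  # bilinear h_{t-1} x h_{t-2} dependence
        return torch.tanh(h)

n_hid = 8
cell = SecondOrderCell(3, n_hid)
hist = [torch.zeros(1, n_hid), torch.zeros(1, n_hid)]
for t in range(10):                             # roll the cell over a random sequence
    h = cell(torch.randn(1, 3), hist)
    hist = [h, hist[0]]
print(h.shape)                                  # torch.Size([1, 8])
```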
Differentiable Learning of Logical Rules for Knowledge Base Reasoning
Title | Differentiable Learning of Logical Rules for Knowledge Base Reasoning |
Authors | Fan Yang, Zhilin Yang, William W. Cohen |
Abstract | We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08367v3 |
http://arxiv.org/pdf/1702.08367v3.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-learning-of-logical-rules-for |
Repo | https://github.com/fanyangxyz/Neural-LP |
Framework | tf |
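A sketch of the TensorLog-style operators that Neural LP composes: each relation is a {0,1} adjacency matrix over entities, and applying a length-2 rule body to a query entity is just two matrix-vector products. The learned controller that emits soft attention over relations at each step is replaced here by fixed, hand-picked weights; the tiny knowledge base is invented for illustration.

```python
import numpy as np

n_entities = 5
born_in = np.zeros((n_entities, n_entities)); born_in[0, 3] = 1    # person 0 born in city 3
city_of = np.zeros((n_entities, n_entities)); city_of[3, 4] = 1    # city 3 located in country 4
relations = [born_in, city_of]

def apply_rule(tail, attention):
    """attention[t][r]: soft weight on relation r at step t of the rule body."""
    v = np.zeros(n_entities); v[tail] = 1.0
    for step in attention:
        v = sum(a * (M.T @ v) for a, M in zip(step, relations))   # differentiable relation hop
    return v                                   # scores over candidate head entities

# Soft rule ~ nationality(X, Y) <- born_in(X, Z), city_of(Z, Y)
scores = apply_rule(tail=0, attention=[[0.9, 0.1], [0.1, 0.9]])
print(scores.argmax())                         # entity 4 (the country) gets the highest score
```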
Regularizing Face Verification Nets For Pain Intensity Regression
Title | Regularizing Face Verification Nets For Pain Intensity Regression |
Authors | Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille |
Abstract | Limited labeled data are available for the research of estimating facial expression intensities. For instance, the ability to train deep networks for automated pain assessment is limited by small datasets with labels of patient-reported pain intensities. Fortunately, fine-tuning from a data-extensive pre-trained domain, such as face verification, can alleviate this problem. In this paper, we propose a network that fine-tunes a state-of-the-art face verification network using a regularized regression loss and additional data with expression labels. In this way, the expression intensity regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed regularized deep regressor is applied to estimate the pain expression intensity and verified on the widely-used UNBC-McMaster Shoulder-Pain dataset, achieving the state-of-the-art performance. A weighted evaluation metric is also proposed to address the imbalance issue of different pain intensities. |
Tasks | Face Verification, Pain Intensity Regression |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06925v3 |
http://arxiv.org/pdf/1702.06925v3.pdf | |
PWC | https://paperswithcode.com/paper/regularizing-face-verification-nets-for-pain |
Repo | https://github.com/happynear/PainRegression |
Framework | none |
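A fine-tuning sketch in the spirit of the abstract: a regression head on top of a pre-trained face-verification backbone, with an L2 penalty keeping the fine-tuned backbone close to its pre-trained weights. This is a generic regularized-fine-tuning recipe assumed for illustration; the paper's actual network and its regularized regression loss differ, and the tensors below are dummies.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(96 * 96, 256), nn.ReLU())   # stand-in for a face net
head = nn.Linear(256, 1)                                                     # pain-intensity regressor
pretrained = {k: v.clone().detach() for k, v in backbone.named_parameters()}

opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-4)
mse, lam = nn.MSELoss(), 1e-3

faces = torch.randn(8, 1, 96, 96)   # dummy batch of face crops
pain = torch.rand(8)                # dummy pain-intensity labels

pred = head(backbone(faces)).squeeze(-1)
reg = sum(((p - pretrained[k]) ** 2).sum() for k, p in backbone.named_parameters())
loss = mse(pred, pain) + lam * reg  # regression loss + stay-close-to-pretrained regularizer
loss.backward()
opt.step()
```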
Efficient, sparse representation of manifold distance matrices for classical scaling
Title | Efficient, sparse representation of manifold distance matrices for classical scaling |
Authors | Javier S. Turek, Alexander Huth |
Abstract | Geodesic distance matrices can reveal shape properties that are largely invariant to non-rigid deformations, and thus are often used to analyze and represent 3-D shapes. However, these matrices grow quadratically with the number of points. Thus for large point sets it is common to use a low-rank approximation to the distance matrix, which fits in memory and can be efficiently analyzed using methods such as multidimensional scaling (MDS). In this paper we present a novel sparse method for efficiently representing geodesic distance matrices using biharmonic interpolation. This method exploits knowledge of the data manifold to learn a sparse interpolation operator that approximates distances using a subset of points. We show that our method is 2x faster and uses 20x less memory than current leading methods for solving MDS on large point sets, with similar quality. This enables analyses of large point sets that were previously infeasible. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10887v2 |
http://arxiv.org/pdf/1705.10887v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-sparse-representation-of-manifold |
Repo | https://github.com/alexhuth/BHA |
Framework | none |
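For context, the classical-scaling step the paper feeds its sparse representation into: double-center the squared distance matrix and take the top eigenpairs. The small example below still forms the dense matrix; the paper's contribution is a sparse biharmonic interpolation that approximates it from a subset of points, which is not shown here.

```python
import numpy as np

def classical_mds(D2, k=2):
    """Classical scaling: embed points from a squared distance matrix D2 into k dimensions."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]              # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

pts = np.random.randn(50, 3)
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
X = classical_mds(D2, k=3)
print(np.allclose(((X[:, None] - X[None, :]) ** 2).sum(-1), D2))  # True: distances recovered
```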
Positive-Unlabeled Learning with Non-Negative Risk Estimator
Title | Positive-Unlabeled Learning with Non-Negative Risk Estimator |
Authors | Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama |
Abstract | From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00593v2 |
http://arxiv.org/pdf/1703.00593v2.pdf | |
PWC | https://paperswithcode.com/paper/positive-unlabeled-learning-with-non-negative |
Repo | https://github.com/kiryor/nnPUlearning |
Framework | none |
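A sketch of the non-negative PU risk estimator itself: the unlabeled-negative part of the risk is clamped at zero rather than being allowed to go negative. The sigmoid surrogate loss, the class prior `pi`, and the dummy scores are assumptions for illustration, and the paper's large-negative-risk training heuristic is omitted.

```python
import torch

def nnpu_risk(scores_p, scores_u, pi=0.3):
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    loss = lambda z: torch.sigmoid(-z)             # sigmoid surrogate loss for the positive class
    r_p_pos = loss(scores_p).mean()                # positives classified as positive
    r_p_neg = loss(-scores_p).mean()               # positives classified as negative
    r_u_neg = loss(-scores_u).mean()               # unlabeled classified as negative
    neg_part = r_u_neg - pi * r_p_neg
    return pi * r_p_pos + torch.clamp(neg_part, min=0.0)   # the non-negative correction

scores_p = torch.randn(64, requires_grad=True)     # classifier outputs on positive data
scores_u = torch.randn(256, requires_grad=True)    # classifier outputs on unlabeled data
nnpu_risk(scores_p, scores_u).backward()           # differentiable, so any flexible model can be trained
```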
Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior
Title | Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior |
Authors | Peng Liu, Ruogu Fang |
Abstract | We explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn similar pixel-distribution features from noisy images. Many types of image noise follow a certain pixel-distribution in common, such as additive white Gaussian noise (AWGN). By increasing the CNN's width with larger receptive fields and more channels in each layer, CNNs can extract more accurate pixel-distribution features. The key to our approach is a discovery that wider CNNs with more convolutions tend to learn similar pixel-distribution features, which suggests a new strategy for solving low-level vision problems effectively: the inference mapping primarily relies on the priors behind the noise property rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on AWGN and demonstrate that by learning pixel-distribution features from images, the WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. \textit{Code and models are available at \url{https://github.com/cswin/WIN}}. |
Tasks | Denoising, Image Denoising |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.05414v5 |
http://arxiv.org/pdf/1707.05414v5.pdf | |
PWC | https://paperswithcode.com/paper/wide-inference-network-for-image-denoising |
Repo | https://github.com/shibuiwilliam/DeepLearningDenoise |
Framework | none |
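A sketch of a wide-and-shallow denoising CNN in the spirit of WIN: few layers, large kernels, many channels, predicting the clean image directly. The layer count and sizes here are illustrative assumptions, not the exact WIN configuration from the repository.

```python
import torch
import torch.nn as nn

class WideDenoiser(nn.Module):
    def __init__(self, channels=128, kernel=7):
        super().__init__()
        pad = kernel // 2
        self.net = nn.Sequential(            # wide (many channels, large kernels) but shallow
            nn.Conv2d(1, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel, padding=pad), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel, padding=pad),
        )

    def forward(self, noisy):
        return self.net(noisy)               # directly predicts the clean image

noisy = torch.randn(1, 1, 64, 64)            # an AWGN-corrupted grayscale patch
print(WideDenoiser()(noisy).shape)           # torch.Size([1, 1, 64, 64])
```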
Convolutional Dictionary Learning: Acceleration and Convergence
Title | Convolutional Dictionary Learning: Acceleration and Convergence |
Authors | Il Yong Chun, Jeffrey A. Fessler |
Abstract | Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in a single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach. |
Tasks | Denoising, Dictionary Learning, Image Denoising |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00389v2 |
http://arxiv.org/pdf/1707.00389v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-dictionary-learning |
Repo | https://github.com/mechatoz/convolt |
Framework | none |
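A simplified building block for context: one proximal-gradient (ISTA-style) solver for the convolutional sparse-coding subproblem that CDL alternates with dictionary updates. The scalar step size used here stands in for the paper's designed majorization matrices, and the block updates, momentum, and restarting of BPG-M are all omitted; the signal and filters are invented for the demo.

```python
import numpy as np

def soft(z, t):                                   # proximal operator of the l1 norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def csc_ista(x, filters, lam=0.05, iters=100):
    """min_z 0.5*||x - sum_k d_k * z_k||^2 + lam*||z||_1 via proximal gradient steps."""
    K, n = len(filters), len(x)
    z = np.zeros((K, n))
    L = sum(np.sum(np.abs(np.convolve(d, d[::-1]))) for d in filters)   # crude Lipschitz bound
    for _ in range(iters):
        recon = sum(np.convolve(z[k], filters[k], mode='same') for k in range(K))
        for k in range(K):
            grad = np.convolve(recon - x, filters[k][::-1], mode='same')  # correlate residual with d_k
            z[k] = soft(z[k] - grad / L, lam / L)
    return z

x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.05 * np.random.randn(256)
filters = [np.hanning(11), np.diff(np.hanning(12))]
codes = csc_ista(x, filters)
print([int((np.abs(c) > 1e-3).sum()) for c in codes])   # sparsity of each coefficient map
```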
Recurrent Inference Machines for Solving Inverse Problems
Title | Recurrent Inference Machines for Solving Inverse Problems |
Authors | Patrick Putzky, Max Welling |
Abstract | Much of the recent research on solving iterative inference problems focuses on moving away from hand-chosen inference algorithms and towards learned inference. In the latter, the inference process is unrolled in time and interpreted as a recurrent neural network (RNN) which allows for joint learning of model and inference parameters with back-propagation through time. In this framework, the RNN architecture is directly derived from a hand-chosen inference algorithm, effectively limiting its capabilities. We propose a learning framework, called Recurrent Inference Machines (RIM), in which we turn algorithm construction the other way round: Given data and a task, train an RNN to learn an inference algorithm. Because RNNs are Turing complete [1, 2] they are capable of implementing any inference algorithm. The framework allows for an abstraction which removes the need for domain knowledge. We demonstrate in several image restoration experiments that this abstraction is effective, allowing us to achieve state-of-the-art performance on image denoising and super-resolution tasks and superior across-task generalization. |
Tasks | Denoising, Image Denoising, Image Restoration, Super-Resolution |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04008v1 |
http://arxiv.org/pdf/1706.04008v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-inference-machines-for-solving |
Repo | https://github.com/pputzky/invertible_rim |
Framework | pytorch |
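A sketch of a recurrent inference machine update loop for a toy denoising problem y = x + noise: at each step the RNN cell receives the current estimate and the gradient of the data log-likelihood and outputs an additive refinement. The GRU-based cell, the sizes, and the Gaussian likelihood are assumptions for the sketch, not the architecture from the paper or repository.

```python
import torch
import torch.nn as nn

class RIMCell(nn.Module):
    def __init__(self, n, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(2 * n, hidden)
        self.out = nn.Linear(hidden, n)

    def forward(self, x, grad, h):
        h = self.cell(torch.cat([x, grad], dim=-1), h)
        return x + self.out(h), h                    # additive refinement of the estimate

n, sigma = 32, 0.1
rim = RIMCell(n)
x_true = torch.randn(1, n)
y = x_true + sigma * torch.randn(1, n)               # noisy observation

x, h = torch.zeros(1, n), torch.zeros(1, 64)
for _ in range(8):                                   # unrolled inference steps
    grad = (y - x) / sigma**2                        # gradient of log p(y | x) for Gaussian noise
    x, h = rim(x, grad, h)
# Training would backpropagate a loss such as ||x - x_true||^2 through all unrolled steps.
```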