July 27, 2019


Paper Group ANR 627

Learning to Warm-Start Bayesian Hyperparameter Optimization. Deep Relaxation: partial differential equations for optimizing deep neural networks. Phase Transitions in the Pooled Data Problem. Bayesian Inference of Log Determinants. Refining Source Representations with Relation Networks for Neural Machine Translation. Exact Blur Measure Outperforms …

Learning to Warm-Start Bayesian Hyperparameter Optimization

Title Learning to Warm-Start Bayesian Hyperparameter Optimization
Authors Jungtaek Kim, Saehoon Kim, Seungjin Choi
Abstract Hyperparameter optimization aims to find the hyperparameter configuration of a machine learning model that yields the best performance on a validation dataset. Manual search tends to get stuck at a locally optimal configuration and depends heavily on human intuition and experience. A simple alternative to manual search is random/grid search over a space of hyperparameters, which still requires extensive evaluations of validation errors to find the best configuration. Bayesian optimization, a global optimization method for black-box functions, is now popular for hyperparameter optimization because it greatly reduces the number of validation error evaluations required compared to random/grid search. Bayesian optimization generally finds the best hyperparameter configuration from random initialization without any prior knowledge. This motivates us to let Bayesian optimization start from configurations that were successful on similar datasets, which can markedly reduce the number of evaluations. In this paper, we propose deep metric learning to learn meta-features over datasets such that the similarity between datasets is effectively measured by the Euclidean distance between their associated meta-features. To this end, we introduce a Siamese network composed of a deep feature extractor and a meta-feature extractor, where the deep feature extractor provides a semantic representation of each instance in a dataset and the meta-feature extractor aggregates a set of deep features into a single representation of the dataset. The learned meta-features are then used to select a few datasets similar to the new dataset, so that hyperparameters successful on those similar datasets serve as initializations to warm-start Bayesian hyperparameter optimization.
Tasks Hyperparameter Optimization, Metric Learning
Published 2017-10-17
URL http://arxiv.org/abs/1710.06219v3
PDF http://arxiv.org/pdf/1710.06219v3.pdf
PWC https://paperswithcode.com/paper/learning-to-warm-start-bayesian
Repo
Framework
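
Once the meta-features are learned, the warm-start step described in the abstract reduces to a nearest-neighbor lookup over previously seen datasets. Below is a minimal sketch, assuming the Siamese extractor is already trained; `meta_features`, `best_configs`, and `warm_start_configs` are illustrative names, not the authors' code.

```python
# Hypothetical sketch: warm-starting Bayesian optimization from similar datasets.
import numpy as np

def warm_start_configs(new_meta, meta_features, best_configs, k=3):
    """Pick the best-known hyperparameters of the k datasets whose learned
    meta-features are closest (in Euclidean distance) to the new dataset's."""
    dists = np.linalg.norm(meta_features - new_meta, axis=1)
    nearest = np.argsort(dists)[:k]
    # These configurations seed (warm-start) the BO run instead of random points.
    return [best_configs[i] for i in nearest]

# Toy usage: 5 previous datasets, 4-dim meta-features, 2 hyperparameters each.
rng = np.random.default_rng(0)
meta_features = rng.normal(size=(5, 4))
best_configs = rng.uniform(size=(5, 2))
inits = warm_start_configs(rng.normal(size=4), meta_features, best_configs)
print(inits)
```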

Deep Relaxation: partial differential equations for optimizing deep neural networks

Title Deep Relaxation: partial differential equations for optimizing deep neural networks
Authors Pratik Chaudhari, Adam Oberman, Stanley Osher, Stefano Soatto, Guillaume Carlier
Abstract In this paper we establish a connection between non-convex optimization methods for training deep neural networks and nonlinear partial differential equations (PDEs). Relaxation techniques arising in statistical physics, which have already been used successfully in this context, are reinterpreted as solutions of a viscous Hamilton-Jacobi PDE. A stochastic control interpretation allows us to prove that the modified algorithm performs better in expectation than stochastic gradient descent. Well-known PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. The PDE is derived from a stochastic homogenization problem, which arises in the implementation of the algorithm. The algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks.
Tasks
Published 2017-04-17
URL http://arxiv.org/abs/1704.04932v2
PDF http://arxiv.org/pdf/1704.04932v2.pdf
PWC https://paperswithcode.com/paper/deep-relaxation-partial-differential
Repo
Framework
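
On the algorithmic side, the relaxation can be sketched as outer descent on the smoothed (local-entropy) landscape, with the smoothed gradient estimated by an inner SGLD loop, in the spirit of the authors' related Entropy-SGD algorithm. The toy objective, noise scale, and step sizes below are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch of local-entropy smoothing: the inner SGLD loop samples
# x' ~ exp(-f(x') - gamma/2 ||x' - x||^2), and gamma * (x - <x'>) estimates
# the gradient of the smoothed loss.
import numpy as np

def grad_f(x):                         # toy non-convex loss: sum(cos x) + 0.05||x||^2
    return -np.sin(x) + 0.1 * x

def smoothed_grad(x, gamma=1.0, eta=0.05, n_inner=20, rng=None):
    rng = rng or np.random.default_rng(0)
    xp, mean = x.copy(), np.zeros_like(x)
    for _ in range(n_inner):
        g = grad_f(xp) + gamma * (xp - x)                 # inner SGLD gradient
        xp = xp - eta * g + np.sqrt(2 * eta) * 0.01 * rng.normal(size=x.shape)
        mean += xp / n_inner
    return gamma * (x - mean)          # estimate of the smoothed landscape's gradient

x = np.full(3, 2.5)
for t in range(200):                   # outer descent on the relaxed landscape
    x -= 0.1 * smoothed_grad(x, rng=np.random.default_rng(t))
print(x)
```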

Phase Transitions in the Pooled Data Problem

Title Phase Transitions in the Pooled Data Problem
Authors Jonathan Scarlett, Volkan Cevher
Abstract In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels.
Tasks
Published 2017-10-18
URL http://arxiv.org/abs/1710.06766v1
PDF http://arxiv.org/pdf/1710.06766v1.pdf
PWC https://paperswithcode.com/paper/phase-transitions-in-the-pooled-data-problem
Repo
Framework
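
For concreteness, here is a small simulator of the noiseless observation model described above; all names and sizes are illustrative assumptions.

```python
# Pooled data setup: n items with labels in {0, ..., d-1}; each test pools a
# random subset of items and reveals the count of each label within the pool.
import numpy as np

def pooled_tests(labels, n_tests, pool_size, rng):
    n, d = len(labels), labels.max() + 1
    pools, counts = [], []
    for _ in range(n_tests):
        pool = rng.choice(n, size=pool_size, replace=False)
        pools.append(pool)
        counts.append(np.bincount(labels[pool], minlength=d))  # label histogram
    return pools, np.array(counts)

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=100)                 # ground-truth labels
pools, counts = pooled_tests(labels, n_tests=40, pool_size=10, rng=rng)
print(counts[:3])                                     # each row sums to pool_size
```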

Bayesian Inference of Log Determinants

Title Bayesian Inference of Log Determinants
Authors Jack Fitzsimons, Kurt Cutajar, Michael Osborne, Stephen Roberts, Maurizio Filippone
Abstract The log-determinant of a kernel matrix appears in a variety of machine learning problems, ranging from determinantal point processes and generalized Markov random fields, through to the training of Gaussian processes. Exact calculation of this term is often intractable when the size of the kernel matrix exceeds a few thousand. In the spirit of probabilistic numerics, we reinterpret the problem of computing the log-determinant as a Bayesian inference problem. In particular, we combine prior knowledge in the form of bounds from matrix theory and evidence derived from stochastic trace estimation to obtain probabilistic estimates for the log-determinant and its associated uncertainty within a given computational budget. Beyond its novelty and theoretic appeal, the performance of our proposal is competitive with state-of-the-art approaches to approximating the log-determinant, while also quantifying the uncertainty due to budget-constrained evidence.
Tasks Bayesian Inference, Gaussian Processes, Point Processes
Published 2017-04-05
URL http://arxiv.org/abs/1704.01445v1
PDF http://arxiv.org/pdf/1704.01445v1.pdf
PWC https://paperswithcode.com/paper/bayesian-inference-of-log-determinants
Repo
Framework
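
The "evidence" mentioned above comes from stochastic trace estimation, which in its simplest Hutchinson form looks like the sketch below. For clarity the sketch computes the matrix logarithm exactly, whereas practical estimators (and the paper) use cheap approximations to log(A)-vector products; this is an illustration, not the authors' implementation.

```python
# Hutchinson estimator: log det(A) = tr(log A) ~ mean of z^T log(A) z
# over Rademacher probe vectors z.
import numpy as np
from scipy.linalg import logm

def hutchinson_logdet(A, n_probes=50, rng=None):
    rng = rng or np.random.default_rng(0)
    L = logm(A).real                   # exact here; stand-in for cheap log(A)-vector products
    zs = rng.choice([-1.0, 1.0], size=(n_probes, A.shape[0]))
    return float(np.mean([z @ L @ z for z in zs]))

A = np.diag(np.linspace(1.0, 5.0, 200)) + 0.1          # small SPD test matrix
print(hutchinson_logdet(A), np.linalg.slogdet(A)[1])   # estimate vs. exact value
```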

Refining Source Representations with Relation Networks for Neural Machine Translation

Title Refining Source Representations with Relation Networks for Neural Machine Translation
Authors Wen Zhang, Jiawei Hu, Yang Feng, Qun Liu
Abstract Although neural machine translation (NMT) with the encoder-decoder framework has achieved great success in recent times, it still suffers from some drawbacks: RNNs tend to forget old information that is often useful, and the encoder operates on words without considering the relationships between them. To address these problems, we introduce relation networks (RNs) into NMT to refine the encoded representations of the source. In our method, the RN first augments the representation of each source word with its neighbors, then reasons over all possible pairwise relations between them. The source representations and all the relations are then fed to the attention module and the decoder together, keeping the main encoder-decoder architecture unchanged. Experiments on two Chinese-to-English datasets of different scales show that our method significantly outperforms competitive baselines.
Tasks Machine Translation
Published 2017-09-12
URL http://arxiv.org/abs/1709.03980v3
PDF http://arxiv.org/pdf/1709.03980v3.pdf
PWC https://paperswithcode.com/paper/refining-source-representations-with-relation
Repo
Framework
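
A hedged reconstruction of the core idea, a pairwise relation module that refines encoder states, is sketched below in PyTorch. The module name and layer sizes are our assumptions; the paper's full model also uses CNNs over neighbors and feeds relations into the attention module.

```python
# Each source state is paired with every other state, a small MLP g scores the
# pair, and the pooled relations are added back to the word's representation.
import torch
import torch.nn as nn

class RelationRefiner(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, h):                        # h: (T, d) encoder states
        T, d = h.shape
        pairs = torch.cat([h.unsqueeze(1).expand(T, T, d),
                           h.unsqueeze(0).expand(T, T, d)], dim=-1)
        rel = self.g(pairs).mean(dim=1)          # pool relations over partners
        return h + rel                           # refined source representations

h = torch.randn(7, 32)                           # 7 source words, 32-dim states
print(RelationRefiner(32)(h).shape)              # torch.Size([7, 32])
```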

Exact Blur Measure Outperforms Conventional Learned Features for Depth Finding

Title Exact Blur Measure Outperforms Conventional Learned Features for Depth Finding
Authors Akbar Saadat
Abstract Image analysis methods based on exact blur values face computational complexities due to blur measurement error. This situation has encouraged researchers to look for handcrafted and learned features for estimating depth from a single image. This paper introduces a novel exact realization of blur measures on digital images and implements it as a new measure of defocus Gaussian blur at edge points in Depth From Defocus (DFD) methods, with the potential to change this situation. Experiments on real images indicate the superiority of the proposed measure in error performance over conventional learned features in state-of-the-art single-image depth estimation methods.
Tasks Depth Estimation
Published 2017-08-31
URL http://arxiv.org/abs/1709.00072v1
PDF http://arxiv.org/pdf/1709.00072v1.pdf
PWC https://paperswithcode.com/paper/exact-blur-measure-outperforms-conventional
Repo
Framework

Comparison Based Nearest Neighbor Search

Title Comparison Based Nearest Neighbor Search
Authors Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg
Abstract We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two (comparison tree). We prove that if the metric space satisfies certain expansion conditions, then with high probability the height of the comparison tree is logarithmic in the number of points, leading to efficient search performance. We also provide an upper bound for the failure probability to return the true nearest neighbor. Experiments show that the comparison tree is competitive with algorithms that have access to the actual distance values, and needs less triplet comparisons than other competitors.
Tasks
Published 2017-04-05
URL http://arxiv.org/abs/1704.01460v1
PDF http://arxiv.org/pdf/1704.01460v1.pdf
PWC https://paperswithcode.com/paper/comparison-based-nearest-neighbor-search
Repo
Framework
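
A minimal version of the comparison tree, assuming only a triplet oracle closer(x, a, b) that answers whether d(x, a) < d(x, b); the function names and the leaf search are illustrative.

```python
# Build: pick two random pivots, send every point to the side of the closer
# pivot, and recurse. Query: descend by comparisons, then scan the leaf.
import random

def build(points, oracle, leaf_size=4, rng=random.Random(0)):
    if len(points) <= leaf_size:
        return points                            # leaf: small candidate set
    p, q = rng.sample(points, 2)                 # two random pivots
    left = [x for x in points if oracle(x, p, q)]
    right = [x for x in points if not oracle(x, p, q)]
    if not left or not right:                    # degenerate split: stop here
        return points
    return (p, q, build(left, oracle, leaf_size, rng),
            build(right, oracle, leaf_size, rng))

def nearest_in(cands, x, oracle):
    best = cands[0]
    for y in cands[1:]:
        if oracle(x, y, best):                   # y is closer to x than best
            best = y
    return best

def query(tree, x, oracle):
    while isinstance(tree, tuple):
        p, q, left, right = tree
        tree = left if oracle(x, p, q) else right
    return nearest_in(tree, x, oracle)           # approximate nearest neighbor

# Toy usage on the real line; the oracle secretly knows the distances.
pts = [float(i) for i in range(64)]
oracle = lambda x, a, b: abs(x - a) < abs(x - b)
print(query(build(pts, oracle), 13.3, oracle))   # likely 13.0
```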

Shape recognition of volcanic ash by simple convolutional neural network

Title Shape recognition of volcanic ash by simple convolutional neural network
Authors Daigo Shoji, Rina Noguchi
Abstract Shape analyses of tephra grains help us understand the eruption mechanisms of volcanoes. However, precise discrimination of tephra grains requires defining and selecting a parameter set, such as convexity, and selecting the best parameter set for recognizing tephra shapes is complicated; indeed, many shape parameters have been suggested. Recently, neural networks have achieved great success in the field of machine learning. A convolutional neural network can recognize the shapes in images without human bias or hand-picked shape parameters. We applied a simple convolutional neural network, developed for handwritten digits, to the recognition of tephra shapes. The network was trained on Morphologi tephra images, and it recognizes tephra shapes with approximately 90% accuracy.
Tasks
Published 2017-06-22
URL http://arxiv.org/abs/1706.07178v1
PDF http://arxiv.org/pdf/1706.07178v1.pdf
PWC https://paperswithcode.com/paper/shape-recognition-of-volcanic-ash-by-simple
Repo
Framework
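
A LeNet-style network of the kind borrowed from handwritten-digit recognition might look like the sketch below; the input size, channel counts, and number of shape classes are our assumptions, not the authors' exact architecture.

```python
# Simple convolutional net for 1-channel grain images, PyTorch sketch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 12x12
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),  # 12x12 -> 4x4
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
    nn.Linear(64, 2),                                 # two tephra shape classes (assumed)
)
print(model(torch.randn(8, 1, 28, 28)).shape)         # torch.Size([8, 2])
```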

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Title Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
Authors Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky
Abstract Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. As in the asymptotic setting, our analysis relates the discrete-time SGLD Markov chain to a continuous-time diffusion process. A new tool that drives the results is the use of weighted transportation cost inequalities to quantify the rate of convergence of SGLD to a stationary distribution in the Euclidean $2$-Wasserstein distance.
Tasks
Published 2017-02-13
URL http://arxiv.org/abs/1702.03849v3
PDF http://arxiv.org/pdf/1702.03849v3.pdf
PWC https://paperswithcode.com/paper/non-convex-learning-via-stochastic-gradient
Repo
Framework
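
The SGLD update itself is one line: a gradient step plus isotropic Gaussian noise scaled by the inverse temperature. A toy sketch on a double-well objective, with illustrative step size and temperature:

```python
# SGLD: x <- x - eta * grad f(x) + sqrt(2 * eta / beta) * N(0, I).
import numpy as np

def sgld(grad, x0, eta=1e-2, beta=10.0, n_steps=5000, rng=None):
    rng = rng or np.random.default_rng(0)
    x = x0.astype(float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - eta * grad(x) + np.sqrt(2 * eta / beta) * noise
    return x

grad = lambda x: 4 * x * (x**2 - 1)        # f(x) = (x^2 - 1)^2, minima at +/-1
print(sgld(grad, np.array([2.0])))         # the noise lets iterates escape local minima
```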

An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

Title An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
Authors Yevgeny Seldin, Gábor Lugosi
Abstract We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.
Tasks
Published 2017-02-20
URL http://arxiv.org/abs/1702.06103v2
PDF http://arxiv.org/pdf/1702.06103v2.pdf
PWC https://paperswithcode.com/paper/an-improved-parametrization-and-analysis-of
Repo
Framework
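
A heavily hedged skeleton of an EXP3++-style algorithm: exponential weights over importance-weighted loss estimates, plus per-arm exploration bonuses driven by empirical gap estimates. The precise learning-rate and exploration schedules in the paper differ; the constants below are illustrative only.

```python
# Sketch of the EXP3++ structure (gap-driven exploration on top of EXP3).
import numpy as np

def exp3pp(loss_fn, K, T, beta=16.0, rng=None):
    rng = rng or np.random.default_rng(0)
    L = np.zeros(K)                                   # cumulative loss estimates
    for t in range(1, T + 1):
        eta = 0.5 * np.sqrt(np.log(K) / (t * K))
        gaps = np.minimum(1.0, (L - L.min()) / t)     # empirical gap estimates
        xi = beta * np.log(t + 1) ** 2 / (t * np.maximum(gaps, 1e-9) ** 2)
        eps = np.minimum.reduce([np.full(K, 1 / (2 * K)), np.full(K, eta), xi])
        w = np.exp(-eta * (L - L.min()))              # shift for numerical stability
        p = (1 - eps.sum()) * w / w.sum() + eps
        a = rng.choice(K, p=p / p.sum())
        L[a] += loss_fn(a, t) / p[a]                  # importance-weighted update
    return L.argmin()                                 # empirically best arm

# Stochastic toy instance: arm 0 has the lowest mean loss.
means = np.array([0.3, 0.5, 0.6])
loss = lambda a, t: float(np.random.default_rng(t * 7 + a).uniform() < means[a])
print(exp3pp(loss, K=3, T=5000))
```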

Spectral Mixture Kernels for Multi-Output Gaussian Processes

Title Spectral Mixture Kernels for Multi-Output Gaussian Processes
Authors Gabriel Parra, Felipe Tobar
Abstract Early approaches to multiple-output Gaussian processes (MOGPs) relied on linear combinations of independent, latent, single-output Gaussian processes (GPs). This resulted in cross-covariance functions with limited parametric interpretation, thus conflicting with the ability of single-output GPs to interpret lengthscales, frequencies and magnitudes, to name a few. In contrast, current approaches to MOGPs are able to better interpret the relationship between different channels by directly modelling the cross-covariances as a spectral mixture kernel with a phase shift. We extend this rationale and propose a parametric family of complex-valued cross-spectral densities, then build on Cramér's Theorem (the multivariate version of Bochner's Theorem) to provide a principled approach to designing multivariate covariance functions. The kernels so constructed can model delays among channels in addition to phase differences, and are thus more expressive than previous methods, while also providing a full parametric interpretation of the relationship across channels. The proposed method is first validated on synthetic data and then compared to existing MOGP methods on two real-world examples.
Tasks Gaussian Processes
Published 2017-09-05
URL http://arxiv.org/abs/1709.01298v2
PDF http://arxiv.org/pdf/1709.01298v2.pdf
PWC https://paperswithcode.com/paper/spectral-mixture-kernels-for-multi-output
Repo
Framework
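
An illustrative single-component cross-kernel in one input dimension, showing how magnitude, frequency, lengthscale, delay, and phase enter the cross-covariance between two channels. How the paper derives the cross-channel parameters (e.g. alpha_ij from per-channel quantities) is omitted here; this is a sketch of the functional form, not the authors' kernel.

```python
# Cross-covariance between channels i and j with delay theta and phase phi.
import numpy as np

def cross_sm_kernel(tau, alpha, mu, sigma, theta, phi):
    """k_ij(tau): Gaussian envelope times a shifted cosine (one spectral component)."""
    s = tau + theta
    return alpha * np.exp(-0.5 * sigma**2 * s**2) * np.cos(mu * s + phi)

tau = np.linspace(-5, 5, 9)
print(cross_sm_kernel(tau, alpha=1.0, mu=2.0, sigma=0.7, theta=0.5, phi=0.3))
```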

Landmark Diffusion Maps (L-dMaps): Accelerated manifold learning out-of-sample extension

Title Landmark Diffusion Maps (L-dMaps): Accelerated manifold learning out-of-sample extension
Authors Andrew W. Long, Andrew L. Ferguson
Abstract Diffusion maps are a nonlinear manifold learning technique based on harmonic analysis of a diffusion process over the data. Out-of-sample extensions with computational complexity $\mathcal{O}(N)$, where $N$ is the number of points comprising the manifold, frustrate online learning applications requiring rapid embedding of high-dimensional data streams. We propose landmark diffusion maps (L-dMaps) to reduce the complexity to $\mathcal{O}(M)$, where $M \ll N$ is the number of landmark points selected using pruned spanning trees or k-medoids. Offering $(N/M)$ speedups in out-of-sample extension, L-dMaps enables the application of diffusion maps to high-volume and/or high-velocity streaming data. We illustrate our approach on three datasets: the Swiss roll, molecular simulations of a C$_{24}$H$_{50}$ polymer chain, and biomolecular simulations of alanine dipeptide. We demonstrate up to 50-fold speedups in out-of-sample extension for the molecular systems, with less than 4% error in manifold reconstruction fidelity relative to calculations over the full dataset.
Tasks
Published 2017-06-28
URL http://arxiv.org/abs/1706.09396v1
PDF http://arxiv.org/pdf/1706.09396v1.pdf
PWC https://paperswithcode.com/paper/landmark-diffusion-maps-l-dmaps-accelerated
Repo
Framework
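
The out-of-sample step that gives the O(M) cost can be sketched as a Nyström-style extension: a new point is embedded using only its kernel affinities to the M landmarks and their precomputed diffusion coordinates. Normalization details of full diffusion maps are simplified below, and all names are illustrative.

```python
# O(M) out-of-sample embedding from landmark affinities (Nystrom-style).
import numpy as np

def oos_embed(x_new, landmarks, lm_embedding, eigvals, eps=1.0):
    k = np.exp(-np.sum((landmarks - x_new) ** 2, axis=1) / eps)  # affinities
    k /= k.sum()                                                 # row-normalize
    return (k @ lm_embedding) / eigvals       # divide by diffusion eigenvalues

rng = np.random.default_rng(0)
landmarks = rng.normal(size=(20, 3))          # M = 20 landmark points
lm_embedding = rng.normal(size=(20, 2))       # their precomputed 2-D diffusion coords
print(oos_embed(rng.normal(size=3), landmarks, lm_embedding, np.array([0.9, 0.7])))
```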

Inverse Ising problem in continuous time: A latent variable approach

Title Inverse Ising problem in continuous time: A latent variable approach
Authors Christian Donner, Manfred Opper
Abstract We consider the inverse Ising problem, i.e. the inference of network couplings from observed spin trajectories, for a model with continuous-time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form that allows for simple iterative inference algorithms with analytical updates. The variables are: (1) Poisson variables to linearise an exponential term which is typical for point process likelihoods, and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data simulated from a more biologically plausible network with spiking neurons, we show that the Ising model captures well the low-order statistics of the data, and we show how the Ising couplings are related to the underlying synaptic structure of the simulated network.
Tasks Bayesian Inference
Published 2017-09-04
URL http://arxiv.org/abs/1709.04495v3
PDF http://arxiv.org/pdf/1709.04495v3.pdf
PWC https://paperswithcode.com/paper/inverse-ising-problem-in-continuous-time-a
Repo
Framework
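
The forward model behind this inference problem, Glauber dynamics of an Ising network, is easy to simulate for testing. The sketch below uses a simple discrete-time heat-bath sweep as a stand-in for the paper's continuous-time dynamics; couplings and field strengths are illustrative.

```python
# Heat-bath (Glauber) sweeps for an Ising model with couplings J and field h.
import numpy as np

def glauber_sweeps(J, h, n_sweeps, rng):
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    traj = []
    for _ in range(n_sweeps):
        for i in rng.permutation(n):
            field = J[i] @ s + h[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(s_i = +1 | rest)
            s[i] = 1 if rng.uniform() < p_up else -1
        traj.append(s.copy())
    return np.array(traj)

rng = np.random.default_rng(0)
n = 10
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2                              # symmetric couplings
np.fill_diagonal(J, 0)                         # no self-coupling
traj = glauber_sweeps(J, rng.normal(scale=0.1, size=n), 200, rng)
print(traj.mean(axis=0))                       # empirical magnetizations
```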

Logistic Regression as Soft Perceptron Learning

Title Logistic Regression as Soft Perceptron Learning
Authors Raul Rojas
Abstract We comment on the fact that gradient ascent for logistic regression has a connection with the perceptron learning algorithm. Logistic learning is the “soft” variant of perceptron learning.
Tasks
Published 2017-08-24
URL http://arxiv.org/abs/1708.07826v1
PDF http://arxiv.org/pdf/1708.07826v1.pdf
PWC https://paperswithcode.com/paper/logistic-regression-as-soft-perceptron
Repo
Framework
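
The connection is easiest to see in the updates themselves: for labels y in {0, 1}, gradient ascent on the logistic log-likelihood is exactly the perceptron rule with the hard threshold replaced by a sigmoid.

```python
# Logistic gradient ascent vs. perceptron learning, side by side.
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def logistic_update(w, x, y, lr=0.1):
    return w + lr * (y - sigmoid(w @ x)) * x     # "soft" error times input

def perceptron_update(w, x, y, lr=0.1):
    return w + lr * (y - float(w @ x > 0)) * x   # hard-threshold error times input

w, x = np.zeros(3), np.array([1.0, -2.0, 0.5])
print(logistic_update(w, x, 1), perceptron_update(w, x, 1))
```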

Hierarchical semantic segmentation using modular convolutional neural networks

Title Hierarchical semantic segmentation using modular convolutional neural networks
Authors Sagi Eppel
Abstract Image recognition tasks that involve identifying parts of an object or the contents of a vessel can be viewed as a hierarchical problem, which can be solved by initial recognition of the main object followed by recognition of its parts or contents. To achieve such modular recognition, it is necessary to use the output of one recognition method (which identifies the general object) as the input for a second method (which identifies the parts or contents). In recent years, convolutional neural networks have emerged as the dominant method for segmentation and classification of images. This work examines a method for serially connecting convolutional neural networks for semantic segmentation of materials inside transparent vessels. It applies one fully convolutional neural net to segment the image into vessel and background, and the vessel region is used as an input for a second net which recognizes the contents of the glass vessel. Transferring the segmentation map generated by the first net to the second net is performed using the valve filter attention method, which applies different filters to different segments of the image. This modular semantic segmentation method outperforms a single-step method in which both the vessel and its contents are identified using a single net. An advantage of the modular neural net is that it allows networks to be built from existing trained modules, as well as the transfer and reuse of trained net modules without the need for any retraining of the assembled net.
Tasks Semantic Segmentation
Published 2017-10-14
URL http://arxiv.org/abs/1710.05126v1
PDF http://arxiv.org/pdf/1710.05126v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-semantic-segmentation-using
Repo
Framework
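
A hedged sketch of the two-stage pipeline: the first net segments vessel versus background, and its mask gates the input to the second net, which labels the contents. Plain input masking below is a simplification of the paper's valve-filter attention, and all layer sizes and class counts are assumptions.

```python
# Two serially connected segmentation nets, PyTorch sketch.
import torch
import torch.nn as nn

net1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 2, 1))            # vessel / background logits
net2 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 4, 1))            # 4 content classes (assumed)

img = torch.randn(1, 3, 64, 64)
vessel_mask = net1(img).argmax(1, keepdim=True).float()   # (1, 1, 64, 64)
contents = net2(img * vessel_mask)                  # second net sees only the vessel
print(contents.shape)                               # torch.Size([1, 4, 64, 64])
```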