Paper Group ANR 284
Actively Learning Hemimetrics with Applications to Eliciting User Preferences. Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent. Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification. Active Uncertainty Calibration in Bayesian ODE Solvers. Manifold Approximatio …
Actively Learning Hemimetrics with Applications to Eliciting User Preferences
Title | Actively Learning Hemimetrics with Applications to Eliciting User Preferences |
Authors | Adish Singla, Sebastian Tschiatschek, Andreas Krause |
Abstract | Motivated by an application of eliciting users’ preferences, we investigate the problem of learning hemimetrics, i.e., pairwise distances among a set of $n$ items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires $\Theta(n^2)$ queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into $K$ tight clusters, the sample complexity of our algorithm reduces to $O(n K)$. Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis. |
Tasks | Active Learning |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07144v2 |
http://arxiv.org/pdf/1605.07144v2.pdf | |
PWC | https://paperswithcode.com/paper/actively-learning-hemimetrics-with |
Repo | |
Framework | |
Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent
Title | Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent |
Authors | Linnan Wang, Yi Yang, Martin Renqiang Min, Srimat Chakradhar |
Abstract | SGD is the widely adopted method to train CNN. Conceptually it approximates the population with a randomly sampled batch; then it evenly trains batches by conducting a gradient update on every batch in an epoch. In this paper, we demonstrate Sampling Bias, Intrinsic Image Difference and Fixed Cycle Pseudo Random Sampling differentiate batches in training, which then affect learning speeds on them. Because of this, the unbiased treatment of batches involved in SGD creates improper load balancing. To address this issue, we present Inconsistent Stochastic Gradient Descent (ISGD) to dynamically vary training effort according to learning statuses on batches. Specifically ISGD leverages techniques in Statistical Process Control to identify a undertrained batch. Once a batch is undertrained, ISGD solves a new subproblem, a chasing logic plus a conservative constraint, to accelerate the training on the batch while avoid drastic parameter changes. Extensive experiments on a variety of datasets demonstrate ISGD converges faster than SGD. In training AlexNet, ISGD is 21.05% faster than SGD to reach 56% top1 accuracy under the exactly same experiment setup. We also extend ISGD to work on multiGPU or heterogeneous distributed system based on data parallelism, enabling the batch size to be the key to scalability. Then we present the study of ISGD batch size to the learning rate, parallelism, synchronization cost, system saturation and scalability. We conclude the optimal ISGD batch size is machine dependent. Various experiments on a multiGPU system validate our claim. In particular, ISGD trains AlexNet to 56.3% top1 and 80.1% top5 accuracy in 11.5 hours with 4 NVIDIA TITAN X at the batch size of 1536. |
Tasks | |
Published | 2016-03-17 |
URL | http://arxiv.org/abs/1603.05544v3 |
http://arxiv.org/pdf/1603.05544v3.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-deep-neural-network-training |
Repo | |
Framework | |
Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification
Title | Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification |
Authors | Yuting Zhang, Kibok Lee, Honglak Lee |
Abstract | Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed “what-where” autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin. |
Tasks | Image Classification |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06582v1 |
http://arxiv.org/pdf/1606.06582v1.pdf | |
PWC | https://paperswithcode.com/paper/augmenting-supervised-neural-networks-with |
Repo | |
Framework | |
Active Uncertainty Calibration in Bayesian ODE Solvers
Title | Active Uncertainty Calibration in Bayesian ODE Solvers |
Authors | Hans Kersting, Philipp Hennig |
Abstract | There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al. introduced a sampling-based class of methods that are ‘well-calibrated’ in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al. pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers, that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF) which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations. |
Tasks | Calibration |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03364v3 |
http://arxiv.org/pdf/1605.03364v3.pdf | |
PWC | https://paperswithcode.com/paper/active-uncertainty-calibration-in-bayesian |
Repo | |
Framework | |
Manifold Approximation by Moving Least-Squares Projection (MMLS)
Title | Manifold Approximation by Moving Least-Squares Projection (MMLS) |
Authors | Barak Sober, David Levin |
Abstract | In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located “near” the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether. |
Tasks | Dimensionality Reduction |
Published | 2016-06-22 |
URL | https://arxiv.org/abs/1606.07104v7 |
https://arxiv.org/pdf/1606.07104v7.pdf | |
PWC | https://paperswithcode.com/paper/manifold-approximation-by-moving-least |
Repo | |
Framework | |
Hi Detector, What’s Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution
Title | Hi Detector, What’s Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution |
Authors | Peng Wang, Lingqiao Liu, Chunhua Shen, Anton van den Hengel, Heng Tao Shen |
Abstract | In this work, we study the challenging problem of identifying the irregular status of objects from images in an “open world” setting, that is, distinguishing the irregular status of an object category from its regular status as well as objects from other categories in the absence of “irregular object” training data. To address this problem, we propose a novel approach by inspecting the distribution of the detection scores at multiple image regions based on the detector trained from the “regular object” and “other objects”. The key observation motivating our approach is that for “regular object” images as well as “other objects” images, the region-level scores follow their own essential patterns in terms of both the score values and the spatial distributions while the detection scores obtained from an “irregular object” image tend to break these patterns. To model this distribution, we propose to use Gaussian Processes (GP) to construct two separate generative models for the case of the “regular object” and the “other objects”. More specifically, we design a new covariance function to simultaneously model the detection score at a single region and the score dependencies at multiple regions. We finally demonstrate the superior performance of our method on a large dataset newly proposed in this paper. |
Tasks | Gaussian Processes |
Published | 2016-02-14 |
URL | http://arxiv.org/abs/1602.04422v1 |
http://arxiv.org/pdf/1602.04422v1.pdf | |
PWC | https://paperswithcode.com/paper/hi-detector-whats-wrong-with-that-object |
Repo | |
Framework | |
Oracle Complexity of Second-Order Methods for Finite-Sum Problems
Title | Oracle Complexity of Second-Order Methods for Finite-Sum Problems |
Authors | Yossi Arjevani, Ohad Shamir |
Abstract | Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in \emph{second-order} methods, which rely on both gradients and Hessians. In principle, second-order methods can require much fewer iterations than first-order methods, and hold the promise for more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this paper, we provide evidence that the answer – perhaps surprisingly – is negative, at least in terms of worst-case guarantees. However, we also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result. |
Tasks | |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04982v2 |
http://arxiv.org/pdf/1611.04982v2.pdf | |
PWC | https://paperswithcode.com/paper/oracle-complexity-of-second-order-methods-for |
Repo | |
Framework | |
Similarity Registration Problems for 2D/3D Ultrasound Calibration
Title | Similarity Registration Problems for 2D/3D Ultrasound Calibration |
Authors | Francisco Vasconcelos, Donald Peebles, Sebastien Ourselin, Danail Stoyanov |
Abstract | We propose a minimal solution for the similarity registration (rigid pose and scale) between two sets of 3D lines, and also between a set of co-planar points and a set of 3D lines. The first problem is solved up to 8 discrete solutions with a minimum of 2 line-line correspondences, while the second is solved up to 4 discrete solutions using 4 point-line correspondences. We use these algorithms to perform the extrinsic calibration between a pose tracking sensor and a 2D/3D ultrasound (US) curvilinear probe using a tracked needle as calibration target. The needle is tracked as a 3D line, and is scanned by the ultrasound as either a 3D line (3D US) or as a 2D point (2D US). Since the scale factor that converts US scan units to metric coordinates is unknown, the calibration is formulated as a similarity registration problem. We present results with both synthetic and real data and show that the minimum solutions outperform the correspondent non-minimal linear formulations. |
Tasks | Calibration, Pose Tracking |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00247v1 |
http://arxiv.org/pdf/1608.00247v1.pdf | |
PWC | https://paperswithcode.com/paper/similarity-registration-problems-for-2d3d |
Repo | |
Framework | |
Bayesian inference in hierarchical models by combining independent posteriors
Title | Bayesian inference in hierarchical models by combining independent posteriors |
Authors | Ritabrata Dutta, Paul Blomstedt, Samuel Kaski |
Abstract | Hierarchical models are versatile tools for joint modeling of data sets arising from different, but related, sources. Fully Bayesian inference may, however, become computationally prohibitive if the source-specific data models are complex, or if the number of sources is very large. To facilitate computation, we propose an approach, where inference is first made independently for the parameters of each data set, whereupon the obtained posterior samples are used as observed data in a substitute hierarchical model, based on a scaled likelihood function. Compared to direct inference in a full hierarchical model, the approach has the advantage of being able to speed up convergence by breaking down the initial large inference problem into smaller individual subproblems with better convergence properties. Moreover it enables parallel processing of the possibly complex inferences of the source-specific parameters, which may otherwise create a computational bottleneck if processed jointly as part of a hierarchical model. The approach is illustrated with both simulated and real data. |
Tasks | Bayesian Inference |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09272v2 |
http://arxiv.org/pdf/1603.09272v2.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-inference-in-hierarchical-models-by |
Repo | |
Framework | |
A new algorithm for identity verification based on the analysis of a handwritten dynamic signature
Title | A new algorithm for identity verification based on the analysis of a handwritten dynamic signature |
Authors | Krzysztof Cpalka, Marcin Zalasinski, Leszek Rutkowski |
Abstract | Identity verification based on authenticity assessment of a handwritten signature is an important issue in biometrics. There are many effective methods for signature verification taking into account dynamics of a signing process. Methods based on partitioning take a very important place among them. In this paper we propose a new approach to signature partitioning. Its most important feature is the possibility of selecting and processing of hybrid partitions in order to increase a precision of the test signature analysis. Partitions are formed by a combination of vertical and horizontal sections of the signature. Vertical sections correspond to the initial, middle, and final time moments of the signing process. In turn, horizontal sections correspond to the signature areas associated with high and low pen velocity and high and low pen pressure on the surface of a graphics tablet. Our previous research on vertical and horizontal sections of the dynamic signature (created independently) led us to develop the algorithm presented in this paper. Selection of sections, among others, allows us to define the stability of the signing process in the partitions, promoting signature areas of greater stability (and vice versa). In the test of the proposed method two databases were used: public MCYT-100 and paid BioSecure. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01578v1 |
http://arxiv.org/pdf/1610.01578v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-algorithm-for-identity-verification |
Repo | |
Framework | |
Understanding and Exploiting Object Interaction Landscapes
Title | Understanding and Exploiting Object Interaction Landscapes |
Authors | Sören Pirk, Vojtech Krs, Kaimo Hu, Suren Deepak Rajasekaran, Hao Kang, Bedrich Benes, Yusuke Yoshiyasu, Leonidas J. Guibas |
Abstract | Interactions play a key role in understanding objects and scenes, for both virtual and real world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or interaction involved. The representation is based on tracking particles on one of the participating objects and then observing them with sensors appropriately placed in the interaction volume or on the interaction surfaces. We show how to factorize these interaction descriptors and project them into a particular participating object so as to obtain a new functional descriptor for that object, its interaction landscape, capturing its observed use in a spatio-temporal framework. Interaction landscapes are independent of the particular interaction and capture subtle dynamic effects in how objects move and behave when in functional use. Our method relates objects based on their function, establishes correspondences between shapes based on functional key points and regions, and retrieves peer and partner objects with respect to an interaction. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08685v2 |
http://arxiv.org/pdf/1609.08685v2.pdf | |
PWC | https://paperswithcode.com/paper/understanding-and-exploiting-object |
Repo | |
Framework | |
Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion
Title | Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion |
Authors | Rika Antonova, Akshara Rai, Christopher G. Atkeson |
Abstract | Learning policies for bipedal locomotion can be difficult, as experiments are expensive and simulation does not usually transfer well to hardware. To counter this, we need al- gorithms that are sample efficient and inherently safe. Bayesian Optimization is a powerful sample-efficient tool for optimizing non-convex black-box functions. However, its performance can degrade in higher dimensions. We develop a distance metric for bipedal locomotion that enhances the sample-efficiency of Bayesian Optimization and use it to train a 16 dimensional neuromuscular model for planar walking. This distance metric reflects some basic gait features of healthy walking and helps us quickly eliminate a majority of unstable controllers. With our approach we can learn policies for walking in less than 100 trials for a range of challenging settings. In simulation, we show results on two different costs and on various terrains including rough ground and ramps, sloping upwards and downwards. We also perturb our models with unknown inertial disturbances analogous with differences between simulation and hardware. These results are promising, as they indicate that this method can potentially be used to learn control policies on hardware. |
Tasks | |
Published | 2016-10-15 |
URL | http://arxiv.org/abs/1610.04795v1 |
http://arxiv.org/pdf/1610.04795v1.pdf | |
PWC | https://paperswithcode.com/paper/sample-efficient-optimization-for-learning |
Repo | |
Framework | |
A new kernel-based approach to system identification with quantized output data
Title | A new kernel-based approach to system identification with quantized output data |
Authors | Giulio Bottegal, Håkan Hjalmarsson, Gianluigi Pillonetto |
Abstract | In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo methods to provide an estimate of the system. In particular, we design two methods based on the so-called Gibbs sampler that allow also to estimate the kernel hyperparameters by marginal likelihood maximization via the expectation-maximization method. Numerical simulations show the effectiveness of the proposed scheme, as compared to the state-of-the-art kernel-based methods when these are employed in system identification with quantized data. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00470v2 |
http://arxiv.org/pdf/1610.00470v2.pdf | |
PWC | https://paperswithcode.com/paper/a-new-kernel-based-approach-to-system |
Repo | |
Framework | |
Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction
Title | Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction |
Authors | Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou |
Abstract | In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths. The basic idea is to connect two words (w1 and w2) with the dependency path (r) between them in the embedding space. Specifically, our method optimizes the objective w1 + r = w2 in the low-dimensional space, where the multi-hop dependency paths are treated as a sequence of grammatical relations and modeled by a recurrent neural network. Then, we design the embedding features that consider linear context and dependency context information, for the conditional random field (CRF) based aspect term extraction. Experimental results on the SemEval datasets show that, (1) with only embedding features, we can achieve state-of-the-art results; (2) our embedding method which incorporates the syntactic information among words yields better performance than other representative ones in aspect term extraction. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07843v1 |
http://arxiv.org/pdf/1605.07843v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-word-and-dependency-path |
Repo | |
Framework | |
Multi-task Recurrent Model for True Multilingual Speech Recognition
Title | Multi-task Recurrent Model for True Multilingual Speech Recognition |
Authors | Zhiyuan Tang, Lantian Li, Dong Wang |
Abstract | Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature sharing structure. This approach has been found effective to improve performance on each individual language. However, this approach is only useful when the deployed system supports just one language. In a true multilingual scenario where multiple languages are allowed, performance will be significantly reduced due to the competition among languages in the decoding space. This paper presents a multi-task recurrent model that involves a multilingual speech recognition (ASR) component and a language recognition (LR) component, and the ASR component is informed of the language information by the LR component, leading to a language-aware recognition. We tested the approach on an English-Chinese bilingual recognition task. The results show that the proposed multi-task recurrent model can improve performance of multilingual recognition systems. |
Tasks | Speech Recognition |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08337v1 |
http://arxiv.org/pdf/1609.08337v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-recurrent-model-for-true |
Repo | |
Framework | |