May 5, 2019

3403 words 16 mins read

Paper Group ANR 488

Hierarchical image simplification and segmentation based on Mumford-Shah-salient level line selection

Title Hierarchical image simplification and segmentation based on Mumford-Shah-salient level line selection
Authors Yongchao Xu, Thierry Géraud, Laurent Najman
Abstract Hierarchies, such as the tree of shapes, are popular representations for image simplification and segmentation thanks to their multiscale structures. Selecting meaningful level lines (boundaries of shapes) simplifies the image while keeping salient structures intact. Many image simplification and segmentation methods are driven by the optimization of an energy functional, for instance the celebrated Mumford-Shah functional. In this paper, we propose an efficient approach to hierarchical image simplification and segmentation based on the minimization of the piecewise-constant Mumford-Shah functional. This method follows the current trend of producing hierarchical results rather than a unique partition. Contrary to classical approaches, which compute optimal hierarchical segmentations from an input hierarchy of segmentations, we rely on the tree of shapes, a unique and well-defined representation equivalent to the image. Simply put, we compute for each level line of the image an attribute function that characterizes its persistence under the energy minimization. Then we stack the level lines from meaningless ones to salient ones through a saliency map based on extinction values defined on the tree-based shape space. Qualitative illustrations and a quantitative evaluation on the Weizmann segmentation evaluation database demonstrate the state-of-the-art performance of our method.
Tasks
Published 2016-03-15
URL http://arxiv.org/abs/1603.04838v2
PDF http://arxiv.org/pdf/1603.04838v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-image-simplification-and
Repo
Framework
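
For reference, the piecewise-constant Mumford-Shah functional that drives the level-line selection is a data-fidelity term plus a boundary-length penalty. Below is a minimal sketch that evaluates this energy for a given labeled partition of a grayscale image; it only illustrates the functional and says nothing about the tree-of-shapes machinery or the extinction-value computation. The function name and the 4-connectivity boundary measure are choices made here, not taken from the paper.

```python
import numpy as np

def mumford_shah_pc_energy(image, labels, lam=1.0):
    """Piecewise-constant Mumford-Shah energy of a labeled partition:
    squared deviation from each region's mean value, plus lam times
    the total boundary length (measured with 4-connectivity)."""
    image = np.asarray(image, dtype=float)
    fidelity = 0.0
    for r in np.unique(labels):
        region = image[labels == r]
        fidelity += np.sum((region - region.mean()) ** 2)
    # boundary length: count label changes between horizontal/vertical neighbours
    boundary = np.count_nonzero(labels[:, 1:] != labels[:, :-1]) \
             + np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return fidelity + lam * boundary
```

In the paper's setting, each level line of the tree of shapes is scored by how much keeping or removing it changes this energy, and the resulting persistence values are turned into a saliency map.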

MetaGrad: Multiple Learning Rates in Online Learning

Title MetaGrad: Multiple Learning Rates in Online Learning
Authors Tim van Erven, Wouter M. Koolen
Abstract In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad’s main feature is that it simultaneously considers multiple learning rates. Unlike previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted in direct proportion to their empirical performance on the data using a tilted exponential weights master algorithm.
Tasks
Published 2016-04-29
URL http://arxiv.org/abs/1604.08740v3
PDF http://arxiv.org/pdf/1604.08740v3.pdf
PWC https://paperswithcode.com/paper/metagrad-multiple-learning-rates-in-online
Repo
Framework
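
To make the "multiple learning rates" idea concrete, here is a heavily simplified sketch of a MetaGrad-style master: a geometric grid of learning rates, one iterate per rate, and a tilted exponential-weights update driven by a quadratic surrogate loss. The per-rate experts below take plain gradient steps, whereas the actual algorithm uses full-matrix (second-order) expert updates, so treat this as an illustration of the structure rather than the paper's method.

```python
import numpy as np

def metagrad_sketch(grad_oracle, dim, T, D=1.0, n_rates=8):
    """Schematic MetaGrad-style master (illustrative simplification).
    grad_oracle(w) returns a (sub)gradient of the loss at the played point w."""
    etas = np.array([1.0 / (5.0 * 2 ** i) for i in range(n_rates)])  # grid of rates
    pi = np.full(n_rates, 1.0 / n_rates)   # master weights over learning rates
    W = np.zeros((n_rates, dim))           # one iterate per learning rate
    w = np.zeros(dim)
    for _ in range(T):
        w = pi @ W                         # master prediction: weighted average
        g = grad_oracle(w)
        r = (w - W) @ g                    # instantaneous regret surrogate per eta
        surrogate = -etas * r + (etas * r) ** 2
        pi *= np.exp(-surrogate)           # tilted exponential weights update
        pi /= pi.sum()
        W = W - etas[:, None] * g          # simplified expert step (paper: full-matrix)
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W = np.where(norms > D, W * (D / norms), W)  # crude projection to a ball
    return w
```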

Chi-squared Amplification: Identifying Hidden Hubs

Title Chi-squared Amplification: Identifying Hidden Hubs
Authors Ravi Kannan, Santosh Vempala
Abstract We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs): entries in rows outside $S$ are generated from the probability distribution $p_0 \sim N(0,\sigma_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0,\sigma_1^2)$, $\sigma_1>\sigma_0$, and the rest of the entries from $p_0$. The problem is to identify the high-degree hubs efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k\geq c\sqrt{n\ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0=\pm 1$ (with probability $1/2$ each) and $p_1\equiv 1$, neither barrier has been broken. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-\delta}$ for some $\delta >0$, when $\sigma_1^2>2\sigma_0^2$. The algorithm extends to the setting where planted entries might have different variances each at least as large as $\sigma_1^2$. We also show a nearly matching lower bound: for $\sigma_1^2 \le 2\sigma_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0,\sigma_0^2)$ and a matrix with $k=n^{0.5-\delta}$ hidden hubs for any $\delta >0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $\sigma_1^2=2\sigma_0^2$, we show that the general hidden hubs problem can be solved for $k\geq c\sqrt n(\ln n)^{1/4}$, improving on the naive row sum-based method.
Tasks
Published 2016-08-12
URL http://arxiv.org/abs/1608.03643v2
PDF http://arxiv.org/pdf/1608.03643v2.pdf
PWC https://paperswithcode.com/paper/chi-squared-amplification-identifying-hidden
Repo
Framework
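
The threshold $\sigma_1^2 = 2\sigma_0^2$ in the abstract is exactly the point at which the chi-squared distance between the planted and background entry distributions stops being finite. A short, standard Gaussian calculation (not copied from the paper) makes this visible:
\[
1 + \chi^2(p_1, p_0)
  = \int \frac{p_1(x)^2}{p_0(x)}\,dx
  = \frac{\sigma_0}{\sigma_1^2 \sqrt{2\pi}}
    \int \exp\!\Big(-x^2\Big(\tfrac{1}{\sigma_1^2}-\tfrac{1}{2\sigma_0^2}\Big)\Big)\,dx,
\]
which is finite if and only if $\sigma_1^2 < 2\sigma_0^2$. For $\sigma_1^2 > 2\sigma_0^2$ the chi-squared distance diverges, which is the regime in which the paper's amplification-based algorithm breaks the $\sqrt{n}$ barrier; for $\sigma_1^2 \le 2\sigma_0^2$ the Statistical Query lower bound applies.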

Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

Title Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
Authors Michael J. Anderson, Mihai Capotă, Javier S. Turek, Xia Zhu, Theodore L. Willke, Yida Wang, Po-Hsuan Chen, Jeremy R. Manning, Peter J. Ramadge, Kenneth A. Norman
Abstract The scale of functional magnetic resonance image data is rapidly increasing as large multi-subject datasets are becoming widely available and high-resolution scanners are adopted. The inherent low-dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x respectively with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
Tasks
Published 2016-08-16
URL http://arxiv.org/abs/1608.04647v2
PDF http://arxiv.org/pdf/1608.04647v2.pdf
PWC https://paperswithcode.com/paper/enabling-factor-analysis-on-thousand-subject
Repo
Framework
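
As background for the abstract above, the Shared Response Model posits a per-subject orthogonal basis $W_i$ and a shared timecourse $S$ with $X_i \approx W_i S$. Below is a minimal sketch of the deterministic variant's alternating updates (a Procrustes step for each $W_i$, an averaging step for $S$); the paper's contribution is the algorithmic and distributed-systems optimization of such factor analyses, which this sketch does not attempt to reproduce.

```python
import numpy as np

def srm_deterministic(subject_data, k, n_iter=20, seed=0):
    """Minimal deterministic Shared Response Model sketch.
    subject_data: list of (voxels_i x time) arrays with a common number of
    time points. Finds orthonormal W_i (voxels_i x k) and a shared response
    S (k x time) minimizing sum_i ||X_i - W_i S||_F^2."""
    rng = np.random.default_rng(seed)
    t = subject_data[0].shape[1]
    S = rng.standard_normal((k, t))
    Ws = [None] * len(subject_data)
    for _ in range(n_iter):
        # Procrustes step: W_i = U V^T from the SVD of X_i S^T
        for i, X in enumerate(subject_data):
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws[i] = U @ Vt
        # shared response: average of the back-projected subject data
        S = sum(W.T @ X for W, X in zip(Ws, subject_data)) / len(subject_data)
    return Ws, S
```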

Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge

Title Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge
Authors Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S. N. Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina Fantacci, Bram Geurts, Robbert van der Gugten, Pheng Ann Heng, Bart Jansen, Michael M. J. de Kaste, Valentin Kotov, Jack Yu-Hung Lin, Jeroen T. M. C. Manders, Alexander Sónora-Mengana, Juan Carlos García-Naranjo, Evgenia Papavasileiou, Mathias Prokop, Marco Saletta, Cornelia M Schaefer-Prokop, Ernst T. Scholten, Luuk Scholten, Miranda M. Snoeren, Ernesto Lopez Torres, Jef Vandemeulebroucke, Nicole Walasek, Guido C. A. Zuidhof, Bram van Ginneken, Colin Jacobs
Abstract Automatic detection of pulmonary nodules in thoracic computed tomography (CT) scans has been an active area of research for the last two decades. However, there have been only a few studies that provide a comparative performance evaluation of different systems on a common database. We have therefore set up the LUNA16 challenge, an objective evaluation framework for automatic nodule detection algorithms using the largest publicly available reference database of chest CT scans, the LIDC-IDRI data set. In LUNA16, participants develop their algorithm and upload their predictions on 888 CT scans in one of two tracks: 1) the complete nodule detection track, where a complete CAD system should be developed, or 2) the false positive reduction track, where a provided set of nodule candidates should be classified. This paper describes the setup of LUNA16 and presents the results of the challenge so far. The impact of combining individual systems on the detection performance was also investigated. It was observed that the leading solutions employed convolutional networks and used the provided set of nodule candidates. The combination of these solutions achieved an excellent sensitivity of over 95% at fewer than 1.0 false positives per scan. This highlights the potential of combining algorithms to improve the detection performance. Our observer study with four expert readers has shown that the best system detects nodules that were missed by expert readers who originally annotated the LIDC-IDRI data. We released this set of additional nodules for further development of CAD systems.
Tasks Computed Tomography (CT)
Published 2016-12-23
URL http://arxiv.org/abs/1612.08012v4
PDF http://arxiv.org/pdf/1612.08012v4.pdf
PWC https://paperswithcode.com/paper/validation-comparison-and-combination-of
Repo
Framework
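
The headline number (sensitivity above 95% at fewer than 1.0 false positives per scan) comes from an FROC-style analysis: candidates from all scans are ranked by confidence, and sensitivity is read off at fixed false-positive-per-scan budgets. The sketch below computes such operating points under simplifying assumptions (each true-positive candidate is assumed to hit a distinct nodule; the official evaluation script handles duplicate hits and irrelevant findings more carefully, and the listed FP rates are the ones commonly used in this kind of analysis, not verified against the challenge code).

```python
import numpy as np

def sensitivity_at_fp_per_scan(scores, is_tp, n_nodules, n_scans,
                               fp_rates=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """FROC-style sensitivity at fixed false positives per scan.
    scores: confidence of each candidate; is_tp: whether the candidate hits
    a ground-truth nodule. Simplification: every hit counts a distinct nodule."""
    order = np.argsort(-np.asarray(scores))
    hits = np.asarray(is_tp, dtype=bool)[order]
    tp = np.cumsum(hits)          # true positives as the threshold is lowered
    fp = np.cumsum(~hits)         # false positives as the threshold is lowered
    out = {}
    for rate in fp_rates:
        budget = rate * n_scans
        idx = np.searchsorted(fp, budget, side="right") - 1
        out[rate] = float(tp[idx]) / n_nodules if idx >= 0 else 0.0
    return out
```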

Avoiding Wireheading with Value Reinforcement Learning

Title Avoiding Wireheading with Value Reinforcement Learning
Authors Tom Everitt, Marcus Hutter
Abstract How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
Tasks
Published 2016-05-10
URL http://arxiv.org/abs/1605.03143v1
PDF http://arxiv.org/pdf/1605.03143v1.pdf
PWC https://paperswithcode.com/paper/avoiding-wireheading-with-value-reinforcement
Repo
Framework

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Title Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
Authors Janis Keuper, Franz-Josef Pfreundt
Abstract This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, using data-parallelized Stochastic Gradient Descent (SGD), is quickly becoming a severely communication-bound problem. In addition, we present simple but fundamental theoretical constraints that prevent effective scaling of DNN training beyond only a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
Tasks
Published 2016-09-22
URL http://arxiv.org/abs/1609.06870v4
PDF http://arxiv.org/pdf/1609.06870v4.pdf
PWC https://paperswithcode.com/paper/distributed-training-of-deep-neural-networks
Repo
Framework
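
The communication-bound behaviour the paper analyzes can be illustrated with a back-of-the-envelope step-time model for data-parallel SGD: compute shrinks with the number of nodes, but the gradient exchange does not. The constants below (model size, bandwidth, latency) are placeholders chosen here for illustration, not figures from the paper.

```python
def sgd_step_time(n_nodes, compute_s=1.0, model_bytes=250e6,
                  bandwidth_bps=10e9, latency_s=50e-6):
    """Illustrative data-parallel SGD step-time model (placeholder constants).
    A ring all-reduce moves roughly 2 * model_bytes per node regardless of the
    node count, so communication eventually dominates the step time."""
    compute = compute_s / n_nodes
    comm = 0.0 if n_nodes == 1 else latency_s * n_nodes + 2 * model_bytes / bandwidth_bps
    return compute + comm

# speedup over a single node saturates once communication dominates
speedups = {n: sgd_step_time(1) / sgd_step_time(n) for n in (1, 2, 4, 8, 16, 32, 64)}
```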

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Title Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Authors Philip S. Thomas, Emma Brunskill
Abstract In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods—it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang and Li, 2015), and a new way to mix between model based estimates and importance sampling based estimates.
Tasks
Published 2016-04-04
URL http://arxiv.org/abs/1604.00923v1
PDF http://arxiv.org/pdf/1604.00923v1.pdf
PWC https://paperswithcode.com/paper/data-efficient-off-policy-policy-evaluation
Repo
Framework
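
The estimator builds on the doubly robust off-policy estimator of Jiang and Li (2015), which corrects a model-based value estimate with importance-weighted temporal-difference errors. A per-trajectory sketch of that baseline is shown below; the paper's actual contribution, blending such estimates with importance sampling in a data-driven way, is not reproduced here, and the tuple layout and function names are choices made for this sketch.

```python
def doubly_robust_value(trajectory, q_hat, v_hat, gamma=1.0):
    """Per-trajectory doubly robust off-policy estimate (Jiang & Li, 2015 style).
    trajectory: list of (state, action, reward, pi_e_prob, pi_b_prob) tuples.
    q_hat(s, a), v_hat(s): value estimates from an approximate model."""
    value = v_hat(trajectory[0][0])        # model-based baseline for the start state
    rho = 1.0                              # cumulative importance weight
    discount = 1.0
    for t, (s, a, r, p_e, p_b) in enumerate(trajectory):
        rho *= p_e / p_b
        next_v = v_hat(trajectory[t + 1][0]) if t + 1 < len(trajectory) else 0.0
        # correction: importance-weighted TD error of the approximate model
        value += discount * rho * (r + gamma * next_v - q_hat(s, a))
        discount *= gamma
    return value
```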

Max-Margin Feature Selection

Title Max-Margin Feature Selection
Authors Yamuna Prasad, Dinesh Khandelwal, K. K. Biswas
Abstract Many machine learning applications such as in vision, biology and social networking deal with data in high dimensions. Feature selection is typically employed to select a subset of features which improves generalization accuracy as well as reduces the computational cost of learning the model. One of the criteria used for feature selection is to jointly minimize the redundancy and maximize the relevance of the selected features. In this paper, we formulate the task of feature selection as a one class SVM problem in a space where features correspond to the data points and instances correspond to the dimensions. The goal is to look for a representative subset of the features (support vectors) which describes the boundary for the region where the set of the features (data points) exists. This leads to a joint optimization of relevance and redundancy in a principled max-margin framework. Additionally, our formulation enables us to leverage existing techniques for optimizing the SVM objective resulting in highly computationally efficient solutions for the task of feature selection. Specifically, we employ the dual coordinate descent algorithm (Hsieh et al., 2008), originally proposed for SVMs, for our formulation. We use a sparse representation to deal with data in very high dimensions. Experiments on seven publicly available benchmark datasets from a variety of domains show that our approach results in orders of magnitude faster solutions even while retaining the same level of accuracy compared to the state of the art feature selection techniques.
Tasks Feature Selection
Published 2016-06-14
URL http://arxiv.org/abs/1606.04506v1
PDF http://arxiv.org/pdf/1606.04506v1.pdf
PWC https://paperswithcode.com/paper/max-margin-feature-selection
Repo
Framework
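
The core idea, features as the data points of a one-class SVM whose support vectors become the selected features, can be prototyped in a few lines with scikit-learn. This is only a small illustration of the transposed formulation; the paper instead solves the problem with dual coordinate descent (Hsieh et al., 2008) and sparse representations to scale to very high dimensions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def max_margin_feature_selection(X, nu=0.1):
    """Sketch of feature selection as a one-class SVM:
    transpose the data so rows are features, fit a one-class SVM,
    and keep the features that become support vectors."""
    Xt = np.asarray(X, dtype=float).T            # rows = features, columns = instances
    Xt = Xt - Xt.mean(axis=1, keepdims=True)     # center each feature (illustrative)
    model = OneClassSVM(kernel="linear", nu=nu).fit(Xt)
    return model.support_                        # indices of the selected features

# usage: selected = max_margin_feature_selection(X_train); X_reduced = X_train[:, selected]
```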

A Theory of Local Matching: SIFT and Beyond

Title A Theory of Local Matching: SIFT and Beyond
Authors Hossein Mobahi, Stefano Soatto
Abstract Why has SIFT been so successful? Why can its extension, DSP-SIFT, further improve SIFT? Is there a theory that can explain both? How can such a theory benefit real applications? Can it suggest new algorithms with reduced computational complexity or new descriptors with better accuracy for matching? We construct a general theory of local descriptors for visual matching. Our theory relies on concepts in energy minimization and heat diffusion. We show that SIFT and DSP-SIFT approximate the solution the theory suggests. In particular, DSP-SIFT gives a better approximation to the theoretical solution, which explains why DSP-SIFT outperforms SIFT. Using the developed theory, we derive new descriptors that have fewer parameters and are potentially better in handling affine deformations.
Tasks
Published 2016-01-19
URL http://arxiv.org/abs/1601.05116v1
PDF http://arxiv.org/pdf/1601.05116v1.pdf
PWC https://paperswithcode.com/paper/a-theory-of-local-matching-sift-and-beyond
Repo
Framework

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Title Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
Authors Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou
Abstract Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massively parallel computing capability of GPUs makes them one of the ideal platforms for accelerating CNNs, and a number of GPU-based CNN libraries have been developed. While existing works mainly focus on the computational efficiency of CNNs, the memory efficiency of CNNs has been largely overlooked. Yet CNNs have intricate data structures and their memory behavior can have a significant impact on performance. In this work, we study the memory efficiency of various CNN layers and reveal the performance implications of both data layouts and memory access patterns. Experiments show the universal effect of our proposed optimizations on both single layers and various networks, with speedups of up to 27.9x for a single layer and up to 5.6x on whole networks.
Tasks
Published 2016-10-12
URL http://arxiv.org/abs/1610.03618v1
PDF http://arxiv.org/pdf/1610.03618v1.pdf
PWC https://paperswithcode.com/paper/optimizing-memory-efficiency-for-deep
Repo
Framework
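
The layout effect the paper studies is easy to observe even with today's frameworks: the same convolution can run at different speeds depending on whether tensors are stored NCHW or channels-last (NHWC). The micro-benchmark below uses PyTorch as a modern stand-in purely for illustration; it is not the paper's CUDA-level experiment, and the measured gap will depend on the GPU and library version.

```python
import time
import torch
import torch.nn as nn

def time_conv(x, conv, iters=50):
    """Average forward time of a convolution on the given input layout."""
    with torch.no_grad():
        for _ in range(5):                       # warm-up
            conv(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(32, 64, 56, 56, device=device)
conv = nn.Conv2d(64, 128, 3, padding=1).to(device).eval()

t_nchw = time_conv(x, conv)
t_nhwc = time_conv(x.to(memory_format=torch.channels_last),
                   conv.to(memory_format=torch.channels_last))
print(f"NCHW: {t_nchw * 1e3:.2f} ms   channels-last: {t_nhwc * 1e3:.2f} ms")
```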

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Title Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Authors Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan
Abstract The amount of data available in the world is growing faster than our ability to deal with it. However, if we take advantage of its internal \emph{structure}, data may become much smaller for machine learning purposes. In this paper we focus on one of the fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help of the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently computed from the data, and propose two algorithms based on clustering information. Our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of the ERM problem, and our variance-reduction based algorithm ClusterSVRG introduces a new gradient estimator using clustering. Our algorithms outperform their classical counterparts ACDM and SVRG respectively.
Tasks
Published 2016-02-05
URL http://arxiv.org/abs/1602.02151v2
PDF http://arxiv.org/pdf/1602.02151v2.pdf
PWC https://paperswithcode.com/paper/exploiting-the-structure-stochastic-gradient
Repo
Framework
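
For context, the classical SVRG estimator that ClusterSVRG modifies replaces the plain stochastic gradient with a control-variate form built around a periodic snapshot. The sketch below shows only that baseline; the paper's variant swaps the stale snapshot information for gradients shared within a raw cluster, which is not implemented here.

```python
import numpy as np

def svrg(grad_i, n, x0, step=0.1, epochs=10, inner=None, seed=0):
    """Classical SVRG (the baseline that ClusterSVRG builds on).
    grad_i(i, x): gradient of the i-th component function at x."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    m = inner or 2 * n
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced estimator: unbiased, with variance shrinking
            # as x approaches the snapshot
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x -= step * g
    return x
```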

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

Title Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures
Authors Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
Abstract We describe a general technique that yields the first {\em Statistical Query lower bounds} for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the computational complexity of {\em any} Statistical Query algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in~\cite{DiakonikolasKKLMS16} is essentially best possible among all polynomial-time SQ algorithms. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic {\em statistical–computational tradeoff} for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional {\em testing} problems. Specifically, for the classical problem of robustly {\em testing} an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales {\em linearly} in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust {\em learning} problem and separates the sample complexity of robust testing from standard (non-robust) testing.
Tasks
Published 2016-11-10
URL http://arxiv.org/abs/1611.03473v2
PDF http://arxiv.org/pdf/1611.03473v2.pdf
PWC https://paperswithcode.com/paper/statistical-query-lower-bounds-for-robust
Repo
Framework

A Proposal for Linguistic Similarity Datasets Based on Commonality Lists

Title A Proposal for Linguistic Similarity Datasets Based on Commonality Lists
Authors Dmitrijs Milajevs, Sascha Griffiths
Abstract Similarity is a core notion that is used in psychology and two branches of linguistics: theoretical and computational. The similarity datasets that come from the two fields differ in design: psychological datasets are focused around a certain topic such as fruit names, while linguistic datasets contain words from various categories. The latter makes humans assign low similarity scores to words that have nothing in common and to words that contrast in meaning, making similarity scores ambiguous. In this work we discuss a similarity collection procedure for a multi-category dataset that avoids score ambiguity and suggest changes to the evaluation procedure to reflect the insights of the psychological literature on word, phrase and sentence similarity. We suggest asking humans to provide a list of commonalities and differences instead of numerical similarity scores and employing the structure of human judgements beyond pairwise similarity for model evaluation. We believe that the proposed approach will give rise to datasets that test meaning representation models more thoroughly with respect to the human treatment of similarity.
Tasks
Published 2016-05-15
URL http://arxiv.org/abs/1605.04553v2
PDF http://arxiv.org/pdf/1605.04553v2.pdf
PWC https://paperswithcode.com/paper/a-proposal-for-linguistic-similarity-datasets
Repo
Framework

Locally Weighted Ensemble Clustering

Title Locally Weighted Ensemble Clustering
Authors Dong Huang, Chang-Dong Wang, Jian-Huang Lai
Abstract Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on the data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
Tasks
Published 2016-05-17
URL https://arxiv.org/abs/1605.05011v3
PDF https://arxiv.org/pdf/1605.05011v3.pdf
PWC https://paperswithcode.com/paper/locally-weighted-ensemble-clustering
Repo
Framework
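
A rough sketch of the locally weighted co-association idea follows: each cluster's uncertainty is estimated with an entropy criterion over the rest of the ensemble, converted into a reliability weight, and used to scale that cluster's contribution to the co-association matrix. The exponential weighting form and the constant theta below follow the spirit of the paper's ensemble-driven cluster index, but the exact formulas and normalizations here are this sketch's own simplifications.

```python
import numpy as np

def locally_weighted_coassociation(base_labels, theta=0.4):
    """Sketch of an ensemble-driven, locally weighted co-association matrix.
    base_labels: (M, n) array, one base clustering per row.
    Each cluster gets a reliability weight exp(-entropy / (theta * M)),
    where the entropy measures how the cluster's members scatter across
    the clusters of the other base clusterings (illustrative constants)."""
    base_labels = np.asarray(base_labels)
    M, n = base_labels.shape
    A = np.zeros((n, n))
    for m in range(M):
        for c in np.unique(base_labels[m]):
            members = np.flatnonzero(base_labels[m] == c)
            # entropy of this cluster with respect to every other base clustering
            H = 0.0
            for m2 in range(M):
                if m2 == m:
                    continue
                _, counts = np.unique(base_labels[m2, members], return_counts=True)
                p = counts / counts.sum()
                H += -(p * np.log2(p)).sum()
            w = np.exp(-H / (theta * M))
            A[np.ix_(members, members)] += w / M
    return A  # feed to any similarity-based consensus step, e.g. hierarchical clustering
```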