July 28, 2019

3310 words 16 mins read

Paper Group ANR 423

Horseshoe Regularization for Feature Subset Selection. Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm. AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture. Recurrent Multimodal Interaction for Referring Image Segmentation. Online learnability of Statistical Relational Learning …

Horseshoe Regularization for Feature Subset Selection


Title	Horseshoe Regularization for Feature Subset Selection
Authors	Anindya Bhadra, Jyotishka Datta, Nicholas G. Polson, Brandon Willard
Abstract	Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of time of state-of-the-art non-convex solvers.
Tasks
Published	2017-02-23
URL	http://arxiv.org/abs/1702.07400v2
PDF	http://arxiv.org/pdf/1702.07400v2.pdf
PWC	https://paperswithcode.com/paper/horseshoe-regularization-for-feature-subset
Repo
Framework

Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm


Title	Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm
Authors	David Gamarnik, Ilias Zadik
Abstract	We consider a sparse high dimensional regression model where the goal is to recover a $k$-sparse unknown vector $\beta^$ from $n$ noisy linear observations of the form $Y=X\beta^+W \in \mathbb{R}^n$ where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,\sigma^2)$ entries. Under certain assumptions on the parameters, an intriguing assymptotic gap appears between the minimum value of $n$, call it $n^$, for which the recovery is information theoretically possible, and the minimum value of $n$, call it $n_{\mathrm{alg}}$, for which an efficient algorithm is known to provably recover $\beta^$. In \cite{gamarnikzadik} it was conjectured that the gap is not artificial, in the sense that for sample sizes $n \in [n^,n_{\mathrm{alg}}]$ the problem is algorithmically hard. We support this conjecture in two ways. Firstly, we show that the optimal solution of the LASSO provably fails to $\ell_2$-stably recover the unknown vector $\beta^$ when $n \in [n^,c n_{\mathrm{alg}}]$, for some sufficiently small constant $c>0$. Secondly, we establish that $n_{\mathrm{alg}}$, up to a multiplicative constant factor, is a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an Overlap Gap Property phase transition, which originates in statistical physics, is known to provide evidence of an algorithmic hardness. Finally we show that if $n>C n_{\mathrm{alg}}$ for some large enough constant $C>0$, a very simple algorithm based on a local search improvement rule is able both to $\ell_2$-stably recover the unknown vector $\beta^$ and to infer correctly its support, adding it to the list of provably successful algorithms for the high dimensional linear regression problem.
Tasks	Denoising
Published	2017-11-14
URL	https://arxiv.org/abs/1711.04952v2
PDF	https://arxiv.org/pdf/1711.04952v2.pdf
PWC	https://paperswithcode.com/paper/sparse-high-dimensional-linear-regression
Repo
Framework

AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture


Title	AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture
Authors	Suman Saha, Gurkirt Singh, Fabio Cuzzolin
Abstract	Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, to later compose them into “action tubes” in a post-processing step. With this paper we radically depart from current practice, and take a first step towards the design and implementation of a deep network architecture able to classify and regress whole video subsets, so providing a truly optimal solution of the action detection problem. In this work, in particular, we propose a novel deep net framework able to regress and classify 3D region proposals spanning two successive video frames, whose core is an evolution of classical region proposal networks (RPNs). As such, our 3D-RPN net is able to effectively encode the temporal aspect of actions by purely exploiting appearance, as opposed to methods which heavily rely on expensive flow maps. The proposed model is end-to-end trainable and can be jointly optimised for action localisation and classification in a single step. At test time the network predicts “micro-tubes” encompassing two successive frames, which are linked up into complete action tubes via a new algorithm which exploits the temporal encoding learned by the network and cuts computation time by 50%. Promising results on the J-HMDB-21 and UCF-101 action detection datasets show that our model does outperform the state-of-the-art when relying purely on appearance.
Tasks	Action Detection
Published	2017-04-17
URL	http://arxiv.org/abs/1704.04952v2
PDF	http://arxiv.org/pdf/1704.04952v2.pdf
PWC	https://paperswithcode.com/paper/amtnet-action-micro-tube-regression-by-end-to
Repo
Framework

Recurrent Multimodal Interaction for Referring Image Segmentation


Title	Recurrent Multimodal Interaction for Referring Image Segmentation
Authors	Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille
Abstract	In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e. referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segment images by combining these two types of representations. We argue that learning word-to-image interaction is more native in the sense of jointly modeling two modalities for the image segmentation task, and we propose convolutional multimodal LSTM to encode the sequential interactions between individual words, visual information, and spatial information. We show that our proposed model outperforms the baseline model on benchmark datasets. In addition, we analyze the intermediate output of the proposed multimodal LSTM approach and empirically explain how this approach enforces a more effective word-to-image interaction.
Tasks	Semantic Segmentation
Published	2017-03-23
URL	http://arxiv.org/abs/1703.07939v2
PDF	http://arxiv.org/pdf/1703.07939v2.pdf
PWC	https://paperswithcode.com/paper/recurrent-multimodal-interaction-for
Repo
Framework

Online learnability of Statistical Relational Learning in anomaly detection


Title	Online learnability of Statistical Relational Learning in anomaly detection
Authors	Magnus Jändel, Pontus Svenson, Niclas Wadströmer
Abstract	Statistical Relational Learning (SRL) methods for anomaly detection are introduced via a security-related application. Operational requirements for online learning stability are outlined and compared to mathematical definitions as applied to the learning process of a representative SRL method - Bayesian Logic Programs (BLP). Since a formal proof of online stability appears to be impossible, tentative common sense requirements are formulated and tested by theoretical and experimental analysis of a simple and analytically tractable BLP model. It is found that learning algorithms in initial stages of online learning can lock on unstable false predictors that nevertheless comply with our tentative stability requirements and thus masquerade as bona fide solutions. The very expressiveness of SRL seems to cause significant stability issues in settings with many variables and scarce data. We conclude that reliable anomaly detection with SRL-methods requires monitoring by an overarching framework that may involve a comprehensive context knowledge base or human supervision.
Tasks	Anomaly Detection, Common Sense Reasoning, Relational Reasoning
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06573v1
PDF	http://arxiv.org/pdf/1705.06573v1.pdf
PWC	https://paperswithcode.com/paper/online-learnability-of-statistical-relational
Repo
Framework

MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?


Title	MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?
Authors	Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang
Abstract	We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying how to run a machine learning task, users only specify what machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free - a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question with a focus on binary classication problems. We present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on mlbench. Our comparative study reveals the strength and weakness of existing machine learning clouds and points out potential future directions for improvement.
Tasks
Published	2017-07-29
URL	http://arxiv.org/abs/1707.09562v3
PDF	http://arxiv.org/pdf/1707.09562v3.pdf
PWC	https://paperswithcode.com/paper/mlbench-how-good-are-machine-learning-clouds
Repo
Framework

Redefining Context Windows for Word Embedding Models: An Experimental Study


Title	Redefining Context Windows for Word Embedding Models: An Experimental Study
Authors	Pierre Lison, Andrey Kutuzov
Abstract	Distributional semantic models learn vector representations of words through the contexts they occur in. Although the choice of context (which often takes the form of a sliding window) has a direct influence on the resulting embeddings, the exact role of this model component is still not fully understood. This paper presents a systematic analysis of context windows based on a set of four distinct hyper-parameters. We train continuous Skip-Gram models on two English-language corpora for various combinations of these hyper-parameters, and evaluate them on both lexical similarity and analogy tasks. Notable experimental results are the positive impact of cross-sentential contexts and the surprisingly good performance of right-context windows.
Tasks
Published	2017-04-19
URL	http://arxiv.org/abs/1704.05781v1
PDF	http://arxiv.org/pdf/1704.05781v1.pdf
PWC	https://paperswithcode.com/paper/redefining-context-windows-for-word-embedding
Repo
Framework

Sparse Inverse Covariance Estimation for Chordal Structures


Title	Sparse Inverse Covariance Estimation for Chordal Structures
Authors	Salar Fattahi, Richard Y. Zhang, Somayeh Sojoudi
Abstract	In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can significantly be simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and drive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours.
Tasks	Matrix Completion
Published	2017-11-24
URL	http://arxiv.org/abs/1711.09131v1
PDF	http://arxiv.org/pdf/1711.09131v1.pdf
PWC	https://paperswithcode.com/paper/sparse-inverse-covariance-estimation-for
Repo
Framework

Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature


Title	Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature
Authors	Anqi Wu, Mikio C. Aoi, Jonathan W. Pillow
Abstract	An exciting branch of machine learning research focuses on methods for learning, optimizing, and integrating unknown functions that are difficult or costly to evaluate. A popular Bayesian approach to this problem uses a Gaussian process (GP) to construct a posterior distribution over the function of interest given a set of observed measurements, and selects new points to evaluate using the statistics of this posterior. Here we extend these methods to exploit derivative information from the unknown function. We describe methods for Bayesian optimization (BO) and Bayesian quadrature (BQ) in settings where first and second derivatives may be evaluated along with the function itself. We perform sampling-based inference in order to incorporate uncertainty over hyperparameters, and show that both hyperparameter and function uncertainty decrease much more rapidly when using derivative information. Moreover, we introduce techniques for overcoming ill-conditioning issues that have plagued earlier methods for gradient-enhanced Gaussian processes and kriging. We illustrate the efficacy of these methods using applications to real and simulated Bayesian optimization and quadrature problems, and show that exploting derivatives can provide substantial gains over standard methods.
Tasks	Gaussian Processes
Published	2017-03-31
URL	http://arxiv.org/abs/1704.00060v2
PDF	http://arxiv.org/pdf/1704.00060v2.pdf
PWC	https://paperswithcode.com/paper/exploiting-gradients-and-hessians-in-bayesian
Repo
Framework

Comparing Aggregators for Relational Probabilistic Models


Title	Comparing Aggregators for Relational Probabilistic Models
Authors	Seyed Mehran Kazemi, Bahare Fatemi, Alexandra Kim, Zilun Peng, Moumita Roy Tora, Xing Zeng, Matthew Dirks, David Poole
Abstract	Relational probabilistic models have the challenge of aggregation, where one variable depends on a population of other variables. Consider the problem of predicting gender from movie ratings; this is challenging because the number of movies per user and users per movie can vary greatly. Surprisingly, aggregation is not well understood. In this paper, we show that existing relational models (implicitly or explicitly) either use simple numerical aggregators that lose great amounts of information, or correspond to naive Bayes, logistic regression, or noisy-OR that suffer from overconfidence. We propose new simple aggregators and simple modifications of existing models that empirically outperform the existing ones. The intuition we provide on different (existing or new) models and their shortcomings plus our empirical findings promise to form the foundation for future representations.
Tasks
Published	2017-07-25
URL	http://arxiv.org/abs/1707.07785v1
PDF	http://arxiv.org/pdf/1707.07785v1.pdf
PWC	https://paperswithcode.com/paper/comparing-aggregators-for-relational
Repo
Framework

Low Permutation-rank Matrices: Structural Properties and Noisy Completion


Title	Low Permutation-rank Matrices: Structural Properties and Noisy Completion
Authors	Nihar B. Shah, Sivaraman Balakrishnan, Martin J. Wainwright
Abstract	We consider the problem of noisy matrix completion, in which the goal is to reconstruct a structured matrix whose entries are partially observed in noise. Standard approaches to this underdetermined inverse problem are based on assuming that the underlying matrix has low rank, or is well-approximated by a low rank matrix. In this paper, we propose a richer model based on what we term the “permutation-rank” of a matrix. We first describe how the classical non-negative rank model enforces restrictions that may be undesirable in practice, and how and these restrictions can be avoided by using the richer permutation-rank model. Second, we establish the minimax rates of estimation under the new permutation-based model, and prove that surprisingly, the minimax rates are equivalent up to logarithmic factors to those for estimation under the typical low rank model. Third, we analyze a computationally efficient singular-value-thresholding algorithm, known to be optimal for the low-rank setting, and show that it also simultaneously yields a consistent estimator for the low-permutation rank setting. Finally, we present various structural results characterizing the uniqueness of the permutation-rank decomposition, and characterizing convex approximations of the permutation-rank polytope.
Tasks	Matrix Completion
Published	2017-09-01
URL	http://arxiv.org/abs/1709.00127v1
PDF	http://arxiv.org/pdf/1709.00127v1.pdf
PWC	https://paperswithcode.com/paper/low-permutation-rank-matrices-structural
Repo
Framework

Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts


Title	Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts
Authors	Eliran Farhi, Amiram Moshaiov
Abstract	The set-based concept approach has been suggested as a means to simultaneously explore different design concepts, which are meaningful sub-sets of the entire set of solutions. Previous efforts concerning the suggested approach focused on either revealing the global front (s-Pareto front), of all the concepts, or on finding the concepts’ fronts, within a relaxation zone. In contrast, here the aim is to reveal which of the concepts have at least one solution with a performance vector within a pre-defined window-of-interest (WOI). This paper provides the rational for this new concept-based exploration problem, and suggests a WOI-based rather than Pareto-based multi-objective evolutionary algorithm. The proposed algorithm, which simultaneously explores different concepts, is tested using a recently suggested concept-based benchmarking approach. The numerical study of this paper shows that the algorithm can cope with various numerical difficulties in a simultaneous way, which outperforms a sequential exploration approach.
Tasks
Published	2017-07-04
URL	http://arxiv.org/abs/1707.00936v1
PDF	http://arxiv.org/pdf/1707.00936v1.pdf
PWC	https://paperswithcode.com/paper/window-of-interest-based-multi-objective
Repo
Framework

Two-Stream RNN/CNN for Action Recognition in 3D Videos


Title	Two-Stream RNN/CNN for Action Recognition in 3D Videos
Authors	Rui Zhao, Haider Ali, Patrick van der Smagt
Abstract	The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.
Tasks	Temporal Action Localization
Published	2017-03-22
URL	http://arxiv.org/abs/1703.09783v2
PDF	http://arxiv.org/pdf/1703.09783v2.pdf
PWC	https://paperswithcode.com/paper/two-stream-rnncnn-for-action-recognition-in
Repo
Framework

Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks


Title	Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks
Authors	John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani
Abstract	Deep neural networks (DNNs) have excellent representative power and are state of the art classifiers on many tasks. However, they often do not capture their own uncertainties well making them less robust in the real world as they overconfidently extrapolate and do not notice domain shift. Gaussian processes (GPs) with RBF kernels on the other hand have better calibrated uncertainties and do not overconfidently extrapolate far from data in their training set. However, GPs have poor representational power and do not perform as well as DNNs on complex domains. In this paper we show that GP hybrid deep networks, GPDNNs, (GPs on top of DNNs and trained end-to-end) inherit the nice properties of both GPs and DNNs and are much more robust to adversarial examples. When extrapolating to adversarial examples and testing in domain shift settings, GPDNNs frequently output high entropy class probabilities corresponding to essentially “don’t know”. GPDNNs are therefore promising as deep architectures that know when they don’t know.
Tasks	Gaussian Processes
Published	2017-07-08
URL	http://arxiv.org/abs/1707.02476v1
PDF	http://arxiv.org/pdf/1707.02476v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-examples-uncertainty-and-transfer
Repo
Framework

A graph cut approach to 3D tree delineation, using integrated airborne LiDAR and hyperspectral imagery


Title	A graph cut approach to 3D tree delineation, using integrated airborne LiDAR and hyperspectral imagery
Authors	Juheon Lee, David Coomes, Carola-Bibiane Schonlieb, Xiaohao Cai, Jan Lellmann, Michele Dalponte, Yadvinder Malhi, Nathalie Butt, Mike Morecroft
Abstract	Recognising individual trees within remotely sensed imagery has important applications in forest ecology and management. Several algorithms for tree delineation have been suggested, mostly based on locating local maxima or inverted basins in raster canopy height models (CHMs) derived from Light Detection And Ranging (LiDAR) data or photographs. However, these algorithms often lead to inaccurate estimates of forest stand characteristics due to the limited information content of raster CHMs. Here we develop a 3D tree delineation method which uses graph cut to delineate trees from the full 3D LiDAR point cloud, and also makes use of any optical imagery available (hyperspectral imagery in our case). First, conventional methods are used to locate local maxima in the CHM and generate an initial map of trees. Second, a graph is built from the LiDAR point cloud, fused with the hyperspectral data. For computational efficiency, the feature space of hyperspectral imagery is reduced using robust PCA. Third, a multi-class normalised cut is applied to the graph, using the initial map of trees to constrain the number of clusters and their locations. Finally, recursive normalised cut is used to subdivide, if necessary, each of the clusters identified by the initial analysis. We call this approach Multiclass Cut followed by Recursive Cut (MCRC). The effectiveness of MCRC was tested using three datasets: i) NewFor, ii) a coniferous forest in the Italian Alps, and iii) a deciduous woodland in the UK. The performance of MCRC was usually superior to that of other delineation methods, and was further improved by including high-resolution optical imagery. Since MCRC delineates the entire LiDAR point cloud in 3D, it allows individual crown characteristics to be measured. By making full use of the data available, graph cut has the potential to considerably improve the accuracy of tree delineation.
Tasks
Published	2017-01-24
URL	http://arxiv.org/abs/1701.06715v1
PDF	http://arxiv.org/pdf/1701.06715v1.pdf
PWC	https://paperswithcode.com/paper/a-graph-cut-approach-to-3d-tree-delineation
Repo
Framework