Paper Group ANR 423
Horseshoe Regularization for Feature Subset Selection. Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm. AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture. Recurrent Multimodal Interaction for Referring Image Segmentation. Online learnability of Statistical Relational Learning …
Horseshoe Regularization for Feature Subset Selection
Title | Horseshoe Regularization for Feature Subset Selection |
Authors | Anindya Bhadra, Jyotishka Datta, Nicholas G. Polson, Brandon Willard |
Abstract | Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of time of state-of-the-art non-convex solvers. |
Tasks | |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07400v2 |
http://arxiv.org/pdf/1702.07400v2.pdf | |
PWC | https://paperswithcode.com/paper/horseshoe-regularization-for-feature-subset |
Repo | |
Framework | |
Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm
Title | Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm |
Authors | David Gamarnik, Ilias Zadik |
Abstract | We consider a sparse high dimensional regression model where the goal is to recover a $k$-sparse unknown vector $\beta^$ from $n$ noisy linear observations of the form $Y=X\beta^+W \in \mathbb{R}^n$ where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,\sigma^2)$ entries. Under certain assumptions on the parameters, an intriguing assymptotic gap appears between the minimum value of $n$, call it $n^*$, for which the recovery is information theoretically possible, and the minimum value of $n$, call it $n_{\mathrm{alg}}$, for which an efficient algorithm is known to provably recover $\beta^*$. In \cite{gamarnikzadik} it was conjectured that the gap is not artificial, in the sense that for sample sizes $n \in [n^*,n_{\mathrm{alg}}]$ the problem is algorithmically hard. We support this conjecture in two ways. Firstly, we show that the optimal solution of the LASSO provably fails to $\ell_2$-stably recover the unknown vector $\beta^*$ when $n \in [n^*,c n_{\mathrm{alg}}]$, for some sufficiently small constant $c>0$. Secondly, we establish that $n_{\mathrm{alg}}$, up to a multiplicative constant factor, is a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an Overlap Gap Property phase transition, which originates in statistical physics, is known to provide evidence of an algorithmic hardness. Finally we show that if $n>C n_{\mathrm{alg}}$ for some large enough constant $C>0$, a very simple algorithm based on a local search improvement rule is able both to $\ell_2$-stably recover the unknown vector $\beta^*$ and to infer correctly its support, adding it to the list of provably successful algorithms for the high dimensional linear regression problem. |
Tasks | Denoising |
Published | 2017-11-14 |
URL | https://arxiv.org/abs/1711.04952v2 |
https://arxiv.org/pdf/1711.04952v2.pdf | |
PWC | https://paperswithcode.com/paper/sparse-high-dimensional-linear-regression |
Repo | |
Framework | |
AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture
Title | AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture |
Authors | Suman Saha, Gurkirt Singh, Fabio Cuzzolin |
Abstract | Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, to later compose them into “action tubes” in a post-processing step. With this paper we radically depart from current practice, and take a first step towards the design and implementation of a deep network architecture able to classify and regress whole video subsets, so providing a truly optimal solution of the action detection problem. In this work, in particular, we propose a novel deep net framework able to regress and classify 3D region proposals spanning two successive video frames, whose core is an evolution of classical region proposal networks (RPNs). As such, our 3D-RPN net is able to effectively encode the temporal aspect of actions by purely exploiting appearance, as opposed to methods which heavily rely on expensive flow maps. The proposed model is end-to-end trainable and can be jointly optimised for action localisation and classification in a single step. At test time the network predicts “micro-tubes” encompassing two successive frames, which are linked up into complete action tubes via a new algorithm which exploits the temporal encoding learned by the network and cuts computation time by 50%. Promising results on the J-HMDB-21 and UCF-101 action detection datasets show that our model does outperform the state-of-the-art when relying purely on appearance. |
Tasks | Action Detection |
Published | 2017-04-17 |
URL | http://arxiv.org/abs/1704.04952v2 |
http://arxiv.org/pdf/1704.04952v2.pdf | |
PWC | https://paperswithcode.com/paper/amtnet-action-micro-tube-regression-by-end-to |
Repo | |
Framework | |
Recurrent Multimodal Interaction for Referring Image Segmentation
Title | Recurrent Multimodal Interaction for Referring Image Segmentation |
Authors | Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille |
Abstract | In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e. referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segment images by combining these two types of representations. We argue that learning word-to-image interaction is more native in the sense of jointly modeling two modalities for the image segmentation task, and we propose convolutional multimodal LSTM to encode the sequential interactions between individual words, visual information, and spatial information. We show that our proposed model outperforms the baseline model on benchmark datasets. In addition, we analyze the intermediate output of the proposed multimodal LSTM approach and empirically explain how this approach enforces a more effective word-to-image interaction. |
Tasks | Semantic Segmentation |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07939v2 |
http://arxiv.org/pdf/1703.07939v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-multimodal-interaction-for |
Repo | |
Framework | |
Online learnability of Statistical Relational Learning in anomaly detection
Title | Online learnability of Statistical Relational Learning in anomaly detection |
Authors | Magnus Jändel, Pontus Svenson, Niclas Wadströmer |
Abstract | Statistical Relational Learning (SRL) methods for anomaly detection are introduced via a security-related application. Operational requirements for online learning stability are outlined and compared to mathematical definitions as applied to the learning process of a representative SRL method - Bayesian Logic Programs (BLP). Since a formal proof of online stability appears to be impossible, tentative common sense requirements are formulated and tested by theoretical and experimental analysis of a simple and analytically tractable BLP model. It is found that learning algorithms in initial stages of online learning can lock on unstable false predictors that nevertheless comply with our tentative stability requirements and thus masquerade as bona fide solutions. The very expressiveness of SRL seems to cause significant stability issues in settings with many variables and scarce data. We conclude that reliable anomaly detection with SRL-methods requires monitoring by an overarching framework that may involve a comprehensive context knowledge base or human supervision. |
Tasks | Anomaly Detection, Common Sense Reasoning, Relational Reasoning |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06573v1 |
http://arxiv.org/pdf/1705.06573v1.pdf | |
PWC | https://paperswithcode.com/paper/online-learnability-of-statistical-relational |
Repo | |
Framework | |
MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?
Title | MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data? |
Authors | Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang |
Abstract | We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying how to run a machine learning task, users only specify what machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free - a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question with a focus on binary classication problems. We present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on mlbench. Our comparative study reveals the strength and weakness of existing machine learning clouds and points out potential future directions for improvement. |
Tasks | |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09562v3 |
http://arxiv.org/pdf/1707.09562v3.pdf | |
PWC | https://paperswithcode.com/paper/mlbench-how-good-are-machine-learning-clouds |
Repo | |
Framework | |
Redefining Context Windows for Word Embedding Models: An Experimental Study
Title | Redefining Context Windows for Word Embedding Models: An Experimental Study |
Authors | Pierre Lison, Andrey Kutuzov |
Abstract | Distributional semantic models learn vector representations of words through the contexts they occur in. Although the choice of context (which often takes the form of a sliding window) has a direct influence on the resulting embeddings, the exact role of this model component is still not fully understood. This paper presents a systematic analysis of context windows based on a set of four distinct hyper-parameters. We train continuous Skip-Gram models on two English-language corpora for various combinations of these hyper-parameters, and evaluate them on both lexical similarity and analogy tasks. Notable experimental results are the positive impact of cross-sentential contexts and the surprisingly good performance of right-context windows. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05781v1 |
http://arxiv.org/pdf/1704.05781v1.pdf | |
PWC | https://paperswithcode.com/paper/redefining-context-windows-for-word-embedding |
Repo | |
Framework | |
Sparse Inverse Covariance Estimation for Chordal Structures
Title | Sparse Inverse Covariance Estimation for Chordal Structures |
Authors | Salar Fattahi, Richard Y. Zhang, Somayeh Sojoudi |
Abstract | In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can significantly be simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and drive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours. |
Tasks | Matrix Completion |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09131v1 |
http://arxiv.org/pdf/1711.09131v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-inverse-covariance-estimation-for |
Repo | |
Framework | |
Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature
Title | Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature |
Authors | Anqi Wu, Mikio C. Aoi, Jonathan W. Pillow |
Abstract | An exciting branch of machine learning research focuses on methods for learning, optimizing, and integrating unknown functions that are difficult or costly to evaluate. A popular Bayesian approach to this problem uses a Gaussian process (GP) to construct a posterior distribution over the function of interest given a set of observed measurements, and selects new points to evaluate using the statistics of this posterior. Here we extend these methods to exploit derivative information from the unknown function. We describe methods for Bayesian optimization (BO) and Bayesian quadrature (BQ) in settings where first and second derivatives may be evaluated along with the function itself. We perform sampling-based inference in order to incorporate uncertainty over hyperparameters, and show that both hyperparameter and function uncertainty decrease much more rapidly when using derivative information. Moreover, we introduce techniques for overcoming ill-conditioning issues that have plagued earlier methods for gradient-enhanced Gaussian processes and kriging. We illustrate the efficacy of these methods using applications to real and simulated Bayesian optimization and quadrature problems, and show that exploting derivatives can provide substantial gains over standard methods. |
Tasks | Gaussian Processes |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1704.00060v2 |
http://arxiv.org/pdf/1704.00060v2.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-gradients-and-hessians-in-bayesian |
Repo | |
Framework | |
Comparing Aggregators for Relational Probabilistic Models
Title | Comparing Aggregators for Relational Probabilistic Models |
Authors | Seyed Mehran Kazemi, Bahare Fatemi, Alexandra Kim, Zilun Peng, Moumita Roy Tora, Xing Zeng, Matthew Dirks, David Poole |
Abstract | Relational probabilistic models have the challenge of aggregation, where one variable depends on a population of other variables. Consider the problem of predicting gender from movie ratings; this is challenging because the number of movies per user and users per movie can vary greatly. Surprisingly, aggregation is not well understood. In this paper, we show that existing relational models (implicitly or explicitly) either use simple numerical aggregators that lose great amounts of information, or correspond to naive Bayes, logistic regression, or noisy-OR that suffer from overconfidence. We propose new simple aggregators and simple modifications of existing models that empirically outperform the existing ones. The intuition we provide on different (existing or new) models and their shortcomings plus our empirical findings promise to form the foundation for future representations. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07785v1 |
http://arxiv.org/pdf/1707.07785v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-aggregators-for-relational |
Repo | |
Framework | |
Low Permutation-rank Matrices: Structural Properties and Noisy Completion
Title | Low Permutation-rank Matrices: Structural Properties and Noisy Completion |
Authors | Nihar B. Shah, Sivaraman Balakrishnan, Martin J. Wainwright |
Abstract | We consider the problem of noisy matrix completion, in which the goal is to reconstruct a structured matrix whose entries are partially observed in noise. Standard approaches to this underdetermined inverse problem are based on assuming that the underlying matrix has low rank, or is well-approximated by a low rank matrix. In this paper, we propose a richer model based on what we term the “permutation-rank” of a matrix. We first describe how the classical non-negative rank model enforces restrictions that may be undesirable in practice, and how and these restrictions can be avoided by using the richer permutation-rank model. Second, we establish the minimax rates of estimation under the new permutation-based model, and prove that surprisingly, the minimax rates are equivalent up to logarithmic factors to those for estimation under the typical low rank model. Third, we analyze a computationally efficient singular-value-thresholding algorithm, known to be optimal for the low-rank setting, and show that it also simultaneously yields a consistent estimator for the low-permutation rank setting. Finally, we present various structural results characterizing the uniqueness of the permutation-rank decomposition, and characterizing convex approximations of the permutation-rank polytope. |
Tasks | Matrix Completion |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00127v1 |
http://arxiv.org/pdf/1709.00127v1.pdf | |
PWC | https://paperswithcode.com/paper/low-permutation-rank-matrices-structural |
Repo | |
Framework | |
Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts
Title | Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts |
Authors | Eliran Farhi, Amiram Moshaiov |
Abstract | The set-based concept approach has been suggested as a means to simultaneously explore different design concepts, which are meaningful sub-sets of the entire set of solutions. Previous efforts concerning the suggested approach focused on either revealing the global front (s-Pareto front), of all the concepts, or on finding the concepts’ fronts, within a relaxation zone. In contrast, here the aim is to reveal which of the concepts have at least one solution with a performance vector within a pre-defined window-of-interest (WOI). This paper provides the rational for this new concept-based exploration problem, and suggests a WOI-based rather than Pareto-based multi-objective evolutionary algorithm. The proposed algorithm, which simultaneously explores different concepts, is tested using a recently suggested concept-based benchmarking approach. The numerical study of this paper shows that the algorithm can cope with various numerical difficulties in a simultaneous way, which outperforms a sequential exploration approach. |
Tasks | |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.00936v1 |
http://arxiv.org/pdf/1707.00936v1.pdf | |
PWC | https://paperswithcode.com/paper/window-of-interest-based-multi-objective |
Repo | |
Framework | |
Two-Stream RNN/CNN for Action Recognition in 3D Videos
Title | Two-Stream RNN/CNN for Action Recognition in 3D Videos |
Authors | Rui Zhao, Haider Ali, Patrick van der Smagt |
Abstract | The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets. |
Tasks | Temporal Action Localization |
Published | 2017-03-22 |
URL | http://arxiv.org/abs/1703.09783v2 |
http://arxiv.org/pdf/1703.09783v2.pdf | |
PWC | https://paperswithcode.com/paper/two-stream-rnncnn-for-action-recognition-in |
Repo | |
Framework | |
Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks
Title | Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks |
Authors | John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani |
Abstract | Deep neural networks (DNNs) have excellent representative power and are state of the art classifiers on many tasks. However, they often do not capture their own uncertainties well making them less robust in the real world as they overconfidently extrapolate and do not notice domain shift. Gaussian processes (GPs) with RBF kernels on the other hand have better calibrated uncertainties and do not overconfidently extrapolate far from data in their training set. However, GPs have poor representational power and do not perform as well as DNNs on complex domains. In this paper we show that GP hybrid deep networks, GPDNNs, (GPs on top of DNNs and trained end-to-end) inherit the nice properties of both GPs and DNNs and are much more robust to adversarial examples. When extrapolating to adversarial examples and testing in domain shift settings, GPDNNs frequently output high entropy class probabilities corresponding to essentially “don’t know”. GPDNNs are therefore promising as deep architectures that know when they don’t know. |
Tasks | Gaussian Processes |
Published | 2017-07-08 |
URL | http://arxiv.org/abs/1707.02476v1 |
http://arxiv.org/pdf/1707.02476v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-uncertainty-and-transfer |
Repo | |
Framework | |
A graph cut approach to 3D tree delineation, using integrated airborne LiDAR and hyperspectral imagery
Title | A graph cut approach to 3D tree delineation, using integrated airborne LiDAR and hyperspectral imagery |
Authors | Juheon Lee, David Coomes, Carola-Bibiane Schonlieb, Xiaohao Cai, Jan Lellmann, Michele Dalponte, Yadvinder Malhi, Nathalie Butt, Mike Morecroft |
Abstract | Recognising individual trees within remotely sensed imagery has important applications in forest ecology and management. Several algorithms for tree delineation have been suggested, mostly based on locating local maxima or inverted basins in raster canopy height models (CHMs) derived from Light Detection And Ranging (LiDAR) data or photographs. However, these algorithms often lead to inaccurate estimates of forest stand characteristics due to the limited information content of raster CHMs. Here we develop a 3D tree delineation method which uses graph cut to delineate trees from the full 3D LiDAR point cloud, and also makes use of any optical imagery available (hyperspectral imagery in our case). First, conventional methods are used to locate local maxima in the CHM and generate an initial map of trees. Second, a graph is built from the LiDAR point cloud, fused with the hyperspectral data. For computational efficiency, the feature space of hyperspectral imagery is reduced using robust PCA. Third, a multi-class normalised cut is applied to the graph, using the initial map of trees to constrain the number of clusters and their locations. Finally, recursive normalised cut is used to subdivide, if necessary, each of the clusters identified by the initial analysis. We call this approach Multiclass Cut followed by Recursive Cut (MCRC). The effectiveness of MCRC was tested using three datasets: i) NewFor, ii) a coniferous forest in the Italian Alps, and iii) a deciduous woodland in the UK. The performance of MCRC was usually superior to that of other delineation methods, and was further improved by including high-resolution optical imagery. Since MCRC delineates the entire LiDAR point cloud in 3D, it allows individual crown characteristics to be measured. By making full use of the data available, graph cut has the potential to considerably improve the accuracy of tree delineation. |
Tasks | |
Published | 2017-01-24 |
URL | http://arxiv.org/abs/1701.06715v1 |
http://arxiv.org/pdf/1701.06715v1.pdf | |
PWC | https://paperswithcode.com/paper/a-graph-cut-approach-to-3d-tree-delineation |
Repo | |
Framework | |