Paper Group ANR 239
Data Dependent Convergence for Distributed Stochastic Optimization. Pruning Random Forests for Prediction on a Budget. In the Shadows, Shape Priors Shine: Using Occlusion to Improve Multi-Region Segmentation. Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks. AutoGP: Exploring the Capabilities …
Data Dependent Convergence for Distributed Stochastic Optimization
Title | Data Dependent Convergence for Distributed Stochastic Optimization |
Authors | Avleen S. Bijral |
Abstract | In this dissertation we propose an alternative analysis of distributed stochastic gradient descent (SGD) algorithms that relies on spectral properties of the data covariance. As a consequence, we can relate questions pertaining to speedups and convergence rates for distributed SGD to the data distribution instead of the regularity properties of the objective functions. More precisely, we show that this rate depends on the spectral norm of the sample covariance matrix. An estimate of this norm can provide practitioners with guidance towards a potential gain in algorithm performance. For example, many sparse datasets with low spectral norm prove to be amenable to gains in distributed settings. Towards establishing this data dependence, we first study a distributed consensus-based SGD algorithm and show that the rate of convergence involves the spectral norm of the sample covariance matrix when the underlying data is assumed to be independent and identically distributed (homogeneous). This dependence allows us to identify network regimes that prove to be beneficial for datasets with low sample covariance spectral norm. Existing consensus-based analyses prove to be sub-optimal in the homogeneous setting. Our analysis method also allows us to find data-dependent convergence rates as we limit the amount of communication. Spreading a fixed amount of data across more nodes slows convergence; in the asymptotic regime, however, we show that adding more machines can help when minimizing twice-differentiable losses. Since the mini-batch results do not follow from the consensus results, we propose a different data-dependent analysis, thereby providing theoretical validation for why certain datasets are more amenable to mini-batching. We also provide empirical evidence for the results in this thesis. |
Tasks | Stochastic Optimization |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08337v1 |
http://arxiv.org/pdf/1608.08337v1.pdf | |
PWC | https://paperswithcode.com/paper/data-dependent-convergence-for-distributed |
Repo | |
Framework | |
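The abstract above links distributed-SGD speedups to the spectral norm of the sample covariance matrix. As a quick diagnostic, a practitioner can estimate that norm before deciding whether distributing or mini-batching is likely to pay off; the sketch below only illustrates that check (the random data and normalization choices are assumptions, not the thesis's experiments).

```python
import numpy as np

def sample_covariance_spectral_norm(X):
    """Spectral norm of the sample covariance of X (n samples x d features).
    A small value suggests the dataset may be amenable to gains from
    distributed or mini-batched SGD, per the data-dependent bounds above."""
    Xc = X - X.mean(axis=0, keepdims=True)   # center each feature
    cov = (Xc.T @ Xc) / X.shape[0]           # sample covariance matrix
    return np.linalg.norm(cov, 2)            # largest singular value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_sparse = rng.binomial(1, 0.01, size=(2000, 500)).astype(float)  # sparse-like data
    X_dense = rng.normal(size=(2000, 500))
    print("sparse:", sample_covariance_spectral_norm(X_sparse))
    print("dense :", sample_covariance_spectral_norm(X_dense))
```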
Pruning Random Forests for Prediction on a Budget
Title | Pruning Random Forests for Prediction on a Budget |
Authors | Feng Nan, Joseph Wang, Venkatesh Saligrama |
Abstract | We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms. |
Tasks | Combinatorial Optimization |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05060v1 |
http://arxiv.org/pdf/1606.05060v1.pdf | |
PWC | https://paperswithcode.com/paper/pruning-random-forests-for-prediction-on-a |
Repo | |
Framework | |
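The pruning formulation above is a 0-1 integer program over nodes with feature re-use constraints, solved exactly through its LP relaxation. The sketch below illustrates only the relax-then-round idea on a much simpler, hypothetical tree-level variant (keep whole trees under a feature-cost budget); the node-level program, the total-unimodularity argument, and the primal-dual solver are in the paper, not here.

```python
import numpy as np
from scipy.optimize import linprog

def prune_trees_lp(tree_accuracy, tree_cost, budget):
    """Toy LP relaxation: choose trees maximizing an accuracy proxy under a
    feature-cost budget, then round. Tree-level selection is a simplification
    of the paper's node-level integer program."""
    n = len(tree_accuracy)
    c = -np.asarray(tree_accuracy, float)                # maximize -> minimize negative
    A_ub = np.asarray(tree_cost, float).reshape(1, n)    # total feature cost <= budget
    res = linprog(c, A_ub=A_ub, b_ub=[budget],
                  bounds=[(0, 1)] * n, method="highs")
    return res.x > 0.5, res.x                            # rounded keep-mask, relaxed solution

keep, z = prune_trees_lp([0.80, 0.70, 0.65, 0.90], [3.0, 1.0, 1.0, 4.0], budget=5.0)
print(keep, z)
```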
In the Shadows, Shape Priors Shine: Using Occlusion to Improve Multi-Region Segmentation
Title | In the Shadows, Shape Priors Shine: Using Occlusion to Improve Multi-Region Segmentation |
Authors | Yuka Kihara, Matvey Soloviev, Tsuhan Chen |
Abstract | We present a new algorithm for multi-region segmentation of 2D images with objects that may partially occlude each other. Our algorithm is based on the observation that human performance on this task relies both on prior knowledge about plausible shapes and on taking into account the presence of occluding objects whose shape is already known - once an occluded region is identified, the shape prior can be used to guess the shape of the missing part. We capture the former aspect using a deep learning model of shape; for the latter, we simultaneously minimize the energy of all regions and consider only unoccluded pixels for data agreement. Existing algorithms incorporating object shape priors consider every object separately in turn and cannot distinguish genuine deviation from the expected shape from parts missing due to occlusion. We show that our method significantly improves on the performance of a representative algorithm, as evaluated on both preprocessed natural and synthetic images. Furthermore, on the synthetic images, we recover the ground truth segmentation with good accuracy. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04590v1 |
http://arxiv.org/pdf/1606.04590v1.pdf | |
PWC | https://paperswithcode.com/paper/in-the-shadows-shape-priors-shine-using |
Repo | |
Framework | |
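One concrete piece of the method above is that the data-agreement term scores only unoccluded pixels, so a known occluder cannot pull a region's shape away from its prior. A minimal sketch of such a masked data term follows; the scalar appearance model is an assumption standing in for the paper's richer appearance and deep shape-prior terms.

```python
import numpy as np

def masked_data_energy(image, region_mean, occlusion_mask):
    """Data-agreement energy over unoccluded pixels only.
    image          : (H, W) grayscale image
    region_mean    : scalar appearance model for the region (placeholder)
    occlusion_mask : (H, W) bool, True where the region is hidden by a known occluder"""
    visible = ~occlusion_mask
    return float(((image - region_mean) ** 2)[visible].sum())  # occluded pixels contribute nothing
```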
Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks
Title | Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks |
Authors | Wentao Zhu, Cuiling Lan, Junliang Xing, Wenjun Zeng, Yanghao Li, Li Shen, Xiaohui Xie |
Abstract | Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based action recognition. Inspired by the observation that the co-occurrences of the joints intrinsically characterize human actions, we take the skeleton as the input at each time slot and introduce a novel regularization scheme to learn the co-occurrence features of skeleton joints. To train the deep LSTM network effectively, we propose a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons. Experimental results on three human action recognition datasets consistently demonstrate the effectiveness of the proposed model. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07772v1 |
http://arxiv.org/pdf/1603.07772v1.pdf | |
PWC | https://paperswithcode.com/paper/co-occurrence-feature-learning-for-skeleton |
Repo | |
Framework | |
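The co-occurrence regularizer described above acts on the LSTM weights so that groups of joints are selected together. The sketch below shows one plausible group-sparsity penalty on the first layer's input-to-hidden weights; the exact regularizer, the joint/coordinate ordering, and the gate/cell/output dropout scheme are specified in the paper, so treat the details here as assumptions.

```python
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    """Minimal skeleton-sequence classifier used only to illustrate the penalty."""
    def __init__(self, num_joints=25, coords=3, hidden=128, num_classes=60):
        super().__init__()
        self.num_joints, self.coords = num_joints, coords
        self.lstm = nn.LSTM(num_joints * coords, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                        # x: (batch, time, joints*coords)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])               # classify from the last time step

    def cooccurrence_penalty(self):
        # Group the first layer's input-to-hidden weights by joint and apply an
        # L2,1-style norm, encouraging whole joints to be used (or dropped) together.
        W = self.lstm.weight_ih_l0                # (4*hidden, num_joints*coords)
        W = W.view(W.shape[0], self.num_joints, self.coords)
        return torch.sqrt((W ** 2).sum(dim=(0, 2))).sum()

# training loss (sketch): cross_entropy(model(x), y) + lam * model.cooccurrence_penalty()
```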
AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models
Title | AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models |
Authors | Karl Krauth, Edwin V. Bonilla, Kurt Cutajar, Maurizio Filippone |
Abstract | We investigate the capabilities and limitations of Gaussian process models by jointly exploring three complementary directions: (i) scalable and statistically efficient inference; (ii) flexible kernels; and (iii) objective functions for hyperparameter learning alternative to the marginal likelihood. Our approach outperforms all previously reported GP methods on the standard MNIST dataset; performs comparably to previous kernel-based methods on the RECTANGLES-IMAGE dataset; and breaks the 1% error-rate barrier for GP models on the MNIST8M dataset, demonstrating along the way the scalability of our method at an unprecedented scale for GP models (8 million observations) in classification problems. Overall, our approach represents a significant breakthrough in kernel methods and GP models, bridging the gap between deep learning approaches and kernel machines. |
Tasks | |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05392v3 |
http://arxiv.org/pdf/1610.05392v3.pdf | |
PWC | https://paperswithcode.com/paper/autogp-exploring-the-capabilities-and |
Repo | |
Framework | |
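Direction (iii) above replaces the marginal likelihood with a different hyperparameter-learning objective. As a hedged, regression-only illustration (the paper works with scalable classification and its own objective), the snippet below scores RBF hyperparameters by predictive log-likelihood on a held-out split instead of the marginal likelihood; the kernel form and noise level are assumptions.

```python
import numpy as np

def rbf(X1, X2, lengthscale, variance):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def heldout_log_likelihood(lengthscale, variance, X_tr, y_tr, X_va, y_va, noise=0.1):
    """Predictive log-likelihood of a GP regressor on a validation split,
    one simple alternative objective to the marginal likelihood."""
    K = rbf(X_tr, X_tr, lengthscale, variance) + noise ** 2 * np.eye(len(X_tr))
    Ks = rbf(X_va, X_tr, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mu = Ks @ alpha                                       # predictive mean
    v = np.linalg.solve(L, Ks.T)
    var_pred = variance + noise ** 2 - (v ** 2).sum(0)    # predictive variance
    return float((-0.5 * np.log(2 * np.pi * var_pred)
                  - 0.5 * (y_va - mu) ** 2 / var_pred).sum())
```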
Clustering with Missing Features: A Penalized Dissimilarity Measure based approach
Title | Clustering with Missing Features: A Penalized Dissimilarity Measure based approach |
Authors | Shounak Datta, Supritam Bhattacharjee, Swagatam Das |
Abstract | Many real-world clustering problems are plagued by incomplete data characterized by missing or absent features for some or all of the data instances. Traditional clustering methods cannot be directly applied to such data without preprocessing by imputation or marginalization techniques. In this article, we overcome this drawback by utilizing a penalized dissimilarity measure which we refer to as the Feature Weighted Penalty based Dissimilarity (FWPD). Using the FWPD measure, we modify the traditional k-means clustering algorithm and the standard hierarchical agglomerative clustering algorithms so as to make them directly applicable to datasets with missing features. We present time complexity analyses for these new techniques and also undertake a detailed theoretical analysis showing that the new FWPD-based k-means algorithm converges to a local optimum within a finite number of iterations. We also present a detailed method for simulating random as well as feature-dependent missingness. We report extensive experiments on various benchmark datasets for different types of missingness, showing that the proposed clustering techniques generally yield better results than some of the most well-known imputation methods commonly used to handle such incomplete data. We append a possible extension of the proposed dissimilarity measure to the case of absent features (where the unobserved features are known to be undefined). |
Tasks | Imputation |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06602v7 |
http://arxiv.org/pdf/1604.06602v7.pdf | |
PWC | https://paperswithcode.com/paper/clustering-with-missing-features-a-penalized |
Repo | |
Framework | |
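The FWPD measure above augments the distance over commonly observed features with a penalty for features that are not observed in both points, which is what lets k-means and agglomerative clustering run directly on incomplete data. A hedged sketch follows; the paper weights the penalty by per-feature observation frequencies, so the uniform weighting here is a simplification.

```python
import numpy as np

def fwpd_like(x, y, penalty_weight=1.0):
    """Penalized dissimilarity for vectors with NaN-encoded missing entries:
    squared distance over features observed in both points, plus a penalty
    growing with the number of features missing in either point."""
    observed_both = ~np.isnan(x) & ~np.isnan(y)
    dist = np.sum((x[observed_both] - y[observed_both]) ** 2)
    penalty = penalty_weight * np.sum(~observed_both) / x.size
    return dist + penalty

# usage (sketch): plug fwpd_like in place of Euclidean distance inside the
# assignment step of k-means or the linkage computation of agglomerative clustering.
```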
Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition
Title | Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition |
Authors | Jeehye Lee, Myungin Lee, Joon-Hyuk Chang |
Abstract | Distant speech recognition is a challenge, particularly due to the corruption of speech signals by reverberation caused by large distances between the speaker and microphone. In order to cope with a wide range of reverberations in real-world situations, we present novel approaches for acoustic modeling including an ensemble of deep neural networks (DNNs) and an ensemble of jointly trained DNNs. First, multiple DNNs are established, each of which corresponds to a different reverberation time 60 (RT60) in a setup step. Also, each model in the ensemble of DNN acoustic models is further jointly trained, including both feature mapping and acoustic modeling, where the feature mapping is designed for dereverberation as a front-end. In the testing phase, the two most likely DNNs are chosen from the DNN ensemble using maximum a posteriori (MAP) probabilities, computed in an online fashion by using maximum likelihood (ML)-based blind RT60 estimation, and the posterior probability outputs from the two DNNs are then combined using the ML-based weights as a simple average. Extensive experiments demonstrate that the proposed approach leads to substantial improvements in speech recognition accuracy over conventional DNN baseline systems under diverse reverberant conditions. |
Tasks | Distant Speech Recognition, Speech Recognition |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.04983v1 |
http://arxiv.org/pdf/1608.04983v1.pdf | |
PWC | https://paperswithcode.com/paper/ensemble-of-jointly-trained-deep-neural |
Repo | |
Framework | |
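The combination step described above picks the two DNNs whose RT60 best matches the blindly estimated one and merges their posteriors. The snippet below sketches that selection and a weighted merge (the abstract can also be read as a plain average); the blind RT60 estimator itself and the exact weighting are out of scope here and treated as assumptions.

```python
import numpy as np

def combine_top2_posteriors(frame_posteriors, rt60_probs):
    """frame_posteriors : (num_models, num_states) per-frame senone posteriors
    rt60_probs        : (num_models,) MAP probabilities from blind RT60 estimation
    Returns the merged posterior from the two most likely acoustic models."""
    top2 = np.argsort(rt60_probs)[-2:]                 # two most likely DNNs
    w = rt60_probs[top2] / rt60_probs[top2].sum()      # normalized weights
    return w @ frame_posteriors[top2]                  # weighted combination
```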
Achieving Budget-optimality with Adaptive Schemes in Crowdsourcing
Title | Achieving Budget-optimality with Adaptive Schemes in Crowdsourcing |
Authors | Ashish Khetan, Sewoong Oh |
Abstract | Crowdsourcing platforms provide marketplaces where task requesters can pay to get labels on their data. Such markets have emerged recently as popular venues for collecting annotations that are crucial in training machine learning models in various applications. However, as jobs are tedious and payments are low, errors are common in such crowdsourced labels. A common strategy to overcome such noise in the answers is to add redundancy by getting multiple answers for each task and aggregating them using some methods such as majority voting. For such a system, there is a fundamental question of interest: how can we maximize the accuracy given a fixed budget on how many responses we can collect on the crowdsourcing system. We characterize this fundamental trade-off between the budget (how many answers the requester can collect in total) and the accuracy in the estimated labels. In particular, we ask whether adaptive task assignment schemes lead to a more efficient trade-off between the accuracy and the budget. Adaptive schemes, where tasks are assigned adaptively based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently use a given fixed budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourced annotations. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy. We introduce a novel adaptive scheme that matches this fundamental limit. We further quantify the fundamental gap between adaptive and non-adaptive schemes, by comparing the trade-off with the one for non-adaptive schemes. Our analyses confirm that the gap is significant. |
Tasks | |
Published | 2016-02-10 |
URL | http://arxiv.org/abs/1602.03481v3 |
http://arxiv.org/pdf/1602.03481v3.pdf | |
PWC | https://paperswithcode.com/paper/achieving-budget-optimality-with-adaptive |
Repo | |
Framework | |
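The core question above is whether adaptive assignment buys anything over fixed redundancy plus majority voting. The sketch below is only a generic adaptive baseline in that spirit (spend the remaining budget on the tasks whose running majority is least confident); the paper's scheme, the generalized Dawid-Skene model, and the optimality guarantees are considerably more involved, and `ask_worker` is an assumed callback.

```python
import numpy as np

def adaptive_labels(ask_worker, num_tasks, budget, init_per_task=3):
    """Adaptive redundancy baseline: after an initial round, route extra queries
    to the tasks whose running vote margin is smallest, then return majority labels.
    ask_worker(task_id) is assumed to return a noisy answer in {-1, +1}."""
    votes = np.zeros(num_tasks)
    counts = np.zeros(num_tasks)
    for t in range(num_tasks):
        for _ in range(init_per_task):
            votes[t] += ask_worker(t)
            counts[t] += 1
    remaining = budget - num_tasks * init_per_task
    for _ in range(max(0, remaining)):
        margin = np.abs(votes) / np.maximum(counts, 1)
        t = int(np.argmin(margin))            # least confident task gets the next query
        votes[t] += ask_worker(t)
        counts[t] += 1
    return np.sign(votes)                     # aggregated labels (0 means a tie)
```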
Dynamic landscape models of coevolutionary games
Title | Dynamic landscape models of coevolutionary games |
Authors | Hendrik Richter |
Abstract | Players of coevolutionary games may update not only their strategies but also their networks of interaction. Based on interpreting the payoff of players as fitness, dynamic landscape models are proposed. The modeling procedure is carried out for Prisoner’s Dilemma (PD) and Snowdrift (SD) games that both use either birth–death (BD) or death–birth (DB) strategy updating. The main focus is on using dynamic fitness landscapes as a mathematical model of coevolutionary game dynamics. Hence, an alternative tool for analyzing coevolutionary games becomes available, and landscape measures such as modality, ruggedness and information content can be computed and analyzed. In addition, fixation properties of the games and quantifiers characterizing the interaction networks are calculated numerically. Relations are established between landscape properties expressed by landscape measures and quantifiers of coevolutionary game dynamics such as fixation probabilities, fixation times and network properties. |
Tasks | |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09149v2 |
http://arxiv.org/pdf/1611.09149v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-landscape-models-of-coevolutionary |
Repo | |
Framework | |
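The games above evolve under birth-death (BD) or death-birth (DB) strategy updating on an interaction network. For readers unfamiliar with the terminology, the sketch below shows one BD step for a networked Prisoner's Dilemma in donation-game form; the payoff parameters and the exponential fitness mapping are assumptions, and the paper's landscape measures are not reproduced here.

```python
import numpy as np

def birth_death_step(adj, strategies, b=1.5, c=1.0, rng=None):
    """One birth-death update: a player reproduces with probability proportional
    to fitness and its strategy replaces that of a uniformly chosen neighbour.
    strategies[i] = 1 (cooperate) or 0 (defect); adj is a 0/1 adjacency matrix."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(strategies)
    payoff = np.zeros(n)
    for i in range(n):
        for j in np.flatnonzero(adj[i]):
            payoff[i] += b * strategies[j] - c * strategies[i]   # donation-game payoff
    fitness = np.exp(payoff)                                     # keep fitness positive
    parent = rng.choice(n, p=fitness / fitness.sum())
    neighbours = np.flatnonzero(adj[parent])
    if neighbours.size:
        strategies[rng.choice(neighbours)] = strategies[parent]
    return strategies
```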
Learning by tracking: Siamese CNN for robust target association
Title | Learning by tracking: Siamese CNN for robust target association |
Authors | Laura Leal-Taixé, Cristian Canton Ferrer, Konrad Schindler |
Abstract | This paper introduces a novel approach to the task of data association within the context of pedestrian tracking, by introducing a two-stage learning scheme to match pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structures between the two input image patches, aggregating pixel values and optical flow information. Second, a set of contextual features derived from the position and size of the compared input patches are combined with the CNN output by means of a gradient boosting classifier to generate the final matching probability. This learning approach is validated by using a linear programming based multi-person tracker showing that even a simple and efficient tracker may outperform much more complex models when fed with our learned matching probabilities. Results on publicly available sequences show that our method meets state-of-the-art standards in multiple people tracking. |
Tasks | Multiple People Tracking, Optical Flow Estimation |
Published | 2016-04-26 |
URL | http://arxiv.org/abs/1604.07866v3 |
http://arxiv.org/pdf/1604.07866v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-by-tracking-siamese-cnn-for-robust |
Repo | |
Framework | |
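The second stage described above stacks the Siamese-CNN matching score with simple contextual features and lets a gradient boosting classifier produce the final matching probability. A minimal sketch of that stage follows; the CNN scores are treated as given inputs and the particular contextual features are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_association_stage(cnn_scores, context_features, labels):
    """cnn_scores       : (n,) Siamese-CNN matching scores for detection pairs
    context_features : (n, k) e.g. relative position and size of the two boxes
    labels           : (n,) 1 if the pair belongs to the same pedestrian"""
    X = np.column_stack([cnn_scores, context_features])
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    clf.fit(X, labels)
    return clf

# usage (sketch): p = clf.predict_proba(X_new)[:, 1] gives the matching
# probabilities consumed by the linear-programming tracker.
```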
Hierarchically Compositional Kernels for Scalable Nonparametric Learning
Title | Hierarchically Compositional Kernels for Scalable Nonparametric Learning |
Authors | Jie Chen, Haim Avron, Vikas Sindhwani |
Abstract | We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nyström method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression or a classifier are reduced from $O(n^2)$ and $O(n^3)$ to $O(nr)$ and $O(nr^2)$, respectively, where $n$ is the number of training examples and $r$ is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves a matching performance with a smaller $r$. We demonstrate comprehensive experiments to show the effective use of the proposed kernel on data sizes up to the order of millions. |
Tasks | |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00860v2 |
http://arxiv.org/pdf/1608.00860v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchically-compositional-kernels-for |
Repo | |
Framework | |
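The kernel above combines a global Nyström (low-rank) approximation with lossless local blocks, nested hierarchically. The one-level sketch below conveys the idea only: the diagonal blocks of a partition are kept exact while cross-block entries come from the Nyström part; the recursive construction and the positive-definiteness guarantee are the paper's, not this snippet's.

```python
import numpy as np

def rbf_gram(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def one_level_hierarchical_kernel(X, partition, landmarks, gamma=0.5):
    """partition: list of index arrays covering the rows of X; landmarks: (m, d).
    Cross-cell similarities use the Nystrom low-rank form, within-cell blocks are exact."""
    K_nm = rbf_gram(X, landmarks, gamma)
    K_mm_inv = np.linalg.pinv(rbf_gram(landmarks, landmarks, gamma))
    K = K_nm @ K_mm_inv @ K_nm.T                              # global low-rank part
    for idx in partition:
        K[np.ix_(idx, idx)] = rbf_gram(X[idx], X[idx], gamma)  # lossless local blocks
    return K
```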
Parallel Stroked Multi Line: a model-based method for compressing large fingerprint databases
Title | Parallel Stroked Multi Line: a model-based method for compressing large fingerprint databases |
Authors | Hamid Mansouri, Hamid-Reza Pourreza |
Abstract | With the increasing usage of fingerprints as important biometric data, the need to compress large fingerprint databases has become essential. The most recommended compression algorithm, even by standards, is JPEG2K, but at high compression rates this algorithm is ineffective. In this paper, a model is proposed which is based on parallel lines with the same orientation, arbitrary widths and the same gray level, located on a rectangle with a constant gray level as background. We refer to this model as Parallel Stroked Multi Line (PSML). By using Adaptive Geometrical Wavelet and employing PSML, a compression algorithm is developed. This compression algorithm can preserve fingerprint structure and minutiae. The exact algorithm for computing the PSML model takes exponential time; however, we propose an alternative approximation algorithm which reduces the time complexity to $O(n^3)$. The proposed PSML algorithm has a significant advantage over the Wedgelets Transform in PSNR value and visual quality of compressed images. The proposed method, despite lower PSNR values than the JPEG2K algorithm in the common range of compression rates, has a nearly equal or greater advantage over JPEG2K at all compression rates when used by Automatic Fingerprint Identification Systems (AFIS). At high compression rates, according to PSNR values, mean EER rate and visual quality, images encoded with JPEG2K cannot be identified from each other after compression, whereas images encoded by the PSML algorithm retain sufficient information to maintain fingerprint identification performance similar to that obtained with raw, uncompressed images. On the U.are.U 400 database, the mean EER rate for uncompressed images is 4.54%, while at a 267:1 compression ratio this value becomes 49.41% for JPEG2K and 6.22% for PSML. This result shows a significant improvement over the standard JPEG2K algorithm. |
Tasks | |
Published | 2016-01-10 |
URL | http://arxiv.org/abs/1601.02225v1 |
http://arxiv.org/pdf/1601.02225v1.pdf | |
PWC | https://paperswithcode.com/paper/parallel-stroked-multi-line-a-model-based |
Repo | |
Framework | |
Learning a Static Analyzer from Data
Title | Learning a Static Analyzer from Data |
Authors | Pavol Bielik, Veselin Raychev, Martin Vechev |
Abstract | To be practically useful, modern static analyzers must precisely model the effects of both the statements in the programming language and the frameworks used by the program under analysis. While important, manually addressing these challenges is difficult for at least two reasons: (i) the effects on the overall analysis can be non-trivial, and (ii) as the size and complexity of modern libraries increase, so does the number of cases the analysis must handle. In this paper we present a new, automated approach for creating static analyzers: instead of manually providing the various inference rules of the analyzer, the key idea is to learn these rules from a dataset of programs. Our method consists of two ingredients: (i) a synthesis algorithm capable of learning a candidate analyzer from a given dataset, and (ii) a counter-example guided learning procedure which generates new programs beyond those in the initial dataset, critical for discovering corner cases and ensuring the learned analysis generalizes to unseen programs. We implemented and instantiated our approach to the task of learning JavaScript static analysis rules for a subset of points-to analysis and for allocation-site analysis. These are challenging yet important problems that have received significant research attention. We show that our approach is effective: our system automatically discovered practical and useful inference rules for many cases that are tricky to identify manually and are missed by state-of-the-art, manually tuned analyzers. |
Tasks | |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01752v2 |
http://arxiv.org/pdf/1611.01752v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-static-analyzer-from-data |
Repo | |
Framework | |
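The learning procedure above alternates between synthesizing candidate rules from a dataset of programs and generating counter-example programs on which the candidate disagrees with the ground-truth analysis. The loop below is a hedged skeleton of that interaction; `synthesize`, `oracle_agrees`, and `mutate` are assumed callbacks, and the actual synthesis algorithm and program generation are in the paper.

```python
def learn_analyzer(initial_programs, synthesize, oracle_agrees, mutate, max_rounds=50):
    """Counter-example guided learning skeleton:
    synthesize(dataset)        -> candidate analysis rules
    oracle_agrees(rules, prog) -> True if the candidate matches ground truth on prog
    mutate(prog)               -> a new program derived from prog"""
    dataset = list(initial_programs)
    rules = synthesize(dataset)
    for _ in range(max_rounds):
        counter_examples = [p for p in map(mutate, dataset)
                            if not oracle_agrees(rules, p)]
        if not counter_examples:
            return rules                      # no disagreement found
        dataset.extend(counter_examples)      # grow the dataset with corner cases
        rules = synthesize(dataset)           # and re-synthesize
    return rules
```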
Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks
Title | Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks |
Authors | Lukas Cavigelli, Dominic Bernath, Michele Magno, Luca Benini |
Abstract | Detecting and classifying targets in video streams from surveillance cameras is a cumbersome, error-prone and expensive task. Often, the incurred costs are prohibitive for real-time monitoring. This leads to data being stored locally or transmitted to a central storage site for post-incident examination. The required communication links and archiving of the video data are still expensive and this setup excludes preemptive actions to respond to imminent threats. An effective way to overcome these limitations is to build a smart camera that transmits alerts when relevant video sequences are detected. Deep neural networks (DNNs) have come to outperform humans in visual classification tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be extended to make use of higher-dimensional input data such as multispectral data. We explore this opportunity in terms of achievable accuracy and required computational effort. To analyze the precision of DNNs for scene labeling in an urban surveillance scenario we have created a dataset with 8 classes obtained in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR snapshot sensor to assess the potential of multispectral image data for target classification. We evaluate several new DNNs, showing that the spectral information fused together with the RGB frames can be used to improve the accuracy of the system or to achieve similar accuracy with a 3x smaller computation effort. We achieve a very high per-pixel accuracy of 99.1%. Even for scarcely occurring, but particularly interesting classes, such as cars, 75% of the pixels are labeled correctly with errors occurring only around the borders of the objects. This high accuracy was obtained with a training set of only 30 labeled images, paving the way for fast adaptation to various application scenarios. |
Tasks | Scene Labeling |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.03130v1 |
http://arxiv.org/pdf/1611.03130v1.pdf | |
PWC | https://paperswithcode.com/paper/computationally-efficient-target |
Repo | |
Framework | |
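Architecturally, the fusion described above amounts to feeding the network 28 input channels (3 RGB + 25 VIS-NIR) and predicting one of 8 classes per pixel. The sketch below shows a small fully convolutional network of that shape; the layer widths and depths are illustrative assumptions, not the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class MultispectralSceneNet(nn.Module):
    """Per-pixel scene-labelling ConvNet over fused RGB + VIS-NIR input."""
    def __init__(self, in_channels=28, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)  # per-pixel scores

    def forward(self, x):                          # x: (batch, 28, H, W)
        return self.classifier(self.features(x))   # (batch, num_classes, H, W)
```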
On the emergence of syntactic structures: quantifying and modelling duality of patterning
Title | On the emergence of syntactic structures: quantifying and modelling duality of patterning |
Authors | Vittorio Loreto, Pietro Gravino, Vito D. P. Servedio, Francesca Tria |
Abstract | The complex organization of syntax in hierarchical structures is one of the core design features of human language. Duality of patterning refers, for instance, to the organization of the meaningful elements in a language at two distinct levels: a combinatorial level, where meaningless forms are combined into meaningful forms, and a compositional level, where meaningful forms are composed into larger lexical units. The question of how such a structure could have emerged remains wide open, and a clear mathematical framework to quantify this phenomenon is still lacking. The aim of this paper is to address these two aspects in a self-consistent way. First, we introduce suitable measures to quantify the level of combinatoriality and compositionality in a language, and present a framework to estimate these observables in human natural languages. Second, we show that the theoretical predictions of a multi-agent modeling scheme, namely the Blending Game, are in surprisingly good agreement with empirical data. In the Blending Game, a population of individuals plays language games aiming at success in communication. It is remarkable that the two sides of duality of patterning emerge simultaneously as a consequence of purely cultural dynamics in a simulated environment that contains meaningful relations, provided a simple constraint on message transmission fidelity is also considered. |
Tasks | |
Published | 2016-02-11 |
URL | http://arxiv.org/abs/1602.03661v1 |
http://arxiv.org/pdf/1602.03661v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-emergence-of-syntactic-structures |
Repo | |
Framework | |