Paper Group AWR 20
Bayesian optimization of hyper-parameters in reservoir computing. Modeling Context in Referring Expressions. Aggregated Residual Transformations for Deep Neural Networks. Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation. Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization. …
Bayesian optimization of hyper-parameters in reservoir computing
Title | Bayesian optimization of hyper-parameters in reservoir computing |
Authors | Jan Yperman, Thijs Becker |
Abstract | We describe a method for searching the optimal hyper-parameters in reservoir computing, which consists of a Gaussian process with Bayesian optimization. It provides an alternative to other frequently used optimization methods such as grid, random, or manual search. In addition to a set of optimal hyper-parameters, the method also provides a probability distribution of the cost function as a function of the hyper-parameters. We apply this method to two types of reservoirs: nonlinear delay nodes and echo state networks. It shows excellent performance on all considered benchmarks, either matching or significantly surpassing results found in the literature. In general, the algorithm achieves optimal results in fewer iterations when compared to other optimization methods. We have optimized up to six hyper-parameters simultaneously, which would have been infeasible using, e.g., grid search. Due to its automated nature, this method significantly reduces the need for expert knowledge when optimizing the hyper-parameters in reservoir computing. Existing software libraries for Bayesian optimization, such as Spearmint, make the implementation of the algorithm straightforward. A fork of the Spearmint framework along with a tutorial on how to use it in practice is available at https://bitbucket.org/uhasseltmachinelearning/spearmint/ |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05193v3 |
PDF | http://arxiv.org/pdf/1611.05193v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-optimization-of-hyper-parameters-in |
Repo | https://github.com/rednotion/parallel_esn_web |
Framework | none |
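The workflow described above pairs a Gaussian-process surrogate with Bayesian optimization to tune reservoir hyper-parameters. As a rough, hedged illustration of that loop (not the paper's Spearmint setup), the sketch below tunes a toy echo state network's spectral radius, input scaling, and ridge penalty with `skopt.gp_minimize`; the prediction task, reservoir size, and search ranges are all placeholder choices.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

# Toy one-step-ahead prediction task standing in for the paper's benchmarks.
t = np.arange(3000)
u = np.sin(0.2 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
y = np.roll(u, -1)  # target: the next input value

def esn_nrmse(params):
    """Validation NRMSE of a small echo state network for one hyper-parameter setting."""
    spectral_radius, input_scaling, ridge = params
    rng = np.random.default_rng(1)          # fixed reservoir so the objective is deterministic
    n = 200
    W = rng.standard_normal((n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = input_scaling * rng.standard_normal(n)
    x, states = np.zeros(n), np.zeros((u.size, n))
    for k in range(u.size):
        x = np.tanh(W @ x + W_in * u[k])
        states[k] = x
    X_tr, y_tr = states[100:2000], y[100:2000]      # drop washout, then train/validation split
    X_va, y_va = states[2000:-1], y[2000:-1]
    w = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n), X_tr.T @ y_tr)
    err = X_va @ w - y_va
    return float(np.sqrt(np.mean(err ** 2)) / np.std(y_va))

space = [Real(0.1, 1.5, name="spectral_radius"),
         Real(0.1, 2.0, name="input_scaling"),
         Real(1e-8, 1e-1, prior="log-uniform", name="ridge")]
result = gp_minimize(esn_nrmse, space, n_calls=30, random_state=0)
print("best hyper-parameters:", result.x, "validation NRMSE:", result.fun)
```

`gp_minimize` fits a Gaussian process to the observed costs and picks each new trial through an acquisition function, which is the general pattern the paper relies on via its Spearmint fork rather than this exact code.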
Modeling Context in Referring Expressions
Title | Modeling Context in Referring Expressions |
Authors | Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg |
Abstract | Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets - RefCOCO, RefCOCO+, and RefCOCOg - shows the advantages of our methods for both referring expression generation and comprehension. |
Tasks | Text Generation |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00272v3 |
PDF | http://arxiv.org/pdf/1608.00272v3.pdf |
PWC | https://paperswithcode.com/paper/modeling-context-in-referring-expressions |
Repo | https://github.com/lichengunc/refer |
Framework | none |
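The key modeling idea above is that visual context, specifically comparison to other objects of the same category, helps referring expression models. The snippet below is a hypothetical sketch of such comparison features (mean normalized appearance difference plus relative location/size offsets); the feature choices, array shapes, and box encoding are assumptions for illustration, not the paper's actual definitions.

```python
import numpy as np

def visual_comparison_features(target_feat, target_box, other_feats, other_boxes):
    """Context features in the spirit of 'compare the target to same-category objects':
    mean normalized appearance difference plus mean relative location/size offsets.
    Purely illustrative feature choices."""
    diffs = target_feat - other_feats                        # (num_others, feat_dim)
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True) + 1e-8
    appearance_delta = diffs.mean(axis=0)

    tx, ty, tw, th = target_box                              # boxes as (x, y, w, h)
    loc_delta = np.mean([[(tx - ox) / tw, (ty - oy) / th, (ow * oh) / (tw * th)]
                         for ox, oy, ow, oh in other_boxes], axis=0)
    return np.concatenate([appearance_delta, loc_delta])

rng = np.random.default_rng(0)
feats = visual_comparison_features(rng.random(512), (10, 20, 50, 80),
                                   rng.random((3, 512)),
                                   [(70, 20, 40, 90), (5, 100, 60, 70), (120, 30, 55, 85)])
print(feats.shape)  # (515,): would be concatenated with the target's own CNN feature
```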
Aggregated Residual Transformations for Deep Neural Networks
Title | Aggregated Residual Transformations for Deep Neural Networks |
Authors | Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He |
Abstract | We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online. |
Tasks | Image Classification |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05431v2 |
PDF | http://arxiv.org/pdf/1611.05431v2.pdf |
PWC | https://paperswithcode.com/paper/aggregated-residual-transformations-for-deep |
Repo | https://github.com/guilherme-pombo/keras_resnext_fpn |
Framework | none |
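Since the block design is the heart of the paper, here is a minimal PyTorch sketch of an aggregated bottleneck block in the spirit of ResNeXt, where cardinality is realized as a grouped 3x3 convolution. The specific channel sizes and cardinality below are common choices used only for illustration; this is not the authors' released code.

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Aggregated residual block: 'cardinality' parallel paths via a grouped conv."""
    def __init__(self, in_ch, bottleneck_ch, out_ch, cardinality=32, stride=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, bottleneck_ch, 1, bias=False),
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, bottleneck_ch, 3, stride=stride, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1 else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.net(x) + self.shortcut(x))

# e.g. a 256-in/256-out block with a 128-wide grouped bottleneck (32 groups x 4 channels)
block = ResNeXtBlock(256, 128, 256, cardinality=32)
print(block(torch.randn(2, 256, 56, 56)).shape)  # torch.Size([2, 256, 56, 56])
```

Increasing `cardinality` while keeping `bottleneck_ch` proportional is the knob the paper studies, alongside depth and width.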
Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation
Title | Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation |
Authors | Yuhang Lu, Youchuan Wan, Gang Li |
Abstract | Unsupervised evaluation of segmentation quality is a crucial step in image segmentation applications. Previous unsupervised evaluation methods usually lacked the adaptability to multi-scale segmentation. A scale-constrained evaluation method that evaluates segmentation quality according to the specified target scale is proposed in this paper. First, regional saliency and merging cost are employed to describe intra-region homogeneity and inter-region heterogeneity, respectively. Subsequently, both of them are standardized into equivalent spectral distances of a predefined region. Finally, by analyzing the relationship between image characteristics and segmentation quality, we establish the evaluation model. Experimental results show that the proposed method outperforms four commonly used unsupervised methods in multi-scale evaluation tasks. |
Tasks | Semantic Segmentation |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04850v1 |
PDF | http://arxiv.org/pdf/1611.04850v1.pdf |
PWC | https://paperswithcode.com/paper/scale-constrained-unsupervised-evaluation |
Repo | https://github.com/Rudy423/SegEvaluation |
Framework | none |
Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization
Title | Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization |
Authors | Hongyang Xue, Shengming Zhang, Deng Cai |
Abstract | We consider the case of inpainting single depth images. Without corresponding color images, previous or next frames, depth image inpainting is quite challenging. One natural solution is to regard the image as a matrix and adopt the low rank regularization just as when inpainting color images. However, the low rank assumption does not make full use of the properties of depth images. A shallow observation may inspire us to penalize the non-zero gradients by sparse gradient regularization. However, statistics show that though most pixels have zero gradients, there is still a non-ignorable part of pixels whose gradients are equal to 1. Based on this specific property of depth images, we propose a low gradient regularization method in which we reduce the penalty for gradient 1 while penalizing the non-zero gradients to allow for gradual depth changes. The proposed low gradient regularization is integrated with the low rank regularization into the low rank low gradient approach for depth image inpainting. We compare our proposed low gradient regularization with sparse gradient regularization. The experimental results show the effectiveness of our proposed approach. |
Tasks | Image Inpainting, Low-Rank Matrix Completion, Matrix Completion |
Published | 2016-04-20 |
URL | http://arxiv.org/abs/1604.05817v1 |
PDF | http://arxiv.org/pdf/1604.05817v1.pdf |
PWC | https://paperswithcode.com/paper/depth-image-inpainting-improving-low-rank |
Repo | https://github.com/xuehy/depthInpainting |
Framework | none |
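As a hedged illustration of the "low gradient" idea above (charge a reduced cost for gradient magnitude 1 while still penalizing other non-zero gradients), here is a toy penalty compared against a plain sparse-gradient count. The functional form and the weight are assumptions; the paper's actual regularizer, and its combination with the low-rank term inside an inpainting solver, differ.

```python
import numpy as np

def low_gradient_penalty(depth, weight_one=0.1):
    """Toy prior: zero cost for flat regions, a reduced cost for unit gradients
    (gradual depth changes), full cost for larger jumps."""
    def cost(g):
        g = np.abs(g)
        return np.where(g == 0, 0.0, np.where(g == 1, weight_one, 1.0)).sum()
    return cost(np.diff(depth, axis=1)) + cost(np.diff(depth, axis=0))

def sparse_gradient_penalty(depth):
    """Plain L0-style gradient count for comparison."""
    return np.count_nonzero(np.diff(depth, axis=1)) + np.count_nonzero(np.diff(depth, axis=0))

ramp = np.tile(np.arange(64.0), (64, 1))   # gently sloping depth map (unit gradients everywhere)
print("sparse-gradient cost:", sparse_gradient_penalty(ramp),
      "low-gradient cost:", low_gradient_penalty(ramp))
```

The sloping surface is heavily punished by the sparse-gradient count but only lightly by the low-gradient version, which is the behaviour the abstract motivates.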
Auditing Black-box Models for Indirect Influence
Title | Auditing Black-box Models for Indirect Influence |
Authors | Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian |
Abstract | Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the dataset, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures. |
Tasks | Feature Selection |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07043v2 |
PDF | http://arxiv.org/pdf/1602.07043v2.pdf |
PWC | https://paperswithcode.com/paper/auditing-black-box-models-for-indirect |
Repo | https://github.com/cfalk/BlackBoxAuditing |
Framework | tf |
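The procedure sketched below is a simplified, linear stand-in for the auditing idea described above, not the authors' implementation: to probe the indirect influence of feature j on a model that is never retrained, it blanks out j and removes the linearly predictable trace of j from every other feature, then records the accuracy drop. The dataset and black-box model are arbitrary placeholders.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)   # treated as opaque
baseline = black_box.score(X_te, y_te)

def indirect_influence(j):
    """Accuracy drop when feature j and its linear trace in other features are obscured."""
    X_obs = X_te.copy()
    X_obs[:, j] = X_tr[:, j].mean()                  # blank out the feature itself
    for k in range(X.shape[1]):
        if k == j:
            continue
        reg = LinearRegression().fit(X_tr[:, [j]], X_tr[:, k])
        X_obs[:, k] = X_te[:, k] - (reg.predict(X_te[:, [j]]) - X_tr[:, k].mean())
    return baseline - black_box.score(X_obs, y_te)

drops = sorted(((indirect_influence(j), j) for j in range(X.shape[1])), reverse=True)
print("largest accuracy drops (influence, feature index):", drops[:3])
```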
Low-effort place recognition with WiFi fingerprints using deep learning
Title | Low-effort place recognition with WiFi fingerprints using deep learning |
Authors | Michał Nowicki, Jan Wietrzykowski |
Abstract | Using WiFi signals for indoor localization is the main localization modality of the existing personal indoor localization systems operating on mobile devices. WiFi fingerprinting is also used for mobile robots, as WiFi signals are usually available indoors and can provide a rough initial position estimate or can be used together with other positioning systems. Currently, the best solutions rely on filtering, manual data analysis, and time-consuming parameter tuning to achieve reliable and accurate localization. In this work, we propose to use deep neural networks to significantly lower the work-force burden of the localization system design, while still achieving satisfactory results. Assuming the state-of-the-art hierarchical approach, we employ the DNN system for building/floor classification. We show that stacked autoencoders allow us to efficiently reduce the feature space in order to achieve robust and precise classification. The proposed architecture is verified on the publicly available UJIIndoorLoc dataset and the results are compared with other solutions. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02049v2 |
PDF | http://arxiv.org/pdf/1611.02049v2.pdf |
PWC | https://paperswithcode.com/paper/low-effort-place-recognition-with-wifi |
Repo | https://github.com/vohoaiviet/indoor_localization |
Framework | none |
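A compact sketch of the kind of pipeline the abstract describes: an autoencoder compresses the high-dimensional vector of WiFi RSSI readings, then a small classifier on the code predicts a building/floor class. The layer sizes, training loops, and the 520-access-point / 13-class figures (taken from the UJIIndoorLoc setting) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

N_APS, N_CLASSES = 520, 13   # UJIIndoorLoc-style: 520 access points, 13 building/floor combos

encoder = nn.Sequential(nn.Linear(N_APS, 256), nn.ReLU(),
                        nn.Linear(256, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                        nn.Linear(256, N_APS))
classifier = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                           nn.Linear(128, N_CLASSES))

def pretrain_autoencoder(x, epochs=20):
    """Unsupervised stage: learn a compact code of the RSSI vector."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(x)), x)
        loss.backward(); opt.step()

def train_classifier(x, y, epochs=20):
    """Supervised stage: fine-tune the encoder together with the classifier head."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(classifier(encoder(x)), y)
        loss.backward(); opt.step()

# toy usage with random tensors standing in for normalized RSSI fingerprints
x = torch.rand(256, N_APS)
y = torch.randint(0, N_CLASSES, (256,))
pretrain_autoencoder(x)
train_classifier(x, y)
```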
Can Evolutionary Sampling Improve Bagged Ensembles?
Title | Can Evolutionary Sampling Improve Bagged Ensembles? |
Authors | Harsh Nisar, Bhanu Pratap Singh Rawat |
Abstract | The Perturb and Combine (P&C) group of methods generates multiple versions of a predictor by perturbing the training set or the construction procedure and then combining them into a single predictor (Breiman, 1996b). The motive is to improve accuracy for unstable classification and regression methods. One of the best-known methods in this group is Bagging. Arcing or Adaptive Resampling and Combining methods like AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES). We employ evolutionary algorithms to suggest smarter sampling in both the feature space (sub-spaces) and the training samples. We discuss multiple fitness functions to assess ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00465v1 |
PDF | http://arxiv.org/pdf/1610.00465v1.pdf |
PWC | https://paperswithcode.com/paper/can-evolutionary-sampling-improve-bagged |
Repo | https://github.com/evoml/evoml |
Framework | none |
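A toy take on evolutionary sampling, to make the loop concrete: evolve boolean feature-subspace masks, score each mask by the cross-validated accuracy of a tree trained on that subspace, and keep the fittest masks as the ensemble members. The paper's operators and fitness functions are richer (and also cover the training samples); this only shows the shape of such a loop under arbitrary dataset and hyper-parameter choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features, pop_size, generations = X.shape[1], 20, 10

def fitness(mask):
    """Cross-validated accuracy of a tree restricted to the masked feature subspace."""
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0), X[:, mask], y, cv=3).mean()

population = rng.random((pop_size, n_features)) < 0.5        # random initial masks
for _ in range(generations):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]] # keep the best half
    children = parents ^ (rng.random(parents.shape) < 0.05)   # bit-flip mutation
    population = np.vstack([parents, children])

best = population[np.argsort([fitness(m) for m in population])[-5:]]
ensemble = [DecisionTreeClassifier(random_state=i).fit(X[:, m], y) for i, m in enumerate(best)]
votes = np.mean([t.predict(X[:, m]) for t, m in zip(ensemble, best)], axis=0)
print("training accuracy of the evolved ensemble:", ((votes > 0.5) == y).mean())
```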
Wide Residual Networks
Title | Wide Residual Networks |
Authors | Sergey Zagoruyko, Nikos Komodakis |
Abstract | Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior to their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks |
Tasks | Image Classification |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07146v4 |
PDF | http://arxiv.org/pdf/1605.07146v4.pdf |
PWC | https://paperswithcode.com/paper/wide-residual-networks |
Repo | https://github.com/meliketoy/wide-resnet.pytorch |
Framework | pytorch |
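To make the "decrease depth, increase width" idea concrete, here is a minimal PyTorch sketch of a pre-activation basic block where a widening factor k simply multiplies the channel counts. It mirrors the general WRN design rather than the authors' released code; the k=10 example corresponds to the WRN-28-10 widths (16k, 32k, 64k across the three groups) and is used only for illustration.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation BN-ReLU-conv basic block; 'wide' means larger channel counts."""
    def __init__(self, in_planes, out_planes, stride=1, dropout=0.0):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, 3, padding=1, bias=False)
        self.drop = nn.Dropout(dropout)
        self.shortcut = (nn.Identity() if stride == 1 and in_planes == out_planes else
                         nn.Conv2d(in_planes, out_planes, 1, stride=stride, bias=False))

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(self.drop(torch.relu(self.bn2(out))))
        return out + self.shortcut(x)

k = 10                                   # widening factor
block = WideBasicBlock(16, 16 * k)       # first group of a WRN-28-10-style network
print(block(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 160, 32, 32])
```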
Sigma Delta Quantized Networks
Title | Sigma Delta Quantized Networks |
Authors | Peter O’Connor, Max Welling |
Abstract | Deep neural networks can be obscenely wasteful. When processing video, a convolutional network expends a fixed amount of computation for each frame with no regard to the similarity between neighbouring frames. As a result, it ends up repeatedly doing very similar computations. To put an end to such waste, we introduce Sigma-Delta networks. With each new input, each layer in this network sends a discretized form of its change in activation to the next layer. Thus the amount of computation that the network does scales with the amount of change in the input and layer activations, rather than the size of the network. We introduce an optimization method for converting any pre-trained deep network into an optimally efficient Sigma-Delta network, and show that our algorithm, if run on the appropriate hardware, could cut at least an order of magnitude from the computational cost of processing video data. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02024v2 |
PDF | http://arxiv.org/pdf/1611.02024v2.pdf |
PWC | https://paperswithcode.com/paper/sigma-delta-quantized-networks |
Repo | https://github.com/petered/sigma-delta |
Framework | none |
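The core mechanism described above, each layer transmitting a discretized version of its change in activation, can be illustrated in a few lines of NumPy. The quantization step and the toy "video" input below are arbitrary; this is a sketch of the encoding idea, not the paper's network-conversion algorithm.

```python
import numpy as np

def sigma_delta_encode(activations, quant_step=0.05):
    """Transmit only the quantized difference between each new activation vector and
    what the receiving layer has reconstructed so far; the quantization error is
    carried over, so small changes are eventually sent rather than lost."""
    sent = np.zeros(activations.shape, dtype=int)
    reconstruction = np.zeros(activations.shape[1])
    for t, a in enumerate(activations):
        sent[t] = np.round((a - reconstruction) / quant_step).astype(int)
        reconstruction += sent[t] * quant_step
    return sent

# slowly varying "activations" of 8 units across 100 consecutive video frames
frames = np.cumsum(0.01 * np.random.default_rng(0).standard_normal((100, 8)), axis=0)
counts = sigma_delta_encode(frames)
print("fraction of zero messages:", (counts == 0).mean())   # most messages are zero
```

Because computation downstream only needs to process the non-zero messages, the work scales with how much the input changes, which is the efficiency argument made in the abstract.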
Adjusting for Dropout Variance in Batch Normalization and Weight Initialization
Title | Adjusting for Dropout Variance in Batch Normalization and Weight Initialization |
Authors | Dan Hendrycks, Kevin Gimpel |
Abstract | We show how to adjust for the variance introduced by dropout with corrections to weight initialization and Batch Normalization, yielding higher accuracy. Though dropout can preserve the expected input to a neuron between train and test, the variance of the input differs. We thus propose a new weight initialization by correcting for the influence of dropout rates and an arbitrary nonlinearity’s influence on variance through simple corrective scalars. Since Batch Normalization trained with dropout estimates the variance of a layer’s incoming distribution with some inputs dropped, the variance also differs between train and test. After training a network with Batch Normalization and dropout, we simply update Batch Normalization’s variance moving averages with dropout off and obtain state of the art on CIFAR-10 and CIFAR-100 without data augmentation. |
Tasks | Data Augmentation |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02488v2 |
PDF | http://arxiv.org/pdf/1607.02488v2.pdf |
PWC | https://paperswithcode.com/paper/adjusting-for-dropout-variance-in-batch |
Repo | https://github.com/hendrycks/init |
Framework | tf |
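Two practical takeaways from the abstract can be sketched quickly: fold the dropout keep probability into the weight-initialization scale, and after training re-estimate BatchNorm's moving averages with dropout switched off. The corrective factor below (an extra 1/sqrt(keep_prob) on top of a He-style init) and the helper names are assumptions for illustration; the paper derives its own scalars, including a nonlinearity-dependent term.

```python
import torch
import torch.nn as nn

def dropout_aware_init_(linear, keep_prob, gain=2.0 ** 0.5):
    """Hypothetical helper: He-style normal init with an extra 1/sqrt(keep_prob)
    factor so the post-dropout variance stays near the intended level."""
    fan_in = linear.weight.shape[1]
    nn.init.normal_(linear.weight, std=gain / (fan_in * keep_prob) ** 0.5)
    nn.init.zeros_(linear.bias)

def refresh_bn_stats(model, loader):
    """After training: update BatchNorm moving averages with dropout disabled, so
    the stored variance matches what the network sees at test time."""
    model.train()                            # BN keeps updating its running statistics
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.eval()                         # ...but no units are dropped
    with torch.no_grad():
        for x, _ in loader:
            model(x)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.3),
                      nn.Linear(256, 256), nn.BatchNorm1d(256), nn.ReLU(),
                      nn.Linear(256, 10))
for layer in model:
    if isinstance(layer, nn.Linear):
        dropout_aware_init_(layer, keep_prob=0.7)    # matches nn.Dropout(0.3)
loader = [(torch.randn(64, 784), torch.zeros(64, dtype=torch.long))]  # stand-in data
refresh_bn_stats(model, loader)
```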
The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
Title | The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family |
Authors | Alexandre de Brébisson, Pascal Vincent |
Abstract | Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems we are able to tackle with current hardware. Second, it remains unclear how closely it matches the task loss, such as the top-k error rate or other non-differentiable evaluation metrics which we aim to optimize ultimately. In this paper, we introduce an alternative classification loss function, the Z-loss, which is designed to address these two issues. Unlike the log-softmax, it has the desirable property of belonging to the spherical loss family (Vincent et al., 2015), a class of loss functions for which training can be performed very efficiently with a complexity independent of the number of output classes. We show experimentally that it significantly outperforms the other spherical loss functions previously investigated. Furthermore, we show on a word language modeling task that it also outperforms the log-softmax with respect to certain ranking scores, such as top-k scores, suggesting that the Z-loss has the flexibility to better match the task loss. These qualities thus make the Z-loss an appealing candidate for efficiently training networks with very large outputs, such as word-language models or other extreme classification problems. On the One Billion Word (Chelba et al., 2014) dataset, we are able to train a model with the Z-loss 40 times faster than the log-softmax and more than 4 times faster than the hierarchical softmax. |
Tasks | Language Modelling |
Published | 2016-04-29 |
URL | http://arxiv.org/abs/1604.08859v2 |
PDF | http://arxiv.org/pdf/1604.08859v2.pdf |
PWC | https://paperswithcode.com/paper/the-z-loss-a-shift-and-scale-invariant |
Repo | https://github.com/pascal20100/factored_output_layer |
Framework | none |
GLEU Without Tuning
Title | GLEU Without Tuning |
Authors | Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault |
Abstract | The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). This paper describes improvements made to the GLEU metric that address problems that arise when using an increasing number of reference sets. Unlike the originally presented metric, the modified metric does not require tuning. We recommend that this version be used instead of the original version. |
Tasks | |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02592v1 |
PDF | http://arxiv.org/pdf/1605.02592v1.pdf |
PWC | https://paperswithcode.com/paper/gleu-without-tuning |
Repo | https://github.com/cnap/gec-ranking |
Framework | none |
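For intuition only, here is a toy, GLEU-flavoured scorer: n-gram precision of a correction against a reference, with hypothesis n-grams that come from the source but not the reference counted against it. This is not the official, tuning-free GLEU formula; the authors' gec-ranking repository linked above is the reference implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_gec_score(source, hypothesis, reference, max_n=4):
    """Toy score: average over n of (matched reference n-grams minus n-grams copied
    from the source that the reference rejects) divided by hypothesis n-gram count."""
    score = 0.0
    for n in range(1, max_n + 1):
        hyp, ref, src = (ngrams(t.split(), n) for t in (hypothesis, reference, source))
        matches = sum((hyp & ref).values())
        penalties = sum((hyp & (src - ref)).values())
        score += max(matches - penalties, 0) / max(sum(hyp.values()), 1)
    return score / max_n

print(simple_gec_score("he go to school", "he goes to school", "he goes to school"))
# 1.0: the hypothesis matches the reference exactly
```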
Max-Margin Deep Generative Models for (Semi-)Supervised Learning
Title | Max-Margin Deep Generative Models for (Semi-)Supervised Learning |
Authors | Chongxuan Li, Jun Zhu, Bo Zhang |
Abstract | Deep generative models (DGMs) are effective at learning multilayered representations of complex data and performing inference of input data by exploring the generative ability. However, the discriminative ability of DGMs to make accurate predictions remains relatively limited. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs), which explore the strongly discriminative principle of max-margin learning to improve the predictive performance of DGMs in both supervised and semi-supervised learning, while retaining the generative capability. In semi-supervised learning, we use the predictions of a max-margin classifier as the missing labels instead of performing full posterior inference for efficiency; we also introduce additional max-margin and label-balance regularization terms of unlabeled data for effectiveness. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objectives in different settings. Empirical results on various datasets demonstrate that: (1) max-margin learning can significantly improve the prediction performance of DGMs and meanwhile retain the generative ability; (2) in supervised learning, mmDGMs are competitive with the best fully discriminative networks when employing convolutional neural networks as the generative and recognition models; and (3) in semi-supervised learning, mmDCGMs can perform efficient inference and achieve state-of-the-art classification results on several benchmarks. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07119v1 |
PDF | http://arxiv.org/pdf/1611.07119v1.pdf |
PWC | https://paperswithcode.com/paper/max-margin-deep-generative-models-for-semi |
Repo | https://github.com/thu-ml/mmdcgm-ssl |
Framework | none |
Auxiliary Deep Generative Models
Title | Auxiliary Deep Generative Models |
Authors | Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, Ole Winther |
Abstract | Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables which improves the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variable we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on MNIST, SVHN and NORB datasets. |
Tasks | |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05473v4 |
PDF | http://arxiv.org/pdf/1602.05473v4.pdf |
PWC | https://paperswithcode.com/paper/auxiliary-deep-generative-models |
Repo | https://github.com/larsmaaloee/auxiliary-deep-generative-models |
Framework | none |