May 7, 2019

2942 words 14 mins read

Paper Group AWR 20

Bayesian optimization of hyper-parameters in reservoir computing. Modeling Context in Referring Expressions. Aggregated Residual Transformations for Deep Neural Networks. Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation. Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization. …

Bayesian optimization of hyper-parameters in reservoir computing

Title Bayesian optimization of hyper-parameters in reservoir computing
Authors Jan Yperman, Thijs Becker
Abstract We describe a method for searching the optimal hyper-parameters in reservoir computing, which consists of a Gaussian process with Bayesian optimization. It provides an alternative to other frequently used optimization methods such as grid, random, or manual search. In addition to a set of optimal hyper-parameters, the method also provides a probability distribution of the cost function as a function of the hyper-parameters. We apply this method to two types of reservoirs: nonlinear delay nodes and echo state networks. It shows excellent performance on all considered benchmarks, either matching or significantly surpassing results found in the literature. In general, the algorithm achieves optimal results in fewer iterations when compared to other optimization methods. We have optimized up to six hyper-parameters simultaneously, which would have been infeasible using, e.g., grid search. Due to its automated nature, this method significantly reduces the need for expert knowledge when optimizing the hyper-parameters in reservoir computing. Existing software libraries for Bayesian optimization, such as Spearmint, make the implementation of the algorithm straightforward. A fork of the Spearmint framework along with a tutorial on how to use it in practice is available at https://bitbucket.org/uhasseltmachinelearning/spearmint/
Tasks
Published 2016-11-16
URL http://arxiv.org/abs/1611.05193v3
PDF http://arxiv.org/pdf/1611.05193v3.pdf
PWC https://paperswithcode.com/paper/bayesian-optimization-of-hyper-parameters-in
Repo https://github.com/rednotion/parallel_esn_web
Framework none
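The hyper-parameter search the paper describes can be reproduced with any Gaussian-process Bayesian optimization library. Below is a minimal sketch using scikit-optimize rather than the authors' Spearmint fork; `esn_validation_error` is a hypothetical stand-in for training a reservoir with the given hyper-parameters and returning its validation error, and the search ranges are illustrative.

```python
from skopt import gp_minimize
from skopt.space import Real

def esn_validation_error(params):
    """Hypothetical stand-in: train an echo state network with these hyper-parameters and
    return its validation error. A smooth analytic placeholder is used so the sketch runs."""
    spectral_radius, input_scaling, leak_rate, ridge = params
    return (spectral_radius - 0.9) ** 2 + (leak_rate - 0.3) ** 2 + 0.01 * abs(input_scaling - 1.0) + ridge

search_space = [
    Real(0.1, 1.5, name="spectral_radius"),
    Real(1e-3, 10.0, prior="log-uniform", name="input_scaling"),
    Real(0.05, 1.0, name="leak_rate"),
    Real(1e-8, 1e-1, prior="log-uniform", name="ridge"),
]

# GP surrogate + acquisition function choose the next hyper-parameter setting to evaluate
result = gp_minimize(esn_validation_error, search_space, n_calls=30, random_state=0)
print("best hyper-parameters:", result.x, "lowest validation error:", result.fun)
```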

Modeling Context in Referring Expressions

Title Modeling Context in Referring Expressions
Authors Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg
Abstract Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets (RefCOCO, RefCOCO+, and RefCOCOg) shows the advantages of our methods for both referring expression generation and comprehension.
Tasks Text Generation
Published 2016-07-31
URL http://arxiv.org/abs/1608.00272v3
PDF http://arxiv.org/pdf/1608.00272v3.pdf
PWC https://paperswithcode.com/paper/modeling-context-in-referring-expressions
Repo https://github.com/lichengunc/refer
Framework none
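A rough numpy sketch of the kind of visual-comparison context feature the paper advocates: the target region's CNN feature is contrasted with the features of other same-category objects in the image, and averaged relative location/size offsets are appended. Feature extraction, the exact normalization, and the number of comparison objects follow the paper; everything below is illustrative.

```python
import numpy as np

def visual_context_feature(target_feat, target_box, same_category_objects):
    """target_feat: (d,) CNN feature; target_box: (x, y, w, h);
    same_category_objects: list of (feat, box) for other objects of the same category."""
    if not same_category_objects:
        return np.concatenate([np.zeros_like(target_feat), np.zeros(3)])
    # averaged, normalized appearance differences to the comparison objects
    diffs = [(target_feat - f) / (np.linalg.norm(target_feat - f) + 1e-8)
             for f, _ in same_category_objects]
    appearance_delta = np.mean(diffs, axis=0)
    # averaged relative location and size offsets
    x, y, w, h = target_box
    offsets = [((bx - x) / w, (by - y) / h, (bw * bh) / (w * h))
               for _, (bx, by, bw, bh) in same_category_objects]
    location_delta = np.mean(np.array(offsets), axis=0)
    return np.concatenate([appearance_delta, location_delta])
```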

Aggregated Residual Transformations for Deep Neural Networks

Title Aggregated Residual Transformations for Deep Neural Networks
Authors Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Abstract We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.
Tasks Image Classification
Published 2016-11-16
URL http://arxiv.org/abs/1611.05431v2
PDF http://arxiv.org/pdf/1611.05431v2.pdf
PWC https://paperswithcode.com/paper/aggregated-residual-transformations-for-deep
Repo https://github.com/guilherme-pombo/keras_resnext_fpn
Framework none
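The aggregated transformations reduce, in practice, to a bottleneck block whose 3x3 convolution is grouped by the cardinality. A minimal PyTorch sketch of such a block (the linked repo is Keras), with illustrative channel sizes:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Bottleneck block with aggregated transformations via grouped convolution."""
    def __init__(self, in_channels, out_channels, cardinality=32, bottleneck_width=4, stride=1):
        super().__init__()
        width = cardinality * bottleneck_width
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, width, 1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=stride, padding=1,
                      groups=cardinality, bias=False),      # the "cardinality" dimension
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.net(x) + self.shortcut(x))

x = torch.randn(2, 256, 56, 56)
print(ResNeXtBlock(256, 256)(x).shape)  # torch.Size([2, 256, 56, 56])
```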

Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation

Title Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation
Authors Yuhang Lu, Youchuan Wan, Gang Li
Abstract Unsupervised evaluation of segmentation quality is a crucial step in image segmentation applications. Previous unsupervised evaluation methods usually lacked the adaptability to multi-scale segmentation. A scale-constrained evaluation method that evaluates segmentation quality according to the specified target scale is proposed in this paper. First, regional saliency and merging cost are employed to describe intra-region homogeneity and inter-region heterogeneity, respectively. Subsequently, both of them are standardized into equivalent spectral distances of a predefined region. Finally, by analyzing the relationship between image characteristics and segmentation quality, we establish the evaluation model. Experimental results show that the proposed method outperforms four commonly used unsupervised methods in multi-scale evaluation tasks.
Tasks Semantic Segmentation
Published 2016-11-15
URL http://arxiv.org/abs/1611.04850v1
PDF http://arxiv.org/pdf/1611.04850v1.pdf
PWC https://paperswithcode.com/paper/scale-constrained-unsupervised-evaluation
Repo https://github.com/Rudy423/SegEvaluation
Framework none
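As a loose illustration of the two ingredients the abstract names, the sketch below computes an intra-region homogeneity term and an inter-region heterogeneity term from a grayscale image and a label map. Plain within-region variance and between-region mean differences are used as stand-ins for the paper's regional saliency and merging cost; the actual scale-constrained evaluation model is in the paper.

```python
import numpy as np

def homogeneity_and_heterogeneity(image, labels):
    """image: (H, W) grayscale array; labels: (H, W) integer region ids."""
    region_ids = np.unique(labels)
    means = np.array([image[labels == r].mean() for r in region_ids])
    # intra-region homogeneity: average within-region variance (lower is better)
    intra = np.mean([image[labels == r].var() for r in region_ids])
    # inter-region heterogeneity: average spectral distance between region means (higher is better)
    diffs = np.abs(means[:, None] - means[None, :])
    inter = diffs[np.triu_indices(len(region_ids), k=1)].mean() if len(region_ids) > 1 else 0.0
    return intra, inter
```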

Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization

Title Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization
Authors Hongyang Xue, Shengming Zhang, Deng Cai
Abstract We consider the case of inpainting single depth images. Without corresponding color images, previous or next frames, depth image inpainting is quite challenging. One natural solution is to regard the image as a matrix and adopt the low rank regularization just as in inpainting color images. However, the low rank assumption does not make full use of the properties of depth images. A shallow observation may inspire us to penalize the non-zero gradients by sparse gradient regularization. However, statistics show that though most pixels have zero gradients, there is still a non-negligible fraction of pixels whose gradients are equal to 1. Based on this specific property of depth images, we propose a low gradient regularization method in which we reduce the penalty for gradient 1 while penalizing the non-zero gradients to allow for gradual depth changes. The proposed low gradient regularization is integrated with the low rank regularization into the low rank low gradient approach for depth image inpainting. We compare our proposed low gradient regularization with sparse gradient regularization. The experimental results show the effectiveness of our proposed approach.
Tasks Image Inpainting, Low-Rank Matrix Completion, Matrix Completion
Published 2016-04-20
URL http://arxiv.org/abs/1604.05817v1
PDF http://arxiv.org/pdf/1604.05817v1.pdf
PWC https://paperswithcode.com/paper/depth-image-inpainting-improving-low-rank
Repo https://github.com/xuehy/depthInpainting
Framework none
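A small numpy sketch of the low-gradient idea: zero gradients are free, gradients of exactly 1 receive a reduced penalty (allowing gradual depth changes), and larger gradients pay the full sparse-gradient cost. The reduced weight is illustrative, and the paper couples this regularizer with low-rank matrix completion, which is omitted here.

```python
import numpy as np

def low_gradient_penalty(depth, reduced_weight=0.5):
    """depth: integer-valued depth/disparity image; reduced_weight: cost for |gradient| == 1."""
    gx = np.abs(np.diff(depth.astype(int), axis=1))
    gy = np.abs(np.diff(depth.astype(int), axis=0))
    def cost(g):
        # 0 for flat regions, a reduced cost for one-level steps, full cost otherwise
        return np.where(g == 0, 0.0, np.where(g == 1, reduced_weight, 1.0))
    return cost(gx).sum() + cost(gy).sum()
```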

Auditing Black-box Models for Indirect Influence

Title Auditing Black-box Models for Indirect Influence
Authors Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian
Abstract Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the dataset, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures.
Tasks Feature Selection
Published 2016-02-23
URL http://arxiv.org/abs/1602.07043v2
PDF http://arxiv.org/pdf/1602.07043v2.pdf
PWC https://paperswithcode.com/paper/auditing-black-box-models-for-indirect
Repo https://github.com/cfalk/BlackBoxAuditing
Framework tf
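A minimal sketch of the audit loop the abstract describes: obscure one feature at a time, re-score the unmodified black-box model, and report the accuracy drop. Random permutation is used here as a simple stand-in for the paper's obscuring procedure, which also removes indirect influence carried by correlated features; see the linked BlackBoxAuditing repo for the real method.

```python
import numpy as np

def audit_features(model, X, y, n_repeats=5, seed=0):
    """model: any fitted object with .predict(X); X: (n, d) feature array; y: (n,) labels."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model.predict(X) == y)
    influence = {}
    for j in range(X.shape[1]):
        accuracies = []
        for _ in range(n_repeats):
            X_obscured = X.copy()
            X_obscured[:, j] = rng.permutation(X_obscured[:, j])   # obscure feature j
            accuracies.append(np.mean(model.predict(X_obscured) == y))
        influence[j] = baseline - np.mean(accuracies)              # accuracy drop for feature j
    return influence
```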

Low-effort place recognition with WiFi fingerprints using deep learning

Title Low-effort place recognition with WiFi fingerprints using deep learning
Authors Michał Nowicki, Jan Wietrzykowski
Abstract Using WiFi signals for indoor localization is the main localization modality of the existing personal indoor localization systems operating on mobile devices. WiFi fingerprinting is also used for mobile robots, as WiFi signals are usually available indoors and can provide a rough initial position estimate or can be used together with other positioning systems. Currently, the best solutions rely on filtering, manual data analysis, and time-consuming parameter tuning to achieve reliable and accurate localization. In this work, we propose to use deep neural networks to significantly lower the work-force burden of the localization system design, while still achieving satisfactory results. Assuming the state-of-the-art hierarchical approach, we employ the DNN system for building/floor classification. We show that stacked autoencoders allow us to efficiently reduce the feature space in order to achieve robust and precise classification. The proposed architecture is verified on the publicly available UJIIndoorLoc dataset and the results are compared with other solutions.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02049v2
PDF http://arxiv.org/pdf/1611.02049v2.pdf
PWC https://paperswithcode.com/paper/low-effort-place-recognition-with-wifi
Repo https://github.com/vohoaiviet/indoor_localization
Framework none
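A minimal tf.keras sketch of the pipeline the abstract describes: a stacked autoencoder compresses the WiFi RSSI fingerprint, and a classifier on the encoded features predicts the building/floor. The layer sizes, the 520-AP input dimension (as in UJIIndoorLoc), and the number of classes are illustrative; the paper's exact architecture and training schedule differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_aps, n_classes = 520, 13   # UJIIndoorLoc has 520 access points; class count is illustrative

encoder = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(n_aps,)),
    layers.Dense(64, activation="relu"),          # compressed fingerprint
])
decoder = models.Sequential([layers.Dense(256, activation="relu"), layers.Dense(n_aps)])

# 1) unsupervised pre-training: the autoencoder reconstructs the fingerprint
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, epochs=20)    # X_train: scaled RSSI vectors

# 2) supervised fine-tuning: classifier on top of the (pre-trained) encoder
classifier = models.Sequential([encoder,
                                layers.Dense(128, activation="relu"),
                                layers.Dense(n_classes, activation="softmax")])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(X_train, y_train, epochs=20)     # y_train: building/floor labels
```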

Can Evolutionary Sampling Improve Bagged Ensembles?

Title Can Evolutionary Sampling Improve Bagged Ensembles?
Authors Harsh Nisar, Bhanu Pratap Singh Rawat
Abstract The Perturb and Combine (P&C) group of methods generates multiple versions of a predictor by perturbing the training set or the construction method and then combining them into a single predictor (Breiman, 1996b). The motive is to improve accuracy for unstable classification and regression methods. One of the best-known methods in this group is bagging. Arcing (Adaptive Resampling and Combining) methods like AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES). We employ evolutionary algorithms to suggest smarter sampling in both the feature space (sub-spaces) and the training samples. We discuss multiple fitness functions to assess ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces.
Tasks
Published 2016-10-03
URL http://arxiv.org/abs/1610.00465v1
PDF http://arxiv.org/pdf/1610.00465v1.pdf
PWC https://paperswithcode.com/paper/can-evolutionary-sampling-improve-bagged
Repo https://github.com/evoml/evoml
Framework none
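A toy sketch of the Evolutionary Sampling idea: instead of bagging's uniform bootstrap, candidate (sample subset, feature subset) pairs are scored by the validation accuracy of the ensemble they produce, and better-scoring candidates survive. The mutate-and-select loop below is a deliberately simple stand-in for a full evolutionary algorithm (no crossover, toy fitness function).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

def ensemble_accuracy(population):
    """population: list of (row_indices, feature_mask); fitness = majority-vote accuracy."""
    preds = []
    for rows, feats in population:
        clf = DecisionTreeClassifier(random_state=0).fit(X_tr[rows][:, feats], y_tr[rows])
        preds.append(clf.predict(X_va[:, feats]))
    majority = (np.mean(preds, axis=0) > 0.5).astype(int)       # binary vote
    return np.mean(majority == y_va)

def random_genome():
    rows = rng.choice(len(X_tr), size=len(X_tr), replace=True)  # bootstrap sample of rows
    feats = rng.random(X.shape[1]) < 0.7                        # random feature sub-space
    return rows, (feats if feats.any() else ~feats)

population = [random_genome() for _ in range(10)]
for _ in range(5):                      # mutate-and-select: swap in a new member if it helps
    challenger = population.copy()
    challenger[rng.integers(len(population))] = random_genome()
    if ensemble_accuracy(challenger) >= ensemble_accuracy(population):
        population = challenger
print("ensemble validation accuracy:", ensemble_accuracy(population))
```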

Wide Residual Networks

Title Wide Residual Networks
Authors Sergey Zagoruyko, Nikos Komodakis
Abstract Deep residual networks were shown to be able to scale up to thousands of layers and still improve in performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks suffers from diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior to their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, and COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks
Tasks Image Classification
Published 2016-05-23
URL http://arxiv.org/abs/1605.07146v4
PDF http://arxiv.org/pdf/1605.07146v4.pdf
PWC https://paperswithcode.com/paper/wide-residual-networks
Repo https://github.com/meliketoy/wide-resnet.pytorch
Framework pytorch
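The widening factor k simply multiplies the number of channels in each residual stage. A minimal PyTorch sketch of a pre-activation wide basic block with dropout between its convolutions (the form used in WRNs), with illustrative sizes:

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation basic block; channel width is multiplied by the widening factor k."""
    def __init__(self, in_planes, planes, stride=1, dropout=0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.drop = nn.Dropout(dropout)
        self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
        self.shortcut = (nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False)
                         if stride != 1 or in_planes != planes else nn.Identity())

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(self.drop(torch.relu(self.bn2(out))))
        return out + self.shortcut(x)

# e.g. WRN-28-10 uses stage widths 16*k, 32*k, 64*k with k = 10
block = WideBasicBlock(16, 160)
print(block(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 160, 32, 32])
```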

Sigma Delta Quantized Networks

Title Sigma Delta Quantized Networks
Authors Peter O’Connor, Max Welling
Abstract Deep neural networks can be obscenely wasteful. When processing video, a convolutional network expends a fixed amount of computation for each frame with no regard to the similarity between neighbouring frames. As a result, it ends up repeatedly doing very similar computations. To put an end to such waste, we introduce Sigma-Delta networks. With each new input, each layer in this network sends a discretized form of its change in activation to the next layer. Thus the amount of computation that the network does scales with the amount of change in the input and layer activations, rather than the size of the network. We introduce an optimization method for converting any pre-trained deep network into an optimally efficient Sigma-Delta network, and show that our algorithm, if run on the appropriate hardware, could cut at least an order of magnitude from the computational cost of processing video data.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02024v2
PDF http://arxiv.org/pdf/1611.02024v2.pdf
PWC https://paperswithcode.com/paper/sigma-delta-quantized-networks
Repo https://github.com/petered/sigma-delta
Framework none
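A small numpy sketch of the core Sigma-Delta mechanism from the abstract: each layer transmits a discretized version of the change in its activation, and the receiver integrates those changes, so the number of non-zero messages scales with how much the input changes between frames. The quantization step here is illustrative.

```python
import numpy as np

class SigmaDeltaWire:
    """Transmit quantized activation changes between two layers of a network."""
    def __init__(self, size, step=0.05):
        self.step = step
        self.prev = np.zeros(size)     # last activation seen by the encoder
        self.phi = np.zeros(size)      # unquantized residual inside the quantizer
        self.recon = np.zeros(size)    # receiver-side running reconstruction

    def send(self, activation):
        self.phi += activation - self.prev            # delta: temporal difference
        self.prev = activation.copy()
        message = np.round(self.phi / self.step)      # discretize the change
        self.phi -= message * self.step               # keep the rounding error for later
        self.recon += message * self.step             # sigma: receiver integrates
        return self.recon, int(np.count_nonzero(message))

wire = SigmaDeltaWire(size=4)
for frame in [np.array([0.1, 0.2, 0.3, 0.4])] * 3:    # identical frames after the first
    recon, n_messages = wire.send(frame)
    print(n_messages, recon)                          # message count drops to 0 when nothing changes
```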

Adjusting for Dropout Variance in Batch Normalization and Weight Initialization

Title Adjusting for Dropout Variance in Batch Normalization and Weight Initialization
Authors Dan Hendrycks, Kevin Gimpel
Abstract We show how to adjust for the variance introduced by dropout with corrections to weight initialization and Batch Normalization, yielding higher accuracy. Though dropout can preserve the expected input to a neuron between train and test, the variance of the input differs. We thus propose a new weight initialization by correcting for the influence of dropout rates and an arbitrary nonlinearity’s influence on variance through simple corrective scalars. Since Batch Normalization trained with dropout estimates the variance of a layer’s incoming distribution with some inputs dropped, the variance also differs between train and test. After training a network with Batch Normalization and dropout, we simply update Batch Normalization’s variance moving averages with dropout off and obtain state of the art on CIFAR-10 and CIFAR-100 without data augmentation.
Tasks Data Augmentation
Published 2016-07-08
URL http://arxiv.org/abs/1607.02488v2
PDF http://arxiv.org/pdf/1607.02488v2.pdf
PWC https://paperswithcode.com/paper/adjusting-for-dropout-variance-in-batch
Repo https://github.com/hendrycks/init
Framework tf
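A minimal PyTorch sketch of the final recipe in the abstract: after training with Batch Normalization and dropout, refresh BN's running statistics with dropout switched off, so that the statistics used at test time match the no-dropout input distribution. `model` and `train_loader` are assumed to exist.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def refresh_bn_statistics(model, train_loader):
    model.train()                      # BatchNorm layers update running stats in train mode ...
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.eval()                   # ... while dropout is switched off
    for x, _ in train_loader:
        model(x)                       # forward passes only; no parameter updates
    model.eval()
```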

The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family

Title The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
Authors Alexandre de Brébisson, Pascal Vincent
Abstract Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems we are able to tackle with current hardware. Second, it remains unclear how closely it matches the task loss, such as the top-k error rate or other non-differentiable evaluation metrics which we ultimately aim to optimize. In this paper, we introduce an alternative classification loss function, the Z-loss, which is designed to address these two issues. Unlike the log-softmax, it has the desirable property of belonging to the spherical loss family (Vincent et al., 2015), a class of loss functions for which training can be performed very efficiently with a complexity independent of the number of output classes. We show experimentally that it significantly outperforms the other spherical loss functions previously investigated. Furthermore, we show on a word language modeling task that it also outperforms the log-softmax with respect to certain ranking scores, such as top-k scores, suggesting that the Z-loss has the flexibility to better match the task loss. These qualities thus make the Z-loss an appealing candidate for efficiently training very large output networks such as word-language models and other extreme classification problems. On the One Billion Word (Chelba et al., 2014) dataset, we are able to train a model with the Z-loss 40 times faster than with the log-softmax and more than 4 times faster than with the hierarchical softmax.
Tasks Language Modelling
Published 2016-04-29
URL http://arxiv.org/abs/1604.08859v2
PDF http://arxiv.org/pdf/1604.08859v2.pdf
PWC https://paperswithcode.com/paper/the-z-loss-a-shift-and-scale-invariant
Repo https://github.com/pascal20100/factored_output_layer
Framework none
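The exact form of the Z-loss is given in the paper; as a small illustration of the shift-and-scale invariance it is built around (and which the log-softmax lacks), the sketch below standardizes the output scores, so that adding a constant to all scores or multiplying them by a positive factor leaves the standardized score of the target class unchanged.

```python
import numpy as np

def standardized_target_score(scores, target):
    """Z-normalize the score vector and return the target class's standardized score."""
    z = (scores - scores.mean()) / scores.std()
    return z[target]

o = np.array([2.0, -1.0, 0.5, 3.0])
print(standardized_target_score(o, 3))
print(standardized_target_score(5 * o + 7, 3))   # same value: shift and scale invariant
```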

GLEU Without Tuning

Title GLEU Without Tuning
Authors Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault
Abstract The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). This paper describes improvements made to the GLEU metric that address problems that arise when using an increasing number of reference sets. Unlike the originally presented metric, the modified metric does not require tuning. We recommend that this version be used instead of the original version.
Tasks
Published 2016-05-09
URL http://arxiv.org/abs/1605.02592v1
PDF http://arxiv.org/pdf/1605.02592v1.pdf
PWC https://paperswithcode.com/paper/gleu-without-tuning
Repo https://github.com/cnap/gec-ranking
Framework none
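A much-simplified sketch of the quantities GLEU is built from: n-gram matches between the corrected hypothesis and a reference, and a penalty for source n-grams that were kept despite not appearing in the reference. The real metric combines such counts BLEU-style across n-gram orders and references; use the official scorer in the linked cnap/gec-ranking repo for actual evaluation.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def gleu_style_counts(source, hypothesis, reference, n=2):
    s, h, r = ngrams(source, n), ngrams(hypothesis, n), ngrams(reference, n)
    matches = sum((h & r).values())           # hypothesis n-grams confirmed by the reference
    penalty = sum((h & (s - r)).values())     # source n-grams kept despite not being in the reference
    return matches, penalty, sum(h.values())

src = "she go to school yesterday".split()
hyp = "she went to school yesterday".split()
ref = "she went to school yesterday".split()
print(gleu_style_counts(src, hyp, ref))       # (4, 0, 4)
```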

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Title Max-Margin Deep Generative Models for (Semi-)Supervised Learning
Authors Chongxuan Li, Jun Zhu, Bo Zhang
Abstract Deep generative models (DGMs) are effective at learning multilayered representations of complex data and performing inference on input data by exploiting their generative ability. However, relatively little has been done to empower the discriminative ability of DGMs for making accurate predictions. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs), which explore the strongly discriminative principle of max-margin learning to improve the predictive performance of DGMs in both supervised and semi-supervised learning, while retaining the generative capability. In semi-supervised learning, we use the predictions of a max-margin classifier as the missing labels instead of performing full posterior inference for efficiency; we also introduce additional max-margin and label-balance regularization terms on unlabeled data for effectiveness. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objectives in different settings. Empirical results on various datasets demonstrate that: (1) max-margin learning can significantly improve the prediction performance of DGMs and meanwhile retain the generative ability; (2) in supervised learning, mmDGMs are competitive with the best fully discriminative networks when employing convolutional neural networks as the generative and recognition models; and (3) in semi-supervised learning, mmDCGMs can perform efficient inference and achieve state-of-the-art classification results on several benchmarks.
Tasks
Published 2016-11-22
URL http://arxiv.org/abs/1611.07119v1
PDF http://arxiv.org/pdf/1611.07119v1.pdf
PWC https://paperswithcode.com/paper/max-margin-deep-generative-models-for-semi
Repo https://github.com/thu-ml/mmdcgm-ssl
Framework none
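A compact PyTorch sketch of the kind of objective the abstract describes: a variational autoencoder's negative ELBO plus a weighted max-margin (multiclass hinge) loss from a classifier acting on the latent representation. The `encoder`, `decoder`, and `classifier` modules and the trade-off weight C are placeholders; the paper's doubly stochastic subgradient algorithm and architectures are more involved.

```python
import torch
import torch.nn.functional as F

def mm_dgm_loss(x, y, encoder, decoder, classifier, C=1.0):
    """encoder(x) -> (mu, logvar); decoder(z) -> reconstruction; classifier(z) -> class scores."""
    mu, logvar = encoder(x)                                    # q(z|x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterized sample
    recon = decoder(z)
    neg_elbo = F.mse_loss(recon, x, reduction="sum") \
               - 0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # -ELBO (Gaussian terms)
    hinge = F.multi_margin_loss(classifier(z), y)              # max-margin classification loss
    return neg_elbo + C * hinge
```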

Auxiliary Deep Generative Models

Title Auxiliary Deep Generative Models
Authors Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, Ole Winther
Abstract Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables which improves the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variable we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on MNIST, SVHN and NORB datasets.
Tasks
Published 2016-02-17
URL http://arxiv.org/abs/1602.05473v4
PDF http://arxiv.org/pdf/1602.05473v4.pdf
PWC https://paperswithcode.com/paper/auxiliary-deep-generative-models
Repo https://github.com/larsmaaloee/auxiliary-deep-generative-models
Framework none
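A schematic PyTorch sketch of the auxiliary-variable wiring from the abstract: the inference network first draws an auxiliary variable a from q(a|x) and then the latent z from q(z|a, x), while the generative model reconstructs x from z and models a with p(a|x, z), leaving p(x|z) unchanged. The sub-networks below are simple placeholders; the semi-supervised objective and the two-stochastic-layer/skip-connection variant are omitted.

```python
import torch
import torch.nn as nn

def gaussian_sample(mu, logvar):
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

class AuxiliaryDGM(nn.Module):
    def __init__(self, x_dim=784, a_dim=32, z_dim=32, h=256):
        super().__init__()
        self.q_a = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, 2 * a_dim))
        self.q_z = nn.Sequential(nn.Linear(x_dim + a_dim, h), nn.ReLU(), nn.Linear(h, 2 * z_dim))
        self.p_x = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))
        self.p_a = nn.Sequential(nn.Linear(x_dim + z_dim, h), nn.ReLU(), nn.Linear(h, 2 * a_dim))

    def forward(self, x):
        a = gaussian_sample(*self.q_a(x).chunk(2, -1))                    # a ~ q(a|x)
        z = gaussian_sample(*self.q_z(torch.cat([x, a], -1)).chunk(2, -1))  # z ~ q(z|a, x)
        x_recon = self.p_x(z)                                             # p(x|z)
        a_params = self.p_a(torch.cat([x, z], -1)).chunk(2, -1)           # parameters of p(a|x, z)
        return x_recon, a, z, a_params
```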