Paper Group AWR 20
Bayesian optimization of hyper-parameters in reservoir computing. Modeling Context in Referring Expressions. Aggregated Residual Transformations for Deep Neural Networks. Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation. Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization. …
Bayesian optimization of hyper-parameters in reservoir computing
Title | Bayesian optimization of hyper-parameters in reservoir computing |
Authors | Jan Yperman, Thijs Becker |
Abstract | We describe a method for searching the optimal hyper-parameters in reservoir computing, which consists of a Gaussian process with Bayesian optimization. It provides an alternative to other frequently used optimization methods such as grid, random, or manual search. In addition to a set of optimal hyper-parameters, the method also provides a probability distribution of the cost function as a function of the hyper-parameters. We apply this method to two types of reservoirs: nonlinear delay nodes and echo state networks. It shows excellent performance on all considered benchmarks, either matching or significantly surpassing results found in the literature. In general, the algorithm achieves optimal results in fewer iterations when compared to other optimization methods. We have optimized up to six hyper-parameters simultaneously, which would have been infeasible using, e.g., grid search. Due to its automated nature, this method significantly reduces the need for expert knowledge when optimizing the hyper-parameters in reservoir computing. Existing software libraries for Bayesian optimization, such as Spearmint, make the implementation of the algorithm straightforward. A fork of the Spearmint framework along with a tutorial on how to use it in practice is available at https://bitbucket.org/uhasseltmachinelearning/spearmint/ |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05193v3 |
PDF | http://arxiv.org/pdf/1611.05193v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-optimization-of-hyper-parameters-in |
Repo | https://github.com/rednotion/parallel_esn_web |
Framework | none |
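The workflow described above pairs a Gaussian-process surrogate with Bayesian optimization to tune reservoir hyper-parameters. As a rough, hedged illustration of that loop (not the paper's Spearmint setup), the sketch below tunes a toy echo state network's spectral radius, input scaling, and ridge penalty with `skopt.gp_minimize`; the prediction task, reservoir size, and search ranges are all placeholder choices.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

# Toy one-step-ahead prediction task standing in for the paper's benchmarks.
t = np.arange(3000)
u = np.sin(0.2 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
y = np.roll(u, -1)  # target: the next input value

def esn_nrmse(params):
    """Validation NRMSE of a small echo state network for one hyper-parameter setting."""
    spectral_radius, input_scaling, ridge = params
    rng = np.random.default_rng(1)          # fixed reservoir so the objective is deterministic
    n = 200
    W = rng.standard_normal((n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = input_scaling * rng.standard_normal(n)
    x, states = np.zeros(n), np.zeros((u.size, n))
    for k in range(u.size):
        x = np.tanh(W @ x + W_in * u[k])
        states[k] = x
    X_tr, y_tr = states[100:2000], y[100:2000]      # drop washout, then train/validation split
    X_va, y_va = states[2000:-1], y[2000:-1]
    w = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n), X_tr.T @ y_tr)
    err = X_va @ w - y_va
    return float(np.sqrt(np.mean(err ** 2)) / np.std(y_va))

space = [Real(0.1, 1.5, name="spectral_radius"),
         Real(0.1, 2.0, name="input_scaling"),
         Real(1e-8, 1e-1, prior="log-uniform", name="ridge")]
result = gp_minimize(esn_nrmse, space, n_calls=30, random_state=0)
print("best hyper-parameters:", result.x, "validation NRMSE:", result.fun)
```

`gp_minimize` fits a Gaussian process to the observed costs and picks each new trial through an acquisition function, which is the general pattern the paper relies on via its Spearmint fork rather than this exact code.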
Modeling Context in Referring Expressions
Title | Modeling Context in Referring Expressions |
Authors | Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg |
Abstract | Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets - RefCOCO, RefCOCO+, and RefCOCOg - shows the advantages of our methods for both referring expression generation and comprehension. |
Tasks | Text Generation |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00272v3 |
PDF | http://arxiv.org/pdf/1608.00272v3.pdf |
PWC | https://paperswithcode.com/paper/modeling-context-in-referring-expressions |
Repo | https://github.com/lichengunc/refer |
Framework | none |
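The key modeling idea above is that visual context, specifically comparison to other objects of the same category, helps referring expression models. The snippet below is a hypothetical sketch of such comparison features (mean normalized appearance difference plus relative location/size offsets); the feature choices, array shapes, and box encoding are assumptions for illustration, not the paper's actual definitions.

```python
import numpy as np

def visual_comparison_features(target_feat, target_box, other_feats, other_boxes):
    """Context features in the spirit of 'compare the target to same-category objects':
    mean normalized appearance difference plus mean relative location/size offsets.
    Purely illustrative feature choices."""
    diffs = target_feat - other_feats                        # (num_others, feat_dim)
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True) + 1e-8
    appearance_delta = diffs.mean(axis=0)

    tx, ty, tw, th = target_box                              # boxes as (x, y, w, h)
    loc_delta = np.mean([[(tx - ox) / tw, (ty - oy) / th, (ow * oh) / (tw * th)]
                         for ox, oy, ow, oh in other_boxes], axis=0)
    return np.concatenate([appearance_delta, loc_delta])

rng = np.random.default_rng(0)
feats = visual_comparison_features(rng.random(512), (10, 20, 50, 80),
                                   rng.random((3, 512)),
                                   [(70, 20, 40, 90), (5, 100, 60, 70), (120, 30, 55, 85)])
print(feats.shape)  # (515,): would be concatenated with the target's own CNN feature
```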
Aggregated Residual Transformations for Deep Neural Networks
Title | Aggregated Residual Transformations for Deep Neural Networks |
Authors | Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He |
Abstract | We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online. |
Tasks | Image Classification |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05431v2 |
PDF | http://arxiv.org/pdf/1611.05431v2.pdf |
PWC | https://paperswithcode.com/paper/aggregated-residual-transformations-for-deep |
Repo | https://github.com/guilherme-pombo/keras_resnext_fpn |
Framework | none |
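Since the block design is the heart of the paper, here is a minimal PyTorch sketch of an aggregated bottleneck block in the spirit of ResNeXt, where cardinality is realized as a grouped 3x3 convolution. The specific channel sizes and cardinality below are common choices used only for illustration; this is not the authors' released code.

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Aggregated residual block: 'cardinality' parallel paths via a grouped conv."""
    def __init__(self, in_ch, bottleneck_ch, out_ch, cardinality=32, stride=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, bottleneck_ch, 1, bias=False),
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, bottleneck_ch, 3, stride=stride, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1 else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.net(x) + self.shortcut(x))

# e.g. a 256-in/256-out block with a 128-wide grouped bottleneck (32 groups x 4 channels)
block = ResNeXtBlock(256, 128, 256, cardinality=32)
print(block(torch.randn(2, 256, 56, 56)).shape)  # torch.Size([2, 256, 56, 56])
```

Increasing `cardinality` while keeping `bottleneck_ch` proportional is the knob the paper studies, alongside depth and width.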
Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation
Title | Scale-constrained Unsupervised Evaluation Method for Multi-scale Image Segmentation |
Authors | Yuhang Lu, Youchuan Wan, Gang Li |
Abstract | Unsupervised evaluation of segmentation quality is a crucial step in image segmentation applications. Previous unsupervised evaluation methods usually lacked the adaptability to multi-scale segmentation. A scale-constrained evaluation method that evaluates segmentation quality according to the specified target scale is proposed in this paper. First, regional saliency and merging cost are employed to describe intra-region homogeneity and inter-region heterogeneity, respectively. Subsequently, both of them are standardized into equivalent spectral distances of a predefined region. Finally, by analyzing the relationship between image characteristics and segmentation quality, we establish the evaluation model. Experimental results show that the proposed method outperforms four commonly used unsupervised methods in multi-scale evaluation tasks. |
Tasks | Semantic Segmentation |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04850v1 |
PDF | http://arxiv.org/pdf/1611.04850v1.pdf |
PWC | https://paperswithcode.com/paper/scale-constrained-unsupervised-evaluation |
Repo | https://github.com/Rudy423/SegEvaluation |
Framework | none |
Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization
Title | Depth Image Inpainting: Improving Low Rank Matrix Completion with Low Gradient Regularization |
Authors | Hongyang Xue, Shengming Zhang, Deng Cai |
Abstract | We consider the case of inpainting single depth images. Without corresponding color images, previous or next frames, depth image inpainting is quite challenging. One natural solution is to regard the image as a matrix and adopt the low rank regularization just as when inpainting color images. However, the low rank assumption does not make full use of the properties of depth images. A shallow observation may inspire us to penalize the non-zero gradients by sparse gradient regularization. However, statistics show that though most pixels have zero gradients, there is still a non-ignorable part of pixels whose gradients are equal to 1. Based on this specific property of depth images, we propose a low gradient regularization method in which we reduce the penalty for gradient 1 while penalizing the non-zero gradients to allow for gradual depth changes. The proposed low gradient regularization is integrated with the low rank regularization into the low rank low gradient approach for depth image inpainting. We compare our proposed low gradient regularization with sparse gradient regularization. The experimental results show the effectiveness of our proposed approach. |
Tasks | Image Inpainting, Low-Rank Matrix Completion, Matrix Completion |
Published | 2016-04-20 |
URL | http://arxiv.org/abs/1604.05817v1 |
PDF | http://arxiv.org/pdf/1604.05817v1.pdf |
PWC | https://paperswithcode.com/paper/depth-image-inpainting-improving-low-rank |
Repo | https://github.com/xuehy/depthInpainting |
Framework | none |
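As a hedged illustration of the "low gradient" idea above (charge a reduced cost for gradient magnitude 1 while still penalizing other non-zero gradients), here is a toy penalty compared against a plain sparse-gradient count. The functional form and the weight are assumptions; the paper's actual regularizer, and its combination with the low-rank term inside an inpainting solver, differ.

```python
import numpy as np

def low_gradient_penalty(depth, weight_one=0.1):
    """Toy prior: zero cost for flat regions, a reduced cost for unit gradients
    (gradual depth changes), full cost for larger jumps."""
    def cost(g):
        g = np.abs(g)
        return np.where(g == 0, 0.0, np.where(g == 1, weight_one, 1.0)).sum()
    return cost(np.diff(depth, axis=1)) + cost(np.diff(depth, axis=0))

def sparse_gradient_penalty(depth):
    """Plain L0-style gradient count for comparison."""
    return np.count_nonzero(np.diff(depth, axis=1)) + np.count_nonzero(np.diff(depth, axis=0))

ramp = np.tile(np.arange(64.0), (64, 1))   # gently sloping depth map (unit gradients everywhere)
print("sparse-gradient cost:", sparse_gradient_penalty(ramp),
      "low-gradient cost:", low_gradient_penalty(ramp))
```

The sloping surface is heavily punished by the sparse-gradient count but only lightly by the low-gradient version, which is the behaviour the abstract motivates.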
Auditing Black-box Models for Indirect Influence
Title | Auditing Black-box Models for Indirect Influence |
Authors | Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian |
Abstract | Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attributes (like race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the dataset, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all. Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures. |
Tasks | Feature Selection |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07043v2 |
PDF | http://arxiv.org/pdf/1602.07043v2.pdf |
PWC | https://paperswithcode.com/paper/auditing-black-box-models-for-indirect |
Repo | https://github.com/cfalk/BlackBoxAuditing |
Framework | tf |
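The procedure sketched below is a simplified, linear stand-in for the auditing idea described above, not the authors' implementation: to probe the indirect influence of feature j on a model that is never retrained, it blanks out j and removes the linearly predictable trace of j from every other feature, then records the accuracy drop. The dataset and black-box model are arbitrary placeholders.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)   # treated as opaque
baseline = black_box.score(X_te, y_te)

def indirect_influence(j):
    """Accuracy drop when feature j and its linear trace in other features are obscured."""
    X_obs = X_te.copy()
    X_obs[:, j] = X_tr[:, j].mean()                  # blank out the feature itself
    for k in range(X.shape[1]):
        if k == j:
            continue
        reg = LinearRegression().fit(X_tr[:, [j]], X_tr[:, k])
        X_obs[:, k] = X_te[:, k] - (reg.predict(X_te[:, [j]]) - X_tr[:, k].mean())
    return baseline - black_box.score(X_obs, y_te)

drops = sorted(((indirect_influence(j), j) for j in range(X.shape[1])), reverse=True)
print("largest accuracy drops (influence, feature index):", drops[:3])
```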
Low-effort place recognition with WiFi fingerprints using deep learning
Title | Low-effort place recognition with WiFi fingerprints using deep learning |
Authors | Michał Nowicki, Jan Wietrzykowski |
Abstract | Using WiFi signals for indoor localization is the main localization modality of the existing personal indoor localization systems operating on mobile devices. WiFi fingerprinting is also used for mobile robots, as WiFi signals are usually available indoors and can provide a rough initial position estimate or can be used together with other positioning systems. Currently, the best solutions rely on filtering, manual data analysis, and time-consuming parameter tuning to achieve reliable and accurate localization. In this work, we propose to use deep neural networks to significantly lower the work-force burden of the localization system design, while still achieving satisfactory results. Assuming the state-of-the-art hierarchical approach, we employ the DNN system for building/floor classification. We show that stacked autoencoders allow us to efficiently reduce the feature space in order to achieve robust and precise classification. The proposed architecture is verified on the publicly available UJIIndoorLoc dataset and the results are compared with other solutions. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02049v2 |
PDF | http://arxiv.org/pdf/1611.02049v2.pdf |
PWC | https://paperswithcode.com/paper/low-effort-place-recognition-with-wifi |
Repo | https://github.com/vohoaiviet/indoor_localization |
Framework | none |
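A compact sketch of the kind of pipeline the abstract describes: an autoencoder compresses the high-dimensional vector of WiFi RSSI readings, then a small classifier on the code predicts a building/floor class. The layer sizes, training loops, and the 520-access-point / 13-class figures (taken from the UJIIndoorLoc setting) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

N_APS, N_CLASSES = 520, 13   # UJIIndoorLoc-style: 520 access points, 13 building/floor combos

encoder = nn.Sequential(nn.Linear(N_APS, 256), nn.ReLU(),
                        nn.Linear(256, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                        nn.Linear(256, N_APS))
classifier = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                           nn.Linear(128, N_CLASSES))

def pretrain_autoencoder(x, epochs=20):
    """Unsupervised stage: learn a compact code of the RSSI vector."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(x)), x)
        loss.backward(); opt.step()

def train_classifier(x, y, epochs=20):
    """Supervised stage: fine-tune the encoder together with the classifier head."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(classifier(encoder(x)), y)
        loss.backward(); opt.step()

# toy usage with random tensors standing in for normalized RSSI fingerprints
x = torch.rand(256, N_APS)
y = torch.randint(0, N_CLASSES, (256,))
pretrain_autoencoder(x)
train_classifier(x, y)
```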
Can Evolutionary Sampling Improve Bagged Ensembles?
Title | Can Evolutionary Sampling Improve Bagged Ensembles? |
Authors | Harsh Nisar, Bhanu Pratap Singh Rawat |
Abstract | The Perturb and Combine (P&C) group of methods generates multiple versions of a predictor by perturbing the training set or the construction procedure and then combining them into a single predictor (Breiman, 1996b). The motive is to improve accuracy for unstable classification and regression methods. One of the best-known methods in this group is Bagging. Arcing or Adaptive Resampling and Combining methods like AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES). We employ evolutionary algorithms to suggest smarter sampling in both the feature space (sub-spaces) and the training samples. We discuss multiple fitness functions to assess ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces. |
Tasks | |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00465v1 |
PDF | http://arxiv.org/pdf/1610.00465v1.pdf |
PWC | https://paperswithcode.com/paper/can-evolutionary-sampling-improve-bagged |
Repo | https://github.com/evoml/evoml |
Framework | none |
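A toy take on evolutionary sampling, to make the loop concrete: evolve boolean feature-subspace masks, score each mask by the cross-validated accuracy of a tree trained on that subspace, and keep the fittest masks as the ensemble members. The paper's operators and fitness functions are richer (and also cover the training samples); this only shows the shape of such a loop under arbitrary dataset and hyper-parameter choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features, pop_size, generations = X.shape[1], 20, 10

def fitness(mask):
    """Cross-validated accuracy of a tree restricted to the masked feature subspace."""
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0), X[:, mask], y, cv=3).mean()

population = rng.random((pop_size, n_features)) < 0.5        # random initial masks
for _ in range(generations):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]] # keep the best half
    children = parents ^ (rng.random(parents.shape) < 0.05)   # bit-flip mutation
    population = np.vstack([parents, children])

best = population[np.argsort([fitness(m) for m in population])[-5:]]
ensemble = [DecisionTreeClassifier(random_state=i).fit(X[:, m], y) for i, m in enumerate(best)]
votes = np.mean([t.predict(X[:, m]) for t, m in zip(ensemble, best)], axis=0)
print("training accuracy of the evolved ensemble:", ((votes > 0.5) == y).mean())
```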
Wide Residual Networks
Title | Wide Residual Networks |
Authors | Sergey Zagoruyko, Nikos Komodakis |
Abstract | Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior to their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks |
Tasks | Image Classification |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07146v4 |
PDF | http://arxiv.org/pdf/1605.07146v4.pdf |
PWC | https://paperswithcode.com/paper/wide-residual-networks |
Repo | https://github.com/meliketoy/wide-resnet.pytorch |
Framework | pytorch |
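To make the "decrease depth, increase width" idea concrete, here is a minimal PyTorch sketch of a pre-activation basic block where a widening factor k simply multiplies the channel counts. It mirrors the general WRN design rather than the authors' released code; the k=10 example corresponds to the WRN-28-10 widths (16k, 32k, 64k across the three groups) and is used only for illustration.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation BN-ReLU-conv basic block; 'wide' means larger channel counts."""
    def __init__(self, in_planes, out_planes, stride=1, dropout=0.0):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, 3, padding=1, bias=False)
        self.drop = nn.Dropout(dropout)
        self.shortcut = (nn.Identity() if stride == 1 and in_planes == out_planes else
                         nn.Conv2d(in_planes, out_planes, 1, stride=stride, bias=False))

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(self.drop(torch.relu(self.bn2(out))))
        return out + self.shortcut(x)

k = 10                                   # widening factor
block = WideBasicBlock(16, 16 * k)       # first group of a WRN-28-10-style network
print(block(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 160, 32, 32])
```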
Sigma Delta Quantized Networks
Title | Sigma Delta Quantized Networks |
Authors | Peter O’Connor, Max Welling |
Abstract | Deep neural networks can be obscenely wasteful. When processing video, a convolutional network expends a fixed amount of computation for each frame with no regard to the similarity between neighbouring frames. As a result, it ends up repeatedly doing very similar computations. To put an end to such waste, we introduce Sigma-Delta networks. With each new input, each layer in this network sends a discretized form of its change in activation to the next layer. Thus the amount of computation that the network does scales with the amount of change in the input and layer activations, rather than the size of the network. We introduce an optimization method for converting any pre-trained deep network into an optimally efficient Sigma-Delta network, and show that our algorithm, if run on the appropriate hardware, could cut at least an order of magnitude from the computational cost of processing video data. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02024v2 |
PDF | http://arxiv.org/pdf/1611.02024v2.pdf |
PWC | https://paperswithcode.com/paper/sigma-delta-quantized-networks |
Repo | https://github.com/petered/sigma-delta |
Framework | none |
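The core mechanism described above, each layer transmitting a discretized version of its change in activation, can be illustrated in a few lines of NumPy. The quantization step and the toy "video" input below are arbitrary; this is a sketch of the encoding idea, not the paper's network-conversion algorithm.

```python
import numpy as np

def sigma_delta_encode(activations, quant_step=0.05):
    """Transmit only the quantized difference between each new activation vector and
    what the receiving layer has reconstructed so far; the quantization error is
    carried over, so small changes are eventually sent rather than lost."""
    sent = np.zeros(activations.shape, dtype=int)
    reconstruction = np.zeros(activations.shape[1])
    for t, a in enumerate(activations):
        sent[t] = np.round((a - reconstruction) / quant_step).astype(int)
        reconstruction += sent[t] * quant_step
    return sent

# slowly varying "activations" of 8 units across 100 consecutive video frames
frames = np.cumsum(0.01 * np.random.default_rng(0).standard_normal((100, 8)), axis=0)
counts = sigma_delta_encode(frames)
print("fraction of zero messages:", (counts == 0).mean())   # most messages are zero
```

Because computation downstream only needs to process the non-zero messages, the work scales with how much the input changes, which is the efficiency argument made in the abstract.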
Adjusting for Dropout Variance in Batch Normalization and Weight Initialization
Title | Adjusting for Dropout Variance in Batch Normalization and Weight Initialization |
Authors | Dan Hendrycks, Kevin Gimpel |
Abstract | We show how to adjust for the variance introduced by dropout with corrections to weight initialization and Batch Normalization, yielding higher accuracy. Though dropout can preserve the expected input to a neuron between train and test, the variance of the input differs. We thus propose a new weight initialization by correcting for the influence of dropout rates and an arbitrary nonlinearity’s influence on variance through simple corrective scalars. Since Batch Normalization trained with dropout estimates the variance of a layer’s incoming distribution with some inputs dropped, the variance also differs between train and test. After training a network with Batch Normalization and dropout, we simply update Batch Normalization’s variance moving averages with dropout off and obtain state of the art on CIFAR-10 and CIFAR-100 without data augmentation. |
Tasks | Data Augmentation |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02488v2 |
PDF | http://arxiv.org/pdf/1607.02488v2.pdf |
PWC | https://paperswithcode.com/paper/adjusting-for-dropout-variance-in-batch |
Repo | https://github.com/hendrycks/init |
Framework | tf |
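Two practical takeaways from the abstract can be sketched quickly: fold the dropout keep probability into the weight-initialization scale, and after training re-estimate BatchNorm's moving averages with dropout switched off. The corrective factor below (an extra 1/sqrt(keep_prob) on top of a He-style init) and the helper names are assumptions for illustration; the paper derives its own scalars, including a nonlinearity-dependent term.

```python
import torch
import torch.nn as nn

def dropout_aware_init_(linear, keep_prob, gain=2.0 ** 0.5):
    """Hypothetical helper: He-style normal init with an extra 1/sqrt(keep_prob)
    factor so the post-dropout variance stays near the intended level."""
    fan_in = linear.weight.shape[1]
    nn.init.normal_(linear.weight, std=gain / (fan_in * keep_prob) ** 0.5)
    nn.init.zeros_(linear.bias)

def refresh_bn_stats(model, loader):
    """After training: update BatchNorm moving averages with dropout disabled, so
    the stored variance matches what the network sees at test time."""
    model.train()                            # BN keeps updating its running statistics
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.eval()                         # ...but no units are dropped
    with torch.no_grad():
        for x, _ in loader:
            model(x)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.3),
                      nn.Linear(256, 256), nn.BatchNorm1d(256), nn.ReLU(),
                      nn.Linear(256, 10))
for layer in model:
    if isinstance(layer, nn.Linear):
        dropout_aware_init_(layer, keep_prob=0.7)    # matches nn.Dropout(0.3)
loader = [(torch.randn(64, 784), torch.zeros(64, dtype=torch.long))]  # stand-in data
refresh_bn_stats(model, loader)
```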
The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
Title | The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family |
Authors | Alexandre de Brébisson, Pascal Vincent |
Abstract | Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems we are able to tackle with current hardware. Second, it remains unclear how closely it matches the task loss, such as the top-k error rate or other non-differentiable evaluation metrics which we aim to optimize ultimately. In this paper, we introduce an alternative classification loss function, the Z-loss, which is designed to address these two issues. Unlike the log-softmax, it has the desirable property of belonging to the spherical loss family (Vincent et al., 2015), a class of loss functions for which training can be performed very efficiently with a complexity independent of the number of output classes. We show experimentally that it significantly outperforms the other spherical loss functions previously investigated. Furthermore, we show on a word language modeling task that it also outperforms the log-softmax with respect to certain ranking scores, such as top-k scores, suggesting that the Z-loss has the flexibility to better match the task loss. These qualities thus make the Z-loss an appealing candidate for efficiently training networks with very large outputs, such as word-language models or other extreme classification problems. On the One Billion Word (Chelba et al., 2014) dataset, we are able to train a model with the Z-loss 40 times faster than the log-softmax and more than 4 times faster than the hierarchical softmax. |
Tasks | Language Modelling |
Published | 2016-04-29 |
URL | http://arxiv.org/abs/1604.08859v2 |
PDF | http://arxiv.org/pdf/1604.08859v2.pdf |
PWC | https://paperswithcode.com/paper/the-z-loss-a-shift-and-scale-invariant |
Repo | https://github.com/pascal20100/factored_output_layer |
Framework | none |
GLEU Without Tuning
Title | GLEU Without Tuning |
Authors | Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault |
Abstract | The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). This paper describes improvements made to the GLEU metric that address problems that arise when using an increasing number of reference sets. Unlike the originally presented metric, the modified metric does not require tuning. We recommend that this version be used instead of the original version. |
Tasks | |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02592v1 |
PDF | http://arxiv.org/pdf/1605.02592v1.pdf |
PWC | https://paperswithcode.com/paper/gleu-without-tuning |
Repo | https://github.com/cnap/gec-ranking |
Framework | none |
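For intuition only, here is a toy, GLEU-flavoured scorer: n-gram precision of a correction against a reference, with hypothesis n-grams that come from the source but not the reference counted against it. This is not the official, tuning-free GLEU formula; the authors' gec-ranking repository linked above is the reference implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_gec_score(source, hypothesis, reference, max_n=4):
    """Toy score: average over n of (matched reference n-grams minus n-grams copied
    from the source that the reference rejects) divided by hypothesis n-gram count."""
    score = 0.0
    for n in range(1, max_n + 1):
        hyp, ref, src = (ngrams(t.split(), n) for t in (hypothesis, reference, source))
        matches = sum((hyp & ref).values())
        penalties = sum((hyp & (src - ref)).values())
        score += max(matches - penalties, 0) / max(sum(hyp.values()), 1)
    return score / max_n

print(simple_gec_score("he go to school", "he goes to school", "he goes to school"))
# 1.0: the hypothesis matches the reference exactly
```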
Max-Margin Deep Generative Models for (Semi-)Supervised Learning
Title | Max-Margin Deep Generative Models for (Semi-)Supervised Learning |
Authors | Chongxuan Li, Jun Zhu, Bo Zhang |
Abstract | Deep generative models (DGMs) are effective at learning multilayered representations of complex data and performing inference of input data by exploring the generative ability. However, the discriminative ability of DGMs to make accurate predictions remains relatively limited. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs), which explore the strongly discriminative principle of max-margin learning to improve the predictive performance of DGMs in both supervised and semi-supervised learning, while retaining the generative capability. In semi-supervised learning, we use the predictions of a max-margin classifier as the missing labels instead of performing full posterior inference for efficiency; we also introduce additional max-margin and label-balance regularization terms of unlabeled data for effectiveness. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objectives in different settings. Empirical results on various datasets demonstrate that: (1) max-margin learning can significantly improve the prediction performance of DGMs and meanwhile retain the generative ability; (2) in supervised learning, mmDGMs are competitive with the best fully discriminative networks when employing convolutional neural networks as the generative and recognition models; and (3) in semi-supervised learning, mmDCGMs can perform efficient inference and achieve state-of-the-art classification results on several benchmarks. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07119v1 |
PDF | http://arxiv.org/pdf/1611.07119v1.pdf |
PWC | https://paperswithcode.com/paper/max-margin-deep-generative-models-for-semi |
Repo | https://github.com/thu-ml/mmdcgm-ssl |
Framework | none |
Auxiliary Deep Generative Models
Title | Auxiliary Deep Generative Models |
Authors | Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, Ole Winther |
Abstract | Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables which improves the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variable we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on MNIST, SVHN and NORB datasets. |
Tasks | |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05473v4 |
PDF | http://arxiv.org/pdf/1602.05473v4.pdf |
PWC | https://paperswithcode.com/paper/auxiliary-deep-generative-models |
Repo | https://github.com/larsmaaloee/auxiliary-deep-generative-models |
Framework | none |