July 29, 2019

3554 words 17 mins read

Paper Group AWR 155

Understanding Black-box Predictions via Influence Functions. RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data. OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning. Progressive Neural Architecture Search. Curiosity-driven Exploration by Self-supervised Prediction. Learning Transferable Architectures for Scalable Image Recognition …

Understanding Black-box Predictions via Influence Functions

Title Understanding Black-box Predictions via Influence Functions
Authors Pang Wei Koh, Percy Liang
Abstract How can we explain the predictions of a black-box model? In this paper, we use influence functions – a classic technique from robust statistics – to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
Tasks
Published 2017-03-14
URL http://arxiv.org/abs/1703.04730v2
PDF http://arxiv.org/pdf/1703.04730v2.pdf
PWC https://paperswithcode.com/paper/understanding-black-box-predictions-via
Repo https://github.com/darkonhub/darkon
Framework tf
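
The core quantity in the paper is the influence of up-weighting a training point z on the loss at a test point, -∇L(z_test)^T H^{-1} ∇L(z). As a hedged illustration (not the darkon implementation linked above), the sketch below computes this exactly for a small ridge-regression model, where the per-example gradient and the Hessian are available in closed form; all names and data are illustrative.

```python
# Minimal sketch of the influence-function computation for ridge regression,
# where per-example gradients and the Hessian are closed form.  Toy data only.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

H = X.T @ X / n + lam * np.eye(d)            # Hessian of the training objective
theta = np.linalg.solve(H, X.T @ y / n)      # fitted parameters

def grad_loss(x, target, theta):
    """Gradient of the per-example loss 0.5 * (x @ theta - target) ** 2."""
    return (x @ theta - target) * x

# Influence of up-weighting training point i on the test loss:
#   I(z_i, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z_i)
x_test, y_test = rng.normal(size=d), 0.0
h_inv_g_test = np.linalg.solve(H, grad_loss(x_test, y_test, theta))
influences = np.array([-grad_loss(X[i], y[i], theta) @ h_inv_g_test
                       for i in range(n)])

print("up-weighting these points increases the test loss most:",
      np.argsort(influences)[-3:])
print("up-weighting these points decreases the test loss most:",
      np.argsort(influences)[:3])
```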

RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

Title RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data
Authors Haifeng Li, Xin Dou, Chao Tao, Zhixiang Hou, Jie Chen, Jian Peng, Min Deng, Ling Zhao
Abstract In recent years, deep convolutional neural networks (DCNNs) have made breakthrough progress in natural image recognition thanks to three factors: the universal approximation ability of DCNNs, large-scale databases (such as ImageNet), and GPU-powered computing. The remote sensing field still lacks a large-scale benchmark comparable to ImageNet and Places2. In this paper, we propose a remote sensing image classification benchmark (RSI-CB) based on massive, scalable, and diverse crowdsource data. Using crowdsource data, such as Open Street Map (OSM) data, ground objects in remote sensing images can be annotated effectively by points of interest, vector data from OSM, or other crowdsource data. The annotated images can be used in remote sensing image classification tasks. Based on this method, we construct a worldwide large-scale benchmark for remote sensing image classification. The benchmark has two sub-datasets, with image sizes of 256 by 256 and 128 by 128 pixels, because different DCNNs require different input sizes. The former contains 6 categories with 35 subclasses and more than 24,000 images; the latter contains 6 categories with 45 subclasses and more than 36,000 images. The classification system of ground objects is defined according to the national standard of land-use classification in China and is inspired by the hierarchy mechanism of ImageNet. Finally, we conduct extensive experiments comparing RSI-CB with the SAT-4, SAT-6, and UC-Merced datasets, using handcrafted features such as the scale-invariant feature transform, color histograms, local binary patterns, and GIST, as well as classical DCNN models such as AlexNet, VGGNet, GoogLeNet, and ResNet.
Tasks Image Classification, Remote Sensing Image Classification
Published 2017-05-30
URL https://arxiv.org/abs/1705.10450v3
PDF https://arxiv.org/pdf/1705.10450v3.pdf
PWC https://paperswithcode.com/paper/rsi-cb-a-large-scale-remote-sensing-image
Repo https://github.com/lehaifeng/RSI-CB
Framework none
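
To give a sense of the handcrafted baselines the paper compares against (color histograms, LBP, SIFT, GIST), here is a hedged sketch of a per-channel color-histogram feature with a nearest-centroid classifier; the toy data is a stand-in and not the RSI-CB release format.

```python
# Rough sketch of a color-histogram baseline of the kind compared in the
# paper (handcrafted feature + simple classifier).  Data is randomly generated.
import numpy as np

def color_histogram(image, bins=16):
    """Concatenated per-channel histogram for an HxWx3 uint8 image."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256),
                          density=True)[0] for c in range(3)]
    return np.concatenate(feats)

def nearest_centroid_fit(features, labels):
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(classes, centroids, features):
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy stand-in data: 100 random 128x128 RGB "images" with 4 fake classes.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 128, 128, 3), dtype=np.uint8)
labels = rng.integers(0, 4, size=100)

feats = np.stack([color_histogram(im) for im in images])
classes, centroids = nearest_centroid_fit(feats[:80], labels[:80])
pred = nearest_centroid_predict(classes, centroids, feats[80:])
print("toy accuracy:", (pred == labels[80:]).mean())
```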

OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning

Title OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning
Authors José Lezama, Qiang Qiu, Pablo Musé, Guillermo Sapiro
Abstract Deep neural networks trained using a softmax layer at the top and the cross-entropy loss are ubiquitous tools for image classification. Yet, this does not naturally enforce intra-class similarity nor inter-class margin of the learned deep representations. To simultaneously achieve these two goals, different solutions have been proposed in the literature, such as the pairwise or triplet losses. However, such solutions carry the extra task of selecting pairs or triplets, and the extra computational burden of computing and learning for many combinations of them. In this paper, we propose a plug-and-play loss term for deep networks that explicitly reduces intra-class variance and enforces inter-class margin simultaneously, in a simple and elegant geometric manner. For each class, the deep features are collapsed into a learned linear subspace, or union of them, and inter-class subspaces are pushed to be as orthogonal as possible. Our proposed Orthogonal Low-rank Embedding (OLÉ) does not require carefully crafting pairs or triplets of samples for training, and works standalone as a classification loss, being the first reported deep metric learning framework of its kind. Because of the improved margin between features of different classes, the resulting deep networks generalize better, are more discriminative, and more robust. We demonstrate improved classification performance in general object recognition, plugging the proposed loss term into existing off-the-shelf architectures. In particular, we show the advantage of the proposed loss in the small data/model scenario, and we significantly advance the state-of-the-art on the Stanford STL-10 benchmark.
Tasks Image Classification, Metric Learning, Object Recognition
Published 2017-12-05
URL http://arxiv.org/abs/1712.01727v1
PDF http://arxiv.org/pdf/1712.01727v1.pdf
PWC https://paperswithcode.com/paper/ole-orthogonal-low-rank-embedding-a-plug-and
Repo https://github.com/jlezama/OrthogonalLowrankEmbedding
Framework pytorch
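
Below is a hedged NumPy sketch of the geometric objective described in the abstract: per-class feature matrices are pushed toward low rank (small nuclear norm) while the overall feature matrix is kept high rank (large nuclear norm). The delta floor and constants are my reading of the method, not the authors' released PyTorch loss.

```python
# Sketch of the OLÉ objective as described in the abstract: per-class features
# should be low rank, the overall feature matrix should be high rank.
import numpy as np

def nuclear_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

def ole_loss(features, labels, delta=1.0):
    """features: (N, D) deep features; labels: (N,) integer class ids."""
    intra = sum(max(delta, nuclear_norm(features[labels == c]))
                for c in np.unique(labels))
    inter = nuclear_norm(features)
    return intra - inter  # smaller: collapsed classes, near-orthogonal subspaces

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 32))
labels = rng.integers(0, 8, size=64)
print("OLÉ loss on random features:", ole_loss(feats, labels))
```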

Progressive Neural Architecture Search

Title Progressive Neural Architecture Search
Authors Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy
Abstract We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space. Direct comparison under the same search space shows that our method is up to 5 times more efficient than the RL method of Zoph et al. (2018) in terms of number of models evaluated, and 8 times faster in terms of total compute. The structures we discover in this way achieve state of the art classification accuracies on CIFAR-10 and ImageNet.
Tasks Image Classification, Neural Architecture Search
Published 2017-12-02
URL http://arxiv.org/abs/1712.00559v3
PDF http://arxiv.org/pdf/1712.00559v3.pdf
PWC https://paperswithcode.com/paper/progressive-neural-architecture-search
Repo https://github.com/Cadene/pretrained-models.pytorch
Framework pytorch
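
A toy sketch of the SMBO loop the abstract describes: grow candidate structures one block at a time, fit a cheap surrogate that predicts accuracy from past evaluations, and only train the expansions the surrogate ranks highest. The operation set, encoding, surrogate, and evaluate() below are stand-ins for illustration, not the NASNet cell space.

```python
# Toy sequential model-based search: expand candidates by one "block", rank
# expansions with a cheap surrogate, and evaluate only the top-K for real.
import numpy as np

rng = np.random.default_rng(0)
OPS = [0, 1, 2, 3]            # pretend operation ids (e.g. conv3x3, sep5x5, ...)

def evaluate(arch):
    """Stand-in for training a child model and returning validation accuracy."""
    return 0.5 + 0.1 * np.mean(arch) + 0.05 * rng.normal()

def encode(arch, max_len=4):
    vec = np.zeros(max_len * len(OPS))
    for i, op in enumerate(arch):
        vec[i * len(OPS) + op] = 1.0
    return vec

def fit_surrogate(archs, scores):
    X = np.stack([encode(a) for a in archs])
    w, *_ = np.linalg.lstsq(X, np.array(scores), rcond=None)
    return lambda a: encode(a) @ w     # linear predictor of accuracy

history_archs, history_scores = [], []
beam = [[op] for op in OPS]            # complexity level 1: single-block cells
for level in range(1, 4):
    scores = [evaluate(a) for a in beam]
    history_archs += beam
    history_scores += scores
    surrogate = fit_surrogate(history_archs, history_scores)
    # Expand every beam member by one block, rank by predicted accuracy,
    # and keep only the most promising expansions for real evaluation.
    candidates = [a + [op] for a in beam for op in OPS]
    beam = sorted(candidates, key=surrogate, reverse=True)[:4]

best = max(zip(history_scores, history_archs))
print("best evaluated architecture:", best[1], "accuracy:", round(best[0], 3))
```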

Curiosity-driven Exploration by Self-supervised Prediction

Title Curiosity-driven Exploration by Self-supervised Prediction
Authors Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
Abstract In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch. Demo video and code available at https://pathak22.github.io/noreward-rl/
Tasks
Published 2017-05-15
URL http://arxiv.org/abs/1705.05363v1
PDF http://arxiv.org/pdf/1705.05363v1.pdf
PWC https://paperswithcode.com/paper/curiosity-driven-exploration-by-self
Repo https://github.com/tegg89/DLCamp_Jeju2018
Framework tf
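
A toy version of the intrinsic reward: the error of a forward model predicting the next state's features, r_int = ||phi(s') - f(phi(s), a)||^2. Here the encoder is a fixed random projection and the forward model is a least-squares fit; the paper instead learns phi jointly with an inverse-dynamics model, so this is only a sketch of the reward signal.

```python
# Toy curiosity signal: forward-model prediction error in a feature space.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim, n_actions, T = 64, 8, 4, 500

W_enc = rng.normal(size=(obs_dim, feat_dim)) / np.sqrt(obs_dim)
phi = lambda s: s @ W_enc                       # stand-in feature encoder

# Collect a toy transition dataset (s, a, s').
S = rng.normal(size=(T, obs_dim))
A = rng.integers(0, n_actions, size=T)
S_next = S + 0.1 * rng.normal(size=(T, obs_dim))

def one_hot(a):
    v = np.zeros(n_actions); v[a] = 1.0; return v

# Fit the forward model: predict phi(s') from [phi(s), one_hot(a)].
inputs = np.hstack([phi(S), np.stack([one_hot(a) for a in A])])
targets = phi(S_next)
W_fwd, *_ = np.linalg.lstsq(inputs, targets, rcond=None)

# Intrinsic reward: forward-model prediction error in feature space.
pred = inputs @ W_fwd
r_int = np.sum((targets - pred) ** 2, axis=1)
print("mean intrinsic reward:", r_int.mean())
print("most 'surprising' transitions:", np.argsort(r_int)[-3:])
```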

Learning Transferable Architectures for Scalable Image Recognition

Title Learning Transferable Architectures for Scalable Image Recognition
Authors Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le
Abstract Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key contribution of this work is the design of a new search space (the “NASNet search space”) which enables transferability. In our experiments, we search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named “NASNet architecture”. We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, NASNet achieves 2.4% error rate, which is state-of-the-art. On ImageNet, NASNet achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS - a reduction of 28% in computational demand from the previous state-of-the-art model. When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
Tasks Image Classification, Neural Architecture Search
Published 2017-07-21
URL http://arxiv.org/abs/1707.07012v4
PDF http://arxiv.org/pdf/1707.07012v4.pdf
PWC https://paperswithcode.com/paper/learning-transferable-architectures-for
Repo https://github.com/johannesu/NASNet-keras
Framework tf
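
As a hedged illustration of the ScheduledDropPath regularizer mentioned in the abstract: each parallel path inside a cell is dropped with a probability that ramps up linearly over training, and surviving paths are rescaled. The final drop probability and the rescaling convention below are assumptions, not the authors' exact settings.

```python
# Sketch of the ScheduledDropPath idea: drop each branch of a cell with a
# probability that increases linearly over training; rescale survivors.
import numpy as np

rng = np.random.default_rng(0)

def scheduled_drop_path(branch_outputs, step, total_steps,
                        final_drop_prob=0.3, training=True):
    """branch_outputs: list of arrays from the parallel paths of a cell."""
    if not training:
        return sum(branch_outputs)
    drop_prob = final_drop_prob * step / total_steps   # linear schedule
    keep_prob = 1.0 - drop_prob
    kept = []
    for out in branch_outputs:
        if rng.random() < keep_prob:
            kept.append(out / keep_prob)   # inverted-dropout style rescaling
    if not kept:                           # never drop every path
        kept.append(branch_outputs[rng.integers(len(branch_outputs))])
    return sum(kept)

x = [rng.normal(size=(2, 3)) for _ in range(3)]
early = scheduled_drop_path(x, step=10, total_steps=1000)
late = scheduled_drop_path(x, step=900, total_steps=1000)
print(early.shape, late.shape)
```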

Squeeze-and-Excitation Networks

Title Squeeze-and-Excitation Networks
Authors Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu
Abstract The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative improvement of ~25%. Models and code are available at https://github.com/hujie-frank/SENet.
Tasks Image Classification
Published 2017-09-05
URL https://arxiv.org/abs/1709.01507v4
PDF https://arxiv.org/pdf/1709.01507v4.pdf
PWC https://paperswithcode.com/paper/squeeze-and-excitation-networks
Repo https://github.com/TuSimple/neuron-selectivity-transfer
Framework tf
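
A minimal PyTorch rendering of the SE block as described: squeeze the spatial dimensions with global average pooling, excite through a small bottleneck MLP with a sigmoid gate, and rescale each channel. This follows the paper's description, not the authors' original Caffe implementation.

```python
# Minimal SE block: squeeze (global average pool), excite (bottleneck MLP
# with sigmoid), then channel-wise recalibration of the input feature map.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                  # squeeze: (N, C)
        w = self.fc(s).view(n, c, 1, 1)         # excitation weights
        return x * w                            # channel-wise recalibration

x = torch.randn(4, 64, 32, 32)
print(SEBlock(64)(x).shape)                     # torch.Size([4, 64, 32, 32])
```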

Opening the Black Box of Deep Neural Networks via Information

Title Opening the Black Box of Deep Neural Networks via Information
Authors Ravid Shwartz-Ziv, Naftali Tishby
Abstract Despite their great success, there is still no comprehensive theoretical understanding of learning with Deep Neural Networks (DNNs) or their inner organization. Previous work proposed to analyze DNNs in the Information Plane, i.e., the plane of the mutual information values that each layer preserves on the input and output variables. They suggested that the goal of the network is to optimize the Information Bottleneck (IB) tradeoff between compression and prediction, successively, for each layer. In this work we follow up on this idea and demonstrate the effectiveness of the Information-Plane visualization of DNNs. Our main results are: (i) most of the training epochs in standard DL are spent on compression of the input to an efficient representation and not on fitting the training labels. (ii) The representation compression phase begins when the training error becomes small and the Stochastic Gradient Descent (SGD) epochs change from a fast drift towards lower training error into a stochastic relaxation, or random diffusion, constrained by the training error value. (iii) The converged layers lie on or very close to the Information Bottleneck (IB) theoretical bound, and the maps from the input to any hidden layer and from this hidden layer to the output satisfy the IB self-consistent equations. This generalization-through-noise mechanism is unique to Deep Neural Networks and absent in one-layer networks. (iv) The training time is dramatically reduced when adding more hidden layers. Thus the main advantage of the hidden layers is computational. This can be explained by the reduced relaxation time, as it scales super-linearly (exponentially for simple diffusion) with the information compression from the previous layer.
Tasks
Published 2017-03-02
URL http://arxiv.org/abs/1703.00810v3
PDF http://arxiv.org/pdf/1703.00810v3.pdf
PWC https://paperswithcode.com/paper/opening-the-black-box-of-deep-neural-networks
Repo https://github.com/AnnaGolubeva/WS2018
Framework none
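
A hedged sketch of the binned mutual-information estimate typically used to draw the information plane: discretize a layer's activations T and compute I(X;T) and I(T;Y) from empirical joint distributions. The toy data and the 30-bin choice are my own, not the paper's exact setup.

```python
# Binned mutual-information estimate for placing a "layer" on the
# information plane: I(X;T) and I(T;Y) from empirical joint distributions.
import numpy as np

def mutual_information(a, b):
    """MI between two 1-D arrays of discrete symbols, in bits."""
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    n = len(a)
    pa = {x: np.mean(a == x) for x in set(a)}
    pb = {y: np.mean(b == y) for y in set(b)}
    return sum((c / n) * np.log2((c / n) / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

rng = np.random.default_rng(0)
X = rng.integers(0, 16, size=5000)                  # discrete "input"
Y = (X % 2).astype(int)                             # label derived from X
T = X + rng.normal(scale=2.0, size=X.size)          # a noisy "layer activation"

# Discretize the continuous activation into 30 equal-width bins, as is
# common in information-plane analyses.
T_binned = np.digitize(T, np.linspace(T.min(), T.max(), 30))

print("I(X;T) =", round(mutual_information(X, T_binned), 3), "bits")
print("I(T;Y) =", round(mutual_information(T_binned, Y), 3), "bits")
```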

Deep Subspace Clustering Networks

Title Deep Subspace Clustering Networks
Authors Pan Ji, Tong Zhang, Hongdong Li, Mathieu Salzmann, Ian Reid
Abstract We present a novel deep neural network architecture for unsupervised subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the “self-expressiveness” property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network based method is able to cluster data points having complex (often nonlinear) structures. We further propose pre-training and fine-tuning strategies that let us effectively learn the parameters of our subspace clustering networks. Our experiments show that the proposed method significantly outperforms the state-of-the-art unsupervised subspace clustering methods.
Tasks Image Clustering
Published 2017-09-08
URL http://arxiv.org/abs/1709.02508v1
PDF http://arxiv.org/pdf/1709.02508v1.pdf
PWC https://paperswithcode.com/paper/deep-subspace-clustering-networks
Repo https://github.com/panji1990/Deep-subspace-clustering-networks
Framework tf
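
A hedged NumPy sketch of the self-expressiveness property the proposed layer mimics: represent each latent point as a combination of the others (here via a closed-form ridge solve rather than the learned self-expressive layer) and build an affinity matrix |C| + |C|^T that the full pipeline would feed to spectral clustering.

```python
# Self-expressiveness sketch: find C with Z ≈ Z C (zero diagonal) by ridge
# regression, then inspect the affinity |C| + |C|^T used for clustering.
import numpy as np

rng = np.random.default_rng(0)

def subspace_points(n, dim=10, rank=2):
    """n points drawn from a random rank-2 subspace of R^dim."""
    basis = rng.normal(size=(dim, rank))
    return (basis @ rng.normal(size=(rank, n))).T

Z = np.vstack([subspace_points(30), subspace_points(30)])   # (60, 10)
n, lam = Z.shape[0], 1e-2
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    A = Z[others].T                                          # (10, n-1)
    c = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ Z[i])
    C[i, others] = c

affinity = np.abs(C) + np.abs(C).T
# In the full pipeline this affinity would go to spectral clustering; here we
# just check that within-subspace affinities dominate cross-subspace ones.
print("mean within-subspace affinity:", round(affinity[:30, :30].mean(), 4))
print("mean cross-subspace affinity:", round(affinity[:30, 30:].mean(), 4))
```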

YellowFin and the Art of Momentum Tuning

Title YellowFin and the Art of Momentum Tuning
Authors Jian Zhang, Ioannis Mitliagkas
Abstract Hyperparameter tuning is one of the most time-consuming workloads in deep learning. State-of-the-art optimizers, such as AdaGrad, RMSProp and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently researchers have shown renewed interest in simpler methods like momentum SGD as they may yield better test metrics. Motivated by this trend, we ask: can simple adaptive methods based on SGD perform as well or better? We revisit the momentum SGD algorithm and show that hand-tuning a single learning rate and momentum makes it competitive with Adam. We then analyze its robustness to learning rate misspecification and objective curvature variation. Based on these insights, we design YellowFin, an automatic tuner for momentum and learning rate in SGD. YellowFin optionally uses a negative-feedback loop to compensate for the momentum dynamics in asynchronous settings on the fly. We empirically show that YellowFin can converge in fewer iterations than Adam on ResNets and LSTMs for image recognition, language modeling and constituency parsing, with a speedup of up to 3.28x in synchronous and up to 2.69x in asynchronous settings.
Tasks Constituency Parsing, Language Modelling
Published 2017-06-12
URL http://arxiv.org/abs/1706.03471v2
PDF http://arxiv.org/pdf/1706.03471v2.pdf
PWC https://paperswithcode.com/paper/yellowfin-and-the-art-of-momentum-tuning
Repo https://github.com/JianGoForIt/YellowFin_Pytorch
Framework pytorch
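
As a hedged sketch of the quantity YellowFin tunes toward (stated here from the classical heavy-ball analysis rather than the paper's full SingleStep rule): on a quadratic with curvatures in [h_min, h_max], momentum mu = ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^2 with a matching learning rate gives the fastest convergence. The actual tuner estimates curvature range, gradient variance, and distance to the optimum online, which this toy omits.

```python
# Noiseless momentum tuning rule on a quadratic:
#   mu = ((sqrt(kappa) - 1) / (sqrt(kappa) + 1)) ** 2
#   lr = (1 - sqrt(mu)) ** 2 / h_min
import numpy as np

def tune(h_min, h_max):
    kappa = h_max / h_min
    mu = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2
    lr = (1 - np.sqrt(mu)) ** 2 / h_min
    return lr, mu

def momentum_sgd(grad, x0, lr, mu, steps=200):
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x

# Ill-conditioned quadratic: f(x) = 0.5 * x^T diag(h) x.
h = np.array([0.01, 0.1, 1.0, 10.0])
grad = lambda x: h * x
lr, mu = tune(h.min(), h.max())
x_final = momentum_sgd(grad, x0=np.ones(4), lr=lr, mu=mu)
print("lr =", round(lr, 4), "mu =", round(mu, 4),
      "final |x| =", np.linalg.norm(x_final))
```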

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

Title Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation
Authors Yen-Cheng Liu, Yu-Ying Yeh, Tzu-Chien Fu, Sheng-De Wang, Wei-Chen Chiu, Yu-Chiang Frank Wang
Abstract While representation learning aims to derive interpretable features for describing visual data, representation disentanglement goes further and derives such features so that particular image attributes can be identified and manipulated. However, one cannot easily address this task without observing ground-truth annotations for the training data. To address this problem, we propose a novel deep learning model, the Cross-Domain Representation Disentangler (CDRD). By observing fully annotated source-domain data and unlabeled target-domain data of interest, our model bridges the information across data domains and transfers the attribute information accordingly. Thus, cross-domain feature disentanglement and adaptation can be jointly performed. In the experiments, we provide qualitative results to verify our disentanglement capability. Moreover, we further confirm that our model can be applied to classification tasks in unsupervised domain adaptation, and performs favorably against state-of-the-art image disentanglement and translation methods.
Tasks Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation
Published 2017-05-03
URL http://arxiv.org/abs/1705.01314v4
PDF http://arxiv.org/pdf/1705.01314v4.pdf
PWC https://paperswithcode.com/paper/detach-and-adapt-learning-cross-domain
Repo https://github.com/ycliu93/CDRD
Framework tf

Deep learning with convolutional neural networks for decoding and visualization of EEG pathology

Title Deep learning with convolutional neural networks for decoding and visualization of EEG pathology
Authors Robin Tibor Schirrmeister, Lukas Gemein, Katharina Eggensperger, Frank Hutter, Tonio Ball
Abstract We apply convolutional neural networks (ConvNets) to the task of distinguishing pathological from normal EEG recordings in the Temple University Hospital EEG Abnormal Corpus. We use two basic ConvNet architectures, one shallow and one deep, recently shown to decode task-related information from EEG at least as well as established algorithms designed for this purpose. In decoding EEG pathology, both ConvNets reached substantially better accuracies (about 6% better, ~85% vs. ~79%) than the only published result for this dataset, and were still better when using only 1 minute of each recording for training and only six seconds of each recording for testing. We used automated methods to optimize architectural hyperparameters and found intriguingly different ConvNet architectures, e.g., with max pooling as the only nonlinearity. Visualizations of the ConvNet decoding behavior showed that they used spectral power changes in the delta (0-4 Hz) and theta (4-8 Hz) frequency ranges, possibly alongside other features, consistent with expectations derived from spectral analysis of the EEG data and from the textual medical reports. Analysis of the textual medical reports also highlighted the potential for accuracy increases by integrating contextual information, such as the age of subjects. In summary, the ConvNets and visualization techniques used in this study constitute a next step towards clinically useful automated EEG diagnosis and establish a new baseline for future work on this topic.
Tasks EEG
Published 2017-08-26
URL http://arxiv.org/abs/1708.08012v3
PDF http://arxiv.org/pdf/1708.08012v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-convolutional-neural-1
Repo https://github.com/robintibor/auto-eeg-diagnosis-example
Framework none
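
A hedged PyTorch sketch of a "shallow" ConvNet of the kind evaluated in the paper: a temporal convolution, a spatial convolution across electrodes, a squaring nonlinearity with mean pooling and log, then a linear classifier. The layer sizes are illustrative assumptions, not the braindecode models the authors used.

```python
# Illustrative shallow EEG ConvNet: temporal conv -> spatial conv ->
# square -> mean pool -> log -> linear classifier.
import torch
import torch.nn as nn

class ShallowEEGNet(nn.Module):
    def __init__(self, n_channels=21, n_classes=2, n_filters=40):
        super().__init__()
        self.temporal = nn.Conv2d(1, n_filters, kernel_size=(1, 25))
        self.spatial = nn.Conv2d(n_filters, n_filters, kernel_size=(n_channels, 1))
        self.pool = nn.AvgPool2d(kernel_size=(1, 75), stride=(1, 15))
        self.classify = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_classes))

    def forward(self, x):                 # x: (batch, 1, channels, time)
        x = self.temporal(x)
        x = self.spatial(x)
        x = x ** 2                        # squaring non-linearity
        x = self.pool(x)
        x = torch.log(torch.clamp(x, min=1e-6))
        return self.classify(x)

x = torch.randn(8, 1, 21, 600)            # 8 crops of 21-channel EEG
print(ShallowEEGNet()(x).shape)            # torch.Size([8, 2])
```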

Blind Source Separation Using Mixtures of Alpha-Stable Distributions

Title Blind Source Separation Using Mixtures of Alpha-Stable Distributions
Authors Nicolas Keriven, Antoine Deleforge, Antoine Liutkus
Abstract We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have been recently showed to better model audio signals in the time-frequency domain than classical Gaussian distributions thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability density functions do not have a closed-form expression in general. Here, we introduce a novel method for estimating mixture of alpha-stable distributions based on characteristic function matching. We apply this to the blind estimation of binary masks in individual frequency bands from multichannel convolutive audio mixes. We show that the proposed method yields better separation performance than Gaussian-based binary-masking methods.
Tasks
Published 2017-11-13
URL http://arxiv.org/abs/1711.04460v3
PDF http://arxiv.org/pdf/1711.04460v3.pdf
PWC https://paperswithcode.com/paper/blind-source-separation-using-mixtures-of
Repo https://github.com/nkeriven/alpha_stable_bss
Framework none
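
A hedged, single-source sketch of the characteristic-function idea: a symmetric alpha-stable variable has E[exp(itX)] = exp(-|sigma*t|^alpha), so alpha and sigma can be estimated by matching the empirical characteristic function on a grid of t. This is only a one-component illustration (using SciPy's levy_stable to simulate data), not the paper's mixture estimator for binary-mask separation.

```python
# Characteristic-function matching for one symmetric alpha-stable source.
import numpy as np
from scipy.stats import levy_stable

alpha_true, sigma_true = 1.5, 2.0
x = levy_stable.rvs(alpha_true, 0.0, loc=0.0, scale=sigma_true,
                    size=20000, random_state=0)

t = np.linspace(0.05, 1.0, 20)
emp_cf = np.array([np.mean(np.exp(1j * ti * x)) for ti in t])

def model_cf(t, alpha, sigma):
    return np.exp(-np.abs(sigma * t) ** alpha)

# Coarse grid search over (alpha, sigma) minimizing the CF mismatch.
alphas = np.linspace(0.6, 2.0, 29)
sigmas = np.linspace(0.5, 4.0, 36)
errs = np.array([[np.sum(np.abs(emp_cf - model_cf(t, a, s)) ** 2)
                  for s in sigmas] for a in alphas])
ia, isg = np.unravel_index(errs.argmin(), errs.shape)
print("estimated alpha:", round(alphas[ia], 2), "sigma:", round(sigmas[isg], 2))
```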

Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

Title Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Authors Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, Fredrik Kahl, Tomas Pajdla
Abstract Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera pose estimates. In this paper, we introduce the first benchmark datasets specifically designed for analyzing the impact of such factors on visual localization. Using carefully created ground truth poses for query images taken under a wide variety of conditions, we evaluate the impact of various factors on 6DOF camera pose estimation accuracy through extensive experiments with state-of-the-art localization approaches. Based on our results, we draw conclusions about the difficulty of different conditions, showing that long-term localization is far from solved, and propose promising avenues for future work, including sequence-based localization approaches and the need for better local features. Our benchmark is available at visuallocalization.net.
Tasks Autonomous Vehicles, Pose Estimation, Visual Localization
Published 2017-07-28
URL http://arxiv.org/abs/1707.09092v3
PDF http://arxiv.org/pdf/1707.09092v3.pdf
PWC https://paperswithcode.com/paper/benchmarking-6dof-outdoor-visual-localization
Repo https://github.com/ethz-asl/hf_net
Framework tf
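
A hedged sketch of the standard 6DOF pose-error measures such a benchmark reports: position error as the distance between estimated and ground-truth camera positions, and orientation error as the angle of the relative rotation. The example poses and the thresholds shown are illustrative.

```python
# 6DOF pose errors: translation distance and relative rotation angle.
import numpy as np

def rotation_error_deg(R_est, R_gt):
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_errors(R_est, t_est, R_gt, t_gt):
    return np.linalg.norm(t_est - t_gt), rotation_error_deg(R_est, R_gt)

def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

t_err, r_err = pose_errors(rot_z(3.0), np.array([0.2, 0.1, 0.0]),
                           rot_z(0.0), np.zeros(3))
print(f"translation error: {t_err:.2f} m, rotation error: {r_err:.1f} deg")
# A benchmark would then report the fraction of queries below thresholds
# such as (0.25 m, 2 deg), (0.5 m, 5 deg), (5 m, 10 deg).
print("within (0.5 m, 5 deg):", t_err < 0.5 and r_err < 5.0)
```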

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

Title Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Authors Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta
Abstract The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in the representation capabilities of models and the computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10x or 100x? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By exploiting the JFT-300M dataset, which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data was used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that performance on vision tasks increases logarithmically with the volume of training data. Second, we show that representation learning (or pre-training) still holds a lot of promise: one can improve performance on many vision tasks by just training a better base model. Finally, as expected, we present new state-of-the-art results for different vision tasks including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires the vision community to not undervalue the data and to develop collective efforts in building larger datasets.
Tasks Image Classification, Object Detection, Pose Estimation, Representation Learning, Semantic Segmentation
Published 2017-07-10
URL http://arxiv.org/abs/1707.02968v2
PDF http://arxiv.org/pdf/1707.02968v2.pdf
PWC https://paperswithcode.com/paper/revisiting-unreasonable-effectiveness-of-data
Repo https://github.com/Tencent/tencent-ml-images
Framework tf
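
To make the headline scaling observation concrete, the sketch below fits performance ≈ a·log10(N) + b to a handful of (dataset size, score) pairs and extrapolates; the numbers are fabricated purely for illustration and are not the paper's measurements.

```python
# Toy illustration of logarithmic performance scaling with dataset size.
import numpy as np

data_sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])          # images (illustrative)
performance = np.array([58.0, 61.5, 65.2, 68.8, 72.1, 75.4])   # made-up scores

a, b = np.polyfit(np.log10(data_sizes), performance, deg=1)
print(f"fit: performance ≈ {a:.2f} * log10(N) + {b:.2f}")

for n in (1e9, 3e9):
    print(f"extrapolated performance at N = {n:.0e}: {a * np.log10(n) + b:.1f}")
```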