October 21, 2019


Paper Group AWR 51



MicronNet: A Highly Compact Deep Convolutional Neural Network Architecture for Real-time Embedded Traffic Sign Classification

Title MicronNet: A Highly Compact Deep Convolutional Neural Network Architecture for Real-time Embedded Traffic Sign Classification
Authors Alexander Wong, Mohammad Javad Shafiee, Michael St. Jules
Abstract Traffic sign recognition is a very important computer vision task for a number of real-world applications such as intelligent transportation surveillance and analysis. While deep neural networks have been demonstrated in recent years to provide state-of-the-art performance for traffic sign recognition, a key challenge for enabling the widespread deployment of deep neural networks for embedded traffic sign recognition is the high computational and memory requirements of such networks. As a consequence, there are significant benefits in investigating compact deep neural network architectures for traffic sign recognition that are better suited for embedded devices. In this paper, we introduce MicronNet, a highly compact deep convolutional neural network for real-time embedded traffic sign recognition, designed based on macroarchitecture design principles (e.g., spectral macroarchitecture augmentation, parameter precision optimization, etc.) as well as numerical microarchitecture optimization strategies. The resulting overall architecture of MicronNet is thus designed with as few parameters and computations as possible while maintaining recognition performance, leading to optimized information density of the proposed network. The resulting MicronNet possesses a model size of just ~1MB and ~510,000 parameters (~27x fewer parameters than state-of-the-art) while still achieving a human performance level top-1 accuracy of 98.9% on the German traffic sign recognition benchmark. Furthermore, MicronNet requires just ~10 million multiply-accumulate operations to perform inference, and has a time-to-compute of just 32.19 ms on a Cortex-A53 high efficiency processor. These experimental results show that highly compact, optimized deep neural network architectures can be designed for real-time traffic sign recognition that are well-suited for embedded scenarios.
Tasks Traffic Sign Recognition
Published 2018-03-28
URL http://arxiv.org/abs/1804.00497v3
PDF http://arxiv.org/pdf/1804.00497v3.pdf
PWC https://paperswithcode.com/paper/micronnet-a-highly-compact-deep-convolutional
Repo https://github.com/ppriyank/MicronNet
Framework pytorch
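
The size figures in the abstract can be sanity-checked with simple arithmetic. This is a minimal sketch that assumes the ~1MB figure counts raw weight storage only and that 1 MB = 10^6 bytes; neither assumption is stated in the abstract.

```python
# Back-of-envelope check of the abstract's figures.
# Assumption: "model size" counts raw weight storage only (1 MB = 10**6 bytes).
params = 510_000   # ~510,000 parameters (from the abstract)
reduction = 27     # ~27x fewer parameters than state of the art (from the abstract)

baseline_params = params * reduction   # implied size of the comparison network
fp32_bytes = params * 4                # storing every weight as a 32-bit float
fp16_bytes = params * 2                # storing every weight as a 16-bit float

print(f"implied baseline: {baseline_params / 1e6:.1f}M parameters")  # 13.8M
print(f"fp32 weights: {fp32_bytes / 1e6:.2f} MB")                    # 2.04 MB
print(f"fp16 weights: {fp16_bytes / 1e6:.2f} MB")                    # 1.02 MB
```

Only the 16-bit figure lands near ~1MB, so the quoted numbers are consistent with reduced-precision weights (the abstract lists parameter precision optimization among the design principles), but this arithmetic alone does not confirm which precision the authors used.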

GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition

Title GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition
Authors Hanqing Chao, Yiwei He, Junping Zhang, Jianfeng Feng
Abstract As a unique biometric feature that can be recognized at a distance, gait has broad applications in crime prevention, forensic identification and social security. To portray a gait, existing gait recognition methods utilize either a gait template, where temporal information is hard to preserve, or a gait sequence, which must keep unnecessary sequential constraints and thus loses the flexibility of gait recognition. In this paper we present a novel perspective, where a gait is regarded as a set consisting of independent frames. We propose a new network named GaitSet to learn identity information from the set. Based on the set perspective, our method is immune to permutation of frames, and can naturally integrate frames from different videos which have been filmed under different scenarios, such as diverse viewing angles, different clothes/carrying conditions. Experiments show that under normal walking conditions, our single-model method achieves an average rank-1 accuracy of 95.0% on the CASIA-B gait dataset and an 87.1% accuracy on the OU-MVLP gait dataset. These results represent new state-of-the-art recognition accuracy. On various complex scenarios, our model exhibits a significant level of robustness. It achieves accuracies of 87.2% and 70.4% on CASIA-B under bag-carrying and coat-wearing walking conditions, respectively. These outperform the existing best methods by a large margin. The method presented can also achieve a satisfactory accuracy with a small number of frames in a test sample, e.g., 82.5% on CASIA-B with only 7 frames. The source code has been released at https://github.com/AbnerHqC/GaitSet.
Tasks Gait Recognition
Published 2018-11-15
URL http://arxiv.org/abs/1811.06186v4
PDF http://arxiv.org/pdf/1811.06186v4.pdf
PWC https://paperswithcode.com/paper/gaitset-regarding-gait-as-a-set-for-cross
Repo https://github.com/AbnerHqC/GaitSet
Framework pytorch
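
The set perspective relies on a pooling step that ignores frame order. The sketch below uses element-wise max pooling over per-frame feature vectors as one simple permutation-invariant choice; the feature dimension and the random "features" are placeholders, and this is not GaitSet's full set-pooling architecture.

```python
import numpy as np

def set_pool(frame_features: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, feature_dim) per-frame features into a single
    feature_dim vector by taking the element-wise max over frames.
    The max does not depend on frame order, so the result is permutation invariant."""
    return frame_features.max(axis=0)

rng = np.random.default_rng(0)
frames = rng.normal(size=(7, 16))                  # 7 frames, 16-dim placeholder features
shuffled = frames[rng.permutation(len(frames))]    # same frames, different order
print(np.allclose(set_pool(frames), set_pool(shuffled)))  # True

# Frames from another clip (e.g. another viewing angle) can simply be appended.
other_clip = rng.normal(size=(5, 16))
merged = np.concatenate([frames, other_clip])
print(set_pool(merged).shape)  # (16,)
```

Because the pooled vector of a merged set equals the element-wise max of the two clips' pooled vectors, frames filmed under different conditions can be combined without any alignment step.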

Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach

Title Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach
Authors Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, Luca Daniel
Abstract The robustness of neural networks to adversarial examples has received great attention due to security implications. Despite various attack approaches to crafting visually imperceptible adversarial examples, little has been developed towards a comprehensive measure of robustness. In this paper, we provide a theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and propose to use the Extreme Value Theory for efficient evaluation. Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks. Experimental results on various networks, including ResNet, Inception-v3 and MobileNet, show that (i) CLEVER is aligned with the robustness indication measured by the $\ell_2$ and $\ell_\infty$ norms of adversarial examples from powerful attacks, and (ii) defended networks using defensive distillation or bounded ReLU indeed achieve better CLEVER scores. To the best of our knowledge, CLEVER is the first attack-independent robustness metric that can be applied to any neural network classifier.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1801.10578v1
PDF http://arxiv.org/pdf/1801.10578v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-robustness-of-neural-networks
Repo https://github.com/huanzhang12/CLEVER
Framework tf
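
The core idea, turning robustness into local Lipschitz estimation, can be sketched by sampling gradient norms of a margin function inside a ball around the input. This is a simplified illustration, not the authors' metric: it takes the largest observed gradient norm where CLEVER fits an extreme-value (reverse Weibull) distribution to the batch maxima, and the toy linear margin and all parameter values are assumptions.

```python
import numpy as np

def sample_l2_ball(center, radius, n, rng):
    """Draw n points uniformly from the L2 ball of the given radius around center."""
    d = center.size
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + directions * radii

def clever_like_score(margin, margin_grad, x0, radius=0.5,
                      n_batches=50, batch_size=100, seed=0):
    """Simplified CLEVER-style L2 score for one target class.

    margin(x) = f_true(x) - f_target(x); the score is margin(x0) divided by an
    estimate of the local Lipschitz constant of margin in the ball, capped at radius.
    Simplification: instead of fitting a reverse Weibull distribution to the batch
    maxima of gradient norms, we take the largest observed maximum."""
    rng = np.random.default_rng(seed)
    batch_maxima = []
    for _ in range(n_batches):
        points = sample_l2_ball(x0, radius, batch_size, rng)
        # For L2 perturbations the relevant gradient norm is also L2 (its dual norm).
        batch_maxima.append(max(np.linalg.norm(margin_grad(p)) for p in points))
    lipschitz_estimate = max(batch_maxima)
    return min(margin(x0) / lipschitz_estimate, radius)

# Linear toy "classifier": the gradient is constant, so the estimate is exact and the
# score equals the true L2 distance from x0 to the decision boundary w @ x + b = 0.
w, b = np.array([3.0, 4.0]), 0.0
score = clever_like_score(lambda x: w @ x + b, lambda x: w, np.array([0.1, 0.0]))
print(score)  # approximately 0.06 = 0.3 / 5
```

For a real network, margin_grad would come from automatic differentiation, and the sampled maximum underestimates the true Lipschitz constant, which is the gap the extreme-value fit is meant to address.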

Weakly Supervised Silhouette-based Semantic Scene Change Detection

Title Weakly Supervised Silhouette-based Semantic Scene Change Detection
Authors Ken Sakurada, Mikiya Shibuya, Weimin Wang
Abstract This paper presents a novel semantic scene change detection scheme that requires only weak supervision. A straightforward approach to this task is to train a semantic change detection network directly on a large-scale dataset in an end-to-end manner. However, this requires a dedicated dataset for the task, which is labor-intensive and time-consuming to build. To avoid this problem, we propose to train this kind of network from existing datasets by dividing the task into change detection and semantic extraction. A further challenge for change detection is the difference in camera viewpoints, for example between images of the same scene captured by a vehicle-mounted camera at different times. To address this challenge, we propose a new siamese network structure that introduces a correlation layer. In addition, we create a publicly available dataset for semantic change detection to evaluate the proposed method. The experimental results verify both the robustness of the proposed networks to viewpoint differences in the change detection task and their effectiveness for semantic change detection. Our code and dataset are available at https://github.com/xdspacelab/sscdnet.
Tasks
Published 2018-11-29
URL https://arxiv.org/abs/1811.11985v2
PDF https://arxiv.org/pdf/1811.11985v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-silhouette-based-semantic
Repo https://github.com/xdspacelab/sscdnet
Framework pytorch
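
The abstract does not describe its correlation layer in detail. The sketch below assumes a FlowNet-style correlation, which compares each feature vector in one view with displaced feature vectors in the other; the feature maps and displacement range are hypothetical.

```python
import numpy as np

def correlation(f1, f2, max_disp=2):
    """FlowNet-style correlation between two (C, H, W) feature maps.

    Output has shape ((2*max_disp + 1)**2, H, W). Channel k holds, for one
    displacement (dy, dx), the channel-averaged product f1[:, y, x] * f2[:, y+dy, x+dx],
    with f2 treated as zero outside its borders."""
    _, h, w = f1.shape
    padded = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = padded[:, max_disp + dy: max_disp + dy + h,
                             max_disp + dx: max_disp + dx + w]
            out.append((f1 * shifted).mean(axis=0))
    return np.stack(out)

rng = np.random.default_rng(0)
feat_t0 = rng.normal(size=(8, 5, 6))            # features of the scene at time t0
feat_t1 = np.roll(feat_t0, shift=1, axis=2)     # same scene, viewpoint shifted 1 px right
corr = correlation(feat_t0, feat_t1, max_disp=2)
print(corr.shape)  # (25, 5, 6)
```

In this toy case the response peaks in the channel for displacement (dy, dx) = (0, +1). Exposing such offsets to later layers is one way a correlation layer can make change detection tolerant to small viewpoint differences.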

Finite-Data Performance Guarantees for the Output-Feedback Control of an Unknown System

Title Finite-Data Performance Guarantees for the Output-Feedback Control of an Unknown System
Authors Ross Boczar, Nikolai Matni, Benjamin Recht
Abstract As the systems we control become more complex, first-principles modeling becomes either impossible or intractable, motivating the use of machine learning techniques for the control of systems with continuous action spaces. As impressive as the empirical success of these methods has been, strong theoretical guarantees of performance, safety, or robustness are few and far between. This paper takes a step towards providing such guarantees by establishing finite-data performance guarantees for the robust output-feedback control of an unknown FIR SISO system. In particular, we introduce the “Coarse-ID control” pipeline, which is composed of a system identification step followed by a robust controller synthesis procedure, and analyze its end-to-end performance, providing quantitative bounds on the performance degradation suffered due to model uncertainty as a function of the number of experiments run to identify the system. We conclude with numerical examples demonstrating the effectiveness of our method.
Tasks
Published 2018-03-25
URL http://arxiv.org/abs/1803.09186v2
PDF http://arxiv.org/pdf/1803.09186v2.pdf
PWC https://paperswithcode.com/paper/finite-data-performance-guarantees-for-the
Repo https://github.com/rjboczar/OF-end-to-end-CDC
Framework none
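
The identification step of a Coarse-ID pipeline can be illustrated with ordinary least squares on an FIR model. This is a minimal sketch assuming independent experiments with Gaussian excitation and Gaussian output noise; it does not reproduce the paper's estimator, its error bounds, or the robust synthesis step, and the impulse response and noise level are hypothetical.

```python
import numpy as np

def estimate_fir(inputs, outputs, order):
    """Least-squares estimate of FIR coefficients g[0..order-1] from experiments
    obeying y[t] = sum_k g[k] * u[t - k] + noise, with u[t] = 0 for t < 0."""
    rows, targets = [], []
    for u, y in zip(inputs, outputs):
        for t in range(len(u)):
            # Regressor [u[t], u[t-1], ..., u[t-order+1]], zero before the experiment starts.
            rows.append([u[t - k] if t - k >= 0 else 0.0 for k in range(order)])
            targets.append(y[t])
    g_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return g_hat

rng = np.random.default_rng(0)
g_true = np.array([1.0, 0.5, -0.25])        # hypothetical impulse response
n_experiments, horizon, noise_std = 20, 10, 0.05
inputs = [rng.normal(size=horizon) for _ in range(n_experiments)]
outputs = [np.convolve(u, g_true)[:horizon] + noise_std * rng.normal(size=horizon)
           for u in inputs]
print(estimate_fir(inputs, outputs, order=3))  # close to g_true
```

Increasing n_experiments adds rows to the regression and shrinks the estimation error; quantifying how that error propagates into closed-loop performance is the part the paper's analysis addresses.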

Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

Title Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices
Authors Oscar Rahnama, Duncan Frost, Ondrej Miksik, Philip H. S. Torr
Abstract For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception, as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate an FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all the intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47 fps whilst consuming under 4W of power. Unlike previous FPGA-based designs, we take advantage of both components on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low-power, real-time systems.
Tasks Stereo Matching, Stereo Matching Hand
Published 2018-02-20
URL http://arxiv.org/abs/1802.07210v1
PDF http://arxiv.org/pdf/1802.07210v1.pdf
PWC https://paperswithcode.com/paper/real-time-dense-stereo-matching-with-elas-on
Repo https://github.com/torrvision/ELAS_SoC
Framework none
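
Because stereo cameras do not sense depth directly, each estimated disparity must be converted to depth. For a rectified pair under the pinhole model this is Z = f * B / d. The focal length and baseline below are hypothetical values, not parameters of the authors' rig.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a point in a rectified stereo pair: Z = f * B / d,
    with focal length f and disparity d in pixels and baseline B in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline.
for d in (70.0, 10.0, 1.0):
    print(d, depth_from_disparity(d, focal_px=700.0, baseline_m=0.12))
```

Since depth is inversely proportional to disparity, a one-pixel disparity error costs far more depth accuracy for distant points than for near ones, which is one reason dense, accurate disparity estimation matters.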

Spreading vectors for similarity search

Title Spreading vectors for similarity search
Authors Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou
Abstract Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layer forms a fixed parameter-free quantizer, such as pre-defined points of a hypersphere. As a proxy objective, we design and train a neural network that favors uniformity in the spherical latent space, while preserving the neighborhood structure after the mapping. We propose a new regularizer derived from the Kozachenko–Leonenko differential entropy estimator to enforce uniformity and combine it with a locality-aware triplet loss. Experiments show that our end-to-end approach outperforms most learned quantization methods, and is competitive with the state of the art on widely adopted benchmarks. Furthermore, we show that training without the quantization step results in almost no difference in accuracy, but yields a generic catalyzer that can be applied with any subsequent quantizer.
Tasks Quantization
Published 2018-06-08
URL https://arxiv.org/abs/1806.03198v3
PDF https://arxiv.org/pdf/1806.03198v3.pdf
PWC https://paperswithcode.com/paper/spreading-vectors-for-similarity-search
Repo https://github.com/facebookresearch/spreadingvectors
Framework pytorch
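
The uniformity regularizer can be sketched as the negative mean log distance from each point to its nearest neighbour on the hypersphere, which is small when points are spread out and large when they cluster. This is a minimal NumPy sketch: the explicit L2 normalization, the 1e-8 guard, and the toy points are assumptions, and the locality-aware triplet loss the abstract combines it with is omitted.

```python
import numpy as np

def koleo_loss(x):
    """Kozachenko-Leonenko-style uniformity regularizer:
    -mean_i log(distance from x_i to its nearest neighbour).
    Rows are first projected onto the unit hypersphere."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)        # ignore each point's distance to itself
    rho = dists.min(axis=1)                # nearest-neighbour distance per point
    return -np.mean(np.log(rho + 1e-8))    # small epsilon guards against log(0)

spread = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
clustered = np.array([[1, 0], [1, 0.01], [1, -0.01], [1, 0.02]], dtype=float)
print(koleo_loss(spread) < koleo_loss(clustered))  # True
```

Minimizing this term pushes nearest neighbours apart, so in training it would act on a batch of network outputs alongside a loss that keeps true neighbours close.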

Exploration Conscious Reinforcement Learning Revisited

Title Exploration Conscious Reinforcement Learning Revisited
Authors Lior Shani, Yonathan Efroni, Shie Mannor
Abstract The Exploration-Exploitation tradeoff arises in Reinforcement Learning when one cannot tell if a policy is optimal. In that case, there is a constant need to explore new actions instead of exploiting past experience. In practice, it is common to resolve the tradeoff by using a fixed exploration mechanism, such as $\epsilon$-greedy exploration or adding Gaussian noise, while still trying to learn an optimal policy. In this work, we take a different approach and study exploration-conscious criteria, which result in policies that are optimal with respect to the exploration mechanism. Solving these criteria, as we establish, amounts to solving a surrogate Markov Decision Process. We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria. Building on these approaches, we apply simple changes to existing tabular and deep Reinforcement Learning algorithms and empirically demonstrate superior performance relative to their non-exploration-conscious counterparts, for both discrete and continuous action spaces.
Tasks
Published 2018-12-13
URL https://arxiv.org/abs/1812.05551v3
PDF https://arxiv.org/pdf/1812.05551v3.pdf
PWC https://paperswithcode.com/paper/exploration-conscious-reinforcement-learning
Repo https://github.com/shanlior/ExplorationConsciousRL
Framework tf
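
To make "optimal with respect to the exploration mechanism" concrete, the sketch below computes a one-step bootstrap target that values the next state under the ε-greedy policy actually executed rather than under the purely greedy one. It is a simplified illustration in the spirit of the abstract, assuming a tabular Q-value vector per state; it is not a reproduction of either approach the paper characterizes.

```python
import numpy as np

def eps_greedy_value(q_values, eps):
    """Expected value of Q under an eps-greedy policy:
    greedy action with prob. 1 - eps, uniformly random action with prob. eps."""
    return (1.0 - eps) * np.max(q_values) + eps * np.mean(q_values)

def exploration_conscious_target(reward, q_next, gamma, eps, done=False):
    """Bootstrap target that evaluates the next state under the eps-greedy
    policy being executed, instead of under the purely greedy policy."""
    if done:
        return reward
    return reward + gamma * eps_greedy_value(q_next, eps)

q_next = np.array([1.0, 3.0])
print(exploration_conscious_target(0.0, q_next, gamma=1.0, eps=0.5))  # 2.5
```

A standard Q-learning target would use max(q_next) = 3.0 here; the exploration-conscious version accounts for the fact that half of the time a random action will be taken.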

VizML: A Machine Learning Approach to Visualization Recommendation

Title VizML: A Machine Learning Approach to Visualization Recommendation
Authors Kevin Z. Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, César A. Hidalgo
Abstract Data visualization should be accessible for all analysts with data, not just the few with technical expertise. Visualization recommender systems aim to lower the barrier to exploring basic visualizations by automatically generating results for analysts to search and select, rather than specifying them manually. Here, we demonstrate a novel machine learning-based approach to visualization recommendation that learns visualization design choices from a large corpus of datasets and associated visualizations. First, we identify five key design choices made by analysts while creating visualizations, such as selecting a visualization type and choosing to encode a column along the X- or Y-axis. We train models to predict these design choices using one million dataset-visualization pairs collected from a popular online visualization platform. Neural networks predict these design choices with high accuracy compared to baseline models. We report and interpret feature importances from one of these baseline models. To evaluate the generalizability and uncertainty of our approach, we benchmark with a crowdsourced test set, and show that the performance of our model is comparable to human performance when predicting consensus visualization type, and exceeds that of other ML-based systems.
Tasks Recommendation Systems
Published 2018-08-14
URL http://arxiv.org/abs/1808.04819v1
PDF http://arxiv.org/pdf/1808.04819v1.pdf
PWC https://paperswithcode.com/paper/vizml-a-machine-learning-approach-to
Repo https://github.com/mitmedialab/vizml
Framework none

Learning regression and verification networks for long-term visual tracking

Title Learning regression and verification networks for long-term visual tracking
Authors Yunhua Zhang, Dong Wang, Lijun Wang, Jinqing Qi, Huchuan Lu
Abstract Compared with short-term tracking, the long-term tracking task requires determining whether the tracked object is present or absent, and then estimating an accurate bounding box if it is present or conducting image-wide re-detection if it is absent. Until now, few attempts have been made, although this task is much closer to the design of practical tracking systems. In this work, we propose a novel long-term tracking framework based on deep regression and verification networks. The offline-trained regression model is designed using the object-aware feature fusion and region proposal networks to generate a series of candidates and estimate their similarity scores effectively. The verification network evaluates these candidates to output the optimal one as the tracked object with its classification score, which is updated online to adapt to appearance variations based on new reliable observations. The similarity and classification scores are combined to obtain a final confidence value, based on which our tracker can determine the absence of the target accurately and conduct image-wide re-detection to capture the target successfully when it reappears. Extensive experiments show that our tracker achieves the best performance on the VOT2018 long-term challenge and state-of-the-art results on the OxUvA long-term dataset.
Tasks Visual Tracking
Published 2018-09-12
URL http://arxiv.org/abs/1809.04320v2
PDF http://arxiv.org/pdf/1809.04320v2.pdf
PWC https://paperswithcode.com/paper/learning-regression-and-verification-networks
Repo https://github.com/xiaobai1217/MBMD
Framework tf

InfoCatVAE: Representation Learning with Categorical Variational Autoencoders

Title InfoCatVAE: Representation Learning with Categorical Variational Autoencoders
Authors Edouard Pineau, Marc Lelarge
Abstract This paper describes InfoCatVAE, an extension of the variational autoencoder that enables unsupervised disentangled representation learning. InfoCatVAE uses multimodal distributions for the prior and the inference network and then maximizes the evidence lower bound objective (ELBO). We connect the new ELBO derived for our model with a natural soft clustering objective, which explains the robustness of our approach. We then adapt the InfoGAN method to our setting in order to maximize the mutual information between the categorical code and the generated inputs and obtain an improved model.
Tasks Representation Learning
Published 2018-06-20
URL http://arxiv.org/abs/1806.08240v2
PDF http://arxiv.org/pdf/1806.08240v2.pdf
PWC https://paperswithcode.com/paper/infocatvae-representation-learning-with
Repo https://github.com/edouardpineau/infoCatVAE
Framework none

Informative Object Annotations: Tell Me Something I Don’t Know

Title Informative Object Annotations: Tell Me Something I Don’t Know
Authors Lior Bracha, Gal Chechik
Abstract Capturing the interesting components of an image is a key aspect of image understanding. When a speaker annotates an image, selecting labels that are informative greatly depends on the prior knowledge of a prospective listener. Motivated by cognitive theories of categorization and communication, we present a new unsupervised approach to model this prior knowledge and quantify the informativeness of a description. Specifically, we compute how knowledge of a label reduces uncertainty over the space of labels and utilize this to rank candidate labels for describing an image. While the full estimation problem is intractable, we describe an efficient algorithm to approximate entropy reduction using a tree-structured graphical model. We evaluate our approach on the open-images dataset using a new evaluation set of 10K ground-truth ratings and find that it achieves ~65% agreement with human raters, largely outperforming other unsupervised baseline approaches.
Tasks
Published 2018-12-26
URL http://arxiv.org/abs/1812.10358v1
PDF http://arxiv.org/pdf/1812.10358v1.pdf
PWC https://paperswithcode.com/paper/informative-object-annotations-tell-me
Repo https://github.com/liorbracha/iota
Framework none
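
The notion of informativeness as entropy reduction can be illustrated on a tiny label hierarchy: announcing a label restricts the listener's belief to the leaf labels under it, and the informativeness is how many bits of uncertainty that removes. This is a simplified stand-in under stated assumptions (a uniform hypothetical prior and an exact, tiny hierarchy); it does not reproduce the paper's tree-structured graphical-model approximation.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_reduction(prior, label_to_leaves, label):
    """Bits of uncertainty over leaf labels removed by announcing `label`,
    assuming the listener then restricts belief to the leaves under that label."""
    leaves = label_to_leaves[label]
    mass = sum(prior[leaf] for leaf in leaves)
    posterior = [prior[leaf] / mass for leaf in leaves]
    return entropy(prior.values()) - entropy(posterior)

# Hypothetical prior over leaf labels and a hypothetical hierarchy.
prior = {"poodle": 0.25, "beagle": 0.25, "tabby": 0.25, "siamese": 0.25}
hierarchy = {
    "animal": ["poodle", "beagle", "tabby", "siamese"],
    "dog": ["poodle", "beagle"],
    "poodle": ["poodle"],
}
for label in hierarchy:
    print(label, entropy_reduction(prior, hierarchy, label))
# animal 0.0, dog 1.0, poodle 2.0
```

Ranking candidate labels by this score prefers specific labels over generic ones such as "animal", which tell a listener with this prior nothing new.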

What Does a TextCNN Learn?

Title What Does a TextCNN Learn?
Authors Linyuan Gong, Ruyi Ji
Abstract TextCNN, the convolutional neural network for text, is a useful deep learning algorithm for sentence classification tasks such as sentiment analysis and question classification. However, neural networks have long been known as black boxes because interpreting them is a challenging task. Researchers have developed several tools to understand a CNN for image classification by deep visualization, but research about deep TextCNNs is still insufficient. In this paper, we try to understand what a TextCNN learns on two classical NLP datasets. Our work focuses on the functions of different convolutional kernels and the correlations between convolutional kernels.
Tasks Image Classification, Sentence Classification, Sentiment Analysis
Published 2018-01-19
URL http://arxiv.org/abs/1801.06287v1
PDF http://arxiv.org/pdf/1801.06287v1.pdf
PWC https://paperswithcode.com/paper/what-does-a-textcnn-learn
Repo https://github.com/arita37/mlmodels
Framework tf
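
A common way to read a TextCNN kernel is as an n-gram detector: it slides over word embeddings and max-over-time pooling keeps its strongest response. The sketch below shows that mechanism with hypothetical one-hot "embeddings" and a hand-set kernel; it does not reproduce the paper's analysis or datasets.

```python
import numpy as np

def conv1d_max_over_time(embeddings, kernel):
    """Slide a (width, dim) kernel over a (seq_len, dim) sentence matrix, apply ReLU,
    and max-pool over positions. Returns the pooled activation and the start index
    of the n-gram that produced it."""
    width = kernel.shape[0]
    scores = np.array([np.sum(embeddings[i:i + width] * kernel)
                       for i in range(embeddings.shape[0] - width + 1)])
    activations = np.maximum(scores, 0.0)
    best = int(np.argmax(activations))
    return float(activations[best]), best

# Hypothetical 4-dim one-hot "embeddings" for a toy vocabulary.
vocab = {"the": [1, 0, 0, 0], "movie": [0, 1, 0, 0], "not": [0, 0, 1, 0], "good": [0, 0, 0, 1]}
sentence = np.array([vocab[w] for w in "the movie not good".split()], dtype=float)
kernel = np.array([vocab["not"], vocab["good"]], dtype=float)   # a width-2 "not good" detector

print(conv1d_max_over_time(sentence, kernel))  # (2.0, 2)
```

Recording which n-gram wins the max pooling for each kernel, across a dataset, is one simple way to inspect what individual kernels respond to.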

A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay

Title A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay
Authors Leslie N. Smith
Abstract Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most models are trained with suboptimal hyper-parameters, requiring unnecessarily long training times. Setting the hyper-parameters remains a black art that requires years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows how to examine the training and validation/test loss for subtle clues of underfitting and overfitting and suggests guidelines for moving toward the optimal balance point. Then it discusses how to increase/decrease the learning rate/momentum to speed up training. Our experiments show that it is crucial to balance every manner of regularization for each dataset and architecture. Weight decay is used as a sample regularizer to show how its optimal value is tightly coupled with the learning rate and momentum. Files to help replicate the results reported here are available.
Tasks
Published 2018-03-26
URL http://arxiv.org/abs/1803.09820v2
PDF http://arxiv.org/pdf/1803.09820v2.pdf
PWC https://paperswithcode.com/paper/a-disciplined-approach-to-neural-network
Repo https://github.com/AbhimanyuAryan/ImageClassification
Framework none
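
One schedule in the spirit of the report's learning-rate and momentum guidance is the "1cycle" policy: the learning rate rises then falls while momentum moves in the opposite direction, followed by a short final decay. The bounds, phase lengths, and the final decay to lr_min / 100 below are assumptions for illustration, not values prescribed by the report.

```python
def one_cycle(step, total_steps, lr_min=0.01, lr_max=0.1,
              mom_max=0.95, mom_min=0.85, final_frac=0.1):
    """Piecewise-linear 1cycle-style schedule; returns (learning_rate, momentum).

    Phase 1: lr rises lr_min -> lr_max while momentum falls mom_max -> mom_min.
    Phase 2: the mirror image, back to (lr_min, mom_max).
    Final `final_frac` of steps: lr keeps falling towards lr_min / 100, momentum stays at mom_max."""
    anneal_start = int(total_steps * (1 - final_frac))
    half = anneal_start // 2
    if step < half:
        t = step / half
        return lr_min + t * (lr_max - lr_min), mom_max - t * (mom_max - mom_min)
    if step < anneal_start:
        t = (step - half) / (anneal_start - half)
        return lr_max - t * (lr_max - lr_min), mom_min + t * (mom_max - mom_min)
    t = (step - anneal_start) / max(total_steps - anneal_start, 1)
    return lr_min + t * (lr_min / 100 - lr_min), mom_max

schedule = [one_cycle(s, total_steps=100) for s in range(100)]
peak_step = max(range(100), key=lambda s: schedule[s][0])
print(peak_step, schedule[peak_step])  # learning rate peaks where momentum bottoms out
```

Coupling the two this way keeps the effective step size from growing too aggressively at the learning-rate peak; in practice lr_max would first be chosen with a learning-rate range test rather than fixed as here.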

The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

Title The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
Authors Philipp Tschandl, Cliff Rosendahl, Harald Kittler
Abstract Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1803.10417v3
PDF http://arxiv.org/pdf/1803.10417v3.pdf
PWC https://paperswithcode.com/paper/the-ham10000-dataset-a-large-collection-of
Repo https://github.com/ptschandl/HAM10000_dataset
Framework caffe2