October 21, 2019

Paper Group AWR 112

Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

Title Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection
Authors Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl
Abstract Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios.
Tasks Object Classification, Object Detection, Object Localization, Real-Time Object Detection
Published 2018-02-19
URL http://arxiv.org/abs/1802.06488v1
PDF http://arxiv.org/pdf/1802.06488v1.pdf
PWC https://paperswithcode.com/paper/tiny-ssd-a-tiny-single-shot-detection-deep
Repo https://github.com/lampsonSong/tinySSD
Framework caffe2

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly

Title Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly
Authors Michael Opitz, Georg Waltner, Horst Possegger, Horst Bischof
Abstract Learning similarity functions between image pairs with deep neural networks yields highly correlated activations of embeddings. In this work, we show how to improve the robustness of such embeddings by exploiting the independence within ensembles. To this end, we divide the last embedding layer of a deep network into an embedding ensemble and formulate training this ensemble as an online gradient boosting problem. Each learner receives a reweighted training sample from the previous learners. Further, we propose two loss functions which increase the diversity in our ensemble. These loss functions can be applied either for weight initialization or during training. Together, our contributions leverage large embedding sizes more effectively by significantly reducing correlation of the embedding and consequently increase retrieval accuracy of the embedding. Our method works with any differentiable loss function and does not introduce any additional parameters during test time. We evaluate our metric learning method on image retrieval tasks and show that it improves over state-of-the-art methods on the CUB 200-2011, Cars-196, Stanford Online Products, In-Shop Clothes Retrieval and VehicleID datasets.
Tasks Image Retrieval, Metric Learning
Published 2018-01-15
URL http://arxiv.org/abs/1801.04815v1
PDF http://arxiv.org/pdf/1801.04815v1.pdf
PWC https://paperswithcode.com/paper/deep-metric-learning-with-bier-boosting
Repo https://github.com/mop/bier
Framework tf

Policy Optimization via Importance Sampling

Title Policy Optimization via Importance Sampling
Authors Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
Abstract Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
Tasks Continuous Control
Published 2018-09-17
URL http://arxiv.org/abs/1809.06098v2
PDF http://arxiv.org/pdf/1809.06098v2.pdf
PWC https://paperswithcode.com/paper/policy-optimization-via-importance-sampling
Repo https://github.com/T3p/pois
Framework tf
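
The abstract's central ingredient, importance sampling estimation, can be illustrated with a minimal sketch. This is not the POIS algorithm or its high-confidence bound: it assumes a hypothetical one-step problem with 1-D Gaussian policies of fixed standard deviation, and the function names are illustrative only. It reweights returns collected under a behavioural policy to estimate the expected return of a target policy, and reports the effective sample size, a common diagnostic for how unreliable the reweighted estimate has become.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian, evaluated elementwise."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def importance_sampling_estimate(samples, returns, behav_mean, target_mean, std=1.0):
    """Estimate the target policy's expected return from behavioural samples.

    Assumption: both policies are Gaussians with the same fixed std.
    Returns the plain importance sampling estimate and the effective sample size.
    """
    log_w = gaussian_logpdf(samples, target_mean, std) - gaussian_logpdf(samples, behav_mean, std)
    w = np.exp(log_w)  # importance weights p_target / p_behavioural
    estimate = np.mean(w * returns)
    ess = w.sum() ** 2 / np.sum(w ** 2)  # equals n when all weights are equal
    return estimate, ess


rng = np.random.default_rng(0)
actions = rng.normal(0.0, 1.0, size=1000)   # drawn from the behavioural policy
rewards = -(actions - 1.0) ** 2             # hypothetical reward, best at action 1
est, ess = importance_sampling_estimate(actions, rewards, behav_mean=0.0, target_mean=0.5)
print(f"estimated return: {est:.3f}, effective sample size: {ess:.1f}")
```

As the target policy moves away from the behavioural one, the effective sample size shrinks, which is the kind of variance growth the abstract says must be accounted for when deciding to collect new trajectories.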

Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing

Title Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing
Authors Zheng Liu, Botao Xiao, Muhammad Alrabeiah, Keyan Wang, Jun Chen
Abstract Haze and smog are among the most common environmental factors impacting image quality and, therefore, image analysis. This paper proposes an end-to-end generative method for image dehazing. It is based on designing a fully convolutional neural network to recognize haze structures in input images and restore clear, haze-free images. The proposed method is agnostic in the sense that it does not explore the atmosphere scattering model. Somewhat surprisingly, it achieves superior performance relative to all existing state-of-the-art methods for image dehazing even on SOTS outdoor images, which are synthesized using the atmosphere scattering model. Project detail and code can be found here: https://github.com/Seanforfun/GMAN_Net_Haze_Removal
Tasks Image Dehazing, Single Image Dehazing
Published 2018-10-05
URL https://arxiv.org/abs/1810.02862v2
PDF https://arxiv.org/pdf/1810.02862v2.pdf
PWC https://paperswithcode.com/paper/generic-model-agnostic-convolutional-neural
Repo https://github.com/Seanforfun/GMAN_Net_Haze_Removal
Framework tf

ShakeDrop Regularization for Deep Residual Learning

Title ShakeDrop Regularization for Deep Residual Learning
Authors Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, Koichi Kise
Abstract Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.
Tasks
Published 2018-02-07
URL https://arxiv.org/abs/1802.02375v3
PDF https://arxiv.org/pdf/1802.02375v3.pdf
PWC https://paperswithcode.com/paper/shakedrop-regularization-for-deep-residual
Repo https://github.com/imenurok/ShakeDrop
Framework pytorch

Adversarial Attack on Graph Structured Data

Title Adversarial Attack on Graph Structured Data
Authors Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, Le Song
Abstract Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the extensive research on adversarial attack and defense for images and text. In this paper, we focus on the adversarial attacks that fool the model by modifying the combinatorial structure of data. We first propose a reinforcement learning based attack method that learns the generalizable attack policy, while only requiring prediction labels from the target classifier. Also, variants of genetic algorithms and gradient methods are presented in the scenario where prediction confidence or gradients are available. We use both synthetic and real-world data to show that a family of Graph Neural Network models is vulnerable to these attacks, in both graph-level and node-level classification tasks. We also show such attacks can be used to diagnose the learned classifiers.
Tasks Adversarial Attack
Published 2018-06-06
URL http://arxiv.org/abs/1806.02371v1
PDF http://arxiv.org/pdf/1806.02371v1.pdf
PWC https://paperswithcode.com/paper/adversarial-attack-on-graph-structured-data
Repo https://github.com/Hanjun-Dai/graph_adversarial_attack
Framework pytorch

Adversarial Complementary Learning for Weakly Supervised Object Localization

Title Adversarial Complementary Learning for Weakly Supervised Object Localization
Authors Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas Huang
Abstract In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such adversarial learning, the two parallel classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art.
Tasks Object Localization, Weakly-Supervised Object Localization
Published 2018-04-19
URL http://arxiv.org/abs/1804.06962v1
PDF http://arxiv.org/pdf/1804.06962v1.pdf
PWC https://paperswithcode.com/paper/adversarial-complementary-learning-for-weakly
Repo https://github.com/Hayashi-Yudai/ML_models
Framework tf
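
The erasing step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the feature layout (channels, height, width), the min-max normalisation, and the threshold value of 0.6 are hypothetical choices, not taken from the paper or its repository. The first classifier's localization map is thresholded, and the regions it has already discovered are zeroed out of the feature maps before they reach the second classifier.

```python
import numpy as np


def erase_discovered_regions(features, localization_map, threshold=0.6):
    """Zero out spatial positions the first classifier already responds to.

    features:          array of shape (C, H, W), shared feature maps
    localization_map:  array of shape (H, W), first classifier's response
    threshold:         hypothetical cut-off on the min-max normalised map
    """
    lo, hi = localization_map.min(), localization_map.max()
    normalised = (localization_map - lo) / (hi - lo + 1e-8)
    keep = normalised < threshold            # True where nothing was discovered
    return features * keep[None, :, :]       # broadcast the mask over channels


features = np.ones((2, 3, 3))
loc_map = np.zeros((3, 3))
loc_map[1, 1] = 1.0                          # first classifier fires at the centre
erased = erase_discovered_regions(features, loc_map)
print(erased[0])
```

The second classifier then has to base its prediction on whatever remains, which is what pushes it toward complementary object regions.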

Deflecting Adversarial Attacks with Pixel Deflection

Title Deflecting Adversarial Attacks with Pixel Deflection
Authors Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer
Abstract CNNs are poised to become integral parts of many critical systems. Despite their robustness to natural variations, image pixel values can be manipulated, via small, carefully crafted, imperceptible perturbations, to cause a model to misclassify images. We present an algorithm to process an image so that classification accuracy is significantly preserved in the presence of such adversarial manipulations. Image classifiers tend to be robust to natural noise, and adversarial attacks tend to be agnostic to object location. These observations motivate our strategy, which leverages model robustness to defend against adversarial perturbations by forcing the image to match natural image statistics. Our algorithm locally corrupts the image by redistributing pixel values via a process we term pixel deflection. A subsequent wavelet-based denoising operation softens this corruption, as well as some of the adversarial changes. We demonstrate experimentally that the combination of these techniques enables the effective recovery of the true class, against a variety of robust attacks. Our results compare favorably with current state-of-the-art defenses, without requiring retraining or modifying the CNN.
Tasks Adversarial Attack
Published 2018-01-26
URL http://arxiv.org/abs/1801.08926v3
PDF http://arxiv.org/pdf/1801.08926v3.pdf
PWC https://paperswithcode.com/paper/deflecting-adversarial-attacks-with-pixel
Repo https://github.com/iamaaditya/pixel-deflection
Framework none
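
The local corruption step that the abstract calls pixel deflection can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: pixels are chosen uniformly at random, the subsequent wavelet-based denoising is omitted, and the parameter names and default values (`deflections`, `window`) are assumptions. Each selected pixel is replaced by the value of another pixel drawn from a small neighbourhood around it.

```python
import numpy as np


def pixel_deflection(image, deflections=100, window=10, seed=0):
    """Replace randomly chosen pixels with a random neighbour's value.

    image:        array of shape (H, W) or (H, W, C)
    deflections:  number of pixels to redistribute (hypothetical default)
    window:       neighbourhood half-width in pixels (hypothetical default)
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    height, width = out.shape[:2]
    for _ in range(deflections):
        r = rng.integers(0, height)
        c = rng.integers(0, width)
        # Pick a source pixel inside the local window, clipped to the image.
        rr = np.clip(r + rng.integers(-window, window + 1), 0, height - 1)
        cc = np.clip(c + rng.integers(-window, window + 1), 0, width - 1)
        out[r, c] = out[rr, cc]
    return out


img = np.arange(64 * 64, dtype=float).reshape(64, 64)
deflected = pixel_deflection(img)
print("pixels changed:", int((deflected != img).sum()))
```

Because every replacement value comes from the image itself, the output contains no new intensities; only their positions are locally perturbed.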

Rob-GAN: Generator, Discriminator, and Adversarial Attacker

Title Rob-GAN: Generator, Discriminator, and Adversarial Attacker
Authors Xuanqing Liu, Cho-Jui Hsieh
Abstract We study two important concepts in adversarial deep learning—adversarial training and generative adversarial network (GAN). Adversarial training is the technique used to improve the robustness of discriminator by combining adversarial attacker and discriminator in the training phase. GAN is commonly used for image generation by jointly optimizing discriminator and generator. We show these two concepts are indeed closely related and can be used to strengthen each other—adding a generator to the adversarial training procedure can improve the robustness of discriminators, and adding an adversarial attack to GAN training can improve the convergence speed and lead to better generators. Combining these two insights, we develop a framework called Rob-GAN to jointly optimize generator and discriminator in the presence of adversarial attacks—the generator generates fake images to fool discriminator; the adversarial attacker perturbs real images to fool the discriminator, and the discriminator wants to minimize loss under fake and adversarial images. Through this end-to-end training procedure, we are able to simultaneously improve the convergence speed of GAN training, the quality of synthetic images, and the robustness of discriminator under strong adversarial attacks. Experimental results demonstrate that the obtained classifier is more robust than the state-of-the-art adversarial training approach, and the generator outperforms SN-GAN on ImageNet-143.
Tasks Adversarial Attack, Image Generation
Published 2018-07-27
URL http://arxiv.org/abs/1807.10454v3
PDF http://arxiv.org/pdf/1807.10454v3.pdf
PWC https://paperswithcode.com/paper/from-adversarial-training-to-generative
Repo https://github.com/xuanqing94/RobGAN
Framework pytorch

The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera

Title The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera
Authors Junaid Ahmed Ansari, Sarthak Sharma, Anshuman Majumdar, J. Krishna Murthy, K. Madhava Krishna
Abstract Accurate localization of other traffic participants is a vital task in autonomous driving systems. State-of-the-art systems employ a combination of sensing modalities such as RGB cameras and LiDARs for localizing traffic participants, but most such demonstrations have been confined to plain roads. We demonstrate, to the best of our knowledge, the first results for monocular object localization and shape estimation on surfaces that do not share the same plane with the moving monocular camera. We approximate road surfaces by local planar patches and use semantic cues from vehicles in the scene to initialize a local bundle-adjustment like procedure that simultaneously estimates the pose and shape of the vehicles, and the orientation of the local ground plane on which the vehicle stands as well. We evaluate the proposed approach on the KITTI and SYNTHIA-SF benchmarks, for a variety of road plane configurations. The proposed approach significantly improves the state-of-the-art for monocular object localization on arbitrarily-shaped roads.
Tasks Autonomous Driving, Object Localization
Published 2018-03-06
URL http://arxiv.org/abs/1803.02057v1
PDF http://arxiv.org/pdf/1803.02057v1.pdf
PWC https://paperswithcode.com/paper/the-earth-aint-flat-monocular-reconstruction
Repo https://github.com/sarthaksharma13/IROS18
Framework none

A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

Title A Multimodal LSTM for Predicting Listener Empathic Responses Over Time
Authors Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong
Abstract People naturally understand the emotions of, and often also empathize with, those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved a concordance correlation coefficient (CCC) of 0.29 and 0.32 on the Validation set for the Generalized and Personalized track respectively, and achieved a CCC of 0.14 and 0.14 on the held-out Test set. We discuss the difficulties faced and the lessons learnt tackling this challenge.
Tasks
Published 2018-12-12
URL http://arxiv.org/abs/1812.04891v2
PDF http://arxiv.org/pdf/1812.04891v2.pdf
PWC https://paperswithcode.com/paper/a-multimodal-lstm-for-predicting-listener
Repo https://github.com/desmond-ong/cheem-omg-empathy
Framework pytorch
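
For readers unfamiliar with the metric reported in the abstract, the concordance correlation coefficient can be computed as below. This is a generic sketch of the standard definition using population (not sample) variances; it is not taken from the challenge's evaluation script, and the function name is illustrative.

```python
import numpy as np


def concordance_correlation_coefficient(y_true, y_pred):
    """CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    covariance = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * covariance / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


valence = np.array([-1.0, 0.0, 1.0])
print(concordance_correlation_coefficient(valence, valence))        # perfect agreement
print(concordance_correlation_coefficient(valence, valence + 1.0))  # penalised for the offset
```

Unlike Pearson correlation, the CCC drops when predictions are shifted or rescaled relative to the targets, so a model must match the level of the valence signal and not just its trend.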

Variable Selection and Task Grouping for Multi-Task Learning

Title Variable Selection and Task Grouping for Multi-Task Learning
Authors Jun-Yong Jeong, Chi-Hyuck Jun
Abstract We consider multi-task learning, which simultaneously learns related prediction tasks, to improve generalization performance. We factorize a coefficient matrix as the product of two matrices based on a low-rank assumption. These matrices have sparsities to simultaneously perform variable selection and learn an overlapping group structure among the tasks. The resulting bi-convex objective function is minimized by alternating optimization where sub-problems are solved using alternating direction method of multipliers and accelerated proximal gradient descent. Moreover, we provide the performance bound of the proposed method. The effectiveness of the proposed method is validated for both synthetic and real-world datasets.
Tasks Multi-Task Learning
Published 2018-02-13
URL http://arxiv.org/abs/1802.04676v1
PDF http://arxiv.org/pdf/1802.04676v1.pdf
PWC https://paperswithcode.com/paper/variable-selection-and-task-grouping-for
Repo https://github.com/JunYongJeong/VSTG-MTL
Framework none
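
The factorization in the abstract can be made concrete with a small sketch. It assumes the coefficient matrix is written as W = U V, with U of shape (variables, latent groups) and V of shape (latent groups, tasks); the symbol names, the toy values, and the helper functions are illustrative, and the optimization itself (ADMM and accelerated proximal gradient) is not shown. The point is only how sparsity in each factor is read: an all-zero row of U removes a variable from every task, and the non-zero entries in a column of V say which latent groups a task belongs to.

```python
import numpy as np


def compose_coefficients(U, V):
    """Low-rank coefficient matrix: one column of weights per task."""
    return U @ V


def selected_variables(U, tol=1e-12):
    """Indices of variables whose row in U is not entirely zero."""
    return np.flatnonzero(np.abs(U).sum(axis=1) > tol)


def task_groups(V, tol=1e-12):
    """For each task, the latent groups with a non-zero weight in V."""
    return [np.flatnonzero(np.abs(V[:, t]) > tol).tolist() for t in range(V.shape[1])]


# Toy factors: 4 variables, 2 latent groups, 3 tasks.
U = np.array([[1.0, 0.0],
              [0.0, 0.0],   # variable 1 is unused by every task
              [0.0, 2.0],
              [1.0, 1.0]])
V = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

W = compose_coefficients(U, V)
print(W)
print("selected variables:", selected_variables(U).tolist())
print("groups per task:", task_groups(V))
```

Here the middle task draws on both latent groups, which is what an overlapping group structure means in this setting.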

Surrogate-assisted Bayesian inversion for landscape and basin evolution models

Title Surrogate-assisted Bayesian inversion for landscape and basin evolution models
Authors Rohitash Chandra, Danial Azam, Arpit Kapoor, R. Dietmar Müller
Abstract The complex and computationally expensive features of the forward landscape and sedimentary basin evolution models pose a major challenge in the development of efficient inference and optimization methods. Bayesian inference provides a methodology for estimation and uncertainty quantification of free model parameters. In our previous work, parallel tempering Bayeslands was developed as a framework for parameter estimation and uncertainty quantification for the landscape and basin evolution modelling software Badlands. Parallel tempering Bayeslands features high-performance computing with dozens of processing cores running in parallel to enhance computational efficiency. Although parallel computing is used, the procedure remains computationally challenging since thousands of samples need to be drawn and evaluated. In large-scale landscape and basin evolution problems, a single model evaluation can take from several minutes to hours, and in certain cases, even days. Surrogate-assisted optimization has been successfully applied to a number of engineering problems. This motivates its use in optimisation and inference methods suited for complex models in geology and geophysics. Surrogates can speed up parallel tempering Bayeslands by developing computationally inexpensive surrogates to mimic expensive models. In this paper, we present an application of surrogate-assisted parallel tempering where the surrogate mimics a landscape evolution model including erosion, sediment transport and deposition, by estimating the likelihood function that is given by the model. We employ a machine learning model as a surrogate that learns from the samples generated by the parallel tempering algorithm. The results show that the methodology is effective in lowering the overall computational cost significantly while retaining the quality of solutions.
Tasks Bayesian Inference
Published 2018-12-12
URL https://arxiv.org/abs/1812.08655v1
PDF https://arxiv.org/pdf/1812.08655v1.pdf
PWC https://paperswithcode.com/paper/surrogate-assisted-bayesian-inversion-for
Repo https://github.com/intelligentEarth/surrogate-pt-Bayeslands
Framework tf

The unreasonable effectiveness of the forget gate

Title The unreasonable effectiveness of the forget gate
Authors Jos van der Westhuizen, Joan Lasenby
Abstract Given the success of the gated recurrent unit, a natural question is whether all the gates of the long short-term memory (LSTM) network are necessary. Previous research has shown that the forget gate is one of the most important gates in the LSTM. Here we show that a forget-gate-only version of the LSTM with chrono-initialized biases not only provides computational savings but also outperforms the standard LSTM on multiple benchmark datasets and competes with some of the best contemporary models. Our proposed network, the JANET, achieves accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the standard LSTM which yields accuracies of 98.5% and 91%.
Tasks
Published 2018-04-13
URL http://arxiv.org/abs/1804.04849v3
PDF http://arxiv.org/pdf/1804.04849v3.pdf
PWC https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-the-forget
Repo https://github.com/JosvanderWesthuizen/janet
Framework tf
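
To make the idea of a forget-gate-only recurrent cell concrete, here is a minimal NumPy sketch of a single time step. It is one plausible reading of the abstract and not the JANET reference implementation: the parameter names are hypothetical, the hidden state doubles as the cell state, the input gate is tied to one minus the forget gate, and chrono initialization of the biases is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate_cell_step(x, h_prev, params):
    """One step of a recurrent cell that keeps only a forget gate.

    Assumed update: h_t = f_t * h_{t-1} + (1 - f_t) * candidate_t
    """
    f = sigmoid(params["W_f"] @ x + params["U_f"] @ h_prev + params["b_f"])
    candidate = np.tanh(params["W_c"] @ x + params["U_c"] @ h_prev + params["b_c"])
    return f * h_prev + (1.0 - f) * candidate


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (illustrative initialisation only)."""
    rng = np.random.default_rng(seed)
    return {
        "W_f": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "U_f": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b_f": np.zeros(n_hidden),
        "W_c": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "U_c": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b_c": np.zeros(n_hidden),
    }


params = init_params(n_in=4, n_hidden=3)
h = np.zeros(3)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):  # 5 time steps
    h = forget_gate_cell_step(x_t, h, params)
print(h)
```

Compared with a standard LSTM cell, this variant needs two weight blocks per step instead of four, which is where the computational savings mentioned in the abstract would come from.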

Photo Wake-Up: 3D Character Animation from a Single Photo

Title Photo Wake-Up: 3D Character Animation from a Single Photo
Authors Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman
Abstract We present a method and application for animating a human subject from a single photo. E.g., the character can walk out, run, sit, or jump in 3D. The key contributions of this paper are: 1) an application of viewing and animating humans in single photos in 3D, 2) a novel 2D warping method to deform a posable template body model to fit the person’s complex silhouette to create an animatable mesh, and 3) a method for handling partial self occlusions. We compare to state-of-the-art related methods and evaluate results with human studies. Further, we present an interactive interface that allows re-posing the person in 3D, and an augmented reality setup where the animated 3D person can emerge from the photo into the real world. We demonstrate the method on photos, posters, and art.
Tasks 3D Character Animation From A Single Photo
Published 2018-12-05
URL http://arxiv.org/abs/1812.02246v1
PDF http://arxiv.org/pdf/1812.02246v1.pdf
PWC https://paperswithcode.com/paper/photo-wake-up-3d-character-animation-from-a
Repo https://github.com/mplatnic/Deep-Learning
Framework tf