October 21, 2019


Paper Group AWR 112


Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection


Title	Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection
Authors	Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl
Abstract	Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios.
Tasks	Object Classification, Object Detection, Object Localization, Real-Time Object Detection
Published	2018-02-19
URL	http://arxiv.org/abs/1802.06488v1
PDF	http://arxiv.org/pdf/1802.06488v1.pdf
PWC	https://paperswithcode.com/paper/tiny-ssd-a-tiny-single-shot-detection-deep
Repo	https://github.com/lampsonSong/tinySSD
Framework	caffe2
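
The abstract credits the Fire microarchitecture from SqueezeNet for much of the size reduction. As a rough illustration of why, the sketch below counts the parameters of a Fire module (a 1x1 squeeze convolution followed by parallel 1x1 and 3x3 expand convolutions) and of a plain 3x3 convolution with the same input and output widths. The function names and the channel counts are assumptions chosen for illustration; they are not Tiny SSD's optimized, non-uniform configuration.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = in_ch * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters in a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    return (
        conv_params(in_ch, squeeze, 1)
        + conv_params(squeeze, expand1x1, 1)
        + conv_params(squeeze, expand3x3, 3)
    )


# Illustrative channel counts (assumed, not Tiny SSD's optimized values).
fire = fire_params(in_ch=96, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv_params(in_ch=96, out_ch=128, kernel=3)
print(fire, plain, round(plain / fire, 1))  # 11920 110720 9.3
```

With these assumed widths the Fire module needs roughly nine times fewer parameters than the plain convolution, because the expensive 3x3 filters only see the 16 squeezed channels.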

Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly


Title	Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly
Authors	Michael Opitz, Georg Waltner, Horst Possegger, Horst Bischof
Abstract	Learning similarity functions between image pairs with deep neural networks yields highly correlated activations of embeddings. In this work, we show how to improve the robustness of such embeddings by exploiting the independence within ensembles. To this end, we divide the last embedding layer of a deep network into an embedding ensemble and formulate training this ensemble as an online gradient boosting problem. Each learner receives a reweighted training sample from the previous learners. Further, we propose two loss functions which increase the diversity in our ensemble. These loss functions can be applied either for weight initialization or during training. Together, our contributions leverage large embedding sizes more effectively by significantly reducing correlation of the embedding and consequently increase retrieval accuracy of the embedding. Our method works with any differentiable loss function and does not introduce any additional parameters during test time. We evaluate our metric learning method on image retrieval tasks and show that it improves over state-of-the-art methods on the CUB 200-2011, Cars-196, Stanford Online Products, In-Shop Clothes Retrieval and VehicleID datasets.
Tasks	Image Retrieval, Metric Learning
Published	2018-01-15
URL	http://arxiv.org/abs/1801.04815v1
PDF	http://arxiv.org/pdf/1801.04815v1.pdf
PWC	https://paperswithcode.com/paper/deep-metric-learning-with-bier-boosting
Repo	https://github.com/mop/bier
Framework	tf

Policy Optimization via Importance Sampling


Title	Policy Optimization via Importance Sampling
Authors	Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
Abstract	Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
Tasks	Continuous Control
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06098v2
PDF	http://arxiv.org/pdf/1809.06098v2.pdf
PWC	https://paperswithcode.com/paper/policy-optimization-via-importance-sampling
Repo	https://github.com/T3p/pois
Framework	tf
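
The abstract relies on importance sampling to reuse trajectories and notes that the variance of the estimate must be accounted for. The sketch below shows only the plain importance sampling estimator and the usual effective sample size diagnostic on a toy one-dimensional problem. It is not the POIS surrogate objective or its high-confidence bound; the Gaussian "policies", the toy return function, and all names are assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian, used here as a stand-in for a policy."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(samples, target_mean, behavior_mean, std=1.0):
    """Ratio target density / behavior density for each sample."""
    return [
        gaussian_pdf(x, target_mean, std) / gaussian_pdf(x, behavior_mean, std)
        for x in samples
    ]


def is_estimate(samples, returns, target_mean, behavior_mean, std=1.0):
    """Plain importance sampling estimate of the target's expected return."""
    weights = importance_weights(samples, target_mean, behavior_mean, std)
    return sum(w * r for w, r in zip(weights, returns)) / len(samples)


def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: shrinks as the target drifts from the behavior."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


rng = random.Random(0)
actions = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # drawn from the behavior
returns = [-(a - 0.5) ** 2 for a in actions]          # toy return function

for target_mean in (0.0, 0.5, 2.0):
    w = importance_weights(actions, target_mean, behavior_mean=0.0)
    est = is_estimate(actions, returns, target_mean, behavior_mean=0.0)
    print(target_mean, round(est, 3), round(effective_sample_size(w), 1))
```

As the target mean moves away from the behavior mean, a few samples carry most of the weight and the effective sample size drops, which is the kind of variance growth that makes the question of when to collect new trajectories non-trivial.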

Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing


Title	Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing
Authors	Zheng Liu, Botao Xiao, Muhammad Alrabeiah, Keyan Wang, Jun Chen
Abstract	Haze and smog are among the most common environmental factors impacting image quality and, therefore, image analysis. This paper proposes an end-to-end generative method for image dehazing. It is based on designing a fully convolutional neural network to recognize haze structures in input images and restore clear, haze-free images. The proposed method is agnostic in the sense that it does not explore the atmosphere scattering model. Somewhat surprisingly, it achieves superior performance relative to all existing state-of-the-art methods for image dehazing even on SOTS outdoor images, which are synthesized using the atmosphere scattering model. Project detail and code can be found here: https://github.com/Seanforfun/GMAN_Net_Haze_Removal
Tasks	Image Dehazing, Single Image Dehazing
Published	2018-10-05
URL	https://arxiv.org/abs/1810.02862v2
PDF	https://arxiv.org/pdf/1810.02862v2.pdf
PWC	https://paperswithcode.com/paper/generic-model-agnostic-convolutional-neural
Repo	https://github.com/Seanforfun/GMAN_Net_Haze_Removal
Framework	tf

ShakeDrop Regularization for Deep Residual Learning


Title	ShakeDrop Regularization for Deep Residual Learning
Authors	Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, Koichi Kise
Abstract	Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.
Tasks
Published	2018-02-07
URL	https://arxiv.org/abs/1802.02375v3
PDF	https://arxiv.org/pdf/1802.02375v3.pdf
PWC	https://paperswithcode.com/paper/shakedrop-regularization-for-deep-residual
Repo	https://github.com/imenurok/ShakeDrop
Framework	pytorch
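
The abstract does not spell out the perturbation itself. As a minimal sketch, assuming the forward rule described in the ShakeDrop paper (the residual branch output is scaled by b + alpha - b * alpha, with b drawn from a Bernoulli distribution and alpha drawn uniformly from [-1, 1]), the forward pass of one block could look like the following. The separate backward-pass coefficient and the training stabilizer mentioned in the abstract are not modeled, and the function names and the toy branch are hypothetical.

```python
import random


def shakedrop_scale(p_keep, training, rng):
    """Scalar applied to the residual branch output F(x).

    Training: b + alpha - b * alpha, with b ~ Bernoulli(p_keep), alpha ~ U[-1, 1].
    Inference: the expectation of that expression, which equals p_keep.
    """
    if not training:
        return p_keep
    b = 1.0 if rng.random() < p_keep else 0.0
    alpha = rng.uniform(-1.0, 1.0)
    return b + alpha - b * alpha


def residual_block(x, branch, p_keep, training, rng):
    """x + scale * F(x) for a list-valued x and a branch function F."""
    scale = shakedrop_scale(p_keep, training, rng)
    return [xi + scale * fi for xi, fi in zip(x, branch(x))]


def double(v):
    """Hypothetical residual branch used only for the demonstration."""
    return [2.0 * vi for vi in v]


rng = random.Random(0)
print(residual_block([1.0, -1.0], double, p_keep=0.5, training=True, rng=rng))
print(residual_block([1.0, -1.0], double, p_keep=0.5, training=False, rng=rng))
```

When b is 1 the scale collapses to 1 and the block behaves like an ordinary residual block; when b is 0 the branch is multiplied by alpha, which can even flip its sign.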

Adversarial Attack on Graph Structured Data


Title	Adversarial Attack on Graph Structured Data
Authors	Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, Le Song
Abstract	Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the numerous research works on image or text adversarial attack and defense. In this paper, we focus on the adversarial attacks that fool the model by modifying the combinatorial structure of data. We first propose a reinforcement learning based attack method that learns a generalizable attack policy, while only requiring prediction labels from the target classifier. Also, variants of genetic algorithms and gradient methods are presented in the scenario where prediction confidence or gradients are available. We use both synthetic and real-world data to show that a family of Graph Neural Network models is vulnerable to these attacks, in both graph-level and node-level classification tasks. We also show such attacks can be used to diagnose the learned classifiers.
Tasks	Adversarial Attack
Published	2018-06-06
URL	http://arxiv.org/abs/1806.02371v1
PDF	http://arxiv.org/pdf/1806.02371v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-attack-on-graph-structured-data
Repo	https://github.com/Hanjun-Dai/graph_adversarial_attack
Framework	pytorch

Adversarial Complementary Learning for Weakly Supervised Object Localization


Title	Adversarial Complementary Learning for Weakly Supervised Object Localization
Authors	Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas Huang
Abstract	In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel-classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallel-classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art.
Tasks	Object Localization, Weakly-Supervised Object Localization
Published	2018-04-19
URL	http://arxiv.org/abs/1804.06962v1
PDF	http://arxiv.org/pdf/1804.06962v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-complementary-learning-for-weakly
Repo	https://github.com/Hayashi-Yudai/ML_models
Framework	tf

Deflecting Adversarial Attacks with Pixel Deflection


Title	Deflecting Adversarial Attacks with Pixel Deflection
Authors	Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer
Abstract	CNNs are poised to become integral parts of many critical systems. Despite their robustness to natural variations, image pixel values can be manipulated, via small, carefully crafted, imperceptible perturbations, to cause a model to misclassify images. We present an algorithm to process an image so that classification accuracy is significantly preserved in the presence of such adversarial manipulations. Image classifiers tend to be robust to natural noise, and adversarial attacks tend to be agnostic to object location. These observations motivate our strategy, which leverages model robustness to defend against adversarial perturbations by forcing the image to match natural image statistics. Our algorithm locally corrupts the image by redistributing pixel values via a process we term pixel deflection. A subsequent wavelet-based denoising operation softens this corruption, as well as some of the adversarial changes. We demonstrate experimentally that the combination of these techniques enables the effective recovery of the true class, against a variety of robust attacks. Our results compare favorably with current state-of-the-art defenses, without requiring retraining or modifying the CNN.
Tasks	Adversarial Attack
Published	2018-01-26
URL	http://arxiv.org/abs/1801.08926v3
PDF	http://arxiv.org/pdf/1801.08926v3.pdf
PWC	https://paperswithcode.com/paper/deflecting-adversarial-attacks-with-pixel
Repo	https://github.com/iamaaditya/pixel-deflection
Framework	none
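
The abstract describes pixel deflection as locally corrupting the image by redistributing pixel values. A minimal sketch of that idea, assuming uniformly random pixel selection on a small grayscale grid, is shown below. The wavelet-based denoising step is omitted, and the function name, parameters, and uniform sampling are assumptions rather than the authors' implementation.

```python
import random


def pixel_deflection(image, deflections, window, rng):
    """Replace randomly chosen pixels with a random pixel from a nearby window.

    image is a list of rows of grayscale values; a modified copy is returned.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(deflections):
        r, c = rng.randrange(height), rng.randrange(width)
        # Pick a source pixel inside the window, clamped to the image bounds.
        sr = min(max(r + rng.randint(-window, window), 0), height - 1)
        sc = min(max(c + rng.randint(-window, window), 0), width - 1)
        out[r][c] = out[sr][sc]
    return out


toy = [[row * 4 + col for col in range(4)] for row in range(4)]
deflected = pixel_deflection(toy, deflections=5, window=1, rng=random.Random(0))
for row in deflected:
    print(row)
```

Because every new value is copied from a neighbor, the result stays close to natural local image statistics while the exact pixel pattern that an adversarial perturbation depends on is disturbed.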

Rob-GAN: Generator, Discriminator, and Adversarial Attacker


Title	Rob-GAN: Generator, Discriminator, and Adversarial Attacker
Authors	Xuanqing Liu, Cho-Jui Hsieh
Abstract	We study two important concepts in adversarial deep learning—adversarial training and generative adversarial network (GAN). Adversarial training is the technique used to improve the robustness of discriminator by combining adversarial attacker and discriminator in the training phase. GAN is commonly used for image generation by jointly optimizing discriminator and generator. We show these two concepts are indeed closely related and can be used to strengthen each other—adding a generator to the adversarial training procedure can improve the robustness of discriminators, and adding an adversarial attack to GAN training can improve the convergence speed and lead to better generators. Combining these two insights, we develop a framework called Rob-GAN to jointly optimize generator and discriminator in the presence of adversarial attacks—the generator generates fake images to fool discriminator; the adversarial attacker perturbs real images to fool the discriminator, and the discriminator wants to minimize loss under fake and adversarial images. Through this end-to-end training procedure, we are able to simultaneously improve the convergence speed of GAN training, the quality of synthetic images, and the robustness of discriminator under strong adversarial attacks. Experimental results demonstrate that the obtained classifier is more robust than the state-of-the-art adversarial training approach, and the generator outperforms SN-GAN on ImageNet-143.
Tasks	Adversarial Attack, Image Generation
Published	2018-07-27
URL	http://arxiv.org/abs/1807.10454v3
PDF	http://arxiv.org/pdf/1807.10454v3.pdf
PWC	https://paperswithcode.com/paper/from-adversarial-training-to-generative
Repo	https://github.com/xuanqing94/RobGAN
Framework	pytorch

The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera


Title	The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera
Authors	Junaid Ahmed Ansari, Sarthak Sharma, Anshuman Majumdar, J. Krishna Murthy, K. Madhava Krishna
Abstract	Accurate localization of other traffic participants is a vital task in autonomous driving systems. State-of-the-art systems employ a combination of sensing modalities such as RGB cameras and LiDARs for localizing traffic participants, but most such demonstrations have been confined to plain roads. We demonstrate, to the best of our knowledge, the first results for monocular object localization and shape estimation on surfaces that do not share the same plane with the moving monocular camera. We approximate road surfaces by local planar patches and use semantic cues from vehicles in the scene to initialize a local bundle-adjustment like procedure that simultaneously estimates the pose and shape of the vehicles, and the orientation of the local ground plane on which the vehicle stands as well. We evaluate the proposed approach on the KITTI and SYNTHIA-SF benchmarks, for a variety of road plane configurations. The proposed approach significantly improves the state-of-the-art for monocular object localization on arbitrarily-shaped roads.
Tasks	Autonomous Driving, Object Localization
Published	2018-03-06
URL	http://arxiv.org/abs/1803.02057v1
PDF	http://arxiv.org/pdf/1803.02057v1.pdf
PWC	https://paperswithcode.com/paper/the-earth-aint-flat-monocular-reconstruction
Repo	https://github.com/sarthaksharma13/IROS18
Framework	none

A Multimodal LSTM for Predicting Listener Empathic Responses Over Time


Title	A Multimodal LSTM for Predicting Listener Empathic Responses Over Time
Authors	Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong
Abstract	People naturally understand the emotions of, and often also empathize with, those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved a concordance correlation coefficient (CCC) of 0.29 and 0.32 on the Validation set for the Generalized and Personalized track respectively, and achieved a CCC of 0.14 and 0.14 on the held-out Test set. We discuss the difficulties faced and the lessons learnt tackling this challenge.
Tasks
Published	2018-12-12
URL	http://arxiv.org/abs/1812.04891v2
PDF	http://arxiv.org/pdf/1812.04891v2.pdf
PWC	https://paperswithcode.com/paper/a-multimodal-lstm-for-predicting-listener
Repo	https://github.com/desmond-ong/cheem-omg-empathy
Framework	pytorch
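
The results above are reported as a concordance correlation coefficient (CCC). For readers unfamiliar with the metric, the sketch below computes it from its standard definition, 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2), using population statistics. The function name and the toy valence series are illustrative assumptions and have nothing to do with the challenge data.

```python
def concordance_cc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


valence = [0.1, 0.4, 0.3, 0.8]
print(concordance_cc(valence, valence))                     # identical series
print(concordance_cc([v + 0.5 for v in valence], valence))  # same shape, shifted
```

Unlike Pearson correlation, the CCC penalizes a constant offset between prediction and target, so the shifted series scores well below the identical one even though the two are perfectly correlated.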

Variable Selection and Task Grouping for Multi-Task Learning


Title	Variable Selection and Task Grouping for Multi-Task Learning
Authors	Jun-Yong Jeong, Chi-Hyuck Jun
Abstract	We consider multi-task learning, which simultaneously learns related prediction tasks, to improve generalization performance. We factorize a coefficient matrix as the product of two matrices based on a low-rank assumption. These matrices have sparsities to simultaneously perform variable selection and learn an overlapping group structure among the tasks. The resulting bi-convex objective function is minimized by alternating optimization where sub-problems are solved using the alternating direction method of multipliers and accelerated proximal gradient descent. Moreover, we provide the performance bound of the proposed method. The effectiveness of the proposed method is validated for both synthetic and real-world datasets.
Tasks	Multi-Task Learning
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04676v1
PDF	http://arxiv.org/pdf/1802.04676v1.pdf
PWC	https://paperswithcode.com/paper/variable-selection-and-task-grouping-for
Repo	https://github.com/JunYongJeong/VSTG-MTL
Framework	none
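
The factorization in the abstract can be pictured with a small example: a coefficient matrix W is the product of a variable-by-group factor U and a group-by-task factor V, and sparsity in each factor carries a different meaning. The sketch below uses hand-written, hypothetical factors only to show how zero patterns translate into dropped variables and overlapping task groups. It does not implement the paper's optimization procedure.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical factors: 4 variables, 2 latent groups, 3 tasks.
U = [
    [1.0, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],  # all-zero row: variable 2 is dropped for every task
    [0.5, 0.5],
]
V = [
    [1.0, 1.0, 0.0],  # group 0 is used by tasks 0 and 1
    [0.0, 1.0, 1.0],  # group 1 is used by tasks 1 and 2 (task 1 overlaps)
]

W = matmul(U, V)  # coefficient matrix, one column per task
selected = [i for i, row in enumerate(W) if any(v != 0.0 for v in row)]
groups = [[k for k in range(len(V)) if V[k][t] != 0.0] for t in range(len(V[0]))]
print(W)
print(selected, groups)  # [0, 1, 3] [[0], [0, 1], [1]]
```

A zero row in U removes a variable from all tasks at once, while the nonzero pattern of each column of V says which latent groups a task belongs to, so a task with several nonzero entries sits in overlapping groups.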

Surrogate-assisted Bayesian inversion for landscape and basin evolution models


Title	Surrogate-assisted Bayesian inversion for landscape and basin evolution models
Authors	Rohitash Chandra, Danial Azam, Arpit Kapoor, R. Dietmar Müller
Abstract	The complex and computationally expensive features of the forward landscape and sedimentary basin evolution models pose a major challenge in the development of efficient inference and optimization methods. Bayesian inference provides a methodology for estimation and uncertainty quantification of free model parameters. In our previous work, parallel tempering Bayeslands was developed as a framework for parameter estimation and uncertainty quantification for the landscape and basin evolution modelling software Badlands. Parallel tempering Bayeslands features high-performance computing with dozens of processing cores running in parallel to enhance computational efficiency. Although parallel computing is used, the procedure remains computationally challenging since thousands of samples need to be drawn and evaluated. In large-scale landscape and basin evolution problems, a single model evaluation can take from several minutes to hours, and in certain cases, even days. Surrogate-assisted optimization has been successfully applied to a number of engineering problems. This motivates its use in optimisation and inference methods suited for complex models in geology and geophysics. Surrogates can speed up parallel tempering Bayeslands by developing computationally inexpensive surrogates to mimic expensive models. In this paper, we present an application of surrogate-assisted parallel tempering where the surrogate mimics a landscape evolution model including erosion, sediment transport and deposition, by estimating the likelihood function that is given by the model. We employ a machine learning model as a surrogate that learns from the samples generated by the parallel tempering algorithm. The results show that the methodology is effective in lowering the overall computational cost significantly while retaining the quality of solutions.
Tasks	Bayesian Inference
Published	2018-12-12
URL	https://arxiv.org/abs/1812.08655v1
PDF	https://arxiv.org/pdf/1812.08655v1.pdf
PWC	https://paperswithcode.com/paper/surrogate-assisted-bayesian-inversion-for
Repo	https://github.com/intelligentEarth/surrogate-pt-Bayeslands
Framework	tf

The unreasonable effectiveness of the forget gate


Title	The unreasonable effectiveness of the forget gate
Authors	Jos van der Westhuizen, Joan Lasenby
Abstract	Given the success of the gated recurrent unit, a natural question is whether all the gates of the long short-term memory (LSTM) network are necessary. Previous research has shown that the forget gate is one of the most important gates in the LSTM. Here we show that a forget-gate-only version of the LSTM with chrono-initialized biases not only provides computational savings but also outperforms the standard LSTM on multiple benchmark datasets and competes with some of the best contemporary models. Our proposed network, the JANET, achieves accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the standard LSTM which yields accuracies of 98.5% and 91%.
Tasks
Published	2018-04-13
URL	http://arxiv.org/abs/1804.04849v3
PDF	http://arxiv.org/pdf/1804.04849v3.pdf
PWC	https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-the-forget
Repo	https://github.com/JosvanderWesthuizen/janet
Framework	tf
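
To make "forget-gate-only" concrete, the sketch below implements one step of a scalar recurrent cell in which a single forget gate decides how much of the previous state to keep and how much of a new candidate to write. This is a simplified reading of the idea, assuming the input gate is tied to one minus the forget gate and the output gate is removed; the paper's exact update may differ. The chrono-style bias draw, the parameter names, and the toy values are likewise assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def janet_step(x, h_prev, params):
    """One step of a scalar forget-gate-only recurrent cell.

    f = sigmoid(u_f * x + w_f * h_prev + b_f)
    candidate = tanh(u_c * x + w_c * h_prev + b_c)
    h = f * h_prev + (1 - f) * candidate
    """
    f = sigmoid(params["u_f"] * x + params["w_f"] * h_prev + params["b_f"])
    cand = math.tanh(params["u_c"] * x + params["w_c"] * h_prev + params["b_c"])
    return f * h_prev + (1.0 - f) * cand


def chrono_forget_bias(t_max, rng):
    """Chrono-style initialization (assumed form): b_f = log(u), u ~ U[1, t_max - 1]."""
    return math.log(rng.uniform(1.0, t_max - 1.0))


rng = random.Random(0)
params = {
    "u_f": 0.5, "w_f": 0.5, "b_f": chrono_forget_bias(t_max=784, rng=rng),
    "u_c": 1.0, "w_c": 0.5, "b_c": 0.0,
}
h = 0.0
for x in [1.0, 0.0, 0.0, -1.0]:
    h = janet_step(x, h, params)
    print(round(h, 4))
```

A large positive forget bias pushes the gate toward 1, so the state changes slowly and information can persist over long sequences, which is the motivation for initializing the bias according to the expected sequence length.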

Photo Wake-Up: 3D Character Animation from a Single Photo


Title	Photo Wake-Up: 3D Character Animation from a Single Photo
Authors	Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman
Abstract	We present a method and application for animating a human subject from a single photo. For example, the character can walk out, run, sit, or jump in 3D. The key contributions of this paper are: 1) an application of viewing and animating humans in single photos in 3D, 2) a novel 2D warping method to deform a posable template body model to fit the person’s complex silhouette to create an animatable mesh, and 3) a method for handling partial self occlusions. We compare to state-of-the-art related methods and evaluate results with human studies. Further, we present an interactive interface that allows re-posing the person in 3D, and an augmented reality setup where the animated 3D person can emerge from the photo into the real world. We demonstrate the method on photos, posters, and art.
Tasks	3D Character Animation From A Single Photo
Published	2018-12-05
URL	http://arxiv.org/abs/1812.02246v1
PDF	http://arxiv.org/pdf/1812.02246v1.pdf
PWC	https://paperswithcode.com/paper/photo-wake-up-3d-character-animation-from-a
Repo	https://github.com/mplatnic/Deep-Learning
Framework	tf