Paper Group AWR 112
Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection
Title | Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection |
Authors | Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl |
Abstract | Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus on exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios. |
Tasks | Object Classification, Object Detection, Object Localization, Real-Time Object Detection |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06488v1 |
PDF | http://arxiv.org/pdf/1802.06488v1.pdf |
PWC | https://paperswithcode.com/paper/tiny-ssd-a-tiny-single-shot-detection-deep |
Repo | https://github.com/lampsonSong/tinySSD |
Framework | caffe2 |
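As a quick sanity check on the figures quoted in the abstract, the reported ratios imply a Tiny YOLO model of roughly 60 MB with an mAP of roughly 57%. The sketch below only rearranges the numbers from the abstract; reading "4.2% higher" as percentage points is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the Tiny SSD abstract.
tiny_ssd_size_mb = 2.3
size_ratio = 26          # "~26X smaller than Tiny YOLO"
tiny_ssd_map = 61.3      # mAP on VOC 2007, in percent
map_gain = 4.2           # assumed to mean percentage points

# Implied Tiny YOLO figures (approximate, because the abstract rounds).
implied_tiny_yolo_size_mb = tiny_ssd_size_mb * size_ratio
implied_tiny_yolo_map = tiny_ssd_map - map_gain

print(f"Implied Tiny YOLO size: ~{implied_tiny_yolo_size_mb:.1f} MB")
print(f"Implied Tiny YOLO mAP:  ~{implied_tiny_yolo_map:.1f}%")
```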
Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly
Title | Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly |
Authors | Michael Opitz, Georg Waltner, Horst Possegger, Horst Bischof |
Abstract | Learning similarity functions between image pairs with deep neural networks yields highly correlated activations of embeddings. In this work, we show how to improve the robustness of such embeddings by exploiting the independence within ensembles. To this end, we divide the last embedding layer of a deep network into an embedding ensemble and formulate training this ensemble as an online gradient boosting problem. Each learner receives a reweighted training sample from the previous learners. Further, we propose two loss functions which increase the diversity in our ensemble. These loss functions can be applied either for weight initialization or during training. Together, our contributions leverage large embedding sizes more effectively by significantly reducing correlation of the embedding and consequently increase retrieval accuracy of the embedding. Our method works with any differentiable loss function and does not introduce any additional parameters during test time. We evaluate our metric learning method on image retrieval tasks and show that it improves over state-of-the-art methods on the CUB 200-2011, Cars-196, Stanford Online Products, In-Shop Clothes Retrieval and VehicleID datasets. |
Tasks | Image Retrieval, Metric Learning |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.04815v1 |
PDF | http://arxiv.org/pdf/1801.04815v1.pdf |
PWC | https://paperswithcode.com/paper/deep-metric-learning-with-bier-boosting |
Repo | https://github.com/mop/bier |
Framework | tf |
Policy Optimization via Importance Sampling
Title | Policy Optimization via Importance Sampling |
Authors | Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli |
Abstract | Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods. |
Tasks | Continuous Control |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06098v2 |
PDF | http://arxiv.org/pdf/1809.06098v2.pdf |
PWC | https://paperswithcode.com/paper/policy-optimization-via-importance-sampling |
Repo | https://github.com/T3p/pois |
Framework | tf |
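The abstract builds on importance sampling to reuse trajectories collected under an earlier policy. The following is a minimal sketch of the plain importance sampling estimator only; it is not the POIS surrogate objective or its confidence bound, which the abstract does not spell out. The function name, the argument names, and the effective-sample-size diagnostic are illustrative assumptions.

```python
import math

def importance_sampling_estimate(returns, target_logp, behavior_logp):
    """Estimate the target policy's expected return from behavior-policy trajectories.

    returns[i]       -- total return of trajectory i
    target_logp[i]   -- log-probability of trajectory i under the target policy
    behavior_logp[i] -- log-probability of trajectory i under the behavior policy
    """
    # Importance weight = p_target(trajectory) / p_behavior(trajectory).
    weights = [math.exp(t - b) for t, b in zip(target_logp, behavior_logp)]
    estimate = sum(w * r for w, r in zip(weights, returns)) / len(returns)
    # Effective sample size: small values signal a high-variance estimate.
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return estimate, ess
```

When the two policies coincide, every weight equals 1, so the estimate reduces to the sample mean and the effective sample size equals the number of trajectories. As the policies drift apart, the effective sample size shrinks, which is one simple way to see why the variance of the estimate matters when deciding whether to collect new trajectories.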
Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing
Title | Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing |
Authors | Zheng Liu, Botao Xiao, Muhammad Alrabeiah, Keyan Wang, Jun Chen |
Abstract | Haze and smog are among the most common environmental factors impacting image quality and, therefore, image analysis. This paper proposes an end-to-end generative method for image dehazing. It is based on designing a fully convolutional neural network to recognize haze structures in input images and restore clear, haze-free images. The proposed method is agnostic in the sense that it does not explore the atmosphere scattering model. Somewhat surprisingly, it achieves superior performance relative to all existing state-of-the-art methods for image dehazing even on SOTS outdoor images, which are synthesized using the atmosphere scattering model. Project detail and code can be found here: https://github.com/Seanforfun/GMAN_Net_Haze_Removal |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2018-10-05 |
URL | https://arxiv.org/abs/1810.02862v2 |
PDF | https://arxiv.org/pdf/1810.02862v2.pdf |
PWC | https://paperswithcode.com/paper/generic-model-agnostic-convolutional-neural |
Repo | https://github.com/Seanforfun/GMAN_Net_Haze_Removal |
Framework | tf |
ShakeDrop Regularization for Deep Residual Learning
Title | ShakeDrop Regularization for Deep Residual Learning |
Authors | Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, Koichi Kise |
Abstract | Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well. |
Tasks | |
Published | 2018-02-07 |
URL | https://arxiv.org/abs/1802.02375v3 |
PDF | https://arxiv.org/pdf/1802.02375v3.pdf |
PWC | https://paperswithcode.com/paper/shakedrop-regularization-for-deep-residual |
Repo | https://github.com/imenurok/ShakeDrop |
Framework | pytorch |
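The abstract does not give the ShakeDrop update itself. The sketch below assumes the residual branch F(x) is scaled by a coefficient b + alpha - b * alpha, with b drawn from a Bernoulli distribution and alpha drawn uniformly from a symmetric range, and that evaluation uses the expected coefficient. It covers the forward pass only and omits any separate backward-pass perturbation and the training stabilizer mentioned in the abstract; all names and defaults are illustrative.

```python
import random

def shakedrop_coefficient(p_keep, training, rng, alpha_range=(-1.0, 1.0)):
    """Scaling factor applied to the residual branch (assumed form: b + alpha - b*alpha)."""
    lo, hi = alpha_range
    if not training:
        # Expectation of b + alpha - b * alpha with b ~ Bernoulli(p_keep).
        return p_keep + (1.0 - p_keep) * (lo + hi) / 2.0
    b = 1.0 if rng.random() < p_keep else 0.0
    alpha = rng.uniform(lo, hi)
    # b == 1 leaves the branch untouched; b == 0 scales it by alpha.
    return b + alpha - b * alpha

def shakedrop_forward(x, residual, coefficient):
    """Combine the identity path with the perturbed residual branch, elementwise."""
    return [xi + coefficient * ri for xi, ri in zip(x, residual)]
```

With `p_keep=1.0` the coefficient is always 1 and the block behaves like an ordinary residual connection, which makes the sketch easy to check against a plain ResNet block.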
Adversarial Attack on Graph Structured Data
Title | Adversarial Attack on Graph Structured Data |
Authors | Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, Le Song |
Abstract | Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the numerous research works on image or text adversarial attack and defense. In this paper, we focus on the adversarial attacks that fool the model by modifying the combinatorial structure of data. We first propose a reinforcement learning based attack method that learns the generalizable attack policy, while only requiring prediction labels from the target classifier. Also, variants of genetic algorithms and gradient methods are presented in the scenario where prediction confidence or gradients are available. We use both synthetic and real-world data to show that a family of Graph Neural Network models is vulnerable to these attacks, in both graph-level and node-level classification tasks. We also show such attacks can be used to diagnose the learned classifiers. |
Tasks | Adversarial Attack |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02371v1 |
PDF | http://arxiv.org/pdf/1806.02371v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-attack-on-graph-structured-data |
Repo | https://github.com/Hanjun-Dai/graph_adversarial_attack |
Framework | pytorch |
Adversarial Complementary Learning for Weakly Supervised Object Localization
Title | Adversarial Complementary Learning for Weakly Supervised Object Localization |
Authors | Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas Huang |
Abstract | In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel-classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallel-classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art. |
Tasks | Object Localization, Weakly-Supervised Object Localization |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.06962v1 |
PDF | http://arxiv.org/pdf/1804.06962v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-complementary-learning-for-weakly |
Repo | https://github.com/Hayashi-Yudai/ML_models |
Framework | tf |
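The erasing step described in the abstract, where regions found by one classifier are removed from the feature maps seen by its counterpart, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the min-max normalization, the `threshold` value, and the nested-list data layout are all hypothetical choices.

```python
def erase_discovered_regions(feature_maps, localization_map, threshold=0.6):
    """Zero out feature-map positions that the first classifier already highlights.

    feature_maps     -- list of C channels, each an H x W list of lists
    localization_map -- H x W list of lists produced by the first classifier
    threshold        -- hypothetical cutoff applied after min-max normalization
    """
    flat = [v for row in localization_map for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero on a constant map
    # A position is kept only if its normalized response is at or below the cutoff.
    keep = [[(v - lo) / scale <= threshold for v in row] for row in localization_map]
    return [
        [[v if keep[i][j] else 0.0 for j, v in enumerate(row)]
         for i, row in enumerate(channel)]
        for channel in feature_maps
    ]
```

The second classifier would then be trained on the returned feature maps, so it cannot rely on the regions the first classifier already uses.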
Deflecting Adversarial Attacks with Pixel Deflection
Title | Deflecting Adversarial Attacks with Pixel Deflection |
Authors | Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer |
Abstract | CNNs are poised to become integral parts of many critical systems. Despite their robustness to natural variations, image pixel values can be manipulated, via small, carefully crafted, imperceptible perturbations, to cause a model to misclassify images. We present an algorithm to process an image so that classification accuracy is significantly preserved in the presence of such adversarial manipulations. Image classifiers tend to be robust to natural noise, and adversarial attacks tend to be agnostic to object location. These observations motivate our strategy, which leverages model robustness to defend against adversarial perturbations by forcing the image to match natural image statistics. Our algorithm locally corrupts the image by redistributing pixel values via a process we term pixel deflection. A subsequent wavelet-based denoising operation softens this corruption, as well as some of the adversarial changes. We demonstrate experimentally that the combination of these techniques enables the effective recovery of the true class, against a variety of robust attacks. Our results compare favorably with current state-of-the-art defenses, without requiring retraining or modifying the CNN. |
Tasks | Adversarial Attack |
Published | 2018-01-26 |
URL | http://arxiv.org/abs/1801.08926v3 |
PDF | http://arxiv.org/pdf/1801.08926v3.pdf |
PWC | https://paperswithcode.com/paper/deflecting-adversarial-attacks-with-pixel |
Repo | https://github.com/iamaaditya/pixel-deflection |
Framework | none |
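The local corruption step named in the abstract, redistributing pixel values within a neighborhood, can be sketched as below. This is a simplified illustration and not the authors' code: it samples pixels uniformly, handles a single channel, and omits the wavelet-based denoising stage. The parameter names and default values are hypothetical.

```python
import random

def pixel_deflection(image, deflections=100, window=10, seed=None):
    """Return a copy of `image` with randomly chosen pixels replaced by nearby pixels.

    image       -- H x W list of lists of pixel values (one channel, for brevity)
    deflections -- number of pixels to replace (hypothetical default)
    window      -- half-width of the neighborhood to sample from (hypothetical default)
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the caller's image untouched
    for _ in range(deflections):
        y, x = rng.randrange(height), rng.randrange(width)
        # Pick a source pixel inside the window, clamped to the image bounds.
        sy = min(max(y + rng.randint(-window, window), 0), height - 1)
        sx = min(max(x + rng.randint(-window, window), 0), width - 1)
        out[y][x] = out[sy][sx]
    return out
```

Because each replacement copies an existing value from nearby, the output contains only pixel values that were already present in the input, which is one way to read the abstract's point about keeping the image close to natural image statistics.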
Rob-GAN: Generator, Discriminator, and Adversarial Attacker
Title | Rob-GAN: Generator, Discriminator, and Adversarial Attacker |
Authors | Xuanqing Liu, Cho-Jui Hsieh |
Abstract | We study two important concepts in adversarial deep learning—adversarial training and generative adversarial network (GAN). Adversarial training is the technique used to improve the robustness of discriminator by combining adversarial attacker and discriminator in the training phase. GAN is commonly used for image generation by jointly optimizing discriminator and generator. We show these two concepts are indeed closely related and can be used to strengthen each other—adding a generator to the adversarial training procedure can improve the robustness of discriminators, and adding an adversarial attack to GAN training can improve the convergence speed and lead to better generators. Combining these two insights, we develop a framework called Rob-GAN to jointly optimize generator and discriminator in the presence of adversarial attacks—the generator generates fake images to fool discriminator; the adversarial attacker perturbs real images to fool the discriminator, and the discriminator wants to minimize loss under fake and adversarial images. Through this end-to-end training procedure, we are able to simultaneously improve the convergence speed of GAN training, the quality of synthetic images, and the robustness of discriminator under strong adversarial attacks. Experimental results demonstrate that the obtained classifier is more robust than the state-of-the-art adversarial training approach, and the generator outperforms SN-GAN on ImageNet-143. |
Tasks | Adversarial Attack, Image Generation |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10454v3 |
PDF | http://arxiv.org/pdf/1807.10454v3.pdf |
PWC | https://paperswithcode.com/paper/from-adversarial-training-to-generative |
Repo | https://github.com/xuanqing94/RobGAN |
Framework | pytorch |
The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera
Title | The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera |
Authors | Junaid Ahmed Ansari, Sarthak Sharma, Anshuman Majumdar, J. Krishna Murthy, K. Madhava Krishna |
Abstract | Accurate localization of other traffic participants is a vital task in autonomous driving systems. State-of-the-art systems employ a combination of sensing modalities such as RGB cameras and LiDARs for localizing traffic participants, but most such demonstrations have been confined to plain roads. We demonstrate, to the best of our knowledge, the first results for monocular object localization and shape estimation on surfaces that do not share the same plane with the moving monocular camera. We approximate road surfaces by local planar patches and use semantic cues from vehicles in the scene to initialize a local bundle-adjustment like procedure that simultaneously estimates the pose and shape of the vehicles, and the orientation of the local ground plane on which the vehicle stands as well. We evaluate the proposed approach on the KITTI and SYNTHIA-SF benchmarks, for a variety of road plane configurations. The proposed approach significantly improves the state-of-the-art for monocular object localization on arbitrarily-shaped roads. |
Tasks | Autonomous Driving, Object Localization |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02057v1 |
PDF | http://arxiv.org/pdf/1803.02057v1.pdf |
PWC | https://paperswithcode.com/paper/the-earth-aint-flat-monocular-reconstruction |
Repo | https://github.com/sarthaksharma13/IROS18 |
Framework | none |
A Multimodal LSTM for Predicting Listener Empathic Responses Over Time
Title | A Multimodal LSTM for Predicting Listener Empathic Responses Over Time |
Authors | Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong |
Abstract | People naturally understand the emotions of, and often also empathize with, those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved a concordance correlation coefficient (CCC) of 0.29 and 0.32 on the Validation set for the Generalized and Personalized track respectively, and achieved a CCC of 0.14 and 0.14 on the held-out Test set. We discuss the difficulties faced and the lessons learnt tackling this challenge. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04891v2 |
PDF | http://arxiv.org/pdf/1812.04891v2.pdf |
PWC | https://paperswithcode.com/paper/a-multimodal-lstm-for-predicting-listener |
Repo | https://github.com/desmond-ong/cheem-omg-empathy |
Framework | pytorch |
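The results in the abstract are reported as a concordance correlation coefficient (CCC). For reference, the standard definition is CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2), which can be computed as in the sketch below. The function name is illustrative, and the challenge's official evaluation script may differ in details such as how sequences are grouped.

```python
def concordance_correlation(predictions, targets):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    # Population (divide-by-n) variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets)) / n
    # Undefined (division by zero) when both sequences are the same constant.
    return 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)
```

Unlike Pearson correlation, CCC penalizes a constant offset: `concordance_correlation([1, 2, 3], [2, 3, 4])` gives 4/7 (about 0.571) even though the two sequences are perfectly linearly correlated.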
Variable Selection and Task Grouping for Multi-Task Learning
Title | Variable Selection and Task Grouping for Multi-Task Learning |
Authors | Jun-Yong Jeong, Chi-Hyuck Jun |
Abstract | We consider multi-task learning, which simultaneously learns related prediction tasks, to improve generalization performance. We factorize a coefficient matrix as the product of two matrices based on a low-rank assumption. These matrices have sparsities to simultaneously perform variable selection and learn an overlapping group structure among the tasks. The resulting bi-convex objective function is minimized by alternating optimization where sub-problems are solved using the alternating direction method of multipliers and accelerated proximal gradient descent. Moreover, we provide the performance bound of the proposed method. The effectiveness of the proposed method is validated for both synthetic and real-world datasets. |
Tasks | Multi-Task Learning |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04676v1 |
PDF | http://arxiv.org/pdf/1802.04676v1.pdf |
PWC | https://paperswithcode.com/paper/variable-selection-and-task-grouping-for |
Repo | https://github.com/JunYongJeong/VSTG-MTL |
Framework | none |
Surrogate-assisted Bayesian inversion for landscape and basin evolution models
Title | Surrogate-assisted Bayesian inversion for landscape and basin evolution models |
Authors | Rohitash Chandra, Danial Azam, Arpit Kapoor, R. Dietmar Müller |
Abstract | The complex and computationally expensive features of the forward landscape and sedimentary basin evolution models pose a major challenge in the development of efficient inference and optimization methods. Bayesian inference provides a methodology for estimation and uncertainty quantification of free model parameters. In our previous work, parallel tempering Bayeslands was developed as a framework for parameter estimation and uncertainty quantification for the landscape and basin evolution modelling software Badlands. Parallel tempering Bayeslands features high-performance computing with dozens of processing cores running in parallel to enhance computational efficiency. Although parallel computing is used, the procedure remains computationally challenging since thousands of samples need to be drawn and evaluated. In large-scale landscape and basin evolution problems, a single model evaluation can take from several minutes to hours, and in certain cases, even days. Surrogate-assisted optimization has been successfully applied to a number of engineering problems. This motivates its use in optimisation and inference methods suited for complex models in geology and geophysics. Surrogates can speed up parallel tempering Bayeslands by developing computationally inexpensive surrogates to mimic expensive models. In this paper, we present an application of surrogate-assisted parallel tempering where the surrogate mimics a landscape evolution model including erosion, sediment transport and deposition, by estimating the likelihood function that is given by the model. We employ a machine learning model as a surrogate that learns from the samples generated by the parallel tempering algorithm. The results show that the methodology is effective in lowering the overall computational cost significantly while retaining the quality of solutions. |
Tasks | Bayesian Inference |
Published | 2018-12-12 |
URL | https://arxiv.org/abs/1812.08655v1 |
PDF | https://arxiv.org/pdf/1812.08655v1.pdf |
PWC | https://paperswithcode.com/paper/surrogate-assisted-bayesian-inversion-for |
Repo | https://github.com/intelligentEarth/surrogate-pt-Bayeslands |
Framework | tf |
The unreasonable effectiveness of the forget gate
Title | The unreasonable effectiveness of the forget gate |
Authors | Jos van der Westhuizen, Joan Lasenby |
Abstract | Given the success of the gated recurrent unit, a natural question is whether all the gates of the long short-term memory (LSTM) network are necessary. Previous research has shown that the forget gate is one of the most important gates in the LSTM. Here we show that a forget-gate-only version of the LSTM with chrono-initialized biases not only provides computational savings but also outperforms the standard LSTM on multiple benchmark datasets and competes with some of the best contemporary models. Our proposed network, the JANET, achieves accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the standard LSTM which yields accuracies of 98.5% and 91%. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04849v3 |
PDF | http://arxiv.org/pdf/1804.04849v3.pdf |
PWC | https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-the-forget |
Repo | https://github.com/JosvanderWesthuizen/janet |
Framework | tf |
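To make the idea of a forget-gate-only recurrent cell concrete, here is a minimal sketch of one time step. The abstract does not give the equations, so the details are assumptions of this sketch: the candidate is admitted through the complement of the forget gate, the hidden state doubles as the memory, and the forget bias is drawn as the log of a uniform sample between 1 and `t_max - 1` for chrono initialization. All names are hypothetical; consult the paper for the exact JANET formulation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chrono_forget_bias(hidden_size, t_max, rng):
    """Assumed chrono initialization: b_f = log(u), u ~ Uniform(1, t_max - 1)."""
    return [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]

def forget_gate_only_step(x, h, params):
    """One step of a forget-gate-only cell (hidden state doubles as memory)."""
    f = [sigmoid(z) for z in affine(params["Wf"], params["Uf"], params["bf"], x, h)]
    cand = [math.tanh(z) for z in affine(params["Wc"], params["Uc"], params["bc"], x, h)]
    # The forget gate keeps old memory; its complement admits the new candidate.
    return [fk * hk + (1.0 - fk) * ck for fk, hk, ck in zip(f, h, cand)]
```

A larger `t_max` pushes the initial forget biases upward, so the gate starts closer to 1 and the cell retains information over longer spans at the beginning of training.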
Photo Wake-Up: 3D Character Animation from a Single Photo
Title | Photo Wake-Up: 3D Character Animation from a Single Photo |
Authors | Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman |
Abstract | We present a method and application for animating a human subject from a single photo. E.g., the character can walk out, run, sit, or jump in 3D. The key contributions of this paper are: 1) an application of viewing and animating humans in single photos in 3D, 2) a novel 2D warping method to deform a posable template body model to fit the person’s complex silhouette to create an animatable mesh, and 3) a method for handling partial self occlusions. We compare to state-of-the-art related methods and evaluate results with human studies. Further, we present an interactive interface that allows re-posing the person in 3D, and an augmented reality setup where the animated 3D person can emerge from the photo into the real world. We demonstrate the method on photos, posters, and art. |
Tasks | 3D Character Animation From A Single Photo |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.02246v1 |
PDF | http://arxiv.org/pdf/1812.02246v1.pdf |
PWC | https://paperswithcode.com/paper/photo-wake-up-3d-character-animation-from-a |
Repo | https://github.com/mplatnic/Deep-Learning |
Framework | tf |