Paper Group ANR 227
Robustness Analysis of Pedestrian Detectors for Surveillance
Title | Robustness Analysis of Pedestrian Detectors for Surveillance |
Authors | Yuming Fang, Guanqun Ding, Yuan Yuan, Weisi Lin, Haiwen Liu |
Abstract | Many methods have been proposed to handle severe occlusion, pose variation, cluttered backgrounds, etc., in order to obtain effective pedestrian detection results in surveillance video. Beyond detection accuracy, however, a robust surveillance system should also remain stable under video quality degradation caused by network transmission, environmental variation, and similar factors. In this study, we investigate the robustness of pedestrian detection algorithms to video quality degradation. The main contributions of this work are threefold. First, a large-scale Distorted Surveillance Video Data Set (DSurVD) is constructed from high-quality video sequences and their corresponding distorted versions. Second, we design a method to evaluate detection stability, together with a robustness measure called the Robustness Quadrangle, which visualizes both the detection accuracy of pedestrian detection algorithms on high-quality video and their stability under quality degradation. Third, the robustness of seven existing pedestrian detection algorithms is evaluated on the constructed DSurVD. Experimental results show that the robustness of existing pedestrian detection algorithms can be further improved. Additionally, we provide an in-depth discussion of how different distortion types influence the performance of pedestrian detection algorithms, which is important for designing effective pedestrian detectors for surveillance. The DSurVD data set can be downloaded from BaiduYunDisk, https://pan.baidu.com/s/1I9Kqj8rmubOYu7bkBfkUpA, Password: lqmc |
Tasks | Pedestrian Detection |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04562v2 |
http://arxiv.org/pdf/1807.04562v2.pdf | |
PWC | https://paperswithcode.com/paper/robustness-analysis-of-pedestrian-detectors |
Repo | |
Framework | |
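The Robustness Quadrangle couples clean-video accuracy with stability under degradation. Below is a minimal sketch of one plausible reading, assuming hypothetical per-distortion AP numbers and a simple retention-based stability score; the paper's exact definition may differ.

```python
import numpy as np

# Hypothetical AP scores for one detector; all numbers are made up.
ap_clean = 0.82
ap_distorted = {"blur": 0.71, "noise": 0.64, "compression": 0.75, "low_light": 0.58}

# Stability as the mean fraction of clean accuracy retained per distortion.
retention = np.array([ap / ap_clean for ap in ap_distorted.values()])
stability = retention.mean()

print(f"accuracy (clean) = {ap_clean:.2f}, stability = {stability:.2f}")
for name, r in zip(ap_distorted, retention):
    print(f"  {name:<12} retains {r:.0%} of clean AP")
```

A detector with high clean accuracy but low stability would look strong on standard benchmarks yet degrade badly once the video is compressed or noisy, which is exactly the gap the quadrangle is meant to expose.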
Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices
Title | Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices |
Authors | Toan Nguyen, Aditi Roy, Nasir Memon |
Abstract | Studies have shown that children can be exposed to smart devices at a very early age, which has important implications for research in child-computer interaction, children's online safety, and early education, and many systems have been built on such research. In this work, we present multiple techniques to automatically detect the presence of a child on a smart device, which could serve as the first step in such systems. Our methods distinguish children from adults based on behavioral differences exhibited while operating a touch-enabled modern computing device; these differences are extracted from data recorded by the touchscreen and built-in sensors. To evaluate the effectiveness of the proposed methods, a new data set was created from 50 children and adults interacting with off-the-shelf applications on smartphones. Results show that it is possible to achieve 99% accuracy with less than 0.5% error rate after 8 consecutive touch gestures using touch information alone, or after 5 seconds of sensor readings. When information from multiple sensors is combined, similar performance is achieved after only 3 gestures. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01680v1 |
http://arxiv.org/pdf/1808.01680v1.pdf | |
PWC | https://paperswithcode.com/paper/kid-on-the-phone-toward-automatic-detection |
Repo | |
Framework | |
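To illustrate the gesture-accumulation decision rule, here is a toy sketch assuming synthetic features and a RandomForest stand-in for the paper's classifiers; the real system extracts features from the touchscreen and built-in sensors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-gesture features (e.g. pressure, touch
# area, swipe length, duration); labels are toy, 1 = child, 0 = adult.
n, d = 1000, 12
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def classify_session(gestures, k=8):
    """Decide after k consecutive gestures by averaging probabilities."""
    probs = clf.predict_proba(gestures[:k])[:, 1]
    return int(probs.mean() > 0.5)

print(classify_session(rng.normal(size=(8, d))))
```

Averaging per-gesture scores is why accuracy improves with consecutive gestures: individual gestures are noisy, but the running mean converges on the user's true class.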
Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC
Title | Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC |
Authors | Guanwen Zhong, Akshat Dubey, Tan Cheng, Tulika Mitra |
Abstract | Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present *Synergy*, an automated, hardware-software co-designed, pipelined, high-throughput CNN inference framework for embedded heterogeneous system-on-chip (SoC) architectures (Xilinx Zynq). *Synergy* leverages, through multi-threading, all the available on-chip resources, which include the dual-core ARM processor along with the FPGA and the NEON SIMD engines as accelerators. Moreover, *Synergy* provides a unified abstraction of the heterogeneous accelerators (FPGA and NEON) and can adapt to different network configurations at runtime, without changing the underlying hardware accelerator architecture, by balancing workload across accelerators through work-stealing. *Synergy* achieves a 7.3X speedup, averaged across seven CNN models, over a well-optimized software-only solution, and demonstrates substantially better throughput and energy-efficiency than contemporary CNN implementations on the same SoC architecture. |
Tasks | |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1804.00706v1 |
http://arxiv.org/pdf/1804.00706v1.pdf | |
PWC | https://paperswithcode.com/paper/synergy-a-hwsw-framework-for-high-throughput |
Repo | |
Framework | |
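The runtime-adaptation claim rests on work-stealing across the heterogeneous accelerators. Here is a toy sketch of that scheduling idea, with illustrative worker names and job timings rather than Synergy's actual scheduler.

```python
import collections
import random
import threading
import time

# Each "accelerator" drains its own deque of layer jobs and steals from
# a random peer's tail when it runs dry.
workers = ["arm_core", "fpga", "neon"]
queues = {w: collections.deque(range(i * 10, (i + 1) * 10)) for i, w in enumerate(workers)}
lock = threading.Lock()
done = []

def run(name):
    while True:
        with lock:
            if queues[name]:
                job = queues[name].popleft()                # local work first
            else:
                victims = [w for w in workers if queues[w]]
                if not victims:
                    return
                job = queues[random.choice(victims)].pop()  # steal from a peer
        time.sleep(0.001)  # stand-in for executing a convolution tile
        done.append((name, job))

threads = [threading.Thread(target=run, args=(w,)) for w in workers]
for t in threads: t.start()
for t in threads: t.join()
print(len(done), "jobs completed")
```

Because idle accelerators pull work rather than waiting on a static schedule, the same binary adapts when a different CNN shifts the balance of work between the FPGA and NEON engines.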
Threshold Auto-Tuning Metric Learning
Title | Threshold Auto-Tuning Metric Learning |
Authors | Yuya Onuma, Rachelle Rivero, Tsuyoshi Kato |
Abstract | It has been reported repeatedly that discriminative learning of a distance metric boosts pattern recognition performance. ITML-based methods enjoy the advantage that the Bregman projection framework can be applied to optimize the distance metric; their weak point is that the distance threshold for the similarity/dissimilarity constraints must be set manually, and generalization performance is sensitive to this choice. In this paper, we present a new metric learning formulation in which the distance threshold is optimized jointly with the metric. Since the optimization remains within the Bregman projection framework, the Dykstra algorithm can be applied. In each iteration, a nonlinear equation must be solved to project the solution onto a half-space; a naïve method takes $O(LMn^{3})$ computational time, whereas we discovered an efficient technique that solves it in $O(Mn^{3})$. We prove that the root exists and is unique. We show empirically that the recognition accuracy of the proposed algorithm is comparable to existing metric learning methods, while the distance threshold is tuned automatically. |
Tasks | Metric Learning |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02125v2 |
http://arxiv.org/pdf/1801.02125v2.pdf | |
PWC | https://paperswithcode.com/paper/threshold-auto-tuning-metric-learning |
Repo | |
Framework | |
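The per-iteration bottleneck is the half-space projection, which reduces to finding the unique root of a monotone scalar equation. Below is a minimal sketch of that root-finding step, assuming a toy strictly decreasing function in place of the paper's actual equation.

```python
import numpy as np
from scipy.optimize import brentq

# Toy stand-in: the projection is driven by a Lagrange multiplier lam
# solving f(lam) = 0, where f is strictly monotone (the paper proves the
# root exists and is unique).
eigs = np.array([0.5, 1.2, 2.0, 3.3])   # spectrum of the current metric (toy)
target = 2.5

def f(lam):
    return np.sum(eigs / (1.0 + lam * eigs)) - target  # strictly decreasing

lo, hi = 0.0, 1.0
while f(hi) > 0:          # expand the bracket until it straddles the root
    hi *= 2.0
lam = brentq(f, lo, hi)
print(f"lambda = {lam:.6f}, residual = {f(lam):.2e}")
```

Monotonicity is what makes the step cheap: a bracketing solver such as Brent's method converges in a handful of evaluations, each costing one pass over the spectrum.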
Adversarial Semantic Alignment for Improved Image Captions
Title | Adversarial Semantic Alignment for Improved Image Captions |
Authors | Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu |
Abstract | In this paper we study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator that enforces semantic alignment between images and captions. We empirically compare the viability of two training methods, Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models, introducing a new semantic score and showing its correlation with human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur; to this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set, combined with our semantic score, is proposed as a new diagnostic tool for the captioning community. When evaluated on the OOC and MS-COCO benchmarks, SCST-based training performs strongly on both the semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training. |
Tasks | Image Captioning |
Published | 2018-04-30 |
URL | https://arxiv.org/abs/1805.00063v3 |
https://arxiv.org/pdf/1805.00063v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-image-captioning-with-adversarial |
Repo | |
Framework | |
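SCST's appeal here is that it needs no learned baseline and no discriminator gradients: the reward of the greedily decoded caption serves as the baseline for a sampled one. A single-step toy sketch, assuming a bandit-style "vocabulary" and a made-up per-token reward:

```python
import numpy as np

rng = np.random.default_rng(0)

logits = rng.normal(size=5)                    # unnormalised token scores
probs = np.exp(logits) / np.exp(logits).sum()
reward = np.array([0.1, 0.9, 0.4, 0.2, 0.6])   # toy reward (e.g. a semantic score)

sampled = rng.choice(5, p=probs)
greedy = int(np.argmax(probs))
advantage = reward[sampled] - reward[greedy]   # self-critical baseline

# REINFORCE: gradient of log p(sampled) w.r.t. the logits is onehot - probs.
grad_logp = -probs
grad_logp[sampled] += 1.0
logits += 0.1 * advantage * grad_logp          # ascend expected reward
print(f"advantage = {advantage:+.2f}")
```

Samples that beat greedy decoding are pushed up and samples that fall short are pushed down, which is the stable gradient behavior the paper reports relative to Gumbel ST.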
Transferable neural networks for enhanced sampling of protein dynamics
Title | Transferable neural networks for enhanced sampling of protein dynamics |
Authors | Mohammad M. Sultan, Hannah K. Wayment-Steele, Vijay S. Pande |
Abstract | Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows sampling to be performed rapidly in multiple related systems. We first demonstrate that our method can describe the effects of force field changes in capped alanine dipeptide after learning a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained more efficiently on larger systems by encoding the outputs of a linear transformation obtained via time-structure based independent component analysis (tICA). Using this technique, we show how a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein carrying the GTT mutation. This method shows promise for rapidly sampling related systems using a single transferable collective variable, and is generally applicable to sets of related simulations, enabling us to probe the effects of variation in increasingly large systems of biophysical interest. |
Tasks | |
Published | 2018-01-02 |
URL | http://arxiv.org/abs/1801.00636v1 |
http://arxiv.org/pdf/1801.00636v1.pdf | |
PWC | https://paperswithcode.com/paper/transferable-neural-networks-for-enhanced |
Repo | |
Framework | |
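The tICA pre-encoding step mentioned in the abstract has a standard closed form: find linear combinations of input features that decorrelate most slowly in time by solving a generalized eigenproblem. A self-contained sketch on a toy trajectory:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy slow trajectory: a random walk in feature space.
T, d, tau = 5000, 10, 25
X = np.cumsum(rng.normal(size=(T, d)), axis=0)
X -= X.mean(axis=0)

C0 = X.T @ X / T                           # instantaneous covariance
Ct = X[:-tau].T @ X[tau:] / (T - tau)      # time-lagged covariance
Ct = 0.5 * (Ct + Ct.T)                     # symmetrize

# Solve C_tau v = lambda C_0 v; large lambda means a slow component.
eigvals, eigvecs = eigh(Ct, C0)
slowest = eigvecs[:, ::-1][:, :2]
Z = X @ slowest                            # low-dimensional encoder input
print("encoder input shape:", Z.shape,
      "top eigenvalues:", np.round(eigvals[::-1][:2], 3))
```

Feeding the encoder `Z` rather than raw coordinates is what makes training tractable on larger systems, per the abstract.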
Image Reconstruction via Variational Network for Real-Time Hand-Held Sound-Speed Imaging
Title | Image Reconstruction via Variational Network for Real-Time Hand-Held Sound-Speed Imaging |
Authors | Valery Vishnevskiy, Sergio J Sanabria, Orcun Goksel |
Abstract | Speed of sound is a biomechanical property useful for quantitative tissue differentiation, with great potential as a new ultrasound-based imaging modality. A conventional ultrasound array transducer can be used together with an acoustic mirror, or so-called reflector, to reconstruct sound-speed images from time-of-flight measurements to the reflector collected between transducer element pairs, which constitutes a challenging limited-angle computed-tomography problem. For this problem, we herein present a variational-network image reconstruction architecture built on optimization loop unrolling, and provide an efficient protocol for training it on fully synthetic inclusion data. Our results indicate that the learned model generalizes well, reconstructing images with significantly different statistics from those of the training set. Complex inclusion geometries were successfully reconstructed, improving over the prior art by 23% in reconstruction error and by 10% in contrast on synthetic data. In a phantom study, we demonstrated the detection of multiple inclusions that were not distinguishable by prior-art reconstruction, while improving the contrast by 27% for a stiff inclusion and by 219% for a soft inclusion. Our reconstruction algorithm takes approximately 10 ms, enabling its use as a real-time imaging method on an ultrasound machine, for which we demonstrate an example preliminary setup herein. |
Tasks | Image Reconstruction |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07416v1 |
http://arxiv.org/pdf/1807.07416v1.pdf | |
PWC | https://paperswithcode.com/paper/image-reconstruction-via-variational-network |
Repo | |
Framework | |
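Loop unrolling means the reconstruction network is literally a fixed number of gradient-descent-like iterations with learned parameters. A minimal sketch, assuming a toy linear forward model and soft-thresholding in place of the learned regularizer (in a real variational network both the step sizes and the regularizer are trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy limited-view problem: recover sparse x from b = A x + noise.
m, n = 60, 40
A = rng.normal(size=(m, n))
x_true = np.zeros(n); x_true[:5] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=m)

steps = np.full(10, 1.0 / np.linalg.norm(A, 2) ** 2)    # learned per layer in a real VN
thresh = 0.01                                           # learned in a real VN

x = np.zeros(n)
for t in steps:                                         # the unrolled "layers"
    x = x - t * A.T @ (A @ x - b)                       # data-consistency step
    x = np.sign(x) * np.maximum(np.abs(x) - thresh, 0)  # regularizer (prox) step
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Because the depth is fixed, inference cost is a known constant, which is how the 10 ms real-time budget becomes achievable.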
VirtualIdentity: Privacy-Preserving User Profiling
Title | VirtualIdentity: Privacy-Preserving User Profiling |
Authors | Sisi Wang, Wing-Sea Poon, Golnoosh Farnadi, Caleb Horst, Kebra Thompson, Michael Nickels, Rafael Dowsley, Anderson C. A. Nascimento, Martine De Cock |
Abstract | User profiling from user-generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC be fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling (which are the intellectual property of the company) to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner. |
Tasks | |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10151v1 |
http://arxiv.org/pdf/1808.10151v1.pdf | |
PWC | https://paperswithcode.com/paper/virtualidentity-privacy-preserving-user |
Repo | |
Framework | |
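The core primitive behind such systems is secure computation on additively secret-shared values. Below is a toy sketch of a privacy-preserving dot product (the heart of a linear SVM score) using Beaver multiplication triples; the actual VirtualIdentity protocols add fixed-point encoding, kernels, and further machinery on top of this idea.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 2**31 - 1  # prime modulus for additive secret sharing

def share(v):
    r = rng.integers(0, P, size=v.shape)
    return r, (v - r) % P

n = 4
x = rng.integers(0, 100, size=n)   # user's features (kept private)
w = rng.integers(0, 100, size=n)   # company's SVM weights (kept private)

# A dealer issues a multiplication triple c = a * b, shared between parties.
a = rng.integers(0, P, size=n); b = rng.integers(0, P, size=n)
c = (a * b) % P
a0, a1 = share(a); b0, b1 = share(b); c0, c1 = share(c)
x0, x1 = share(x); w0, w1 = share(w)

# Both parties open d = x - a and e = w - b; these values leak nothing.
d = (x0 - a0 + x1 - a1) % P
e = (w0 - b0 + w1 - b1) % P

# Each party computes its share of x * w locally (party 0 adds public d*e).
z0 = (c0 + d * b0 + e * a0 + d * e) % P
z1 = (c1 + d * b1 + e * a1) % P

score = int((z0 + z1).sum() % P)
assert score == int((x * w).sum())
print("secure dot product:", score)
```

Neither party ever sees the other's vector in the clear, yet together they obtain the exact score, mirroring the paper's goal of profiling without exposing either the UGC or the models.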
String Methods for Stochastic Image and Shape Matching
Title | String Methods for Stochastic Image and Shape Matching |
Authors | Alexis Arnaudon, Darryl Holm, Stefan Sommer |
Abstract | Matching of images and analysis of shape differences is traditionally pursued via energy minimization over paths of deformations that act to match the shape objects. In the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, iterative gradient descent on the matching functional leads to matching algorithms informally known as Beg algorithms. When stochasticity is introduced to model stochastic variability of shapes and to provide more realistic models of observed shape data, the corresponding matching problem can be solved with a stochastic Beg algorithm, similar to the finite-temperature string method used in rare event sampling. In this paper, we apply a stochastic model compatible with the geometry of the LDDMM framework to obtain a stochastic model of images, and we derive the stochastic version of the Beg algorithm, which we compare with the string method and with expectation-maximization optimization of posterior likelihoods. The algorithm and its use for statistical inference are tested on stochastic LDDMM landmarks and images. |
Tasks | |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.06038v3 |
http://arxiv.org/pdf/1805.06038v3.pdf | |
PWC | https://paperswithcode.com/paper/string-methods-for-stochastic-image-and-shape |
Repo | |
Framework | |
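To give a feel for the "finite temperature" idea, here is a heavily simplified caricature: gradient descent on a landmark-matching energy with Langevin-style noise injected at each step. The real stochastic Beg algorithm evolves momenta through the LDDMM geodesic equations rather than moving points directly.

```python
import numpy as np

rng = np.random.default_rng(0)

src = rng.normal(size=(5, 2))          # source landmarks
dst = src + np.array([1.0, 0.5])       # target: a rigid shift

q = src.copy()
step, temperature = 0.1, 1e-3
for _ in range(200):
    grad = q - dst                     # gradient of 0.5 * ||q - dst||^2
    noise = np.sqrt(2 * step * temperature) * rng.normal(size=q.shape)
    q = q - step * grad + noise        # noisy descent, as in finite-T string methods

print("mean residual:", np.linalg.norm(q - dst, axis=1).mean())
```

The injected noise lets the iterates explore around the deterministic matching path, which is what connects the stochastic matching problem to rare-event sampling.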
Hardening Deep Neural Networks via Adversarial Model Cascades
Title | Hardening Deep Neural Networks via Adversarial Model Cascades |
Authors | Deepak Vijaykeerthy, Anshuman Suri, Sameep Mehta, Ponnurangam Kumaraguru |
Abstract | Deep neural networks (DNNs) are vulnerable to malicious inputs crafted by an adversary to produce erroneous outputs. Works on securing neural networks against adversarial examples achieve high empirical robustness on simple datasets such as MNIST, but these techniques are inadequate when empirically tested on complex data sets such as CIFAR-10 and SVHN. Further, existing techniques are designed to target specific attacks and fail to generalize across attacks. We propose Adversarial Model Cascades (AMC) to tackle these inadequacies. Our approach trains a cascade of models sequentially, where each model is optimized to be robust towards a mixture of multiple attacks. Ultimately, this yields a single model that is secure against a wide range of attacks, namely FGSM, Elastic, Virtual Adversarial Perturbations and Madry. On average, AMC increases the model's empirical robustness against various attacks simultaneously by a significant margin (6.225% for MNIST, 5.075% for SVHN and 2.65% for CIFAR-10), while keeping performance on non-adversarial inputs comparable to state-of-the-art models. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.01448v4 |
http://arxiv.org/pdf/1802.01448v4.pdf | |
PWC | https://paperswithcode.com/paper/hardening-deep-neural-networks-via |
Repo | |
Framework | |
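A single stage of such cascade training looks like ordinary adversarial training against one attack. Here is a minimal sketch on a toy logistic-regression model with FGSM (one of the attacks the paper names); AMC chains stages like this one, adding attacks as it goes.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps, lr = 500, 20, 0.1, 0.5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w = np.zeros(d)
for _ in range(100):
    # FGSM: perturb inputs along the sign of the input-gradient of the loss.
    g_x = (sigmoid(X @ w) - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(g_x)
    # Train on the clean/adversarial mixture.
    X_mix = np.vstack([X, X_adv]); y_mix = np.concatenate([y, y])
    w -= lr * X_mix.T @ (sigmoid(X_mix @ w) - y_mix) / len(y_mix)

acc = ((sigmoid(X_adv @ w) > 0.5) == y).mean()
print(f"accuracy on FGSM points: {acc:.2%}")
```

The cascade's point is that each successive model starts from one already hardened against the previous attack mixture, rather than defending against each attack in isolation.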
Acquire, Augment, Segment & Enjoy: Weakly Supervised Instance Segmentation of Supermarket Products
Title | Acquire, Augment, Segment & Enjoy: Weakly Supervised Instance Segmentation of Supermarket Products |
Authors | Patrick Follmann, Bertram Drost, Tobias Böttger |
Abstract | Grocery stores have thousands of products that are usually identified using barcodes with a human in the loop. For automated checkout systems, it is necessary to count and classify the groceries efficiently and robustly. One possibility is to use a deep learning algorithm for instance-aware semantic segmentation. Such methods achieve high accuracies but require a large amount of annotated training data. We propose a system to generate the training annotations in a weakly supervised manner, drastically reducing the labeling effort. We assume that for each training image, only the object class is known. The system automatically segments the corresponding object from the background. The obtained training data is augmented to simulate variations similar to those seen in real-world setups. |
Tasks | Instance Segmentation, Semantic Segmentation, Weakly-supervised instance segmentation |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02001v2 |
http://arxiv.org/pdf/1807.02001v2.pdf | |
PWC | https://paperswithcode.com/paper/acquire-augment-segment-enjoy-weakly |
Repo | |
Framework | |
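A toy version of the annotation step, assuming a clean background plate and simple difference-thresholding; the paper's pipeline is more elaborate, but the weak supervision is the same, only the image-level class label is given.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "conveyor belt" scene: flat background plus one product.
background = np.full((64, 64), 40.0)
image = background + rng.normal(0, 2, size=(64, 64))
image[20:44, 24:40] += 80.0                      # the product region

diff = np.abs(image - background)
sigma = 1.4826 * np.median(diff)                 # robust noise-scale estimate
mask = diff > 5 * sigma                          # foreground = far above noise
print("object pixels:", int(mask.sum()))         # ~ 24 * 16 = 384 expected
```

Masks generated this way, augmented to mimic real checkout conditions, become the training annotations, so no human ever draws an instance outline.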
Semi-Autoregressive Neural Machine Translation
Title | Semi-Autoregressive Neural Machine Translation |
Authors | Chunqi Wang, Ji Zhang, Haiqing Chen |
Abstract | Existing approaches to neural machine translation are typically autoregressive models. While these models attain state-of-the-art translation quality, they suffer from low parallelizability and are thus slow at decoding long sequences. In this paper, we propose a novel model for fast sequence generation: the semi-autoregressive Transformer (SAT). The SAT keeps the autoregressive property globally but relaxes it locally, and is thus able to produce multiple successive words in parallel at each time step. Experiments conducted on English-German and Chinese-English translation tasks show that the SAT achieves a good balance between translation quality and decoding speed. On WMT'14 English-German translation, the SAT achieves a 5.58$\times$ speedup while maintaining 88% of the translation quality, significantly better than previous non-autoregressive methods. When producing two words at each time step, the SAT is almost lossless (only 1% degradation in BLEU score). |
Tasks | Machine Translation |
Published | 2018-08-26 |
URL | http://arxiv.org/abs/1808.08583v2 |
http://arxiv.org/pdf/1808.08583v2.pdf | |
PWC | https://paperswithcode.com/paper/semi-autoregressive-neural-machine |
Repo | |
Framework | |
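The decoding scheme is easy to state in code: emit K tokens per step, conditioning on all tokens from earlier steps but not on siblings within the step. A toy sketch with a random stub standing in for the Transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, max_len, eos = 100, 2, 12, 0

def model_stub(prefix, k):
    # Stand-in for p(y_{t+1..t+k} | y_{<=t}, x): k independent argmaxes.
    logits = rng.normal(size=(k, V))
    return logits.argmax(axis=1).tolist()

out = []
while len(out) < max_len:
    chunk = model_stub(out, K)     # K tokens produced in one parallel step
    out.extend(chunk)
    if eos in chunk:
        out = out[: out.index(eos) + 1]
        break
print(f"decoded {len(out)} tokens in {(len(out) + K - 1) // K} steps")
```

With K = 2 the sequence needs roughly half the decoding steps of a fully autoregressive model, matching the paper's near-lossless two-words-per-step setting.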
A Multi-Stream Convolutional Neural Network Framework for Group Activity Recognition
Title | A Multi-Stream Convolutional Neural Network Framework for Group Activity Recognition |
Authors | Sina Mokhtarzadeh Azar, Mina Ghadimi Atigh, Ahmad Nickabadi |
Abstract | In this work, we present a framework based on multi-stream convolutional neural networks (CNNs) for group activity recognition. Streams of CNNs are trained separately on different modalities and their predictions are fused at the end. Each stream has two branches that predict the group activity from person-level and scene-level representations, respectively. A new modality based on human pose estimation is introduced to add extra information to the model. We evaluate our method on the Volleyball and Collective Activity datasets. Experimental results show that the proposed framework achieves state-of-the-art results, with 90.50% and 86.61% accuracy on the Volleyball dataset when multiple or single frames, respectively, are given as input, and 87.01% multi-frame group activity accuracy on the Collective Activity dataset. |
Tasks | Activity Recognition, Group Activity Recognition, Pose Estimation |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10328v1 |
http://arxiv.org/pdf/1812.10328v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-stream-convolutional-neural-network |
Repo | |
Framework | |
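The fusion itself is simple late fusion of per-stream posteriors, with each stream already mixing its person-level and scene-level branches. A sketch with random stand-ins for the CNN outputs and uniform fusion weights (an assumption; the paper may weight streams differently):

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 8  # number of group activities

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

fused = np.zeros(n_classes)
for stream in ["rgb", "flow", "pose"]:            # one CNN per modality
    person = softmax(rng.normal(size=n_classes))  # person-level branch
    scene = softmax(rng.normal(size=n_classes))   # scene-level branch
    fused += 0.5 * (person + scene)
fused /= 3.0

print("predicted group activity:", int(fused.argmax()))
```

Training the streams separately and fusing at the end keeps each modality's failure modes independent, which is the usual argument for late fusion in activity recognition.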
Fused Deep Neural Networks for Efficient Pedestrian Detection
Title | Fused Deep Neural Networks for Efficient Pedestrian Detection |
Authors | Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis |
Abstract | In this paper, we present an efficient pedestrian detection system, designed by fusing multiple deep neural network (DNN) subsystems. Pedestrian candidates are first generated by a single-shot convolutional multi-box detector at different locations with various scales and aspect ratios; the candidate generator is designed to recall the majority of ground-truth pedestrian annotations at the cost of a large number of false positives. Then, a classification system using the idea of ensemble learning is deployed to improve the detection accuracy. This system further classifies the generated candidates based on the opinions of multiple deep verification networks and a fusion network that uses a novel soft-rejection fusion method to adjust the confidence in the detection results. To improve the training of the deep verification networks, a novel soft-label method is devised to assign floating-point labels to the generated pedestrian candidates. A deep context-aggregation semantic segmentation network also provides pixel-level classification of the scene, and its results are softly fused with the detection results of the single-shot detector. Our pedestrian detector compares favorably to state-of-the-art methods on all popular pedestrian detection datasets; for example, our fused DNN has better detection accuracy on the Caltech Pedestrian dataset than all previous state-of-the-art methods, while also being the fastest. We significantly improve the log-average miss rate on the Caltech Pedestrian dataset to 7.67%, achieving a new state of the art. |
Tasks | Pedestrian Detection, Semantic Segmentation |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.08688v1 |
http://arxiv.org/pdf/1805.08688v1.pdf | |
PWC | https://paperswithcode.com/paper/fused-deep-neural-networks-for-efficient |
Repo | |
Framework | |
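Soft-rejection fusion replaces a verifier's hard veto with a confidence rescaling, so one unsure network cannot discard a good candidate. A sketch of the idea, assuming a clamped-ratio rule and illustrative constants rather than the paper's exact formula:

```python
# t: verifier probability at which the candidate's confidence is left
# unchanged; floor: smallest allowed scaling factor (never a hard veto).
t, floor = 0.7, 0.1

def soft_reject(candidate_conf, verifier_probs):
    """Rescale an (unnormalised) detection confidence by each verifier."""
    for p in verifier_probs:
        candidate_conf *= max(p / t, floor)
    return candidate_conf

# A confident candidate is dampened, not discarded, by one skeptical verifier.
print(round(soft_reject(0.9, [0.95, 0.80, 0.30]), 3))
```

The abstract indicates the same soft fusion is applied when combining the semantic-segmentation mask with the detector output.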
Multi-task Learning for Continuous Control
Title | Multi-task Learning for Continuous Control |
Authors | Himani Arora, Rajath Kumar, Jason Krone, Chong Li |
Abstract | Reliable and effective multi-task learning is a prerequisite for the development of robotic agents that can quickly learn to accomplish related, everyday tasks. However, in the reinforcement learning domain, multi-task learning has not exhibited the same level of success as in other domains, such as computer vision. In addition, most reinforcement learning research on multi-task learning has focused on discrete action spaces, which are not used for robotic control in the real world. In this work, we apply multi-task learning methods to continuous action spaces and benchmark their performance on a series of simulated continuous control tasks. Most notably, we show that multi-task learning outperforms our baselines and alternative knowledge-sharing methods. |
Tasks | Continuous Control, Multi-Task Learning |
Published | 2018-02-03 |
URL | http://arxiv.org/abs/1802.01034v1 |
http://arxiv.org/pdf/1802.01034v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-continuous-control |
Repo | |
Framework | |