May 7, 2019

Paper Group AWR 105

EIE: Efficient Inference Engine on Compressed Deep Neural Network


Title	EIE: Efficient Inference Engine on Compressed Deep Neural Network
Authors	Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
Abstract	State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations and dominates the required power. The previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy-efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE a 120x energy saving, exploiting sparsity saves 10x, weight sharing gives 8x, and skipping zero activations from ReLU saves another 3x. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression. EIE has a processing power of 102 GOPS working directly on a compressed network, corresponding to 3 TOPS on an uncompressed network, and processes the FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW. It is 24,000x and 3,400x more energy efficient than a CPU and a GPU, respectively. Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy efficiency and area efficiency, respectively.
Tasks
Published	2016-02-04
URL	http://arxiv.org/abs/1602.01528v2
PDF	http://arxiv.org/pdf/1602.01528v2.pdf
PWC	https://paperswithcode.com/paper/eie-efficient-inference-engine-on-compressed
Repo	https://github.com/songhan/Deep-Compression-AlexNet
Framework	caffe2
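The operation EIE accelerates, a sparse matrix-vector product in which each stored weight is a small index into a shared codebook and zero input activations are skipped, can be sketched in a few lines of NumPy. This is a minimal software sketch under simplifying assumptions (column-wise compressed sparse storage so that a zero activation skips a whole column, a float codebook, a single processing element); it is not EIE's hardware design, and the names `compress` and `sparse_matvec` are hypothetical.

```python
import numpy as np

def compress(W, codebook):
    """Encode a dense matrix column by column: keep only non-zeros and store,
    for each one, the index of the nearest shared weight in `codebook`."""
    col_ptr, row_idx, code_idx = [0], [], []
    for j in range(W.shape[1]):
        for i in range(W.shape[0]):
            if W[i, j] != 0.0:
                row_idx.append(i)
                # weight sharing: store a codebook index instead of a float
                code_idx.append(int(np.argmin(np.abs(codebook - W[i, j]))))
        col_ptr.append(len(row_idx))  # column j occupies [col_ptr[j], col_ptr[j+1])
    return np.array(col_ptr), np.array(row_idx), np.array(code_idx)

def sparse_matvec(n_rows, col_ptr, row_idx, code_idx, codebook, a):
    """Compute y = W @ a from the compressed form, skipping zero activations."""
    y = np.zeros(n_rows)
    for j, a_j in enumerate(a):
        if a_j == 0.0:  # e.g. a ReLU output of zero: skip the whole column
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            # decode the shared weight through the codebook, then accumulate
            y[row_idx[k]] += codebook[code_idx[k]] * a_j
    return y
```

When every non-zero weight is exactly a codebook value, the result matches the dense product `W @ a`; otherwise it matches the product with the quantized weights.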

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System


Title	Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
Authors	Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
Abstract	This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model with graph decoding. It is trained on transcribed speech to output letters, without the need for forced alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotations without alignment that is on par with CTC while being simpler. We show competitive word error rates on the LibriSpeech corpus with MFCC features, and promising results from raw waveform.
Tasks	Speech Recognition
Published	2016-09-11
URL	http://arxiv.org/abs/1609.03193v2
PDF	http://arxiv.org/pdf/1609.03193v2.pdf
PWC	https://paperswithcode.com/paper/wav2letter-an-end-to-end-convnet-based-speech
Repo	https://github.com/MrMao/wav2letter
Framework	torch

Learning a Driving Simulator


Title	Learning a Driving Simulator
Authors	Eder Santana, George Hotz
Abstract	Comma.ai’s approach to artificial intelligence for self-driving cars is based on an agent that learns to clone driver behaviors and plans maneuvers by simulating future events on the road. This paper illustrates one of our research approaches to driving simulation, one in which we learn to simulate. Here we investigate variational autoencoders with classical and learned cost functions, using generative adversarial networks, for embedding road frames. Afterwards, we learn a transition model in the embedded space using action-conditioned recurrent neural networks. We show that our approach can keep predicting realistic-looking video for several frames despite the transition model being optimized without a cost function in the pixel space.
Tasks	Self-Driving Cars, Video Prediction
Published	2016-08-03
URL	http://arxiv.org/abs/1608.01230v1
PDF	http://arxiv.org/pdf/1608.01230v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-driving-simulator
Repo	https://github.com/Sondreab/end-to-end_autonomous_driving
Framework	tf

Semantic Perceptual Image Compression using Deep Convolution Networks


Title	Semantic Perceptual Image Compression using Deep Convolution Networks
Authors	Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer
Abstract	Improving the visual quality of lossy image and video compression has long been considered a significant problem. Recent advances in computing power, together with the availability of large training datasets, have increased interest in applying deep learning CNNs to image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. A modest increase in complexity is incorporated into the encoder, which allows a standard, off-the-shelf JPEG decoder to be used. While JPEG encoding may be optimized for generic images, the process is ultimately unaware of the specific content of the image to be compressed. Our technique makes JPEG content-aware by designing and training a model to identify multiple semantic regions in a given image. Unlike object detection techniques, our model does not require labeling of object positions and is able to identify objects in a single pass. We present a new CNN architecture directed specifically at image compression: by adding a complete set of features for every class and then taking a threshold over the sum of all feature activations, it generates a map that highlights semantically salient regions so that they can be encoded at higher quality than background regions. Experiments are presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset, on which our algorithm achieves higher visual quality for the same compressed size.
Tasks	Image Compression, Object Detection, Video Compression
Published	2016-12-27
URL	http://arxiv.org/abs/1612.08712v2
PDF	http://arxiv.org/pdf/1612.08712v2.pdf
PWC	https://paperswithcode.com/paper/semantic-perceptual-image-compression-using
Repo	https://github.com/iamaaditya/image-compression-cnn
Framework	tf
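The map-generation step described in the abstract (sum the per-class feature activations, then threshold) can be sketched as follows, together with one hypothetical way of turning the map into per-block JPEG quality settings. The `(num_classes, H, W)` array layout, the min-max normalization, the 8x8 block size and the quality values 90/50 are all assumptions for illustration, not the paper's settings.

```python
import numpy as np

def saliency_map(class_activations, threshold):
    """class_activations: array (num_classes, H, W) of non-negative activations.
    Sum over classes, rescale to [0, 1], keep pixels at or above threshold."""
    total = class_activations.sum(axis=0)
    span = total.max() - total.min()
    norm = (total - total.min()) / span if span > 0 else np.zeros_like(total)
    return norm >= threshold

def block_quality(mask, block=8, q_salient=90, q_background=50):
    """Hypothetical mapping: one JPEG quality value per block, high if any
    pixel in the block is salient, low otherwise."""
    h, w = mask.shape
    q = np.full((h // block, w // block), q_background)
    for bi in range(h // block):
        for bj in range(w // block):
            tile = mask[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block]
            if tile.any():
                q[bi, bj] = q_salient
    return q
```

Because a standard decoder must still read the result, the quality map would only steer the encoder; the decoder side stays unchanged, as the abstract notes.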

Stochastic Variance Reduction for Nonconvex Optimization


Title	Stochastic Variance Reduction for Nonconvex Optimization
Authors	Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
Abstract	We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
Tasks
Published	2016-03-19
URL	http://arxiv.org/abs/1603.06160v2
PDF	http://arxiv.org/pdf/1603.06160v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-variance-reduction-for-nonconvex
Repo	https://github.com/ryandgoldenberg1/svrg_project
Framework	pytorch
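The SVRG update analyzed in the paper combines a periodically recomputed full gradient at a snapshot point with cheap per-sample gradients. A minimal NumPy sketch, assuming a user-supplied component-gradient function `grad_i(x, i)`, a constant step size and the last inner iterate as the next snapshot (the paper's nonconvex analysis uses specific step sizes and outputs a randomly chosen iterate, which this sketch omits):

```python
import numpy as np

def svrg(grad_i, n, x0, step, epochs, inner_steps, seed=0):
    """Minimize f(x) = (1/n) * sum_i f_i(x) with SVRG.
    grad_i(x, i) must return the gradient of the i-th component f_i at x."""
    rng = np.random.default_rng(seed)
    x_snap = np.array(x0, dtype=float)
    for _ in range(epochs):
        # full gradient at the snapshot: one pass over all n components
        mu = sum(grad_i(x_snap, i) for i in range(n)) / n
        x = x_snap.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced estimate: unbiased for the full gradient at x,
            # with variance that shrinks as x and x_snap approach a solution
            v = grad_i(x, i) - grad_i(x_snap, i) + mu
            x = x - step * v
        x_snap = x  # simplification: last inner iterate becomes the snapshot
    return x_snap
```

For least squares, `grad_i` would be `lambda x, i: A[i] * (A[i] @ x - b[i])`. That problem is convex, so it only checks the mechanics; the paper's contribution is the convergence rate to stationary points in the nonconvex case.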

Distributed Constraint Optimization Problems and Applications: A Survey


Title	Distributed Constraint Optimization Problems and Applications: A Survey
Authors	Ferdinando Fioretto, Enrico Pontelli, William Yeoh
Abstract	The field of Multi-Agent Systems (MAS) is an active area of research within artificial intelligence, with an increasingly important impact on industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents’ autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims to provide an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
Tasks
Published	2016-02-20
URL	http://arxiv.org/abs/1602.06347v4
PDF	http://arxiv.org/pdf/1602.06347v4.pdf
PWC	https://paperswithcode.com/paper/distributed-constraint-optimization-problems
Repo	https://github.com/nandofioretto/distributed_multiagent_optimization_survey
Framework	none
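A DCOP gives each agent a variable with a finite domain and asks for the joint assignment that minimizes the sum of constraint costs. The brute-force sketch below (hypothetical names, centralized enumeration, binary constraints only) just makes that objective concrete; dedicated DCOP algorithms such as DPOP, ADOPT or Max-Sum instead solve it by message passing among the agents.

```python
from itertools import product

def solve_dcop(domains, constraints):
    """Exhaustively solve a small DCOP.
    domains:     {variable: list of values}, one variable per agent
    constraints: list of (var_a, var_b, cost_fn) binary cost functions
    Returns (best_assignment, best_cost) minimizing the summed cost."""
    names = list(domains)
    best, best_cost = None, float("inf")
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        cost = sum(f(assignment[a], assignment[b]) for a, b, f in constraints)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost
```

As a usage example, two-color graph coloring on a chain `x1 - x2 - x3` with cost 1 for equal neighboring colors has an optimal cost of 0, while the same costs on a triangle cannot do better than 1.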

Variational Fourier features for Gaussian processes


Title	Variational Fourier features for Gaussian processes
Authors	James Hensman, Nicolas Durrande, Arno Solin
Abstract	This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data and M is the number of features.
Tasks	Gaussian Processes
Published	2016-11-21
URL	http://arxiv.org/abs/1611.06740v2
PDF	http://arxiv.org/pdf/1611.06740v2.pdf
PWC	https://paperswithcode.com/paper/variational-fourier-features-for-gaussian
Repo	https://github.com/jameshensman/VFF
Framework	none

DOTmark - A Benchmark for Discrete Optimal Transport


Title	DOTmark - A Benchmark for Discrete Optimal Transport
Authors	Jörn Schrieber, Dominic Schuhmacher, Carsten Gottschlich
Abstract	The Wasserstein metric or earth mover’s distance (EMD) is a useful tool in statistics, machine learning and computer science, with many applications to biological or medical imaging, among others. Especially in light of increasingly complex data, the computation of these distances via optimal transport is often the limiting factor. Inspired by this challenge, a variety of new approaches to optimal transport have been proposed in recent years, and along with these new methods comes the need for a meaningful comparison. In this paper, we introduce a benchmark for discrete optimal transport, called DOTmark, which is designed to serve as a neutral collection of problems, where discrete optimal transport methods can be tested, compared to one another, and brought to their limits on large-scale instances. It consists of a variety of grayscale images, in various resolutions and classes, such as several types of randomly generated images, classical test images and real data from microscopy. Along with the DOTmark we present a survey and a performance test for a cross section of established methods ranging from more traditional algorithms, such as the transportation simplex, to recently developed approaches, such as the shielding neighborhood method, and including also a comparison with commercial solvers.
Tasks
Published	2016-10-11
URL	http://arxiv.org/abs/1610.03368v1
PDF	http://arxiv.org/pdf/1610.03368v1.pdf
PWC	https://paperswithcode.com/paper/dotmark-a-benchmark-for-discrete-optimal
Repo	https://github.com/nbonneel/network_simplex
Framework	none
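To make the object of the benchmark concrete: the earth mover's distance between two discrete distributions is the cost of the cheapest transport plan. For the special case of two equal-size point sets with uniform mass (an assumption; DOTmark's images carry non-uniform pixel intensities as mass), an optimal plan can be taken to be a one-to-one matching, because the extreme points of the set of doubly stochastic matrices are permutation matrices (Birkhoff-von Neumann). Tiny instances can therefore be checked by enumerating permutations. This brute-force reference is not one of the methods compared in the paper.

```python
import math
from itertools import permutations

def emd_uniform(xs, ys):
    """Exact EMD between two equal-size point sets, each point carrying mass 1/n,
    with Euclidean ground cost. Enumerates all n! matchings: tiny n only."""
    n = len(xs)
    assert n == len(ys) and n > 0
    best = min(
        sum(math.dist(xs[i], ys[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n  # each matched pair moves mass 1/n
```

The factorial cost is exactly why scalable solvers such as the transportation simplex or the shielding neighborhood method matter for the benchmark's large-scale instances.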

Interactive Removal and Ground Truth for Difficult Shadow Scenes


Title	Interactive Removal and Ground Truth for Difficult Shadow Scenes
Authors	Han Gong, Darren P. Cosker
Abstract	A user-centric method for fast, interactive, robust, and high-quality shadow removal is presented. Our algorithm can perform detection and removal in a range of difficult cases, such as highly textured and colored shadows. To perform detection, an on-the-fly learning approach is adopted, guided by two rough user inputs for the pixels of the shadow and the lit area. After detection, shadow removal is performed by registering the penumbra to a normalized frame, which allows efficient estimation of non-uniform shadow illumination changes, resulting in accurate and robust removal. Another major contribution of this work is the first validated, multi-scene-category ground truth for shadow removal algorithms. This dataset of 186 images eliminates inconsistencies between shadow and shadow-free images and provides a range of different shadow types, such as soft, textured, colored and broken shadows. Using this data, the most thorough comparison of state-of-the-art shadow removal methods to date is performed, showing our proposed algorithm to outperform the state of the art across several measures and shadow categories. To complement our dataset, an online shadow removal benchmark website is also presented to encourage future open comparisons in this challenging field of research.
Tasks
Published	2016-08-02
URL	http://arxiv.org/abs/1608.00762v1
PDF	http://arxiv.org/pdf/1608.00762v1.pdf
PWC	https://paperswithcode.com/paper/interactive-removal-and-ground-truth-for
Repo	https://github.com/hangong/deshadow
Framework	none

Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks


Title	Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks
Authors	Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, Ingmar Posner
Abstract	This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser and laser-vision based approaches by margins of up to 40% while remaining highly competitive in terms of processing time.
Tasks	Object Detection, Real-Time Object Detection
Published	2016-09-21
URL	http://arxiv.org/abs/1609.06666v2
PDF	http://arxiv.org/pdf/1609.06666v2.pdf
PWC	https://paperswithcode.com/paper/vote3deep-fast-object-detection-in-3d-point
Repo	https://github.com/s10803926/3D-Object-detection-from-Pointcloud
Framework	tf
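The feature-centric voting idea can be illustrated with a single-channel sparse 3D convolution: instead of sliding the filter over every grid cell, each occupied cell scatters weighted votes to the output cells whose receptive fields contain it, so the work scales with the number of non-zero inputs. This is a simplified sketch under assumptions (one input and one output channel, an odd cubic filter, cross-correlation with zero padding, dict-based storage), not the paper's implementation; a dense reference is included to check that both agree on in-grid cells.

```python
import numpy as np
from itertools import product

def voting_conv3d(occupied, weights):
    """Sparse 3D cross-correlation by voting.
    occupied: dict {(x, y, z): value} holding only the non-zero input cells.
    weights:  array of shape (k, k, k) with k odd.
    Returns {(x, y, z): value} for every output cell that received a vote
    (cells outside the input grid are included; crop them if needed)."""
    r = weights.shape[0] // 2
    out = {}
    for (x, y, z), val in occupied.items():
        for dx, dy, dz in product(range(-r, r + 1), repeat=3):
            # output o reads input o + d with weight w[d + r],
            # so the input at (x, y, z) votes into o = (x, y, z) - d
            o = (x - dx, y - dy, z - dz)
            out[o] = out.get(o, 0.0) + weights[dx + r, dy + r, dz + r] * val
    return out

def dense_conv3d(grid, weights):
    """Dense reference: zero-padded 'same' cross-correlation over the whole grid."""
    k = weights.shape[0]
    r = k // 2
    padded = np.pad(grid, r)
    out = np.zeros_like(grid, dtype=float)
    for x, y, z in np.ndindex(grid.shape):
        out[x, y, z] = np.sum(padded[x:x + k, y:y + k, z:z + k] * weights)
    return out
```

With a ReLU after each layer, many outputs stay zero, which is why the paper's L1 penalty on activations helps keep later layers cheap under this scheme.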

A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models


Title	A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models
Authors	Beilun Wang, Ritambhara Singh, Yanjun Qi
Abstract	Identifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse Undirected Graphical Models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly focused on sparse Gaussian Graphical Models (sGGMs) and cannot identify context-specific edge patterns directly. We, therefore, propose a novel approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs Explicitly), to learn multi-UGM via a constrained L1 minimization. SIMULE automatically infers both specific edge patterns that are unique to each context and shared interactions preserved among all the contexts. Through the L1 constrained formulation, this problem is cast as multiple independent subtasks of linear programming that can be solved efficiently in parallel. In addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal data, which greatly relaxes the normality assumption that many real-world applications do not follow. We provide a novel theoretical proof showing that SIMULE achieves a consistent result at the rate O(log(Kp)/n_{tot}). On multiple synthetic datasets and two biomedical datasets, SIMULE shows significant improvement over state-of-the-art multi-sGGM and single-UGM baselines.
Tasks
Published	2016-05-11
URL	http://arxiv.org/abs/1605.03468v6
PDF	http://arxiv.org/pdf/1605.03468v6.pdf
PWC	https://paperswithcode.com/paper/a-constrained-l1-minimization-approach-for
Repo	https://github.com/QData/SIMULE
Framework	none

Transfer Learning for Low-Resource Neural Machine Translation


Title	Transfer Learning for Low-Resource Neural Machine Translation
Authors	Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight
Abstract	The encoder-decoder framework for neural machine translation (NMT) has been shown to be effective in large-data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves BLEU scores across a range of low-resource languages. Our key idea is to first train a model on a high-resource language pair (the parent model), then transfer some of the learned parameters to the low-resource pair (the child model) to initialize and constrain training. Using our transfer learning method, we improve baseline NMT models by an average of 5.6 BLEU on four low-resource language pairs. Ensembling and unknown word replacement add another 2 BLEU, which brings the NMT performance on low-resource machine translation close to a strong syntax-based machine translation (SBMT) system, exceeding its performance on one language pair. Additionally, using the transfer learning model for re-scoring, we can improve the SBMT system by an average of 1.3 BLEU, improving the state of the art on low-resource machine translation.
Tasks	Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning
Published	2016-04-08
URL	http://arxiv.org/abs/1604.02201v1
PDF	http://arxiv.org/pdf/1604.02201v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-for-low-resource-neural
Repo	https://github.com/isi-nlp/Zoph_RNN
Framework	none

CNN Architectures for Large-Scale Audio Classification


Title	CNN Architectures for Large-Scale Audio Classification
Authors	Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson
Abstract	Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
Tasks	Audio Classification
Published	2016-09-29
URL	http://arxiv.org/abs/1609.09430v2
PDF	http://arxiv.org/pdf/1609.09430v2.pdf
PWC	https://paperswithcode.com/paper/cnn-architectures-for-large-scale-audio
Repo	https://github.com/deephdc/audio-classification-tf
Framework	tf

Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval


Title	Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval
Authors	Nenad Markuš, Igor S. Pandžić, Jörgen Ahlberg
Abstract	The current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available, since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data. Additionally, we discuss how to improve the method by incorporating the procedure of mining hard negatives. We also show how our approach can be used to learn convolutional features from unlabeled video signals and 3D models. Our implementation is available at https://github.com/nenadmarkus/wlrn
Tasks	3D Shape Retrieval
Published	2016-03-30
URL	https://arxiv.org/abs/1603.09095v6
PDF	https://arxiv.org/pdf/1603.09095v6.pdf
PWC	https://paperswithcode.com/paper/learning-local-descriptors-by-optimizing-the
Repo	https://github.com/nenadmarkus/wlrn
Framework	pytorch
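The weak-supervision intuition in the abstract, that matching bags should share more corresponding keypoints than non-matching bags, can be illustrated by counting mutual nearest neighbours between two bags of descriptors. This is a hypothetical, non-differentiable proxy written for illustration only; the paper trains with an end-to-end differentiable criterion, which this sketch does not reproduce. Descriptors are assumed to be L2-normalized rows, so dot products are cosine similarities.

```python
import numpy as np

def bag_similarity(desc_a, desc_b):
    """Count mutual nearest neighbours between two bags of L2-normalized
    descriptors (one descriptor per row): a rough 'corresponding keypoints' count."""
    sim = desc_a @ desc_b.T        # cosine similarities, shape (len_a, len_b)
    nn_ab = sim.argmax(axis=1)     # best match in B for each keypoint in A
    nn_ba = sim.argmax(axis=0)     # best match in A for each keypoint in B
    # keypoint i in A is counted if its match in B points back to i
    return int(np.sum(nn_ba[nn_ab] == np.arange(len(desc_a))))
```

Under this proxy, a training signal would push `bag_similarity` up for matching bag pairs and down for non-matching ones; the paper's contribution is making such a signal differentiable.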

Recognizing Surgical Activities with Recurrent Neural Networks


Title	Recognizing Surgical Activities with Recurrent Neural Networks
Authors	Robert DiPietro, Colin Lea, Anand Malpani, Narges Ahmidi, S. Swaroop Vedula, Gyusung I. Lee, Mija R. Lee, Gregory D. Hager
Abstract	We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec .
Tasks	Gesture Recognition
Published	2016-06-20
URL	http://arxiv.org/abs/1606.06329v2
PDF	http://arxiv.org/pdf/1606.06329v2.pdf
PWC	https://paperswithcode.com/paper/recognizing-surgical-activities-with
Repo	https://github.com/rdipietro/miccai-2016-surgical-activity-rec
Framework	tf
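The edit-distance metric mentioned in the abstract compares predicted and ground-truth label sequences at the segment level, so it penalizes over-segmentation that frame-wise accuracy can hide. One common formulation, sketched below as an assumption (the paper's exact normalization may differ), collapses runs of identical per-frame labels into segments and computes the Levenshtein distance between the two segment sequences.

```python
from itertools import groupby

def segments(labels):
    """Collapse a per-frame label sequence into its sequence of segment labels."""
    return [label for label, _ in groupby(labels)]

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def segmental_edit_distance(pred, truth):
    return levenshtein(segments(pred), segments(truth))
```

For example, against the ground truth `[1, 2, 2, 2, 3]`, both `[1, 1, 2, 2, 3]` and `[1, 2, 1, 2, 3]` are 80% frame-accurate, but the first has segmental edit distance 0 while the fragmented second one has distance 2.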