May 7, 2019

Paper Group AWR 105

EIE: Efficient Inference Engine on Compressed Deep Neural Network. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System. Learning a Driving Simulator. Semantic Perceptual Image Compression using Deep Convolution Networks. Stochastic Variance Reduction for Nonconvex Optimization. Distributed Constraint Optimization Problems and Applicat …

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Title EIE: Efficient Inference Engine on Compressed Deep Neural Network
Authors Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
Abstract State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations and dominates the required power. The previously proposed ‘Deep Compression’ makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE a 120x energy saving; exploiting sparsity saves 10x; weight sharing gives 8x; skipping zero activations from ReLU saves another 3x. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression. EIE has a processing power of 102 GOPS working directly on a compressed network, corresponding to 3 TOPS on an uncompressed network, and processes FC layers of AlexNet at 1.88×10^4 frames/sec with a power dissipation of only 600 mW. It is 24,000x and 3,400x more energy efficient than a CPU and GPU, respectively. Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy efficiency and area efficiency, respectively.
Tasks
Published 2016-02-04
URL http://arxiv.org/abs/1602.01528v2
PDF http://arxiv.org/pdf/1602.01528v2.pdf
PWC https://paperswithcode.com/paper/eie-efficient-inference-engine-on-compressed
Repo https://github.com/songhan/Deep-Compression-AlexNet
Framework caffe2
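
The core operation EIE accelerates is a sparse matrix-vector product in which each stored nonzero is a small index into a shared-weight codebook, and columns whose input activation is zero are skipped entirely. The following is a minimal software sketch of that idea, assuming a simple column-wise (CSC-style) layout; the function name, argument layout and toy codebook are illustrative assumptions, not EIE's actual on-chip format or its partitioning across processing elements.

```python
from typing import List, Sequence


def spmv_shared_weights(
    n_rows: int,
    col_ptr: Sequence[int],     # nonzeros of column j live at positions col_ptr[j]..col_ptr[j+1]-1
    row_idx: Sequence[int],     # row index of each stored nonzero
    code_idx: Sequence[int],    # codebook index of each stored nonzero (weight sharing)
    codebook: Sequence[float],  # the shared weight values
    x: Sequence[float],         # input activations
) -> List[float]:
    """Compute y = W @ x for a sparse W stored column-wise with shared weights."""
    y = [0.0] * n_rows
    for j, a in enumerate(x):
        if a == 0.0:
            # Zero activation (e.g. produced by ReLU): the whole column contributes nothing.
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            # Decode the shared weight through the codebook, then multiply-accumulate.
            y[row_idx[k]] += codebook[code_idx[k]] * a
    return y


# Toy 2x3 matrix W = [[0, 0.5, 0], [-1.0, 0, 0.5]] with codebook [-1.0, 0.5].
y = spmv_shared_weights(2, [0, 1, 2, 3], [1, 0, 1], [0, 1, 1], [-1.0, 0.5], [2.0, 0.0, 6.0])
print(y)  # [0.0, 1.0]
```

Iterating over columns (rather than rows) is what makes zero-activation skipping cheap: one test on x[j] discards a whole column of work.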

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

Title Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
Authors Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
Abstract This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and graph decoding. It is trained on transcribed speech to output letters, without the need for forced alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive results in word error rate on the LibriSpeech corpus with MFCC features, and promising results from raw waveform.
Tasks Speech Recognition
Published 2016-09-11
URL http://arxiv.org/abs/1609.03193v2
PDF http://arxiv.org/pdf/1609.03193v2.pdf
PWC https://paperswithcode.com/paper/wav2letter-an-end-to-end-convnet-based-speech
Repo https://github.com/MrMao/wav2letter
Framework torch

Learning a Driving Simulator

Title Learning a Driving Simulator
Authors Eder Santana, George Hotz
Abstract Comma.ai’s approach to Artificial Intelligence for self-driving cars is based on an agent that learns to clone driver behaviors and plans maneuvers by simulating future events on the road. This paper illustrates one of our research approaches for driving simulation, one in which we learn to simulate. Here we investigate variational autoencoders with classical and learned cost functions, using generative adversarial networks, for embedding road frames. Afterwards, we learn a transition model in the embedded space using action-conditioned Recurrent Neural Networks. We show that our approach can keep predicting realistic-looking video for several frames despite the transition model being optimized without a cost function in the pixel space.
Tasks Self-Driving Cars, Video Prediction
Published 2016-08-03
URL http://arxiv.org/abs/1608.01230v1
PDF http://arxiv.org/pdf/1608.01230v1.pdf
PWC https://paperswithcode.com/paper/learning-a-driving-simulator
Repo https://github.com/Sondreab/end-to-end_autonomous_driving
Framework tf
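
The variational autoencoder mentioned in the abstract is trained with a reconstruction cost (classical pixel loss or a GAN-learned cost) plus a KL term that pulls a diagonal-Gaussian posterior toward a standard normal prior. A minimal sketch of that KL term and of the reparameterization trick follows; the function names and the unit weighting in vae_loss are hypothetical, and the learned (adversarial) cost itself is treated as an opaque number rather than reproduced.

```python
import math
import random
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per dimension (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vae_loss(reconstruction_cost: float, mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Per-frame objective: reconstruction term plus KL regularizer (hypothetical weight of 1)."""
    return reconstruction_cost + kl_to_standard_normal(mu, log_var)
```

Sampling through reparameterize keeps z a deterministic function of mu and log_var given the noise, which is what lets gradients of the reconstruction cost reach the encoder.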

Semantic Perceptual Image Compression using Deep Convolution Networks

Title Semantic Perceptual Image Compression using Deep Convolution Networks
Authors Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer
Abstract Improving the visual quality of lossy image and video compression has long been considered a significant problem. Recent advances in computing power, together with the availability of large training datasets, have increased interest in applying deep learning CNNs to image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. A modest increase in complexity is incorporated into the encoder, which allows a standard, off-the-shelf JPEG decoder to be used. While JPEG encoding may be optimized for generic images, the process is ultimately unaware of the specific content of the image to be compressed. Our technique makes JPEG content-aware by designing and training a model to identify multiple semantic regions in a given image. Unlike object detection techniques, our model does not require labeling of object positions and is able to identify objects in a single pass. We present a new CNN architecture directed specifically at image compression: by adding a complete set of features for every class and then taking a threshold over the sum of all feature activations, it generates a map that highlights semantically salient regions so that they can be encoded at higher quality than background regions. Experiments are presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset, in which our algorithm achieves higher visual quality for the same compressed size.
Tasks Image Compression, Object Detection, Video Compression
Published 2016-12-27
URL http://arxiv.org/abs/1612.08712v2
PDF http://arxiv.org/pdf/1612.08712v2.pdf
PWC https://paperswithcode.com/paper/semantic-perceptual-image-compression-using
Repo https://github.com/iamaaditya/image-compression-cnn
Framework tf
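
The map-building step described in the abstract (sum the activations of all classes, then threshold) can be sketched on plain nested lists. This is a minimal illustration under stated assumptions: the min-max rescaling, the default threshold of 0.5, the function names, and the per-block quality levels 90/40 are all hypothetical choices, and how those quality levels are fed into a real JPEG encoder is not shown.

```python
from typing import List

Grid = List[List[float]]


def saliency_mask(class_maps: List[Grid], threshold: float = 0.5) -> List[List[bool]]:
    """Sum per-class activation maps, min-max rescale to [0, 1], then threshold."""
    h, w = len(class_maps[0]), len(class_maps[0][0])
    total = [[sum(m[i][j] for m in class_maps) for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in total)
    hi = max(max(row) for row in total)
    span = (hi - lo) or 1.0  # a constant map would otherwise divide by zero
    return [[(v - lo) / span >= threshold for v in row] for row in total]


def quality_per_block(mask: List[List[bool]], salient_q: int = 90,
                      background_q: int = 40) -> List[List[int]]:
    """Assign a higher (hypothetical) JPEG quality level to salient blocks."""
    return [[salient_q if s else background_q for s in row] for row in mask]


# Two classes on a 2x2 grid of image blocks.
maps = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 2.0], [1.0, 0.0]]]
print(quality_per_block(saliency_mask(maps)))  # [[40, 90], [40, 40]]
```

Summing over every class, rather than keeping only the top-scoring one, means a region that is moderately salient for several classes can still cross the threshold.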

Stochastic Variance Reduction for Nonconvex Optimization

Title Stochastic Variance Reduction for Nonconvex Optimization
Authors Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
Abstract We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
Tasks
Published 2016-03-19
URL http://arxiv.org/abs/1603.06160v2
PDF http://arxiv.org/pdf/1603.06160v2.pdf
PWC https://paperswithcode.com/paper/stochastic-variance-reduction-for-nonconvex
Repo https://github.com/ryandgoldenberg1/svrg_project
Framework pytorch
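
The SVRG update studied in this paper replaces the plain stochastic gradient with grad_i(w) - grad_i(w_snap) + full_grad, where full_grad is recomputed once per epoch at a snapshot point. Below is a minimal sketch for a scalar parameter, assuming a last-iterate snapshot rule; the paper's analysis uses specific step sizes and output rules that this sketch does not reproduce, and the toy problem is convex (least squares) only so that the result is easy to check.

```python
import random
from typing import Callable


def svrg(
    grad_i: Callable[[float, int], float],  # gradient of the i-th component f_i at w
    n: int,                                 # number of components in the finite sum
    w0: float,
    step: float,
    epochs: int,
    inner_steps: int,
    seed: int = 0,
) -> float:
    """Minimize (1/n) * sum_i f_i(w) with SVRG; scalar w keeps the sketch short."""
    rng = random.Random(seed)
    w_snap = w0
    for _ in range(epochs):
        # One full pass: the exact gradient at the snapshot point.
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Unbiased gradient estimate whose variance shrinks as w and w_snap converge.
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= step * v
        w_snap = w  # the last inner iterate becomes the next snapshot
    return w_snap


# Toy finite sum: f_i(w) = 0.5 * (a_i * w - b_i) ** 2, minimized at w = 2.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
w = svrg(lambda w, i: a[i] * (a[i] * w - b[i]), n=3, w0=0.0, step=0.01, epochs=40, inner_steps=100)
print(w)
```

Each epoch costs one full gradient pass plus two component gradients per inner step; mini-batching the inner step is the variant the abstract mentions for parallel settings.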

Distributed Constraint Optimization Problems and Applications: A Survey

Title Distributed Constraint Optimization Problems and Applications: A Survey
Authors Ferdinando Fioretto, Enrico Pontelli, William Yeoh
Abstract The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact on industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents’ autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled it to support MAS in complex, real-time, and uncertain environments. This survey aims to provide an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
Tasks
Published 2016-02-20
URL http://arxiv.org/abs/1602.06347v4
PDF http://arxiv.org/pdf/1602.06347v4.pdf
PWC https://paperswithcode.com/paper/distributed-constraint-optimization-problems
Repo https://github.com/nandofioretto/distributed_multiagent_optimization_survey
Framework none
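
To make the DCOP model concrete: each variable is controlled by an agent, constraints assign a cost to combinations of values, and the goal is an assignment minimizing the total cost. The sketch below solves a tiny instance by centralized exhaustive search, purely to illustrate the objective; real DCOP algorithms (e.g. DPOP or Max-Sum) distribute this search among agents via message passing, which is not shown. The function name and the graph-coloring example are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# (variable_a, variable_b, cost function over their values)
Constraint = Tuple[str, str, Callable[[int, int], float]]


def solve_dcop_bruteforce(
    domains: Dict[str, List[int]], constraints: List[Constraint]
) -> Tuple[Optional[Dict[str, int]], float]:
    """Return an assignment minimizing the sum of binary constraint costs."""
    names = list(domains)
    best, best_cost = None, float("inf")
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        cost = sum(f(assignment[x], assignment[y]) for x, y, f in constraints)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


def conflict(u: int, v: int) -> float:
    """Graph-coloring cost: neighbouring agents should pick different values."""
    return 1.0 if u == v else 0.0


domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
chain = [("x1", "x2", conflict), ("x2", "x3", conflict)]
print(solve_dcop_bruteforce(domains, chain))  # ({'x1': 0, 'x2': 1, 'x3': 0}, 0.0)
```

Exhaustive search grows as the product of the domain sizes, which is exactly why the survey's resolution methods exploit the constraint graph's structure instead.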

Variational Fourier features for Gaussian processes

Title Variational Fourier features for Gaussian processes
Authors James Hensman, Nicolas Durrande, Arno Solin
Abstract This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data points and M is the number of features.
Tasks Gaussian Processes
Published 2016-11-21
URL http://arxiv.org/abs/1611.06740v2
PDF http://arxiv.org/pdf/1611.06740v2.pdf
PWC https://paperswithcode.com/paper/variational-fourier-features-for-gaussian
Repo https://github.com/jameshensman/VFF
Framework none

DOTmark - A Benchmark for Discrete Optimal Transport

Title DOTmark - A Benchmark for Discrete Optimal Transport
Authors Jörn Schrieber, Dominic Schuhmacher, Carsten Gottschlich
Abstract The Wasserstein metric or earth mover’s distance (EMD) is a useful tool in statistics, machine learning and computer science, with many applications to biological or medical imaging, among others. Especially in light of increasingly complex data, the computation of these distances via optimal transport is often the limiting factor. Inspired by this challenge, a variety of new approaches to optimal transport have been proposed in recent years, and along with these new methods comes the need for a meaningful comparison. In this paper, we introduce a benchmark for discrete optimal transport, called DOTmark, which is designed to serve as a neutral collection of problems on which discrete optimal transport methods can be tested, compared to one another, and brought to their limits on large-scale instances. It consists of a variety of grayscale images in various resolutions and classes, such as several types of randomly generated images, classical test images and real data from microscopy. Along with DOTmark, we present a survey and a performance test for a cross section of established methods, ranging from more traditional algorithms, such as the transportation simplex, to recently developed approaches, such as the shielding neighborhood method, and also including a comparison with commercial solvers.
Tasks
Published 2016-10-11
URL http://arxiv.org/abs/1610.03368v1
PDF http://arxiv.org/pdf/1610.03368v1.pdf
PWC https://paperswithcode.com/paper/dotmark-a-benchmark-for-discrete-optimal
Repo https://github.com/nbonneel/network_simplex
Framework none
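
On DOTmark's 2D images, the EMD requires solving a full transportation problem (for example with the transportation simplex named in the abstract). In one dimension, however, there is a well-known closed form that is handy for sanity checks: for two histograms of equal mass on unit-spaced bins, the EMD equals the sum of absolute differences of their cumulative sums. The sketch below assumes exactly that setting; it does not apply to the 2D benchmark instances.

```python
from itertools import accumulate
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth mover's distance between histograms of equal mass on bins 0, 1, ..., n-1."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Mass that must cross the boundary between bin k and k+1 is |P_k - Q_k|.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0 (one unit of mass moved two bins)
```

Each cumulative-sum difference counts the mass that has to cross one bin boundary, so summing them gives the total work.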

Interactive Removal and Ground Truth for Difficult Shadow Scenes

Title Interactive Removal and Ground Truth for Difficult Shadow Scenes
Authors Han Gong, Darren P. Cosker
Abstract A user-centric method for fast, interactive, robust and high-quality shadow removal is presented. Our algorithm can perform detection and removal in a range of difficult cases, such as highly textured and colored shadows. To perform detection, an on-the-fly learning approach is adopted, guided by two rough user inputs for the pixels of the shadow and the lit area. After detection, shadow removal is performed by registering the penumbra to a normalized frame, which allows efficient estimation of non-uniform shadow illumination changes, resulting in accurate and robust removal. Another major contribution of this work is the first validated and multi-scene category ground truth for shadow removal algorithms. This dataset, containing 186 images, eliminates inconsistencies between shadow and shadow-free images and provides a range of different shadow types such as soft, textured, colored and broken shadows. Using this data, the most thorough comparison of state-of-the-art shadow removal methods to date is performed, showing that our new algorithm outperforms the state of the art across several measures and shadow categories. To complement our dataset, an online shadow removal benchmark website is also presented to encourage future open comparisons in this challenging field of research.
Tasks
Published 2016-08-02
URL http://arxiv.org/abs/1608.00762v1
PDF http://arxiv.org/pdf/1608.00762v1.pdf
PWC https://paperswithcode.com/paper/interactive-removal-and-ground-truth-for
Repo https://github.com/hangong/deshadow
Framework none

Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks

Title Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks
Authors Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, Ingmar Posner
Abstract This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser and laser-vision based approaches by margins of up to 40% while remaining highly competitive in terms of processing time.
Tasks Object Detection, Real-Time Object Detection
Published 2016-09-21
URL http://arxiv.org/abs/1609.06666v2
PDF http://arxiv.org/pdf/1609.06666v2.pdf
PWC https://paperswithcode.com/paper/vote3deep-fast-object-detection-in-3d-point
Repo https://github.com/s10803926/3D-Object-detection-from-Pointcloud
Framework tf
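
The feature-centric voting idea from the abstract can be sketched for a single-channel 3D grid stored as a dictionary of occupied cells: instead of sliding the filter over every location, each nonzero input casts weighted votes (with the flipped filter) into the outputs it influences, so empty space costs nothing. The non-positive-bias check is this sketch's own assumption, needed so that cells receiving no votes stay zero after the ReLU; the function names, the single channel, and the L1 weight lam are likewise illustrative, not the paper's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

Cell = Tuple[int, int, int]


def sparse_conv_by_voting(
    grid: Dict[Cell, float], weights: Dict[Cell, float], bias: float = 0.0
) -> Dict[Cell, float]:
    """ReLU(bias + sum_k w[k] * x[p + k]), computed by letting each nonzero input cast votes."""
    if bias > 0.0:
        # With a positive bias every empty cell would become active, destroying sparsity.
        raise ValueError("bias must be non-positive to keep the output sparse")
    acc: DefaultDict[Cell, float] = defaultdict(float)
    for (x, y, z), value in grid.items():
        if value == 0.0:
            continue
        for (dx, dy, dz), w in weights.items():
            # Input at q contributes to output p = q - k, i.e. it votes with the flipped filter.
            acc[(x - dx, y - dy, z - dz)] += w * value
    # Apply bias and ReLU; cells without votes stay implicitly zero.
    return {p: s + bias for p, s in acc.items() if s + bias > 0.0}


def l1_activation_penalty(activations: Dict[Cell, float], lam: float) -> float:
    """L1 penalty on activations, added to the training loss to encourage sparsity."""
    return lam * sum(abs(v) for v in activations.values())


grid = {(0, 0, 0): 1.0, (2, 0, 0): 2.0}
weights = {(0, 0, 0): 1.0, (1, 0, 0): 0.5}
print(sparse_conv_by_voting(grid, weights, bias=-0.75))
```

The cost of voting scales with the number of occupied cells times the filter size, which is why encouraging sparse intermediate activations (the L1 term) directly speeds up the following layers.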

A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models

Title A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models
Authors Beilun Wang, Ritambhara Singh, Yanjun Qi
Abstract Identifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse Undirected Graphical Models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly focused on sparse Gaussian Graphical Models (sGGMs) and cannot identify context-specific edge patterns directly. We therefore propose a novel approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs Explicitly), to learn multi-UGM via a constrained L1 minimization. SIMULE automatically infers both specific edge patterns that are unique to each context and shared interactions preserved among all the contexts. Through the L1 constrained formulation, this problem is cast as multiple independent subtasks of linear programming that can be solved efficiently in parallel. In addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal data, which greatly relaxes the normality assumption that many real-world applications do not satisfy. We provide a novel theoretical proof showing that SIMULE achieves a consistent result at the rate O(log(Kp)/n_{tot}). On multiple synthetic datasets and two biomedical datasets, SIMULE shows significant improvement over state-of-the-art multi-sGGM and single-UGM baselines.
Tasks
Published 2016-05-11
URL http://arxiv.org/abs/1605.03468v6
PDF http://arxiv.org/pdf/1605.03468v6.pdf
PWC https://paperswithcode.com/paper/a-constrained-l1-minimization-approach-for
Repo https://github.com/QData/SIMULE
Framework none

Transfer Learning for Low-Resource Neural Machine Translation

Title Transfer Learning for Low-Resource Neural Machine Translation
Authors Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight
Abstract The encoder-decoder framework for neural machine translation (NMT) has been shown to be effective in large-data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves BLEU scores across a range of low-resource languages. Our key idea is to first train a high-resource language pair (the parent model), then transfer some of the learned parameters to the low-resource pair (the child model) to initialize and constrain training. Using our transfer learning method, we improve baseline NMT models by an average of 5.6 BLEU on four low-resource language pairs. Ensembling and unknown word replacement add another 2 BLEU, which brings NMT performance on low-resource machine translation close to a strong syntax-based machine translation (SBMT) system, exceeding its performance on one language pair. Additionally, using the transfer learning model for re-scoring, we can improve the SBMT system by an average of 1.3 BLEU, improving the state of the art on low-resource machine translation.
Tasks Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning
Published 2016-04-08
URL http://arxiv.org/abs/1604.02201v1
PDF http://arxiv.org/pdf/1604.02201v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-low-resource-neural
Repo https://github.com/isi-nlp/Zoph_RNN
Framework none

CNN Architectures for Large-Scale Audio Classification

Title CNN Architectures for Large-Scale Audio Classification
Authors Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson
Abstract Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
Tasks Audio Classification
Published 2016-09-29
URL http://arxiv.org/abs/1609.09430v2
PDF http://arxiv.org/pdf/1609.09430v2.pdf
PWC https://paperswithcode.com/paper/cnn-architectures-for-large-scale-audio
Repo https://github.com/deephdc/audio-classification-tf
Framework tf

Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval

Title Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval
Authors Nenad Markuš, Igor S. Pandžić, Jörgen Ahlberg
Abstract The current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available, since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data. Additionally, we discuss how to improve the method by incorporating the procedure of mining hard negatives. We also show how our approach can be used to learn convolutional features from unlabeled video signals and 3D models. Our implementation is available at https://github.com/nenadmarkus/wlrn
Tasks 3D Shape Retrieval
Published 2016-03-30
URL https://arxiv.org/abs/1603.09095v6
PDF https://arxiv.org/pdf/1603.09095v6.pdf
PWC https://paperswithcode.com/paper/learning-local-descriptors-by-optimizing-the
Repo https://github.com/nenadmarkus/wlrn
Framework pytorch
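
The intuition that matching bags share more corresponding keypoints than non-matching bags can be made concrete with a simple, non-differentiable proxy: count mutual nearest neighbours between the two descriptor bags. This is a hypothetical scoring function for illustration only; the paper instead trains with a differentiable criterion, which this sketch does not reproduce. The 2D toy descriptors are also assumptions (real descriptors are higher-dimensional).

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _nearest(d: Vec, bag: List[Vec]) -> int:
    """Index of the descriptor in `bag` closest to `d` (Euclidean; first index wins ties)."""
    return min(range(len(bag)), key=lambda j: math.dist(d, bag[j]))


def mutual_nn_count(bag_a: List[Vec], bag_b: List[Vec]) -> int:
    """Number of descriptor pairs that are each other's nearest neighbour."""
    count = 0
    for i, d in enumerate(bag_a):
        j = _nearest(d, bag_b)
        if _nearest(bag_b[j], bag_a) == i:
            count += 1
    return count


bag_a = [(0.0, 0.0), (10.0, 0.0)]
bag_b = [(0.1, 0.0), (10.2, 0.0)]  # stands in for a "matching" bag
bag_c = [(4.0, 0.0), (4.0, 1.0)]   # stands in for a "non-matching" bag
print(mutual_nn_count(bag_a, bag_b), mutual_nn_count(bag_a, bag_c))  # 2 1
```

Because only bag-level labels are needed, a score like this can be compared between matching and non-matching pairs without ever specifying which keypoint corresponds to which.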

Recognizing Surgical Activities with Recurrent Neural Networks

Title Recognizing Surgical Activities with Recurrent Neural Networks
Authors Robert DiPietro, Colin Lea, Anand Malpani, Narges Ahmidi, S. Swaroop Vedula, Gyusung I. Lee, Mija R. Lee, Gregory D. Hager
Abstract We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec .
Tasks Gesture Recognition
Published 2016-06-20
URL http://arxiv.org/abs/1606.06329v2
PDF http://arxiv.org/pdf/1606.06329v2.pdf
PWC https://paperswithcode.com/paper/recognizing-surgical-activities-with
Repo https://github.com/rdipietro/miccai-2016-surgical-activity-rec
Framework tf
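
Edit distance complements frame-wise accuracy because it penalizes fragmented predictions. A common way to compute it for activity recognition is to collapse frame-wise labels into segments and take the Levenshtein distance between the two segment sequences. The sketch below assumes that segment-level definition with unit costs and no normalization; the exact variant used in the paper may differ.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def segments(labels: Sequence[Hashable]) -> List[Hashable]:
    """Collapse frame-wise labels into the sequence of segment labels."""
    return [label for label, _ in groupby(labels)]


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def segmental_edit_distance(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> int:
    return levenshtein(segments(pred), segments(truth))


# One wrong frame out of eight, yet it splits a segment: accuracy 7/8, edit distance 2.
print(segmental_edit_distance(list("AAABABCC"), list("AAABBBCC")))  # 2
```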