Paper Group AWR 105
![Paper Group AWR 105](/2016/images/pwc/paper-arxiv_hu144ec288a26b3e360d673e256787de3e_28623_900x500_fit_q75_box.jpg)
EIE: Efficient Inference Engine on Compressed Deep Neural Network. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System. Learning a Driving Simulator. Semantic Perceptual Image Compression using Deep Convolution Networks. Stochastic Variance Reduction for Nonconvex Optimization. Distributed Constraint Optimization Problems and Applicat …
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Title | EIE: Efficient Inference Engine on Compressed Deep Neural Network |
Authors | Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally |
Abstract | State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power. Previously proposed ‘Deep Compression’ makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE 120x energy saving; Exploiting sparsity saves 10x; Weight sharing gives 8x; Skipping zero activations from ReLU saves another 3x. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to CPU and GPU implementations of the same DNN without compression. EIE has a processing power of 102GOPS/s working directly on a compressed network, corresponding to 3TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW. It is 24,000x and 3,400x more energy efficient than a CPU and GPU respectively. Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy efficiency and area efficiency. |
Tasks | |
Published | 2016-02-04 |
URL | http://arxiv.org/abs/1602.01528v2 |
PDF | http://arxiv.org/pdf/1602.01528v2.pdf |
PWC | https://paperswithcode.com/paper/eie-efficient-inference-engine-on-compressed |
Repo | https://github.com/songhan/Deep-Compression-AlexNet |
Framework | caffe2 |
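The abstract describes a sparse matrix-vector multiplication in which pruned weights are skipped, the remaining weights are shared through a small table, and zero activations are skipped. A minimal software sketch of that idea follows. The column-wise storage layout, the four-entry codebook, and the names `compress` and `sparse_shared_matvec` are assumptions made for illustration; they are not taken from the EIE hardware design or from the linked repository.

```python
# Illustrative sketch only: storage layout, codebook size and function names
# are assumptions, not the EIE design.

def compress(dense, codebook):
    """Keep only nonzero weights, column by column, as (row, codebook index)."""
    n_rows, n_cols = len(dense), len(dense[0])
    columns = []
    for j in range(n_cols):
        entries = []
        for i in range(n_rows):
            w = dense[i][j]
            if w != 0.0:
                # Weight sharing: store the index of the nearest shared value.
                k = min(range(len(codebook)), key=lambda c: abs(codebook[c] - w))
                entries.append((i, k))
        columns.append(entries)
    return columns


def sparse_shared_matvec(columns, codebook, activations, n_rows):
    """Compute W @ a using only nonzero weights and nonzero activations."""
    out = [0.0] * n_rows
    for j, a in enumerate(activations):
        if a == 0.0:
            continue  # skip zero activations (e.g. produced by ReLU)
        for i, k in columns[j]:
            out[i] += codebook[k] * a  # look up the shared weight value
    return out


codebook = [-1.0, -0.5, 0.5, 1.0]
dense = [
    [0.0, 0.5, 0.0, -1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.5, 0.5],
]
activations = [2.0, 0.0, 4.0, 1.0]

columns = compress(dense, codebook)
result = sparse_shared_matvec(columns, codebook, activations, n_rows=3)
print(result)  # [-1.0, 2.0, -1.5]
```

In this toy case the second column is never touched because its activation is zero, and each stored weight costs one small index rather than a full value.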
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
Title | Wav2Letter: an End-to-End ConvNet-based Speech Recognition System |
Authors | Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve |
Abstract | This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive results in word error rate on the Librispeech corpus with MFCC features, and promising results from raw waveform. |
Tasks | Speech Recognition |
Published | 2016-09-11 |
URL | http://arxiv.org/abs/1609.03193v2 |
PDF | http://arxiv.org/pdf/1609.03193v2.pdf |
PWC | https://paperswithcode.com/paper/wav2letter-an-end-to-end-convnet-based-speech |
Repo | https://github.com/MrMao/wav2letter |
Framework | torch |
Learning a Driving Simulator
Title | Learning a Driving Simulator |
Authors | Eder Santana, George Hotz |
Abstract | Comma.ai’s approach to Artificial Intelligence for self-driving cars is based on an agent that learns to clone driver behaviors and plans maneuvers by simulating future events in the road. This paper illustrates one of our research approaches for driving simulation. One where we learn to simulate. Here we investigate variational autoencoders with classical and learned cost functions using generative adversarial networks for embedding road frames. Afterwards, we learn a transition model in the embedded space using action conditioned Recurrent Neural Networks. We show that our approach can keep predicting realistic looking video for several frames despite the transition model being optimized without a cost function in the pixel space. |
Tasks | Self-Driving Cars, Video Prediction |
Published | 2016-08-03 |
URL | http://arxiv.org/abs/1608.01230v1 |
PDF | http://arxiv.org/pdf/1608.01230v1.pdf |
PWC | https://paperswithcode.com/paper/learning-a-driving-simulator |
Repo | https://github.com/Sondreab/end-to-end_autonomous_driving |
Framework | tf |
Semantic Perceptual Image Compression using Deep Convolution Networks
Title | Semantic Perceptual Image Compression using Deep Convolution Networks |
Authors | Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer |
Abstract | It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power together with the availability of large training data sets have increased interest in the application of deep learning CNNs to address image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. A modest increase in complexity is incorporated into the encoder, which allows a standard, off-the-shelf JPEG decoder to be used. While JPEG encoding may be optimized for generic images, the process is ultimately unaware of the specific content of the image to be compressed. Our technique makes JPEG content-aware by designing and training a model to identify multiple semantic regions in a given image. Unlike object detection techniques, our model does not require labeling of object positions and is able to identify objects in a single pass. We present a new CNN architecture directed specifically to image compression, which generates a map that highlights semantically-salient regions so that they can be encoded at higher quality as compared to background regions. By adding a complete set of features for every class, and then taking a threshold over the sum of all feature activations, we generate a map that highlights semantically-salient regions so that they can be encoded at a better quality compared to background regions. Experiments are presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset, in which our algorithm achieves higher visual quality for the same compressed size. |
Tasks | Image Compression, Object Detection, Video Compression |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08712v2 |
PDF | http://arxiv.org/pdf/1612.08712v2.pdf |
PWC | https://paperswithcode.com/paper/semantic-perceptual-image-compression-using |
Repo | https://github.com/iamaaditya/image-compression-cnn |
Framework | tf |
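The abstract's map-building step (sum the feature activations over all classes, then threshold) can be sketched roughly as below. The tiny hand-written activation maps, the threshold value, and the two quality levels in `quality_map` are hypothetical values chosen for illustration; the sketch does not reproduce the paper's network or its JPEG encoder.

```python
# Illustrative sketch only: activation values, threshold and quality levels
# are hypothetical, not taken from the paper.

def saliency_mask(class_maps, threshold):
    """Sum per-class activation maps pixel-wise and threshold the total."""
    height, width = len(class_maps[0]), len(class_maps[0][0])
    total = [
        [sum(m[y][x] for m in class_maps) for x in range(width)]
        for y in range(height)
    ]
    return [[1 if v > threshold else 0 for v in row] for row in total]


def quality_map(mask, q_salient=90, q_background=50):
    """Assign a higher encoding quality to salient pixels (assumed levels)."""
    return [[q_salient if m else q_background for m in row] for row in mask]


# Two classes, each with a 2x3 activation map.
class_maps = [
    [[0.1, 0.6, 0.0], [0.0, 0.2, 0.1]],
    [[0.1, 0.5, 0.3], [0.0, 0.4, 0.1]],
]

mask = saliency_mask(class_maps, threshold=0.5)
print(mask)               # [[0, 1, 0], [0, 1, 0]]
print(quality_map(mask))  # [[50, 90, 50], [50, 90, 50]]
```

Because the sum runs over every class, a region can be marked salient without any single class dominating, which matches the abstract's point that several semantic regions are found in one pass.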
Stochastic Variance Reduction for Nonconvex Optimization
Title | Stochastic Variance Reduction for Nonconvex Optimization |
Authors | Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola |
Abstract | We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings. |
Tasks | |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06160v2 |
PDF | http://arxiv.org/pdf/1603.06160v2.pdf |
PWC | https://paperswithcode.com/paper/stochastic-variance-reduction-for-nonconvex |
Repo | https://github.com/ryandgoldenberg1/svrg_project |
Framework | pytorch |
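For readers unfamiliar with SVRG, the variance-reduced update can be sketched as follows: each epoch computes a full gradient at a snapshot point, and every inner step corrects a single-sample gradient with it. The sketch is a toy under stated assumptions: the objective is a small scalar least-squares problem (convex, chosen so the result is easy to check, whereas the paper analyzes the nonconvex case), the last inner iterate is used as the next snapshot and as the output, and the step size and loop lengths are arbitrary. It is not the code from the linked repository.

```python
import random

# Illustrative sketch only: toy objective, step size and loop lengths are
# assumptions; the update rule itself does not rely on convexity.

def svrg(grad_i, n, x0, step, epochs, inner_steps, seed=0):
    """Minimize (1/n) * sum_i f_i(x) given per-sample gradients grad_i(i, x)."""
    rng = random.Random(seed)
    snapshot = x0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        x = snapshot
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced gradient estimate.
            v = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x -= step * v
        snapshot = x  # assumption: last iterate becomes the next snapshot
    return snapshot


# Toy problem: f_i(x) = 0.5 * (a[i] * x - b[i])**2, minimized at x = 2.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]


def grad_i(i, x):
    return a[i] * (a[i] * x - b[i])


x_final = svrg(grad_i, n=4, x0=0.0, step=0.02, epochs=30, inner_steps=8)
print(x_final)  # close to 2.0
```

The correction term `- grad_i(i, snapshot) + full_grad` has zero mean over `i`, so the estimate stays unbiased while its variance shrinks as `x` and the snapshot approach each other.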
Distributed Constraint Optimization Problems and Applications: A Survey
Title | Distributed Constraint Optimization Problems and Applications: A Survey |
Authors | Ferdinando Fioretto, Enrico Pontelli, William Yeoh |
Abstract | The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents’ autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas. |
Tasks | |
Published | 2016-02-20 |
URL | http://arxiv.org/abs/1602.06347v4 |
PDF | http://arxiv.org/pdf/1602.06347v4.pdf |
PWC | https://paperswithcode.com/paper/distributed-constraint-optimization-problems |
Repo | https://github.com/nandofioretto/distributed_multiagent_optimization_survey |
Framework | none |
Variational Fourier features for Gaussian processes
Title | Variational Fourier features for Gaussian processes |
Authors | James Hensman, Nicolas Durrande, Arno Solin |
Abstract | This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matern kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data and M is the number of features. |
Tasks | Gaussian Processes |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.06740v2 |
PDF | http://arxiv.org/pdf/1611.06740v2.pdf |
PWC | https://paperswithcode.com/paper/variational-fourier-features-for-gaussian |
Repo | https://github.com/jameshensman/VFF |
Framework | none |
DOTmark - A Benchmark for Discrete Optimal Transport
Title | DOTmark - A Benchmark for Discrete Optimal Transport |
Authors | Jörn Schrieber, Dominic Schuhmacher, Carsten Gottschlich |
Abstract | The Wasserstein metric or earth mover’s distance (EMD) is a useful tool in statistics, machine learning and computer science with many applications to biological or medical imaging, among others. Especially in the light of increasingly complex data, the computation of these distances via optimal transport is often the limiting factor. Inspired by this challenge, a variety of new approaches to optimal transport has been proposed in recent years and along with these new methods comes the need for a meaningful comparison. In this paper, we introduce a benchmark for discrete optimal transport, called DOTmark, which is designed to serve as a neutral collection of problems, where discrete optimal transport methods can be tested, compared to one another, and brought to their limits on large-scale instances. It consists of a variety of grayscale images, in various resolutions and classes, such as several types of randomly generated images, classical test images and real data from microscopy. Along with the DOTmark we present a survey and a performance test for a cross section of established methods ranging from more traditional algorithms, such as the transportation simplex, to recently developed approaches, such as the shielding neighborhood method, and including also a comparison with commercial solvers. |
Tasks | |
Published | 2016-10-11 |
URL | http://arxiv.org/abs/1610.03368v1 |
PDF | http://arxiv.org/pdf/1610.03368v1.pdf |
PWC | https://paperswithcode.com/paper/dotmark-a-benchmark-for-discrete-optimal |
Repo | https://github.com/nbonneel/network_simplex |
Framework | none |
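As a small illustration of the quantity being benchmarked, the sketch below computes the Wasserstein-1 (earth mover's) distance between two histograms on the same equally spaced one-dimensional grid, where it reduces to a sum of absolute differences of cumulative sums. This one-dimensional shortcut is an assumption made to keep the example short: the DOTmark instances are two-dimensional images, for which no such closed form applies and a general solver such as the ones surveyed in the paper is needed. The function name `wasserstein_1d` is hypothetical.

```python
# Illustrative sketch only: 1D special case, not one of the benchmarked solvers.

def wasserstein_1d(p, q, spacing=1.0):
    """W1 distance between two equal-mass histograms on the same 1D grid."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    total = 0.0
    carried = 0.0  # mass that still has to be moved past the current bin
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        total += abs(carried) * spacing
    return total


print(wasserstein_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))            # 2.0
print(wasserstein_1d([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))  # 2.0
```

In the first call a unit of mass travels two bins; in the second, two half-units each travel two bins, giving the same cost.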
Interactive Removal and Ground Truth for Difficult Shadow Scenes
Title | Interactive Removal and Ground Truth for Difficult Shadow Scenes |
Authors | Han Gong, Darren P. Cosker |
Abstract | A user-centric method for fast, interactive, robust and high-quality shadow removal is presented. Our algorithm can perform detection and removal in a range of difficult cases, such as highly textured and colored shadows. To perform detection, an on-the-fly learning approach is adopted, guided by two rough user inputs for the pixels of the shadow and the lit area. After detection, shadow removal is performed by registering the penumbra to a normalized frame, which allows efficient estimation of non-uniform shadow illumination changes, resulting in accurate and robust removal. Another major contribution of this work is the first validated and multi-scene category ground truth for shadow removal algorithms. This data set, containing 186 images, eliminates inconsistencies between shadow and shadow-free images and provides a range of different shadow types such as soft, textured, colored and broken shadows. Using this data, the most thorough comparison of state-of-the-art shadow removal methods to date is performed, showing our proposed new algorithm to outperform the state of the art across several measures and shadow categories. To complement our dataset, an online shadow removal benchmark website is also presented to encourage future open comparisons in this challenging field of research. |
Tasks | |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00762v1 |
PDF | http://arxiv.org/pdf/1608.00762v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-removal-and-ground-truth-for |
Repo | https://github.com/hangong/deshadow |
Framework | none |
Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks
Title | Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks |
Authors | Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, Ingmar Posner |
Abstract | This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser and laser-vision based approaches by margins of up to 40% while remaining highly competitive in terms of processing time. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06666v2 |
PDF | http://arxiv.org/pdf/1609.06666v2.pdf |
PWC | https://paperswithcode.com/paper/vote3deep-fast-object-detection-in-3d-point |
Repo | https://github.com/s10803926/3D-Object-detection-from-Pointcloud |
Framework | tf |
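The feature-centric voting idea from the abstract can be illustrated on a one-dimensional toy signal: instead of sliding the filter over every output position, each nonzero input casts weighted votes into the outputs it influences, so the work scales with the number of occupied cells. The 1D setting, the function names, and the example values are assumptions made for brevity; the paper operates on 3D grids, and this is not its implementation.

```python
# Illustrative sketch only: 1D toy version of feature-centric voting.

def dense_conv1d(x, w):
    """Reference: y[i] = sum_k w[k] * x[i + k - r], zero-padded ('same' size)."""
    r = len(w) // 2
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for k in range(len(w)):
            j = i + k - r
            if 0 <= j < n:
                y[i] += w[k] * x[j]
    return y


def voting_conv1d(x, w):
    """Same result, but only nonzero inputs do any work."""
    r = len(w) // 2
    n = len(x)
    y = [0.0] * n
    for j, x_j in enumerate(x):
        if x_j == 0.0:
            continue  # empty cells cast no votes
        for k in range(len(w)):
            i = j - k + r  # output position that reads x[j] through w[k]
            if 0 <= i < n:
                y[i] += w[k] * x_j
    return y


def l1_penalty(activations):
    """L1 penalty on activations, which encourages sparse intermediate outputs."""
    return sum(abs(v) for v in activations)


x = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, -2.0, 0.0]
w = [1.0, 2.0, 4.0]

y_vote = voting_conv1d(x, w)
print(y_vote)              # [0.0, 12.0, 6.0, 3.0, 0.0, -8.0, -4.0, -2.0]
print(l1_penalty(y_vote))  # 35.0
```

Here only two of the eight input cells are occupied, so the voting loop performs six multiply-adds where the dense loop performs twenty-two.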
A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models
Title | A constrained L1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models |
Authors | Beilun Wang, Ritambhara Singh, Yanjun Qi |
Abstract | Identifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse Undirected Graphical Models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly focused on sparse Gaussian Graphical Models (sGGMs) and can’t identify context-specific edge patterns directly. We, therefore, propose a novel approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs Explicitly) to learn multi-UGM via a constrained L1 minimization. SIMULE automatically infers both specific edge patterns that are unique to each context and shared interactions preserved among all the contexts. Through the L1 constrained formulation, this problem is cast as multiple independent subtasks of linear programming that can be solved efficiently in parallel. In addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal data that greatly relaxes the normality assumption that many real-world applications do not follow. We provide a novel theoretical proof showing that SIMULE achieves a consistent result at the rate O(log(Kp)/n_{tot}). On multiple synthetic datasets and two biomedical datasets, SIMULE shows significant improvement over state-of-the-art multi-sGGM and single-UGM baselines. |
Tasks | |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03468v6 |
PDF | http://arxiv.org/pdf/1605.03468v6.pdf |
PWC | https://paperswithcode.com/paper/a-constrained-l1-minimization-approach-for |
Repo | https://github.com/QData/SIMULE |
Framework | none |
Transfer Learning for Low-Resource Neural Machine Translation
Title | Transfer Learning for Low-Resource Neural Machine Translation |
Authors | Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight |
Abstract | The encoder-decoder framework for neural machine translation (NMT) has been shown effective in large data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves Bleu scores across a range of low-resource languages. Our key idea is to first train a high-resource language pair (the parent model), then transfer some of the learned parameters to the low-resource pair (the child model) to initialize and constrain training. Using our transfer learning method we improve baseline NMT models by an average of 5.6 Bleu on four low-resource language pairs. Ensembling and unknown word replacement add another 2 Bleu which brings the NMT performance on low-resource machine translation close to a strong syntax based machine translation (SBMT) system, exceeding its performance on one language pair. Additionally, using the transfer learning model for re-scoring, we can improve the SBMT system by an average of 1.3 Bleu, improving the state-of-the-art on low-resource machine translation. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning |
Published | 2016-04-08 |
URL | http://arxiv.org/abs/1604.02201v1 |
PDF | http://arxiv.org/pdf/1604.02201v1.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-for-low-resource-neural |
Repo | https://github.com/isi-nlp/Zoph_RNN |
Framework | none |
CNN Architectures for Large-Scale Audio Classification
Title | CNN Architectures for Large-Scale Audio Classification |
Authors | Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson |
Abstract | Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task. |
Tasks | Audio Classification |
Published | 2016-09-29 |
URL | http://arxiv.org/abs/1609.09430v2 |
PDF | http://arxiv.org/pdf/1609.09430v2.pdf |
PWC | https://paperswithcode.com/paper/cnn-architectures-for-large-scale-audio |
Repo | https://github.com/deephdc/audio-classification-tf |
Framework | tf |
Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval
Title | Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval |
Authors | Nenad Markuš, Igor S. Pandžić, Jörgen Ahlberg |
Abstract | Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data. Additionally, we discuss how to improve the method by incorporating the procedure of mining hard negatives. We also show how our approach can be used to learn convolutional features from unlabeled video signals and 3D models. Our implementation is available at https://github.com/nenadmarkus/wlrn |
Tasks | 3D Shape Retrieval |
Published | 2016-03-30 |
URL | https://arxiv.org/abs/1603.09095v6 |
PDF | https://arxiv.org/pdf/1603.09095v6.pdf |
PWC | https://paperswithcode.com/paper/learning-local-descriptors-by-optimizing-the |
Repo | https://github.com/nenadmarkus/wlrn |
Framework | pytorch |
Recognizing Surgical Activities with Recurrent Neural Networks
Title | Recognizing Surgical Activities with Recurrent Neural Networks |
Authors | Robert DiPietro, Colin Lea, Anand Malpani, Narges Ahmidi, S. Swaroop Vedula, Gyusung I. Lee, Mija R. Lee, Gregory D. Hager |
Abstract | We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec . |
Tasks | Gesture Recognition |
Published | 2016-06-20 |
URL | http://arxiv.org/abs/1606.06329v2 |
PDF | http://arxiv.org/pdf/1606.06329v2.pdf |
PWC | https://paperswithcode.com/paper/recognizing-surgical-activities-with |
Repo | https://github.com/rdipietro/miccai-2016-surgical-activity-rec |
Framework | tf |
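The two evaluation measures named in the abstract, accuracy and edit distance, can be sketched for frame-wise label sequences as below. Collapsing consecutive identical frame labels into segments before computing a Levenshtein distance is an assumption about how such a score is typically formed, and the gesture labels are made up; the paper's exact normalization is not reproduced here, nor is the code from the linked repository.

```python
# Illustrative sketch only: label names and the segment-collapsing step are
# assumptions, not the paper's evaluation code.

def frame_accuracy(predicted, truth):
    """Fraction of frames whose predicted label matches the true label."""
    matches = sum(1 for p, t in zip(predicted, truth) if p == t)
    return matches / len(truth)


def segments(frame_labels):
    """Collapse runs of identical frame labels into one label per segment."""
    out = []
    for label in frame_labels:
        if not out or out[-1] != label:
            out.append(label)
    return out


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


truth = ["G1", "G1", "G2", "G2", "G2", "G3"]
predicted = ["G1", "G2", "G2", "G1", "G2", "G3"]

print(frame_accuracy(predicted, truth))                    # 0.6666666666666666
print(levenshtein(segments(predicted), segments(truth)))   # 2
```

The example shows why both measures are reported: four of six frames are correct, yet the prediction contains two spurious segments that frame accuracy alone would not expose.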