Paper Group AWR 196
StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs
Title | StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs |
Authors | Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Xiaolong Ma, Ning Liu, Linfeng Zhang, Jian Tang, Kaisheng Ma, Xue Lin, Makan Fardad, Yanzhi Wang |
Abstract | Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods have been proposed to overcome the limitation of irregular network structure and demonstrated actual GPU acceleration. However, in prior work the pruning rate (degree of sparsity) and GPU acceleration are limited (to less than 50%) when accuracy needs to be maintained. In this work, we overcome these limitations by proposing a unified, systematic framework of structured weight pruning for DNNs. It is a framework that can be used to induce different types of structured sparsity, such as filter-wise, channel-wise, and shape-wise sparsity, as well as non-structured sparsity. The proposed framework incorporates stochastic gradient descent with ADMM, and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration. Without loss of accuracy on the AlexNet model, we achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly outperforming the prior work. The average speedups reach 3.15X and 8.52X when allowing a moderate accuracy loss of 2%. In this case the model compression for convolutional layers is 15.0X, corresponding to 11.93X measured CPU speedup. Our experiments on the ResNet model and on other data sets like UCF101 and CIFAR-10 demonstrate the consistently higher performance of our framework. |
Tasks | Model Compression |
Published | 2018-07-29 |
URL | http://arxiv.org/abs/1807.11091v3 |
http://arxiv.org/pdf/1807.11091v3.pdf | |
PWC | https://paperswithcode.com/paper/adam-admm-a-unified-systematic-framework-of |
Repo | https://github.com/KaiqiZhang/ADAM-ADMM |
Framework | none |
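
The key mechanism here is the ADMM decomposition: SGD minimizes the task loss plus a quadratic penalty pulling the weights toward an auxiliary variable Z, while Z is analytically projected onto the chosen structured-sparse set and the dual variable U is updated each iteration. Below is a minimal PyTorch sketch for filter-wise sparsity; the names (project_filter_sparse, prune_ratio, rho) are illustrative and not taken from the authors' repo.

```python
# Hedged sketch of one ADMM iteration for filter-wise pruning.
import torch

def project_filter_sparse(w, prune_ratio):
    """Project a conv weight (out_ch, in_ch, kH, kW) onto the set of
    tensors with at most k nonzero filters: keep the filters with the
    largest L2 norms, zero the rest."""
    out_ch = w.shape[0]
    k = max(1, int(out_ch * (1 - prune_ratio)))
    norms = w.flatten(1).norm(dim=1)
    keep = torch.topk(norms, k).indices
    z = torch.zeros_like(w)
    z[keep] = w[keep]
    return z

def admm_step(w, z, u, prune_ratio, rho=1e-3):
    """Analytic updates for the auxiliary variable Z and dual U.
    The SGD loss gains a dynamic regularizer (rho/2)*||W - Z + U||^2
    whose target Z is refreshed here each iteration."""
    z = project_filter_sparse(w.detach() + u, prune_ratio)
    u = u + w.detach() - z
    reg = (rho / 2) * (w - z + u).pow(2).sum()
    return z, u, reg
```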
Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks
Title | Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks |
Authors | Timo Hackel, Mikhail Usvyatsov, Silvano Galliani, Jan D. Wegner, Konrad Schindler |
Abstract | While CNNs naturally lend themselves to densely sampled data, and sophisticated implementations are available, they lack the ability to efficiently process sparse data. In this work we introduce a suite of tools that exploit sparsity in both the feature maps and the filter weights, and thereby allow for significantly lower memory footprints and computation times than the conventional dense framework when processing data with a high degree of sparsity. Our scheme provides (i) an efficient GPU implementation of a convolution layer based on direct, sparse convolution; (ii) a filter step within the convolution layer, which we call attention, that prevents fill-in, i.e., the tendency of convolution to rapidly decrease sparsity, and guarantees an upper bound on the computational resources; and (iii) an adaptation of the back-propagation algorithm, which makes it possible to combine our approach with standard learning frameworks, while still exploiting sparsity in the data and the model. |
Tasks | |
Published | 2018-01-31 |
URL | https://arxiv.org/abs/1801.10585v3 |
https://arxiv.org/pdf/1801.10585v3.pdf | |
PWC | https://paperswithcode.com/paper/inference-learning-and-attention-mechanisms |
Repo | https://github.com/TimoHackel/ILA-SCNN |
Framework | tf |
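
The direct sparse convolution only visits nonzero input sites, and the attention step caps fill-in by keeping a bounded number of output responses. A minimal single-channel Python sketch of both ideas, assuming dense filters and a simple top-k magnitude cap (the real implementation is a GPU kernel over sparse feature maps):

```python
# Illustrative sketch, not the paper's GPU kernel.
import numpy as np

def sparse_conv2d(indices, values, shape, kernel):
    """indices: (N, 2) array of nonzero (row, col) sites;
    values: (N,) activations; kernel: (kH, kW) dense filter.
    Scatter each nonzero input into the outputs it influences."""
    H, W = shape
    kH, kW = kernel.shape
    out = {}
    for (r, c), v in zip(indices, values):
        for dr in range(kH):
            for dc in range(kW):
                rr, cc = r + dr - kH // 2, c + dc - kW // 2
                if 0 <= rr < H and 0 <= cc < W:
                    out[(rr, cc)] = out.get((rr, cc), 0.0) + v * kernel[dr, dc]
    return out  # sparse dict of output sites

def topk_filter(out, k):
    """The 'attention' idea: bound fill-in by keeping only the
    k largest-magnitude responses."""
    items = sorted(out.items(), key=lambda kv: -abs(kv[1]))[:k]
    return dict(items)
```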
Deep Chain HDRI: Reconstructing a High Dynamic Range Image from a Single Low Dynamic Range Image
Title | Deep Chain HDRI: Reconstructing a High Dynamic Range Image from a Single Low Dynamic Range Image |
Authors | Siyeong Lee, Gwon Hwan An, Suk-Ju Kang |
Abstract | In this paper, we propose a novel deep neural network model that reconstructs a high dynamic range (HDR) image from a single low dynamic range (LDR) image. The proposed model is based on a convolutional neural network composed of dilated convolutional layers, and infers LDR images with various exposures and illumination from a single LDR image of the same scene. Then, the final HDR image can be formed by merging these inference results. The chaining structure, which infers LDR images with brighter (or darker) exposures from a given LDR image, makes it relatively easy for the proposed method to find the mapping between an LDR image and an HDR image with a different bit depth. The method not only extends the range, but also has the advantage of restoring the light information of the actual physical world. For the HDR images obtained by the proposed method, the HDR-VDP2 Q score, which is the most popular evaluation metric for HDR images, was 56.36 for a display with a 1920$\times$1200 resolution, an improvement of 6 compared with the scores of conventional algorithms. In addition, when comparing the peak signal-to-noise ratio values for tone-mapped HDR images generated by the proposed and conventional algorithms, the average value obtained by the proposed algorithm is 30.86 dB, which is 10 dB higher than those obtained by the conventional algorithms. |
Tasks | |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06277v1 |
http://arxiv.org/pdf/1801.06277v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-chain-hdri-reconstructing-a-high-dynamic |
Repo | https://github.com/vinthony/awesome-deep-hdr |
Framework | none |
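
Since the network infers a chain of LDR exposures and the final HDR image is formed by merging them, the merge stage can be illustrated with a Debevec-style weighted average. The exposure times, hat weighting, and gamma-2.2 response below are assumptions for illustration, not the paper's exact camera model.

```python
# Hedged sketch of the merge stage over an inferred exposure stack.
import numpy as np

def merge_ldr_stack(ldr_stack, exposure_times, eps=1e-6):
    """ldr_stack: list of float images in [0, 1] at increasing exposure;
    returns a linear-domain HDR radiance estimate."""
    num = np.zeros_like(ldr_stack[0])
    den = np.zeros_like(ldr_stack[0])
    for img, t in zip(ldr_stack, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # hat weight: trust mid-tones
        lin = img ** 2.2                    # assumed gamma-2.2 response
        num += w * lin / t
        den += w
    return num / (den + eps)
```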
Describing a Knowledge Base
Title | Describing a Knowledge Base |
Authors | Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, Kevin Knight |
Abstract | We aim to automatically generate natural language descriptions about an input structured knowledge base (KB). We build our generation framework based on a pointer network which can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new \emph{table position self-attention} to capture the inter-dependencies among related slots. For evaluation, besides standard metrics including BLEU, METEOR, and ROUGE, we propose a KB reconstruction based metric by extracting a KB from the generation output and comparing it with the input KB. We also create a new data set which includes 106,216 pairs of structured KBs and their corresponding natural language descriptions for two distinct entity types. Experiments show that our approach significantly outperforms state-of-the-art methods. The reconstructed KB achieves 68.8% - 72.6% F-score. |
Tasks | Data-to-Text Generation, KB-to-Language Generation, Table-to-Text Generation, Text Generation |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01797v2 |
http://arxiv.org/pdf/1809.01797v2.pdf | |
PWC | https://paperswithcode.com/paper/describing-a-knowledge-base |
Repo | https://github.com/EagleW/Describing_a_Knowledge_Base |
Framework | pytorch |
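
The slot-aware attention pairs each slot type with its slot value before scoring, so the decoder attends to KB records rather than bare tokens. A hedged sketch of one way to realize this; the bilinear score and dimensions are assumptions, not the repo's exact code.

```python
# Illustrative slot-aware attention over n KB records.
import torch
import torch.nn.functional as F

def slot_aware_attention(dec_state, slot_types, slot_values, W):
    """dec_state: (d,); slot_types, slot_values: (n, d); W: (d, 2d).
    Returns attention weights over the n records."""
    records = torch.cat([slot_types, slot_values], dim=-1)  # (n, 2d)
    scores = records @ W.t() @ dec_state                    # (n,)
    return F.softmax(scores, dim=0)
```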
Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
Title | Realistic Evaluation of Deep Semi-Supervised Learning Algorithms |
Authors | Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow |
Abstract | Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples. To help guide SSL research towards real-world applicability, we make our unified reimplementation and evaluation platform publicly available. |
Tasks | |
Published | 2018-04-24 |
URL | https://arxiv.org/abs/1804.09170v4 |
https://arxiv.org/pdf/1804.09170v4.pdf | |
PWC | https://paperswithcode.com/paper/realistic-evaluation-of-deep-semi-supervised |
Repo | https://github.com/brain-research/realistic-ssl-evaluation |
Framework | tf |
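
One of the paper's evaluation axes is class-distribution mismatch: the unlabeled pool is contaminated with examples from classes absent in the labeled set, and test accuracy is tracked as contamination grows. A sketch of that protocol shape only; the SSL trainer in the usage comment is a hypothetical placeholder.

```python
# Sketch of the out-of-class contamination protocol.
import numpy as np

def make_unlabeled_mix(in_class_x, out_class_x, contamination):
    """Return an unlabeled pool with the given fraction of
    out-of-class examples (contamination in [0, 1])."""
    n = len(in_class_x)
    k = int(contamination * n)
    idx = np.random.choice(len(out_class_x), size=k, replace=False)
    pool = np.concatenate([in_class_x[: n - k], out_class_x[idx]])
    np.random.shuffle(pool)
    return pool

# for c in (0.0, 0.25, 0.5, 1.0):
#     u = make_unlabeled_mix(animal_imgs, vehicle_imgs, c)
#     acc = train_ssl_and_eval(labeled, u, test)  # hypothetical helper
```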
Ordinal Depth Supervision for 3D Human Pose Estimation
Title | Ordinal Depth Supervision for 3D Human Pose Estimation |
Authors | Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis |
Abstract | Our ability to train end-to-end systems for 3D human pose estimation from single images is currently constrained by the limited availability of 3D annotations for natural images. Most datasets are captured using Motion Capture (MoCap) systems in a studio setting and it is difficult to reach the variability of 2D human pose datasets, like MPII or LSP. To alleviate the need for accurate 3D ground truth, we propose to use a weaker supervision signal provided by the ordinal depths of human joints. This information can be acquired by human annotators for a wide range of images and poses. We showcase the effectiveness and flexibility of training Convolutional Networks (ConvNets) with these ordinal relations in different settings, always achieving competitive performance with ConvNets trained with accurate 3D joint coordinates. Additionally, to demonstrate the potential of the approach, we augment the popular LSP and MPII datasets with ordinal depth annotations. This extension allows us to present quantitative and qualitative evaluation in non-studio conditions. Simultaneously, these ordinal annotations can be easily incorporated in the training procedure of typical ConvNets for 3D human pose. Through this inclusion we achieve new state-of-the-art performance for the relevant benchmarks and validate the effectiveness of ordinal depth supervision for 3D human pose. |
Tasks | 3D Human Pose Estimation, Motion Capture, Pose Estimation |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04095v1 |
http://arxiv.org/pdf/1805.04095v1.pdf | |
PWC | https://paperswithcode.com/paper/ordinal-depth-supervision-for-3d-human-pose |
Repo | https://github.com/geopavlakos/ordinal-pose3d |
Framework | pytorch |
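
The ordinal supervision reduces to pairwise constraints on predicted joint depths: a ranking penalty when one joint is annotated as closer, and a squared penalty when the pair is annotated as roughly equal in depth. A sketch consistent with that description (variable names are ours):

```python
# Pairwise ordinal depth loss over annotated joint pairs.
import torch

def ordinal_depth_loss(z, pairs):
    """z: (num_joints,) predicted depths; pairs: list of (i, j, r)
    with r = +1 if joint i is closer, -1 if j is closer,
    0 if their depths are indistinguishable."""
    loss = z.new_zeros(())
    for i, j, r in pairs:
        if r == 0:
            loss = loss + (z[i] - z[j]) ** 2           # equal-depth term
        else:
            # small when the predicted ordering matches the annotation
            loss = loss + torch.log(1 + torch.exp(r * (z[i] - z[j])))
    return loss / max(len(pairs), 1)
```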
Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving
Title | Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving |
Authors | Dong Li, Dongbin Zhao, Qichao Zhang, Yaran Chen |
Abstract | This paper investigates vision-based autonomous driving with deep learning and reinforcement learning methods. Different from the end-to-end learning method, our method breaks the vision-based lateral control system down into a perception module and a control module. The perception module, which is based on a multi-task learning neural network, first takes a driver-view image as its input and predicts the track features. The control module, which is based on reinforcement learning, then makes a control decision based on these features. In order to improve data efficiency, we propose visual TORCS (VTORCS), a deep reinforcement learning environment based on the open racing car simulator (TORCS). By means of the provided functions, one can train an agent with the input of an image or various physical sensor measurements, or evaluate the perception algorithm on this simulator. The trained reinforcement learning controller outperforms the linear quadratic regulator (LQR) controller and model predictive control (MPC) controller on different tracks. The experiments demonstrate that the perception module shows promising performance and the controller is capable of controlling the vehicle to drive well along the track center with visual input. |
Tasks | Autonomous Driving, Multi-Task Learning |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12778v1 |
http://arxiv.org/pdf/1810.12778v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-and-deep-learning |
Repo | https://github.com/hbzhang/AwesomeSelfDriving |
Framework | none |
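
The decomposition is straightforward to mirror in code: a multi-task CNN maps the driver-view image to a few track features (e.g. lateral offset, heading error, curvature), and a separately trained policy maps those features to steering. The toy modules below are stand-ins that only reflect this interface, not the paper's architectures.

```python
# Stand-in modules illustrating the perception/control split.
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):            # multi-task CNN stand-in
    def __init__(self, n_features=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_features))
    def forward(self, img):
        return self.backbone(img)          # -> (B, 3) track features

class SteeringPolicy(nn.Module):           # RL actor stand-in
    def __init__(self, n_features=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.Tanh(),
                                 nn.Linear(32, 1), nn.Tanh())
    def forward(self, feats):
        return self.net(feats)             # steering command in [-1, 1]

# steering = SteeringPolicy()(PerceptionNet()(camera_batch))
```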
MAP inference via Block-Coordinate Frank-Wolfe Algorithm
Title | MAP inference via Block-Coordinate Frank-Wolfe Algorithm |
Authors | Paul Swoboda, Vladimir Kolmogorov |
Abstract | We present a new proximal bundle method for Maximum-A-Posteriori (MAP) inference in structured energy minimization problems. The method optimizes a Lagrangean relaxation of the original energy minimization problem using a multi-plane block-coordinate Frank-Wolfe method that takes advantage of the specific structure of the Lagrangean decomposition. We show empirically that our method outperforms state-of-the-art Lagrangean decomposition based algorithms on some challenging Markov Random Field, multi-label discrete tomography and graph matching problems. |
Tasks | Graph Matching |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05049v2 |
http://arxiv.org/pdf/1806.05049v2.pdf | |
PWC | https://paperswithcode.com/paper/map-inference-via-block-coordinate-frank |
Repo | https://github.com/LPMP/LPMP |
Framework | pytorch |
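
Each block update follows the standard Frank-Wolfe pattern: call a linear minimization oracle over the block's polytope, then move toward the returned vertex. A generic sketch with a clipped line-search proxy; the oracle and objective stand in for the structured subproblems of the Lagrangean decomposition.

```python
# Generic Frank-Wolfe step; not the paper's multi-plane variant.
import numpy as np

def frank_wolfe_step(x, grad, lmo):
    """x: current iterate; grad: objective gradient at x;
    lmo: linear minimization oracle, s = argmin_s <grad, s> over the
    block's polytope (e.g. a MAP oracle for one subproblem)."""
    s = lmo(grad)
    d = s - x                                   # FW direction
    gap = float(-grad @ d)                      # duality-gap estimate
    step = min(1.0, gap / (d @ d + 1e-12))      # line-search proxy for a
    return x + step * d, gap                    # quadratic-like objective
```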
TractSeg - Fast and accurate white matter tract segmentation
Title | TractSeg - Fast and accurate white matter tract segmentation |
Authors | Jakob Wasserthal, Peter Neher, Klaus H. Maier-Hein |
Abstract | The individual course of white matter fiber tracts is an important key for analysis of white matter characteristics in healthy and diseased brains. Uniquely, diffusion-weighted MRI tractography in combination with region-based or clustering-based selection of streamlines allows for the in-vivo delineation and analysis of anatomically well known tracts. This, however, currently requires complex, computationally intensive and tedious-to-set-up processing pipelines. TractSeg is a novel convolutional neural network-based approach that directly segments tracts in the field of fiber orientation distribution function (fODF) peaks without requiring tractography, image registration or parcellation. We demonstrate in 105 subjects from the Human Connectome Project that the proposed approach is much faster than existing methods while providing unprecedented accuracy. The code and data are openly available at https://github.com/MIC-DKFZ/TractSeg/ and https://doi.org/10.5281/zenodo.1088277, respectively. |
Tasks | Image Registration |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07103v2 |
http://arxiv.org/pdf/1805.07103v2.pdf | |
PWC | https://paperswithcode.com/paper/tractseg-fast-and-accurate-white-matter-tract |
Repo | https://github.com/MIC-DKFZ/TractSeg |
Framework | pytorch |
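
TractSeg's input is the field of the three principal fODF peak vectors per voxel (nine channels) and its output is one probability map per tract, predicted slice-wise by an encoder-decoder. The toy network below mirrors only those shapes, assuming the 72-tract, 144x144 setup described for the Human Connectome Project data; it is not the real model.

```python
# Shape-level sketch of the fODF-peaks-in, tract-masks-out contract.
import torch
import torch.nn as nn

n_peak_channels, n_tracts = 9, 72           # 3 peaks x 3 components

toy_net = nn.Sequential(                    # stand-in for the U-Net
    nn.Conv2d(n_peak_channels, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, n_tracts, 1))

peaks_slice = torch.randn(1, n_peak_channels, 144, 144)
tract_probs = torch.sigmoid(toy_net(peaks_slice))   # (1, 72, 144, 144)
```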
The CodRep Machine Learning on Source Code Competition
Title | The CodRep Machine Learning on Source Code Competition |
Authors | Zimin Chen, Martin Monperrus |
Abstract | CodRep is a machine learning competition on source code data. It is carefully designed so that anybody can enter the competition, whether professional researchers, students or independent scholars, without specific knowledge in machine learning or program analysis. In particular, it aims at being a common playground on which the machine learning and the software engineering research communities can interact. The competition started on April 14th 2018 and ended on October 14th 2018. The CodRep data is hosted at https://github.com/KTH/CodRep-competition/. |
Tasks | |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.03200v2 |
http://arxiv.org/pdf/1807.03200v2.pdf | |
PWC | https://paperswithcode.com/paper/the-codrep-machine-learning-on-source-code |
Repo | https://github.com/KTH/CodRep-competition |
Framework | none |
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Title | Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation |
Authors | Dane Corneil, Wulfram Gerstner, Johanni Brea |
Abstract | Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce Variational State Tabulation (VaST), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular model. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.04325v2 |
http://arxiv.org/pdf/1802.04325v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-model-based-deep-reinforcement |
Repo | https://github.com/danecor/VaST |
Framework | tf |
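
Once visual states are tabulated, planning uses prioritized sweeping with small backups: after a value change at a state, its known predecessors are queued with priority proportional to the size of the change. A generic sketch of that loop, with the transition-model bookkeeping elided; it is not the authors' implementation.

```python
# Generic prioritized sweeping over a tabular model.
import heapq

def prioritized_sweep(Q, model, predecessors, start_state, gamma=0.99,
                      theta=1e-3, max_backups=100):
    """Q: dict state -> dict action -> value;
    model[(s, a)] -> (reward, next_state); predecessors[s] -> iterable
    of (prev_state, action) pairs known to lead to s."""
    pq = [(-float('inf'), start_state)]   # seed with the changed state
    done = 0
    while pq and done < max_backups:
        _, s = heapq.heappop(pq)
        for p, a in predecessors.get(s, ()):
            r, s2 = model[(p, a)]         # s2 == s by construction
            old = Q[p][a]
            Q[p][a] = r + gamma * max(Q[s2].values())
            delta = abs(Q[p][a] - old)
            if delta > theta:             # propagate large changes
                heapq.heappush(pq, (-delta, p))
            done += 1
    return Q
```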
Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation
Title | Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation |
Authors | Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi |
Abstract | The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either auto-encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation. |
Tasks | Dialogue Generation, Dialogue Interpretation, Representation Learning, Text Generation |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08069v1 |
http://arxiv.org/pdf/1804.08069v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-discrete-sentence-representation |
Repo | https://github.com/snakeztc/NeuralDialog-LAED |
Framework | pytorch |
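
The discrete bottleneck can be sketched as encoding each sentence into a few categorical latent variables sampled differentiably. Whether DI-VAE uses exactly this Gumbel-Softmax estimator and these sizes is an assumption here; the point is the discrete, interpretable code.

```python
# Hedged sketch of a discrete sentence-latent bottleneck.
import torch
import torch.nn.functional as F

def discrete_encode(h, proj, n_vars=3, n_classes=10, tau=1.0):
    """h: (B, d) sentence encoding; proj: nn.Linear(d, n_vars*n_classes).
    Returns one-hot codes of shape (B, n_vars, n_classes), sampled with
    the straight-through Gumbel-Softmax estimator."""
    logits = proj(h).view(-1, n_vars, n_classes)
    return F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
```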
Zero-Shot Dialog Generation with Cross-Domain Latent Actions
Title | Zero-Shot Dialog Generation with Cross-Domain Latent Actions |
Authors | Tiancheng Zhao, Maxine Eskenazi |
Abstract | This paper introduces zero-shot dialog generation (ZSDG), as a step towards neural dialog systems that can instantly generalize to new situations with minimal data. ZSDG enables an end-to-end generative dialog system to generalize to a new domain for which only a domain description is provided and no training dialogs are available. Then a novel learning framework, Action Matching, is proposed. This algorithm can learn a cross-domain embedding space that models the semantics of dialog responses which, in turn, lets a neural dialog generation model generalize to new domains. We evaluate our methods on a new synthetic dialog dataset and an existing human-human dialog dataset. Results show that our method has superior performance in learning dialog models that rapidly adapt their behavior to new domains, and suggest promising future research. |
Tasks | Dialogue Generation, Goal-Oriented Dialog, Text Generation |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04803v1 |
http://arxiv.org/pdf/1805.04803v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-dialog-generation-with-cross-domain |
Repo | https://github.com/snakeztc/NeuralDialog-ZSDG |
Framework | pytorch |
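
Action Matching can be read as an alignment loss: the latent action of a response encoded in one domain must match the latent action of its paired seed annotation, so all domains share one action space. The L2 form and the two encoders below are assumptions consistent with that description, not the repo's exact objective.

```python
# Hedged sketch of a cross-domain action-alignment loss.
import torch

def action_matching_loss(resp_enc, anno_enc, responses, annotations):
    """resp_enc / anno_enc: modules mapping text batches to latent
    actions of shape (B, d); pairs are aligned by index."""
    z_resp = resp_enc(responses)
    z_anno = anno_enc(annotations)
    return torch.mean((z_resp - z_anno).pow(2).sum(dim=-1))
```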
Adversarial Reprogramming of Neural Networks
Title | Adversarial Reprogramming of Neural Networks |
Authors | Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein |
Abstract | Deep neural networks are susceptible to \emph{adversarial} attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead {\em reprogram} the target model to perform a task chosen by the attacker, without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary, even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.11146v2 |
http://arxiv.org/pdf/1806.11146v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-reprogramming-of-neural-networks |
Repo | https://github.com/lizhuorong/Adversarial-Reprogramming-tensorflow |
Framework | tf |
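
The attack optimizes one perturbation pad that is wrapped around every small task input while the ImageNet model stays frozen, and the model's output classes are remapped to the new task's labels. A sketch of that input construction for the MNIST-to-ImageNet setup (the training loop and label remapping are elided; sizes are the standard 224 and 28):

```python
# Sketch of the adversarial-program input construction.
import torch

H, small = 224, 28
W = torch.zeros(3, H, H, requires_grad=True)     # the adversarial program
mask = torch.ones(3, H, H)
lo = (H - small) // 2
mask[:, lo:lo + small, lo:lo + small] = 0        # hole for the task input

def reprogram(x_small):
    """x_small: (B, 3, 28, 28) in [0, 1] (MNIST replicated to 3 channels);
    returns full-size adversarial inputs for the frozen model."""
    x = torch.zeros(x_small.size(0), 3, H, H)
    x[:, :, lo:lo + small, lo:lo + small] = x_small
    return x + torch.tanh(W) * mask              # program outside the hole

# logits = frozen_imagenet_model(reprogram(mnist_batch))
# loss = F.cross_entropy(logits[:, :10], mnist_labels)  # remap 10 classes
```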
A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes
Title | A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes |
Authors | Yang Zhang, Philip David, Hassan Foroosh, Boqing Gong |
Abstract | During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving and augmented reality. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data hinders the models’ performance. Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network, while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach. |
Tasks | Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Semantic Segmentation, Synthetic-to-Real Translation |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09953v3 |
http://arxiv.org/pdf/1812.09953v3.pdf | |
PWC | https://paperswithcode.com/paper/a-curriculum-domain-adaptation-approach-to |
Repo | https://github.com/YangZhang4065/AdaptationSeg |
Framework | tf |
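
The curriculum couples the two stages: per-image label distributions inferred by the easy tasks become soft targets that regularize the segmentation network's target-domain predictions. A hedged sketch of such a global label-distribution regularizer; the cross-entropy form and shapes are illustrative, not the repo's exact loss.

```python
# Sketch of a label-distribution regularizer for the target domain.
import torch
import torch.nn.functional as F

def label_dist_regularizer(seg_logits, target_dist, eps=1e-8):
    """seg_logits: (B, C, H, W) target-domain predictions;
    target_dist: (B, C) per-image class frequencies inferred by the
    easy task. Penalize divergence of the predicted global histogram."""
    probs = F.softmax(seg_logits, dim=1)
    pred_dist = probs.mean(dim=(2, 3))           # global average -> (B, C)
    return -(target_dist * torch.log(pred_dist + eps)).sum(dim=1).mean()
```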