Paper Group AWR 196
StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs
Title | StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs |
Authors | Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Xiaolong Ma, Ning Liu, Linfeng Zhang, Jian Tang, Kaisheng Ma, Xue Lin, Makan Fardad, Yanzhi Wang |
Abstract | Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods have been proposed to overcome the limitation of irregular network structure and demonstrated actual GPU acceleration. However, in prior work the pruning rate (degree of sparsity) and GPU acceleration are limited (to less than 50%) when accuracy needs to be maintained. In this work, we overcome these limitations by proposing a unified, systematic framework of structured weight pruning for DNNs. It is a framework that can be used to induce different types of structured sparsity, such as filter-wise, channel-wise, and shape-wise sparsity, as well as non-structured sparsity. The proposed framework incorporates stochastic gradient descent with ADMM, and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration. Without loss of accuracy on the AlexNet model, we achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly outperforming the prior work. The average speedups reach 3.15X and 8.52X when allowing a moderate accuracy loss of 2%. In this case the model compression for convolutional layers is 15.0X, corresponding to 11.93X measured CPU speedup. Our experiments on the ResNet model and on other data sets like UCF101 and CIFAR-10 demonstrate the consistently higher performance of our framework. |
Tasks | Model Compression |
Published | 2018-07-29 |
URL | http://arxiv.org/abs/1807.11091v3 |
http://arxiv.org/pdf/1807.11091v3.pdf | |
PWC | https://paperswithcode.com/paper/adam-admm-a-unified-systematic-framework-of |
Repo | https://github.com/KaiqiZhang/ADAM-ADMM |
Framework | none |
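
The key mechanism here is the ADMM decomposition: SGD minimizes the task loss plus a quadratic penalty pulling the weights toward an auxiliary variable Z, while Z is analytically projected onto the chosen structured-sparse set and the dual variable U is updated each iteration. Below is a minimal PyTorch sketch for filter-wise sparsity; the names (project_filter_sparse, prune_ratio, rho) are illustrative and not taken from the authors' repo.

```python
# Hedged sketch of one ADMM iteration for filter-wise pruning.
import torch

def project_filter_sparse(w, prune_ratio):
    """Project a conv weight (out_ch, in_ch, kH, kW) onto the set of
    tensors with at most k nonzero filters: keep the filters with the
    largest L2 norms, zero the rest."""
    out_ch = w.shape[0]
    k = max(1, int(out_ch * (1 - prune_ratio)))
    norms = w.flatten(1).norm(dim=1)
    keep = torch.topk(norms, k).indices
    z = torch.zeros_like(w)
    z[keep] = w[keep]
    return z

def admm_step(w, z, u, prune_ratio, rho=1e-3):
    """Analytic updates for the auxiliary variable Z and dual U.
    The SGD loss gains a dynamic regularizer (rho/2)*||W - Z + U||^2
    whose target Z is refreshed here each iteration."""
    z = project_filter_sparse(w.detach() + u, prune_ratio)
    u = u + w.detach() - z
    reg = (rho / 2) * (w - z + u).pow(2).sum()
    return z, u, reg
```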
Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks
Title | Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks |
Authors | Timo Hackel, Mikhail Usvyatsov, Silvano Galliani, Jan D. Wegner, Konrad Schindler |
Abstract | While CNNs naturally lend themselves to densely sampled data, and sophisticated implementations are available, they lack the ability to efficiently process sparse data. In this work we introduce a suite of tools that exploit sparsity in both the feature maps and the filter weights, and thereby allow for significantly lower memory footprints and computation times than the conventional dense framework when processing data with a high degree of sparsity. Our scheme provides (i) an efficient GPU implementation of a convolution layer based on direct, sparse convolution; (ii) a filter step within the convolution layer, which we call attention, that prevents fill-in, i.e., the tendency of convolution to rapidly decrease sparsity, and guarantees an upper bound on the computational resources; and (iii) an adaptation of the back-propagation algorithm, which makes it possible to combine our approach with standard learning frameworks, while still exploiting sparsity in the data and the model. |
Tasks | |
Published | 2018-01-31 |
URL | https://arxiv.org/abs/1801.10585v3 |
https://arxiv.org/pdf/1801.10585v3.pdf | |
PWC | https://paperswithcode.com/paper/inference-learning-and-attention-mechanisms |
Repo | https://github.com/TimoHackel/ILA-SCNN |
Framework | tf |
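
The direct sparse convolution only visits nonzero input sites, and the attention step caps fill-in by keeping a bounded number of output responses. A minimal single-channel Python sketch of both ideas, assuming dense filters and a simple top-k magnitude cap (the real implementation is a GPU kernel over sparse feature maps):

```python
# Illustrative sketch, not the paper's GPU kernel.
import numpy as np

def sparse_conv2d(indices, values, shape, kernel):
    """indices: (N, 2) array of nonzero (row, col) sites;
    values: (N,) activations; kernel: (kH, kW) dense filter.
    Scatter each nonzero input into the outputs it influences."""
    H, W = shape
    kH, kW = kernel.shape
    out = {}
    for (r, c), v in zip(indices, values):
        for dr in range(kH):
            for dc in range(kW):
                rr, cc = r + dr - kH // 2, c + dc - kW // 2
                if 0 <= rr < H and 0 <= cc < W:
                    out[(rr, cc)] = out.get((rr, cc), 0.0) + v * kernel[dr, dc]
    return out  # sparse dict of output sites

def topk_filter(out, k):
    """The 'attention' idea: bound fill-in by keeping only the
    k largest-magnitude responses."""
    items = sorted(out.items(), key=lambda kv: -abs(kv[1]))[:k]
    return dict(items)
```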
Deep Chain HDRI: Reconstructing a High Dynamic Range Image from a Single Low Dynamic Range Image
Title | Deep Chain HDRI: Reconstructing a High Dynamic Range Image from a Single Low Dynamic Range Image |
Authors | Siyeong Lee, Gwon Hwan An, Suk-Ju Kang |
Abstract | In this paper, we propose a novel deep neural network model that reconstructs a high dynamic range (HDR) image from a single low dynamic range (LDR) image. The proposed model is based on a convolutional neural network composed of dilated convolutional layers, and infers LDR images with various exposures and illumination from a single LDR image of the same scene. Then, the final HDR image can be formed by merging these inference results. The chaining structure, which infers LDR images with brighter (or darker) exposures from a given LDR image, makes it relatively easy for the proposed method to find the mapping between an LDR image and an HDR image with a different bit depth. The method not only extends the range, but also has the advantage of restoring the light information of the actual physical world. For the HDR images obtained by the proposed method, the HDR-VDP2 Q score, which is the most popular evaluation metric for HDR images, was 56.36 for a display with a 1920$\times$1200 resolution, an improvement of 6 compared with the scores of conventional algorithms. In addition, when comparing the peak signal-to-noise ratio values for tone-mapped HDR images generated by the proposed and conventional algorithms, the average value obtained by the proposed algorithm is 30.86 dB, which is 10 dB higher than those obtained by the conventional algorithms. |
Tasks | |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06277v1 |
http://arxiv.org/pdf/1801.06277v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-chain-hdri-reconstructing-a-high-dynamic |
Repo | https://github.com/vinthony/awesome-deep-hdr |
Framework | none |
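
Since the network infers a chain of LDR exposures and the final HDR image is formed by merging them, the merge stage can be illustrated with a Debevec-style weighted average. The exposure times, hat weighting, and gamma-2.2 response below are assumptions for illustration, not the paper's exact camera model.

```python
# Hedged sketch of the merge stage over an inferred exposure stack.
import numpy as np

def merge_ldr_stack(ldr_stack, exposure_times, eps=1e-6):
    """ldr_stack: list of float images in [0, 1] at increasing exposure;
    returns a linear-domain HDR radiance estimate."""
    num = np.zeros_like(ldr_stack[0])
    den = np.zeros_like(ldr_stack[0])
    for img, t in zip(ldr_stack, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # hat weight: trust mid-tones
        lin = img ** 2.2                    # assumed gamma-2.2 response
        num += w * lin / t
        den += w
    return num / (den + eps)
```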
Describing a Knowledge Base
Title | Describing a Knowledge Base |
Authors | Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, Kevin Knight |
Abstract | We aim to automatically generate natural language descriptions about an input structured knowledge base (KB). We build our generation framework based on a pointer network which can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new \emph{table position self-attention} to capture the inter-dependencies among related slots. For evaluation, besides standard metrics including BLEU, METEOR, and ROUGE, we propose a KB reconstruction based metric by extracting a KB from the generation output and comparing it with the input KB. We also create a new data set which includes 106,216 pairs of structured KBs and their corresponding natural language descriptions for two distinct entity types. Experiments show that our approach significantly outperforms state-of-the-art methods. The reconstructed KB achieves 68.8% - 72.6% F-score. |
Tasks | Data-to-Text Generation, KB-to-Language Generation, Table-to-Text Generation, Text Generation |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01797v2 |
http://arxiv.org/pdf/1809.01797v2.pdf | |
PWC | https://paperswithcode.com/paper/describing-a-knowledge-base |
Repo | https://github.com/EagleW/Describing_a_Knowledge_Base |
Framework | pytorch |
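
The slot-aware attention pairs each slot type with its slot value before scoring, so the decoder attends to KB records rather than bare tokens. A hedged sketch of one way to realize this; the bilinear score and dimensions are assumptions, not the repo's exact code.

```python
# Illustrative slot-aware attention over n KB records.
import torch
import torch.nn.functional as F

def slot_aware_attention(dec_state, slot_types, slot_values, W):
    """dec_state: (d,); slot_types, slot_values: (n, d); W: (d, 2d).
    Returns attention weights over the n records."""
    records = torch.cat([slot_types, slot_values], dim=-1)  # (n, 2d)
    scores = records @ W.t() @ dec_state                    # (n,)
    return F.softmax(scores, dim=0)
```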
Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
Title | Realistic Evaluation of Deep Semi-Supervised Learning Algorithms |
Authors | Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow |
Abstract | Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples. To help guide SSL research towards real-world applicability, we make our unified reimplementation and evaluation platform publicly available. |
Tasks | |
Published | 2018-04-24 |
URL | https://arxiv.org/abs/1804.09170v4 |
https://arxiv.org/pdf/1804.09170v4.pdf | |
PWC | https://paperswithcode.com/paper/realistic-evaluation-of-deep-semi-supervised |
Repo | https://github.com/brain-research/realistic-ssl-evaluation |
Framework | tf |
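
One of the paper's evaluation axes is class-distribution mismatch: the unlabeled pool is contaminated with examples from classes absent in the labeled set, and test accuracy is tracked as contamination grows. A sketch of that protocol shape only; the SSL trainer in the usage comment is a hypothetical placeholder.

```python
# Sketch of the out-of-class contamination protocol.
import numpy as np

def make_unlabeled_mix(in_class_x, out_class_x, contamination):
    """Return an unlabeled pool with the given fraction of
    out-of-class examples (contamination in [0, 1])."""
    n = len(in_class_x)
    k = int(contamination * n)
    idx = np.random.choice(len(out_class_x), size=k, replace=False)
    pool = np.concatenate([in_class_x[: n - k], out_class_x[idx]])
    np.random.shuffle(pool)
    return pool

# for c in (0.0, 0.25, 0.5, 1.0):
#     u = make_unlabeled_mix(animal_imgs, vehicle_imgs, c)
#     acc = train_ssl_and_eval(labeled, u, test)  # hypothetical helper
```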
Ordinal Depth Supervision for 3D Human Pose Estimation
Title | Ordinal Depth Supervision for 3D Human Pose Estimation |
Authors | Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis |
Abstract | Our ability to train end-to-end systems for 3D human pose estimation from single images is currently constrained by the limited availability of 3D annotations for natural images. Most datasets are captured using Motion Capture (MoCap) systems in a studio setting and it is difficult to reach the variability of 2D human pose datasets, like MPII or LSP. To alleviate the need for accurate 3D ground truth, we propose to use a weaker supervision signal provided by the ordinal depths of human joints. This information can be acquired by human annotators for a wide range of images and poses. We showcase the effectiveness and flexibility of training Convolutional Networks (ConvNets) with these ordinal relations in different settings, always achieving competitive performance with ConvNets trained with accurate 3D joint coordinates. Additionally, to demonstrate the potential of the approach, we augment the popular LSP and MPII datasets with ordinal depth annotations. This extension allows us to present quantitative and qualitative evaluation in non-studio conditions. Simultaneously, these ordinal annotations can be easily incorporated in the training procedure of typical ConvNets for 3D human pose. Through this inclusion we achieve new state-of-the-art performance for the relevant benchmarks and validate the effectiveness of ordinal depth supervision for 3D human pose. |
Tasks | 3D Human Pose Estimation, Motion Capture, Pose Estimation |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04095v1 |
http://arxiv.org/pdf/1805.04095v1.pdf | |
PWC | https://paperswithcode.com/paper/ordinal-depth-supervision-for-3d-human-pose |
Repo | https://github.com/geopavlakos/ordinal-pose3d |
Framework | pytorch |
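
The ordinal supervision reduces to pairwise constraints on predicted joint depths: a ranking penalty when one joint is annotated as closer, and a squared penalty when the pair is annotated as roughly equal in depth. A sketch consistent with that description (variable names are ours):

```python
# Pairwise ordinal depth loss over annotated joint pairs.
import torch

def ordinal_depth_loss(z, pairs):
    """z: (num_joints,) predicted depths; pairs: list of (i, j, r)
    with r = +1 if joint i is closer, -1 if j is closer,
    0 if their depths are indistinguishable."""
    loss = z.new_zeros(())
    for i, j, r in pairs:
        if r == 0:
            loss = loss + (z[i] - z[j]) ** 2           # equal-depth term
        else:
            # small when the predicted ordering matches the annotation
            loss = loss + torch.log(1 + torch.exp(r * (z[i] - z[j])))
    return loss / max(len(pairs), 1)
```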
Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving
Title | Reinforcement Learning and Deep Learning based Lateral Control for Autonomous Driving |
Authors | Dong Li, Dongbin Zhao, Qichao Zhang, Yaran Chen |
Abstract | This paper investigates vision-based autonomous driving with deep learning and reinforcement learning methods. Different from the end-to-end learning method, our method breaks the vision-based lateral control system down into a perception module and a control module. The perception module, which is based on a multi-task learning neural network, first takes a driver-view image as its input and predicts the track features. The control module, which is based on reinforcement learning, then makes a control decision based on these features. In order to improve data efficiency, we propose visual TORCS (VTORCS), a deep reinforcement learning environment based on the open racing car simulator (TORCS). By means of the provided functions, one can train an agent with the input of an image or various physical sensor measurements, or evaluate the perception algorithm on this simulator. The trained reinforcement learning controller outperforms the linear quadratic regulator (LQR) controller and model predictive control (MPC) controller on different tracks. The experiments demonstrate that the perception module shows promising performance and the controller is capable of controlling the vehicle to drive well along the track center with visual input. |
Tasks | Autonomous Driving, Multi-Task Learning |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12778v1 |
http://arxiv.org/pdf/1810.12778v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-and-deep-learning |
Repo | https://github.com/hbzhang/AwesomeSelfDriving |
Framework | none |
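
The decomposition is straightforward to mirror in code: a multi-task CNN maps the driver-view image to a few track features (e.g. lateral offset, heading error, curvature), and a separately trained policy maps those features to steering. The toy modules below are stand-ins that only reflect this interface, not the paper's architectures.

```python
# Stand-in modules illustrating the perception/control split.
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):            # multi-task CNN stand-in
    def __init__(self, n_features=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_features))
    def forward(self, img):
        return self.backbone(img)          # -> (B, 3) track features

class SteeringPolicy(nn.Module):           # RL actor stand-in
    def __init__(self, n_features=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.Tanh(),
                                 nn.Linear(32, 1), nn.Tanh())
    def forward(self, feats):
        return self.net(feats)             # steering command in [-1, 1]

# steering = SteeringPolicy()(PerceptionNet()(camera_batch))
```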
MAP inference via Block-Coordinate Frank-Wolfe Algorithm
Title | MAP inference via Block-Coordinate Frank-Wolfe Algorithm |
Authors | Paul Swoboda, Vladimir Kolmogorov |
Abstract | We present a new proximal bundle method for Maximum-A-Posteriori (MAP) inference in structured energy minimization problems. The method optimizes a Lagrangean relaxation of the original energy minimization problem using a multi-plane block-coordinate Frank-Wolfe method that takes advantage of the specific structure of the Lagrangean decomposition. We show empirically that our method outperforms state-of-the-art Lagrangean decomposition based algorithms on some challenging Markov Random Field, multi-label discrete tomography and graph matching problems. |
Tasks | Graph Matching |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05049v2 |
http://arxiv.org/pdf/1806.05049v2.pdf | |
PWC | https://paperswithcode.com/paper/map-inference-via-block-coordinate-frank |
Repo | https://github.com/LPMP/LPMP |
Framework | pytorch |
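
Each block update follows the standard Frank-Wolfe pattern: call a linear minimization oracle over the block's polytope, then move toward the returned vertex. A generic sketch with a clipped line-search proxy; the oracle and objective stand in for the structured subproblems of the Lagrangean decomposition.

```python
# Generic Frank-Wolfe step; not the paper's multi-plane variant.
import numpy as np

def frank_wolfe_step(x, grad, lmo):
    """x: current iterate; grad: objective gradient at x;
    lmo: linear minimization oracle, s = argmin_s <grad, s> over the
    block's polytope (e.g. a MAP oracle for one subproblem)."""
    s = lmo(grad)
    d = s - x                                   # FW direction
    gap = float(-grad @ d)                      # duality-gap estimate
    step = min(1.0, gap / (d @ d + 1e-12))      # line-search proxy for a
    return x + step * d, gap                    # quadratic-like objective
```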
TractSeg - Fast and accurate white matter tract segmentation
Title | TractSeg - Fast and accurate white matter tract segmentation |
Authors | Jakob Wasserthal, Peter Neher, Klaus H. Maier-Hein |
Abstract | The individual course of white matter fiber tracts is an important key for analysis of white matter characteristics in healthy and diseased brains. Uniquely, diffusion-weighted MRI tractography in combination with region-based or clustering-based selection of streamlines allows for the in-vivo delineation and analysis of anatomically well known tracts. This, however, currently requires complex, computationally intensive and tedious-to-set-up processing pipelines. TractSeg is a novel convolutional neural network-based approach that directly segments tracts in the field of fiber orientation distribution function (fODF) peaks without requiring tractography, image registration or parcellation. We demonstrate in 105 subjects from the Human Connectome Project that the proposed approach is much faster than existing methods while providing unprecedented accuracy. The code and data are openly available at https://github.com/MIC-DKFZ/TractSeg/ and https://doi.org/10.5281/zenodo.1088277, respectively. |
Tasks | Image Registration |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07103v2 |
http://arxiv.org/pdf/1805.07103v2.pdf | |
PWC | https://paperswithcode.com/paper/tractseg-fast-and-accurate-white-matter-tract |
Repo | https://github.com/MIC-DKFZ/TractSeg |
Framework | pytorch |
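
TractSeg's input is the field of the three principal fODF peak vectors per voxel (nine channels) and its output is one probability map per tract, predicted slice-wise by an encoder-decoder. The toy network below mirrors only those shapes, assuming the 72-tract, 144x144 setup described for the Human Connectome Project data; it is not the real model.

```python
# Shape-level sketch of the fODF-peaks-in, tract-masks-out contract.
import torch
import torch.nn as nn

n_peak_channels, n_tracts = 9, 72           # 3 peaks x 3 components

toy_net = nn.Sequential(                    # stand-in for the U-Net
    nn.Conv2d(n_peak_channels, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, n_tracts, 1))

peaks_slice = torch.randn(1, n_peak_channels, 144, 144)
tract_probs = torch.sigmoid(toy_net(peaks_slice))   # (1, 72, 144, 144)
```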
The CodRep Machine Learning on Source Code Competition
Title | The CodRep Machine Learning on Source Code Competition |
Authors | Zimin Chen, Martin Monperrus |
Abstract | CodRep is a machine learning competition on source code data. It is carefully designed so that anybody can enter the competition, whether professional researchers, students or independent scholars, without specific knowledge in machine learning or program analysis. In particular, it aims at being a common playground on which the machine learning and the software engineering research communities can interact. The competition started on April 14th 2018 and ended on October 14th 2018. The CodRep data is hosted at https://github.com/KTH/CodRep-competition/. |
Tasks | |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.03200v2 |
http://arxiv.org/pdf/1807.03200v2.pdf | |
PWC | https://paperswithcode.com/paper/the-codrep-machine-learning-on-source-code |
Repo | https://github.com/KTH/CodRep-competition |
Framework | none |
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Title | Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation |
Authors | Dane Corneil, Wulfram Gerstner, Johanni Brea |
Abstract | Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce Variational State Tabulation (VaST), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular model. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.04325v2 |
http://arxiv.org/pdf/1802.04325v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-model-based-deep-reinforcement |
Repo | https://github.com/danecor/VaST |
Framework | tf |
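
Once visual states are tabulated, planning uses prioritized sweeping with small backups: after a value change at a state, its known predecessors are queued with priority proportional to the size of the change. A generic sketch of that loop, with the transition-model bookkeeping elided; it is not the authors' implementation.

```python
# Generic prioritized sweeping over a tabular model.
import heapq

def prioritized_sweep(Q, model, predecessors, start_state, gamma=0.99,
                      theta=1e-3, max_backups=100):
    """Q: dict state -> dict action -> value;
    model[(s, a)] -> (reward, next_state); predecessors[s] -> iterable
    of (prev_state, action) pairs known to lead to s."""
    pq = [(-float('inf'), start_state)]   # seed with the changed state
    done = 0
    while pq and done < max_backups:
        _, s = heapq.heappop(pq)
        for p, a in predecessors.get(s, ()):
            r, s2 = model[(p, a)]         # s2 == s by construction
            old = Q[p][a]
            Q[p][a] = r + gamma * max(Q[s2].values())
            delta = abs(Q[p][a] - old)
            if delta > theta:             # propagate large changes
                heapq.heappush(pq, (-delta, p))
            done += 1
    return Q
```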
Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation
Title | Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation |
Authors | Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi |
Abstract | The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either auto-encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation. |
Tasks | Dialogue Generation, Dialogue Interpretation, Representation Learning, Text Generation |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08069v1 |
http://arxiv.org/pdf/1804.08069v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-discrete-sentence-representation |
Repo | https://github.com/snakeztc/NeuralDialog-LAED |
Framework | pytorch |
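
The discrete bottleneck can be sketched as encoding each sentence into a few categorical latent variables sampled differentiably. Whether DI-VAE uses exactly this Gumbel-Softmax estimator and these sizes is an assumption here; the point is the discrete, interpretable code.

```python
# Hedged sketch of a discrete sentence-latent bottleneck.
import torch
import torch.nn.functional as F

def discrete_encode(h, proj, n_vars=3, n_classes=10, tau=1.0):
    """h: (B, d) sentence encoding; proj: nn.Linear(d, n_vars*n_classes).
    Returns one-hot codes of shape (B, n_vars, n_classes), sampled with
    the straight-through Gumbel-Softmax estimator."""
    logits = proj(h).view(-1, n_vars, n_classes)
    return F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
```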
Zero-Shot Dialog Generation with Cross-Domain Latent Actions
Title | Zero-Shot Dialog Generation with Cross-Domain Latent Actions |
Authors | Tiancheng Zhao, Maxine Eskenazi |
Abstract | This paper introduces zero-shot dialog generation (ZSDG), as a step towards neural dialog systems that can instantly generalize to new situations with minimal data. ZSDG enables an end-to-end generative dialog system to generalize to a new domain for which only a domain description is provided and no training dialogs are available. Then a novel learning framework, Action Matching, is proposed. This algorithm can learn a cross-domain embedding space that models the semantics of dialog responses which, in turn, lets a neural dialog generation model generalize to new domains. We evaluate our methods on a new synthetic dialog dataset and an existing human-human dialog dataset. Results show that our method has superior performance in learning dialog models that rapidly adapt their behavior to new domains, and suggest promising future research. |
Tasks | Dialogue Generation, Goal-Oriented Dialog, Text Generation |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04803v1 |
http://arxiv.org/pdf/1805.04803v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-dialog-generation-with-cross-domain |
Repo | https://github.com/snakeztc/NeuralDialog-ZSDG |
Framework | pytorch |
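
Action Matching can be read as an alignment loss: the latent action of a response encoded in one domain must match the latent action of its paired seed annotation, so all domains share one action space. The L2 form and the two encoders below are assumptions consistent with that description, not the repo's exact objective.

```python
# Hedged sketch of a cross-domain action-alignment loss.
import torch

def action_matching_loss(resp_enc, anno_enc, responses, annotations):
    """resp_enc / anno_enc: modules mapping text batches to latent
    actions of shape (B, d); pairs are aligned by index."""
    z_resp = resp_enc(responses)
    z_anno = anno_enc(annotations)
    return torch.mean((z_resp - z_anno).pow(2).sum(dim=-1))
```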
Adversarial Reprogramming of Neural Networks
Title | Adversarial Reprogramming of Neural Networks |
Authors | Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein |
Abstract | Deep neural networks are susceptible to \emph{adversarial} attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead {\em reprogram} the target model to perform a task chosen by the attacker, without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary, even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.11146v2 |
http://arxiv.org/pdf/1806.11146v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-reprogramming-of-neural-networks |
Repo | https://github.com/lizhuorong/Adversarial-Reprogramming-tensorflow |
Framework | tf |
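
The attack optimizes one perturbation pad that is wrapped around every small task input while the ImageNet model stays frozen, and the model's output classes are remapped to the new task's labels. A sketch of that input construction for the MNIST-to-ImageNet setup (the training loop and label remapping are elided; sizes are the standard 224 and 28):

```python
# Sketch of the adversarial-program input construction.
import torch

H, small = 224, 28
W = torch.zeros(3, H, H, requires_grad=True)     # the adversarial program
mask = torch.ones(3, H, H)
lo = (H - small) // 2
mask[:, lo:lo + small, lo:lo + small] = 0        # hole for the task input

def reprogram(x_small):
    """x_small: (B, 3, 28, 28) in [0, 1] (MNIST replicated to 3 channels);
    returns full-size adversarial inputs for the frozen model."""
    x = torch.zeros(x_small.size(0), 3, H, H)
    x[:, :, lo:lo + small, lo:lo + small] = x_small
    return x + torch.tanh(W) * mask              # program outside the hole

# logits = frozen_imagenet_model(reprogram(mnist_batch))
# loss = F.cross_entropy(logits[:, :10], mnist_labels)  # remap 10 classes
```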
A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes
Title | A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes |
Authors | Yang Zhang, Philip David, Hassan Foroosh, Boqing Gong |
Abstract | During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving and augmented reality. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data hinders the models’ performance. Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network, while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach. |
Tasks | Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Semantic Segmentation, Synthetic-to-Real Translation |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09953v3 |
http://arxiv.org/pdf/1812.09953v3.pdf | |
PWC | https://paperswithcode.com/paper/a-curriculum-domain-adaptation-approach-to |
Repo | https://github.com/YangZhang4065/AdaptationSeg |
Framework | tf |
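
The curriculum couples the two stages: per-image label distributions inferred by the easy tasks become soft targets that regularize the segmentation network's target-domain predictions. A hedged sketch of such a global label-distribution regularizer; the cross-entropy form and shapes are illustrative, not the repo's exact loss.

```python
# Sketch of a label-distribution regularizer for the target domain.
import torch
import torch.nn.functional as F

def label_dist_regularizer(seg_logits, target_dist, eps=1e-8):
    """seg_logits: (B, C, H, W) target-domain predictions;
    target_dist: (B, C) per-image class frequencies inferred by the
    easy task. Penalize divergence of the predicted global histogram."""
    probs = F.softmax(seg_logits, dim=1)
    pred_dist = probs.mean(dim=(2, 3))           # global average -> (B, C)
    return -(target_dist * torch.log(pred_dist + eps)).sum(dim=1).mean()
```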