Paper Group AWR 319
ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance. Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM. Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression. Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inferen …
ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance
Title | ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance |
Authors | Rahul Krishna, Chong Tang, Kevin Sullivan, Baishakhi Ray |
Abstract | Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions. The opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of the high cost of collecting training data. We present a new method based on a combination of Evolutionary Markov Chain Monte Carlo (EMCMC) sampling and cost reduction techniques to cost-effectively find better-performing configurations for big data systems. For cost reduction, we developed and experimentally validated two approaches: using scaled-up big data jobs as proxies for the objective function for larger jobs, and using a dynamic job similarity measure to infer that results obtained for one kind of big data problem will work well for similar problems. Our experimental results suggest that our approach significantly and cost-effectively improves the performance of big data systems, outperforming competing approaches based on random sampling, basic genetic algorithms (GA), and predictive model learning. |
Tasks | Efficient Exploration |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.09644v1 |
https://arxiv.org/pdf/1910.09644v1.pdf | |
PWC | https://paperswithcode.com/paper/conex-efficient-exploration-of-big-data |
Repo | https://github.com/ARiSE-Lab/ConEX__Replication_Package |
Framework | none |
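
To make the sampling idea concrete, below is a minimal, self-contained sketch of an evolutionary MCMC search over a discrete configuration space. It is not the authors' implementation: the parameter names, the `measure_cost` objective, and the 50/50 mutation/crossover split are illustrative assumptions; in ConEx the objective would be the measured runtime of a proxy benchmark job.

```python
import math
import random

def emcmc_search(space, measure_cost, iters=200, pop_size=8, temperature=0.1):
    """Toy evolutionary-MCMC search over a discrete configuration space.
    `measure_cost(config)` is a hypothetical stand-in for running a proxy
    benchmark job and returning its cost (e.g. runtime)."""
    population = [{k: random.choice(v) for k, v in space.items()} for _ in range(pop_size)]
    costs = [measure_cost(c) for c in population]
    best = min(zip(costs, population), key=lambda t: t[0])
    for _ in range(iters):
        i = random.randrange(pop_size)
        proposal = dict(population[i])
        if random.random() < 0.5:                      # mutation move
            k = random.choice(list(space))
            proposal[k] = random.choice(space[k])
        else:                                          # crossover with another chain
            j = random.randrange(pop_size)
            for k in space:
                if random.random() < 0.5:
                    proposal[k] = population[j][k]
        cost = measure_cost(proposal)
        # Metropolis acceptance: always accept improvements, sometimes accept worse
        if cost < costs[i] or random.random() < math.exp((costs[i] - cost) / temperature):
            population[i], costs[i] = proposal, cost
            if cost < best[0]:
                best = (cost, proposal)
    return best

# usage with a made-up space and objective (parameter names are illustrative only)
space = {"mapreduce.task.io.sort.mb": [64, 128, 256], "dfs.blocksize.mb": [64, 128, 256]}
best_cost, best_config = emcmc_search(space, lambda c: sum(c.values()) / 100.0)
```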
Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM
Title | Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM |
Authors | Shaokai Ye, Xiaoyu Feng, Tianyun Zhang, Xiaolong Ma, Sheng Lin, Zhengang Li, Kaidi Xu, Wujie Wen, Sijia Liu, Jian Tang, Makan Fardad, Xue Lin, Yongpan Liu, Yanzhi Wang |
Abstract | Weight pruning and weight quantization are two important categories of DNN model compression. Prior work on these techniques is mainly based on heuristics. A recent work developed a systematic framework of DNN weight pruning using the advanced optimization technique ADMM (Alternating Direction Method of Multipliers), achieving state-of-the-art weight pruning results. In this work, we first extend this one-shot ADMM-based framework to guarantee solution feasibility and provide a fast convergence rate, and generalize it to weight quantization as well. We further develop a multi-step, progressive DNN weight pruning and quantization framework, with the dual benefits of (i) achieving further weight pruning/quantization thanks to the special property of ADMM regularization, and (ii) reducing the search space within each step. Extensive experimental results demonstrate superior performance compared with prior work. Some highlights: (i) we achieve 246x, 36x, and 8x weight pruning on LeNet-5, AlexNet, and ResNet-50 models, respectively, with (almost) zero accuracy loss; (ii) even a significant 61x weight pruning in AlexNet (ImageNet) results in only minor degradation in actual accuracy compared with prior work; (iii) we are among the first to derive notable weight pruning results for ResNet and MobileNet models; (iv) we derive the first lossless, fully binarized (for all layers) LeNet-5 for MNIST and VGG-16 for CIFAR-10; and (v) we derive the first fully binarized (for all layers) ResNet for ImageNet with reasonable accuracy loss. |
Tasks | Model Compression, Quantization |
Published | 2019-03-23 |
URL | http://arxiv.org/abs/1903.09769v2 |
http://arxiv.org/pdf/1903.09769v2.pdf | |
PWC | https://paperswithcode.com/paper/progressive-dnn-compression-a-key-to-achieve |
Repo | https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM |
Framework | pytorch |
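
The core of the ADMM-based pruning loop can be sketched in a few lines of PyTorch: augment the task loss with a quadratic term that pulls the weights toward their sparse projection Z, and periodically refresh Z and the dual variable U. This is a toy illustration on a linear model, not the authors' code; the sparsity level, update schedule, and rho value are arbitrary assumptions.

```python
import torch

def project_sparse(w, sparsity=0.9):
    """Euclidean projection onto the set of tensors with at most a
    (1 - sparsity) fraction of nonzeros: keep the largest-magnitude entries."""
    k = max(1, int(w.numel() * (1.0 - sparsity)))
    thresh = torch.topk(w.abs().flatten(), k).values.min()
    return w * (w.abs() >= thresh).float()

# one ADMM-regularized training phase (sketch on a toy regression problem)
model = torch.nn.Linear(100, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
rho = 1e-2
Z = {n: project_sparse(p.detach().clone()) for n, p in model.named_parameters()}
U = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

x, y = torch.randn(256, 100), torch.randn(256, 10)
for step in range(100):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # ADMM regularizer pulls W toward its sparse projection Z
    loss = loss + (rho / 2) * sum(((p - Z[n] + U[n]) ** 2).sum()
                                  for n, p in model.named_parameters())
    opt.zero_grad(); loss.backward(); opt.step()
    if (step + 1) % 20 == 0:                      # periodic Z- and U-updates
        for n, p in model.named_parameters():
            Z[n] = project_sparse(p.detach() + U[n])
            U[n] = U[n] + p.detach() - Z[n]

# final hard-pruning step: commit to the sparsity pattern found by ADMM
with torch.no_grad():
    for n, p in model.named_parameters():
        p.copy_(project_sparse(p))
```

A progressive scheme would repeat this phase several times, each time starting from the previous phase's pruned model and increasing the target sparsity.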
Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression
Title | Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression |
Authors | Maurice Quach, Giuseppe Valenzise, Frederic Dufaux |
Abstract | Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn . |
Tasks | Quantization |
Published | 2019-03-20 |
URL | https://arxiv.org/abs/1903.08548v2 |
https://arxiv.org/pdf/1903.08548v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-convolutional-transforms-for-lossy |
Repo | https://github.com/mauriceqch/pcc_geo_cnn |
Framework | tf |
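
A rough PyTorch sketch of the pipeline described in the abstract follows: a 3D convolutional encoder over a voxelized occupancy grid, uniform-noise quantization during training (hard rounding at test time), and decoding cast as binary classification of occupancy. The released code uses TensorFlow, and the rate term below is only a placeholder; a learned entropy model would replace it.

```python
import torch
import torch.nn.functional as F

class PCGeoCodec(torch.nn.Module):
    """3D convolutional autoencoder over voxelized occupancy grids (sketch only)."""
    def __init__(self, filters=32):
        super().__init__()
        self.enc = torch.nn.Sequential(
            torch.nn.Conv3d(1, filters, 5, stride=2, padding=2), torch.nn.ReLU(),
            torch.nn.Conv3d(filters, filters, 5, stride=2, padding=2))
        self.dec = torch.nn.Sequential(
            torch.nn.ConvTranspose3d(filters, filters, 5, stride=2, padding=2, output_padding=1),
            torch.nn.ReLU(),
            torch.nn.ConvTranspose3d(filters, 1, 5, stride=2, padding=2, output_padding=1))

    def forward(self, occupancy, lmbda=0.01):      # occupancy: (B, 1, D, H, W), float in {0, 1}
        y = self.enc(occupancy)
        if self.training:                          # additive uniform noise approximates rounding
            y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        else:
            y_hat = torch.round(y)
        logits = self.dec(y_hat)
        # decoding cast as binary classification of the occupancy map
        distortion = F.binary_cross_entropy_with_logits(logits, occupancy)
        rate_proxy = y_hat.abs().mean()            # placeholder for a learned entropy model
        return rate_proxy + lmbda * distortion, torch.sigmoid(logits)

codec = PCGeoCodec()
occ = (torch.rand(1, 1, 64, 64, 64) > 0.95).float()
loss, recon = codec(occ)
```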
Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
Title | Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks |
Authors | Sambhav R. Jain, Albert Gural, Michael Wu, Chris H. Dick |
Abstract | We propose a method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent. Contrary to prior work, we show that a careful analysis of the straight-through estimator for threshold gradients allows for a natural range-precision trade-off leading to better optima. Our quantizers are constrained to use power-of-2 scale factors and per-tensor scaling of weights and activations to make them amenable to hardware implementations. We present analytical support for the general robustness of our methods and empirically validate them on various CNNs for ImageNet classification. We are able to achieve near-floating-point accuracy on traditionally difficult networks such as MobileNets with less than 5 epochs of quantized (8-bit) retraining. Finally, we present Graffitist, a framework that enables automatic quantization of TensorFlow graphs for TQT (available at https://github.com/Xilinx/graffitist ). |
Tasks | Quantization |
Published | 2019-03-19 |
URL | https://arxiv.org/abs/1903.08066v3 |
https://arxiv.org/pdf/1903.08066v3.pdf | |
PWC | https://paperswithcode.com/paper/trained-uniform-quantization-for-accurate-and |
Repo | https://github.com/Xilinx/graffitist |
Framework | tf |
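
The following is a minimal sketch of a TQT-style fake-quantizer: a trained log2 threshold, a power-of-2 per-tensor scale, and straight-through estimators (STE) for both the rounding and the threshold's ceiling. The exact gradient expressions and clipping convention in the paper and in Graffitist may differ; this only illustrates the mechanism.

```python
import torch

class TQTQuantizer(torch.nn.Module):
    """Uniform symmetric fake-quantizer with a trained threshold and a
    power-of-2, per-tensor scale (sketch of the TQT idea)."""
    def __init__(self, bits=8):
        super().__init__()
        self.bits = bits
        self.log2_t = torch.nn.Parameter(torch.tensor(0.0))   # log2 of the clipping threshold

    def forward(self, x):
        qmax = 2.0 ** (self.bits - 1) - 1
        # round the threshold exponent up with an STE so the scale stays a
        # power of two while gradients still reach log2_t
        log2_t = self.log2_t + (torch.ceil(self.log2_t) - self.log2_t).detach()
        scale = 2.0 ** log2_t / qmax
        q = torch.clamp(x / scale, -qmax - 1, qmax)
        q = q + (torch.round(q) - q).detach()                  # STE through rounding
        return q * scale

quant = TQTQuantizer()
w = torch.randn(256) * 0.1
w_q = quant(w)        # fake-quantized values; gradients flow to both w and log2_t
```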
Recon-GLGAN: A Global-Local context based Generative Adversarial Network for MRI Reconstruction
Title | Recon-GLGAN: A Global-Local context based Generative Adversarial Network for MRI Reconstruction |
Authors | Balamurali Murugesan, Vijaya Raghavan S, Kaushik Sarveswaran, Keerthi Ram, Mohanasankar Sivaprakasam |
Abstract | Magnetic resonance imaging (MRI) is one of the best medical imaging modalities, as it offers excellent spatial resolution and soft-tissue contrast. However, the use of MRI is limited by its slow acquisition time, which makes it expensive and causes patient discomfort. In order to accelerate the acquisition, multiple deep learning networks have been proposed. Recently, Generative Adversarial Networks (GANs) have shown promising results in MRI reconstruction. The drawback of these GAN-based methods is that they do not incorporate prior information about the end goal, which could help achieve better reconstruction. For instance, in the case of cardiac MRI, the physician would be interested in the heart region, which is of diagnostic relevance, rather than the peripheral regions. In this work, we show that incorporating prior information about a region of interest in the model offers better performance. We therefore propose a novel GAN-based architecture, Reconstruction Global-Local GAN (Recon-GLGAN), for MRI reconstruction. The proposed model contains a generator and a context discriminator which incorporates global and local contextual information from images. Our model offers significant performance improvement over the baseline models. Our experiments show that the concept of a context discriminator can be extended to existing GAN-based reconstruction models to offer better performance. We also demonstrate that the reconstructions from the proposed method give segmentation results similar to fully sampled images. |
Tasks | |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09262v1 |
https://arxiv.org/pdf/1908.09262v1.pdf | |
PWC | https://paperswithcode.com/paper/recon-glgan-a-global-local-context-based |
Repo | https://github.com/Bala93/Recon-GLGAN |
Framework | pytorch |
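
A compact sketch of the context-discriminator idea is given below: one branch scores the whole reconstructed image, a second branch scores a region-of-interest crop (e.g. around the heart), and the two feature vectors are fused before the real/fake decision. Layer widths and depths are placeholders, not the architecture from the paper.

```python
import torch

class ContextDiscriminator(torch.nn.Module):
    """Discriminator with a global branch over the full image and a local
    branch over a region-of-interest crop (sketch of the Recon-GLGAN idea)."""
    def __init__(self, feat=32):
        super().__init__()
        def branch():
            return torch.nn.Sequential(
                torch.nn.Conv2d(1, feat, 4, stride=2, padding=1), torch.nn.LeakyReLU(0.2),
                torch.nn.Conv2d(feat, feat * 2, 4, stride=2, padding=1), torch.nn.LeakyReLU(0.2),
                torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())
        self.global_branch = branch()
        self.local_branch = branch()
        self.cls = torch.nn.Linear(feat * 4, 1)

    def forward(self, image, roi):
        g = self.global_branch(image)   # whole reconstruction
        l = self.local_branch(roi)      # crop around the diagnostically relevant region
        return self.cls(torch.cat([g, l], dim=1))

d = ContextDiscriminator()
score = d(torch.randn(2, 1, 160, 160), torch.randn(2, 1, 64, 64))
```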
Deep Log-Likelihood Ratio Quantization
Title | Deep Log-Likelihood Ratio Quantization |
Authors | Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath |
Abstract | In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a latent space with dimension equal to the number of sufficient statistics required to recover the inputs - equal to three in this case - while the decoder aims to reconstruct a noisy version of the latent representation with the purpose of modeling quantization effects in a differentiable way. Simulation results show that, when applied to a standard rate-1/2 low-density parity-check (LDPC) code, a finite precision compression factor of nearly three times is achieved when storing an entire codeword, with an incurred loss of performance lower than 0.1 dB compared to straightforward scalar quantization of the log-likelihood ratios. |
Tasks | Quantization |
Published | 2019-03-11 |
URL | https://arxiv.org/abs/1903.04656v2 |
https://arxiv.org/pdf/1903.04656v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-log-likelihood-ratio-quantization |
Repo | https://github.com/mariusarvinte/deep-llr-quantization |
Framework | tf |
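
The scheme can be sketched as a small autoencoder whose bottleneck has three dimensions, with Gaussian noise injected on the latent during training as a differentiable stand-in for quantization. Layer sizes, noise level, and the MSE objective below are assumptions for illustration; the paper's exact architecture and training setup differ in detail.

```python
import torch

class LLRAutoencoder(torch.nn.Module):
    """Compresses per-symbol bit LLRs to a 3-dim latent; noise injected on the
    latent during training models quantization effects (sketch)."""
    def __init__(self, bits_per_symbol=4, latent_dim=3, hidden=64, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.enc = torch.nn.Sequential(
            torch.nn.Linear(bits_per_symbol, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, latent_dim))
        self.dec = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, bits_per_symbol))

    def forward(self, llr):
        z = self.enc(llr)
        if self.training:
            z = z + self.noise_std * torch.randn_like(z)   # differentiable quantization proxy
        return self.dec(z)

# usage sketch: reconstruct the bit LLRs of a 4-bit (e.g. 16-QAM) symbol
model = LLRAutoencoder()
llr = torch.randn(128, 4) * 5.0                 # stand-in for demodulator LLRs
loss = torch.nn.functional.mse_loss(model(llr), llr)
```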
Order Matters: Shuffling Sequence Generation for Video Prediction
Title | Order Matters: Shuffling Sequence Generation for Video Prediction |
Authors | Junyan Wang, Bingzhang Hu, Yang Long, Yu Guan |
Abstract | Predicting future frames in natural video sequences is a new challenge that is receiving increasing attention in the computer vision community. However, existing models suffer from severe loss of temporal information when the predicted sequence is long. Compared to previous methods focusing on generating more realistic contents, this paper extensively studies the importance of sequential order information for video generation. A novel Shuffling sEquence gEneration network (SEE-Net) is proposed that can learn to discriminate unnatural sequential orders by shuffling the video frames and comparing them to the real video sequence. Systematic experiments on three datasets with both synthetic and real-world videos manifest the effectiveness of shuffling sequence generation for video prediction in our proposed model and demonstrate state-of-the-art performance by both qualitative and quantitative evaluations. The source code is available at https://github.com/andrewjywang/SEENet. |
Tasks | Video Generation, Video Prediction |
Published | 2019-07-20 |
URL | https://arxiv.org/abs/1907.08845v1 |
https://arxiv.org/pdf/1907.08845v1.pdf | |
PWC | https://paperswithcode.com/paper/order-matters-shuffling-sequence-generation |
Repo | https://github.com/andrewjywang/SEENet |
Framework | tf |
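
One way to realize the shuffling idea is a sequence discriminator trained to separate naturally ordered clips from randomly permuted ones, as sketched below. The frame encoder and GRU here are illustrative placeholders, not the SEE-Net architecture.

```python
import torch

class OrderDiscriminator(torch.nn.Module):
    """Classifies whether a frame sequence is in its natural temporal order (sketch)."""
    def __init__(self, channels=3, feat=64):
        super().__init__()
        self.frame_enc = torch.nn.Sequential(
            torch.nn.Conv2d(channels, feat, 4, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1))
        self.rnn = torch.nn.GRU(feat, feat, batch_first=True)
        self.cls = torch.nn.Linear(feat, 1)

    def forward(self, video):                        # video: (B, T, C, H, W)
        b, t, c, h, w = video.shape
        f = self.frame_enc(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, hidden = self.rnn(f)
        return self.cls(hidden[-1])                  # logit: natural order vs. shuffled

def shuffled(video):
    """Negative examples: permute the frames along the time axis."""
    perm = torch.randperm(video.size(1))
    return video[:, perm]

disc = OrderDiscriminator()
real = torch.rand(2, 8, 3, 64, 64)
real_logit, fake_logit = disc(real), disc(shuffled(real))
```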
Video Generation from Single Semantic Label Map
Title | Video Generation from Single Semantic Label Map |
Authors | Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang |
Abstract | This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process. Different from typical end-to-end approaches, which model both scene content and dynamics in a single step, we propose to decompose this difficult task into two sub-problems. As current image generation methods do better than video generation in terms of detail, we synthesize high quality content by only generating the first frame. Then we animate the scene based on its semantic meaning to obtain the temporally coherent video, giving us excellent results overall. We employ a cVAE for predicting optical flow as a beneficial intermediate step to generate a video sequence conditioned on the initial single frame. A semantic label map is integrated into the flow prediction module to achieve major improvements in the image-to-video generation process. Extensive experiments on the Cityscapes dataset show that our method outperforms all competing methods. |
Tasks | Image Generation, Optical Flow Estimation, Video Generation |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04480v1 |
http://arxiv.org/pdf/1903.04480v1.pdf | |
PWC | https://paperswithcode.com/paper/video-generation-from-single-semantic-label |
Repo | https://github.com/junting/seg2vid |
Framework | pytorch |
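
Central to the flow-based animation step is warping the generated first frame by a predicted dense flow field; a minimal PyTorch version of such a warp is sketched below. It assumes flow given as pixel displacements and bilinear sampling; the paper's cVAE flow predictor and semantic-label conditioning are not shown.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(frame, flow):
    """Warp a frame with a dense flow field (the "animate the first frame" step, sketch).
    frame: (B, C, H, W); flow: (B, 2, H, W) pixel displacements (dx, dy)."""
    b, _, h, w = frame.shape
    ys = torch.arange(h, device=frame.device, dtype=frame.dtype).view(1, h, 1).expand(b, h, w)
    xs = torch.arange(w, device=frame.device, dtype=frame.dtype).view(1, 1, w).expand(b, h, w)
    x_new = xs + flow[:, 0]
    y_new = ys + flow[:, 1]
    # grid_sample expects sampling locations normalized to [-1, 1], x before y
    grid = torch.stack((2.0 * x_new / (w - 1) - 1.0,
                        2.0 * y_new / (h - 1) - 1.0), dim=-1)    # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

# usage: a zero flow field returns (approximately) the original frame
frame = torch.rand(1, 3, 64, 128)
flow = torch.zeros(1, 2, 64, 128)
assert torch.allclose(warp_by_flow(frame, flow), frame, atol=1e-5)
```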
Neural reparameterization improves structural optimization
Title | Neural reparameterization improves structural optimization |
Authors | Stephan Hoyer, Jascha Sohl-Dickstein, Sam Greydanus |
Abstract | Structural optimization is a popular method for designing objects such as bridge trusses, airplane wings, and optical devices. Unfortunately, the quality of solutions depends heavily on how the problem is parameterized. In this paper, we propose using the implicit bias over functions induced by neural networks to improve the parameterization of structural optimization. Rather than directly optimizing densities on a grid, we instead optimize the parameters of a neural network which outputs those densities. This reparameterization leads to different and often better solutions. On a selection of 116 structural optimization tasks, our approach produces the best design 50% more often than the best baseline method. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04240v2 |
https://arxiv.org/pdf/1909.04240v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-reparameterization-improves-structural |
Repo | https://github.com/google-research/neural-structural-optimization |
Framework | tf |
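
The reparameterization itself is simple to sketch: instead of optimizing a density grid directly, optimize the parameters of a network that emits the grid. In the toy example below, a placeholder smoothness-plus-volume objective stands in for the differentiable structural (compliance) solver used in the paper.

```python
import torch

class DensityNet(torch.nn.Module):
    """Small network whose output is the density grid being optimized (sketch)."""
    def __init__(self, h=32, w=32, hidden=128):
        super().__init__()
        self.seed = torch.nn.Parameter(torch.randn(hidden))
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, h * w))
        self.h, self.w = h, w

    def forward(self):
        return torch.sigmoid(self.net(self.seed)).view(self.h, self.w)

def objective_proxy(rho, target_volume=0.4):
    # placeholder standing in for a differentiable FEM compliance solver
    smooth = ((rho[1:, :] - rho[:-1, :]) ** 2).mean() + ((rho[:, 1:] - rho[:, :-1]) ** 2).mean()
    return smooth + (rho.mean() - target_volume) ** 2

model = DensityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = objective_proxy(model())     # gradients flow through the grid into the network
    opt.zero_grad(); loss.backward(); opt.step()
```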
Reversible GANs for Memory-efficient Image-to-Image Translation
Title | Reversible GANs for Memory-efficient Image-to-Image Translation |
Authors | Tycho F. A. van der Ouderaa, Daniel E. Worrall |
Abstract | The Pix2pix and CycleGAN losses have vastly improved the qualitative and quantitative visual quality of results in image-to-image translation tasks. We extend this framework by exploring approximately invertible architectures which are well suited to these losses. These architectures are approximately invertible by design and thus partially satisfy cycle-consistency before training even begins. Furthermore, since invertible architectures have constant memory complexity in depth, these models can be built arbitrarily deep. We are able to demonstrate superior quantitative output on the Cityscapes and Maps datasets at near constant memory budget. |
Tasks | Image-to-Image Translation |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02729v1 |
http://arxiv.org/pdf/1902.02729v1.pdf | |
PWC | https://paperswithcode.com/paper/reversible-gans-for-memory-efficient-image-to |
Repo | https://github.com/silvandeleemput/memcnn |
Framework | pytorch |
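
The memory argument rests on invertible blocks whose inputs can be recomputed from their outputs; an additive coupling block of this kind is sketched below. It illustrates the invertibility property only and is not the RevGAN generator.

```python
import torch

def _conv_block(c):
    return torch.nn.Sequential(
        torch.nn.Conv2d(c, c, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(c, c, 3, padding=1))

class AdditiveCoupling(torch.nn.Module):
    """Invertible block: earlier activations can be recomputed from the output,
    so memory cost is constant in depth (RevNet-style sketch)."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f, self.g = _conv_block(half), _conv_block(half)

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=1)

# round-trip check
block = AdditiveCoupling(8)
x = torch.randn(2, 8, 16, 16)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```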
Pairwise Learning to Rank by Neural Networks Revisited: Reconstruction, Theoretical Analysis and Practical Performance
Title | Pairwise Learning to Rank by Neural Networks Revisited: Reconstruction, Theoretical Analysis and Practical Performance |
Authors | Marius Köppel, Alexander Segner, Martin Wagener, Lukas Pensel, Andreas Karwath, Stefan Kramer |
Abstract | We present a pairwise learning to rank approach based on a neural net, called DirectRanker, that generalizes the RankNet architecture. We show mathematically that our model is reflexive, antisymmetric, and transitive allowing for simplified training and improved performance. Experimental results on the LETOR MSLR-WEB10K, MQ2007 and MQ2008 datasets show that our model outperforms numerous state-of-the-art methods, while being inherently simpler in structure and using a pairwise approach only. |
Tasks | Learning-To-Rank |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02768v1 |
https://arxiv.org/pdf/1909.02768v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-learning-to-rank-by-neural-networks |
Repo | https://github.com/kramerlab/direct-ranker |
Framework | tf |
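
The reflexivity and antisymmetry properties can be obtained by construction: pass both documents through a shared feature extractor and feed the feature difference through a bias-free output layer with an odd activation, as in the sketch below (layer sizes are illustrative, not the paper's).

```python
import torch

class DirectRankerSketch(torch.nn.Module):
    """Pairwise ranker that is antisymmetric and reflexive by construction (sketch)."""
    def __init__(self, num_features, hidden=64):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Linear(num_features, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU())
        self.out = torch.nn.Linear(hidden, 1, bias=False)   # no bias, so o(x, x) = 0

    def forward(self, x1, x2):
        # antisymmetric feature difference + bias-free linear map + odd activation
        # guarantees o(x1, x2) = -o(x2, x1)
        diff = self.features(x1) - self.features(x2)
        return torch.tanh(self.out(diff))

ranker = DirectRankerSketch(num_features=20)
x1, x2 = torch.randn(32, 20), torch.randn(32, 20)
assert torch.allclose(ranker(x1, x2), -ranker(x2, x1), atol=1e-6)
```

Training then reduces to pushing the output toward +1 for pairs where the first document should rank above the second.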
Deep Modular Co-Attention Networks for Visual Question Answering
Title | Deep Modular Co-Attention Networks for Visual Question Answering |
Authors | Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian |
Abstract | Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective ‘co-attention’ model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the guided-attention of images, jointly using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN’s effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state-of-the-art. Our best single model delivers 70.63% overall accuracy on the test-dev set. Code is available at https://github.com/MILVLG/mcan-vqa. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10770v1 |
https://arxiv.org/pdf/1906.10770v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-modular-co-attention-networks-for-visual-1 |
Repo | https://github.com/MILVLG/mcan-vqa |
Framework | pytorch |
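
The two basic attention units can be sketched directly with PyTorch's multi-head attention: a self-attention (SA) unit, and a guided-attention (GA) unit in which image features query question features. The real MCA layer also contains feed-forward sublayers and specific cascading schemes omitted here.

```python
import torch

class SA(torch.nn.Module):
    """Self-attention unit: features attend to themselves (sketch)."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = torch.nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class GA(torch.nn.Module):
    """Guided-attention unit: image features attend to question features (sketch)."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = torch.nn.LayerNorm(dim)

    def forward(self, x, y):
        out, _ = self.attn(x, y, y)    # queries from x (image), keys/values from y (question)
        return self.norm(x + out)

# one step of a cascaded MCA layer (feed-forward sublayers omitted)
q = torch.randn(2, 14, 512)            # question token features
v = torch.randn(2, 100, 512)           # image region features
q = SA()(q)
v = GA()(SA()(v), q)
```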
ModelicaGym: Applying Reinforcement Learning to Modelica Models
Title | ModelicaGym: Applying Reinforcement Learning to Modelica Models |
Authors | Oleh Lukianykhin, Tetiana Bogodorova |
Abstract | This paper presents the ModelicaGym toolbox, developed to employ Reinforcement Learning (RL) for solving optimization and control tasks in Modelica models. The developed tool allows connecting models using the Functional Mock-up Interface (FMI) to the OpenAI Gym toolkit in order to exploit Modelica equation-based modelling and co-simulation together with RL algorithms. Thus, ModelicaGym facilitates fast and convenient development of RL algorithms and their comparison when solving optimal control problems for Modelica dynamic models. The inheritance structure of the ModelicaGym toolbox’s classes and the implemented methods are discussed in detail. The toolbox functionality is validated on the Cart-Pole balancing problem. This includes a description of the physical system model and its integration using the toolbox, and experiments on the selection and influence of the model parameters (i.e. force magnitude, cart-pole mass ratio, reward ratio, and simulation time step) on the learning process of the Q-learning algorithm, supported by a discussion of the simulation results. |
Tasks | Q-Learning |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08604v1 |
https://arxiv.org/pdf/1909.08604v1.pdf | |
PWC | https://paperswithcode.com/paper/modelicagym-applying-reinforcement-learning |
Repo | https://github.com/ucuapps/modelicagym |
Framework | none |
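
Conceptually, the toolbox wraps an FMU-exported Modelica model behind the classic Gym interface; a sketch of such a wrapper for the Cart-Pole example is shown below. The FMU calls (`reset`, `set`, `simulate`, `get`) and variable names are hypothetical stand-ins for an FMI co-simulation API, not the toolbox's actual classes or methods.

```python
import gym
from gym import spaces
import numpy as np

class CartPoleFMUEnv(gym.Env):
    """Sketch of a Gym environment backed by an FMU-exported Modelica model.
    `self.fmu` and its reset/set/simulate/get calls are hypothetical stand-ins
    for an FMI co-simulation interface."""
    def __init__(self, fmu, force=17.0, time_step=0.05):
        super().__init__()
        self.fmu, self.force, self.dt, self.t = fmu, force, time_step, 0.0
        self.action_space = spaces.Discrete(2)                    # push left / push right
        high = np.array([2.4, np.inf, 0.21, np.inf], dtype=np.float32)
        self.observation_space = spaces.Box(-high, high)

    def reset(self):
        self.t = 0.0
        self.fmu.reset()                                           # hypothetical FMU call
        return self._observe()

    def step(self, action):
        u = self.force if action == 1 else -self.force
        self.fmu.set("f", u)                                       # hypothetical input name
        self.fmu.simulate(self.t, self.t + self.dt)                # one co-simulation step
        self.t += self.dt
        obs = self._observe()
        done = bool(abs(obs[0]) > 2.4 or abs(obs[2]) > 0.21)
        return obs, 1.0, done, {}

    def _observe(self):
        return np.array([self.fmu.get(n) for n in
                         ("x", "x_dot", "theta", "theta_dot")], dtype=np.float32)
```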
On Exploring Undetermined Relationships for Visual Relationship Detection
Title | On Exploring Undetermined Relationships for Visual Relationship Detection |
Authors | Yibing Zhan, Jun Yu, Ting Yu, Dacheng Tao |
Abstract | In visual relationship detection, human-annotated relationships can be regarded as determinate relationships. However, there is still a large amount of unlabeled data, such as object pairs with less significant relationships or even with no relationships. We refer to these unlabeled but potentially useful data as undetermined relationships. Although a vast body of literature exists, few methods exploit these undetermined relationships for visual relationship detection. In this paper, we explore the beneficial effect of undetermined relationships on visual relationship detection. We propose a novel multi-modal feature based undetermined relationship learning network (MF-URLN) and achieve great improvements in relationship detection. In detail, our MF-URLN automatically generates undetermined relationships by comparing object pairs with human-annotated data according to a designed criterion. Then, the MF-URLN extracts and fuses features of object pairs from three complementary modalities: visual, spatial, and linguistic. Furthermore, the MF-URLN uses two correlated subnetworks: one subnetwork decides the determinate confidence, and the other predicts the relationships. We evaluate the MF-URLN on two datasets: the Visual Relationship Detection (VRD) and the Visual Genome (VG) datasets. The experimental results, compared with state-of-the-art methods, verify the significant improvements made by the undetermined relationships, e.g., the top-50 relation detection recall improves from 19.5% to 23.9% on the VRD dataset. |
Tasks | |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01595v1 |
https://arxiv.org/pdf/1905.01595v1.pdf | |
PWC | https://paperswithcode.com/paper/on-exploring-undetermined-relationships-for |
Repo | https://github.com/pranoyr/visual-relationship-detection |
Framework | pytorch |
A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image
Title | A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image |
Authors | Fu Xiong, Boshen Zhang, Yang Xiao, Zhiguo Cao, Taidong Yu, Joey Tianyi Zhou, Junsong Yuan |
Abstract | For the task of 3D hand and body pose estimation from a single depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J), with end-to-end learning ability, is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely set on the depth image as local regressors for the joints. They contribute to predicting the positions of the joints in an ensemble way to enhance generalization ability. The proposed 3D articulated pose estimation paradigm differs from the state-of-the-art encoder-decoder based FCN, 3D CNN, and point-set based manners. To discover informative anchor points for a certain joint, an anchor proposal procedure is also proposed for A2J. Meanwhile, a 2D CNN (i.e., ResNet-50) is used as the backbone network to drive A2J, without using time-consuming 3D convolutional or deconvolutional layers. Experiments on 3 hand datasets and 2 body datasets verify A2J’s superiority. Meanwhile, A2J runs at a high speed of around 100 FPS on a single NVIDIA 1080Ti GPU. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.09999v1 |
https://arxiv.org/pdf/1908.09999v1.pdf | |
PWC | https://paperswithcode.com/paper/a2j-anchor-to-joint-regression-network-for-3d |
Repo | https://github.com/zhangboshen/A2J |
Framework | none |
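
The anchor-to-joint ensemble can be written as a weighted vote: every anchor predicts an offset to every joint plus an informativeness score, and each joint is the softmax-weighted sum of the anchor estimates. The sketch below shows only this aggregation step; the ResNet-50 backbone and the in-plane/depth prediction branches are omitted, and the tensor layout is an assumption.

```python
import torch

def a2j_predict(anchor_xy, offsets, logits):
    """Weighted ensemble of dense anchors voting for each joint (sketch).
    anchor_xy: (A, 2) in-plane anchor positions
    offsets:   (B, A, J, 2) predicted anchor-to-joint offsets
    logits:    (B, A, J) informativeness of each anchor for each joint
    returns    (B, J, 2) estimated joint positions
    """
    weights = torch.softmax(logits, dim=1).unsqueeze(-1)   # normalize over anchors
    votes = anchor_xy[None, :, None, :] + offsets          # each anchor's estimate per joint
    return (weights * votes).sum(dim=1)

# usage with made-up shapes: 64 anchors, 15 joints, batch of 2
anchors = torch.rand(64, 2) * 176
offsets = torch.randn(2, 64, 15, 2)
logits = torch.randn(2, 64, 15)
joints = a2j_predict(anchors, offsets, logits)             # (2, 15, 2)
```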