February 2, 2020

Paper Group AWR 61

Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem

Title Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem
Authors Etor Arza, Aritz Perez, Ekhine Irurozki, Josu Ceberio
Abstract The Quadratic Assignment Problem (QAP) is a well-known permutation-based combinatorial optimization problem with real applications in industrial and logistics environments. Motivated by the challenge that this NP-hard problem represents, it has captured the attention of the evolutionary computation community for decades. As a result, a large number of algorithms have been proposed to solve it. Among these, exact methods are only able to solve instances of size $n<40$, and thus, many heuristic and metaheuristic methods have been applied to the QAP. In this work, we follow this direction by approaching the QAP through Estimation of Distribution Algorithms (EDAs). Particularly, a non-parametric distance-based exponential probabilistic model is used. Based on the analysis of the characteristics of the QAP, and previous work in the area, we introduce Kernels of Mallows Models under the Hamming distance to the context of EDAs. The conducted experiments show that the proposed algorithm outperforms (i) classical EDAs adapted to deal with the QAP, and (ii) the EDAs specifically proposed in the literature to deal with permutation problems.
Tasks Combinatorial Optimization
Published 2019-10-19
URL https://arxiv.org/abs/1910.08800v1
PDF https://arxiv.org/pdf/1910.08800v1.pdf
PWC https://paperswithcode.com/paper/kernels-of-mallows-models-under-the-hamming
Repo https://github.com/EtorArza/SupplementaryKMMHamming
Framework none
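
To make the model concrete: a Mallows kernel under the Hamming distance places probability proportional to $\exp(-\theta \cdot d_H(\sigma, \sigma_0))$ on each permutation $\sigma$, centered at a kernel permutation $\sigma_0$. The sketch below illustrates that distribution with a toy Metropolis sampler and a bare-bones kernel EDA loop; the authors use a dedicated sampling scheme and a more careful algorithm (see the linked repo), so every function here is an illustrative stand-in, not their implementation.

```python
import math
import random

def hamming(p, q):
    """Hamming distance between permutations: number of differing positions."""
    return sum(a != b for a, b in zip(p, q))

def qap_cost(perm, F, D):
    """QAP objective: sum_{i,j} F[i][j] * D[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))

def sample_mallows(center, theta, steps=200):
    """Toy Metropolis sampler for P(sigma) ~ exp(-theta * d_H(sigma, center))."""
    sigma = list(center)
    n = len(sigma)
    for _ in range(steps):
        i, j = random.sample(range(n), 2)
        cand = sigma[:]
        cand[i], cand[j] = cand[j], cand[i]  # swaps keep it a permutation
        delta = hamming(cand, center) - hamming(sigma, center)
        if delta <= 0 or random.random() < math.exp(-theta * delta):
            sigma = cand
    return sigma

def kernel_eda(F, D, pop=20, kernels=5, gens=50, theta=1.5):
    """Bare-bones kernel EDA: keep the best solutions as kernel centers and
    sample the next population around them (selection details omitted)."""
    n = len(F)
    popu = [random.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=lambda p: qap_cost(p, F, D))
        centers = popu[:kernels]
        popu = centers + [sample_mallows(random.choice(centers), theta)
                          for _ in range(pop - kernels)]
    return min(popu, key=lambda p: qap_cost(p, F, D))
```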

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Title A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Authors Runzhe Yang, Xingyuan Sun, Karthik Narasimhan
Abstract We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
Tasks
Published 2019-08-21
URL https://arxiv.org/abs/1908.08342v2
PDF https://arxiv.org/pdf/1908.08342v2.pdf
PWC https://paperswithcode.com/paper/a-generalized-algorithm-for-multi-objective
Repo https://github.com/RunzheYang/MORL
Framework pytorch
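
A minimal tabular sketch of the preference-conditioned, vector-valued value function behind the approach. The envelope-style backup below (maximizing the scalarized value over both actions and a set of sampled preferences) is a simplification of the paper's generalized Bellman operator; the table shapes and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, n_obj, n_w = 4, 2, 2, 8
omegas = rng.dirichlet(np.ones(n_obj), size=n_w)  # sampled linear preferences
Q = np.zeros((n_w, n_s, n_a, n_obj))              # preference-conditioned vector Q

def update(wi, s, a, r_vec, s2, alpha=0.1, gamma=0.9):
    """Envelope-style backup: the target is the vector value whose scalarization
    under omega_wi is maximal over actions AND over all sampled preferences."""
    omega = omegas[wi]
    scal = (Q[:, s2] * omega).sum(-1)             # (n_w, n_a): omega . Q(s',a,w')
    wj, a2 = np.unravel_index(scal.argmax(), scal.shape)
    target = r_vec + gamma * Q[wj, s2, a2]
    Q[wi, s, a] += alpha * (target - Q[wi, s, a])

# e.g. one transition with a two-objective reward:
update(wi=0, s=0, a=1, r_vec=np.array([1.0, 0.0]), s2=2)
```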

CUP: Cluster Pruning for Compressing Deep Neural Networks

Title CUP: Cluster Pruning for Compressing Deep Neural Networks
Authors Rahul Duggal, Cao Xiao, Richard Vuduc, Jimeng Sun
Abstract We propose Cluster Pruning (CUP) for compressing and accelerating deep neural networks. Our approach prunes similar filters by clustering them based on features derived from both the incoming and outgoing weight connections. With CUP, we overcome two limitations of prior work: (1) non-uniform pruning: CUP can efficiently determine the ideal number of filters to prune in each layer of a neural network. This is in contrast to prior methods that either prune all layers uniformly or otherwise use resource-intensive methods such as manual sensitivity analysis or reinforcement learning to determine the ideal number. (2) Single-shot operation: we extend CUP to CUP-SS (for CUP single shot), whereby pruning is integrated into the initial training phase itself. This leads to large savings in training time compared to traditional pruning pipelines. Through extensive evaluation on multiple datasets (MNIST, CIFAR-10, and ImageNet) and models (VGG-16, ResNets-18/34/56), we show that CUP outperforms the recent state of the art. Specifically, CUP-SS achieves a 2.2x FLOPs reduction for a ResNet-50 model trained on ImageNet while staying within 0.9% of the top-5 accuracy. It saves over 14 hours in training time with respect to the original ResNet-50. The code to reproduce results is available.
Tasks
Published 2019-11-19
URL https://arxiv.org/abs/1911.08630v1
PDF https://arxiv.org/pdf/1911.08630v1.pdf
PWC https://paperswithcode.com/paper/cup-cluster-pruning-for-compressing-deep
Repo https://github.com/duggalrahul/CUP_Public
Framework pytorch
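
The clustering step can be sketched as follows: build a feature vector per filter from its incoming and outgoing weights, cluster the filters, and keep one representative per cluster. The exact feature construction and representative selection below are guesses at the spirit of the method, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cup_keep_indices(W_in, W_out, n_keep):
    """Cluster-pruning sketch. W_in: this layer's weights (out_ch, in_ch, k, k);
    W_out: next layer's weights (next_ch, out_ch, k, k). Build one feature per
    filter from its incoming and outgoing connections, cluster the filters,
    and keep the member closest to each cluster centroid."""
    out_ch = W_in.shape[0]
    feats = np.concatenate([W_in.reshape(out_ch, -1),
                            W_out.transpose(1, 0, 2, 3).reshape(out_ch, -1)],
                           axis=1)
    labels = AgglomerativeClustering(n_clusters=n_keep).fit_predict(feats)
    keep = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        centroid = feats[members].mean(axis=0)
        keep.append(members[np.argmin(
            np.linalg.norm(feats[members] - centroid, axis=1))])
    return sorted(keep)
```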

Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914

Title Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914
Authors Rocco Tripodi, Massimo Warglien, Simon Levis Sullam, Deborah Paci
Abstract We investigate some aspects of the history of antisemitism in France, one of the cradles of modern antisemitism, using diachronic word embeddings. We constructed a large corpus of French books and periodical issues that contain a keyword related to Jews and performed a diachronic word embedding over the 1789-1914 period. We studied the changes over time in the semantic spaces of 4 target words and performed embedding projections over 6 streams of antisemitic discourse. This allowed us to track the evolution of antisemitic bias in the religious, economic, socio-political, racial, ethnic and conspiratorial domains. Projections show a trend of growing antisemitism, especially in the years starting in the mid-1880s and culminating in the Dreyfus affair. Our analysis also allows us to highlight the peculiar adverse bias towards Judaism in the broader context of other religions.
Tasks Word Embeddings
Published 2019-06-04
URL https://arxiv.org/abs/1906.01440v1
PDF https://arxiv.org/pdf/1906.01440v1.pdf
PWC https://paperswithcode.com/paper/tracing-antisemitic-language-through
Repo https://github.com/roccotrip/antisem
Framework none
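
A common way to realize such embedding projections, sketched below: build an axis for one discourse stream from seed-word centroids and project a target word onto it in each time slice. The seed lists and this particular axis construction are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def bias_axis(emb, pos_seeds, neg_seeds):
    """Axis for one discourse stream: normalized difference of the centroids
    of two seed-word sets. emb maps word -> vector for one time slice."""
    pos = np.mean([emb[w] for w in pos_seeds], axis=0)
    neg = np.mean([emb[w] for w in neg_seeds], axis=0)
    v = pos - neg
    return v / np.linalg.norm(v)

def projection(emb, target, axis):
    """Cosine projection of a target word onto the axis; computing this for
    every time slice yields a diachronic bias curve."""
    v = emb[target]
    return float(v @ axis / np.linalg.norm(v))
```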

Image Super-Resolution via Attention based Back Projection Networks

Title Image Super-Resolution via Attention based Back Projection Networks
Authors Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Wan-Chi Siu, Yui-Lam Chan
Abstract Deep learning based image Super-Resolution (SR) has shown rapid development due to its ability to digest big data. Generally, deeper and wider networks can extract richer feature maps and generate SR images with remarkable quality. However, the more complex the network, the more computation time is required for practical applications. It is important to have a simplified network for efficient image SR. In this paper, we propose an Attention based Back Projection Network (ABPN) for image super-resolution. Similar to some recent works, we believe that the back projection mechanism can be further developed for SR. Enhanced back projection blocks are suggested to iteratively update low- and high-resolution feature residues. Inspired by recent studies on attention models, we propose a Spatial Attention Block (SAB) to learn the cross-correlation across features at different layers. Based on the assumption that a good SR image should be close to the original LR image after down-sampling, we propose a Refined Back Projection Block (RBPB) for final reconstruction. Extensive experiments on public datasets and the AIM2019 Image Super-Resolution Challenge datasets show that the proposed ABPN can provide state-of-the-art or even better performance in both quantitative and qualitative measurements.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-10-10
URL https://arxiv.org/abs/1910.04476v1
PDF https://arxiv.org/pdf/1910.04476v1.pdf
PWC https://paperswithcode.com/paper/image-super-resolution-via-attention-based
Repo https://github.com/Holmes-Alan/ABPN
Framework pytorch
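
A minimal PyTorch sketch of a spatial attention block that derives attention weights from the cross-correlation of two feature maps. The 1x1 projections, channel reduction, and residual connection are assumptions in the spirit of the SAB, not the authors' exact layer.

```python
import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """Attention from the cross-correlation of two feature maps (B, C, H, W)."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 2, 1)
        self.k = nn.Conv2d(ch, ch // 2, 1)
        self.v = nn.Conv2d(ch, ch, 1)

    def forward(self, x_a, x_b):
        b, c, h, w = x_a.shape
        q = self.q(x_a).flatten(2)                            # (B, C/2, HW)
        k = self.k(x_b).flatten(2)                            # (B, C/2, HW)
        v = self.v(x_b).flatten(2)                            # (B, C,   HW)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return out + x_a                                      # residual
```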

Unsupervised Multi-Task Feature Learning on Point Clouds

Title Unsupervised Multi-Task Feature Learning on Point Clouds
Authors Kaveh Hassani, Mike Haley
Abstract We introduce an unsupervised multi-task model to jointly learn point and shape features on point clouds. We define three unsupervised tasks including clustering, reconstruction, and self-supervised classification to train a multi-scale graph-based encoder. We evaluate our model on shape classification and segmentation benchmarks. The results suggest that it outperforms prior state-of-the-art unsupervised models: in the ModelNet40 classification task, it achieves an accuracy of 89.1%, and in the ShapeNet segmentation task, it achieves an mIoU of 68.2 and an accuracy of 88.6%.
Tasks
Published 2019-10-18
URL https://arxiv.org/abs/1910.08207v1
PDF https://arxiv.org/pdf/1910.08207v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-multi-task-feature-learning-on
Repo https://github.com/AnTao97/UnsupervisedPointCloudReconstruction
Framework pytorch
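
The joint objective can be sketched as a weighted sum of the three task losses. Below, reconstruction uses plain MSE as a stand-in for a point-cloud loss such as Chamfer distance, and the clustering targets are assumed to be pseudo-labels (e.g., from k-means); the weights and loss choices are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def multitask_loss(recon, points, cluster_logits, cluster_ids,
                   self_logits, self_labels, w=(1.0, 1.0, 1.0)):
    """Joint objective sketch: reconstruction (MSE standing in for Chamfer),
    cluster-assignment prediction against pseudo-labels, and self-supervised
    classification, combined with assumed weights."""
    l_rec = F.mse_loss(recon, points)
    l_clu = F.cross_entropy(cluster_logits, cluster_ids)
    l_self = F.cross_entropy(self_logits, self_labels)
    return w[0] * l_rec + w[1] * l_clu + w[2] * l_self
```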

An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense

Title An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense
Authors Chao Tang, Yifei Fan, Anthony Yezzi
Abstract The safety and robustness of learning-based decision-making systems are under threat from adversarial examples, as imperceptible perturbations can mislead neural networks to completely different outputs. In this paper, we present an adaptive view of the issue via evaluating various test-time smoothing defenses against white-box untargeted adversarial examples. Through controlled experiments with a pretrained ResNet-152 on ImageNet, we first illustrate the non-monotonic relation between adversarial attacks and smoothing defenses. Then at the dataset level, we observe large variance among samples and show that it is easy to inflate accuracy (even to 100%) or build large-scale (i.e., with size ~10^4) subsets on which a designated method outperforms others by a large margin. Finally, at the sample level, as different adversarial examples require different degrees of defense, the potential advantages of iterative methods are also discussed. We hope this paper reveals useful behaviors of test-time defenses, which could help improve the evaluation process for adversarial robustness in the future.
Tasks Decision Making
Published 2019-11-26
URL https://arxiv.org/abs/1911.11881v1
PDF https://arxiv.org/pdf/1911.11881v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-view-of-adversarial-robustness
Repo https://github.com/mkt1412/testtime-smoothing-defense
Framework pytorch
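
One of the test-time smoothers such an evaluation covers is a simple Gaussian blur applied to the input before classification; a sketch (kernel size and sigma are illustrative, and the paper sweeps such parameters):

```python
import torch
import torch.nn.functional as F

def gaussian_smooth(x, sigma=1.0, ksize=5):
    """Blur a batch of images (B, C, H, W) with a depthwise Gaussian kernel."""
    ax = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k2d = g[:, None] * g[None, :]
    k2d = (k2d / k2d.sum()).view(1, 1, ksize, ksize)
    weight = k2d.expand(x.size(1), 1, ksize, ksize)  # one kernel per channel
    return F.conv2d(x, weight, padding=ksize // 2, groups=x.size(1))

# usage: preds = model(gaussian_smooth(adv_images)).argmax(dim=1)
```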

Shaping Visual Representations with Language for Few-shot Classification

Title Shaping Visual Representations with Language for Few-shot Classification
Authors Jesse Mu, Percy Liang, Noah Goodman
Abstract Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models which learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations are more generalizable, compared to meta-learning baselines and approaches that explicitly use language as a bottleneck for classification.
Tasks Meta-Learning, Representation Learning
Published 2019-11-06
URL https://arxiv.org/abs/1911.02683v1
PDF https://arxiv.org/pdf/1911.02683v1.pdf
PWC https://paperswithcode.com/paper/shaping-visual-representations-with-language
Repo https://github.com/jayelm/lsl
Framework none
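
The train-time objective can be sketched as classification plus an auxiliary language-decoding loss on the same representation; at test time only the classifier branch is used. The shapes and loss weight below are assumptions, and padding handling is omitted.

```python
import torch.nn.functional as F

def lsl_loss(class_logits, labels, lang_logits, lang_tokens, lambda_lang=1.0):
    """Classify and decode the task description from the same representation.
    lang_logits: (B, T, V) decoder logits; lang_tokens: (B, T) target ids."""
    l_cls = F.cross_entropy(class_logits, labels)
    l_lang = F.cross_entropy(lang_logits.flatten(0, 1), lang_tokens.flatten())
    return l_cls + lambda_lang * l_lang
```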

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

Title EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Authors Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang
Abstract Reducing the test-time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices. To achieve this goal, we introduce a novel network reparameterization based on the Kronecker-factored eigenbasis (KFE), and then apply Hessian-based structured pruning methods in this basis. As opposed to existing Hessian-based pruning algorithms, which prune in parameter coordinates, our method works in the KFE, where different weights are approximately independent, enabling accurate pruning and fast computation. We demonstrate empirically the effectiveness of the proposed method through extensive experiments. In particular, we highlight that the improvements are especially significant for more challenging datasets and networks. With negligible loss of accuracy, an iterative-pruning version gives a 10$\times$ reduction in model size and an 8$\times$ reduction in FLOPs on wide ResNet32.
Tasks Network Pruning
Published 2019-05-15
URL https://arxiv.org/abs/1905.05934v1
PDF https://arxiv.org/pdf/1905.05934v1.pdf
PWC https://paperswithcode.com/paper/eigendamage-structured-pruning-in-the
Repo https://github.com/alecwangcq/EigenDamage-Pytorch
Framework pytorch
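
The core rotation can be sketched in a few lines: given K-FAC-style covariances for a layer, rotate the weight matrix into their eigenbases and score each rotated weight by its squared value times the Kronecker eigenvalue. How the covariances are estimated, and the full pruning pipeline, are left out; this is a sketch of the basis change, not the authors' code.

```python
import numpy as np

def kfe_importance(W, A, G):
    """Rotate a layer's weight matrix W (out x in) into the eigenbases of the
    input covariance A ~ E[a a^T] and gradient covariance G ~ E[g g^T];
    low-importance entries in the KFE are pruning candidates."""
    lam_a, Ua = np.linalg.eigh(A)
    lam_g, Ug = np.linalg.eigh(G)
    W_kfe = Ug.T @ W @ Ua                         # weights in the KFE
    importance = W_kfe ** 2 * np.outer(lam_g, lam_a)
    return W_kfe, importance
```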

Learning Data Manipulation for Augmentation and Weighting

Title Learning Data Manipulation for Augmentation and Weighting
Authors Zhiting Hu, Bowen Tan, Ruslan Salakhutdinov, Tom Mitchell, Eric P. Xing
Abstract Manipulating data, such as weighting data examples or augmenting with new instances, has been increasingly used to improve model training. Previous work has studied various rule- or learning-based approaches designed for specific types of data manipulation. In this work, we propose a new method that supports learning different manipulation schemes with the same gradient-based algorithm. Our approach builds upon a recent connection between supervised learning and reinforcement learning (RL), and adapts an off-the-shelf reward learning algorithm from RL for joint data manipulation learning and model training. Different parameterizations of the “data reward” function instantiate different manipulation schemes. We showcase data augmentation that learns a text transformation network, and data weighting that dynamically adapts the data sample importance. Experiments show the resulting algorithms significantly improve image and text classification performance in low-data regimes and on class-imbalance problems.
Tasks Data Augmentation, Text Classification
Published 2019-10-28
URL https://arxiv.org/abs/1910.12795v1
PDF https://arxiv.org/pdf/1910.12795v1.pdf
PWC https://paperswithcode.com/paper/learning-data-manipulation-for-augmentation
Repo https://github.com/tanyuqian/learning-data-manipulation
Framework pytorch
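
In the data-weighting instantiation, weights can be derived from how much each training example helps a validation batch after a virtual update step. The sketch below follows that general recipe (closely related to learning-to-reweight methods) rather than the paper's exact reward-learning algorithm; it assumes PyTorch >= 2.0 for `torch.func.functional_call` and a loss built with `reduction='none'`.

```python
import torch
from torch.func import functional_call  # PyTorch >= 2.0

def example_weights(model, crit, x, y, xv, yv, lr=0.01):
    """Weight each training example by how much upweighting it would reduce
    the validation loss after one virtual SGD step. crit must return
    per-example losses (reduction='none')."""
    eps = torch.zeros(x.size(0), requires_grad=True)
    per_ex = crit(model(x), y)                               # (B,)
    grads = torch.autograd.grad((eps * per_ex).sum(),
                                list(model.parameters()), create_graph=True)
    virtual = {n: p - lr * g for (n, p), g in
               zip(model.named_parameters(), grads)}         # virtual step
    val_loss = crit(functional_call(model, virtual, (xv,)), yv).mean()
    w = torch.clamp(-torch.autograd.grad(val_loss, eps)[0], min=0.0)
    return w / (w.sum() + 1e-8)                              # normalized weights
```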

Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs

Title Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs
Authors Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander Smola, Zheng Zhang
Abstract Accelerating research in the emerging field of deep graph learning requires new tools. Such systems should support graph as the core abstraction and take care to maintain both forward (i.e., supporting new research ideas) and backward (i.e., integration with existing components) compatibility. In this paper, we present Deep Graph Library (DGL). DGL enables arbitrary message handling and mutation operators, flexible propagation rules, and is framework agnostic so as to leverage high-performance tensor, autograd operations, and other feature extraction modules already available in existing frameworks. DGL carefully handles the sparse and irregular graph structure, deals with graphs big and small which may change dynamically, fuses operations, and performs auto-batching, all to take advantage of modern hardware. DGL has been tested on a variety of models, including but not limited to the popular Graph Neural Networks (GNNs) and their variants, with promising speed, memory footprint and scalability.
Tasks Node Classification
Published 2019-09-03
URL https://arxiv.org/abs/1909.01315v1
PDF https://arxiv.org/pdf/1909.01315v1.pdf
PWC https://paperswithcode.com/paper/deep-graph-library-towards-efficient-and
Repo https://github.com/dmlc/dgl
Framework pytorch
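
DGL's core abstraction in a few lines: a graph object whose nodes carry tensors, updated with built-in message and reduce functions (API as in DGL 0.5 and later):

```python
import dgl
import dgl.function as fn
import torch

# A tiny 4-node cycle: edges 0->1, 1->2, 2->3, 3->0.
g = dgl.graph((torch.tensor([0, 1, 2, 3]), torch.tensor([1, 2, 3, 0])))
g.ndata['h'] = torch.randn(4, 8)

# One round of message passing: copy source features as messages,
# then mean-reduce incoming messages at each destination node.
g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h'))
print(g.ndata['h'].shape)  # torch.Size([4, 8])
```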

Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation

Title Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation
Authors Yundong Zhang, Hang Wu, Huiye Liu, Li Tong, May D Wang
Abstract Deep Learning has thrived on the emergence of biomedical big data. However, medical datasets acquired at different institutions have inherent bias caused by various confounding factors such as operation policies, machine protocols, treatment preferences, and so on. As a result, models trained on one dataset, regardless of volume, cannot be confidently utilized for the others. In this study, we investigated model robustness to dataset bias using three large-scale Chest X-ray datasets: first, we assessed the dataset bias using a vanilla training baseline; second, we proposed a novel multi-source domain generalization model by (a) designing a new bias-regularized loss function and (b) synthesizing new data for domain augmentation. We showed that our model significantly outperformed the baseline and other approaches on data from unseen domains in terms of accuracy and various bias measures, without retraining or finetuning. Our method is generally applicable to other biomedical data, providing new algorithms for training models robust to bias for big data analysis and applications. Demo training code is publicly available.
Tasks Domain Generalization
Published 2019-10-12
URL https://arxiv.org/abs/1910.06745v3
PDF https://arxiv.org/pdf/1910.06745v3.pdf
PWC https://paperswithcode.com/paper/mitigating-the-effect-of-dataset-bias-on
Repo https://github.com/ydzhang12345/Domain-Generalization-by-Domain-guided-Multilayer-Cross-gradient-Training
Framework pytorch
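
As a loose illustration only: a multi-source objective that penalizes the spread of per-domain losses so no single source dominates training. This is a plausible stand-in for the idea, not necessarily the paper's bias-regularized loss.

```python
import torch

def multi_source_loss(per_domain_losses, lam=0.1):
    """Mean loss over source domains plus a variance penalty across domains;
    lam and the variance form are assumptions for the sketch."""
    losses = torch.stack(per_domain_losses)
    return losses.mean() + lam * losses.var()
```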

BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

Title BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
Authors Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann
Abstract We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
Tasks Face Detection
Published 2019-07-11
URL https://arxiv.org/abs/1907.05047v2
PDF https://arxiv.org/pdf/1907.05047v2.pdf
PWC https://paperswithcode.com/paper/blazeface-sub-millisecond-neural-face
Repo https://github.com/gordinmitya/mobile-dnn
Framework tf
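
The tie-resolution idea can be sketched as a blending alternative to hard NMS: instead of keeping only the top-scoring detection in each overlap group, merge the group into a score-weighted average box. The threshold and weighting below are assumptions in the spirit of the strategy, not the paper's exact rule.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-9)

def blend_nms(boxes, scores, iou_thr=0.3):
    """Merge each overlap group into a score-weighted average box."""
    order = scores.argsort()[::-1]
    used = np.zeros(len(boxes), dtype=bool)
    out = []
    for i in order:
        if used[i]:
            continue
        group = (~used) & (iou(boxes[i], boxes) >= iou_thr)
        used |= group
        w = scores[group]
        out.append((boxes[group] * w[:, None]).sum(0) / w.sum())
    return np.array(out)
```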

Efficient Graph Generation with Graph Recurrent Attention Networks

Title Efficient Graph Generation with Graph Recurrent Attention Networks
Authors Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L. Hamilton, David Duvenaud, Raquel Urtasun, Richard S. Zemel
Abstract We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-regressive conditioning between the already-generated and to-be-generated parts of the graph using Graph Neural Networks (GNNs) with attention. This not only reduces the dependency on node ordering but also bypasses the long-term bottleneck caused by the sequential nature of RNNs. Moreover, we parameterize the output distribution per block using a mixture of Bernoulli, which captures the correlations among generated edges within the block. Finally, we propose to handle node orderings in generation by marginalizing over a family of canonical orderings. On standard benchmarks, we achieve state-of-the-art time efficiency and sample quality compared to previous models. Additionally, we show our model is capable of generating large graphs of up to 5K nodes with good quality. To the best of our knowledge, GRAN is the first deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN.
Tasks Graph Generation
Published 2019-10-02
URL https://arxiv.org/abs/1910.00760v1
PDF https://arxiv.org/pdf/1910.00760v1.pdf
PWC https://paperswithcode.com/paper/efficient-graph-generation-with-graph
Repo https://github.com/lrjconan/GRAN
Framework pytorch
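
The per-block output distribution is a mixture of Bernoulli over the block's candidate edges; its log-likelihood can be written compactly (the shapes below are assumptions for the sketch, and producing the logits from the GNN is omitted):

```python
import torch
import torch.nn.functional as F

def mixture_bernoulli_logprob(edge_logits, mix_logits, edges):
    """Log-likelihood of one block's edges under a K-component mixture of
    Bernoulli. edge_logits: (K, E) per-component edge logits; mix_logits: (K,)
    mixture weights; edges: (E,) float 0/1 targets."""
    log_pi = torch.log_softmax(mix_logits, dim=0)                  # (K,)
    ll = -F.binary_cross_entropy_with_logits(
        edge_logits, edges.expand_as(edge_logits),
        reduction='none').sum(-1)                                  # (K,)
    return torch.logsumexp(log_pi + ll, dim=0)
```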

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

Title DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation
Authors Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
Abstract To improve the resilience of distributed training to worst-case, or Byzantine, node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only have limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps: a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, and has a per-iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders-of-magnitude improvements in accuracy and speed over many state-of-the-art Byzantine-resilient approaches.
Tasks
Published 2019-07-29
URL https://arxiv.org/abs/1907.12205v2
PDF https://arxiv.org/pdf/1907.12205v2.pdf
PWC https://paperswithcode.com/paper/detox-a-redundancy-based-framework-for-faster
Repo https://github.com/hwang595/DETOX
Framework pytorch
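
The two-step structure can be sketched with coordinate-wise medians standing in for both stages: within each redundancy group (whose members should have computed identical gradients, so the paper can filter by majority vote) and then across the filtered group outputs, where any robust aggregator could be plugged in. A minimal sketch under those assumptions:

```python
import numpy as np

def detox_aggregate(grads, group_size=3):
    """grads: list of gradient vectors, one per compute node, arranged so that
    consecutive group_size nodes form a redundancy group that computed the
    same gradient. Filter within groups, then aggregate across groups."""
    groups = [grads[i:i + group_size] for i in range(0, len(grads), group_size)]
    filtered = [np.median(np.stack(g), axis=0) for g in groups]  # filtering
    return np.median(np.stack(filtered), axis=0)                 # aggregation
```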