Paper Group AWR 61
Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem. A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation. CUP: Cluster Pruning for Compressing Deep Neural Networks. Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914. Image Super-Resolution via Attention based Back Projection Networks. Unsupervised Multi-Task Feature Learning on Point Clouds. An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense. Shaping Visual Representations with Language for Few-shot Classification. EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis. Learning Data Manipulation for Augmentation and Weighting. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation. BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs. Efficient Graph Generation with Graph Recurrent Attention Networks. DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation.
Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem
Title | Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem |
Authors | Etor Arza, Aritz Perez, Ekhine Irurozki, Josu Ceberio |
Abstract | The Quadratic Assignment Problem (QAP) is a well-known permutation-based combinatorial optimization problem with real applications in industrial and logistics environments. Motivated by the challenge that this NP-hard problem represents, it has captured the attention of the evolutionary computation community for decades. As a result, a large number of algorithms have been proposed to solve it. Among these, exact methods are only able to solve instances of size $n<40$, and thus many heuristic and metaheuristic methods have been applied to the QAP. In this work, we follow this direction by approaching the QAP through Estimation of Distribution Algorithms (EDAs). In particular, a non-parametric distance-based exponential probabilistic model is used. Based on an analysis of the characteristics of the QAP and on previous work in the area, we introduce Kernels of Mallows Models under the Hamming distance to the context of EDAs. The conducted experiments show that, on the QAP, the proposed algorithm outperforms (i) classical EDAs adapted to deal with the QAP and (ii) the EDAs proposed in the literature specifically for permutation problems. |
Tasks | Combinatorial Optimization |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08800v1 |
PDF | https://arxiv.org/pdf/1910.08800v1.pdf |
PWC | https://paperswithcode.com/paper/kernels-of-mallows-models-under-the-hamming |
Repo | https://github.com/EtorArza/SupplementaryKMMHamming |
Framework | none |
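As a rough illustration of the probabilistic model named in the title (not the authors' implementation; the spread parameter `theta` and the toy sizes are assumptions), the sketch below scores candidate permutations under a Mallows model centered at a reference permutation, using the Hamming distance as the discrepancy measure:

```python
# Minimal sketch, assuming a single kernel (consensus) permutation and a
# hypothetical spread parameter `theta`; a full EDA would fit and sample
# a mixture of such kernels each generation.
import numpy as np

def hamming(sigma, pi):
    """Number of positions where two permutations disagree."""
    return int(np.sum(np.asarray(sigma) != np.asarray(pi)))

def mallows_weight(sigma, center, theta=0.5):
    """Unnormalized Mallows probability: exp(-theta * d(sigma, center))."""
    return np.exp(-theta * hamming(sigma, center))

rng = np.random.default_rng(0)
center = np.arange(8)                        # reference permutation
candidates = [rng.permutation(8) for _ in range(5)]
for c in candidates:
    print(c, round(mallows_weight(c, center), 4))
```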
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Title | A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation |
Authors | Runzhe Yang, Xingyuan Sun, Karthik Narasimhan |
Abstract | We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08342v2 |
PDF | https://arxiv.org/pdf/1908.08342v2.pdf |
PWC | https://paperswithcode.com/paper/a-generalized-algorithm-for-multi-objective |
Repo | https://github.com/RunzheYang/MORL |
Framework | pytorch |
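One way to picture the preference-conditioned control the abstract describes is the toy sketch below (all names, shapes, and the set of sampled preferences are illustrative stand-ins, not the released code): the agent acts greedily with respect to the scalarized value w . Q, and the generalized (envelope) backup maximizes over both actions and sampled preferences.

```python
# Toy envelope-style backup for multi-objective Q-values; the real method
# learns one preference-conditioned network rather than separate Q-tables.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_obj = 4, 2
w = np.array([0.7, 0.3])                     # linear preference over objectives

# One vector-valued Q-table per sampled preference (toy stand-ins).
Q_by_pref = {i: rng.random((n_actions, n_obj)) for i in range(8)}

def act(Q, w):
    """Greedy action w.r.t. the scalarized value w . Q(s, a)."""
    return int(np.argmax(Q @ w))

def envelope_target(Q_by_pref, w):
    """Envelope backup: max of w . Q over actions *and* sampled preferences."""
    return max(float((Q @ w).max()) for Q in Q_by_pref.values())

print(act(Q_by_pref[0], w), round(envelope_target(Q_by_pref, w), 3))
```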
CUP: Cluster Pruning for Compressing Deep Neural Networks
Title | CUP: Cluster Pruning for Compressing Deep Neural Networks |
Authors | Rahul Duggal, Cao Xiao, Richard Vuduc, Jimeng Sun |
Abstract | We propose Cluster Pruning (CUP) for compressing and accelerating deep neural networks. Our approach prunes similar filters by clustering them based on features derived from both the incoming and outgoing weight connections. With CUP, we overcome two limitations of prior work: (1) non-uniform pruning: CUP can efficiently determine the ideal number of filters to prune in each layer of a neural network. This is in contrast to prior methods that either prune all layers uniformly or use resource-intensive methods such as manual sensitivity analysis or reinforcement learning to determine the ideal number. (2) Single-shot operation: we extend CUP to CUP-SS (for CUP single shot), whereby pruning is integrated into the initial training phase itself. This leads to large savings in training time compared to traditional pruning pipelines. Through extensive evaluation on multiple datasets (MNIST, CIFAR-10, and ImageNet) and models (VGG-16, ResNets-18/34/56), we show that CUP outperforms the recent state of the art. Specifically, CUP-SS achieves a 2.2x reduction in FLOPs for a ResNet-50 model trained on ImageNet while staying within 0.9% top-5 accuracy, and it saves over 14 hours in training time relative to the original ResNet-50. The code to reproduce the results is available. |
Tasks | |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08630v1 |
PDF | https://arxiv.org/pdf/1911.08630v1.pdf |
PWC | https://paperswithcode.com/paper/cup-cluster-pruning-for-compressing-deep |
Repo | https://github.com/duggalrahul/CUP_Public |
Framework | pytorch |
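An illustrative sketch of the clustering step described in the abstract (not the released CUP code; the layer shapes, target cluster count, and the use of scikit-learn's agglomerative clustering are assumptions): each filter is featurized by its flattened incoming weights concatenated with its outgoing weights, similar filters are grouped, and one representative per cluster is kept.

```python
# Sketch: cluster the 64 filters of one conv layer using incoming + outgoing
# weight features, then keep one filter per cluster (the rest are pruned).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
W_in  = rng.standard_normal((64, 32, 3, 3))   # this layer: 64 filters
W_out = rng.standard_normal((128, 64, 3, 3))  # next layer consumes 64 channels

feats = np.concatenate(
    [W_in.reshape(64, -1), W_out.transpose(1, 0, 2, 3).reshape(64, -1)], axis=1)

labels = AgglomerativeClustering(n_clusters=40).fit_predict(feats)
keep = [np.flatnonzero(labels == c)[0] for c in range(40)]  # one per cluster
print(sorted(keep)[:10])
```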
Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914
Title | Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914 |
Authors | Rocco Tripodi, Massimo Warglien, Simon Levis Sullam, Deborah Paci |
Abstract | We investigate aspects of the history of antisemitism in France, one of the cradles of modern antisemitism, using diachronic word embeddings. We constructed a large corpus of French books and periodical issues that contain a keyword related to Jews and trained diachronic word embeddings over the 1789-1914 period. We studied the changes over time in the semantic spaces of 4 target words and performed embedding projections over 6 streams of antisemitic discourse. This allowed us to track the evolution of antisemitic bias in the religious, economic, socio-political, racial, ethnic, and conspiratorial domains. The projections show a trend of growing antisemitism, especially in the years starting in the mid-1880s and culminating in the Dreyfus affair. Our analysis also highlights the peculiar adverse bias towards Judaism in the broader context of other religions. |
Tasks | Word Embeddings |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01440v1 |
PDF | https://arxiv.org/pdf/1906.01440v1.pdf |
PWC | https://paperswithcode.com/paper/tracing-antisemitic-language-through |
Repo | https://github.com/roccotrip/antisem |
Framework | none |
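A minimal sketch of an "embedding projection" in the sense used above (the seed words, vectors, and time slices here are synthetic placeholders, not the paper's corpus): for each time slice, a target word's vector is projected onto an axis defined by seed words for one discourse stream, giving a per-decade bias score.

```python
# Per-decade projection of a target word onto a discourse axis built from
# seed words. All embeddings below are random stand-ins for trained ones.
import numpy as np

def stream_axis(emb, seeds):
    """Mean of the seed vectors, normalized -- a crude discourse axis."""
    v = np.mean([emb[w] for w in seeds], axis=0)
    return v / np.linalg.norm(v)

def projection(emb, target, seeds):
    """Cosine of the target word onto the discourse axis."""
    t = emb[target] / np.linalg.norm(emb[target])
    return float(t @ stream_axis(emb, seeds))

rng = np.random.default_rng(0)
vocab = ["juif", "usure", "banque", "religion"]        # hypothetical words
emb_by_decade = {d: {w: rng.standard_normal(50) for w in vocab}
                 for d in range(1790, 1910, 10)}
for d, emb in emb_by_decade.items():
    print(d, round(projection(emb, "juif", ["usure", "banque"]), 3))
```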
Image Super-Resolution via Attention based Back Projection Networks
Title | Image Super-Resolution via Attention based Back Projection Networks |
Authors | Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Wan-Chi Siu, Yui-Lam Chan |
Abstract | Deep-learning-based image Super-Resolution (SR) has shown rapid development due to its ability to digest big data. Generally, deeper and wider networks can extract richer feature maps and generate SR images of remarkable quality. However, the more complex the network, the more computation time it requires, so a simplified network is important for efficient image SR in practice. In this paper, we propose an Attention based Back Projection Network (ABPN) for image super-resolution. Like some recent works, we believe that the back projection mechanism can be further developed for SR. Enhanced back projection blocks are used to iteratively update low- and high-resolution feature residues. Inspired by recent studies on attention models, we propose a Spatial Attention Block (SAB) to learn the cross-correlation across features at different layers. Based on the assumption that a good SR image should be close to the original LR image after down-sampling, we propose a Refined Back Projection Block (RBPB) for final reconstruction. Extensive experiments on public datasets and the AIM2019 Image Super-Resolution Challenge datasets show that the proposed ABPN can provide state-of-the-art or even better performance in both quantitative and qualitative measurements. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04476v1 |
PDF | https://arxiv.org/pdf/1910.04476v1.pdf |
PWC | https://paperswithcode.com/paper/image-super-resolution-via-attention-based |
Repo | https://github.com/Holmes-Alan/ABPN |
Framework | pytorch |
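A simplified back-projection block in the spirit of the abstract (my reading, not the released ABPN code; channel count, kernel sizes, and the single-block structure are assumptions): project low-resolution features up, project back down, and use the low-resolution residue to refine the upsampled features.

```python
# Toy back-projection block: up-project, measure the LR-space residue of the
# projection, and use it to correct the HR estimate.
import torch
import torch.nn as nn

class BackProjectionBlock(nn.Module):
    def __init__(self, ch, scale=2):
        super().__init__()
        k, p = scale * 2, scale // 2
        self.up     = nn.ConvTranspose2d(ch, ch, k, stride=scale, padding=p)
        self.down   = nn.Conv2d(ch, ch, k, stride=scale, padding=p)
        self.up_err = nn.ConvTranspose2d(ch, ch, k, stride=scale, padding=p)

    def forward(self, lr):
        hr = self.up(lr)                 # project LR features up
        err = lr - self.down(hr)         # LR-space residue of the projection
        return hr + self.up_err(err)     # correct the HR estimate

x = torch.randn(1, 32, 16, 16)
print(BackProjectionBlock(32)(x).shape)  # torch.Size([1, 32, 32, 32])
```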
Unsupervised Multi-Task Feature Learning on Point Clouds
Title | Unsupervised Multi-Task Feature Learning on Point Clouds |
Authors | Kaveh Hassani, Mike Haley |
Abstract | We introduce an unsupervised multi-task model to jointly learn point and shape features on point clouds. We define three unsupervised tasks, namely clustering, reconstruction, and self-supervised classification, to train a multi-scale graph-based encoder. We evaluate our model on shape classification and segmentation benchmarks. The results suggest that it outperforms prior state-of-the-art unsupervised models: on the ModelNet40 classification task it achieves an accuracy of 89.1%, and on the ShapeNet segmentation task it achieves an mIoU of 68.2 and an accuracy of 88.6%. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08207v1 |
PDF | https://arxiv.org/pdf/1910.08207v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-multi-task-feature-learning-on |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
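A sketch of the joint objective implied by the three tasks above (the loss names, weights, and the MSE stand-in for a point-cloud reconstruction loss such as Chamfer distance are all assumptions, not the authors' code): the shared encoder is trained by summing a clustering loss, a reconstruction loss, and a self-supervised classification loss.

```python
# Toy multi-task loss over the three unsupervised objectives; MSE stands in
# for a proper point-set distance, and pseudo-labels are random placeholders.
import torch

def multi_task_loss(recon, points, cluster_logits, cluster_ids,
                    ss_logits, ss_labels, w=(1.0, 1.0, 1.0)):
    l_cluster = torch.nn.functional.cross_entropy(cluster_logits, cluster_ids)
    l_recon   = torch.nn.functional.mse_loss(recon, points)
    l_selfsup = torch.nn.functional.cross_entropy(ss_logits, ss_labels)
    return w[0] * l_cluster + w[1] * l_recon + w[2] * l_selfsup

B, N, K = 4, 1024, 50
loss = multi_task_loss(torch.randn(B, N, 3), torch.randn(B, N, 3),
                       torch.randn(B, K), torch.randint(0, K, (B,)),
                       torch.randn(B, 8), torch.randint(0, 8, (B,)))
print(float(loss))
```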
An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense
Title | An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense |
Authors | Chao Tang, Yifei Fan, Anthony Yezzi |
Abstract | The safety and robustness of learning-based decision-making systems are threatened by adversarial examples, as imperceptible perturbations can mislead neural networks to completely different outputs. In this paper, we present an adaptive view of the issue by evaluating various test-time smoothing defenses against white-box untargeted adversarial examples. Through controlled experiments with a pretrained ResNet-152 on ImageNet, we first illustrate the non-monotonic relation between adversarial attacks and smoothing defenses. Then, at the dataset level, we observe large variance among samples and show that it is easy to inflate accuracy (even to 100%) or to build large-scale (i.e., size ~10^4) subsets on which a designated method outperforms others by a large margin. Finally, at the sample level, since different adversarial examples require different degrees of defense, we also discuss the potential advantages of iterative methods. We hope this paper reveals useful behaviors of test-time defenses that could help improve the evaluation process for adversarial robustness in the future. |
Tasks | Decision Making |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11881v1 |
PDF | https://arxiv.org/pdf/1911.11881v1.pdf |
PWC | https://paperswithcode.com/paper/an-adaptive-view-of-adversarial-robustness |
Repo | https://github.com/mkt1412/testtime-smoothing-defense |
Framework | pytorch |
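A minimal sketch of a test-time smoothing defense of the kind evaluated above (the kernel size and sigma are illustrative knobs, and `weights=None` assumes a recent torchvision API; the paper studies how accuracy varies non-monotonically with such settings): blur the possibly-adversarial input before classification.

```python
# Smooth-then-classify: apply a Gaussian blur to the input at test time
# before feeding it to the (normally pretrained) classifier.
import torch
import torchvision.transforms as T
from torchvision.models import resnet152

model = resnet152(weights=None).eval()   # pretrained weights omitted here
smooth = T.GaussianBlur(kernel_size=5, sigma=1.0)

x = torch.randn(1, 3, 224, 224)          # stand-in for an adversarial image
with torch.no_grad():
    pred = model(smooth(x)).argmax(1)
print(int(pred))
```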
Shaping Visual Representations with Language for Few-shot Classification
Title | Shaping Visual Representations with Language for Few-shot Classification |
Authors | Jesse Mu, Percy Liang, Noah Goodman |
Abstract | Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models which learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations are more generalizable, compared to meta-learning baselines and approaches that explicitly use language as a bottleneck for classification. |
Tasks | Meta-Learning, Representation Learning |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02683v1 |
PDF | https://arxiv.org/pdf/1911.02683v1.pdf |
PWC | https://paperswithcode.com/paper/shaping-visual-representations-with-language |
Repo | https://github.com/jayelm/lsl |
Framework | none |
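A sketch of the training-time objective the abstract suggests (the module shapes, the single-token stand-in for a full caption decoder, and the loss weight are assumptions): the image encoder is trained for classification while a language head predicts the task description, and at test time only the encoder and classifier are used.

```python
# Joint loss: visual classification + predicting the language description.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
cls_head = nn.Linear(128, 5)
lang_head = nn.Linear(128, 1000)          # stand-in for a caption decoder

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 5, (8,))
desc_tok = torch.randint(0, 1000, (8,))   # one token per image, for brevity

z = encoder(x)
loss = nn.functional.cross_entropy(cls_head(z), y) \
     + 0.5 * nn.functional.cross_entropy(lang_head(z), desc_tok)
loss.backward()
print(float(loss))
```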
EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Title | EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis |
Authors | Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang |
Abstract | Reducing the test-time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices. To achieve this goal, we introduce a novel network reparameterization based on the Kronecker-factored eigenbasis (KFE), and then apply Hessian-based structured pruning methods in this basis. As opposed to existing Hessian-based pruning algorithms, which prune in parameter coordinates, our method works in the KFE, where different weights are approximately independent, enabling accurate pruning and fast computation. We demonstrate the effectiveness of the proposed method empirically through extensive experiments. In particular, we highlight that the improvements are especially significant for more challenging datasets and networks. With negligible loss of accuracy, an iterative-pruning version gives a 10$\times$ reduction in model size and an 8$\times$ reduction in FLOPs on wide ResNet32. |
Tasks | Network Pruning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05934v1 |
PDF | https://arxiv.org/pdf/1905.05934v1.pdf |
PWC | https://paperswithcode.com/paper/eigendamage-structured-pruning-in-the |
Repo | https://github.com/alecwangcq/EigenDamage-Pytorch |
Framework | pytorch |
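A sketch of pruning in a Kronecker-factored eigenbasis, following the idea described above (illustrative, not the released code; the random covariance factors and the 50% pruning quantile are assumptions): rotate the weight matrix with the eigenvectors of the two Kronecker factors, score each rotated weight by its estimated second-order damage, and zero the lowest-scoring ones.

```python
# KFE-style pruning sketch: eigendecompose the input- and grad-covariance
# factors, rotate W into the eigenbasis, prune there, rotate back.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
A = np.cov(rng.standard_normal((16, 200)))   # input-covariance factor
G = np.cov(rng.standard_normal((16, 200)))   # grad-covariance factor

dA, UA = np.linalg.eigh(A)
dG, UG = np.linalg.eigh(G)
W_kfe = UG.T @ W @ UA                        # weights in the eigenbasis
saliency = (W_kfe ** 2) * np.outer(dG, dA)   # approx. damage of removal
mask = saliency > np.quantile(saliency, 0.5) # keep the top half
W_pruned = UG @ (W_kfe * mask) @ UA.T        # rotate back
print(mask.mean())
```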
Learning Data Manipulation for Augmentation and Weighting
Title | Learning Data Manipulation for Augmentation and Weighting |
Authors | Zhiting Hu, Bowen Tan, Ruslan Salakhutdinov, Tom Mitchell, Eric P. Xing |
Abstract | Manipulating data, such as weighting data examples or augmenting with new instances, has been increasingly used to improve model training. Previous work has studied various rule- or learning-based approaches designed for specific types of data manipulation. In this work, we propose a new method that supports learning different manipulation schemes with the same gradient-based algorithm. Our approach builds upon a recent connection between supervised learning and reinforcement learning (RL), and adapts an off-the-shelf reward learning algorithm from RL for joint data manipulation learning and model training. Different parameterizations of the “data reward” function instantiate different manipulation schemes. We showcase data augmentation that learns a text transformation network, and data weighting that dynamically adapts the importance of data samples. Experiments show that the resulting algorithms significantly improve image and text classification performance in low-data regimes and on class-imbalance problems. |
Tasks | Data Augmentation, Text Classification |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12795v1 |
PDF | https://arxiv.org/pdf/1910.12795v1.pdf |
PWC | https://paperswithcode.com/paper/learning-data-manipulation-for-augmentation |
Repo | https://github.com/tanyuqian/learning-data-manipulation |
Framework | pytorch |
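A toy sketch of learnable data weighting in the spirit of the abstract (a generic bilevel/meta-learning variant on a linear model, not the authors' exact reward-learning algorithm; all sizes and learning rates are assumptions): per-example weights are treated as parameters and nudged in the direction that reduces loss on a small validation set.

```python
# Bilevel data weighting: a differentiable inner SGD step on the weighted
# training loss lets validation loss backpropagate into the example weights.
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)        # toy linear model
data_w = torch.zeros(32, requires_grad=True)   # learnable per-example weights
x, y = torch.randn(32, 10), torch.randn(32)
xv, yv = torch.randn(8, 10), torch.randn(8)
opt_w = torch.optim.SGD([data_w], lr=0.5)

for _ in range(5):
    weights = torch.softmax(data_w, 0)
    train_loss = (weights * (x @ w - y) ** 2).sum()
    g, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_new = w - 0.1 * g                        # differentiable inner step
    val_loss = ((xv @ w_new - yv) ** 2).mean()
    opt_w.zero_grad(); val_loss.backward(); opt_w.step()
    w = w_new.detach().requires_grad_(True)    # commit the model update
print(torch.softmax(data_w, 0)[:5])
```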
Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs
Title | Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs |
Authors | Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander Smola, Zheng Zhang |
Abstract | Accelerating research in the emerging field of deep graph learning requires new tools. Such systems should support graphs as the core abstraction and take care to maintain both forward (i.e., supporting new research ideas) and backward (i.e., integration with existing components) compatibility. In this paper, we present the Deep Graph Library (DGL). DGL enables arbitrary message handling and mutation operators and flexible propagation rules, and is framework agnostic so as to leverage the high-performance tensor, autograd, and feature extraction modules already available in existing frameworks. DGL carefully handles the sparse and irregular graph structure, deals with graphs big and small that may change dynamically, fuses operations, and performs auto-batching, all to take advantage of modern hardware. DGL has been tested on a variety of models, including but not limited to popular Graph Neural Networks (GNNs) and their variants, with promising speed, memory footprint, and scalability. |
Tasks | Node Classification |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01315v1 |
PDF | https://arxiv.org/pdf/1909.01315v1.pdf |
PWC | https://paperswithcode.com/paper/deep-graph-library-towards-efficient-and |
Repo | https://github.com/dmlc/dgl |
Framework | pytorch |
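A small usage sketch of the message-passing API the abstract refers to (this is DGL's actual documented API, though exact names can vary between versions; the graph and features are toy stand-ins): build a graph, attach node features, and propagate with built-in message and reduce functions.

```python
# One round of message passing with DGL's built-in functions:
# each node's feature becomes the sum of its in-neighbors' features.
import dgl
import dgl.function as fn
import torch

g = dgl.graph(([0, 1, 2], [1, 2, 0]))               # 3-node directed cycle
g.ndata['h'] = torch.eye(3)
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))  # message, then reduce
print(g.ndata['h'])
```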
Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation
Title | Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation |
Authors | Yundong Zhang, Hang Wu, Huiye Liu, Li Tong, May D Wang |
Abstract | Deep learning has thrived on the emergence of biomedical big data. However, medical datasets acquired at different institutions have inherent biases caused by various confounding factors such as operation policies, machine protocols, and treatment preferences. As a result, models trained on one dataset, regardless of volume, cannot be confidently utilized for others. In this study, we investigated model robustness to dataset bias using three large-scale chest X-ray datasets: first, we assessed dataset bias using a vanilla training baseline; second, we proposed a novel multi-source domain generalization model by (a) designing a new bias-regularized loss function and (b) synthesizing new data for domain augmentation. We show that our model significantly outperforms the baseline and other approaches on data from unseen domains, in terms of both accuracy and various bias measures, without retraining or fine-tuning. Our method is generally applicable to other biomedical data, providing new algorithms for training models robust to bias in big data analysis and applications. Demo training code is publicly available. |
Tasks | Domain Generalization |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.06745v3 |
PDF | https://arxiv.org/pdf/1910.06745v3.pdf |
PWC | https://paperswithcode.com/paper/mitigating-the-effect-of-dataset-bias-on |
Repo | https://github.com/ydzhang12345/Domain-Generalization-by-Domain-guided-Multilayer-Cross-gradient-Training |
Framework | pytorch |
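One plausible reading of a "bias-regularized" loss, sketched purely for illustration (the paper's exact regularizer may well differ; the variance penalty and its weight `lam` are my assumptions): train on several source domains while penalizing how much the per-domain losses disagree.

```python
# Hypothetical multi-source loss: mean per-domain loss plus a penalty on
# the variance of the per-domain losses (a common domain-robustness idea).
import torch

def bias_regularized_loss(logits_by_domain, labels_by_domain, lam=0.1):
    losses = torch.stack([
        torch.nn.functional.cross_entropy(lg, lb)
        for lg, lb in zip(logits_by_domain, labels_by_domain)])
    return losses.mean() + lam * losses.var()

logits = [torch.randn(16, 2, requires_grad=True) for _ in range(3)]
labels = [torch.randint(0, 2, (16,)) for _ in range(3)]
print(float(bias_regularized_loss(logits, labels)))
```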
BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
Title | BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs |
Authors | Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann |
Abstract | We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial feature or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from, MobileNetV1/V2; a GPU-friendly anchor scheme modified from the Single Shot MultiBox Detector (SSD); and an improved tie-resolution strategy as an alternative to non-maximum suppression. |
Tasks | Face Detection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05047v2 |
PDF | https://arxiv.org/pdf/1907.05047v2.pdf |
PWC | https://paperswithcode.com/paper/blazeface-sub-millisecond-neural-face |
Repo | https://github.com/gordinmitya/mobile-dnn |
Framework | tf |
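A sketch of the tie-resolution idea the abstract contrasts with NMS (my illustrative version, not the released model; the IoU threshold and grouping rule are assumptions): instead of keeping one box and suppressing its overlaps, average overlapping boxes weighted by their scores.

```python
# Score-weighted blending of overlapping detections instead of suppression.
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def blend(boxes, scores, thr=0.3):
    """Merge each overlap group into one score-weighted average box."""
    order = np.argsort(scores)[::-1]
    used = np.zeros(len(boxes), bool)
    out = []
    for i in order:
        if used[i]:
            continue
        group = [j for j in order if not used[j] and iou(boxes[i], boxes[j]) > thr]
        w = scores[group] / scores[group].sum()
        out.append((w[:, None] * boxes[group]).sum(0))
        used[group] = True
    return np.array(out)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(blend(boxes, scores))
```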
Efficient Graph Generation with Graph Recurrent Attention Networks
Title | Efficient Graph Generation with Graph Recurrent Attention Networks |
Authors | Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L. Hamilton, David Duvenaud, Raquel Urtasun, Richard S. Zemel |
Abstract | We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-regressive conditioning between the already-generated and to-be-generated parts of the graph using Graph Neural Networks (GNNs) with attention. This not only reduces the dependency on node ordering but also bypasses the long-term bottleneck caused by the sequential nature of RNNs. Moreover, we parameterize the output distribution per block using a mixture of Bernoulli distributions, which captures the correlations among generated edges within the block. Finally, we propose to handle node orderings in generation by marginalizing over a family of canonical orderings. On standard benchmarks, we achieve state-of-the-art time efficiency and sample quality compared to previous models. Additionally, we show our model is capable of generating large graphs of up to 5K nodes with good quality. To the best of our knowledge, GRAN is the first deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN. |
Tasks | Graph Generation |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00760v1 |
PDF | https://arxiv.org/pdf/1910.00760v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-graph-generation-with-graph |
Repo | https://github.com/lrjconan/GRAN |
Framework | pytorch |
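A toy sketch of the per-block output distribution mentioned in the abstract (the component count, weights, and edge probabilities are illustrative, not learned parameters): edge indicators for a block are drawn from a K-component mixture of Bernoulli distributions, so edges generated within a block are correlated through the shared component.

```python
# Sampling a block's edges from a mixture of Bernoulli distributions:
# pick one component for the whole block, then sample every edge from it.
import numpy as np

rng = np.random.default_rng(0)
K, E = 3, 6                        # mixture components, candidate edges
pi = np.array([0.5, 0.3, 0.2])     # mixture weights
theta = rng.random((K, E))         # per-component edge probabilities

k = rng.choice(K, p=pi)            # shared component induces correlation
edges = rng.random(E) < theta[k]
print(k, edges.astype(int))
```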
DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation
Title | DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation |
Authors | Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos |
Abstract | To improve the resilience of distributed training to worst-case (Byzantine) node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps: a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, with a per-iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders-of-magnitude improvements in accuracy and speed over many state-of-the-art Byzantine-resilient approaches. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12205v2 |
PDF | https://arxiv.org/pdf/1907.12205v2.pdf |
PWC | https://paperswithcode.com/paper/detox-a-redundancy-based-framework-for-faster |
Repo | https://github.com/hwang595/DETOX |
Framework | pytorch |
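A sketch of the two-step aggregation described above (illustrative only: DETOX's filtering step uses majority voting over redundantly computed gradients, for which the coordinate-wise median within each group is a stand-in here; sizes and the final aggregator are assumptions): worker gradients are first filtered in small redundant groups, then the group outputs are combined with a robust aggregator.

```python
# Two-step robust aggregation: within-group filtering, then a robust
# combine across groups (coordinate-wise median in both steps here).
import numpy as np

rng = np.random.default_rng(0)
n_workers, r, dim = 15, 3, 4                 # r-way redundancy per group
grads = rng.standard_normal((n_workers, dim))
grads[::5] += 100.0                          # a few Byzantine gradients

groups = grads.reshape(n_workers // r, r, dim)
filtered = np.median(groups, axis=1)         # step 1: within-group filtering
robust = np.median(filtered, axis=0)         # step 2: hierarchical aggregation
print(robust)
```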