Paper Group AWR 61
Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem. A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation. CUP: Cluster Pruning for Compressing Deep Neural Networks. Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914. Image Super-Resolution via Attention based Back Projection Networks. Unsupervised Multi-Task Feature Learning on Point Clouds. An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense. Shaping Visual Representations with Language for Few-shot Classification. EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis. Learning Data Manipulation for Augmentation and Weighting. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation. BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs. Efficient Graph Generation with Graph Recurrent Attention Networks. DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation.
Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem
Title | Kernels of Mallows Models under the Hamming Distance for solving the Quadratic Assignment Problem |
Authors | Etor Arza, Aritz Perez, Ekhine Irurozki, Josu Ceberio |
Abstract | The Quadratic Assignment Problem (QAP) is a well-known permutation-based combinatorial optimization problem with real applications in industrial and logistics environments. Motivated by the challenge that this NP-hard problem represents, it has captured the attention of the evolutionary computation community for decades. As a result, a large number of algorithms have been proposed to solve it. Among these, exact methods are only able to solve instances of size $n<40$, and thus many heuristic and metaheuristic methods have been applied to the QAP. In this work, we follow this direction by approaching the QAP through Estimation of Distribution Algorithms (EDAs). In particular, a non-parametric distance-based exponential probabilistic model is used. Based on an analysis of the characteristics of the QAP and on previous work in the area, we introduce Kernels of Mallows Models under the Hamming distance to the context of EDAs. The conducted experiments show that, on the QAP, the proposed algorithm outperforms (i) classical EDAs adapted to deal with the QAP and (ii) the EDAs proposed in the literature specifically for permutation problems. |
Tasks | Combinatorial Optimization |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08800v1 |
PDF | https://arxiv.org/pdf/1910.08800v1.pdf |
PWC | https://paperswithcode.com/paper/kernels-of-mallows-models-under-the-hamming |
Repo | https://github.com/EtorArza/SupplementaryKMMHamming |
Framework | none |
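As a rough illustration of the probabilistic model named in the title (not the authors' implementation; the spread parameter `theta` and the toy sizes are assumptions), the sketch below scores candidate permutations under a Mallows model centered at a reference permutation, using the Hamming distance as the discrepancy measure:

```python
# Minimal sketch, assuming a single kernel (consensus) permutation and a
# hypothetical spread parameter `theta`; a full EDA would fit and sample
# a mixture of such kernels each generation.
import numpy as np

def hamming(sigma, pi):
    """Number of positions where two permutations disagree."""
    return int(np.sum(np.asarray(sigma) != np.asarray(pi)))

def mallows_weight(sigma, center, theta=0.5):
    """Unnormalized Mallows probability: exp(-theta * d(sigma, center))."""
    return np.exp(-theta * hamming(sigma, center))

rng = np.random.default_rng(0)
center = np.arange(8)                        # reference permutation
candidates = [rng.permutation(8) for _ in range(5)]
for c in candidates:
    print(c, round(mallows_weight(c, center), 4))
```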
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Title | A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation |
Authors | Runzhe Yang, Xingyuan Sun, Karthik Narasimhan |
Abstract | We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08342v2 |
PDF | https://arxiv.org/pdf/1908.08342v2.pdf |
PWC | https://paperswithcode.com/paper/a-generalized-algorithm-for-multi-objective |
Repo | https://github.com/RunzheYang/MORL |
Framework | pytorch |
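One way to picture the preference-conditioned control the abstract describes is the toy sketch below (all names, shapes, and the set of sampled preferences are illustrative stand-ins, not the released code): the agent acts greedily with respect to the scalarized value w . Q, and the generalized (envelope) backup maximizes over both actions and sampled preferences.

```python
# Toy envelope-style backup for multi-objective Q-values; the real method
# learns one preference-conditioned network rather than separate Q-tables.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_obj = 4, 2
w = np.array([0.7, 0.3])                     # linear preference over objectives

# One vector-valued Q-table per sampled preference (toy stand-ins).
Q_by_pref = {i: rng.random((n_actions, n_obj)) for i in range(8)}

def act(Q, w):
    """Greedy action w.r.t. the scalarized value w . Q(s, a)."""
    return int(np.argmax(Q @ w))

def envelope_target(Q_by_pref, w):
    """Envelope backup: max of w . Q over actions *and* sampled preferences."""
    return max(float((Q @ w).max()) for Q in Q_by_pref.values())

print(act(Q_by_pref[0], w), round(envelope_target(Q_by_pref, w), 3))
```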
CUP: Cluster Pruning for Compressing Deep Neural Networks
Title | CUP: Cluster Pruning for Compressing Deep Neural Networks |
Authors | Rahul Duggal, Cao Xiao, Richard Vuduc, Jimeng Sun |
Abstract | We propose Cluster Pruning (CUP) for compressing and accelerating deep neural networks. Our approach prunes similar filters by clustering them based on features derived from both the incoming and outgoing weight connections. With CUP, we overcome two limitations of prior work: (1) non-uniform pruning: CUP can efficiently determine the ideal number of filters to prune in each layer of a neural network. This is in contrast to prior methods that either prune all layers uniformly or use resource-intensive methods such as manual sensitivity analysis or reinforcement learning to determine the ideal number. (2) Single-shot operation: we extend CUP to CUP-SS (for CUP single shot), whereby pruning is integrated into the initial training phase itself. This leads to large savings in training time compared to traditional pruning pipelines. Through extensive evaluation on multiple datasets (MNIST, CIFAR-10, and ImageNet) and models (VGG-16, ResNets-18/34/56), we show that CUP outperforms the recent state of the art. Specifically, CUP-SS achieves a 2.2x reduction in FLOPs for a ResNet-50 model trained on ImageNet while staying within 0.9% top-5 accuracy, and it saves over 14 hours in training time relative to the original ResNet-50. The code to reproduce the results is available. |
Tasks | |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08630v1 |
PDF | https://arxiv.org/pdf/1911.08630v1.pdf |
PWC | https://paperswithcode.com/paper/cup-cluster-pruning-for-compressing-deep |
Repo | https://github.com/duggalrahul/CUP_Public |
Framework | pytorch |
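An illustrative sketch of the clustering step described in the abstract (not the released CUP code; the layer shapes, target cluster count, and the use of scikit-learn's agglomerative clustering are assumptions): each filter is featurized by its flattened incoming weights concatenated with its outgoing weights, similar filters are grouped, and one representative per cluster is kept.

```python
# Sketch: cluster the 64 filters of one conv layer using incoming + outgoing
# weight features, then keep one filter per cluster (the rest are pruned).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
W_in  = rng.standard_normal((64, 32, 3, 3))   # this layer: 64 filters
W_out = rng.standard_normal((128, 64, 3, 3))  # next layer consumes 64 channels

feats = np.concatenate(
    [W_in.reshape(64, -1), W_out.transpose(1, 0, 2, 3).reshape(64, -1)], axis=1)

labels = AgglomerativeClustering(n_clusters=40).fit_predict(feats)
keep = [np.flatnonzero(labels == c)[0] for c in range(40)]  # one per cluster
print(sorted(keep)[:10])
```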
Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914
Title | Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914 |
Authors | Rocco Tripodi, Massimo Warglien, Simon Levis Sullam, Deborah Paci |
Abstract | We investigate aspects of the history of antisemitism in France, one of the cradles of modern antisemitism, using diachronic word embeddings. We constructed a large corpus of French books and periodical issues that contain a keyword related to Jews and trained diachronic word embeddings over the 1789-1914 period. We studied the changes over time in the semantic spaces of 4 target words and performed embedding projections over 6 streams of antisemitic discourse. This allowed us to track the evolution of antisemitic bias in the religious, economic, socio-political, racial, ethnic, and conspiratorial domains. The projections show a trend of growing antisemitism, especially in the years starting in the mid-1880s and culminating in the Dreyfus affair. Our analysis also highlights the peculiar adverse bias towards Judaism in the broader context of other religions. |
Tasks | Word Embeddings |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01440v1 |
PDF | https://arxiv.org/pdf/1906.01440v1.pdf |
PWC | https://paperswithcode.com/paper/tracing-antisemitic-language-through |
Repo | https://github.com/roccotrip/antisem |
Framework | none |
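A minimal sketch of an "embedding projection" in the sense used above (the seed words, vectors, and time slices here are synthetic placeholders, not the paper's corpus): for each time slice, a target word's vector is projected onto an axis defined by seed words for one discourse stream, giving a per-decade bias score.

```python
# Per-decade projection of a target word onto a discourse axis built from
# seed words. All embeddings below are random stand-ins for trained ones.
import numpy as np

def stream_axis(emb, seeds):
    """Mean of the seed vectors, normalized -- a crude discourse axis."""
    v = np.mean([emb[w] for w in seeds], axis=0)
    return v / np.linalg.norm(v)

def projection(emb, target, seeds):
    """Cosine of the target word onto the discourse axis."""
    t = emb[target] / np.linalg.norm(emb[target])
    return float(t @ stream_axis(emb, seeds))

rng = np.random.default_rng(0)
vocab = ["juif", "usure", "banque", "religion"]        # hypothetical words
emb_by_decade = {d: {w: rng.standard_normal(50) for w in vocab}
                 for d in range(1790, 1910, 10)}
for d, emb in emb_by_decade.items():
    print(d, round(projection(emb, "juif", ["usure", "banque"]), 3))
```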
Image Super-Resolution via Attention based Back Projection Networks
Title | Image Super-Resolution via Attention based Back Projection Networks |
Authors | Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Wan-Chi Siu, Yui-Lam Chan |
Abstract | Deep-learning-based image Super-Resolution (SR) has shown rapid development due to its ability to digest big data. Generally, deeper and wider networks can extract richer feature maps and generate SR images of remarkable quality. However, the more complex the network, the more computation time it requires, so a simplified network is important for efficient image SR in practice. In this paper, we propose an Attention based Back Projection Network (ABPN) for image super-resolution. Like some recent works, we believe that the back projection mechanism can be further developed for SR. Enhanced back projection blocks are used to iteratively update low- and high-resolution feature residues. Inspired by recent studies on attention models, we propose a Spatial Attention Block (SAB) to learn the cross-correlation across features at different layers. Based on the assumption that a good SR image should be close to the original LR image after down-sampling, we propose a Refined Back Projection Block (RBPB) for final reconstruction. Extensive experiments on public datasets and the AIM2019 Image Super-Resolution Challenge datasets show that the proposed ABPN can provide state-of-the-art or even better performance in both quantitative and qualitative measurements. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04476v1 |
PDF | https://arxiv.org/pdf/1910.04476v1.pdf |
PWC | https://paperswithcode.com/paper/image-super-resolution-via-attention-based |
Repo | https://github.com/Holmes-Alan/ABPN |
Framework | pytorch |
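A simplified back-projection block in the spirit of the abstract (my reading, not the released ABPN code; channel count, kernel sizes, and the single-block structure are assumptions): project low-resolution features up, project back down, and use the low-resolution residue to refine the upsampled features.

```python
# Toy back-projection block: up-project, measure the LR-space residue of the
# projection, and use it to correct the HR estimate.
import torch
import torch.nn as nn

class BackProjectionBlock(nn.Module):
    def __init__(self, ch, scale=2):
        super().__init__()
        k, p = scale * 2, scale // 2
        self.up     = nn.ConvTranspose2d(ch, ch, k, stride=scale, padding=p)
        self.down   = nn.Conv2d(ch, ch, k, stride=scale, padding=p)
        self.up_err = nn.ConvTranspose2d(ch, ch, k, stride=scale, padding=p)

    def forward(self, lr):
        hr = self.up(lr)                 # project LR features up
        err = lr - self.down(hr)         # LR-space residue of the projection
        return hr + self.up_err(err)     # correct the HR estimate

x = torch.randn(1, 32, 16, 16)
print(BackProjectionBlock(32)(x).shape)  # torch.Size([1, 32, 32, 32])
```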
Unsupervised Multi-Task Feature Learning on Point Clouds
Title | Unsupervised Multi-Task Feature Learning on Point Clouds |
Authors | Kaveh Hassani, Mike Haley |
Abstract | We introduce an unsupervised multi-task model to jointly learn point and shape features on point clouds. We define three unsupervised tasks, namely clustering, reconstruction, and self-supervised classification, to train a multi-scale graph-based encoder. We evaluate our model on shape classification and segmentation benchmarks. The results suggest that it outperforms prior state-of-the-art unsupervised models: on the ModelNet40 classification task it achieves an accuracy of 89.1%, and on the ShapeNet segmentation task it achieves an mIoU of 68.2 and an accuracy of 88.6%. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08207v1 |
PDF | https://arxiv.org/pdf/1910.08207v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-multi-task-feature-learning-on |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
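A sketch of the joint objective implied by the three tasks above (the loss names, weights, and the MSE stand-in for a point-cloud reconstruction loss such as Chamfer distance are all assumptions, not the authors' code): the shared encoder is trained by summing a clustering loss, a reconstruction loss, and a self-supervised classification loss.

```python
# Toy multi-task loss over the three unsupervised objectives; MSE stands in
# for a proper point-set distance, and pseudo-labels are random placeholders.
import torch

def multi_task_loss(recon, points, cluster_logits, cluster_ids,
                    ss_logits, ss_labels, w=(1.0, 1.0, 1.0)):
    l_cluster = torch.nn.functional.cross_entropy(cluster_logits, cluster_ids)
    l_recon   = torch.nn.functional.mse_loss(recon, points)
    l_selfsup = torch.nn.functional.cross_entropy(ss_logits, ss_labels)
    return w[0] * l_cluster + w[1] * l_recon + w[2] * l_selfsup

B, N, K = 4, 1024, 50
loss = multi_task_loss(torch.randn(B, N, 3), torch.randn(B, N, 3),
                       torch.randn(B, K), torch.randint(0, K, (B,)),
                       torch.randn(B, 8), torch.randint(0, 8, (B,)))
print(float(loss))
```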
An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense
Title | An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense |
Authors | Chao Tang, Yifei Fan, Anthony Yezzi |
Abstract | The safety and robustness of learning-based decision-making systems are threatened by adversarial examples, as imperceptible perturbations can mislead neural networks to completely different outputs. In this paper, we present an adaptive view of the issue by evaluating various test-time smoothing defenses against white-box untargeted adversarial examples. Through controlled experiments with a pretrained ResNet-152 on ImageNet, we first illustrate the non-monotonic relation between adversarial attacks and smoothing defenses. Then, at the dataset level, we observe large variance among samples and show that it is easy to inflate accuracy (even to 100%) or to build large-scale (i.e., size ~10^4) subsets on which a designated method outperforms others by a large margin. Finally, at the sample level, since different adversarial examples require different degrees of defense, we also discuss the potential advantages of iterative methods. We hope this paper reveals useful behaviors of test-time defenses that could help improve the evaluation process for adversarial robustness in the future. |
Tasks | Decision Making |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11881v1 |
PDF | https://arxiv.org/pdf/1911.11881v1.pdf |
PWC | https://paperswithcode.com/paper/an-adaptive-view-of-adversarial-robustness |
Repo | https://github.com/mkt1412/testtime-smoothing-defense |
Framework | pytorch |
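A minimal sketch of a test-time smoothing defense of the kind evaluated above (the kernel size and sigma are illustrative knobs, and `weights=None` assumes a recent torchvision API; the paper studies how accuracy varies non-monotonically with such settings): blur the possibly-adversarial input before classification.

```python
# Smooth-then-classify: apply a Gaussian blur to the input at test time
# before feeding it to the (normally pretrained) classifier.
import torch
import torchvision.transforms as T
from torchvision.models import resnet152

model = resnet152(weights=None).eval()   # pretrained weights omitted here
smooth = T.GaussianBlur(kernel_size=5, sigma=1.0)

x = torch.randn(1, 3, 224, 224)          # stand-in for an adversarial image
with torch.no_grad():
    pred = model(smooth(x)).argmax(1)
print(int(pred))
```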
Shaping Visual Representations with Language for Few-shot Classification
Title | Shaping Visual Representations with Language for Few-shot Classification |
Authors | Jesse Mu, Percy Liang, Noah Goodman |
Abstract | Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models which learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations are more generalizable, compared to meta-learning baselines and approaches that explicitly use language as a bottleneck for classification. |
Tasks | Meta-Learning, Representation Learning |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02683v1 |
PDF | https://arxiv.org/pdf/1911.02683v1.pdf |
PWC | https://paperswithcode.com/paper/shaping-visual-representations-with-language |
Repo | https://github.com/jayelm/lsl |
Framework | none |
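A sketch of the training-time objective the abstract suggests (the module shapes, the single-token stand-in for a full caption decoder, and the loss weight are assumptions): the image encoder is trained for classification while a language head predicts the task description, and at test time only the encoder and classifier are used.

```python
# Joint loss: visual classification + predicting the language description.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
cls_head = nn.Linear(128, 5)
lang_head = nn.Linear(128, 1000)          # stand-in for a caption decoder

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 5, (8,))
desc_tok = torch.randint(0, 1000, (8,))   # one token per image, for brevity

z = encoder(x)
loss = nn.functional.cross_entropy(cls_head(z), y) \
     + 0.5 * nn.functional.cross_entropy(lang_head(z), desc_tok)
loss.backward()
print(float(loss))
```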
EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
Title | EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis |
Authors | Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang |
Abstract | Reducing the test-time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices. To achieve this goal, we introduce a novel network reparameterization based on the Kronecker-factored eigenbasis (KFE), and then apply Hessian-based structured pruning methods in this basis. As opposed to existing Hessian-based pruning algorithms, which prune in parameter coordinates, our method works in the KFE, where different weights are approximately independent, enabling accurate pruning and fast computation. We demonstrate the effectiveness of the proposed method empirically through extensive experiments. In particular, we highlight that the improvements are especially significant for more challenging datasets and networks. With negligible loss of accuracy, an iterative-pruning version gives a 10$\times$ reduction in model size and an 8$\times$ reduction in FLOPs on wide ResNet32. |
Tasks | Network Pruning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05934v1 |
PDF | https://arxiv.org/pdf/1905.05934v1.pdf |
PWC | https://paperswithcode.com/paper/eigendamage-structured-pruning-in-the |
Repo | https://github.com/alecwangcq/EigenDamage-Pytorch |
Framework | pytorch |
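A sketch of pruning in a Kronecker-factored eigenbasis, following the idea described above (illustrative, not the released code; the random covariance factors and the 50% pruning quantile are assumptions): rotate the weight matrix with the eigenvectors of the two Kronecker factors, score each rotated weight by its estimated second-order damage, and zero the lowest-scoring ones.

```python
# KFE-style pruning sketch: eigendecompose the input- and grad-covariance
# factors, rotate W into the eigenbasis, prune there, rotate back.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
A = np.cov(rng.standard_normal((16, 200)))   # input-covariance factor
G = np.cov(rng.standard_normal((16, 200)))   # grad-covariance factor

dA, UA = np.linalg.eigh(A)
dG, UG = np.linalg.eigh(G)
W_kfe = UG.T @ W @ UA                        # weights in the eigenbasis
saliency = (W_kfe ** 2) * np.outer(dG, dA)   # approx. damage of removal
mask = saliency > np.quantile(saliency, 0.5) # keep the top half
W_pruned = UG @ (W_kfe * mask) @ UA.T        # rotate back
print(mask.mean())
```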
Learning Data Manipulation for Augmentation and Weighting
Title | Learning Data Manipulation for Augmentation and Weighting |
Authors | Zhiting Hu, Bowen Tan, Ruslan Salakhutdinov, Tom Mitchell, Eric P. Xing |
Abstract | Manipulating data, such as weighting data examples or augmenting with new instances, has been increasingly used to improve model training. Previous work has studied various rule- or learning-based approaches designed for specific types of data manipulation. In this work, we propose a new method that supports learning different manipulation schemes with the same gradient-based algorithm. Our approach builds upon a recent connection between supervised learning and reinforcement learning (RL), and adapts an off-the-shelf reward learning algorithm from RL for joint data manipulation learning and model training. Different parameterizations of the “data reward” function instantiate different manipulation schemes. We showcase data augmentation that learns a text transformation network, and data weighting that dynamically adapts the importance of data samples. Experiments show that the resulting algorithms significantly improve image and text classification performance in low-data regimes and on class-imbalance problems. |
Tasks | Data Augmentation, Text Classification |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12795v1 |
PDF | https://arxiv.org/pdf/1910.12795v1.pdf |
PWC | https://paperswithcode.com/paper/learning-data-manipulation-for-augmentation |
Repo | https://github.com/tanyuqian/learning-data-manipulation |
Framework | pytorch |
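A toy sketch of learnable data weighting in the spirit of the abstract (a generic bilevel/meta-learning variant on a linear model, not the authors' exact reward-learning algorithm; all sizes and learning rates are assumptions): per-example weights are treated as parameters and nudged in the direction that reduces loss on a small validation set.

```python
# Bilevel data weighting: a differentiable inner SGD step on the weighted
# training loss lets validation loss backpropagate into the example weights.
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)        # toy linear model
data_w = torch.zeros(32, requires_grad=True)   # learnable per-example weights
x, y = torch.randn(32, 10), torch.randn(32)
xv, yv = torch.randn(8, 10), torch.randn(8)
opt_w = torch.optim.SGD([data_w], lr=0.5)

for _ in range(5):
    weights = torch.softmax(data_w, 0)
    train_loss = (weights * (x @ w - y) ** 2).sum()
    g, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_new = w - 0.1 * g                        # differentiable inner step
    val_loss = ((xv @ w_new - yv) ** 2).mean()
    opt_w.zero_grad(); val_loss.backward(); opt_w.step()
    w = w_new.detach().requires_grad_(True)    # commit the model update
print(torch.softmax(data_w, 0)[:5])
```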
Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs
Title | Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs |
Authors | Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander Smola, Zheng Zhang |
Abstract | Accelerating research in the emerging field of deep graph learning requires new tools. Such systems should support graphs as the core abstraction and take care to maintain both forward (i.e., supporting new research ideas) and backward (i.e., integration with existing components) compatibility. In this paper, we present the Deep Graph Library (DGL). DGL enables arbitrary message handling and mutation operators and flexible propagation rules, and is framework agnostic so as to leverage the high-performance tensor, autograd, and feature extraction modules already available in existing frameworks. DGL carefully handles the sparse and irregular graph structure, deals with graphs big and small that may change dynamically, fuses operations, and performs auto-batching, all to take advantage of modern hardware. DGL has been tested on a variety of models, including but not limited to popular Graph Neural Networks (GNNs) and their variants, with promising speed, memory footprint, and scalability. |
Tasks | Node Classification |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01315v1 |
PDF | https://arxiv.org/pdf/1909.01315v1.pdf |
PWC | https://paperswithcode.com/paper/deep-graph-library-towards-efficient-and |
Repo | https://github.com/dmlc/dgl |
Framework | pytorch |
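A small usage sketch of the message-passing API the abstract refers to (this is DGL's actual documented API, though exact names can vary between versions; the graph and features are toy stand-ins): build a graph, attach node features, and propagate with built-in message and reduce functions.

```python
# One round of message passing with DGL's built-in functions:
# each node's feature becomes the sum of its in-neighbors' features.
import dgl
import dgl.function as fn
import torch

g = dgl.graph(([0, 1, 2], [1, 2, 0]))               # 3-node directed cycle
g.ndata['h'] = torch.eye(3)
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))  # message, then reduce
print(g.ndata['h'])
```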
Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation
Title | Improve Model Generalization and Robustness to Dataset Bias with Bias-regularized Learning and Domain-guided Augmentation |
Authors | Yundong Zhang, Hang Wu, Huiye Liu, Li Tong, May D Wang |
Abstract | Deep learning has thrived on the emergence of biomedical big data. However, medical datasets acquired at different institutions have inherent biases caused by various confounding factors such as operation policies, machine protocols, and treatment preferences. As a result, models trained on one dataset, regardless of volume, cannot be confidently utilized for others. In this study, we investigated model robustness to dataset bias using three large-scale chest X-ray datasets: first, we assessed dataset bias using a vanilla training baseline; second, we proposed a novel multi-source domain generalization model by (a) designing a new bias-regularized loss function and (b) synthesizing new data for domain augmentation. We show that our model significantly outperforms the baseline and other approaches on data from unseen domains, in terms of both accuracy and various bias measures, without retraining or fine-tuning. Our method is generally applicable to other biomedical data, providing new algorithms for training models robust to bias in big data analysis and applications. Demo training code is publicly available. |
Tasks | Domain Generalization |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.06745v3 |
PDF | https://arxiv.org/pdf/1910.06745v3.pdf |
PWC | https://paperswithcode.com/paper/mitigating-the-effect-of-dataset-bias-on |
Repo | https://github.com/ydzhang12345/Domain-Generalization-by-Domain-guided-Multilayer-Cross-gradient-Training |
Framework | pytorch |
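One plausible reading of a "bias-regularized" loss, sketched purely for illustration (the paper's exact regularizer may well differ; the variance penalty and its weight `lam` are my assumptions): train on several source domains while penalizing how much the per-domain losses disagree.

```python
# Hypothetical multi-source loss: mean per-domain loss plus a penalty on
# the variance of the per-domain losses (a common domain-robustness idea).
import torch

def bias_regularized_loss(logits_by_domain, labels_by_domain, lam=0.1):
    losses = torch.stack([
        torch.nn.functional.cross_entropy(lg, lb)
        for lg, lb in zip(logits_by_domain, labels_by_domain)])
    return losses.mean() + lam * losses.var()

logits = [torch.randn(16, 2, requires_grad=True) for _ in range(3)]
labels = [torch.randint(0, 2, (16,)) for _ in range(3)]
print(float(bias_regularized_loss(logits, labels)))
```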
BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
Title | BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs |
Authors | Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann |
Abstract | We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial feature or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from, MobileNetV1/V2; a GPU-friendly anchor scheme modified from the Single Shot MultiBox Detector (SSD); and an improved tie-resolution strategy as an alternative to non-maximum suppression. |
Tasks | Face Detection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05047v2 |
PDF | https://arxiv.org/pdf/1907.05047v2.pdf |
PWC | https://paperswithcode.com/paper/blazeface-sub-millisecond-neural-face |
Repo | https://github.com/gordinmitya/mobile-dnn |
Framework | tf |
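A sketch of the tie-resolution idea the abstract contrasts with NMS (my illustrative version, not the released model; the IoU threshold and grouping rule are assumptions): instead of keeping one box and suppressing its overlaps, average overlapping boxes weighted by their scores.

```python
# Score-weighted blending of overlapping detections instead of suppression.
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def blend(boxes, scores, thr=0.3):
    """Merge each overlap group into one score-weighted average box."""
    order = np.argsort(scores)[::-1]
    used = np.zeros(len(boxes), bool)
    out = []
    for i in order:
        if used[i]:
            continue
        group = [j for j in order if not used[j] and iou(boxes[i], boxes[j]) > thr]
        w = scores[group] / scores[group].sum()
        out.append((w[:, None] * boxes[group]).sum(0))
        used[group] = True
    return np.array(out)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(blend(boxes, scores))
```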
Efficient Graph Generation with Graph Recurrent Attention Networks
Title | Efficient Graph Generation with Graph Recurrent Attention Networks |
Authors | Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L. Hamilton, David Duvenaud, Raquel Urtasun, Richard S. Zemel |
Abstract | We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-regressive conditioning between the already-generated and to-be-generated parts of the graph using Graph Neural Networks (GNNs) with attention. This not only reduces the dependency on node ordering but also bypasses the long-term bottleneck caused by the sequential nature of RNNs. Moreover, we parameterize the output distribution per block using a mixture of Bernoulli distributions, which captures the correlations among generated edges within the block. Finally, we propose to handle node orderings in generation by marginalizing over a family of canonical orderings. On standard benchmarks, we achieve state-of-the-art time efficiency and sample quality compared to previous models. Additionally, we show our model is capable of generating large graphs of up to 5K nodes with good quality. To the best of our knowledge, GRAN is the first deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN. |
Tasks | Graph Generation |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00760v1 |
PDF | https://arxiv.org/pdf/1910.00760v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-graph-generation-with-graph |
Repo | https://github.com/lrjconan/GRAN |
Framework | pytorch |
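A toy sketch of the per-block output distribution mentioned in the abstract (the component count, weights, and edge probabilities are illustrative, not learned parameters): edge indicators for a block are drawn from a K-component mixture of Bernoulli distributions, so edges generated within a block are correlated through the shared component.

```python
# Sampling a block's edges from a mixture of Bernoulli distributions:
# pick one component for the whole block, then sample every edge from it.
import numpy as np

rng = np.random.default_rng(0)
K, E = 3, 6                        # mixture components, candidate edges
pi = np.array([0.5, 0.3, 0.2])     # mixture weights
theta = rng.random((K, E))         # per-component edge probabilities

k = rng.choice(K, p=pi)            # shared component induces correlation
edges = rng.random(E) < theta[k]
print(k, edges.astype(int))
```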
DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation
Title | DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation |
Authors | Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos |
Abstract | To improve the resilience of distributed training to worst-case (Byzantine) node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps: a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, with a per-iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders-of-magnitude improvements in accuracy and speed over many state-of-the-art Byzantine-resilient approaches. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12205v2 |
PDF | https://arxiv.org/pdf/1907.12205v2.pdf |
PWC | https://paperswithcode.com/paper/detox-a-redundancy-based-framework-for-faster |
Repo | https://github.com/hwang595/DETOX |
Framework | pytorch |
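A sketch of the two-step aggregation described above (illustrative only: DETOX's filtering step uses majority voting over redundantly computed gradients, for which the coordinate-wise median within each group is a stand-in here; sizes and the final aggregator are assumptions): worker gradients are first filtered in small redundant groups, then the group outputs are combined with a robust aggregator.

```python
# Two-step robust aggregation: within-group filtering, then a robust
# combine across groups (coordinate-wise median in both steps here).
import numpy as np

rng = np.random.default_rng(0)
n_workers, r, dim = 15, 3, 4                 # r-way redundancy per group
grads = rng.standard_normal((n_workers, dim))
grads[::5] += 100.0                          # a few Byzantine gradients

groups = grads.reshape(n_workers // r, r, dim)
filtered = np.median(groups, axis=1)         # step 1: within-group filtering
robust = np.median(filtered, axis=0)         # step 2: hierarchical aggregation
print(robust)
```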