Paper Group NAWR 5
Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
Title | Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks |
Authors | Anonymous |
Abstract | Frankle & Carbin (2019) show that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks that can be trained alone to achieve accuracies comparable to the latter in a similar number of iterations. However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits. In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after the full training. Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy. Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets, and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 4.7x energy savings while maintaining comparable or even better accuracy, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training. |
Tasks | Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxsrgStvr |
PDF | https://openreview.net/pdf?id=BJxsrgStvr |
PWC | https://paperswithcode.com/paper/drawing-early-bird-tickets-toward-more |
Repo | https://github.com/RICE-EIC/Early-Bird-Tickets |
Framework | pytorch |
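The mask distance mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the metric, so this assumes it is the normalized Hamming distance between binary pruning masks drawn at different epochs. The helper names `magnitude_mask` and `mask_distance`, the magnitude-based pruning rule, and the toy weights are all illustrative assumptions and not the repository's API.

```python
def magnitude_mask(weights, prune_ratio):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    Hypothetical helper: prunes the `prune_ratio` fraction of weights
    with the smallest absolute value.
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    differing = sum(a != b for a, b in zip(mask_a, mask_b))
    return differing / len(mask_a)


# Toy weights from two hypothetical consecutive epochs.
epoch_1 = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2]
epoch_2 = [0.8, -0.3, 0.1, 0.02, -0.9, 0.25]

m1 = magnitude_mask(epoch_1, prune_ratio=0.5)
m2 = magnitude_mask(epoch_2, prune_ratio=0.5)
d = mask_distance(m1, m2)
```

Under this assumption, a distance that stays small across consecutive epochs would suggest that the pruning mask has stopped changing, which is the kind of signal the abstract describes for identifying an EB ticket without full training.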
MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
Title | MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network |
Authors | Ankit Pal, Muru Selvakumar and Malaikannan Sankarasubbu |
Abstract | In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that in most MLTC tasks, there are dependencies or correlations among labels. Existing methods tend to ignore the relationship among labels. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighbor nodes per label, thus allowing it to learn the dependencies among labels implicitly. The results of the proposed model are validated on five real-world MLTC datasets. The proposed model achieves similar or better performance compared to the previous state-of-the-art models. |
Tasks | Graph Representation Learning, Multi-Label Text Classification, Text Classification |
Published | 2020-02-24 |
URL | https://www.scitepress.org/PublicationsDetail.aspx?ID=siCYSzSoEx0=&t=1 |
PWC | https://paperswithcode.com/paper/magnet-multi-label-text-classification-using |
Repo | https://github.com/monk1337/MAGnet |
Framework | none |
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius
Title | MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius |
Authors | Anonymous |
Abstract | Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time-consuming. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including CIFAR-10, ImageNet, MNIST, and SVHN. For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx1Na4Fwr |
PDF | https://openreview.net/pdf?id=rJx1Na4Fwr |
PWC | https://paperswithcode.com/paper/macer-attack-free-and-scalable-robust |
Repo | https://github.com/MacerAuthors/macer |
Framework | pytorch |
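The abstract refers to the certified l2 radius from randomized smoothing without stating it. As a worked sketch, the bound commonly cited in the randomized smoothing literature (Cohen et al., 2019) is R = (sigma / 2) * (Phi^-1(p_A) - Phi^-1(p_B)), where Phi^-1 is the standard normal quantile function, p_A lower-bounds the smoothed probability of the top class, and p_B upper-bounds that of the runner-up. The function name `certified_radius` and the example values are illustrative assumptions; this computes the radius only and is not MACER's training objective.

```python
from statistics import NormalDist


def certified_radius(p_a, p_b, sigma):
    """l2 radius certified by randomized smoothing with Gaussian noise.

    p_a: lower bound on the smoothed probability of the top class.
    p_b: upper bound on the smoothed probability of the runner-up class.
    sigma: standard deviation of the Gaussian smoothing noise.
    """
    if not (0.0 < p_b <= p_a < 1.0):
        raise ValueError("need 0 < p_b <= p_a < 1")
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


# Illustrative values only.
r_confident = certified_radius(p_a=0.9, p_b=0.1, sigma=0.5)
r_uncertain = certified_radius(p_a=0.6, p_b=0.4, sigma=0.5)
```

The radius grows with the gap between the two class probabilities, which gives some intuition for why a training procedure might try to maximize it directly.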
Chameleon: Adaptive Code Optimization For Expedited Deep Neural Network Compilation
Title | Chameleon: Adaptive Code Optimization For Expedited Deep Neural Network Compilation |
Authors | Anonymous |
Abstract | Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks relies on either hand-optimized libraries, traditional compilation heuristics, or, very recently, genetic algorithms and other stochastic methods. These methods suffer from frequent costly hardware measurements, rendering them not only too time-consuming but also suboptimal. As such, we devise a solution that can learn to quickly adapt to a previously unseen design space for code optimization, both accelerating the search and improving the output performance. This solution, dubbed CHAMELEON, leverages reinforcement learning whose solution takes fewer steps to converge, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses domain-knowledge-inspired logic to improve the samples themselves. Experimentation with real hardware shows that CHAMELEON provides a 4.45× speedup in optimization time over AutoTVM, while also improving inference time of modern deep networks by 5.6%. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rygG4AVFvH |
PDF | https://openreview.net/pdf?id=rygG4AVFvH |
PWC | https://paperswithcode.com/paper/chameleon-adaptive-code-optimization-for |
Repo | https://github.com/anony-sub/chameleon |
Framework | none |
ProtoAttend: Attention-Based Prototypical Learning
Title | ProtoAttend: Attention-Based Prototypical Learning |
Authors | Anonymous |
Abstract | We propose a novel, inherently interpretable machine learning method that bases decisions on a few relevant examples that we call prototypes. Our method, ProtoAttend, can be integrated into a wide range of neural network architectures including pre-trained models. It utilizes an attention mechanism that relates the encoded representations to samples in order to determine prototypes. The resulting model outperforms the state of the art in three high-impact problems without sacrificing accuracy of the original model: (1) it enables high-quality interpretability that outputs samples most relevant to the decision-making (i.e. a sample-based interpretability method); (2) it achieves state-of-the-art confidence estimation by quantifying the mismatch across prototype labels; and (3) it obtains state-of-the-art results in distribution mismatch detection. All this can be achieved with minimal additional test time and a practically viable training time computational cost. |
Tasks | Decision Making, Interpretable Machine Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyepjh4FwB |
PDF | https://openreview.net/pdf?id=Hyepjh4FwB |
PWC | https://paperswithcode.com/paper/protoattend-attention-based-prototypical |
Repo | https://github.com/google-research/google-research/tree/master/protoattend |
Framework | tf |
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Title | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments |
Authors | Anonymous |
Abstract | Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning (RL). Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage the agent to explore the environment. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to ever visit the same state more than once. We propose a novel type of intrinsic exploration bonus which rewards the agent for actions that change the agent’s learned state representation. We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks used in prior curiosity-driven exploration work. Our experiments demonstrate that our approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments. Furthermore, we analyze the learned behavior as well as the intrinsic reward received by our agent. In contrast to previous approaches, our intrinsic reward does not diminish during the course of training and it rewards the agent substantially more for interacting with objects that it can control. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg-TJBFPB |
PDF | https://openreview.net/pdf?id=rkg-TJBFPB |
PWC | https://paperswithcode.com/paper/ride-rewarding-impact-driven-exploration-for |
Repo | https://github.com/maximecb/gym-minigrid |
Framework | pytorch |
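The intrinsic bonus described in the abstract, a reward for actions that change the agent's learned state representation, can be sketched as the distance between the embeddings of consecutive states. The Euclidean norm, the function name `impact_reward`, and the hand-written embedding vectors below are assumptions for illustration; in practice the embeddings would come from a learned encoder, and the abstract does not give the exact formula.

```python
import math


def impact_reward(embed_before, embed_after):
    """Euclidean distance between two state embeddings.

    A larger value means the action changed the (hypothetical) learned
    representation more, so it earns a larger exploration bonus.
    """
    if len(embed_before) != len(embed_after):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(
        sum((b - a) ** 2 for a, b in zip(embed_before, embed_after))
    )


# Stand-in embeddings: a no-op leaves the state unchanged, while
# interacting with an object (e.g. opening a door) changes it.
phi_s = [0.0, 1.0, 2.0]
phi_after_noop = [0.0, 1.0, 2.0]
phi_after_open = [3.0, 5.0, 2.0]

reward_noop = impact_reward(phi_s, phi_after_noop)
reward_open = impact_reward(phi_s, phi_after_open)
```

Because the bonus depends on the change between states rather than on how often a state has been seen, it remains meaningful when the agent rarely visits the same state twice, which is the setting the abstract targets.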
Discovering the compositional structure of vector representations with Role Learning Networks
Title | Discovering the compositional structure of vector representations with Role Learning Networks |
Authors | Anonymous |
Abstract | Neural networks (NNs) are able to perform tasks that rely on compositional structure even though they lack obvious mechanisms for representing this structure. To analyze the internal representations that enable such success, we propose ROLE, a technique that detects whether these representations implicitly encode symbolic structure. ROLE learns to approximate the representations of a target encoder E by learning a symbolic constituent structure and an embedding of that structure into E’s representational vector space. The constituents of the approximating symbol structure are defined by structural positions — roles — that can be filled by symbols. We show that when E is constructed to explicitly embed a particular type of structure (e.g., string or tree), ROLE successfully extracts the ground-truth roles defining that structure. We then analyze a seq2seq network trained to perform a more complex compositional task (SCAN), where there is no ground truth role scheme available. For this model, ROLE successfully discovers an interpretable symbolic structure that the model implicitly uses to perform the SCAN task, providing a comprehensive account of the link between the representations and the behavior of a notoriously hard-to-interpret type of model. We verify the causal importance of the discovered symbolic structure by showing that, when we systematically manipulate hidden embeddings based on this symbolic structure, the model’s output is also changed in the way predicted by our analysis. Finally, we use ROLE to explore whether popular sentence embedding models are capturing compositional structure and find evidence that they are not; we conclude by discussing how insights from ROLE can be used to impart new inductive biases that will improve the compositional abilities of such models. |
Tasks | Sentence Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklMDCVtvr |
PDF | https://openreview.net/pdf?id=BklMDCVtvr |
PWC | https://paperswithcode.com/paper/discovering-the-compositional-structure-of-1 |
Repo | https://github.com/iclr2020-anonymous1/role-learner |
Framework | pytorch |
Automatically Discovering and Learning New Visual Categories with Ranking Statistics
Title | Automatically Discovering and Learning New Visual Categories with Ranking Statistics |
Authors | Anonymous |
Abstract | We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images in order to learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the available labels introduces an unwanted bias, which we avoid by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model’s knowledge of the labelled classes to the problem of clustering the unlabelled images; and (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data and the clustering of the unlabelled data simultaneously. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl2_nVFPB |
PDF | https://openreview.net/pdf?id=BJl2_nVFPB |
PWC | https://paperswithcode.com/paper/automatically-discovering-and-learning-new |
Repo | https://github.com/k-han/AutoNovel |
Framework | pytorch |
Attention-based View Selection Networks for Light-field Disparity Estimation
Title | Attention-based View Selection Networks for Light-field Disparity Estimation |
Authors | Yu-Ju Tsai, Yu-Lun Liu, Ming Ouhyoung, Yung-Yu Chuang |
Abstract | This paper introduces a novel deep network for estimating depth maps from a light field image. To utilize the views more effectively and reduce redundancy within views, we propose a view selection module that generates an attention map indicating the importance of each view and its potential for contributing to accurate depth estimation. By exploring the symmetric property of light field views, we enforce symmetry in the attention map and further improve accuracy. With the attention map, our architecture utilizes all views more effectively and efficiently. Experiments show that the proposed method achieves state-of-the-art performance in terms of accuracy and ranks first on a popular benchmark for disparity estimation for light field images. |
Tasks | Depth Estimation, Disparity Estimation |
Published | 2020-02-07 |
URL | http://www.cmlab.csie.ntu.edu.tw/~r06922009/AAAI2020/aaai2020_LFattNet_camera_ready.pdf |
PWC | https://paperswithcode.com/paper/attention-based-view-selection-networks-for |
Repo | https://github.com/LIAGM/LFattNet |
Framework | tf |
State-only Imitation with Transition Dynamics Mismatch
Title | State-only Imitation with Transition Dynamics Mismatch |
Authors | Anonymous |
Abstract | Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior, rather than dealing with the hardships of designing a correct reward function. With the environment modeled as a Markov Decision Process (MDP), most of the existing IL algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitator policy is to be learned. This is uncharacteristic of many real-life scenarios where discrepancies between the expert and the imitator MDPs are common, especially in the transition dynamics function. Furthermore, obtaining expert actions may be costly or infeasible, making the recent trend towards state-only IL (where expert demonstrations constitute only states or observations) ever so promising. Building on recent adversarial imitation approaches that are motivated by the idea of divergence minimization, we present a new state-only IL algorithm in this paper. It divides the overall optimization objective into two sub-problems by introducing an indirection step, and solves the sub-problems iteratively. We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs, while the baseline IL methods suffer from performance degradation. To analyze this, we construct several interesting MDPs by modifying the configuration parameters for the MuJoCo locomotion tasks from OpenAI Gym. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgLLyrYwB |
PDF | https://openreview.net/pdf?id=HJgLLyrYwB |
PWC | https://paperswithcode.com/paper/state-only-imitation-with-transition-dynamics |
Repo | https://github.com/tgangwani/RL-Indirect-imitation |
Framework | pytorch |
DivideMix: Learning with Noisy Labels as Semi-supervised Learning
Title | DivideMix: Learning with Noisy Labels as Semi-supervised Learning |
Authors | Anonymous |
Abstract | Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Source code to reproduce all results will be released. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgExaVtwr |
PDF | https://openreview.net/pdf?id=HJgExaVtwr |
PWC | https://paperswithcode.com/paper/dividemix-learning-with-noisy-labels-as-semi |
Repo | https://github.com/LiJunnan1992/DivideMix |
Framework | pytorch |
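The data-division step in the abstract, fitting a mixture model to per-sample losses and splitting the data into clean and noisy sets, can be sketched as follows. The abstract only says "mixture model"; this sketch assumes a two-component 1D Gaussian mixture fitted with plain EM, and the function names, the 0.5 posterior threshold, and the toy loss values are all illustrative assumptions rather than the repository's code.

```python
import math


def fit_two_gaussians(losses, n_iter=50):
    """Fit a two-component 1D Gaussian mixture to `losses` with EM.

    Returns (means, responsibilities), where responsibilities[i][k] is the
    posterior probability that loss i came from component k.
    """
    means = [min(losses), max(losses)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior of each component for each loss value.
        resp = []
        for x in losses:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted samples.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
            weights[k] = nk / len(losses)
    return means, resp


def split_clean_noisy(losses, threshold=0.5):
    """Indices whose posterior under the low-loss component exceeds threshold."""
    means, resp = fit_two_gaussians(losses)
    clean_k = 0 if means[0] <= means[1] else 1
    clean = [i for i, r in enumerate(resp) if r[clean_k] > threshold]
    noisy = [i for i, r in enumerate(resp) if r[clean_k] <= threshold]
    return clean, noisy


# Toy per-sample losses: five small (likely clean), three large (likely noisy).
toy_losses = [0.05, 0.10, 0.08, 0.12, 0.07, 2.1, 2.4, 1.9]
clean_idx, noisy_idx = split_clean_noisy(toy_losses)
```

In the setting the abstract describes, the clean indices would keep their labels while the noisy ones would be treated as unlabeled data for the semi-supervised phase.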
DropEdge: Towards Deep Graph Convolutional Networks on Node Classification
Title | DropEdge: Towards Deep Graph Convolutional Networks on Node Classification |
Authors | Anonymous |
Abstract | Over-fitting and over-smoothing are two main obstacles to developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small datasets, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either retards the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, DropEdge is a general technique that can be combined with many other backbone models (e.g. GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Code will be made public upon publication. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkx1qkrKPr |
PDF | https://openreview.net/pdf?id=Hkx1qkrKPr |
PWC | https://paperswithcode.com/paper/dropedge-towards-deep-graph-convolutional |
Repo | https://github.com/DropEdge/DropEdge |
Framework | pytorch |
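The core operation in the abstract, randomly removing a certain number of edges from the input graph at each training epoch, can be sketched in a few lines. The edge-list representation, the function name `drop_edges`, the `drop_rate` parameter, and the toy graph are assumptions for illustration; the repository may represent graphs and sample edges differently.

```python
import random


def drop_edges(edges, drop_rate, rng):
    """Return a copy of `edges` with a `drop_rate` fraction removed at random.

    The relative order of the surviving edges is preserved.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_keep = len(edges) - int(len(edges) * drop_rate)
    kept = set(rng.sample(range(len(edges)), n_keep))
    return [e for i, e in enumerate(edges) if i in kept]


# Toy undirected graph given as a list of (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (2, 4), (0, 4), (1, 4), (0, 3)]

rng = random.Random(0)  # seeded so the sketch is reproducible
# A fresh random subgraph is drawn for each (hypothetical) training epoch.
per_epoch = [drop_edges(edges, drop_rate=0.5, rng=rng) for _ in range(3)]
```

Resampling every epoch is what makes the operation behave like data augmentation: the model sees a different sparsified version of the same graph each time, while the full edge list would still be used at evaluation time.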
FasterSeg: Searching for Faster Real-time Semantic Segmentation
Title | FasterSeg: Searching for Faster Real-time Semantic Segmentation |
Authors | Anonymous |
Abstract | We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods. Utilizing neural architecture search (NAS), FasterSeg is discovered from a novel and broader search space integrating multi-resolution branches, which have recently been found to be vital in manually designed segmentation models. To better calibrate the balance between the goals of high accuracy and low latency, we propose a decoupled and fine-grained latency regularization that effectively overcomes the phenomenon we observed, in which the searched networks are prone to “collapsing” to low-latency yet poor-accuracy models. Moreover, we seamlessly extend FasterSeg to a new collaborative search (co-searching) framework, simultaneously searching for a teacher and a student network in the same single run. The teacher-student distillation further boosts the student model’s accuracy. Experiments on popular segmentation benchmarks demonstrate the competency of FasterSeg. For example, FasterSeg can run over 30% faster than the closest manually designed competitor on Cityscapes, while maintaining comparable accuracy. |
Tasks | Neural Architecture Search, Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgqQ6NYvB |
PDF | https://openreview.net/pdf?id=BJgqQ6NYvB |
PWC | https://paperswithcode.com/paper/fasterseg-searching-for-faster-real-time |
Repo | https://github.com/TAMU-VITA/FasterSeg |
Framework | pytorch |
Learning Robust Representations via Multi-View Information Bottleneck
Title | Learning Robust Representations via Multi-View Information Bottleneck |
Authors | Anonymous |
Abstract | The information bottleneck method provides an information-theoretic view of representation learning. The original formulation, however, can only be applied in the supervised setting where task-specific labels are available at learning time. We extend this method to the unsupervised setting, by taking advantage of multi-view data, which provides two views of the same underlying entity. A theoretical analysis leads to the definition of a new multi-view model which produces state-of-the-art results on two standard multi-view datasets, Sketchy and MIR-Flickr. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to traditional unsupervised approaches. |
Tasks | Data Augmentation, Multi-View Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xwcyHFDr |
PDF | https://openreview.net/pdf?id=B1xwcyHFDr |
PWC | https://paperswithcode.com/paper/learning-robust-representations-via-multi |
Repo | https://github.com/mfederici/Multi-View-Information-Bottleneck |
Framework | pytorch |
Understanding and Robustifying Differentiable Architecture Search
Title | Understanding and Robustifying Differentiable Architecture Search |
Authors | Anonymous |
Abstract | Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the architecture space. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with less curvature and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling. |
Tasks | Disparity Estimation, Image Classification, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gDNyrKDS |
PDF | https://openreview.net/pdf?id=H1gDNyrKDS |
PWC | https://paperswithcode.com/paper/understanding-and-robustifying-differentiable |
Repo | https://github.com/MetaAnonym/RobustDARTS |
Framework | pytorch |