Paper Group NANR 7
BETANAS: Balanced Training and selective drop for Neural Architecture Search. INTERPRETING CNN PREDICTION THROUGH LAYER-WISE SELECTED DISCERNIBLE NEURONS. Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness. Robustness Verification for Transformers. Diversely Stale Parameters for Efficient Training of Deep Convolu …
BETANAS: Balanced Training and selective drop for Neural Architecture Search
Title | BETANAS: Balanced Training and selective drop for Neural Architecture Search |
Authors | Anonymous |
Abstract | Automatic neural architecture search techniques have recently become increasingly important in machine learning. In particular, weight sharing methods have shown remarkable potential for searching for good network architectures with few computational resources. However, existing weight sharing methods mainly suffer from limitations in their search strategies: they either uniformly train all network paths to convergence, which introduces conflicts between branches and wastes a large amount of computation on unpromising candidates, or selectively train branches with different frequencies, which leads to unfair evaluation and comparison among paths. To address these issues, we propose a novel neural architecture search method with a balanced training strategy to ensure fair comparisons and a selective drop mechanism to reduce conflicts among candidate paths. The experimental results show that our proposed method can achieve a leading performance of 79.0% on ImageNet under mobile settings, which outperforms other state-of-the-art methods in both accuracy and efficiency. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeEIyBtvr |
https://openreview.net/pdf?id=HyeEIyBtvr | |
PWC | https://paperswithcode.com/paper/betanas-balanced-training-and-selective-drop |
Repo | |
Framework | |
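The abstract above hinges on two mechanisms: balanced training (every surviving candidate path receives comparable training) and selective drop (clearly unpromising candidates are removed). The sketch below is a hypothetical illustration of such a candidate pool, not the paper's actual algorithm; the least-trained-first sampling rule, the `importance` scores, and the drop threshold are all assumptions.

```python
import random

class CandidatePool:
    """Hypothetical sketch: balanced sampling plus selective drop."""

    def __init__(self, candidates):
        self.train_count = {c: 0 for c in candidates}  # candidate -> steps trained

    def sample(self):
        # Balanced training: prefer the candidate trained least so far,
        # breaking ties at random, so comparisons among paths stay fair.
        fewest = min(self.train_count.values())
        pick = random.choice([c for c, n in self.train_count.items() if n == fewest])
        self.train_count[pick] += 1
        return pick

    def drop(self, importance, threshold):
        # Selective drop: remove candidates whose (externally supplied)
        # importance score falls below a threshold, keeping at least one.
        for c in sorted(self.train_count, key=lambda c: importance.get(c, 0.0)):
            if len(self.train_count) > 1 and importance.get(c, 0.0) < threshold:
                del self.train_count[c]
```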
INTERPRETING CNN PREDICTION THROUGH LAYER-WISE SELECTED DISCERNIBLE NEURONS
Title | INTERPRETING CNN PREDICTION THROUGH LAYER-WISE SELECTED DISCERNIBLE NEURONS |
Authors | Anonymous |
Abstract | In recent years, researchers have been working on interpreting the insights of deep networks in pursuit of overcoming their opaqueness and removing the so-called ‘black-box’ tag attached to them. In this work, we present a new visual interpretation technique that identifies the discriminative image locations contributing most to the network’s prediction. We select the most contributing set of neurons per layer and engineer the forward pass operation to gradually reach the important locations of the input image. We explore the connectivity structure of a neuron and obtain support from the succeeding and preceding layers, along with its evidence from the current layer, to advocate for the neuron’s importance. While conducting this operation, we also add priorities to the supports from neighboring layers, which, in practice, provides a reliable way of selecting the discriminative set of neurons for the target layer. We conduct both objective and subjective evaluations to examine the performance of our method in terms of the model’s faithfulness and human trust, and visualize its efficacy over other existing methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylYBlBYvB |
https://openreview.net/pdf?id=HylYBlBYvB | |
PWC | https://paperswithcode.com/paper/interpreting-cnn-prediction-through-layer |
Repo | |
Framework | |
Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness
Title | Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness |
Authors | Anonymous |
Abstract | We propose a simple approach for adversarial training. The proposed approach utilizes an adversarial interpolation scheme for generating adversarial images and accompanying adversarial labels, which are then used in place of the original data for model training. The proposed approach is intuitive to understand, simple to implement and achieves state-of-the-art performance. We evaluate the proposed approach on a number of datasets including CIFAR10, CIFAR100 and SVHN. Extensive empirical results compared with several state-of-the-art methods against different attacks verify the effectiveness of the proposed approach. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syejj0NYvr |
https://openreview.net/pdf?id=Syejj0NYvr | |
PWC | https://paperswithcode.com/paper/adversarial-interpolation-training-a-simple |
Repo | |
Framework | |
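As a rough illustration of the idea in the abstract (adversarial images paired with interpolated soft labels), here is a hedged PyTorch sketch. The pairing rule, the single signed-gradient step toward a partner image's features, the `feat_net` feature extractor, and the fixed label mixing weight `lam` are all assumptions for illustration; the paper's actual interpolation scheme may differ. Images are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def adversarial_interpolation_batch(feat_net, x, y, n_classes, eps=8 / 255, lam=0.5):
    # Pair each image with another image from the batch.
    idx = torch.randperm(x.size(0))
    x_other, y_other = x[idx], y[idx]

    # Perturb each image so its features move toward its partner's features.
    x_adv = x.clone().detach().requires_grad_(True)
    feat_gap = F.mse_loss(feat_net(x_adv), feat_net(x_other).detach())
    grad = torch.autograd.grad(feat_gap, x_adv)[0]
    x_adv = (x - eps * grad.sign()).clamp(0, 1).detach()  # descend the feature gap

    # Soften the label toward the partner's label.
    y_soft = (1 - lam) * F.one_hot(y, n_classes).float() + lam * F.one_hot(y_other, n_classes).float()
    return x_adv, y_soft
```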
Robustness Verification for Transformers
Title | Robustness Verification for Transformers |
Authors | Anonymous |
Abstract | Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding the behavior of a given model and for obtaining safety guarantees. However, previous methods are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have very complicated self-attention layers that create many challenges for verification, including cross-nonlinearity and cross-position dependency that have not been solved in previous work. We resolve these key challenges and develop the first verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation, and they also consistently reflect the importance of different words in sentiment analysis and thus are meaningful in practice. |
Tasks | Sentiment Analysis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxwPJHFwS |
https://openreview.net/pdf?id=BJxwPJHFwS | |
PWC | https://paperswithcode.com/paper/robustness-verification-for-transformers |
Repo | |
Framework | |
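The abstract compares its certified bounds against naive Interval Bound Propagation (IBP). For readers unfamiliar with that baseline, below is a minimal numpy sketch of IBP through a single affine layer followed by a ReLU. This is the baseline being compared against, not the paper's tighter verification algorithm, and it assumes a dense layer y = Wx + b.

```python
import numpy as np

def ibp_affine(lower, upper, W, b):
    # Propagate an axis-aligned box through y = W x + b by splitting W
    # into its positive and negative parts.
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def ibp_relu(lower, upper):
    # ReLU is monotone, so it maps the box endpoint-wise.
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)
```

Self-attention multiplies two input-dependent quantities (the cross-nonlinearity the abstract mentions), which is exactly where such interval arithmetic becomes loose.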
Diversely Stale Parameters for Efficient Training of Deep Convolutional Networks
Title | Diversely Stale Parameters for Efficient Training of Deep Convolutional Networks |
Authors | Anonymous |
Abstract | The backpropagation algorithm is the most popular algorithm for training neural networks nowadays. However, it suffers from the forward locking, backward locking and update locking problems, especially when a neural network is so large that its layers are distributed across multiple devices. Existing solutions either handle only one locking problem or lead to severe accuracy loss or memory inefficiency. Moreover, none of them consider the straggler problem among devices. In this paper, we propose \textbf{Layer-wise Staleness} and a novel efficient training algorithm, \textbf{Diversely Stale Parameters} (DSP), which can address all these challenges without loss of accuracy or memory issues. We also analyze the convergence of DSP with two popular gradient-based methods and prove that both of them are guaranteed to converge to critical points for non-convex problems. Finally, extensive experimental results on training deep convolutional neural networks demonstrate that our proposed DSP algorithm can achieve significant training speedup with stronger robustness and better generalization than the compared methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgLlgBKvH |
https://openreview.net/pdf?id=HJgLlgBKvH | |
PWC | https://paperswithcode.com/paper/diversely-stale-parameters-for-efficient-1 |
Repo | |
Framework | |
Towards Interpreting Deep Neural Networks via Understanding Layer Behaviors
Title | Towards Interpreting Deep Neural Networks via Understanding Layer Behaviors |
Authors | Anonymous |
Abstract | Deep neural networks (DNNs) have achieved unprecedented practical success in many applications. However, how to interpret DNNs is still an open problem. In particular, how hidden layers behave is not clearly understood. In this paper, relying on a teacher-student paradigm, we seek to understand the layer behaviors of DNNs by monitoring both the across-layer and single-layer distribution evolution toward some target distribution during training. Here, “across-layer” and “single-layer” consider the layer behavior \emph{along the depth} and a specific layer \emph{along training epochs}, respectively. Relying on optimal transport theory, we employ the Wasserstein distance ($W$-distance) to measure the divergence between the layer distribution and the target distribution. Theoretically, we prove that i) across layers, the $W$-distance to the target distribution tends to decrease along the depth; ii) the $W$-distance of a specific layer to the target distribution tends to decrease along training iterations; iii) however, a deep layer is not always better than a shallow layer for some samples. Moreover, our results help to analyze the stability of layer distributions and explain why auxiliary losses help the training of DNNs. Extensive experiments on real-world datasets justify our theoretical findings. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxMKerYwr |
https://openreview.net/pdf?id=rkxMKerYwr | |
PWC | https://paperswithcode.com/paper/towards-interpreting-deep-neural-networks-via |
Repo | |
Framework | |
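To make the monitored quantity concrete, here is a hedged numerical sketch: a sliced approximation of the $W$-distance between a layer's activation samples and samples from a target distribution, using random 1-D projections. The sliced approximation and the sampled target are illustrative assumptions; the paper's theoretical results concern the exact $W$-distance.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_w_distance(layer_acts, target_samples, n_proj=50, seed=0):
    # layer_acts: (n, d) activations of one layer; target_samples: (m, d)
    # samples from the target distribution. Average 1-D Wasserstein
    # distances over random unit-vector projections.
    rng = np.random.default_rng(seed)
    d = layer_acts.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        total += wasserstein_distance(layer_acts @ v, target_samples @ v)
    return total / n_proj
```

Tracking this value per layer (along the depth) and per epoch (along training) mirrors the across-layer and single-layer monitoring described in the abstract.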
Plan2Vec: Unsupervised Representation Learning by Latent Plans
Title | Plan2Vec: Unsupervised Representation Learning by Latent Plans |
Authors | Anonymous |
Abstract | Creating a useful representation of the world takes more than just rote memorization of individual data samples. This is because fundamentally, we use our internal representation to plan, to solve problems, and to navigate the world. For a representation to be amenable to planning, it is critical for it to embody some notion of optimality. A representation learning objective that explicitly considers some form of planning should generate representations which are more computationally valuable than those that memorize samples. In this paper, we introduce \textbf{Plan2Vec}, an unsupervised representation learning objective inspired by value-based reinforcement learning methods. By abstracting away low-level control with a learned local metric, we show that it is possible to learn plannable representations that inform long-range structures, entirely passively from high-dimensional sequential datasets without supervision. A latent space is learned by playing an “Imagined Planning Game” on the graph formed by the data points, using a local metric function trained contrastively from context. We show that the global metric on this learned embedding can be used to plan with O(1) complexity by linear interpolation. This exponential speed-up is critical for planning with a learned representation on any problem containing non-trivial global topology. We demonstrate the effectiveness of Plan2Vec on simulated toy tasks from both proprioceptive and image states, as well as two real-world image datasets, showing that Plan2Vec can effectively plan using learned representations. Additional results and videos can be found at \url{https://sites.google.com/view/plan2vec}. |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bye6weHFvB |
https://openreview.net/pdf?id=Bye6weHFvB | |
PWC | https://paperswithcode.com/paper/plan2vec-unsupervised-representation-learning |
Repo | |
Framework | |
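The paper's contribution is learning the embedding itself (the contrastively trained local metric and the "Imagined Planning Game"); the sketch below only illustrates the final O(1) planning step the abstract mentions, where a plan is read off by linear interpolation between latent codes. The function name and step count are assumptions.

```python
import numpy as np

def interpolate_plan(z_start, z_goal, n_steps=10):
    # In a plannable latent space, intermediate waypoints are simply
    # convex combinations of the start and goal codes.
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([(1 - a) * z_start + a * z_goal for a in alphas])
```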
Potential Flow Generator with $L_2$ Optimal Transport Regularity for Generative Models
Title | Potential Flow Generator with $L_2$ Optimal Transport Regularity for Generative Models |
Authors | Anonymous |
Abstract | We propose a potential flow generator with $L_2$ optimal transport regularity, which can be easily integrated into a wide range of generative models including different versions of GANs and flow-based models. With up to a slight augmentation of the original generator loss functions, our generator is not only a transport map from the input distribution to the target one, but also the one with minimum $L_2$ transport cost. We show the correctness and robustness of the potential flow generator in several 2D problems, and illustrate the concept of “proximity” due to the $L_2$ optimal transport regularity. Subsequently, we demonstrate the effectiveness of the potential flow generator in image translation tasks with unpaired training data from the MNIST dataset and the CelebA dataset. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkexNpNFwS |
https://openreview.net/pdf?id=SkexNpNFwS | |
PWC | https://paperswithcode.com/paper/potential-flow-generator-with-l_2-optimal-1 |
Repo | |
Framework | |
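As a hedged illustration of the "slight augmentation of the original generator loss" mentioned in the abstract, one can penalize the average squared displacement between generator inputs and outputs. This shows only a plain $L_2$ transport-cost term; the potential-flow construction itself is the paper's contribution and is not shown here. Inputs and outputs are assumed to live in the same space.

```python
import torch

def l2_transport_cost(x_in, x_out):
    # Mean squared displacement between generator inputs and outputs;
    # a small multiple of this can be added to the usual generator loss.
    return ((x_out - x_in) ** 2).flatten(1).sum(dim=1).mean()
```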
Simple but effective techniques to reduce dataset biases
Title | Simple but effective techniques to reduce dataset biases |
Authors | Anonymous |
Abstract | There have been several studies recently showing that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models which fail to generalize to out-of-domain datasets, and are likely to perform poorly in real-world scenarios. We propose several learning strategies to train neural models which are more robust to such biases and transfer better to out-of-domain datasets. We introduce an additional lightweight bias-only model which learns dataset biases and uses its prediction to adjust the loss of the base model to reduce the biases. In other words, our methods down-weight the importance of the biased examples, and focus training on hard examples, i.e. examples that cannot be correctly classified by only relying on biases. Our approaches are model agnostic and simple to implement. We experiment on large-scale natural language inference and fact verification datasets and their out-of-domain datasets and show that our debiased models significantly improve the robustness in all settings, including gaining 9.76 points on the FEVER symmetric evaluation dataset, 5.45 on the HANS dataset and 4.78 points on the SNLI hard set. These datasets are specifically designed to assess the robustness of models in the out-of-domain setting where typical biases in the training data do not exist in the evaluation set. |
Tasks | Natural Language Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlCK1rYwB |
https://openreview.net/pdf?id=SJlCK1rYwB | |
PWC | https://paperswithcode.com/paper/simple-but-effective-techniques-to-reduce-1 |
Repo | |
Framework | |
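The abstract describes down-weighting examples that a bias-only model already gets right. The paper proposes several such strategies; the sketch below shows only a simple example-reweighting variant, where the per-example weight is one minus the bias model's probability of the gold label. Function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(main_logits, bias_logits, labels):
    # weight_i = 1 - p_bias(y_i | x_i): examples the bias-only model
    # classifies confidently contribute little to the main model's loss.
    with torch.no_grad():
        p_gold = F.softmax(bias_logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
        weights = 1.0 - p_gold
    per_example = F.cross_entropy(main_logits, labels, reduction="none")
    return (weights * per_example).sum() / weights.sum().clamp_min(1e-8)
```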
Distributionally Robust Neural Networks
Title | Distributionally Robust Neural Networks |
Authors | Anonymous |
Abstract | Overparameterized neural networks trained to minimize average loss can be highly accurate on average on an i.i.d. test set, yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that do not hold at test time). Distributionally robust optimization (DRO) provides an approach for learning models that instead minimize worst-case training loss over a set of pre-defined groups. We find, however, that naively applying DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss will also already have vanishing worst-case training loss. Instead, the poor worst-case performance of these models arises from poor generalization on some groups. As a solution, we show that increased regularization—e.g., stronger-than-typical weight decay or early stopping—allows DRO models to achieve substantially higher worst-group accuracies, with 10% to 40% improvements over standard models on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group performance in the overparameterized regime, even if it is not needed for average performance. Finally, we introduce and provide convergence guarantees for a stochastic optimizer for this group DRO setting, underpinning the empirical study above. |
Tasks | Natural Language Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxGuJrFvS |
https://openreview.net/pdf?id=ryxGuJrFvS | |
PWC | https://paperswithcode.com/paper/distributionally-robust-neural-networks |
Repo | |
Framework | |
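The core objective described in the abstract is the worst-case loss over pre-defined groups. A minimal PyTorch sketch of that objective is below; the paper additionally studies regularization and a stochastic optimizer with convergence guarantees, which are not shown.

```python
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, group_ids, n_groups):
    # Average the per-example loss within each group, then take the
    # maximum over groups (the group DRO objective).
    per_example = F.cross_entropy(logits, labels, reduction="none")
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_example[mask].mean())
    return torch.stack(group_losses).max()
```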
Adapting to Label Shift with Bias-Corrected Calibration
Title | Adapting to Label Shift with Bias-Corrected Calibration |
Authors | Anonymous |
Abstract | Label shift refers to the phenomenon where the marginal probability p(y) of observing a particular class changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. This is relevant in settings such as medical diagnosis, where a classifier trained to predict disease based on observed symptoms may need to be adapted to a different distribution where the baseline frequency of the disease is higher. Given estimates of p(y|x) from a predictive model, one can apply domain adaptation procedures including Expectation Maximization (EM) and Black-Box Shift Estimation (BBSE) to efficiently correct for the difference in class proportions between the training and test distributions. Unfortunately, modern neural networks typically fail to produce well-calibrated estimates of p(y|x), reducing the effectiveness of these approaches. In recent years, Temperature Scaling has emerged as an efficient approach to combat miscalibration. However, the effectiveness of Temperature Scaling in the context of adaptation to label shift has not been explored. In this work, we study the impact of various calibration approaches on shift estimates produced by EM or BBSE. In experiments with image classification and diabetic retinopathy detection, we find that calibration consistently tends to improve shift estimation. In particular, calibration approaches that include class-specific bias parameters are significantly better than approaches that lack class-specific bias parameters, suggesting that reducing systematic bias in the calibrated probabilities is especially important for domain adaptation. |
Tasks | Calibration, Diabetic Retinopathy Detection, Domain Adaptation, Image Classification, Medical Diagnosis |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkx-wA4YPS |
https://openreview.net/pdf?id=rkx-wA4YPS | |
PWC | https://paperswithcode.com/paper/adapting-to-label-shift-with-bias-corrected |
Repo | |
Framework | |
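The EM procedure referenced in the abstract, which estimates the test-set class prior from calibrated p(y|x) predictions and reweights accordingly, is standard; a minimal numpy sketch is below. The iteration count and tolerance are arbitrary defaults, and the calibration step the paper studies happens before this.

```python
import numpy as np

def em_label_shift(probs, train_prior, n_iter=100, tol=1e-8):
    # probs: (n, k) calibrated p(y|x) on the test set; train_prior: (k,).
    # Alternately reweight predictions by the prior ratio and re-estimate
    # the test prior until it stops changing.
    prior = train_prior.copy()
    for _ in range(n_iter):
        adjusted = probs * (prior / train_prior)
        adjusted /= adjusted.sum(axis=1, keepdims=True)
        new_prior = adjusted.mean(axis=0)
        if np.abs(new_prior - prior).max() < tol:
            return new_prior
        prior = new_prior
    return prior
```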
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
Title | ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks |
Authors | Anonymous |
Abstract | In this paper, we introduce a novel approach to training a given compact network. To this end, we build upon over-parameterization, which typically improves both optimization and generalization in neural network training, while being unnecessary at inference time. We propose to expand each linear layer of the compact network into multiple linear layers, without adding any nonlinearity. As such, the resulting expanded network can benefit from over-parameterization during training but can be compressed back to the compact one algebraically at inference. As evidenced by our experiments, this consistently outperforms training the compact network from scratch and knowledge distillation using a teacher. In this context, we introduce several expansion strategies, together with an initialization scheme, and demonstrate the benefits of our ExpandNets on several tasks, including image classification, object detection, and semantic segmentation. |
Tasks | Image Classification, Object Detection, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1x3EgHtwB |
https://openreview.net/pdf?id=B1x3EgHtwB | |
PWC | https://paperswithcode.com/paper/expandnets-linear-over-parameterization-to |
Repo | |
Framework | |
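The key mechanism in the abstract, expanding a linear layer into a product of linear layers for training and collapsing it back algebraically for inference, can be sketched for a fully connected layer as below. The paper also covers convolutional layers and a dedicated initialization scheme, which this sketch omits; the hidden width is an arbitrary choice.

```python
import torch
import torch.nn as nn

def expand_linear(layer: nn.Linear, hidden: int) -> nn.Sequential:
    # Two stacked linear maps with no nonlinearity in between:
    # over-parameterized during training, equivalent to one linear map.
    return nn.Sequential(
        nn.Linear(layer.in_features, hidden, bias=False),
        nn.Linear(hidden, layer.out_features, bias=True),
    )

def collapse_linear(expanded: nn.Sequential) -> nn.Linear:
    # Fold back algebraically: W = W2 @ W1, b = b2.
    first, second = expanded[0], expanded[1]
    merged = nn.Linear(first.in_features, second.out_features)
    with torch.no_grad():
        merged.weight.copy_(second.weight @ first.weight)
        merged.bias.copy_(second.bias)
    return merged
```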
Sensible adversarial learning
Title | Sensible adversarial learning |
Authors | Anonymous |
Abstract | The trade-off between robustness and standard accuracy has been consistently reported in the machine learning literature. Although the problem has been widely studied to understand and explain this trade-off, no studies have shown the possibility of a no-trade-off solution. In this paper, motivated by the fact that a high-dimensional distribution is poorly represented by limited data samples, we introduce sensible adversarial learning and demonstrate the synergistic effect between pursuits of natural accuracy and robustness. Specifically, we define a sensible adversary which is useful for learning a defense model while simultaneously keeping a high natural accuracy. We theoretically establish that the Bayes rule is the most robust multi-class classifier with the 0-1 loss under sensible adversarial learning. We propose a novel and efficient algorithm that trains a robust model with sensible adversarial examples, without a significant drop in natural accuracy. Our model on CIFAR10 yields state-of-the-art results against various attacks with perturbations restricted to l∞ with ε = 8/255, e.g., 65.17% robust accuracy against PGD attacks as well as 91.51% natural accuracy. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlf_RVKwr |
https://openreview.net/pdf?id=rJlf_RVKwr | |
PWC | https://paperswithcode.com/paper/sensible-adversarial-learning |
Repo | |
Framework | |
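The abstract reports robust accuracy against PGD with l∞ perturbations of ε = 8/255. For reference, below is a standard PGD attack sketch in PyTorch; this is the evaluation attack, not the paper's sensible adversary. Step size and iteration count are common defaults, and inputs are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Random start inside the l-infinity ball, then iterated signed
    # gradient ascent with projection back onto the ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```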
SPROUT: Self-Progressing Robust Training
Title | SPROUT: Self-Progressing Robust Training |
Authors | Anonymous |
Abstract | Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy and reliable machine learning systems. Current robust training methods such as adversarial training explicitly specify an “attack” (e.g., $\ell_{\infty}$-norm bounded perturbation) to generate adversarial examples during model training in order to improve adversarial robustness. In this paper, we take a different perspective and propose a new framework SPROUT, self-progressing robust training. During model training, SPROUT progressively adjusts the training label distribution via our proposed parametrized label smoothing technique, making training free of attack generation and more scalable. We also motivate SPROUT using a general formulation based on vicinity risk minimization, which includes many robust training methods as special cases. Compared with state-of-the-art adversarial training methods (PGD-$\ell_\infty$ and TRADES) under $\ell_{\infty}$-norm bounded attacks and various invariance tests, SPROUT consistently attains superior performance and is more scalable to large neural networks. Our results shed new light on scalable, effective and attack-independent robust training methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxGoJrtPr |
https://openreview.net/pdf?id=SyxGoJrtPr | |
PWC | https://paperswithcode.com/paper/sprout-self-progressing-robust-training |
Repo | |
Framework | |
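A generic sketch of parametrized label smoothing, which the abstract names as the core mechanism: mix the one-hot target with a learnable class distribution rather than the uniform distribution used by standard label smoothing. SPROUT's exact parametrization and how it progresses during training follow the paper; the mixing weight `alpha` and the per-class `smooth_logits` here are illustrative.

```python
import torch
import torch.nn.functional as F

def parametrized_label_smoothing_loss(logits, labels, smooth_logits, alpha=0.1):
    # smooth_logits: learnable (n_classes,) parameters defining the
    # smoothing distribution; standard label smoothing is the special
    # case of a uniform distribution.
    n_classes = logits.size(-1)
    one_hot = F.one_hot(labels, n_classes).float()
    target = (1 - alpha) * one_hot + alpha * F.softmax(smooth_logits, dim=-1)
    return -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```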
Bias in word embeddings
Title | Bias in word embeddings |
Authors | Orestis Papakyriakopoulos, Simon Hegelich, Juan Carlos Medina Serrano, Fabienne Marco |
Abstract | Word embeddings are a widely used set of natural language processing techniques that map words to vectors of real numbers. These vectors are used to improve the quality of generative and predictive models. Recent studies demonstrate that word embeddings contain and amplify biases present in data, such as stereotypes and prejudice. In this study, we provide a complete overview of bias in word embeddings. We develop a new technique for bias detection for gendered languages and use it to compare bias in embeddings trained on Wikipedia and on political social media data. We investigate bias diffusion and prove that existing biases are transferred to further machine learning models. We test two techniques for bias mitigation and show that the generally proposed methodology for debiasing models at the embeddings level is insufficient. Finally, we employ biased word embeddings and illustrate that they can be used for the detection of similar biases in new data. Given that word embeddings are widely used by commercial companies, we discuss the challenges and required actions towards fair algorithmic implementations and applications. |
Tasks | Word Embeddings |
Published | 2020-01-27 |
URL | https://dl.acm.org/doi/abs/10.1145/3351095.3372843 |
https://dl.acm.org/doi/pdf/10.1145/3351095.3372843 | |
PWC | https://paperswithcode.com/paper/bias-in-word-embeddings |
Repo | |
Framework | |
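As a concrete (and generic) illustration of the kind of measurement such studies rely on, the sketch below computes a simple association score: how much closer a word vector sits to one attribute set than to another, via mean cosine similarity. This is a standard WEAT-style probe, not the paper's new detection technique for gendered languages.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_bias(word_vec, attrs_a, attrs_b):
    # Positive values: the word is closer to attribute set A (e.g. male
    # terms) than to attribute set B (e.g. female terms).
    sim_a = np.mean([cosine(word_vec, a) for a in attrs_a])
    sim_b = np.mean([cosine(word_vec, b) for b in attrs_b])
    return float(sim_a - sim_b)
```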