April 1, 2020

2848 words 14 mins read

Paper Group NANR 130

How to 0wn the NAS in Your Spare Time

Title How to 0wn the NAS in Your Spare Time
Authors Anonymous
Abstract New data processing pipelines and unique network architectures increasingly drive the success of deep learning. In consequence, the industry considers top-performing architectures as intellectual property and devotes considerable computational resources to discovering such architectures through neural architecture search (NAS). This provides an incentive for adversaries to steal these unique architectures; when the architectures are used in the cloud to provide Machine Learning as a Service (MLaaS), adversaries also have an opportunity to reconstruct them by exploiting a range of hardware side channels. However, it is challenging to reconstruct unique architectures and pipelines without knowing the computational graph (e.g., the layers, branches, or skip connections), the architectural parameters (e.g., the number of filters in a convolutional layer), or the specific pre-processing steps (e.g., embeddings). In this paper, we design an algorithm that reconstructs the key components of a unique deep learning system by exploiting a small amount of information leakage from a cache side-channel attack, Flush+Reload. We use Flush+Reload to infer the trace of computations and the timing of each computation. Our algorithm then generates candidate computational graphs from the trace and eliminates incompatible candidates through a parameter estimation process. We implement our algorithm for PyTorch and TensorFlow. We demonstrate experimentally that we can reconstruct MalConv, a novel data pre-processing pipeline for malware detection, and ProxylessNAS-CPU, a novel network architecture for ImageNet classification optimized to run on CPUs, without knowing the architecture family. In both cases, we achieve 0% error. These results suggest hardware side channels are a practical attack vector against MLaaS, and more effort should be devoted to understanding their impact on the security of deep learning systems.
Tasks Malware Detection, Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=S1erpeBFPB
PDF https://openreview.net/pdf?id=S1erpeBFPB
PWC https://paperswithcode.com/paper/how-to-0wn-the-nas-in-your-spare-time
Repo
Framework
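
The parameter-elimination step described above can be illustrated with a toy sketch: given the timing of one observed layer from the side-channel trace, discard candidate parameterizations whose predicted cost is inconsistent with the observation. Everything below (the cost model, the calibration constant, and the tolerance) is an assumption for illustration, not the authors' implementation.

```python
# Toy sketch (not the paper's code): keep only candidate filter counts whose
# predicted cost is consistent with the timing observed via Flush+Reload.

def predicted_cost(num_filters, kernel_size=3, feature_map=56):
    """Rough FLOP-style cost model for one conv layer (illustrative assumption)."""
    return num_filters * kernel_size ** 2 * feature_map ** 2

def filter_candidates(observed_time, candidate_filters, tolerance=0.15):
    """Eliminate architectural parameters incompatible with the observed timing."""
    time_per_unit = 1e-9  # hypothetical calibration constant (seconds per unit cost)
    survivors = []
    for f in candidate_filters:
        estimate = predicted_cost(f) * time_per_unit
        if abs(estimate - observed_time) / observed_time < tolerance:
            survivors.append(f)
    return survivors

# Hypothetical timing (seconds) for one convolutional layer in the trace.
print(filter_candidates(3.5e-3, candidate_filters=[32, 64, 128, 256]))  # -> [128]
```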

Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders

Title Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders
Authors Anonymous
Abstract There has been increasing interest in developing automated tools that facilitate the design process. In this paper, we focus on assisting graphical user interface (UI) layout design, a crucial task in app development. Given a partial layout that a designer has entered, our model learns to complete the layout by predicting the remaining UI elements with correct positions and dimensions, as well as the hierarchical structure. Such automation will significantly ease the effort of UI designers and developers. While we focus on interface layout prediction, our model is generally applicable to other layout prediction problems that involve tree structures and 2-dimensional placements. In particular, we design two versions of Transformer-based tree decoders, Pointer Transformer and Recursive Transformer, and experiment with these models on a public dataset. We also propose several metrics for measuring the accuracy of tree prediction and ground these metrics in the domain of user experience. These contribute a new task and methods to deep learning research.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SylWNC4FPH
PDF https://openreview.net/pdf?id=SylWNC4FPH
PWC https://paperswithcode.com/paper/auto-completion-of-user-interface-layout
Repo
Framework
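
As a concrete illustration of treating layout completion as sequence prediction, the sketch below serializes a small layout tree into a flat token sequence that a Transformer decoder could learn to extend. The element names, bracket tokens, and coordinate encoding are assumptions for illustration, not the paper's exact representation.

```python
# Depth-first serialization of a UI layout tree: element type, bounding box,
# then children wrapped in bracket tokens. Purely illustrative.

def serialize(node, tokens=None):
    if tokens is None:
        tokens = []
    x, y, w, h = node["box"]
    tokens += [node["type"], f"x{x}", f"y{y}", f"w{w}", f"h{h}", "("]
    for child in node.get("children", []):
        serialize(child, tokens)
    tokens.append(")")
    return tokens

partial_layout = {
    "type": "Toolbar", "box": (0, 0, 360, 56),
    "children": [{"type": "TextButton", "box": (8, 8, 80, 40), "children": []}],
}
# A tree decoder would be trained to predict the tokens that complete this prefix.
print(serialize(partial_layout))
```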

Deep Variational Semi-Supervised Novelty Detection

Title Deep Variational Semi-Supervised Novelty Detection
Authors Anonymous
Abstract In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to separate the latent vectors of normal and outlier data. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, and can be combined with any VAE model architecture. When compared to state-of-the-art SSAD methods that are not specific to particular data types, we obtain marked improvement in outlier detection.
Tasks Anomaly Detection, Outlier Detection
Published 2020-01-01
URL https://openreview.net/forum?id=Hkxp3JHtPr
PDF https://openreview.net/pdf?id=Hkxp3JHtPr
PWC https://paperswithcode.com/paper/deep-variational-semi-supervised-novelty
Repo
Framework
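
A minimal sketch of the "separate the latents" idea is given below: normal samples are trained with the usual VAE objective, while labeled anomalies are pushed outside a margin around the prior mean. This is one possible instantiation written as an assumption; the paper derives its two objectives from probabilistic formulations that are not reproduced here.

```python
import torch
import torch.nn.functional as F

def ssad_vae_loss(recon, x, mu, logvar, is_anomaly, margin=5.0):
    """Illustrative SSAD objective: standard ELBO terms for normal data plus a
    hinge that pushes encoded labeled anomalies away from the prior mean."""
    recon_loss = F.mse_loss(recon, x, reduction="none").flatten(1).sum(-1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
    normal = ~is_anomaly
    loss_normal = (recon_loss + kl)[normal].sum()
    # Labeled anomalies: encourage their latent means to lie outside the margin.
    loss_anomaly = F.relu(margin - mu[is_anomaly].norm(dim=-1)).sum()
    return (loss_normal + loss_anomaly) / x.size(0)
```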

DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL

Title DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL
Authors Anonymous
Abstract We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment. We utilize the VIC framework, which maximizes an agent’s ‘empowerment’, i.e., the ability to reliably reach a diverse set of states, and formulate a sandwich bound on the empowerment objective that allows identification of decision states. Unlike previous work, our decision states are discovered without extrinsic rewards, simply by interacting with the world. Our results show that our decision states are: 1) often interpretable, and 2) lead to better exploration on downstream goal-driven tasks in partially observable environments.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeQGJrKwH
PDF https://openreview.net/pdf?id=SJeQGJrKwH
PWC https://paperswithcode.com/paper/ds-vic-unsupervised-discovery-of-decision
Repo
Framework
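
For context, the sketch below shows the VIC-style empowerment signal the method above builds on: an agent is rewarded when a discriminator can recover which latent skill produced the final state. The sandwich bound and the extraction of decision states are the paper's contributions and are not reproduced here; the function and its interface are illustrative.

```python
import torch

def vic_intrinsic_reward(discriminator_logits, skill_idx, num_skills):
    """log q(skill | final state) - log p(skill), assuming a uniform skill prior."""
    log_q = torch.log_softmax(discriminator_logits, dim=-1)
    log_q_skill = log_q.gather(-1, skill_idx.unsqueeze(-1)).squeeze(-1)
    log_p_skill = -torch.log(torch.tensor(float(num_skills)))
    return log_q_skill - log_p_skill  # high where skills are reliably distinguishable
```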

NAS-BENCH-1SHOT1: BENCHMARKING AND DISSECTING ONE-SHOT NEURAL ARCHITECTURE SEARCH

Title NAS-BENCH-1SHOT1: BENCHMARKING AND DISSECTING ONE-SHOT NEURAL ARCHITECTURE SEARCH
Authors Anonymous
Abstract One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding of how these weight-sharing algorithms exactly work, due to the many factors controlling the dynamics of the process. In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently introduced variants, together with a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. To showcase the framework, we compare several state-of-the-art one-shot NAS methods, examine how sensitive they are to their hyperparameters and how they can be improved by careful regularization, and compare their performance to that of blackbox optimizers for NAS-Bench-101.
Tasks Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=SJx9ngStPH
PDF https://openreview.net/pdf?id=SJx9ngStPH
PWC https://paperswithcode.com/paper/nas-bench-1shot1-benchmarking-and-dissecting
Repo
Framework
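
The "cheap anytime evaluation" idea can be sketched as follows: during one-shot search, the architecture currently favored by the weight-sharing model is looked up in a tabular benchmark instead of being trained from scratch. The toy dictionary and random selection below are stand-ins for a real search method and for the NAS-Bench-101 tables; they are not the released NAS-Bench-1Shot1 API.

```python
import random

# Hypothetical tabular benchmark: architecture id -> accuracy after full training.
benchmark_table = {"arch_a": 0.91, "arch_b": 0.93, "arch_c": 0.89}

def anytime_eval(search_epochs=5, seed=0):
    random.seed(seed)
    curve = []
    for _ in range(search_epochs):
        # Stand-in for one epoch of weight-sharing search; a real one-shot method
        # would update its architecture parameters here and then derive a choice.
        selected = random.choice(sorted(benchmark_table))
        curve.append(benchmark_table[selected])  # table lookup: no training cost
    return curve

print(anytime_eval())
```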

An Information Theoretic Approach to Distributed Representation Learning

Title An Information Theoretic Approach to Distributed Representation Learning
Authors Anonymous
Abstract The problem of distributed representation learning is one in which multiple sources of information X1,…,XK are processed separately so as to extract useful information about some statistically correlated ground truth Y. We investigate this problem from information-theoretic grounds. For both discrete memoryless (DM) and memoryless vector Gaussian models, we establish fundamental limits of learning in terms of optimal tradeoffs between accuracy and complexity. We also develop a variational bound on the optimal tradeoff that generalizes the evidence lower bound (ELBO) to the distributed setting. Furthermore, we provide a variational-inference-type algorithm for computing this bound, in which the mappings are parametrized by neural networks and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Experimental results on synthetic and real datasets support the effectiveness of the approaches and algorithms developed in this paper.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=H1gyy1BtDS
PDF https://openreview.net/pdf?id=H1gyy1BtDS
PWC https://paperswithcode.com/paper/an-information-theoretic-approach-to-2
Repo
Framework
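
For reference, the classical single-encoder evidence lower bound that the paper's distributed bound generalizes can be written as below; the K-encoder form with the accuracy/complexity tradeoff is given in the paper and is not reproduced here.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right) \;=\; \mathrm{ELBO}(\theta,\phi;x).
$$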

TabFact: A Large-scale Dataset for Table-based Fact Verification

Title TabFact: A Large-scale Dataset for Table-based Fact Verification
Authors Anonymous
Abstract The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to dealing with unstructured evidence (e.g., natural language sentences and documents, news, etc.), while verification under structured evidence, such as tables, graphs, and databases, remains unexplored. This paper specifically aims to study fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. TabFact is challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and the Latent Program Algorithm (LPA). Table-BERT leverages a state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into LISP-like programs and executes them against the tables to obtain a binary value for verification. Both methods achieve similar accuracy but still lag far behind human performance. We also perform a comprehensive analysis that points to substantial opportunities for future work.
Tasks Language Modelling, Table-based Fact Verification
Published 2020-01-01
URL https://openreview.net/forum?id=rkeJRhNYDH
PDF https://openreview.net/pdf?id=rkeJRhNYDH
PWC https://paperswithcode.com/paper/tabfact-a-large-scale-dataset-for-table-based-1
Repo
Framework
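
A minimal sketch of the table-linearization step behind a Table-BERT-style model is shown below: the table is flattened into a natural-language-like string and paired with the statement for a sequence-pair classifier. The template is an illustrative choice, not necessarily the paper's exact linearization.

```python
def linearize(header, rows):
    """Flatten a table into 'row i : col is val ; ...' text (illustrative template)."""
    parts = []
    for i, row in enumerate(rows, 1):
        cells = " ; ".join(f"{h} is {v}" for h, v in zip(header, row))
        parts.append(f"row {i} : {cells} .")
    return " ".join(parts)

header = ["player", "team", "points"]
rows = [["alice", "red", "31"], ["bob", "blue", "27"]]
statement = "alice scored more points than bob"
# The pair below would be fed to a pre-trained encoder (e.g. BERT) with an
# ENTAILED / REFUTED classification head on top.
print((linearize(header, rows), statement))
```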

Understanding Architectures Learnt by Cell-based Neural Architecture Search

Title Understanding Architectures Learnt by Cell-based Neural Architecture Search
Authors Anonymous
Abstract Neural architecture search (NAS) searches architectures automatically for given tasks, e.g., image classification and language modeling. Improving the search efficiency and effectiveness has attracted increasing attention in recent years. However, few efforts have been devoted to understanding the generated architectures, and in particular the commonality these architectures may share. In this paper, we first summarize the common connection pattern of NAS architectures. We empirically and theoretically show that this common connection pattern contributes to a smooth loss landscape and more accurate gradient information, and therefore fast convergence. As a consequence, NAS algorithms tend to select architectures with this connection pattern during architecture search. However, we show that the selected architectures with the common connection pattern may not necessarily lead to better generalization performance compared with other candidate architectures in the same search space, and therefore further improvement is possible by revising existing NAS algorithms.
Tasks Image Classification, Language Modelling, Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=BJxH22EKPS
PDF https://openreview.net/pdf?id=BJxH22EKPS
PWC https://paperswithcode.com/paper/understanding-architectures-learnt-by-cell-1
Repo
Framework

Domain-Independent Dominance of Adaptive Methods

Title Domain-Independent Dominance of Adaptive Methods
Authors Anonymous
Abstract From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks. This latter observation, alongside AvaGrad’s decoupling of hyperparameters, could make it the preferred optimizer for deep learning, replacing both SGD and Adam.
Tasks Image Classification, Language Modelling
Published 2020-01-01
URL https://openreview.net/forum?id=HylNWkHtvB
PDF https://openreview.net/pdf?id=HylNWkHtvB
PWC https://paperswithcode.com/paper/domain-independent-dominance-of-adaptive
Repo
Framework
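
The role of "adaptability" in this family of optimizers can be seen in a plain Adam-style step: the epsilon term controls how much the update is rescaled per parameter. The code below shows only that generic step, to illustrate where the learning rate and adaptability interact; it is not the AvaGrad update rule itself.

```python
import torch

def adam_like_step(param, grad, m, v, lr, eps, beta1=0.9, beta2=0.999):
    """Generic adaptive step (illustrative, not AvaGrad)."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first-moment estimate
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second-moment estimate
    # Small eps: strongly adaptive (per-parameter scaling dominates).
    # Large eps: the update tends toward plain SGD with effective rate lr / eps,
    # which is why learning rate and adaptability are coupled when tuning Adam.
    param.add_(m / (v.sqrt() + eps), alpha=-lr)
    return param, m, v
```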

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

Title LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
Authors Anonymous
Abstract While pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders their practical online usage. In this paper, we propose LightPAFF, a Lightweight Pre-training And Fine-tuning Framework that leverages two-stage knowledge distillation to transfer knowledge from a big teacher model to a lightweight student model in both the pre-training and fine-tuning stages. In this way, the lightweight model can achieve similar accuracy to the big teacher model, but with many fewer parameters and thus faster online inference. LightPAFF supports different pre-training methods (such as BERT, GPT-2, and MASS (Song et al., 2019)) and can be applied to many downstream tasks. Experiments on three language understanding tasks, three language modeling tasks, and three sequence-to-sequence generation tasks demonstrate that, while achieving similar accuracy to the big BERT, GPT-2, and MASS models, LightPAFF reduces the model size by nearly 5x and improves online inference speed by 5x-7x.
Tasks Language Modelling
Published 2020-01-01
URL https://openreview.net/forum?id=B1xv9pEKDS
PDF https://openreview.net/pdf?id=B1xv9pEKDS
PWC https://paperswithcode.com/paper/lightpaff-a-two-stage-distillation-framework
Repo
Framework
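
A minimal sketch of the distillation loss such a two-stage framework applies in both the pre-training and fine-tuning stages is shown below: the student matches the teacher's softened output distribution in addition to the ground-truth labels. The mixing weight and temperature are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Soft-target KL (teacher -> student) combined with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```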

OPTIMAL BINARY QUANTIZATION FOR DEEP NEURAL NETWORKS

Title OPTIMAL BINARY QUANTIZATION FOR DEEP NEURAL NETWORKS
Authors Anonymous
Abstract Quantizing the weights and activations of deep neural networks results in significant improvements in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full-precision and quantized models is the quantization error. In this work, we focus on binary quantization, in which values are mapped to -1 and 1. We introduce several novel quantization algorithms: optimal 2-bit, optimal ternary, and greedy. Our quantization algorithms can be implemented efficiently in hardware using bitwise operations. We present proofs to show that our proposed methods are optimal, and also provide empirical error analysis. We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed optimal quantization algorithms.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=S1gTwJSKvr
PDF https://openreview.net/pdf?id=S1gTwJSKvr
PWC https://paperswithcode.com/paper/optimal-binary-quantization-for-deep-neural
Repo
Framework
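
The entry above studies quantizers that minimize the quantization error. As a reference point, the sketch below shows the classical 1-bit case, where the error-minimizing scaled-sign quantizer has a closed form; the paper's optimal 2-bit, ternary, and greedy schemes extend this and are not reproduced here.

```python
import torch

def binarize(w: torch.Tensor):
    """1-bit quantization: for b = sign(w), the scale minimizing ||w - alpha*b||^2
    is alpha = mean(|w|) (the classic closed-form result)."""
    alpha = w.abs().mean()
    w_q = alpha * torch.sign(w)        # values in {-alpha, +alpha}
    error = torch.sum((w - w_q) ** 2)  # the quantization error being minimized
    return w_q, alpha, error
```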

Off-policy Multi-step Q-learning

Title Off-policy Multi-step Q-learning
Authors Anonymous
Abstract In the past few years, off-policy reinforcement learning methods have shown promising results in their application for robot control. Deep Q-learning, however, still suffers from poor data-efficiency which is limiting with regard to real-world applications. We follow the idea of multi-step TD-learning to enhance data-efficiency while remaining off-policy by proposing two novel Temporal-Difference formulations: (1) Truncated Q-functions which represent the return for the first n steps of a policy rollout and (2) Shifted Q-functions, acting as the farsighted return after this truncated rollout. We prove that the combination of these short- and long-term predictions is a representation of the full return, leading to the Composite Q-learning algorithm. We show the efficacy of Composite Q-learning in the tabular case and compare our approach in the function-approximation setting with TD3, Model-based Value Expansion and TD3(Delta), which we introduce as an off-policy variant of TD(Delta). We show on three simulated robot tasks that Composite TD3 outperforms TD3 as well as state-of-the-art off-policy multi-step approaches in terms of data-efficiency.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1lczkHKPr
PDF https://openreview.net/pdf?id=r1lczkHKPr
PWC https://paperswithcode.com/paper/off-policy-multi-step-q-learning-1
Repo
Framework
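
The decomposition described above can be summarized in one line: the full return is a short-horizon (truncated) estimate plus a discounted long-horizon (shifted) estimate. The composition below is the intuitive form written as an assumption; the paper's exact target construction and the TD updates for each component are not reproduced here.

```python
def composite_q(q_truncated, q_shifted, gamma, n):
    """Full-return estimate from an n-step truncated Q and a shifted Q (sketch)."""
    return q_truncated + (gamma ** n) * q_shifted

# Example: 3-step truncated return of 1.2, farsighted estimate of 10.0, gamma 0.99.
print(composite_q(1.2, 10.0, gamma=0.99, n=3))  # ~10.9
```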

Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning

Title Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
Authors Anonymous
Abstract In this paper, we propose a deep reinforcement learning (DRL) based framework to efficiently perform runtime channel pruning on convolutional neural networks (CNNs). Our DRL-based framework aims to learn a pruning strategy that determines how many and which channels to prune in each convolutional layer, depending on the specific input instance at runtime. The learned policy optimizes the performance of the network by restricting the computational resources of layers under an overall computation budget. Furthermore, unlike other runtime pruning methods, which require storing the parameters of all channels at inference time, our framework can reduce parameter storage at deployment by introducing a static pruning component. Experimental comparisons with existing runtime and static pruning methods on state-of-the-art CNNs demonstrate that our proposed framework provides a tradeoff between dynamic flexibility and storage efficiency in runtime channel pruning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1ewjhEFwr
PDF https://openreview.net/pdf?id=S1ewjhEFwr
PWC https://paperswithcode.com/paper/storage-efficient-and-dynamic-flexible
Repo
Framework
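
The runtime action such a pruning policy takes can be sketched as below: for one layer and one input instance, keep the channels with the largest importance according to the ratio chosen by the policy and mask the rest. The L1-style importance criterion and the interface are illustrative assumptions.

```python
import torch

def apply_channel_mask(feature_map, keep_ratio):
    """feature_map: (batch, channels, H, W); keep_ratio chosen by the pruning policy."""
    importance = feature_map.abs().mean(dim=(0, 2, 3))  # per-channel importance score
    k = max(1, int(keep_ratio * feature_map.size(1)))
    kept = importance.topk(k).indices
    mask = torch.zeros_like(importance)
    mask[kept] = 1.0
    return feature_map * mask.view(1, -1, 1, 1)
```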

Detecting Noisy Training Data with Loss Curves

Title Detecting Noisy Training Data with Loss Curves
Authors Anonymous
Abstract This paper introduces a new method to discover mislabeled training samples and to mitigate their impact on the training process of deep networks. At the heart of our algorithm lies the Area Under the Loss (AUL) statistic, which can be easily computed for each sample in the training set. We show that the AUL can use training dynamics to differentiate between (clean) samples that benefit from generalization and (mislabeled) samples that need to be “memorized”. We demonstrate that the estimated AUL score conditioned on clean vs. noisy is approximately Gaussian distributed and can be well estimated with a simple Gaussian Mixture Model (GMM). The resulting GMM provides us with mixing coefficients that reveal the percentage of mislabeled samples in a data set as well as probability estimates that each individual training sample is mislabeled. We show that these probability estimates can be used to down-weight suspicious training samples and successfully alleviate the damaging impact of label noise. We demonstrate on the CIFAR10/100 datasets that our proposed approach is significantly more accurate and consistent across model architectures than all prior work.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyenUkrtDB
PDF https://openreview.net/pdf?id=HyenUkrtDB
PWC https://paperswithcode.com/paper/detecting-noisy-training-data-with-loss
Repo
Framework
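
The pipeline described above is simple enough to sketch end to end: accumulate each sample's loss over epochs into its AUL score, fit a two-component Gaussian mixture, and read off a per-sample mislabeling probability from the higher-mean component. The synthetic losses below stand in for per-epoch losses recorded during real training.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder per-epoch losses: 900 clean samples (low loss) and 100 noisy ones.
losses = np.vstack([
    rng.normal(0.3, 0.1, size=(900, 20)),
    rng.normal(1.5, 0.3, size=(100, 20)),
])
aul = losses.sum(axis=1, keepdims=True)  # Area Under the Loss curve per sample

gmm = GaussianMixture(n_components=2, random_state=0).fit(aul)
noisy_component = int(np.argmax(gmm.means_.ravel()))     # the high-AUL component
p_mislabeled = gmm.predict_proba(aul)[:, noisy_component]
sample_weights = 1.0 - p_mislabeled                      # down-weight suspicious samples
print(f"estimated noise rate: {gmm.weights_[noisy_component]:.2f}")
```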

PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction

Title PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction
Authors Anonymous
Abstract We propose an algorithm combining calibrated prediction and generalization bounds from learning theory to construct confidence sets for deep neural networks with PAC guarantees, i.e., the confidence set for a given input contains the true label with high probability. We demonstrate how our approach can be used to construct PAC confidence sets on ResNet for ImageNet, and on a dynamics model for the half-cheetah reinforcement learning problem.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJxVI04YvB
PDF https://openreview.net/pdf?id=BJxVI04YvB
PWC https://paperswithcode.com/paper/pac-confidence-sets-for-deep-neural-networks
Repo
Framework
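
The basic construction is easy to sketch: with calibrated probabilities, include labels in decreasing probability order until their cumulative mass reaches a threshold. In the paper, the threshold is chosen from held-out data so that coverage holds with a PAC guarantee; in the illustrative sketch below it is simply a fixed value.

```python
import numpy as np

def confidence_set(probs, tau=0.95):
    """Smallest set of labels whose calibrated probabilities sum to at least tau."""
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, tau)) + 1
    return order[:cutoff]

calibrated = np.array([0.62, 0.25, 0.08, 0.05])
print(confidence_set(calibrated))  # -> [0 1 2]
```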