April 1, 2020

3354 words 16 mins read

Paper Group NANR 3

Global Adversarial Robustness Guarantees for Neural Networks. Ternary MobileNets via Per-Layer Hybrid Filter Banks. Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning. Robust Instruction-Following in a Situated Agent via Transfer-Learning from Text. Learning to Reason: Distilling Hierarchy via Self-Supervision and Rei …

Global Adversarial Robustness Guarantees for Neural Networks

Title Global Adversarial Robustness Guarantees for Neural Networks
Authors Anonymous
Abstract We investigate global adversarial robustness guarantees for machine learning models. Specifically, given a trained model we consider the problem of computing the probability that its prediction at any point sampled from the (unknown) input distribution is susceptible to adversarial attacks. Assuming continuity of the model, we prove measurability for a selection of local robustness properties used in the literature. We then show how concentration inequalities can be employed to compute global robustness with estimation error upper-bounded by $\epsilon$, for any $\epsilon > 0$ selected a priori. We utilise the methods to provide a statistically sound analysis of the robustness/accuracy trade-off for a variety of neural network architectures and training methods on MNIST, Fashion-MNIST and CIFAR. We empirically observe that robustness and accuracy tend to be negatively correlated for networks trained via stochastic gradient descent and with iterative pruning techniques, while a positive trend is observed between them in Bayesian settings.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJgyn1BFwS
PDF https://openreview.net/pdf?id=BJgyn1BFwS
PWC https://paperswithcode.com/paper/global-adversarial-robustness-guarantees-for
Repo
Framework
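
The concentration-inequality step described above can be made concrete with a Hoeffding bound: the number of i.i.d. samples needed to estimate the global robustness probability within $\epsilon$ at confidence $1-\delta$ depends only on $\epsilon$ and $\delta$. The sketch below assumes a user-supplied `sampler` for the input distribution and an `is_locally_robust` check (e.g., an attack or a verifier); both are placeholders, and the paper's exact bound may differ.

```python
import math

def required_samples(eps, delta):
    """Hoeffding bound: samples needed so the empirical robustness rate
    deviates from the true rate by at most eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def global_robustness_estimate(model, sampler, is_locally_robust, eps=0.05, delta=0.01):
    """Estimate the probability that a point drawn from the input distribution is robust."""
    n = required_samples(eps, delta)
    hits = 0
    for _ in range(n):
        x = sampler()                             # draw from the (unknown) input distribution
        hits += int(is_locally_robust(model, x))  # any local robustness check: attack, verifier, ...
    return hits / n, n
```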

Ternary MobileNets via Per-Layer Hybrid Filter Banks

Title Ternary MobileNets via Per-Layer Hybrid Filter Banks
Authors Anonymous
Abstract The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements in highly constrained devices require further compression of already compute-efficient networks like MobileNets. Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4-6 bits, albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e. precision ≤ 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Under the key observation that convolutional filters at each layer of a deep neural network may respond differently to ternary quantization, we propose a novel quantization method that generates per-layer hybrid filter banks consisting of full-precision and ternary weight filters for MobileNets. The layer-wise hybrid filter banks essentially combine the strengths of full-precision and ternary weight filters to derive a compact, energy-efficient architecture for MobileNets. Using this proposed quantization method, we quantized a substantial portion of weight filters of MobileNets to ternary values, resulting in 27.98% savings in energy and a 51.07% reduction in model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=S1lVhxSYPH
PDF https://openreview.net/pdf?id=S1lVhxSYPH
PWC https://paperswithcode.com/paper/ternary-mobilenets-via-per-layer-hybrid
Repo
Framework
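
Per-layer hybrid filter banks mix full-precision and ternary filters within a single layer. A minimal sketch of the idea follows, using a Ternary Weight Networks-style threshold-and-scale quantizer as a stand-in for the paper's scheme; the `keep_full_precision` mask that decides which filters stay full precision is a placeholder for whatever selection criterion the method actually uses.

```python
import numpy as np

def ternarize(w, thresh_factor=0.7):
    """TWN-style ternary quantization: w -> alpha * {-1, 0, +1}.
    A stand-in quantizer, not necessarily the one used in the paper."""
    delta = thresh_factor * np.mean(np.abs(w))
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

def hybrid_filter_bank(filters, keep_full_precision):
    """Per-layer hybrid bank: keep selected filters full precision, ternarize the rest."""
    return [f if keep else ternarize(f) for f, keep in zip(filters, keep_full_precision)]
```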

Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning

Title Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning
Authors Anonymous
Abstract Learning overcomplete representations finds many applications in machine learning and data analytics. In the past decade, despite the empirical success of heuristic methods, theoretical understandings and explanations of these algorithms are still far from satisfactory. In this work, we provide new theoretical insights for several important representation learning problems: learning \emph{(i)} sparsely used overcomplete dictionaries and \emph{(ii)} convolutional dictionaries. We formulate these problems as $\ell^4$-norm optimization problems over the sphere, and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objective has benign (global) geometric structures, which enable development of efficient optimization methods finding the target solutions. Finally, our theoretical results are justified by numerical simulations.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rygixkHKDH
PDF https://openreview.net/pdf?id=rygixkHKDH
PWC https://paperswithcode.com/paper/geometric-analysis-of-nonconvex-optimization
Repo
Framework
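
The $\ell^4$-norm formulation over the sphere admits a very short first-order sketch: maximize $\sum_i (q^\top y_i)^4$ subject to $\|q\|_2 = 1$ by projected gradient ascent. This is a generic illustration of the objective, not the specific algorithm, initialization, or landscape analysis studied in the paper.

```python
import numpy as np

def l4_max_over_sphere(Y, iters=200, lr=0.1, seed=0):
    """Maximize f(q) = sum_i (q^T y_i)^4 over the unit sphere by projected gradient ascent."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(Y.shape[0])
    q /= np.linalg.norm(q)
    for _ in range(iters):
        z = Y.T @ q                      # correlations of q with each column of Y
        q = q + lr * 4.0 * (Y @ z ** 3)  # gradient of the l4^4 objective
        q /= np.linalg.norm(q)           # project back onto the sphere
    return q
```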

Robust Instruction-Following in a Situated Agent via Transfer-Learning from Text

Title Robust Instruction-Following in a Situated Agent via Transfer-Learning from Text
Authors Anonymous
Abstract Recent work has described neural-network-based agents that are trained to execute language-like commands in simulated worlds, as a step towards an intelligent agent or robot that can be instructed by human users. However, the instructions that such agents are trained to follow are typically generated from templates (by an environment simulator), and do not reflect the varied or ambiguous expressions used by real people. We address this issue by integrating language encoders that are pretrained on large text corpora into a situated, instruction-following agent. In a procedurally-randomized first-person 3D world, we first train agents to follow synthetic instructions requiring the identification, manipulation and relative positioning of visually realistic object models. We then show how these abilities can transfer to a context where humans provide instructions in natural language, but only when agents are endowed with language-encoding components that were pretrained on text data. We explore techniques for integrating text-trained and environment-trained components into an agent, observing clear advantages for the fully-contextual phrase representations computed by the well-known BERT model, and additional gains from integrating a self-attention operation optimized to adapt BERT’s representations for the agent’s tasks and environment. These results bridge the gap between two successful strands of recent AI research: agent-centric behavior optimization and text-based representation learning.
Tasks Representation Learning, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rklraTNFwB
PDF https://openreview.net/pdf?id=rklraTNFwB
PWC https://paperswithcode.com/paper/robust-instruction-following-in-a-situated
Repo
Framework
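
The core integration step, feeding contextual BERT phrase representations into the agent's policy, can be sketched with the Hugging Face transformers library (an assumption for illustration; the authors' implementation is not specified in the abstract). A small trainable self-attention adapter over the returned token embeddings would then be optimized together with the agent.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def encode_instruction(text):
    """Contextual token representations for a human instruction; the agent's
    (trainable) self-attention adapter would consume these frozen embeddings."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state  # shape (1, seq_len, 768)

embeddings = encode_instruction("put the red mug next to the lamp")
```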

Learning to Reason: Distilling Hierarchy via Self-Supervision and Reinforcement Learning

Title Learning to Reason: Distilling Hierarchy via Self-Supervision and Reinforcement Learning
Authors Anonymous
Abstract We present a hierarchical planning and control framework that enables an agent to perform various tasks and adapt to a new task flexibly. Rather than learning an individual policy for each particular task, the proposed framework, DISH, distills a hierarchical policy from a set of tasks by self-supervision and reinforcement learning. The framework is based on the idea of latent variable models that represent high-dimensional observations using low-dimensional latent variables. The resulting policy consists of two levels of hierarchy: (i) a planning module that reasons over a sequence of latent intentions that would lead to an optimistic future, and (ii) a feedback control policy, shared across the tasks, that executes the inferred intention. Because the reasoning is performed in a low-dimensional latent space, the learned policy can immediately be used to solve or adapt to new tasks without additional training. We demonstrate that the proposed framework can learn compact representations (3-dimensional latent states for a 90-dimensional humanoid system) while solving a small number of imitation tasks, and that the resulting policy is directly applicable to other types of tasks, e.g., navigation in cluttered environments.
Tasks Latent Variable Models
Published 2020-01-01
URL https://openreview.net/forum?id=HJgzpgrYDr
PDF https://openreview.net/pdf?id=HJgzpgrYDr
PWC https://paperswithcode.com/paper/learning-to-reason-distilling-hierarchy-via
Repo
Framework
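
The two-level structure, a planner searching over low-dimensional latent intentions and a shared low-level controller that executes them, can be illustrated with a generic random-shooting planner. This is a hypothetical stand-in for DISH's planning module; `score_fn` and `feedback_policy` are placeholders for the learned latent model and controller.

```python
import numpy as np

def plan_in_latent_space(score_fn, horizon=5, latent_dim=3, candidates=256, seed=0):
    """Random-shooting search over sequences of latent intentions."""
    rng = np.random.default_rng(seed)
    plans = rng.standard_normal((candidates, horizon, latent_dim))
    scores = np.array([score_fn(p) for p in plans])  # predicted outcome of each latent plan
    return plans[np.argmax(scores)]                  # most "optimistic" sequence of intentions

def act(feedback_policy, observation, latent_intention):
    """Shared low-level controller: maps (observation, intention) to an action."""
    return feedback_policy(observation, latent_intention)

# Toy usage with a dummy score: prefer small intentions.
best_plan = plan_in_latent_space(lambda p: -np.linalg.norm(p))
```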

A Constructive Prediction of the Generalization Error Across Scales

Title A Constructive Prediction of the Generalization Error Across Scales
Authors Anonymous
Abstract The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryenvpEKDr
PDF https://openreview.net/pdf?id=ryenvpEKDr
PWC https://paperswithcode.com/paper/a-constructive-prediction-of-the-1
Repo
Framework
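
As a toy illustration of fitting a functional form across scales, the snippet below fits a generic joint power law in dataset size n and model size m with scipy's curve_fit and extrapolates it to a larger scale. Both the functional form and the numbers are illustrative assumptions; the paper's actual form and measurements are not given in the abstract.

```python
import numpy as np
from scipy.optimize import curve_fit

def error_form(X, a, alpha, b, beta, c):
    """Assumed joint power law in data size n and model size m (illustrative only)."""
    n, m = X
    return a * n ** (-alpha) + b * m ** (-beta) + c

# Small-scale (n, m) configurations with made-up test errors, for illustration only.
n = np.array([1e4, 1e4, 1e5, 1e5, 1e6, 1e6])
m = np.array([1e6, 1e7, 1e6, 1e7, 1e6, 1e7])
err = np.array([0.30, 0.28, 0.20, 0.17, 0.15, 0.11])

params, _ = curve_fit(error_form, (n, m), err, p0=[1, 0.3, 1, 0.3, 0.05], maxfev=20000)
print(error_form((1e7, 1e8), *params))  # extrapolated error at a larger data/model scale
```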

A Copula approach for hyperparameter transfer learning

Title A Copula approach for hyperparameter transfer learning
Authors Anonymous
Abstract Bayesian optimization (BO) is a popular methodology to tune the hyperparameters of expensive black-box functions. Despite its success, standard BO focuses on a single task at a time and is not designed to leverage information from related functions, such as tuning performance metrics of the same algorithm across multiple datasets. In this work, we introduce a novel approach to achieve transfer learning across different datasets as well as different metrics. The main idea is to regress the mapping from hyperparameter to metric quantiles with a semi-parametric Gaussian Copula distribution, which provides robustness against different scales or outliers that can occur in different tasks. We introduce two methods to leverage this estimation: a Thompson sampling strategy as well as a Gaussian Copula process using such quantile estimate as a prior. We show that these strategies can combine the estimation of multiple metrics such as runtime and accuracy, steering the optimization toward cheaper hyperparameters for the same level of accuracy. Experiments on an extensive set of hyperparameter tuning tasks demonstrate significant improvements over state-of-the-art methods.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ryx4PJrtvS
PDF https://openreview.net/pdf?id=ryx4PJrtvS
PWC https://paperswithcode.com/paper/a-copula-approach-for-hyperparameter-transfer
Repo
Framework
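
The semi-parametric Gaussian Copula step amounts to mapping observed metric values through their empirical CDF to standard-normal quantiles, which makes the regression targets insensitive to scale and outliers across tasks. A minimal sketch of that transform follows; the surrogate regressor and the Thompson-sampling loop built on top of it are omitted.

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_gaussian_copula_space(y):
    """Map metric observations to standard-normal quantiles via the empirical CDF."""
    u = rankdata(y) / (len(y) + 1)  # empirical CDF values in (0, 1)
    return norm.ppf(u)              # z-scores: scale- and outlier-robust regression targets

# Hypothetical usage: validation accuracies from one tuning task become copula-space targets.
z = to_gaussian_copula_space(np.array([0.71, 0.74, 0.69, 0.88, 0.92]))
```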

An Exponential Learning Rate Schedule for Batch Normalized Networks

Title An Exponential Learning Rate Schedule for Batch Normalized Networks
Authors Anonymous
Abstract Intriguing empirical evidence exists that deep learning can work well with exotic schedules for varying the learning rate. This paper suggests that the phenomenon may be due to Batch Normalization, or BN (Ioffe & Szegedy, 2015), which is ubiquitous and provides benefits in optimization and generalization across all standard architectures. The following new results are shown about BN with weight decay and momentum (in other words, the typical use case, which was not considered in earlier theoretical analyses of stand-alone BN (Ioffe & Szegedy, 2015; Santurkar et al., 2018; Arora et al., 2018)): (i) Training can be done using SGD with momentum and an exponentially increasing learning rate schedule, i.e., the learning rate increases by some (1 + α) factor in every epoch for some α > 0. (Precise statement in the paper.) To the best of our knowledge this is the first time such a rate schedule has been successfully used, let alone for highly successful architectures. As expected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization. (ii) Mathematical explanation of the success of the above rate schedule: a rigorous proof that it is equivalent to the standard setting of BN + SGD + standard rate tuning + weight decay + momentum. This equivalence holds for other normalization layers as well: Group Normalization (Wu & He, 2018), Layer Normalization (Ba et al., 2016), Instance Norm (Ulyanov et al., 2016), etc. (iii) A worked-out toy example illustrating the above linkage of hyperparameters. Using either weight decay or BN alone reaches the global minimum, but convergence fails when both are used.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJg8TeSFDH
PDF https://openreview.net/pdf?id=rJg8TeSFDH
PWC https://paperswithcode.com/paper/an-exponential-learning-rate-schedule-for
Repo
Framework
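
The schedule itself is easy to sketch: with batch-normalized layers, run SGD with momentum (and no explicit weight decay) while multiplying the learning rate by (1 + α) every epoch. The PyTorch snippet below is a minimal toy rendering of that recipe on random data; the value of α and the claimed equivalence to weight decay plus standard rate tuning are spelled out in the paper, not here.

```python
import torch
from torch import nn, optim

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

alpha = 0.05  # illustrative growth factor; the paper relates it to weight decay and momentum
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # note: no explicit weight decay
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 + alpha)

for epoch in range(10):  # toy loop on random data
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()     # learning rate grows by (1 + alpha) per "epoch"
```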

Meta Decision Trees for Explainable Recommendation Systems

Title Meta Decision Trees for Explainable Recommendation Systems
Authors Anonymous
Abstract We tackle the problem of building explainable recommendation systems that are based on a per-user decision tree, with decision rules that are based on single attribute values. We build the trees by applying learned regression functions to obtain the decision rules as well as the values at the leaf nodes. The regression functions receive as input the embedding of the user’s training set, as well as the embedding of the samples that arrive at the current node. The embedding and the regressors are learned end-to-end with a loss that encourages the decision rules to be sparse. By applying our method, we obtain a collaborative filtering solution that provides a direct explanation for every rating it provides. With regard to accuracy, it is competitive with other algorithms. However, as expected, explainability comes at a cost, and the accuracy is typically slightly lower than the state-of-the-art results reported in the literature. Our code is attached as supplementary material.
Tasks Recommendation Systems
Published 2020-01-01
URL https://openreview.net/forum?id=S1ebsJrFwS
PDF https://openreview.net/pdf?id=S1ebsJrFwS
PWC https://paperswithcode.com/paper/meta-decision-trees-for-explainable
Repo
Framework
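
The mechanics, learned regressors that emit a single-attribute split or a leaf value from the embedding of the samples reaching a node, can be sketched as a recursive tree builder. This is a simplified, hypothetical rendering: `split_regressor` and `leaf_regressor` stand in for the end-to-end-trained regression functions, and a plain mean replaces the learned set embedding.

```python
import numpy as np

def build_user_tree(samples, split_regressor, leaf_regressor, depth=0, max_depth=3):
    """Grow a per-user tree whose splits and leaf values are produced by learned regressors."""
    emb = samples.mean(axis=0)  # stand-in for a learned embedding of the sample set
    if depth == max_depth or len(samples) < 2:
        return {"leaf": float(leaf_regressor(emb))}
    attr, thr = split_regressor(emb)  # single-attribute decision rule
    left, right = samples[samples[:, attr] <= thr], samples[samples[:, attr] > thr]
    if len(left) == 0 or len(right) == 0:
        return {"leaf": float(leaf_regressor(emb))}
    return {"attr": int(attr), "thr": float(thr),
            "left": build_user_tree(left, split_regressor, leaf_regressor, depth + 1, max_depth),
            "right": build_user_tree(right, split_regressor, leaf_regressor, depth + 1, max_depth)}
```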

A Data-Efficient Mutual Information Neural Estimator for Statistical Dependency Testing

Title A Data-Efficient Mutual Information Neural Estimator for Statistical Dependency Testing
Authors Anonymous
Abstract Measuring Mutual Information (MI) between high-dimensional, continuous random variables from observed samples has wide theoretical and practical applications. Recent works have developed accurate MI estimators through provably low-bias approximations and tight variational lower bounds assuming an abundant supply of samples, but require an unrealistic number of samples to guarantee statistical significance of the estimation. In this work, we focus on improving data efficiency and propose a Data-Efficient MINE Estimator (DEMINE) that can provide a tight lower confidence interval of MI under limited data by adding cross-validation to the MINE lower bound (Belghazi et al., 2018). Hyperparameter search is employed, and a novel meta-learning approach with task augmentation is developed to increase robustness to hyperparameters, reduce overfitting and improve accuracy. With improved data efficiency, our DEMINE estimator enables statistical testing of dependency at practical dataset sizes. We demonstrate the effectiveness of DEMINE on synthetic benchmarks and a real-world fMRI dataset, with an application to inter-subject correlation analysis.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SklOypVKvS
PDF https://openreview.net/pdf?id=SklOypVKvS
PWC https://paperswithcode.com/paper/a-data-efficient-mutual-information-neural
Repo
Framework
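
The underlying MINE quantity is the Donsker-Varadhan lower bound $I(X;Y) \ge \mathbb{E}_{P_{XY}}[T] - \log \mathbb{E}_{P_X \otimes P_Y}[e^{T}]$, estimated with a small critic network; DEMINE then adds cross-validated splits (and meta-learning over hyperparameters) on top, which is not shown here. A minimal PyTorch sketch of the base bound:

```python
import math
import torch
from torch import nn

class StatNet(nn.Module):
    """Critic T(x, y) for the Donsker-Varadhan / MINE lower bound."""
    def __init__(self, dx, dy, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dx + dy, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def mine_lower_bound(T, x, y):
    """E_joint[T] - log E_marginal[exp(T)], with the product marginal formed by shuffling y."""
    joint = T(x, y).mean()
    y_shuffled = y[torch.randperm(len(y))]
    marginal = torch.logsumexp(T(x, y_shuffled), dim=0) - math.log(len(y))
    return joint - marginal
```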

Hebbian Graph Embeddings

Title Hebbian Graph Embeddings
Authors Anonymous
Abstract Representation learning has recently been used successfully to create vector representations of entities in language learning, recommender systems and similarity learning. Graph embeddings exploit the locality structure of a graph and generate embeddings for nodes, which could be words in a language or products on a retail website, with nodes connected based on a context window. In this paper, we consider graph embeddings with an error-free associative learning update rule, which models the embedding vector of a node as a non-convex Gaussian mixture of the embeddings of the nodes in its immediate vicinity, with some constant variance that is reduced as iterations progress. It is very easy to parallelize our algorithm without any form of shared memory, which makes it possible to use it on very large graphs with a much higher dimensionality of the embeddings. We study the efficacy of the proposed method on several benchmark data sets from Goyal & Ferrara (2018b) and compare favorably with state-of-the-art methods. Further, the proposed method is applied to generate relevant recommendations for a large retailer.
Tasks Recommendation Systems, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=H1ep5TNKwr
PDF https://openreview.net/pdf?id=H1ep5TNKwr
PWC https://paperswithcode.com/paper/hebbian-graph-embeddings-1
Repo
Framework
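
A loose sketch of the associative update described above: each node's vector is repeatedly pulled toward a noisy average of its neighbours' vectors, with the noise variance annealed toward zero over iterations. The annealing schedule, the mixture weighting and the shared-memory-free parallelization are all simplified assumptions here.

```python
import numpy as np

def hebbian_embeddings(adj_list, dim=64, iters=50, sigma0=0.1, seed=0):
    """Associative (Hebbian-style) embedding updates with annealed noise variance."""
    rng = np.random.default_rng(seed)
    emb = {v: rng.standard_normal(dim) for v in adj_list}
    for t in range(iters):
        sigma = sigma0 * (1.0 - t / iters)  # variance shrinks as iterations progress
        for v, nbrs in adj_list.items():
            if nbrs:
                target = np.mean([emb[u] for u in nbrs], axis=0)
                emb[v] = target + rng.normal(0.0, sigma, dim)
    return emb

# Toy usage on a 4-node path graph.
vectors = hebbian_embeddings({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
```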

Constant Time Graph Neural Networks

Title Constant Time Graph Neural Networks
Authors Anonymous
Abstract The recent advancements in graph neural networks (GNNs) have led to state-of-the-art performance in various applications, including chemo-informatics, question-answering systems, and recommender systems. However, scaling up these methods to huge graphs, such as social network graphs and web graphs, remains a challenge. In particular, the existing methods for accelerating GNNs either are not theoretically guaranteed in terms of approximation error or require at least linear computation cost. In this study, we analyze the neighbor sampling technique to obtain a constant time approximation algorithm for GraphSAGE, graph attention networks (GAT), and graph convolutional networks (GCN). The proposed approximation algorithm can theoretically guarantee the precision of the approximation. Its key advantage is that the complexity is completely independent of the numbers of nodes, edges, and neighbors of the input and depends only on the error tolerance and confidence probability. To the best of our knowledge, this is the first constant time approximation algorithm for GNNs with a theoretical guarantee. Through experiments using synthetic and real-world datasets, we demonstrate the speed and precision of the proposed approximation algorithm and validate our theoretical results.
Tasks Question Answering, Recommendation Systems
Published 2020-01-01
URL https://openreview.net/forum?id=rkgKW64FPH
PDF https://openreview.net/pdf?id=rkgKW64FPH
PWC https://paperswithcode.com/paper/constant-time-graph-neural-networks-1
Repo
Framework
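
The essence of neighbor sampling, which makes per-node cost independent of graph size, is that each node aggregates over a fixed number of randomly sampled neighbours rather than its full neighbourhood. Below is a generic sketch of one such GCN-style layer; the paper's specific estimator, error bound and choice of sample size are not reproduced here.

```python
import numpy as np

def sampled_gcn_layer(h, adj_list, W, num_samples=10, seed=0):
    """One GCN-style aggregation over a constant number of sampled neighbours per node."""
    rng = np.random.default_rng(seed)
    out = np.zeros((len(h), W.shape[1]))
    for v, nbrs in adj_list.items():
        if nbrs:
            picks = rng.choice(nbrs, size=num_samples, replace=True)  # constant-size neighbourhood
            agg = h[picks].mean(axis=0)
        else:
            agg = h[v]
        out[v] = np.maximum(agg @ W, 0.0)  # ReLU(W * mean of sampled neighbour features)
    return out
```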

Rotation-invariant clustering of functional cell types in primary visual cortex

Title Rotation-invariant clustering of functional cell types in primary visual cortex
Authors Anonymous
Abstract Similar to a convolutional neural network (CNN), the mammalian retina encodes visual information into several dozen nonlinear feature maps, each formed by one ganglion cell type that tiles the visual space in an approximately shift-equivariant manner. Whether such organization into distinct cell types is maintained at the level of cortical image processing is an open question. Predictive models building upon convolutional features have been shown to provide state-of-the-art performance, and have recently been extended to include rotation equivariance in order to account for the orientation selectivity of V1 neurons. However, generally no direct correspondence between CNN feature maps and groups of individual neurons emerges in these models, thus rendering it an open question whether V1 neurons form distinct functional clusters. Here we build upon the rotation-equivariant representation of a CNN-based V1 model and propose a methodology for clustering the representations of neurons in this model to find functional cell types independent of preferred orientations of the neurons. We apply this method to a dataset of 6000 neurons and provide evidence that discrete functional cell types exist in V1. By visualizing the preferred stimuli of these clusters, we highlight the range of non-linear computations executed by V1 neurons.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rklr9kHFDB
PDF https://openreview.net/pdf?id=rklr9kHFDB
PWC https://paperswithcode.com/paper/rotation-invariant-clustering-of-functional
Repo
Framework

An Algorithm-Agnostic NAS Benchmark

Title An Algorithm-Agnostic NAS Benchmark
Authors Anonymous
Abstract Neural architecture search (NAS) has achieved breakthrough success in a great number of applications in the past few years. It could be time to take a step back and analyze the good and bad aspects of the field of NAS. A variety of algorithms search for architectures under different search spaces. These searched architectures are trained using different setups, e.g., hyper-parameters, data augmentation, regularization. This raises a fairness problem when comparing the performance of various NAS algorithms. In this work, we propose an Algorithm-Agnostic NAS Benchmark (AA-NAS-Bench) with a fixed search space, which provides a unified benchmark for almost any up-to-date NAS algorithm. The design of our search space is inspired by that used in the most popular cell-based search algorithms, where a cell is represented as a directed acyclic graph and each edge is associated with an operation selected from a predefined operation set. To make it applicable to all NAS algorithms, the search space defined in AA-NAS-Bench includes 4 nodes and 5 associated operation options, which generates 15,625 neural cell candidates in total. The training log under the same setup and the performance of each architecture candidate are provided for three datasets. This allows researchers to avoid unnecessary repetitive training of selected architectures and to focus solely on the search algorithm itself. The training time saved for every architecture also largely improves the efficiency of most NAS algorithms and makes NAS research, in terms of computational cost, accessible to a broader range of researchers. Side information such as fine-grained loss and accuracy is also provided, which can inspire new designs of NAS algorithms. We demonstrate the applicability of the proposed AA-NAS-Bench by benchmarking many recent NAS algorithms.
Tasks Data Augmentation, Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=HJxyZkBKDr
PDF https://openreview.net/pdf?id=HJxyZkBKDr
PWC https://paperswithcode.com/paper/an-algorithm-agnostic-nas-benchmark
Repo
Framework
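
The 15,625 figure follows from the cell structure: a DAG over 4 nodes has 6 edges, and assigning one of 5 operations to each edge gives 5^6 = 15,625 candidate cells. The operation names below are illustrative placeholders, not necessarily the benchmark's actual operation set.

```python
from itertools import product

OPS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]  # illustrative names
NUM_NODES = 4
edges = [(i, j) for j in range(NUM_NODES) for i in range(j)]  # 6 edges in the complete DAG

all_cells = list(product(OPS, repeat=len(edges)))  # one operation per edge
print(len(all_cells))                              # 5 ** 6 = 15625 candidate cells
```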

Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality

Title Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality
Authors Anonymous
Abstract Granger causality is a widely used criterion for analyzing interactions in large-scale networks. As most physical interactions are inherently nonlinear, we consider the problem of inferring the existence of pairwise Granger causality between nonlinearly interacting stochastic processes from their time series measurements. Our proposed approach relies on modeling the embedded nonlinearities in the measurements using a component-wise time series prediction model based on Statistical Recurrent Units (SRUs). We make a case that the network topology of Granger causal relations is directly inferable from a structured sparse estimate of the internal parameters of the SRU networks trained to predict the processes’ time series measurements. We propose a variant of the SRU, called economy-SRU, which, by design, has considerably fewer trainable parameters and is therefore less prone to overfitting. The economy-SRU computes a low-dimensional sketch of its high-dimensional hidden state in the form of random projections to generate the feedback for its recurrent processing. Additionally, the internal weight parameters of the economy-SRU are strategically regularized in a group-wise manner to facilitate the extraction of meaningful predictive features that are highly time-localized, mimicking real-world causal events. Extensive experiments demonstrate that the proposed economy-SRU-based time series prediction model outperforms the MLP, LSTM and attention-gated CNN-based time series models considered previously for inferring Granger causality.
Tasks Time Series, Time Series Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=SyxV9ANFDH
PDF https://openreview.net/pdf?id=SyxV9ANFDH
PWC https://paperswithcode.com/paper/economy-statistical-recurrent-units-for
Repo
Framework
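
The structured-sparsity readout can be sketched generically: group the input weights of the prediction network for a target series by candidate source series, apply a group-lasso penalty during training, and declare "series j Granger-causes the target" when j's group survives with non-negligible norm. The weight layout assumed below is an illustration, not the economy-SRU's actual parameterization.

```python
import numpy as np

def group_lasso_penalty(W_in, num_series, lags):
    """Sum of per-source-series group norms over the input weights (hidden x series*lags)."""
    groups = W_in.reshape(W_in.shape[0], num_series, lags)
    return np.sum(np.linalg.norm(groups, axis=(0, 2)))

def inferred_causes(W_in, num_series, lags, tol=1e-3):
    """Source series whose weight groups were not driven to (near) zero by the penalty."""
    groups = W_in.reshape(W_in.shape[0], num_series, lags)
    strengths = np.linalg.norm(groups, axis=(0, 2))
    return [j for j, s in enumerate(strengths) if s > tol]
```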