April 1, 2020

3168 words 15 mins read

Paper Group NANR 1

Recurrent Neural Networks are Universal Filters. Anomalous Pattern Detection in Activations and Reconstruction Error of Autoencoders. Improved Generalization Bound of Permutation Invariant Deep Neural Networks. Domain-Invariant Representations: A Look on Compression and Weights. Zeno++: Robust Fully Asynchronous SGD. Collaborative Generated Hashing …

Recurrent Neural Networks are Universal Filters

Title Recurrent Neural Networks are Universal Filters
Authors Anonymous
Abstract Recurrent neural networks (RNN) are powerful time series modeling tools in machine learning. They have been successfully applied in a variety of fields such as natural language processing (Mikolov et al. (2010), Graves et al. (2013), Du et al. (2015)), control (Fei & Lu (2017)) and traffic forecasting (Ma et al. (2015)). In those application scenarios, an RNN can be viewed as implicitly modelling a stochastic dynamic system. Another popular type of neural network, the deep (feed-forward) neural network, has also been successfully applied across engineering disciplines, and its approximation capability is well characterized by the universal approximation theorem (Hornik et al. (1989), Park & Sandberg (1991), Lu et al. (2017)). However, the underlying approximation capability of RNNs has not been fully understood in a quantitative way. In this paper, we consider a stochastic dynamic system with noisy observations and analyze the capability of RNNs to synthesize the optimal state estimator, namely the optimal filter. We place the recurrent neural network in the Bayesian filtering framework and show that it is a universal approximator of optimal finite-dimensional filters under some mild conditions. That is to say, for any stochastic dynamic system with noisy sequential observations satisfying some mild conditions, we show (informally) that $\forall \epsilon > 0$ there exists an RNN-based filter such that $\limsup_{k \to \infty} \|\hat{x}_{k|k} - E[x_k \mid Y_k]\| < \epsilon$, where $\hat{x}_{k|k}$ is the RNN-based filter's estimate of the state $x_k$ at step $k$ conditioned on the observation history, and $E[x_k \mid Y_k]$ is the conditional mean of $x_k$, known as the optimal estimate of the state in the minimum mean square error sense. As an interesting special case, the widely used Kalman filter (KF) can be synthesized by an RNN.
Tasks Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=rJgHC2VKvB
PDF https://openreview.net/pdf?id=rJgHC2VKvB
PWC https://paperswithcode.com/paper/recurrent-neural-networks-are-universal
Repo
Framework
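
To make the filtering claim concrete, here is a minimal numerical sketch (my own, not the paper's construction): on a scalar linear-Gaussian system, a GRU trained with MSE against state targets should approach the Kalman filter, which is the optimal MMSE estimator for this model. All architecture and hyperparameter choices below are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
a, q, r, T, batch = 0.9, 0.1, 0.5, 50, 256   # dynamics, noise variances, horizon

def simulate(n):
    """x_{k+1} = a x_k + w_k,  y_k = x_k + v_k."""
    x, y, xk = torch.zeros(n, T), torch.zeros(n, T), torch.zeros(n)
    for k in range(T):
        xk = a * xk + q ** 0.5 * torch.randn(n)
        x[:, k] = xk
        y[:, k] = xk + r ** 0.5 * torch.randn(n)
    return x, y

def kalman(y):
    """Standard Kalman recursion for the same scalar model."""
    xhat, P, est = torch.zeros(y.shape[0]), 1.0, torch.zeros_like(y)
    for k in range(y.shape[1]):
        xpred, Ppred = a * xhat, a * a * P + q        # predict
        K = Ppred / (Ppred + r)                       # Kalman gain
        xhat = xpred + K * (y[:, k] - xpred)          # update
        P = (1 - K) * Ppred
        est[:, k] = xhat
    return est

class RNNFilter(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, y):                    # y: (batch, T)
        h, _ = self.rnn(y.unsqueeze(-1))
        return self.out(h).squeeze(-1)       # x̂_{k|k} for every step k

model = RNNFilter()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    x, y = simulate(batch)
    loss = ((model(y) - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

x, y = simulate(1024)
print("RNN MSE:   ", ((model(y) - x) ** 2).mean().item())
print("Kalman MSE:", ((kalman(y) - x) ** 2).mean().item())
```

After training, the two MSE values should be close, which is exactly the behavior the theorem predicts for this special case.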

Anomalous Pattern Detection in Activations and Reconstruction Error of Autoencoders

Title Anomalous Pattern Detection in Activations and Reconstruction Error of Autoencoders
Authors Anonymous
Abstract In real-world machine learning applications, large outliers and pervasive noise are commonplace, and access to the clean training data required by standard deep autoencoders is unlikely. Reliably detecting anomalies in a given set of images is a task of high practical relevance for visual quality inspection, surveillance, or medical image analysis. Autoencoder neural networks learn to reconstruct normal images, and hence can classify images as anomalous if the reconstruction error exceeds some threshold. In this paper, we propose an unsupervised method based on subset scanning over autoencoder activations. The contributions of our work are threefold. First, we propose a novel method that combines reconstruction error with subset scanning scores to improve the anomaly score of current autoencoders without requiring any retraining. Second, we provide the ability to inspect and visualize the set of anomalous nodes in the reconstruction-error space that cause a sample to be flagged as noisy. Third, we show that subset scanning can be used for anomaly detection in the inner layers of the autoencoder. We provide detection power results for several untargeted adversarial noise models on standard datasets.
Tasks Anomaly Detection
Published 2020-01-01
URL https://openreview.net/forum?id=S1lLvyBtPB
PDF https://openreview.net/pdf?id=S1lLvyBtPB
PWC https://paperswithcode.com/paper/anomalous-pattern-detection-in-activations
Repo
Framework
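
As a rough illustration of the scanning idea, here is a hedged numpy sketch: node activations receive empirical p-values against a clean background set, a Berk-Jones-style scan statistic scores how anomalous the most extreme subset of nodes is, and the final score combines this with reconstruction error. The statistic choice and the combination rule are assumptions, not the paper's exact method.

```python
import numpy as np

def empirical_pvalues(background, sample):
    """Upper-tail p-value of each node's activation against clean activations.
    background: (n_clean, n_nodes), sample: (n_nodes,)"""
    return (background >= sample).mean(axis=0)

def berk_jones_score(pvals):
    """Max over thresholds alpha of N * KL(N_alpha/N || alpha): how many more
    p-values fall below alpha than expected under the null."""
    p, best = np.sort(pvals), 0.0
    N = len(p)
    for k, alpha in enumerate(p, start=1):
        obs = k / N                      # observed fraction of p-values <= alpha
        if obs <= alpha or alpha <= 0.0 or alpha >= 1.0:
            continue                     # only score an excess of small p-values
        kl = obs * np.log(obs / alpha) + (1 - obs) * np.log((1 - obs) / (1 - alpha))
        best = max(best, N * kl)
    return best

def anomaly_score(background_acts, acts, recon_err, err_mu, err_sd, lam=1.0):
    """Assumed combination: standardized reconstruction error plus scan score.
    err_mu/err_sd are the reconstruction-error statistics on clean data."""
    scan = berk_jones_score(empirical_pvalues(background_acts, acts))
    return (recon_err - err_mu) / err_sd + lam * scan
```

The nodes whose p-values fall below the maximizing threshold are the "anomalous subset" one would inspect and visualize.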

Improved Generalization Bound of Permutation Invariant Deep Neural Networks

Title Improved Generalization Bound of Permutation Invariant Deep Neural Networks
Authors Anonymous
Abstract We theoretically prove that the permutation invariance property of deep neural networks substantially improves their generalization performance. Learning problems whose data are invariant to permutations arise frequently in applications, for example, with point cloud data and graph neural networks. Numerous methodologies have been developed and achieve strong performance; however, the mechanism behind that performance is still poorly understood. In this paper, we derive a theoretical generalization bound for invariant deep neural networks with ReLU activations to clarify this mechanism. Consequently, our bound shows that the main term of their generalization gap is improved by a factor of $\sqrt{n!}$, where $n$ is the number of permuted coordinates of the data. Moreover, we prove that the approximation power of invariant deep neural networks can achieve an optimal rate, even though the networks are restricted to be invariant. To obtain these results, we develop several new proof techniques, such as a correspondence with a fundamental domain and a scale-sensitive metric entropy.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1eiJyrtDB
PDF https://openreview.net/pdf?id=B1eiJyrtDB
PWC https://paperswithcode.com/paper/improved-generalization-bound-of-permutation-1
Repo
Framework
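
For reference, here is one standard construction of the kind of permutation-invariant network the bound concerns, in the DeepSets style: a shared per-element MLP followed by sum pooling. This is an illustrative instance, not necessarily the exact model class analyzed in the paper.

```python
import torch
import torch.nn as nn

class InvariantNet(nn.Module):
    def __init__(self, d_in, d_hidden=64, d_out=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_out))

    def forward(self, x):              # x: (batch, n, d_in); the n elements are permutable
        return self.rho(self.phi(x).sum(dim=1))   # sum pooling => exact invariance

net = InvariantNet(d_in=3)
x = torch.randn(2, 5, 3)
perm = torch.randperm(5)
print(torch.allclose(net(x), net(x[:, perm]), atol=1e-5))   # True: output is permutation invariant
```

Mean or max pooling in place of the sum also preserves invariance; the $\sqrt{n!}$ factor reflects that such networks only need to be learned on one representative ordering of each input.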

Domain-Invariant Representations: A Look on Compression and Weights

Title Domain-Invariant Representations: A Look on Compression and Weights
Authors Anonymous
Abstract Learning invariant representations to adapt deep classifiers from a source domain to a new target domain has recently attracted much attention. In this paper, we show that the search for invariance favors the compression of representations. We point out that this may harm the adaptability of representations, expressed as a minimal combined domain error. By accounting for the risk of compression, we show that weighting representations can align representation distributions without impairing their adaptability. This supports the claim that representation invariance is too strict a constraint. First, we introduce a new bound on the target risk that reveals a trade-off between compression and invariance of learned representations. More precisely, our results show that the adaptability of a representation can be better controlled when the compression risk is taken into account. In contrast, preserving adaptability may overestimate the risk of compression, which can make the bound impracticable. We support these statements with a theoretical analysis illustrated on a standard domain adaptation benchmark. Second, we show that learning weighted representations plays a key role in relaxing the constraint of invariance while containing the risk of compression. Taking advantage of this trade-off may open up promising directions for the design of new adaptation methods.
Tasks Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=B1xGxgSYvH
PDF https://openreview.net/pdf?id=B1xGxgSYvH
PWC https://paperswithcode.com/paper/domain-invariant-representations-a-look-on
Repo
Framework
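
To give the "weighted representations" idea some shape, here is a hedged sketch: learned non-negative weights on source samples relax strict distribution invariance while a simple mean-embedding penalty aligns the weighted source with the target. The weighting scheme, the linear-kernel alignment term, and full-batch training are all illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(10, 32), nn.ReLU())   # shared representation
clf = nn.Linear(32, 2)                              # source task head
w_logits = nn.Parameter(torch.zeros(256))           # one logit per source sample (full batch assumed)

def loss(xs, ys, xt, lam=0.1):
    zs, zt = enc(xs), enc(xt)
    w = torch.softmax(w_logits, 0) * len(w_logits)  # non-negative weights with mean one
    task = (w * F.cross_entropy(clf(zs), ys, reduction="none")).mean()
    # align the *weighted* source mean embedding with the target mean embedding,
    # instead of forcing full distributional invariance of the representation
    align = ((w[:, None] * zs).mean(0) - zt.mean(0)).pow(2).sum()
    return task + lam * align
```

Down-weighting source samples that resist alignment is one way a model can trade a little invariance for preserved adaptability, which is the trade-off the bound formalizes.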

Zeno++: Robust Fully Asynchronous SGD

Title Zeno++: Robust Fully Asynchronous SGD
Authors Anonymous
Abstract We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure which tolerates Byzantine failures of the workers. In contrast to previous work, Zeno++ removes some unrealistic restrictions on worker-server communications, allowing for fully asynchronous updates from anonymous workers, arbitrarily stale worker updates, and the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing approaches.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rygHe64FDS
PDF https://openreview.net/pdf?id=rygHe64FDS
PWC https://paperswithcode.com/paper/zeno-robust-fully-asynchronous-sgd
Repo
Framework
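
A hedged sketch of the key acceptance test described in the abstract: score a candidate gradient by the loss descent it produces on a small server-side validation batch, penalized by the gradient's magnitude, and reject updates whose estimated descent is too small. The structure follows the paper's description loosely; the exact constants and threshold form are assumptions.

```python
import torch

def zeno_accepts(model, loss_fn, val_batch, grad, lr, rho=1e-3, eps=0.0):
    """grad: candidate gradient flattened to match parameters_to_vector(model)."""
    x, y = val_batch
    params = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        # tentatively apply the candidate update on the server copy
        torch.nn.utils.vector_to_parameters(params - lr * grad, model.parameters())
        after = loss_fn(model(x), y).item()
        torch.nn.utils.vector_to_parameters(params, model.parameters())   # roll back
    # estimated descent minus a magnitude penalty; Byzantine or overly stale
    # gradients tend to fail this test
    score = base - after - rho * float(grad.pow(2).sum())
    return score >= -lr * eps
```

Because the test only needs the server's own validation batch, it imposes no synchronization on the workers, which is what makes the fully asynchronous setting possible.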

Collaborative Generated Hashing for Market Analysis and Fast Cold-start Recommendation

Title Collaborative Generated Hashing for Market Analysis and Fast Cold-start Recommendation
Authors Yan Zhang, Ivor W. Tsang, Lixin Duan, Guowu Yang
Abstract Cold-start and efficiency issues in top-k recommendation are critical for large-scale recommender systems. Previous hybrid recommendation methods handle the cold-start issue effectively by extracting real-valued latent factors of cold-start items (users) from side information, but they still suffer from low efficiency in online recommendation, caused by expensive similarity search in the real-valued latent space. This paper presents collaborative generated hashing (CGH), which improves efficiency by representing users and items as binary codes and applies to various settings: cold-start users, cold-start items and warm-start ones. Specifically, CGH learns hash functions for users and items through the Minimum Description Length (MDL) principle; thus, it can deal with various recommendation settings. In addition, CGH enables a new marketing strategy by mining potential users through a generative step. To reconstruct effective users, the MDL principle is used to learn compact and informative binary codes from the content data. Extensive experiments on two public datasets show the advantages of CGH for recommendation in various settings over competing baselines and analyze the feasibility of its application in marketing.
Tasks Recommendation Systems
Published 2020-01-01
URL https://openreview.net/forum?id=HJel76NYPS
PDF https://openreview.net/pdf?id=HJel76NYPS
PWC https://paperswithcode.com/paper/collaborative-generated-hashing-for-market
Repo
Framework
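
A minimal sketch of why binary codes speed up top-k recommendation: with sign-binarized user and item factors, scoring reduces to Hamming distance computed via XOR on packed bits instead of float similarity search. The hash functions here are plain sign thresholds on random factors, standing in for CGH's MDL-trained codes.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((1000, 64))      # user latent factors (stand-in)
V = rng.standard_normal((50000, 64))     # item latent factors (stand-in)

def binarize(Z):
    """Sign codes packed into bytes, so XOR compares 8 bits at a time."""
    return np.packbits((Z > 0).astype(np.uint8), axis=1)

Ub, Vb = binarize(U), binarize(V)

def topk(user, k=10):
    xor = np.bitwise_xor(Ub[user], Vb)                  # differing bits, packed
    dist = np.unpackbits(xor, axis=1).sum(axis=1)       # Hamming distance per item
    return np.argsort(dist)[:k]                         # smallest distance = best match

print(topk(0))
```

In production systems the same codes also admit multi-index hash tables for sub-linear lookup, which is where the online-efficiency gain over real-valued search comes from.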

Unsupervised domain adaptation with imputation

Title Unsupervised domain adaptation with imputation
Authors Anonymous
Abstract Motivated by practical applications, we consider unsupervised domain adaptation for classification problems in the presence of missing data in the target domain. More precisely, we focus on the case where there is a domain shift between the source and target domains, while some components of the target data are systematically absent. We propose a way to impute non-stochastic missing data for a classification task by leveraging supervision from a complete source domain through domain adaptation. We introduce a single model performing joint domain adaptation, imputation and classification, which is shown to perform well under several representative divergence families (H-divergence, optimal transport). We run experiments on two families of datasets: a classical digit-classification benchmark commonly used in domain adaptation papers, and real-world digital advertising datasets, on which we evaluate our model's classification performance in an unsupervised setting. We analyze its behavior, showing the benefit of explicitly imputing non-stochastic missing data jointly with domain adaptation.
Tasks Domain Adaptation, Imputation, Unsupervised Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=B1lgUkBFwr
PDF https://openreview.net/pdf?id=B1lgUkBFwr
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-with-4
Repo
Framework
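
A hedged sketch of the joint setup: one encoder feeds an imputation head (reconstructing the systematically missing target block), a source classifier, and an alignment penalty. The losses, the mean-matching divergence proxy, and the convention that the last features are the missing ones are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, d_miss = 20, 5                         # last d_miss features are absent in the target
enc = nn.Sequential(nn.Linear(d, 64), nn.ReLU())
imp = nn.Linear(64, d_miss)               # imputes the missing block
clf = nn.Linear(64, 2)

def loss(xs, ys, xt_obs, lam=0.1, mu=1.0):
    # target inputs: observed part, with zeros then the imputation for the missing part
    zt0 = enc(torch.cat([xt_obs, torch.zeros(len(xt_obs), d_miss)], dim=1))
    xt = torch.cat([xt_obs, imp(zt0)], dim=1)
    zs, zt = enc(xs), enc(xt)
    task = F.cross_entropy(clf(zs), ys)                  # supervised only on source
    recon = F.mse_loss(imp(zs), xs[:, -d_miss:])         # imputation supervised on complete source
    align = (zs.mean(0) - zt.mean(0)).pow(2).sum()       # crude stand-in for H-divergence/OT
    return task + mu * recon + lam * align
```

Training all three heads jointly is the point: the imputation is shaped by what helps classification under alignment, not just by reconstruction of the source.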

Meta-Learning for Variational Inference

Title Meta-Learning for Variational Inference
Authors Anonymous
Abstract Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and general applicability. Crucial to the performance of VI is the selection of the divergence measure in the optimization objective, as it significantly affects the properties of the approximate posterior. In this paper, we propose a meta-learning algorithm to learn (i) the divergence measure suited to the task of interest, automating the design of the VI method; and (ii) the initialization of the variational parameters, which drastically reduces the number of VI optimization steps. We demonstrate that the learned divergence outperforms hand-designed divergences on Gaussian mixture distribution approximation, Bayesian neural network regression, and recommender systems based on a partial variational autoencoder.
Tasks Bayesian Inference, Meta-Learning, Recommendation Systems
Published 2020-01-01
URL https://openreview.net/forum?id=S1lACa4YDS
PDF https://openreview.net/pdf?id=S1lACa4YDS
PWC https://paperswithcode.com/paper/meta-learning-for-variational-inference
Repo
Framework
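
A hedged sketch of item (i): one way to make the divergence learnable is to parameterize a Rényi alpha-divergence variational bound (Li & Turner, 2016) with a trainable alpha that an outer meta-loop updates across tasks. The toy Gaussian target, the sigmoid parameterization of alpha, and the meta-objective are assumptions for illustration.

```python
import torch

mu = torch.zeros(1, requires_grad=True)            # variational parameters of q
log_sig = torch.zeros(1, requires_grad=True)
alpha_raw = torch.zeros(1, requires_grad=True)     # meta-parameter selecting the divergence

def renyi_bound(n=64):
    """Monte Carlo variational Renyi bound (1/(1-a)) log E_q[(p/q)^(1-a)] with z ~ q.
    Normalizing constants are dropped; they only shift the bound."""
    alpha = torch.sigmoid(alpha_raw)               # learnable alpha in (0, 1)
    z = mu + log_sig.exp() * torch.randn(n, 1)     # reparameterized samples from q
    log_p = -0.5 * z.pow(2).sum(1)                 # toy target: standard normal
    log_q = -0.5 * (((z - mu) / log_sig.exp()) ** 2).sum(1) - log_sig.sum()
    w = (1 - alpha) * (log_p - log_q)
    return (torch.logsumexp(w, 0) - torch.log(torch.tensor(float(n)))) / (1 - alpha)

# inner loop: fit q under the current divergence by maximizing the bound
inner_opt = torch.optim.Adam([mu, log_sig], lr=1e-2)
for _ in range(200):
    loss = -renyi_bound()
    inner_opt.zero_grad(); loss.backward(); inner_opt.step()
# outer (meta) loop, not shown: update alpha_raw across many such inner problems
# on a meta-objective such as held-out predictive log-likelihood, and likewise
# meta-learn the initialization of mu and log_sig (item (ii) in the abstract)
```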

Sparse Weight Activation Training

Title Sparse Weight Activation Training
Authors Anonymous
Abstract Training convolutional neural networks (CNNs) is time-consuming. Prior work has explored how to reduce the computational demands of training by eliminating gradients with relatively small magnitude. We show that eliminating small-magnitude components has limited impact on the direction of high-dimensional vectors. However, in the context of training a CNN, we find that eliminating small-magnitude components of the weight and activation vectors allows us to train deeper networks on more complex datasets than eliminating small-magnitude components of the gradients does. We propose Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations. SWAT reduces computation by 50% to 80% with better accuracy at a given level of sparsity than the Dynamic Sparse Graph algorithm. SWAT also reduces the memory footprint of activations by 23% to 37%, and of weights by 50% to 80%.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJgw51HFDr
PDF https://openreview.net/pdf?id=SJgw51HFDr
PWC https://paperswithcode.com/paper/sparse-weight-activation-training
Repo
Framework
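
A minimal sketch of the core operation: keep only the top-K magnitude entries of the weights and the activations in the layer computation. Applying the mask globally per tensor and the 20% keep ratio are assumptions for illustration; the paper's schedule and backward-pass handling are more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def topk_mask(t, keep=0.2):
    """Zero all but the largest-magnitude `keep` fraction of entries."""
    k = max(1, int(keep * t.numel()))
    thresh = t.abs().flatten().kthvalue(t.numel() - k + 1).values
    return t * (t.abs() >= thresh)

class SWATLinear(nn.Linear):
    """Linear layer computed with sparsified weights and activations."""
    def forward(self, x):
        w = topk_mask(self.weight)       # sparse weights in the forward pass
        x = topk_mask(x)                 # sparse activations, saved for backward
        return F.linear(x, w, self.bias)

layer = SWATLinear(128, 64)
y = layer(torch.randn(32, 128))          # drop-in replacement for nn.Linear
```

Because the sparsified activations are what get stored for the backward pass, the same masking that cuts computation also cuts activation memory, matching the footprint reductions the abstract reports.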

Semi-Supervised Boosting via Self Labelling

Title Semi-Supervised Boosting via Self Labelling
Authors Anonymous
Abstract Attention to semi-supervised learning is growing in machine learning as the cost of expert data labelling increases. Like most previous work in the area, we focus on improving an algorithm's ability to discover the inherent structure of the entire dataset from a few expertly labelled samples. In this paper we introduce Boosting via Self Labelling (BSL), a solution to semi-supervised boosting when there is only limited access to labelled instances. Our goal is to learn a classifier trained on a dataset generated by combining the generalizations of different algorithms, each trained with a limited amount of supervised samples. Our method builds on a combination of components. First, an inference-aided ensemble algorithm built on a set of weak classifiers provides the initial noisy labels. Second, an agreement-based estimation approach returns the average error rates of the noisy labels. Third and finally, a noise-resistant boosting algorithm trains over the noisy labels and their error rates to describe the underlying structure as closely as possible. We provide both analytical justification and experimental results to back the performance of our model. On several benchmark datasets, our results demonstrate that BSL consistently outperforms state-of-the-art semi-supervised methods, achieving over 90% test accuracy with only 10% of the data labelled.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJl0ceBtDH
PDF https://openreview.net/pdf?id=rJl0ceBtDH
PWC https://paperswithcode.com/paper/semi-supervised-boosting-via-self-labelling
Repo
Framework
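
A hedged sketch of the three-stage pipeline with sklearn stand-ins: (1) an ensemble of weak classifiers pseudo-labels the unlabelled pool, (2) disagreement with the consensus gives a crude noise-rate estimate, (3) a boosted model trains on the noisy labels. Each stage simplifies the corresponding component in the paper (notably, plain gradient boosting replaces the noise-resistant booster).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, random_state=0)
lab = np.zeros(len(y), bool); lab[:200] = True          # only 10% of the data labelled
Xl, yl = X[lab], y[lab]

# (1) weak ensemble on bootstrap resamples of the labelled slice -> noisy labels
votes = []
for s in range(7):
    b = np.random.RandomState(s).choice(len(Xl), len(Xl), replace=True)
    votes.append(DecisionTreeClassifier(max_depth=3, random_state=s)
                 .fit(Xl[b], yl[b]).predict(X[~lab]))
votes = np.stack(votes)
pseudo = (votes.mean(0) > 0.5).astype(int)              # ensemble consensus

# (2) agreement-based proxy for the error rate of the pseudo-labels
noise_rate = (votes != pseudo).mean()
print(f"estimated noise rate: {noise_rate:.2f}")

# (3) boosting stage trained over labelled + pseudo-labelled data
Xb = np.vstack([Xl, X[~lab]]); yb = np.concatenate([yl, pseudo])
model = GradientBoostingClassifier(random_state=0).fit(Xb, yb)
```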

Neural Reverse Engineering of Stripped Binaries

Title Neural Reverse Engineering of Stripped Binaries
Authors Anonymous
Abstract We address the problem of reverse engineering stripped executables, which contain no debug information. This is a challenging problem because of the small amount of syntactic information available in stripped executables, and because of the diverse assembly code patterns arising from compiler optimizations. We present a novel approach for predicting procedure names in stripped executables. Our approach combines static analysis with encoder-decoder-based models. The main idea is to use static analysis to obtain enriched representations of API call sites; encode a set of sequences of these call sites by traversing the Control-Flow Graph; and finally, attend to the encoded sequences while decoding the target name. Our evaluation shows that our model performs predictions that are difficult and time-consuming for humans, while improving on the state of the art by 20%.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hkg6TySFDr
PDF https://openreview.net/pdf?id=Hkg6TySFDr
PWC https://paperswithcode.com/paper/neural-reverse-engineering-of-stripped-1
Repo
Framework
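
A hedged sketch of the model shape described above: encode several sequences of API call-site embeddings (one per CFG path) with a shared GRU, then attend over all encoded states while decoding procedure-name tokens. The vocabulary sizes, dimensions, pooling choices, and the entire static-analysis frontend are assumptions; this only illustrates the encode-then-attend-decode pattern.

```python
import torch
import torch.nn as nn

V_api, V_name, d = 5000, 3000, 128

class Name2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.api_emb = nn.Embedding(V_api, d)
        self.enc = nn.GRU(d, d, batch_first=True)
        self.name_emb = nn.Embedding(V_name, d)
        self.dec = nn.GRUCell(d, d)
        self.attn = nn.Linear(d, d)
        self.out = nn.Linear(2 * d, V_name)

    def forward(self, call_seqs, name_prefix):
        # call_seqs: (n_seqs, seq_len) call-site ids; name_prefix: (T,) gold tokens
        H, _ = self.enc(self.api_emb(call_seqs))        # encode each CFG sequence
        mem = H.reshape(-1, d)                          # pool all encoded call-site states
        h = mem.mean(0, keepdim=True)                   # decoder init (an assumption)
        logits = []
        for tok in name_prefix:                         # teacher forcing
            h = self.dec(self.name_emb(tok).unsqueeze(0), h)
            scores = mem @ self.attn(h).squeeze(0)      # attend to encoded call sites
            ctx = (torch.softmax(scores, 0).unsqueeze(-1) * mem).sum(0)
            logits.append(self.out(torch.cat([h.squeeze(0), ctx])))
        return torch.stack(logits)                      # (T, V_name)

model = Name2Seq()
seqs = torch.randint(0, V_api, (4, 12))                 # 4 hypothetical CFG sequences
prefix = torch.randint(0, V_name, (3,))                 # hypothetical name tokens
print(model(seqs, prefix).shape)                        # torch.Size([3, 3000])
```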

Mesh-Free Unsupervised Learning-Based PDE Solver of Forward and Inverse problems

Title Mesh-Free Unsupervised Learning-Based PDE Solver of Forward and Inverse problems
Authors Anonymous
Abstract We introduce a novel neural-network-based partial differential equation solver for forward and inverse problems. The solver is grid-free, mesh-free and shape-free, and the solution is approximated by a neural network. We employ an unsupervised approach in which the input to the network is a set of points in an arbitrary domain and the output is the set of corresponding function values. The network is trained to minimize deviations of the learned function from the PDE solution and to satisfy the boundary conditions. The resulting solution is an explicit, smooth, differentiable function with a known analytical form. Unlike other numerical methods such as finite differences and finite elements, the derivatives of the learned function can be calculated analytically to any order. This framework therefore enables the solution of high-order non-linear PDEs. The proposed algorithm is a unified formulation of both forward and inverse problems, where the optimized loss function consists of a few elements: fidelity terms in the $L_2$ and $L_\infty$ norms, boundary-condition constraints, and additional regularizers. This setting is flexible in the sense that the regularizers can be tailored to specific problems. We demonstrate our method on a free-shape 2D second-order elliptic system with application to Electrical Impedance Tomography (EIT).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rklv-a4tDB
PDF https://openreview.net/pdf?id=rklv-a4tDB
PWC https://paperswithcode.com/paper/mesh-free-unsupervised-learning-based-pde
Repo
Framework
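
A minimal mesh-free sketch in the spirit of the abstract: approximate the solution of a 1D Poisson problem $u''(x) = f(x)$, $u(0)=u(1)=0$ by a network, with a PDE-residual loss and a boundary loss evaluated on random collocation points. The paper's 2D elliptic/EIT setting, its $L_\infty$ fidelity term, and its regularizers are not reproduced here.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: -(torch.pi ** 2) * torch.sin(torch.pi * x)   # exact solution: u = sin(pi x)

for step in range(3000):
    x = torch.rand(256, 1, requires_grad=True)             # random collocation points, no mesh
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    bc = torch.tensor([[0.0], [1.0]])
    loss = ((d2u - f(x)) ** 2).mean() + (net(bc) ** 2).mean()   # residual + boundary terms
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the solution is the network itself, derivatives of any order at any point come from autograd on the learned function, exactly the property the abstract contrasts with finite differences and finite elements.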

Unsupervised Hierarchical Graph Representation Learning with Variational Bayes

Title Unsupervised Hierarchical Graph Representation Learning with Variational Bayes
Authors Anonymous
Abstract Hierarchical graph representation learning is an emerging subject owing to the increasingly popular adoption of graph neural networks in machine learning and applications. Loosely speaking, work under this umbrella falls into two categories: (a) use a predefined graph hierarchy to perform pooling; and (b) learn the hierarchy for a given graph through differentiable parameterization of the coarsening process. These approaches are supervised; a predictive task with ground-truth labels is used to drive the learning. In this work, we propose an unsupervised approach, BayesPool, based on variational Bayes. It produces graph representations given a predefined hierarchy. Rather than relying on labels, the training signal comes from the evidence lower bound of encoding a graph and decoding the subsequent one in the hierarchy. Node features are treated as latent in this variational machinery, so that they are produced as a byproduct and can be used in downstream tasks. We present a comprehensive set of experiments showing the usefulness of the learned representations in the context of graph classification.
Tasks Graph Classification, Graph Representation Learning, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BkgGJlBFPS
PDF https://openreview.net/pdf?id=BkgGJlBFPS
PWC https://paperswithcode.com/paper/unsupervised-hierarchical-graph
Repo
Framework

Diagonal Graph Convolutional Networks with Adaptive Neighborhood Aggregation

Title Diagonal Graph Convolutional Networks with Adaptive Neighborhood Aggregation
Authors Anonymous
Abstract Graph convolutional networks (GCNs) and their variants have generalized deep learning methods to non-Euclidean graph data, bringing substantial improvements on many graph mining tasks. In this paper, we revisit the mathematical foundation of GCNs and study how to extend their representation capacity. We discover that their performance can be improved with an adaptive neighborhood aggregation step. The core idea is to adaptively scale the output signal for each node and automatically train a suitable nonlinear encoder for the input signal. In this work, we present a new method named Diagonal Graph Convolutional Networks (DiagGCN) based on this idea. Importantly, one of the adaptive aggregation techniques used in DiagGCN, the permutations of diagonal matrices, offers a flexible framework for designing GCNs; in fact, some of the most expressive GCNs, e.g., the graph attention network, can be reformulated as a particular instance of our model. Standard experiments on open graph benchmarks show that our proposed framework consistently improves graph classification accuracy compared to state-of-the-art baselines.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=SkezP1HYvS
PDF https://openreview.net/pdf?id=SkezP1HYvS
PWC https://paperswithcode.com/paper/diagonal-graph-convolutional-networks-with
Repo
Framework
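
A hedged sketch of the adaptive aggregation idea: a GCN layer whose aggregated output is rescaled per node by a learned diagonal, here parameterized from the node's own input signal. This illustrates diagonal output scaling in general, not DiagGCN's exact formulation.

```python
import torch
import torch.nn as nn

class DiagGCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out, bias=False)
        self.scale = nn.Linear(d_in, d_out)        # produces the per-node diagonal

    def forward(self, A_hat, H):
        # A_hat: (n, n) normalized adjacency; H: (n, d_in) node features
        agg = A_hat @ self.lin(H)                  # standard GCN propagation
        return torch.sigmoid(self.scale(H)) * agg  # adaptive diagonal rescaling

n = 6
A = torch.eye(n) + (torch.rand(n, n) > 0.7).float()
A_hat = A / A.sum(1, keepdim=True)                 # simple row normalization
out = DiagGCNLayer(8, 16)(A_hat, torch.randn(n, 8))
```

Making the diagonal depend on node signals is what lets attention-style behavior (e.g., GAT) be recovered as a special case of this aggregation family.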

Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients

Title Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients
Authors Anonymous
Abstract Adversarial examples are well known as a serious threat to deep neural networks (DNNs). To ensure the successful and safe operation of DNNs on real-world tasks, it is urgent to equip them with effective defense strategies. In this work, we study the detection of adversarial examples, based on the assumption that both the output and internal responses of a DNN model for adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). The GGD is a general distribution family covering many popular distributions (e.g., Laplacian, Gaussian, and uniform), and it is more likely to approximate the intrinsic distribution of internal responses than any specific distribution. Moreover, since the shape factor is more robust across databases than the other two parameters, we propose to construct discriminative features from the shape factor for adversarial detection, employing the magnitude of Benford-Fourier coefficients (MBF), which can be easily estimated from the responses. Finally, a support vector machine is trained as the adversarial detector on the MBF features. Through the Kolmogorov-Smirnov (KS) test, we empirically verify that: 1) the posterior vectors of both adversarial and benign examples follow the GGD; 2) the extracted MBF features of adversarial and benign examples follow different distributions. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust at detecting adversarial examples from different crafting methods and different sources than state-of-the-art adversarial detection methods.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=ryeK6nNFDr
PDF https://openreview.net/pdf?id=ryeK6nNFDr
PWC https://paperswithcode.com/paper/effective-and-robust-detection-of-adversarial
Repo
Framework
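
A hedged sketch of an MBF-style feature: estimate the magnitude of the first Fourier coefficient of the mantissa distribution of $\log_{10}|r|$ for a response vector $r$ (the quantity behind Benford's law), then feed such features to an SVM. The coefficient order, the response choice, and the synthetic stand-in data below are assumptions; the paper's exact estimator may differ.

```python
import numpy as np
from sklearn.svm import SVC

def mbf(responses, order=1):
    """|(1/N) sum_j exp(-2*pi*i*order*log10|r_j|)|: the empirical Fourier
    coefficient of the mantissa (Benford) distribution of the responses.
    The mod-1 mantissa is implicit since the exponential has period 1."""
    m = np.log10(np.abs(responses) + 1e-12)
    return np.abs(np.exp(-2j * np.pi * order * m).mean())

# synthetic stand-ins for network responses: two GGDs with different shape
# factors (Gaussian vs. Laplacian) produce distinguishable MBF values
rng = np.random.default_rng(0)
benign = [mbf(rng.normal(0, 1, 512)) for _ in range(200)]
advers = [mbf(rng.laplace(0, 1, 512)) for _ in range(200)]
X = np.array(benign + advers).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)
detector = SVC().fit(X, y)                 # the SVM adversarial detector
print(detector.score(X, y))
```

The point of the feature is exactly what the abstract argues: it tracks the GGD shape factor, which shifts under adversarial perturbations, while staying insensitive to the mean and variance.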