April 1, 2020

2973 words 14 mins read

Paper Group NANR 25


GRASPEL: GRAPH SPECTRAL LEARNING AT SCALE. Neural Networks for Principal Component Analysis: A New Loss Function Provably Yields Ordered Exact Eigenvectors. AMRL: Aggregated Memory For Reinforcement Learning. Adjustable Real-time Style Transfer. Counterfactuals uncover the modular structure of deep generative models. Semi-Implicit Back Propagation. …

GRASPEL: GRAPH SPECTRAL LEARNING AT SCALE

Title GRASPEL: GRAPH SPECTRAL LEARNING AT SCALE
Authors Anonymous
Abstract Learning meaningful graphs from data plays an important role in many data mining and machine learning tasks, such as data representation and analysis, dimensionality reduction, data clustering, and visualization. In this work, we present a scalable spectral approach to graph learning from data. By limiting the precision matrix to be a graph Laplacian, our approach aims to estimate ultra-sparse weighted graphs and has a clear connection with the prior graphical Lasso method. By interleaving nearly-linear-time spectral graph sparsification, coarsening, and embedding procedures, ultra-sparse yet spectrally stable graphs can be iteratively constructed in a highly scalable manner. Compared with prior graph learning approaches that do not scale to large problems, our approach is highly scalable for constructing graphs that can immediately lead to substantially improved computing efficiency and solution quality for a variety of data mining and machine learning applications, such as spectral clustering (SC) and t-Distributed Stochastic Neighbor Embedding (t-SNE).
Tasks Dimensionality Reduction
Published 2020-01-01
URL https://openreview.net/forum?id=BJes_xStwS
PDF https://openreview.net/pdf?id=BJes_xStwS
PWC https://paperswithcode.com/paper/graspel-graph-spectral-learning-at-scale
Repo
Framework
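
A minimal sketch, assuming a plain k-nearest-neighbor construction with Gaussian edge weights, of the kind of ultra-sparse graph Laplacian the GRASPEL entry above targets; it is not the paper's spectral sparsification/coarsening/embedding pipeline.

```python
# Hedged sketch: build a sparse k-NN graph Laplacian from a data matrix.
# The k-NN construction and Gaussian weights are assumptions for illustration,
# not GRASPEL's actual spectral learning procedure.
import numpy as np
from scipy.sparse import csgraph
from sklearn.neighbors import kneighbors_graph

def knn_graph_laplacian(X, k=10, sigma=1.0):
    """X: (n_samples, n_features). Returns the sparse Laplacian L = D - W."""
    A = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)
    A.data = np.exp(-(A.data ** 2) / (2 * sigma ** 2))   # Gaussian edge weights
    W = A.maximum(A.T)                                    # symmetrize
    return csgraph.laplacian(W)                           # still ultra-sparse

# Such a Laplacian could then feed spectral clustering or t-SNE-style embeddings.
L = knn_graph_laplacian(np.random.randn(500, 20), k=8)
print(L.shape, L.nnz)
```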

Neural Networks for Principal Component Analysis: A New Loss Function Provably Yields Ordered Exact Eigenvectors

Title Neural Networks for Principal Component Analysis: A New Loss Function Provably Yields Ordered Exact Eigenvectors
Authors Anonymous
Abstract In this paper, we propose a new loss function for performing principal component analysis (PCA) using linear autoencoders (LAEs). Optimizing the standard L2 loss results in a decoder matrix that spans the principal subspace of the sample covariance of the data, but fails to identify the exact eigenvectors. This downside originates from an invariance that cancels out in the global map. Here, we prove that our loss function eliminates this issue, i.e., the decoder converges to the exact ordered unnormalized eigenvectors of the sample covariance matrix. For this new loss, we establish that all local minima are global optima and also show that computing the new loss (and its gradients) has the same order of complexity as the classical loss. We report numerical results on both synthetic simulations and a real-data PCA experiment on MNIST (i.e., a 60,000 × 784 matrix), demonstrating that our approach is practically applicable and rectifies previous LAEs’ downsides.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ByeVWkBYPH
PDF https://openreview.net/pdf?id=ByeVWkBYPH
PWC https://paperswithcode.com/paper/neural-networks-for-principal-component
Repo
Framework
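
One way to make the point of the entry above concrete: the sketch below, a hedged PyTorch illustration rather than the paper's exact loss, sums reconstruction errors over nested subsets of latent dimensions, which breaks the rotational invariance of the plain L2 loss so that components come out ordered.

```python
# Hedged sketch: a nested reconstruction loss for a linear autoencoder. The exact
# loss proposed in the paper may differ; this only illustrates how penalizing
# every prefix of the latent code removes the invariance of the plain L2 loss.
import torch

def nested_lae_loss(X, encoder_W, decoder_A):
    """X: (n, d), encoder_W: (d, k), decoder_A: (k, d)."""
    Z = X @ encoder_W                                  # latent codes (n, k)
    k = Z.shape[1]
    loss = torch.zeros((), dtype=X.dtype)
    for i in range(1, k + 1):
        X_hat = Z[:, :i] @ decoder_A[:i, :]            # reconstruct from first i dims
        loss = loss + ((X - X_hat) ** 2).mean()
    return loss / k

X = torch.randn(128, 32)
W = torch.randn(32, 8, requires_grad=True)
A = torch.randn(8, 32, requires_grad=True)
nested_lae_loss(X, W, A).backward()
```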

AMRL: Aggregated Memory For Reinforcement Learning

Title AMRL: Aggregated Memory For Reinforcement Learning
Authors Anonymous
Abstract In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy. We demonstrate that using techniques from NLP and supervised learning fails at RL tasks due to stochasticity from the environment and from exploration. Utilizing our insights on the limitations of traditional memory methods in RL, we propose AMRL, a class of models that can learn better policies with greater sample efficiency and are resilient to noisy inputs. Specifically, our models use a standard memory module to summarize short-term context, and then aggregate all prior states from the standard model without respect to order. We show that this provides advantages both in terms of gradient decay and signal-to-noise ratio over time. Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline that has the same number of parameters and by 9% over a stronger baseline that has far more parameters.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bkl7bREtDr
PDF https://openreview.net/pdf?id=Bkl7bREtDr
PWC https://paperswithcode.com/paper/amrl-aggregated-memory-for-reinforcement
Repo
Framework
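
A minimal sketch of the aggregation idea described in the AMRL entry above: a standard recurrent module summarizes short-term context, and an order-invariant aggregate (here a running max) over its outputs is concatenated in. Layer sizes and the choice of max aggregation are assumptions for illustration.

```python
# Hedged sketch: LSTM for short-term context plus an order-insensitive running
# max over all prior outputs, in the spirit of the AMRL abstract.
import torch
import torch.nn as nn

class MaxAggregatedMemory(nn.Module):
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim * 2, hidden_dim)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim)
        h_seq, _ = self.lstm(obs_seq)                # short-term summary
        agg = torch.cummax(h_seq, dim=1).values      # order-insensitive aggregate
        return self.head(torch.cat([h_seq, agg], dim=-1))

out = MaxAggregatedMemory(16, 32)(torch.randn(4, 50, 16))
print(out.shape)  # torch.Size([4, 50, 32])
```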

Adjustable Real-time Style Transfer

Title Adjustable Real-time Style Transfer
Authors Anonymous
Abstract Artistic style transfer is the problem of synthesizing an image with content similar to a given image and style similar to another. Although recent feed-forward neural networks can generate stylized images in real-time, these models produce a single stylization given a pair of style/content images, and the user doesn’t have control over the synthesized output. Moreover, the style transfer depends on the hyper-parameters of the model, with a varying "optimum" for different input images. Therefore, if the stylized output is not appealing to the user, they have to try multiple models or retrain one with different hyper-parameters to get a favorite stylization. In this paper, we address these issues by proposing a novel method which allows adjustment of crucial hyper-parameters, after the training and in real-time, through a set of manually adjustable parameters. These parameters enable the user to modify the synthesized outputs from the same pair of style/content images, in search of a favorite stylized image. Our quantitative and qualitative experiments indicate that adjusting these parameters is comparable to retraining the model with different hyper-parameters. We also demonstrate how these parameters can be randomized to generate results which are diverse but still very similar in style and content.
Tasks Style Transfer
Published 2020-01-01
URL https://openreview.net/forum?id=HJe_Z04Yvr
PDF https://openreview.net/pdf?id=HJe_Z04Yvr
PWC https://paperswithcode.com/paper/adjustable-real-time-style-transfer-1
Repo
Framework
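
A hedged sketch of the general "adjust after training" mechanism the entry above describes: user-controlled knobs modulate normalized features through predicted per-channel scales and shifts. The conditioning architecture here is an assumption, not the paper's exact design.

```python
# Hedged sketch: a control vector `alpha`, adjustable at inference time, predicts
# per-channel scale/shift applied to instance-normalized features.
import torch
import torch.nn as nn

class AdjustableNorm(nn.Module):
    def __init__(self, channels, n_knobs):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_scale = nn.Linear(n_knobs, channels)
        self.to_shift = nn.Linear(n_knobs, channels)

    def forward(self, x, alpha):
        # x: (B, C, H, W); alpha: (B, n_knobs) set by the user at test time.
        s = self.to_scale(alpha).unsqueeze(-1).unsqueeze(-1)
        b = self.to_shift(alpha).unsqueeze(-1).unsqueeze(-1)
        return self.norm(x) * (1 + s) + b

y = AdjustableNorm(64, 3)(torch.randn(2, 64, 32, 32), torch.rand(2, 3))
```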

Counterfactuals uncover the modular structure of deep generative models

Title Counterfactuals uncover the modular structure of deep generative models
Authors Anonymous
Abstract Deep generative models can emulate the perceptual properties of complex image datasets, providing a latent representation of the data. However, manipulating such a representation to perform meaningful and controllable transformations in the data space remains challenging without some form of supervision. While previous work has focused on exploiting statistical independence to disentangle latent factors, we argue that such a requirement can be advantageously relaxed and propose instead a non-statistical framework that relies on identifying a modular organization of the network, based on counterfactual manipulations. Our experiments support that modularity between groups of channels is achieved to a certain degree on a variety of generative models. This allowed the design of targeted interventions on complex image datasets, opening the way to applications such as computationally efficient style transfer and the automated assessment of robustness to contextual changes in pattern recognition systems.
Tasks Style Transfer
Published 2020-01-01
URL https://openreview.net/forum?id=SJxDDpEKvH
PDF https://openreview.net/pdf?id=SJxDDpEKvH
PWC https://paperswithcode.com/paper/counterfactuals-uncover-the-modular-structure-1
Repo
Framework
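
A crude, hedged proxy for the counterfactual analysis sketched in the entry above: clamp one group of intermediate channels in a generator and measure how much the output changes. The split into `generator_front`/`generator_back` and the toy modules are assumptions for illustration.

```python
# Hedged sketch: measure the effect of a counterfactual intervention on a group
# of intermediate channels of a generator.
import torch
import torch.nn as nn

def channel_intervention_effect(generator_front, generator_back, z, channel_idx, value=0.0):
    h = generator_front(z)                   # intermediate features (B, C, H, W)
    out_ref = generator_back(h)
    h_cf = h.clone()
    h_cf[:, channel_idx] = value             # the counterfactual manipulation
    out_cf = generator_back(h_cf)
    return (out_ref - out_cf).abs().mean(dim=(1, 2, 3))   # per-sample effect size

# Toy usage with stand-in modules:
front = nn.Sequential(nn.Linear(8, 16 * 4 * 4), nn.Unflatten(1, (16, 4, 4)))
back = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1))
print(channel_intervention_effect(front, back, torch.randn(5, 8), channel_idx=[0, 1, 2]))
```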

Semi-Implicit Back Propagation

Title Semi-Implicit Back Propagation
Authors Anonymous
Abstract Neural networks have attracted great attention for a long time, and many researchers are devoted to improving the effectiveness of neural network training algorithms. Though stochastic gradient descent (SGD) and other explicit gradient-based methods are widely adopted, there are still many challenges such as gradient vanishing and small step sizes, which lead to slow convergence and instability of SGD algorithms. Motivated by error back propagation (BP) and proximal methods, we propose a semi-implicit back propagation method for neural network training. Similar to BP, the differences on the neurons are propagated in a backward fashion and the parameters are updated with proximal mapping. The implicit update for both hidden neurons and parameters allows choosing a large step size in the training algorithm. Finally, we also show that any fixed point of convergent sequences produced by this algorithm is a stationary point of the objective loss function. Experiments on both MNIST and CIFAR-10 demonstrate that the proposed semi-implicit BP algorithm leads to better performance in terms of both loss decrease and training/validation accuracy, compared to SGD and a similar algorithm, ProxBP.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyeRIgBYDB
PDF https://openreview.net/pdf?id=SyeRIgBYDB
PWC https://paperswithcode.com/paper/semi-implicit-back-propagation
Repo
Framework
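
To make the proximal flavour of update concrete, here is a generic, hedged example (not the paper's full algorithm): for a single linear layer with a fixed target for its output, the proximal step has a closed form and remains stable even for large step sizes.

```python
# Hedged illustration of a proximal (implicit) update for one linear layer:
#   W_new = argmin_W 0.5*||W a - t||^2 + (1/(2*eta)) * ||W - W_old||_F^2
# Setting the gradient to zero gives W (a a^T + I/eta) = t a^T + W_old/eta.
# This is a generic proximal-mapping example, not the paper's semi-implicit BP.
import numpy as np

def proximal_linear_update(W_old, a, t, eta):
    """a: (d_in,) layer input, t: (d_out,) target pre-activation, eta: step size."""
    d_in = a.shape[0]
    A = np.outer(a, a) + np.eye(d_in) / eta
    B = np.outer(t, a) + W_old / eta
    return np.linalg.solve(A, B.T).T          # solves W A = B (A is symmetric)

W = np.zeros((3, 5))
W = proximal_linear_update(W, np.random.randn(5), np.random.randn(3), eta=10.0)
print(W.shape)
```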

A Fair Comparison of Graph Neural Networks for Graph Classification

Title A Fair Comparison of Graph Neural Networks for Graph Classification
Authors Anonymous
Abstract The graph representation learning field has recently attracted the attention of a wide research community. Several Graph Neural Network models are being developed to tackle effective graph classification. However, experimental procedures often lack rigor and are hardly reproducible. Motivated by this, we provide an overview of common practices that should be avoided to fairly compare with the state of the art. To counter this troubling trend, we ran more than 47000 experiments in a controlled and uniform framework to re-evaluate five popular models across nine common benchmarks. Moreover, by comparing GNNs with structure-agnostic baselines we provide convincing evidence that, on some datasets, structural information has not been exploited yet. We believe that this work can contribute to the development of the graph learning field by providing a much-needed grounding for rigorous evaluations of graph classification models.
Tasks Graph Classification, Graph Representation Learning, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HygDF6NFPB
PDF https://openreview.net/pdf?id=HygDF6NFPB
PWC https://paperswithcode.com/paper/a-fair-comparison-of-graph-neural-networks
Repo
Framework
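
A hedged sketch of the kind of structure-agnostic baseline the entry above compares against: ignore the edges entirely and classify each graph from a permutation-invariant pooling of its node features. Layer sizes are arbitrary stand-ins.

```python
# Hedged sketch: a structure-agnostic graph classifier (mean-pooled node features
# fed to an MLP), useful as a sanity-check baseline for GNN benchmarks.
import torch
import torch.nn as nn

class StructureAgnosticBaseline(nn.Module):
    def __init__(self, node_feat_dim, hidden, n_classes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(node_feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, node_features_per_graph):
        # node_features_per_graph: list of (n_nodes_i, node_feat_dim) tensors
        pooled = torch.stack([x.mean(dim=0) for x in node_features_per_graph])
        return self.mlp(pooled)                     # (n_graphs, n_classes) logits

logits = StructureAgnosticBaseline(7, 32, 2)([torch.randn(5, 7), torch.randn(9, 7)])
print(logits.shape)  # torch.Size([2, 2])
```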

Improving the Gating Mechanism of Recurrent Neural Networks

Title Improving the Gating Mechanism of Recurrent Neural Networks
Authors Anonymous
Abstract In this work, we revisit the gating mechanisms widely used in various recurrent and feedforward networks such as LSTMs, GRUs, or highway networks. These gates are meant to control information flow, allowing gradients to better propagate back in time for recurrent models. However, to propagate gradients over very long temporal windows, they need to operate close to their saturation regime. We propose two independent and synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyper-parameters, and are aimed at improving the learnability of the gates when they are close to saturation. Our proposals are theoretically justified, and we show a generic framework that encompasses other recently proposed gating mechanisms such as chrono-initialization and master gates. We perform systematic analyses and ablation studies on the proposed improvements and evaluate our method on a wide range of applications including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning. Empirically, our proposed gating mechanisms robustly increase the performance of recurrent models such as LSTMs, especially on tasks requiring long temporal dependencies.
Tasks Image Classification, Language Modelling, Sequential Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=r1lnigSFDr
PDF https://openreview.net/pdf?id=r1lnigSFDr
PWC https://paperswithcode.com/paper/improving-the-gating-mechanism-of-recurrent
Repo
Framework
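
For context on the saturation issue discussed in the entry above, the sketch below implements chrono-initialization, one of the previously proposed gating mechanisms the paper's framework is said to encompass (it is not the paper's own modification): forget-gate biases are initialized to log(U[1, T_max - 1]) so gates start spread over a range of effective time scales.

```python
# Hedged sketch of chrono-initialization for a PyTorch LSTM (an existing technique
# cited in the abstract, not the paper's proposed gating changes).
import torch
import torch.nn as nn

def chrono_init_lstm(lstm: nn.LSTM, t_max: int):
    hidden = lstm.hidden_size
    for name, p in lstm.named_parameters():
        if "bias" not in name:
            continue
        with torch.no_grad():
            p.zero_()
            if name.startswith("bias_ih"):
                # PyTorch packs LSTM biases as [input, forget, cell, output] gates.
                forget = torch.log(torch.empty(hidden).uniform_(1, t_max - 1))
                p[hidden:2 * hidden] = forget     # forget-gate bias
                p[:hidden] = -forget              # input-gate bias = -forget bias

lstm = nn.LSTM(16, 64)
chrono_init_lstm(lstm, t_max=100)
```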

Cover Filtration and Stable Paths in the Mapper

Title Cover Filtration and Stable Paths in the Mapper
Authors Anonymous
Abstract The contributions of this paper are two-fold. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We then develop a language and theory for stable paths within this filtration, inspired by ideas of persistent homology. This framework can be used to develop several new learning representations in applications where an obvious metric may not be defined but a cover is readily available. We demonstrate the utility of our framework as applied to recommendation systems and explainable machine learning. We demonstrate a new perspective for modeling recommendation system data sets that does not require manufacturing a bespoke metric. As a direct application, we find that the stable paths identified by our framework in a movies data set represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Our framework provides an alternative way of building a filtration from a single mapper that is then used to explore stable paths. As a direct illustration, we build a mapper from a supervised machine learning model trained on the FashionMNIST data set. We show that the stable paths in the cover filtration provide improved explanations of relationships between subpopulations of images.
Tasks Recommendation Systems
Published 2020-01-01
URL https://openreview.net/forum?id=SJx0oAEYwH
PDF https://openreview.net/pdf?id=SJx0oAEYwH
PWC https://paperswithcode.com/paper/cover-filtration-and-stable-paths-in-the
Repo
Framework
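
As a small concrete anchor for the distance underlying the construction in the entry above: the Jaccard distance between two cover elements is the special case that the generalized Steinhaus distance extends; building the full cover filtration is beyond this sketch.

```python
# Hedged sketch: Jaccard distance between two cover elements, represented as sets
# of point indices (e.g., the members of two Mapper nodes).
def jaccard_distance(cover_a: set, cover_b: set) -> float:
    union = cover_a | cover_b
    if not union:
        return 0.0
    return 1.0 - len(cover_a & cover_b) / len(union)

print(jaccard_distance({1, 2, 3, 4}, {3, 4, 5}))   # 0.6
```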

Frequency-based Search-control in Dyna

Title Frequency-based Search-control in Dyna
Authors Anonymous
Abstract Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. In particular, the Dyna architecture, as an elegant model-based architecture integrating learning and planning, provides huge flexibility in using a model. One of the most important components in Dyna is called search-control, which refers to the process of generating states or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical to improving learning efficiency. In this work, we propose a simple and novel search-control strategy by searching the high-frequency region of the value function. Our main intuition is built on the Shannon sampling theorem from signal processing, which indicates that a high-frequency signal requires more samples to reconstruct. We empirically show that a high-frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states in the high-frequency region of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by its gradient norm, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its properties and effectiveness on benchmark domains.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1gskyStwr
PDF https://openreview.net/pdf?id=B1gskyStwr
PWC https://paperswithcode.com/paper/frequency-based-search-control-in-dyna
Repo
Framework
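
The gradient-norm criterion in the entry above is easy to state in code: score candidate states by the norm of the value function's gradient with respect to the state and prefer high-scoring states for search-control queries. The value network and state dimensionality below are stand-ins.

```python
# Hedged sketch: rank candidate states by ||dV/ds|| as a local frequency proxy
# for search-control in a Dyna-style agent.
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

def gradient_norm_score(states):
    """states: (n, state_dim) candidates; returns the gradient norm per state."""
    states = states.clone().requires_grad_(True)
    V = value_net(states).sum()
    (grad,) = torch.autograd.grad(V, states)
    return grad.norm(dim=1)

candidates = torch.randn(256, 4)
scores = gradient_norm_score(candidates)
top_states = candidates[scores.topk(32).indices]   # states to query the model from
```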

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

Title Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
Authors Anonymous
Abstract We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These thresholds undergo fine-grained, layer-wise adjustment dynamically via backpropagation. We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss using the same number of training epochs as dense models. Dynamic Sparse Training achieves state-of-the-art performance compared with other sparse training algorithms on various network architectures. Additionally, we have several surprising observations that provide strong evidence of the effectiveness and efficiency of our algorithm. These observations reveal the underlying problems of traditional three-stage pruning algorithms and the potential guidance our algorithm provides for the design of more compact network architectures.
Tasks Network Pruning
Published 2020-01-01
URL https://openreview.net/forum?id=SJlbGJrtDB
PDF https://openreview.net/pdf?id=SJlbGJrtDB
PWC https://paperswithcode.com/paper/dynamic-sparse-training-find-efficient-sparse
Repo
Framework
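
A hedged sketch of a trainable masked layer in the spirit of the entry above: the layer owns trainable pruning thresholds, the mask is a step function of |W| minus the threshold, and a straight-through estimator lets gradients reach both the weights and the thresholds. The per-row threshold shape and the sigmoid surrogate are assumptions that may differ from the paper.

```python
# Hedged sketch: a linear layer with trainable pruning thresholds and a
# straight-through mask, illustrating the Dynamic Sparse Training idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainableMaskedLinear(nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.threshold = nn.Parameter(torch.zeros(out_f, 1))   # one threshold per row

    def forward(self, x):
        score = self.weight.abs() - self.threshold
        hard_mask = (score > 0).float()                      # actual 0/1 mask
        soft_mask = torch.sigmoid(score)                     # differentiable surrogate
        mask = hard_mask + soft_mask - soft_mask.detach()    # straight-through trick
        return F.linear(x, self.weight * mask, self.bias)

out = TrainableMaskedLinear(32, 16)(torch.randn(8, 32))
```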

Classification-Based Anomaly Detection for General Data

Title Classification-Based Anomaly Detection for General Data
Authors Anonymous
Abstract Anomaly detection, finding patterns that substantially deviate from those seen previously, is one of the fundamental problems of artificial intelligence. Recently, classification-based methods were shown to achieve superior results on this task. In this work, we present a unifying view and propose an open-set method to relax current generalization assumptions. Furthermore, we extend the applicability of transformation-based methods to non-image data using random affine transformations. Our method is shown to obtain state-of-the-art accuracy and is applicable to broad data types. The strong performance of our method is extensively validated on multiple datasets from different domains.
Tasks Anomaly Detection
Published 2020-01-01
URL https://openreview.net/forum?id=H1lK_lBtvS
PDF https://openreview.net/pdf?id=H1lK_lBtvS
PWC https://paperswithcode.com/paper/classification-based-anomaly-detection-for
Repo
Framework
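
A toy, hedged rendition of the transformation-classification recipe for non-image data described in the entry above: apply M random linear maps to the normal data, train a classifier to predict which map was applied, and flag test inputs for which the classifier is unsure of the correct map. The classifier and the scoring rule are simplified stand-ins, not the paper's open-set method.

```python
# Hedged sketch: random-transformation classification as an anomaly score for
# tabular data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10))                      # "normal" data only
M, d = 8, X_train.shape[1]
transforms = [rng.normal(size=(d, d)) for _ in range(M)]   # random linear maps

X_aug = np.vstack([X_train @ T for T in transforms])
y_aug = np.repeat(np.arange(M), len(X_train))
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

def anomaly_score(x):
    """Higher = more anomalous: average miss-probability of the true transform."""
    probs = clf.predict_proba(np.stack([x @ T for T in transforms]))
    return 1.0 - probs[np.arange(M), np.arange(M)].mean()

print(anomaly_score(rng.normal(size=d)), anomaly_score(rng.normal(size=d) * 10))
```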

Meta Dropout: Learning to Perturb Latent Features for Generalization

Title Meta Dropout: Learning to Perturb Latent Features for Generalization
Authors Anonymous
Abstract A machine learning model that generalizes well should obtain low errors on unseen test examples. Thus, if we know how to optimally perturb training examples to account for test examples, we may achieve better generalization performance. However, obtaining such perturbation is not possible in standard machine learning frameworks as the distribution of the test data is unknown. To tackle this challenge, we propose a novel regularization method, meta-dropout, which learns to perturb the latent features of training examples for generalization in a meta-learning framework. Specifically, we meta-learn a noise generator which outputs a multiplicative noise distribution for latent features, to obtain low errors on the test instances in an input-dependent manner. Then, the learned noise generator can perturb the training examples of unseen tasks at the meta-test time for improved generalization. We validate our method on few-shot classification datasets, whose results show that it significantly improves the generalization performance of the base model, and largely outperforms existing regularization methods such as information bottleneck, manifold mixup, and information dropout.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BJgd81SYwr
PDF https://openreview.net/pdf?id=BJgd81SYwr
PWC https://paperswithcode.com/paper/meta-dropout-learning-to-perturb-latent
Repo
Framework
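
The core operation in the entry above, stripped of its meta-learning outer loop, is a learned, input-dependent multiplicative noise on latent features. The noise family below (mean-one log-normal) is an assumption for illustration.

```python
# Hedged sketch: an input-dependent multiplicative noise module; the meta-learning
# procedure that trains it across tasks is omitted.
import torch
import torch.nn as nn

class MultiplicativeNoise(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.to_logvar = nn.Linear(feat_dim, feat_dim)   # input-dependent noise scale

    def forward(self, h):
        std = torch.exp(0.5 * self.to_logvar(h))
        eps = torch.randn_like(h)
        # Mean-one log-normal multiplicative perturbation of the features.
        return h * torch.exp(eps * std - 0.5 * std ** 2)

h_perturbed = MultiplicativeNoise(64)(torch.randn(16, 64))
```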

GUIDEGAN: ATTENTION BASED SPATIAL GUIDANCE FOR IMAGE-TO-IMAGE TRANSLATION

Title GUIDEGAN: ATTENTION BASED SPATIAL GUIDANCE FOR IMAGE-TO-IMAGE TRANSLATION
Authors Anonymous
Abstract Recently, Generative Adversarial Networks (GANs) and a number of their variants have been widely used to solve the image-to-image translation problem and have achieved extraordinary results in both a supervised and an unsupervised manner. However, most GAN-based methods suffer from an imbalance between the generator and discriminator in practice. Namely, the relative model capacities of the generator and discriminator do not match, leading to mode collapse and/or diminished gradients. To tackle this problem, we propose GuideGAN, based on an attention mechanism. More specifically, we arm the discriminator with an attention mechanism so that it not only estimates the probability that its input is real, but also creates an attention map that highlights the critical features for such a prediction. This attention map then assists the generator in producing more plausible and realistic images. We extensively evaluate the proposed GuideGAN framework on a number of image transfer tasks. Both qualitative results and quantitative comparisons demonstrate the superiority of our proposed approach.
Tasks Image-to-Image Translation
Published 2020-01-01
URL https://openreview.net/forum?id=rJl3YC4YPH
PDF https://openreview.net/pdf?id=rJl3YC4YPH
PWC https://paperswithcode.com/paper/guidegan-attention-based-spatial-guidance-for
Repo
Framework
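
A hedged sketch of a discriminator that returns both a realness score and a spatial attention map, as the entry above describes; how the generator consumes the map (as an extra input, a loss weight, etc.) is left out, and the layer configuration is an arbitrary stand-in.

```python
# Hedged sketch: an attention-equipped discriminator producing (realness logit,
# spatial attention map).
import torch
import torch.nn as nn

class AttentionDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.attn_head = nn.Conv2d(128, 1, 1)      # per-location attention logits
        self.real_head = nn.Linear(128, 1)         # realness logit

    def forward(self, x):
        f = self.features(x)
        attn = torch.sigmoid(self.attn_head(f))                 # (B, 1, H/4, W/4)
        logit = self.real_head((f * attn).mean(dim=(2, 3)))     # attention-pooled
        return logit, attn

logit, attn = AttentionDiscriminator()(torch.randn(2, 3, 64, 64))
```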

Fooling Pre-trained Language Models: An Evolutionary Approach to Generate Wrong Sentences with High Acceptability Score

Title Fooling Pre-trained Language Models: An Evolutionary Approach to Generate Wrong Sentences with High Acceptability Score
Authors Anonymous
Abstract Large pre-trained language representation models have recently collected numerous successes in language understanding. They obtained state-of-the-art results on many classical benchmark datasets, such as the GLUE benchmark and the SQuAD dataset, but do they really understand the language? In this paper we investigate two of the best pre-trained language models, BERT and RoBERTa, analysing their weaknesses by generating adversarial sentences with an evolutionary approach. Our goal is to discover whether and why it is possible to fool these models, and how to face this issue. This adversarial attack is followed by a cross analysis, examining the robustness and generalization properties of the models and fooling techniques. We find that BERT can be easily fooled, but an augmentation of the original dataset with adversarial samples is enough to make it learn how not to be fooled again. RoBERTa, instead, is more resistant to this approach, even though it still has some weak spots.
Tasks Adversarial Attack
Published 2020-01-01
URL https://openreview.net/forum?id=S1gV5gHKvB
PDF https://openreview.net/pdf?id=S1gV5gHKvB
PWC https://paperswithcode.com/paper/fooling-pre-trained-language-models-an
Repo
Framework
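
A toy, hedged skeleton of the evolutionary search described in the entry above: mutate a sentence by random word substitutions and keep candidates that the target model still scores as acceptable. The `model_score` function is a hypothetical placeholder for querying a fine-tuned BERT/RoBERTa acceptability classifier; the vocabulary and selection scheme are likewise illustrative.

```python
# Hedged sketch: evolutionary generation of "wrong but acceptable" sentences.
# `model_score` is a hypothetical stand-in, NOT a real model API call.
import random

VOCAB = ["the", "cat", "quickly", "blue", "runs", "idea", "sleeps"]   # toy vocabulary

def model_score(sentence: str) -> float:
    # Placeholder: in practice, the acceptability probability from the model under attack.
    return random.random()

def mutate(sentence: str) -> str:
    words = sentence.split()
    words[random.randrange(len(words))] = random.choice(VOCAB)
    return " ".join(words)

def evolve(seed: str, generations: int = 50, population: int = 20) -> str:
    pool = [seed]
    for _ in range(generations):
        children = [mutate(random.choice(pool)) for _ in range(population)]
        pool = sorted(children, key=model_score, reverse=True)[:5] + [seed]
    return max(pool, key=model_score)

print(evolve("the cat sleeps on the mat"))
```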