April 1, 2020


Paper Group NANR 116

Probabilistic modeling the hidden layers of deep neural networks. Hierarchical Bayes Autoencoders. Influence-aware Memory for Deep Reinforcement Learning. DIVA: Domain Invariant Variational Autoencoder. Dynamic Graph Message Passing Networks. SoftLoc: Robust Temporal Localization under Label Misalignment. Unsupervised Temperature Scaling: Robust Po …

Probabilistic modeling the hidden layers of deep neural networks


Title	Probabilistic modeling the hidden layers of deep neural networks
Authors	Anonymous
Abstract	In this paper, we demonstrate that the parameters of Deep Neural Networks (DNNs) cannot satisfy the i.i.d. prior assumption, and that the assumption of i.i.d. activations does not hold for all the hidden layers of DNNs. Hence, Gaussian processes cannot correctly explain all the hidden layers of DNNs. As an alternative, we introduce a novel probabilistic representation for the hidden layers of DNNs with two aspects: (i) each hidden layer defines a Gibbs distribution, in which the neurons define the energy function, and (ii) the connection between two adjacent layers can be modeled by a product of experts model. Based on this probabilistic representation, we demonstrate that the entire architecture of a DNN can be explained as a Bayesian hierarchical model. Moreover, the representation indicates that DNNs have explicit regularizations, defined by the hidden layers serving as prior distributions. Based on this Bayesian explanation of DNN regularization, we propose a novel regularization approach to improve the generalization performance of DNNs. Simulation results validate the proposed theories.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=ByxJO3VFwB
PDF	https://openreview.net/pdf?id=ByxJO3VFwB
PWC	https://paperswithcode.com/paper/probabilistic-modeling-the-hidden-layers-of
Repo
Framework
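The two ingredients named in the abstract, Gibbs distributions and products of experts, fit together neatly: multiplying Gibbs distributions and renormalizing yields another Gibbs distribution whose energy is the sum of the experts' energies. A minimal NumPy sketch of that identity on a small discrete state space follows; the energy vectors are arbitrary illustrative values, not anything taken from the paper.

```python
import numpy as np

def gibbs(energy):
    """Gibbs distribution p(x) = exp(-E(x)) / Z over a finite set of states."""
    logits = -np.asarray(energy, dtype=float)
    logits -= logits.max()  # stability shift; it cancels in the normalization
    p = np.exp(logits)
    return p / p.sum()

def product_of_experts(*energies):
    """Normalized elementwise product of several Gibbs 'experts'."""
    prod = np.ones(len(energies[0]))
    for e in energies:
        prod *= gibbs(e)
    return prod / prod.sum()

# Two illustrative experts over four discrete states (arbitrary energies).
e1 = np.array([0.5, 1.0, 2.0, 0.1])
e2 = np.array([1.5, 0.2, 0.3, 2.0])

poe = product_of_experts(e1, e2)
summed = gibbs(e1 + e2)          # single Gibbs distribution with summed energy
print(np.allclose(poe, summed))  # True
```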

Hierarchical Bayes Autoencoders


Title	Hierarchical Bayes Autoencoders
Authors	Anonymous
Abstract	Autoencoders are powerful generative models for complex data, such as images. However, standard models like the variational autoencoder (VAE) typically have unimodal Gaussian decoders, which cannot effectively represent the possible semantic variations in the space of images. To address this problem, we present a new probabilistic generative model called the Hierarchical Bayes Autoencoder (HBAE). The HBAE contains a multimodal decoder in the form of an energy-based model (EBM), instead of the commonly adopted unimodal Gaussian distribution. Like a VAE, the HBAE can be trained using variational inference to recover latent codes conditioned on inputs. For the decoder, we use an adversarial approximation in which a conditional generator is trained to match the EBM distribution. At inference time, the HBAE performs two sampling steps: first, a latent code for the input is sampled, and then this code is passed to the conditional generator to output a stochastic reconstruction. The HBAE can also model sets, by inferring a latent code for a set of examples and sampling set members through the multimodal decoder. In both the single-image and set cases, the decoder generates plausible variations consistent with the input data, and it also generates realistic unconditional samples. To the best of our knowledge, Set-HBAE is the first model able to generate complex image sets.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1leCRNYvS
PDF	https://openreview.net/pdf?id=H1leCRNYvS
PWC	https://paperswithcode.com/paper/hierarchical-bayes-autoencoders
Repo
Framework

Influence-aware Memory for Deep Reinforcement Learning


Title	Influence-aware Memory for Deep Reinforcement Learning
Authors	Anonymous
Abstract	Making the right decisions when some of the state variables are hidden involves reasoning about all the possible states of the environment. An agent receiving only partial observations needs to infer the true values of these hidden variables from its history of experiences. Recent deep reinforcement learning methods use recurrent models to keep track of past information. However, these models are sometimes expensive to train and have convergence difficulties, especially when dealing with high-dimensional input spaces. Taking inspiration from influence-based abstraction, we show that effective policies can be learned in the presence of uncertainty by memorizing only a small subset of input variables. We also incorporate a mechanism in our network that learns to automatically choose the important pieces of information that need to be remembered. The results indicate that, by forcing the agent’s internal memory to focus on the selected regions while treating the rest of the observable variables as Markovian, we can outperform ordinary recurrent architectures in situations where the information the agent needs to retain is a small fraction of the entire observation input. The method also reduces training time and obtains better scores than methods that use a fixed window of experiences as input to remove partial observability in domains where long-term memory is required.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rJlS-ertwr
PDF	https://openreview.net/pdf?id=rJlS-ertwr
PWC	https://paperswithcode.com/paper/influence-aware-memory-for-deep-reinforcement
Repo
Framework

DIVA: Domain Invariant Variational Autoencoder


Title	DIVA: Domain Invariant Variational Autoencoder
Authors	Anonymous
Abstract	We consider the problem of domain generalization, namely, how to learn, from data drawn from a set of domains, representations that generalize to data from a previously unseen domain. We propose the Domain Invariant Variational Autoencoder (DIVA), a generative model that tackles this problem by learning three independent latent subspaces: one for the domain, one for the class, and one for any residual variations. Because our model is generative, it can also incorporate unlabeled data from known or previously unseen domains. To the best of our knowledge, this has not been done before in a domain generalization setting. This property is highly desirable in fields like medical imaging, where labeled data is scarce. We experimentally evaluate our model on the rotated MNIST benchmark and a malaria cell images dataset, where we show that (i) the learned subspaces are indeed complementary to each other, (ii) we improve upon recent works on this task, and (iii) incorporating unlabeled data can boost performance even further.
Tasks	Domain Generalization
Published	2020-01-01
URL	https://openreview.net/forum?id=rJxotpNYPS
PDF	https://openreview.net/pdf?id=rJxotpNYPS
PWC	https://paperswithcode.com/paper/diva-domain-invariant-variational-autoencoder
Repo
Framework

Dynamic Graph Message Passing Networks


Title	Dynamic Graph Message Passing Networks
Authors	Anonymous
Abstract	Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although CNNs have excelled in many vision tasks, they are still limited in capturing long-range structured relationships, as they typically consist of layers of local kernels. A fully-connected graph is beneficial for such modelling; however, its computational overhead is prohibitive. We propose a dynamic graph message passing network that significantly reduces the computational complexity compared to related works that model a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. Using this model, we show significant improvements over strong, state-of-the-art baselines on three different tasks and backbone architectures. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations.
Tasks	Scene Understanding
Published	2020-01-01
URL	https://openreview.net/forum?id=BJgcxxSKvr
PDF	https://openreview.net/pdf?id=BJgcxxSKvr
PWC	https://paperswithcode.com/paper/dynamic-graph-message-passing-networks-1
Repo
Framework
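The cost argument can be illustrated with a simplified sketch in which each node exchanges messages with only K sampled nodes rather than all N, so one round costs O(N·K·D) instead of O(N²·D). This is an assumption-laden stand-in: nodes are sampled uniformly at random (the paper samples them adaptively, conditioned on the input), affinities are plain scaled dot products, and the predicted node-dependent filter weights are omitted. The function name `sampled_message_passing` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sampled_message_passing(features, num_samples, rng):
    """One round of message passing over K sampled nodes per node.

    features: (N, D) node features. Returns updated (N, D) features and
    the (N, K) affinity matrix used to aggregate messages.
    """
    n, d = features.shape
    # Uniform sampling of K distinct nodes per node (stand-in for the
    # paper's input-conditioned sampling).
    idx = np.stack([rng.choice(n, size=num_samples, replace=False) for _ in range(n)])
    neighbours = features[idx]                                   # (N, K, D)
    # Scaled dot-product scores between each node and its sampled nodes.
    scores = np.einsum("nd,nkd->nk", features, neighbours) / np.sqrt(d)
    affinity = softmax(scores, axis=1)                           # rows sum to 1
    messages = np.einsum("nk,nkd->nd", affinity, neighbours)     # weighted average
    return features + messages, affinity                         # residual update

rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 32))
out, affinity = sampled_message_passing(feats, num_samples=8, rng=rng)
```

With K fixed, doubling the number of nodes only doubles the work, whereas a fully-connected graph would quadruple it.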

SoftLoc: Robust Temporal Localization under Label Misalignment


Title	SoftLoc: Robust Temporal Localization under Label Misalignment
Authors	Anonymous
Abstract	This work addresses the long-standing problem of robust event localization in the presence of temporally misaligned labels in the training data. We propose a novel, versatile loss function that generalizes a number of training regimes, from standard fully-supervised cross-entropy to count-based weakly-supervised learning. Unlike classical models, which are constrained to strictly fit the annotations during training, our soft localization learning approach relaxes the reliance on the exact positions of labels. Training with this new loss function exhibits strong robustness to temporal misalignment of labels, thus alleviating the burden of precisely annotating temporal sequences. We demonstrate state-of-the-art performance against standard benchmarks in a number of challenging experiments and further show that robustness to label noise is not achieved at the expense of raw performance.
Tasks	Temporal Localization
Published	2020-01-01
URL	https://openreview.net/forum?id=SylUzpNFDS
PDF	https://openreview.net/pdf?id=SylUzpNFDS
PWC	https://paperswithcode.com/paper/softloc-robust-temporal-localization-under
Repo
Framework
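The intuition behind a "soft" localization loss can be sketched by smoothing both the label sequence and the prediction sequence with a Gaussian kernel before comparing them, so a prediction a frame or two off still overlaps its target, plus a count term that compares total predicted event mass with the number of labels. This is a hypothetical sketch under those assumptions, not the paper's exact loss; `sigma` (smoothing width) and `alpha` (mixing weight) are illustrative parameter names.

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Convolve a 1-D sequence with a normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

def soft_localization_loss(pred, labels, sigma=2.0, alpha=0.5, eps=1e-7):
    """Hypothetical soft localization loss.

    pred:   per-frame event probabilities in [0, 1], shape (T,)
    labels: per-frame binary event labels, shape (T,)
    """
    soft_labels = np.clip(gaussian_smooth(labels, sigma), 0.0, 1.0)
    soft_pred = np.clip(gaussian_smooth(pred, sigma), eps, 1.0 - eps)
    # Cross-entropy between smoothed targets and smoothed predictions.
    bce = -np.mean(soft_labels * np.log(soft_pred)
                   + (1.0 - soft_labels) * np.log(1.0 - soft_pred))
    # Count term: predicted event mass should match the number of labels.
    count = (pred.sum() - labels.sum()) ** 2
    return alpha * bce + (1.0 - alpha) * count

labels = np.zeros(100); labels[50] = 1.0
near = np.zeros(100); near[51] = 1.0   # off by one frame
far = np.zeros(100); far[70] = 1.0     # off by twenty frames
loss_near = soft_localization_loss(near, labels)
loss_far = soft_localization_loss(far, labels)
```

In this sketch, shrinking `sigma` makes the first term approach ordinary per-frame cross-entropy, while `alpha=0` leaves only the count term, loosely mirroring the range of regimes the abstract describes.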

Unsupervised Temperature Scaling: Robust Post-processing Calibration for Domain Shift


Title	Unsupervised Temperature Scaling: Robust Post-processing Calibration for Domain Shift
Authors	Anonymous
Abstract	Uncertainty estimation is critical in real-world decision-making applications, especially when distributional shift between the training and test data is prevalent. Many calibration methods have been proposed in the literature to improve the predictive uncertainty of DNNs, which are generally not well calibrated. However, none of them is specifically designed to work under domain shift. In this paper, we propose Unsupervised Temperature Scaling (UTS), a calibration method that is robust to domain shift. It exploits test samples to adjust the uncertainty prediction of deep models towards the test distribution. UTS uses a novel loss function, weighted NLL, that allows unsupervised calibration. We evaluate UTS on a wide range of model-dataset combinations, which shows that calibration without labels is possible, and demonstrate the robustness of UTS compared to other methods (e.g., TS, MC-dropout, SVI, ensembles) in shifted domains.
Tasks	Calibration, Decision Making
Published	2020-01-01
URL	https://openreview.net/forum?id=Hyg5TRNtDH
PDF	https://openreview.net/pdf?id=Hyg5TRNtDH
PWC	https://paperswithcode.com/paper/unsupervised-temperature-scaling-robust-post
Repo
Framework
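For context, the standard (supervised) temperature scaling baseline, the "TS" in the comparison list, fits a single scalar T by minimizing the NLL of softmax(logits / T) on labeled held-out data. The sketch below shows only that baseline; the paper's unsupervised weighted-NLL variant is not reproduced. It uses golden-section search because the NLL is convex in 1/T and hence unimodal in T. The synthetic data (a model whose logits are three times too large) is purely illustrative.

```python
import numpy as np

def log_softmax(logits):
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of labels under softmax(logits / T)."""
    logp = log_softmax(logits / temperature)
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, lo=0.05, hi=20.0, iters=100):
    """Golden-section search for the temperature minimizing the NLL."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if nll(logits, labels, c) < nll(logits, labels, d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0

# Synthetic check: labels drawn from softmax(true_logits), model reports 3x logits.
rng = np.random.default_rng(0)
true_logits = rng.standard_normal((10000, 5))
probs = np.exp(log_softmax(true_logits))
u = rng.random((len(probs), 1))
labels = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), probs.shape[1] - 1)
overconfident = 3.0 * true_logits
T = fit_temperature(overconfident, labels)   # should land close to 3
```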

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets


Title	Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets
Authors	Anonymous
Abstract	Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While the theory of adaptive gradient methods is well understood for minimization problems, the factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim to bridge this gap from both theoretical and empirical perspectives. Theoretically, we develop an algorithm, Optimistic Stochastic Gradient (OSG), for solving a class of non-convex non-concave min-max problems and establish $O(\epsilon^{-4})$ complexity for finding an $\epsilon$-first-order stationary point, invoking only one stochastic first-order oracle per iteration. An adaptive variant of the proposed algorithm, Optimistic Adagrad (OAdagrad), is also analyzed, revealing an improved adaptive complexity of $\widetilde{O}\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$ (here $\widetilde{O}(\cdot)$ hides a logarithmic factor in $\epsilon$), where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$. To the best of our knowledge, this is the first work to establish adaptive complexity in non-convex non-concave min-max optimization. Empirically, our experiments show that adaptive gradient algorithms do outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SJxIm0VtwH
PDF	https://openreview.net/pdf?id=SJxIm0VtwH
PWC	https://paperswithcode.com/paper/towards-better-understanding-of-adaptive
Repo
Framework
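Why an "optimistic" update matters for min-max problems can be seen on the toy bilinear game f(x, y) = x·y, where plain simultaneous gradient descent-ascent spirals away from the saddle point (0, 0). The sketch below uses a past-extragradient form of optimistic gradient that reuses the previous gradient for extrapolation, so each iteration makes only one new gradient call, in the spirit of the single-oracle property the abstract mentions. It is deterministic, omits stochastic noise and the Adagrad-style step sizes, and has not been checked line by line against the paper's pseudocode.

```python
import numpy as np

def operator(w):
    """Gradient field of f(x, y) = x * y for min over x, max over y:
    (df/dx, -df/dy) = (y, -x)."""
    x, y = w
    return np.array([y, -x])

def gradient_descent_ascent(w0, eta, steps):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * operator(w)
    return w

def optimistic_gradient(w0, eta, steps):
    """z_k = x_{k-1} - eta * F(z_{k-1});  x_k = x_{k-1} - eta * F(z_k)."""
    x = np.array(w0, dtype=float)
    g_prev = operator(x)          # z_0 = x_0
    for _ in range(steps):
        z = x - eta * g_prev      # extrapolate with the stored gradient
        g_prev = operator(z)      # the only new oracle call this iteration
        x = x - eta * g_prev      # update the main iterate
    return x

w0 = (1.0, 1.0)
print(np.linalg.norm(gradient_descent_ascent(w0, 0.1, 2000)))  # grows
print(np.linalg.norm(optimistic_gradient(w0, 0.1, 2000)))      # shrinks toward 0
```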

Robust anomaly detection and backdoor attack detection via differential privacy


Title	Robust anomaly detection and backdoor attack detection via differential privacy
Authors	Anonymous
Abstract	Outlier detection and novelty detection are two important topics in anomaly detection. Supposing that the majority of a dataset is drawn from a certain distribution, both outlier detection and novelty detection aim to detect data samples that do not fit that distribution. Outliers are data samples within this dataset, while novelties are new samples. Meanwhile, backdoor poisoning attacks on machine learning models are carried out by injecting poisoning samples into the training dataset; these can be regarded as “outliers” intentionally added by attackers. Differential privacy has been proposed to avoid leaking any individual’s information when aggregated analysis is performed on a given dataset. It is typically achieved by adding random noise, either directly to the input dataset or to intermediate results of the aggregation mechanism. In this paper, we demonstrate that applying differential privacy can improve the utility of outlier detection and novelty detection, with an extension to detecting poisoning samples in backdoor attacks. We first present a theoretical analysis of how differential privacy helps with detection, and then conduct extensive experiments to validate the effectiveness of differential privacy in improving outlier detection, novelty detection, and backdoor attack detection.
Tasks	Anomaly Detection, Outlier Detection
Published	2020-01-01
URL	https://openreview.net/forum?id=SJx0q1rtvS
PDF	https://openreview.net/pdf?id=SJx0q1rtvS
PWC	https://paperswithcode.com/paper/robust-anomaly-detection-and-backdoor-attack
Repo
Framework
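The noise-adding idea the abstract refers to is easiest to see in the classic Laplace mechanism: a query with L1 sensitivity Δ is released with added Laplace(0, Δ/ε) noise, which gives ε-differential privacy. The sketch below applies it to a counting query. It illustrates only the general mechanism, not the paper's detection pipeline (how and where the authors inject noise is not reproduced here).

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Laplace(0, sensitivity / epsilon) noise,
    which satisfies epsilon-DP for a query with that L1 sensitivity."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
# Counting query: how many samples exceed 2.0? Adding or removing one
# record changes the count by at most 1, so the sensitivity is 1.
true_count = int((data > 2.0).sum())
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Smaller `epsilon` means a larger noise scale and stronger privacy; the noise is zero-mean, so repeated releases average out to the true count.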

Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck


Title	Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck
Authors	Anonymous
Abstract	In this paper, we present an in-depth investigation of the convolutional autoencoder (CAE) bottleneck. Autoencoders (AEs), and especially their convolutional variants, play a vital role in the current deep learning toolbox. Researchers and practitioners employ CAEs for a variety of tasks, ranging from outlier detection and compression to transfer and representation learning. Despite their widespread adoption, we have limited insight into how the bottleneck shape impacts the emergent properties of the CAE. We demonstrate that increasing the height and width of the bottleneck drastically improves generalization, which in turn leads to better performance of the latent codes in downstream transfer learning tasks. The number of channels in the bottleneck, on the other hand, is of secondary importance. Furthermore, we show empirically that, contrary to popular belief, CAEs do not learn to copy their input, even when the bottleneck has the same number of neurons as there are pixels in the input. Copying does not occur even after training the CAE for 1,000 epochs on a tiny (~600 images) dataset. We believe that the findings in this paper are directly applicable and will lead to improvements in models that rely on CAEs.
Tasks	Outlier Detection, Representation Learning, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=ryguP1BFwr
PDF	https://openreview.net/pdf?id=ryguP1BFwr
PWC	https://paperswithcode.com/paper/walking-the-tightrope-an-investigation-of-the
Repo
Framework
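The distinction between bottleneck height/width and channel count is easy to make concrete with a little arithmetic. Assuming a 64×64 RGB input and stride-2 downsampling stages (both illustrative choices, not the paper's exact architectures), two very different bottleneck shapes can hold exactly as many neurons as the input has values:

```python
def bottleneck_shape(height, width, num_downsamples, channels):
    """Spatial size after num_downsamples stride-2 stages, paired with a
    chosen channel count (padding effects ignored)."""
    h, w = height, width
    for _ in range(num_downsamples):
        h, w = h // 2, w // 2
    return h, w, channels

def num_neurons(shape):
    h, w, c = shape
    return h * w * c

input_values = 64 * 64 * 3                   # 12288 pixel values
wide = bottleneck_shape(64, 64, 2, 48)       # large spatial extent, few channels
deep = bottleneck_shape(64, 64, 4, 768)      # small spatial extent, many channels
print(wide, num_neurons(wide))               # (16, 16, 48) 12288
print(deep, num_neurons(deep))               # (4, 4, 768) 12288
```

Both bottlenecks are "complete" in the sense of the copying experiment, yet according to the abstract it is the larger height and width, not the channel count, that drives generalization.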

Annealed Denoising score matching: learning Energy based model in high-dimensional spaces


Title	Annealed Denoising score matching: learning Energy based model in high-dimensional spaces
Authors	Anonymous
Abstract	Energy-based models output unnormalized log-probability values given data samples. Such an estimate is essential in a variety of applications such as sample generation, denoising, sample restoration, outlier detection, Bayesian reasoning, and many more. However, standard maximum likelihood training is computationally expensive because it requires sampling from the model distribution. Score matching potentially alleviates this problem, and denoising score matching (Vincent, 2011) is a particularly convenient version. However, previous attempts failed to produce models capable of high-quality sample synthesis. We believe this is because they performed denoising score matching at only a single noise scale. To overcome this limitation, we instead learn an energy function using all noise scales. When sampled using annealed Langevin dynamics and a single-step denoising jump, our model produced high-quality samples comparable to state-of-the-art techniques such as GANs, in addition to assigning likelihood to test data comparable to previous likelihood models. Our model sets a new sample quality baseline for likelihood-based models. We further demonstrate that our model learns the sample distribution and generalizes well on image inpainting tasks.
Tasks	Denoising, Image Inpainting, Outlier Detection
Published	2020-01-01
URL	https://openreview.net/forum?id=HJeFmkBtvB
PDF	https://openreview.net/pdf?id=HJeFmkBtvB
PWC	https://paperswithcode.com/paper/annealed-denoising-score-matching-learning
Repo
Framework
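The sampling procedure named in the abstract, annealed Langevin dynamics followed by a single denoising jump, can be sketched in one dimension. To keep the example self-contained, the learned energy gradient is replaced by the closed-form score of a Gaussian target N(3, 1) convolved with N(0, σ²) noise; the noise schedule, `eps`, and step counts are illustrative assumptions, and the step-size rule α = ε·(σ/σ_min)² follows the common annealed-Langevin recipe rather than anything confirmed for this paper.

```python
import numpy as np

MU, S = 3.0, 1.0   # illustrative target distribution N(MU, S**2)

def score(x, sigma):
    """Score of the target convolved with N(0, sigma^2) noise (closed form
    for a Gaussian), standing in for a learned -dE(x; sigma)/dx."""
    return -(x - MU) / (S ** 2 + sigma ** 2)

def annealed_langevin(n, sigmas, eps=2e-5, steps=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = rng.normal(0.0, sigmas[0], size=n)       # start from broad noise
    for sigma in sigmas:                          # anneal from large to small noise
        alpha = eps * (sigma / sigmas[-1]) ** 2   # per-level step size
        for _ in range(steps):
            z = rng.standard_normal(n)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    # Single denoising jump: x + sigma^2 * score, an estimate of the clean sample.
    return x + sigmas[-1] ** 2 * score(x, sigmas[-1])

sigmas = np.geomspace(10.0, 0.01, num=10)
samples = annealed_langevin(5000, sigmas, rng=np.random.default_rng(0))
```

The large noise levels move samples quickly toward the right region; the small ones refine them, which is the motivation for using all noise scales rather than one.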

A General Upper Bound for Unsupervised Domain Adaptation


Title	A General Upper Bound for Unsupervised Domain Adaptation
Authors	Anonymous
Abstract	In this work, we present a novel upper bound on the target error to address the problem of unsupervised domain adaptation. Recent studies reveal that a deep neural network can learn transferable features that generalize well to novel tasks. Furthermore, Ben-David et al. (2010) provide an upper bound on the target error when transferring knowledge, which can be summarized as simultaneously minimizing the source error and the distance between marginal distributions. However, common methods based on this theory usually ignore the joint error, so samples from different classes might be mixed together when the marginal distributions are matched. In such a case, no matter how much we minimize the marginal discrepancy, the target error is not bounded, because of an increasing joint error. To address this problem, we propose a general upper bound that takes the joint error into account, so that this undesirable case can be properly penalized. In addition, we use a constrained hypothesis space to derive a tighter bound, as well as a novel cross margin discrepancy to measure the dissimilarity between hypotheses, which alleviates instability during adversarial learning. Extensive empirical evidence shows that our proposal outperforms related approaches in image classification error rates on standard domain adaptation benchmarks.
Tasks	Domain Adaptation, Image Classification, Unsupervised Domain Adaptation
Published	2020-01-01
URL	https://openreview.net/forum?id=rkerLaVtDr
PDF	https://openreview.net/pdf?id=rkerLaVtDr
PWC	https://paperswithcode.com/paper/a-general-upper-bound-for-unsupervised-domain-1
Repo
Framework
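For reference, the classical bound of Ben-David et al. (2010) that the abstract builds on can be written as follows (stated here without its finite-sample terms). The λ term is the joint error of the best single hypothesis on both domains, which is what methods that only align marginal distributions leave uncontrolled:

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h)
  + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr]
```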

Learning Numeral Embedding


Title	Learning Numeral Embedding
Authors	Anonymous
Abstract	Word embedding is an essential building block of deep learning methods for natural language processing. Although word embedding has been extensively studied over the years, the problem of how to effectively embed numerals, a special subset of words, is still underexplored. Existing word embedding methods do not learn numeral embeddings well, because there is an infinite number of numerals and each individual numeral appears only rarely in training corpora. In this paper, we propose two novel numeral embedding methods that can handle the out-of-vocabulary (OOV) problem for numerals. We first induce a finite set of prototype numerals using either a self-organizing map or a Gaussian mixture model. We then represent the embedding of a numeral as a weighted average of the prototype numeral embeddings. Numeral embeddings represented in this manner can be plugged into existing word embedding learning approaches, such as skip-gram, for training. We evaluated our methods and showed their effectiveness on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Syg6jTNtDH
PDF	https://openreview.net/pdf?id=Syg6jTNtDH
PWC	https://paperswithcode.com/paper/learning-numeral-embedding
Repo
Framework
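The "weighted average of prototype embeddings" step means any numeral, including one never seen in training, gets an embedding. A minimal sketch, assuming hand-picked prototypes (the paper induces them with a self-organizing map or a GMM), a signed-log transform of magnitudes, and Gaussian-kernel weights resembling GMM responsibilities with equal priors and shared variance; all names and values here are hypothetical.

```python
import numpy as np

def squash(v):
    """Signed log transform so that 7 and 7,000,000 are not absurdly far
    apart (an illustrative choice)."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.log1p(np.abs(v))

def numeral_embedding(value, prototypes, prototype_emb, bandwidth=0.5):
    """Embed a numeral as a kernel-weighted average of prototype embeddings."""
    d = squash(value) - squash(prototypes)            # (P,)
    logits = -0.5 * (d / bandwidth) ** 2
    w = np.exp(logits - logits.max())
    w /= w.sum()                                      # weights sum to 1
    return w @ prototype_emb, w

rng = np.random.default_rng(0)
prototypes = np.array([0.0, 1.0, 10.0, 100.0, 1000.0, 1e4])   # hypothetical prototypes
prototype_emb = rng.standard_normal((len(prototypes), 8))      # would be learned, e.g. by skip-gram
vec, weights = numeral_embedding(2019.5, prototypes, prototype_emb)
```

Because the weights are a smooth function of the value, gradients from a skip-gram objective can flow back into the prototype embeddings during training.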

Depth-Width Trade-offs for ReLU Networks via Sharkovsky’s Theorem


Title	Depth-Width Trade-offs for ReLU Networks via Sharkovsky’s Theorem
Authors	Anonymous
Abstract	Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky’s work reveals the limitations of shallow neural networks, it does not explain why these functions are difficult to represent; in fact, he states it as a tantalizing open question to characterize those functions that cannot be well-approximated by smaller depths. In this work, we point to a new connection between DNN expressivity and Sharkovsky’s Theorem from dynamical systems, which enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of a generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky’s work contain points of period 3 (a period that is special in that it implies chaotic behaviour, by the celebrated result of Li and Yorke), we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is an eigenvalue analysis of the dynamical systems associated with such functions.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe55gBtvH
PDF	https://openreview.net/pdf?id=BJe55gBtvH
PWC	https://paperswithcode.com/paper/depth-width-trade-offs-for-relu-networks-via
Repo
Framework
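The period-3 observation can be checked exactly. The tent map on [0, 1] is computed by a two-unit ReLU layer, 2·relu(x) − 4·relu(x − 1/2), and stacking k such layers gives the k-fold composition, a triangle wave with 2^(k−1) teeth. Using exact rational arithmetic, the point 2/7 returns to itself after three applications but not earlier. This only illustrates the periodic point; it is not the paper's lower-bound argument.

```python
from fractions import Fraction

def relu(x):
    return max(x, Fraction(0))

def tent(x):
    """Tent map on [0, 1] as a one-hidden-layer ReLU network:
    equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1]."""
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))

def orbit(x0, n):
    xs = [x0]
    for _ in range(n):
        xs.append(tent(xs[-1]))
    return xs

print(orbit(Fraction(2, 7), 3))
# [Fraction(2, 7), Fraction(4, 7), Fraction(6, 7), Fraction(2, 7)]
```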

On the Reflection of Sensitivity in the Generalization Error


Title	On the Reflection of Sensitivity in the Generalization Error
Authors	Anonymous
Abstract	Even though recent works have brought some insight into the performance improvement of techniques used in state-of-the-art deep-learning models, more work is needed to understand the generalization properties of over-parameterized deep neural networks. We shed light on this matter by linking the loss function to the sensitivity of the output to its input. We find a rather strong empirical relation between the output sensitivity and the variance term in the bias-variance decomposition of the loss function, which suggests using sensitivity as a metric for comparing the generalization performance of networks without requiring labeled data. We find that sensitivity is decreased by applying popular methods that improve the generalization performance of the model, such as (1) using a deep network rather than a wide one, (2) adding convolutional layers to baseline classifiers instead of fully connected layers, (3) using batch normalization, dropout and max-pooling, and (4) applying parameter initialization techniques.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Hygq3JrtwS
PDF	https://openreview.net/pdf?id=Hygq3JrtwS
PWC	https://paperswithcode.com/paper/on-the-reflection-of-sensitivity-in-the
Repo
Framework
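Because sensitivity needs no labels, it can be estimated by Monte Carlo on unlabeled inputs alone. The sketch below assumes one plausible definition (the variance of the output change under small Gaussian input noise, divided by the noise variance and averaged over inputs and outputs); the paper's exact normalization may differ. For a linear model f(x) = Wx this quantity is about ‖W‖²_F / M, i.e. roughly 15 for the matrix used here.

```python
import numpy as np

def output_sensitivity(model, inputs, noise_std=1e-2, num_draws=200, rng=None):
    """Variance of output change under Gaussian input noise, normalized by
    the noise variance and averaged over inputs and output units."""
    if rng is None:
        rng = np.random.default_rng()
    base = model(inputs)                                   # (N, M)
    diffs = []
    for _ in range(num_draws):
        noise = rng.normal(0.0, noise_std, size=inputs.shape)
        diffs.append(model(inputs + noise) - base)
    diffs = np.stack(diffs)                                # (draws, N, M)
    return diffs.var(axis=0).mean() / noise_std ** 2

W = np.array([[1.0, 2.0], [3.0, 4.0]])

def linear_model(x):
    return x @ W.T                                         # f(x) = W x, row-wise

inputs = np.random.default_rng(1).standard_normal((100, 2))
s = output_sensitivity(linear_model, inputs, rng=np.random.default_rng(0))
```

For a nonlinear network the same function applies unchanged; only `model` differs, which is what makes the metric convenient for comparing architectures.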