April 1, 2020

3273 words 16 mins read

Paper Group NAWR 1

Retrieving Signals in the Frequency Domain with Deep Complex Extractors. Learning Cross-modal Context Graph for Visual Grounding. Differentially Private Mixed-Type Data Generation For Unsupervised Learning. HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion. Cost-Effective Testing of a Deep Learning Model through Input Reduction. Learnin …

Retrieving Signals in the Frequency Domain with Deep Complex Extractors

Title Retrieving Signals in the Frequency Domain with Deep Complex Extractors
Authors Anonymous
Abstract Recent advances have made it possible to create deep complex-valued neural networks. Despite this progress, the potential power of fully complex intermediate computations and representations has not yet been explored for many challenging learning problems. Building on recent advances, we propose a novel mechanism for extracting signals in the frequency domain. As a case study, we perform audio source separation in the Fourier domain. Our extraction mechanism could be regarded as a local ensembling method that combines a complex-valued convolutional version of Feature-Wise Linear Modulation (FiLM) and a signal averaging operation. We also introduce a new explicit amplitude- and phase-aware loss, which is scale- and time-invariant, taking into account the complex-valued components of the spectrogram. Using the Wall Street Journal Dataset, we compare our phase-aware loss to several others that operate both in the time and frequency domains and demonstrate the effectiveness of our proposed signal extraction method and proposed loss. When operating in the complex-valued frequency domain, our deep complex-valued network substantially outperforms its real-valued counterparts even with half the depth and a third of the parameters. Our proposed mechanism significantly improves the performance of deep complex-valued networks, and we demonstrate the usefulness of its regularizing effect.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylB4kBtwB
PDF https://openreview.net/pdf?id=BylB4kBtwB
PWC https://paperswithcode.com/paper/retrieving-signals-in-the-frequency-domain
Repo https://github.com/FourierSignalRetrievalICLR2020/FourierExtraction
Framework pytorch
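
For a concrete picture of the extraction mechanism, the snippet below is a minimal sketch of a complex-valued FiLM-style modulation in PyTorch. The module name, tensor shapes, and conditioning interface are our assumptions for illustration; the authors' actual extractor (and the signal-averaging ensembling around it) lives in the linked repo.

```python
# Hypothetical sketch, not the authors' code: a complex-valued FiLM layer that
# modulates complex spectrogram features, with complex tensors kept as (real, imag).
import torch
import torch.nn as nn

class ComplexFiLM(nn.Module):
    """Computes gamma * z + beta with complex gamma and beta predicted from a
    conditioning vector, using explicit complex multiplication."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, 2 * channels)  # real + imag parts
        self.to_beta = nn.Linear(cond_dim, 2 * channels)

    def forward(self, z_re, z_im, cond):
        # z_re, z_im: (batch, channels, freq, time); cond: (batch, cond_dim)
        g_re, g_im = self.to_gamma(cond).chunk(2, dim=-1)
        b_re, b_im = self.to_beta(cond).chunk(2, dim=-1)
        g_re, g_im = g_re[:, :, None, None], g_im[:, :, None, None]
        b_re, b_im = b_re[:, :, None, None], b_im[:, :, None, None]
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, followed by a complex shift.
        out_re = g_re * z_re - g_im * z_im + b_re
        out_im = g_re * z_im + g_im * z_re + b_im
        return out_re, out_im

# Toy usage on a batch of complex STFT features.
film = ComplexFiLM(cond_dim=16, channels=8)
z_re, z_im = torch.randn(2, 8, 257, 100), torch.randn(2, 8, 257, 100)
out_re, out_im = film(z_re, z_im, torch.randn(2, 16))
```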

Learning Cross-modal Context Graph for Visual Grounding

Title Learning Cross-modal Context Graph for Visual Grounding
Authors Yongfei Liu; Bo Wan; Xiaodan Zhu; Xuming He
Abstract Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals respectively via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.
Tasks Graph Matching, Language Modelling, Natural Language Visual Grounding, Phrase Grounding
Published 2020-02-13
URL https://arxiv.org/pdf/1911.09042.pdf
PDF https://arxiv.org/pdf/1911.09042.pdf
PWC https://paperswithcode.com/paper/learning-cross-modal-context-graph-for-visual
Repo https://github.com/youngfly11/LCMCG-PyTorch
Framework pytorch
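
As a rough illustration of the message-propagation step described in the abstract, here is a minimal graph message-passing layer in PyTorch. The layer, its dimensions, and the adjacency handling are simplifications of ours; the full phrase/proposal graphs and the matching module are in the authors' repository.

```python
# Illustrative sketch only, not the LCMCG implementation: one round of message
# passing over a language-guided graph of node features.
import torch
import torch.nn as nn

class GraphMessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from neighbor j to node i
        self.upd = nn.GRUCell(dim, dim)      # node update from aggregated messages

    def forward(self, nodes, adj):
        # nodes: (N, dim) node features; adj: (N, N) 0/1 adjacency of the graph
        n = nodes.size(0)
        src = nodes.unsqueeze(0).expand(n, n, -1)      # neighbor (sender) features
        dst = nodes.unsqueeze(1).expand(n, n, -1)      # receiver features
        messages = self.msg(torch.cat([dst, src], dim=-1))
        messages = messages * adj.unsqueeze(-1)        # mask non-edges
        agg = messages.sum(dim=1) / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return self.upd(agg, nodes)                    # context-aware node states

mp = GraphMessagePassing(dim=64)
phrases = torch.randn(5, 64)
adj = (torch.rand(5, 5) > 0.5).float()
updated = mp(phrases, adj)
```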

Differentially Private Mixed-Type Data Generation For Unsupervised Learning

Title Differentially Private Mixed-Type Data Generation For Unsupervised Learning
Authors Anonymous
Abstract In this work we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of GANs. This framework can be used to take in raw sensitive data and privately train a model for generating synthetic data that should satisfy the same statistical properties as the original data. This learned model can be used to generate arbitrary amounts of publicly available synthetic data, which can then be freely shared due to the post-processing guarantees of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued data. We implement this framework on both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings.
Tasks Synthetic Data Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HygFxxrFvB
PDF https://openreview.net/pdf?id=HygFxxrFvB
PWC https://paperswithcode.com/paper/differentially-private-mixed-type-data
Repo https://github.com/DPautoGAN/DPautoGAN
Framework pytorch
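
The privacy in frameworks of this kind typically comes from DP-SGD-style training. The sketch below shows one such step (per-example gradient clipping plus calibrated Gaussian noise) on a toy model; the function and hyperparameters are ours and are not taken from the DP-auto-GAN implementation.

```python
# Schematic DP-SGD step: clip each example's gradient, then add Gaussian noise
# calibrated to the clipping norm before applying the update.
import torch
import torch.nn as nn

def dp_sgd_step(model, loss_fn, batch, lr=0.01, clip_norm=1.0, noise_mult=1.1):
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for x in batch:                                   # microbatches of size 1
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0))).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = (clip_norm / (norm + 1e-12)).clamp(max=1.0)  # per-example clipping
        for g, p in zip(grads, model.parameters()):
            g.add_(p.grad * scale)
    with torch.no_grad():
        for g, p in zip(grads, model.parameters()):
            g.add_(torch.randn_like(g), alpha=noise_mult * clip_norm)  # Gaussian noise
            p.sub_(lr * g / len(batch))

model = nn.Linear(10, 10)
batch = torch.randn(8, 10)
dp_sgd_step(model, lambda out: out.pow(2).mean(), batch)
```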

HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion

Title HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion
Authors Michel Deudon, Alfredo Kalaitzis, Md Rifat Arefin, Israel Goytom, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E Kahou, Julien Cornebise, Yoshua Bengio
Abstract Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet – from deforestation to human rights violations – which depends on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-res views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-res pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth observation data at scale. Our approach recently topped the European Space Agency’s MFSR competition on real-world satellite imagery.
Tasks De-aliasing, Image Registration, Image Super-Resolution, Multi-Frame Super-Resolution, Super-Resolution
Published 2020-01-01
URL https://openreview.net/forum?id=HJxJ2h4tPr
PDF https://openreview.net/pdf?id=HJxJ2h4tPr
PWC https://paperswithcode.com/paper/highres-net-multi-frame-super-resolution-by
Repo https://github.com/ElementAI/HighRes-net
Framework pytorch
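
To make the recursive-fusion idea concrete, here is a minimal PyTorch sketch that fuses an arbitrary number of encoded low-res views pairwise until a single hidden state remains. Layer sizes, the residual connection, and the odd-view padding are our assumptions, not the ElementAI implementation.

```python
# Minimal sketch of recursive pairwise fusion over encoded low-resolution views.
import torch
import torch.nn as nn

class RecursiveFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, views):
        # views: (batch, n_views, channels, H, W); n_views may be arbitrary
        while views.size(1) > 1:
            if views.size(1) % 2 == 1:                       # pad with a copy if odd
                views = torch.cat([views, views[:, -1:]], dim=1)
            a, b = views[:, 0::2], views[:, 1::2]            # split into pairs
            pair = torch.cat([a, b], dim=2)                  # concat along channels
            bsz, k, c, h, w = pair.shape
            fused = self.fuse(pair.view(bsz * k, c, h, w)).view(bsz, k, -1, h, w)
            views = fused + a                                # residual fusion
        return views[:, 0]                                   # (batch, channels, H, W)

fusion = RecursiveFusion(channels=32)
out = fusion(torch.randn(2, 5, 32, 16, 16))
```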

Cost-Effective Testing of a Deep Learning Model through Input Reduction

Title Cost-Effective Testing of a Deep Learning Model through Input Reduction
Authors Anonymous
Abstract With the increasing adoption of Deep Learning (DL) models in various applications, testing DL models is vitally important. However, testing DL models is costly, especially when developers explore alternative designs of DL models and tune the hyperparameters. To reduce testing cost, we propose to use only a selected subset of the testing data, which is small but representative enough for a quick estimate of the performance of DL models. Our approach, called DeepReduce, adopts a two-phase strategy. First, it selects testing data to satisfy testing adequacy. Then, it selects more testing data so that the distribution of the selected data approximates that of the whole testing data, leveraging relative entropy minimization. Experiments with various DL models and datasets show that our approach can reduce the whole testing data to 4.6% on average and can reliably estimate the performance of DL models. Our approach significantly outperforms the random approach, and is more stable and reliable than the state-of-the-art approach.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1xCcpNYPr
PDF https://openreview.net/pdf?id=S1xCcpNYPr
PWC https://paperswithcode.com/paper/cost-effective-testing-of-a-deep-learning
Repo https://github.com/DeepReduce/DeepReduce
Framework tf
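
A toy rendering of the second phase, as we read the abstract: greedily grow the selected subset so that its average output distribution stays close, in relative entropy, to that of the whole testing data. Function names and the greedy strategy are ours, not DeepReduce's.

```python
# Toy relative-entropy-guided subset selection over model output distributions.
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def select_by_relative_entropy(probs, budget):
    # probs: (N, C) softmax outputs of the model on the whole testing data
    target = probs.mean(axis=0)                 # distribution over the full set
    selected, remaining = [], list(range(len(probs)))
    while len(selected) < budget:
        best, best_div = None, np.inf
        for i in remaining:                     # pick the input that best matches the target
            cand = probs[selected + [i]].mean(axis=0)
            d = kl(target, cand)
            if d < best_div:
                best, best_div = i, d
        selected.append(best)
        remaining.remove(best)
    return selected

probs = np.random.dirichlet(np.ones(10), size=200)   # fake softmax outputs
subset = select_by_relative_entropy(probs, budget=20)
```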

Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning

Title Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
Authors Anonymous
Abstract Conservation laws are considered to be fundamental laws of nature. They have broad applications in many fields, including physics, chemistry, biology, geology, and engineering. Solving the differential equations associated with conservation laws is a major branch of computational mathematics. The recent success of machine learning, especially deep learning, in areas such as computer vision and natural language processing, has attracted a lot of attention from the computational mathematics community and inspired many intriguing works combining machine learning with traditional methods. In this paper, we are the first to explore the possibility and benefit of solving nonlinear conservation laws using deep reinforcement learning. As a proof of concept, we focus on 1-dimensional scalar conservation laws. We deploy the machinery of deep reinforcement learning to train a policy network that decides how the numerical solutions should be approximated in a sequential and spatial-temporally adaptive manner. We show that the problem of solving conservation laws can be naturally viewed as a sequential decision-making process, and that the numerical schemes learned in this way can easily enforce long-term accuracy. Furthermore, the learned policy network is carefully designed to determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a meta-learning approach. In other words, the proposed method is capable of learning how to discretize for a given situation, mimicking human experts. Finally, we provide details on how the policy network is trained, how well it performs compared with state-of-the-art numerical solvers such as WENO schemes, and how well it generalizes. Our code is released anonymously at https://github.com/qwerlanksdf/L2D.
Tasks Decision Making, Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rygBVTVFPB
PDF https://openreview.net/pdf?id=rygBVTVFPB
PWC https://paperswithcode.com/paper/learning-to-discretize-solving-1d-scalar-1
Repo https://github.com/qwerlanksdf/L2D
Framework pytorch
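
Purely for illustration, the sketch below shows the kind of environment step such a method acts in: a policy outputs convex weights over candidate numerical fluxes at each cell, and the solver advances a 1D scalar conservation law (here Burgers' equation) one step. The flux candidates and discretization are our simplifications, not the WENO-style action space of the paper.

```python
# Toy environment step for u_t + (u^2/2)_x = 0 with policy-weighted fluxes.
import numpy as np

def step_burgers(u, weights, dx=0.02, dt=0.005):
    # u: (N,) cell averages; weights: (N, 2) per-cell convex weights over two
    # candidate numerical fluxes at each cell's right interface.
    f = 0.5 * u ** 2
    f_up = f                                     # upwind flux (assumes u > 0)
    f_ce = 0.5 * (f + np.roll(f, -1))            # central flux at the right interface
    flux = weights[:, 0] * f_up + weights[:, 1] * f_ce
    return u - dt / dx * (flux - np.roll(flux, 1))   # conservative update, periodic BC

rng = np.random.default_rng(0)
u = np.sin(2 * np.pi * np.linspace(0, 1, 50, endpoint=False)) + 1.5   # keep u > 0
w = rng.dirichlet(np.ones(2), size=50)           # stand-in for the policy's output
u_next = step_burgers(u, w)
```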

CZ-GEM: A FRAMEWORK FOR DISENTANGLED REPRESENTATION LEARNING

Title CZ-GEM: A FRAMEWORK FOR DISENTANGLED REPRESENTATION LEARNING
Authors Akash Srivastava, Yamini Bansal, Yukun Ding, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund
Abstract Learning disentangled representations of data is one of the central themes in unsupervised learning in general and generative modelling in particular. In this work, we tackle a slightly more intricate scenario where the observations are generated from a conditional distribution of some known control variate and some latent noise variate. To this end, we present a hierarchical model and a training method (CZ-GEM) that leverages some of the recent developments in likelihood-based and likelihood-free generative models. We show that by formulation, CZ-GEM introduces the right inductive biases that ensure the disentanglement of the control from the noise variables, while also keeping the components of the control variate disentangled. This is achieved without compromising on the quality of the generated samples. Our approach is simple, general, and can be applied both in supervised and unsupervised settings.
Tasks Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=r1e74a4twH
PDF https://openreview.net/pdf?id=r1e74a4twH
PWC https://paperswithcode.com/paper/cz-gem-a-framework-for-disentangled
Repo https://github.com/AnonymousAuthors000/CZ-GEM
Framework tf
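
Our reading of the hierarchy, sketched as a two-stage generator: the control variate alone produces a coarse observation, and a noise-conditioned second stage adds residual detail. Module names and sizes are placeholders, not the released TensorFlow implementation.

```python
# Illustrative two-stage conditional generator: stage 1 depends only on the
# control variate c, stage 2 refines the coarse output using the noise z.
import torch
import torch.nn as nn

class TwoStageGenerator(nn.Module):
    def __init__(self, c_dim=4, z_dim=16, out_dim=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(c_dim, 64), nn.ReLU(),
                                    nn.Linear(64, out_dim))
        self.stage2 = nn.Sequential(nn.Linear(out_dim + z_dim, 128), nn.ReLU(),
                                    nn.Linear(128, out_dim))

    def forward(self, c, z):
        coarse = self.stage1(c)                        # controlled factors only
        fine = self.stage2(torch.cat([coarse, z], -1)) # noise adds residual detail
        return coarse + fine

g = TwoStageGenerator()
x = g(torch.randn(8, 4), torch.randn(8, 16))
```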

A Fine-Grained Spectral Perspective on Neural Networks

Title A Fine-Grained Spectral Perspective on Neural Networks
Authors Anonymous
Abstract Are neural networks biased toward simple functions? Does depth always help learn more complex features? Is training the last layer of a network as good as training all layers? These questions seem unrelated at face value, but in this work we give all of them a common treatment from the spectral perspective. We study the spectra of the Conjugate Kernel (CK, also called the Neural Network-Gaussian Process Kernel) and the Neural Tangent Kernel (NTK). Roughly, the CK and the NTK tell us respectively “what a network looks like at initialization” and “what a network looks like during and after training.” Their spectra then encode valuable information about the initial distribution and the training and generalization properties of neural networks. By analyzing the eigenvalues, we offer novel insights into the questions put forth at the beginning, and we verify these insights with extensive experiments on neural networks. We believe the computational tools we develop here for analyzing the spectra of the CK and NTK serve as a solid foundation for future studies of deep neural networks. We have open-sourced the code for them and for generating the plots in this paper at github.com/jxVmnLgedVwv6mNcGCBy/NNspectra.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJlU-AVtvS
PDF https://openreview.net/pdf?id=HJlU-AVtvS
PWC https://paperswithcode.com/paper/a-fine-grained-spectral-perspective-on-neural-1
Repo https://github.com/jxVmnLgedVwv6mNcGCBy/NNspectra
Framework pytorch
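
A small experiment in the spirit of the paper: estimate the empirical NTK Gram matrix of a tiny MLP on random inputs and inspect its eigenvalue spectrum. The helper below is ours, not part of the authors' spectral toolkit.

```python
# Empirical NTK Gram matrix of a small MLP via per-example parameter gradients.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
xs = torch.randn(16, 8)

def grad_vector(x):
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()          # d f(x) / d theta for scalar output
    return torch.cat([p.grad.flatten() for p in net.parameters()])

jac = torch.stack([grad_vector(x) for x in xs])   # (n, n_params) Jacobian rows
ntk = jac @ jac.T                                 # empirical NTK Gram matrix
eigvals = torch.linalg.eigvalsh(ntk)              # eigenvalues in ascending order
print(eigvals.flip(0)[:5])                        # the five largest eigenvalues
```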

Your classifier is secretly an energy based model and you should treat it like one

Title Your classifier is secretly an energy based model and you should treat it like one
Authors Anonymous
Abstract We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x, y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.
Tasks Calibration, Out-of-Distribution Detection
Published 2020-01-01
URL https://openreview.net/forum?id=Hkxzx0NtDB
PDF https://openreview.net/pdf?id=Hkxzx0NtDB
PWC https://paperswithcode.com/paper/your-classifier-is-secretly-an-energy-based
Repo https://github.com/wgrathwohl/JEM
Framework pytorch
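
The core reinterpretation is compact enough to state in a few lines: the classifier's logits f(x) define an energy E(x) = -logsumexp_y f(x)[y], so that p(x) is proportional to exp(-E(x)), while p(y|x) remains the usual softmax of the same logits. The network below is a stand-in, not the released model.

```python
# Sketch of the logits-as-energy view: one set of logits yields both an
# unnormalized density over x and the ordinary class-conditional softmax.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def energy(x):
    return -torch.logsumexp(classifier(x), dim=-1)   # E(x) = -logsumexp of logits

def class_probs(x):
    return torch.softmax(classifier(x), dim=-1)      # ordinary p(y|x)

x = torch.randn(4, 1, 28, 28)
print(energy(x).shape, class_probs(x).shape)         # (4,) and (4, 10)
```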

Spatial-Temporal Moving Target Defense: A Markov Stackelberg Game Model

Title Spatial-Temporal Moving Target Defense: A Markov Stackelberg Game Model
Authors Henger Li, Wen Shen, Zizhan Zheng
Abstract Moving target defense has emerged as a critical paradigm of protecting a vulnerable system against persistent and stealthy attacks. To protect a system, a defender proactively changes the system configurations to limit the exposure of security vulnerabilities to potential attackers. In doing so, the defender creates asymmetric uncertainty and complexity for the attackers, making it much harder for them to compromise the system. In practice, the defender incurs a switching cost for each migration of the system configurations. The switching cost usually depends on both the current configuration and the following configuration. Besides, different system configurations typically require a different amount of time for an attacker to exploit and attack. Therefore, a defender must simultaneously decide both the optimal sequence of system configurations and the optimal timing for switching. In this paper, we propose a Markov Stackelberg Game framework to precisely characterize the defender’s spatial and temporal decision-making in the face of advanced attackers. We introduce a value iteration algorithm that computes the defender’s optimal moving target defense strategies. Empirical evaluation on real-world problems demonstrates the advantages of the Markov Stackelberg game model for spatial-temporal moving target defense.
Tasks Decision Making
Published 2020-05-12
URL https://arxiv.org/abs/2002.10390
PDF https://arxiv.org/pdf/2002.10390.pdf
PWC https://paperswithcode.com/paper/spatial-temporal-moving-target-defense-a
Repo https://github.com/HengerLi/SPT-MTD
Framework none
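
For intuition about the value-iteration component, here is a generic sketch of a defender choosing among configurations with migration costs. The reward and cost structure are invented for illustration; the paper's algorithm additionally optimizes switching times against a best-responding attacker.

```python
# Generic value iteration over system configurations with switching costs.
import numpy as np

def value_iteration(reward, switch_cost, gamma=0.95, tol=1e-6):
    # reward[s]: defender's stage reward in configuration s
    # switch_cost[s, s']: cost of migrating from configuration s to s'
    n = len(reward)
    v = np.zeros(n)
    while True:
        q = reward[:, None] - switch_cost + gamma * v[None, :]   # value of moving s -> s'
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)                       # values and greedy policy
        v = v_new

rng = np.random.default_rng(1)
values, policy = value_iteration(rng.uniform(size=4), rng.uniform(size=(4, 4)))
```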

Source Model Selection for Deep Learning in the Time Series Domain

Title Source Model Selection for Deep Learning in the Time Series Domain
Authors Amiel Meiseles, Lior Rokach
Abstract Transfer Learning aims to transfer knowledge from a source task to a target task. We focus on a situation when there is a large number of available source models, and we are interested in choosing a single source model that can maximize the predictive performance in the target domain. Existing methods compute some form of “similarity” between the source task data and the target task data. They then select the most similar source task and use the model trained on it for transfer learning. Previous methods do not account for the fact that it is the model parameters that are transferred rather than the data. Therefore, the “similarity” of the source data does not directly influence transfer learning performance. In addition, we would like the possibility of confidently selecting a source model even when the data it was trained on is not available, for example, due to privacy or copyright constraints. We propose to use the truncated source models as encoders for the target data. We then select a source model based on how well it clusters the target data in the latent encoding space, which we calculate using the Mean Silhouette Coefficient. We prove that if the encodings achieve a Mean Silhouette Coefficient of 1, optimal classification can be achieved using just the final layer of the target network. We evaluate our method using the University of California, Riverside (UCR) time series archive and show that the proposed method achieves comparable results to previous work, without using the source data.
Tasks Model Selection, Time Series, Time Series Classification, Transfer Learning
Published 2020-01-03
URL https://ieeexplore.ieee.org/document/8949507
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8949507
PWC https://paperswithcode.com/paper/source-model-selection-for-deep-learning-in
Repo https://github.com/amielm/Source-Model-Selection
Framework none
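
The selection rule itself is easy to sketch: encode the target data with each truncated source model and keep the one whose encodings score highest by the Mean Silhouette Coefficient. The encoder interface and toy data below are placeholders of ours.

```python
# Silhouette-based source model selection over a set of candidate encoders.
import numpy as np
from sklearn.metrics import silhouette_score

def select_source_model(encoders, x_target, y_target):
    # encoders: list of callables mapping (n, d_in) arrays to latent encodings
    best_idx, best_score = None, -1.0
    for i, encode in enumerate(encoders):
        z = encode(x_target)
        score = silhouette_score(z, y_target)     # mean silhouette over target labels
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Toy usage with random "encoders" (linear projections) and fake labels.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(100, 32)), rng.integers(0, 3, size=100)
encoders = [lambda a, w=rng.normal(size=(32, 8)): a @ w for _ in range(4)]
print(select_source_model(encoders, x, y))
```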

The Convex Information Bottleneck Lagrangian

Title The Convex Information Bottleneck Lagrangian
Authors Anonymous
Abstract The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations T of some random variable X for the task of predicting Y. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, I(T;Y), while ensuring that a minimum level of compression r is achieved (i.e., I(X;T) <= r). For practical reasons the problem is usually solved by maximizing the IB Lagrangian for many values of the Lagrange multiplier, therefore drawing the IB curve (i.e., the curve of maximal I(T;Y) for a given I(X;T)) and selecting the representation of desired predictability and compression. It is known that when Y is a deterministic function of X, the IB curve cannot be explored in this way, and other Lagrangians have been proposed to tackle this problem (e.g., the squared IB Lagrangian). In this paper we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; and (ii) prove that if these Lagrangians are used, there is a one-to-one mapping between the Lagrange multiplier and the desired compression rate r for known IB curve shapes, hence freeing us from the burden of solving the optimization problem for many values of the Lagrange multiplier.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkxhS6EYvH
PDF https://openreview.net/pdf?id=SkxhS6EYvH
PWC https://paperswithcode.com/paper/the-convex-information-bottleneck-lagrangian
Repo https://github.com/burklight/convex-IB-Lagrangian-PyTorch
Framework pytorch
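
In symbols, the family of Lagrangians discussed above can be written as below. The notation follows the abstract; the exact conditions on the function f are spelled out in the paper.

```latex
% Sketch of the convex IB Lagrangian family: a monotonically increasing, strictly
% convex f replaces the identity of the classical IB Lagrangian.
\mathcal{L}^{f}_{\mathrm{IB}}(T;\beta) \;=\; I(T;Y) \;-\; \beta\, f\!\big(I(X;T)\big),
\qquad \beta \ge 0 .
% f(r) = r recovers the ordinary IB Lagrangian, which cannot explore the curve when
% Y is a deterministic function of X; f(r) = r^2 gives the squared IB Lagrangian.
```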

AN EFFICIENT HOMOTOPY TRAINING ALGORITHM FOR NEURAL NETWORKS

Title AN EFFICIENT HOMOTOPY TRAINING ALGORITHM FOR NEURAL NETWORKS
Authors Qipin Chen, Wenrui Hao
Abstract We present a Homotopy Training Algorithm (HTA) to solve optimization problems arising from neural networks. The HTA starts with several decoupled systems with low-dimensional structure and tracks the solution to the high-dimensional coupled system. The decoupled systems are easy to solve due to their low dimensionality but can be connected to the original system via a continuous homotopy path guided by the HTA. We prove the convergence of the HTA for the non-convex case and the existence of the homotopy solution path for the convex case. The HTA provides better accuracy on several examples, including VGG models on CIFAR-10. Moreover, the HTA can be combined with the dropout technique to provide an alternative way to train neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1l6nnEtwr
PDF https://openreview.net/pdf?id=B1l6nnEtwr
PWC https://paperswithcode.com/paper/an-efficient-homotopy-training-algorithm-for
Repo https://github.com/Bill-research/homotopy
Framework pytorch
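
A generic homotopy-continuation sketch in the spirit of the abstract: optimize a blend of an easy surrogate objective and the target objective while the homotopy parameter t moves from 0 to 1, warm-starting at each step. The particular decoupling used by HTA differs; the losses below are toy stand-ins.

```python
# Continuation in a homotopy parameter t, from an easy objective to the target one.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
x, y = torch.randn(256, 4), torch.randn(256, 1)

def easy_loss(m):      # simplified surrogate: match only the output mean
    return (m(x).mean() - y.mean()) ** 2

def target_loss(m):    # the original problem: full least-squares regression
    return ((m(x) - y) ** 2).mean()

opt = torch.optim.SGD(model.parameters(), lr=0.05)
for t in torch.linspace(0.0, 1.0, steps=11):   # track the homotopy path
    for _ in range(50):                        # a few warm-started steps at each t
        opt.zero_grad()
        loss = (1 - t) * easy_loss(model) + t * target_loss(model)
        loss.backward()
        opt.step()
```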

Editable Neural Networks

Title Editable Neural Networks
Authors Anonymous
Abstract These days deep neural networks are ubiquitously used in a wide range of tasks, from image classification and machine translation to face identification and self-driving cars. In many applications, a single model error can lead to devastating financial, reputational and even life-threatening consequences. Therefore, it is crucially important to correct model mistakes quickly as they appear. In this work, we investigate the problem of neural network editing - how one can efficiently patch a mistake of the model on a particular sample, without influencing the model behavior on other samples. Namely, we propose Editable Training, a model-agnostic training technique that encourages fast editing of the trained model. We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks.
Tasks Face Identification, Image Classification, Machine Translation, Self-Driving Cars
Published 2020-01-01
URL https://openreview.net/forum?id=HJedXaEtvS
PDF https://openreview.net/pdf?id=HJedXaEtvS
PWC https://paperswithcode.com/paper/editable-neural-networks
Repo https://github.com/editable-ICLR2020/editable
Framework pytorch
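
A rough sketch of what an edit looks like at inference time: a few gradient steps that fix the mistaken sample while a locality term penalizes drift on other data. The losses, weights, and step counts are our choices; Editable Training itself meta-trains the model so that such edits succeed without side effects.

```python
# Toy "edit" routine: correct one sample while staying close to the original
# model's predictions on a batch of unrelated inputs.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def edit(model, x_err, y_err, x_other, steps=5, lr=1e-2, locality_weight=10.0):
    edited = copy.deepcopy(model)
    with torch.no_grad():
        target = F.softmax(model(x_other), -1)        # reference behavior to preserve
    opt = torch.optim.SGD(edited.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fix = F.cross_entropy(edited(x_err), y_err)   # correct the mistaken sample
        drift = F.kl_div(F.log_softmax(edited(x_other), -1), target,
                         reduction="batchmean")       # stay close on other samples
        (fix + locality_weight * drift).backward()
        opt.step()
    return edited

model = nn.Linear(16, 3)
edited = edit(model, torch.randn(1, 16), torch.tensor([2]), torch.randn(32, 16))
```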

NAMSG: An Efficient Method for Training Neural Networks

Title NAMSG: An Efficient Method for Training Neural Networks
Authors Anonymous
Abstract We introduce NAMSG, an adaptive first-order algorithm for training neural networks. The method is efficient in computation and memory, and is straightforward to implement. It computes the gradients at configurable remote observation points, in order to expedite convergence by adjusting the step size for directions with different curvatures in the stochastic setting. It also scales the updating vector elementwise by a nonincreasing preconditioner to take advantage of AMSGRAD. We analyze the convergence properties for both convex and nonconvex problems by modeling the training process as a dynamic system, and provide a strategy to select the observation factor without grid search. A data-dependent regret bound is proposed to guarantee convergence in the convex setting. The method further achieves an O(log(T)) regret bound for strongly convex functions. Experiments demonstrate that NAMSG works well in practical problems and compares favorably to popular adaptive methods, such as ADAM, NADAM, and AMSGRAD.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkxGaeHKvB
PDF https://openreview.net/pdf?id=HkxGaeHKvB
PWC https://paperswithcode.com/paper/namsg-an-efficient-method-for-training-neural-1
Repo https://github.com/rationalspark/NAMSG
Framework none
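
The two ingredients named in the abstract can be sketched as follows, with the caveat that this is our schematic reading and not the exact NAMSG update rule: evaluate the gradient at a remote observation point ahead of the iterate, then precondition with an AMSGrad-style running elementwise maximum of the second moment.

```python
# Schematic step combining a remote observation point with an AMSGrad-style
# nonincreasing preconditioner (illustrative only).
import torch

def namsg_like_step(param, grad_fn, state, lr=1e-3, betas=(0.9, 0.999),
                    obs_factor=0.1, eps=1e-8):
    m, v, v_hat = state["m"], state["v"], state["v_hat"]
    obs_point = param - lr * obs_factor * m           # remote observation point
    g = grad_fn(obs_point)                            # gradient evaluated there
    m.mul_(betas[0]).add_(g, alpha=1 - betas[0])      # first-moment estimate
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])
    v_hat.copy_(torch.maximum(v_hat, v))              # running max, so the
    param -= lr * m / (v_hat.sqrt() + eps)            # preconditioner never grows
    return param

# Toy usage on f(x) = ||x||^2 / 2, whose gradient at p is simply p.
x = torch.ones(3)
state = {"m": torch.zeros(3), "v": torch.zeros(3), "v_hat": torch.zeros(3)}
for _ in range(100):
    x = namsg_like_step(x, lambda p: p.clone(), state)
```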