April 1, 2020

2978 words 14 mins read

Paper Group NANR 89


Attributes Obfuscation with Complex-Valued Features. IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks. Progressive Upsampling Audio Synthesis via Effective Adversarial Training. Accelerating SGD with momentum for over-parameterized learning. Estimating Gradients for Discrete Random Variables by Sampling without Re …

Attributes Obfuscation with Complex-Valued Features

Title Attributes Obfuscation with Complex-Valued Features
Authors Anonymous
Abstract The paper studies the possibility of hiding sensitive input information in intermediate-layer features without much accuracy degradation. We propose a generic method that revises a conventional neural network to make adversarial inference about the input harder while still yielding useful outputs. In particular, the method transforms real-valued features into complex-valued ones, in which the input information is hidden in a randomized phase of the transformed features. Knowledge of the phase acts like a key: with it, any party can easily recover the prediction from the processing result; without it, the party can neither recover the output nor identify the original input. Preliminary experiments on various datasets and network structures show that our method significantly diminishes the adversary’s ability to infer the input while largely preserving the accuracy of the predicted outcome.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1xFl64tDr
PDF https://openreview.net/pdf?id=S1xFl64tDr
PWC https://paperswithcode.com/paper/attributes-obfuscation-with-complex-valued
Repo
Framework
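The core of the method is easy to sketch: lift a real feature vector into the complex plane and rotate it by a random phase that only the key holder knows. A minimal numpy sketch under that reading of the abstract (the paper's actual network revision is more involved; `encode`/`decode` here are hypothetical helpers, not the authors' code):

```python
import numpy as np

def encode(features: np.ndarray, rng: np.random.Generator):
    """Hide a real-valued feature vector in a randomly rotated complex one."""
    theta = rng.uniform(0.0, 2.0 * np.pi)        # secret phase, acts as the key
    z = features.astype(np.complex128) * np.exp(1j * theta)
    return z, theta

def decode(z: np.ndarray, theta: float) -> np.ndarray:
    """Only a party holding the phase key can undo the rotation."""
    return (z * np.exp(-1j * theta)).real

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
z, key = encode(x, rng)
assert np.allclose(decode(z, key), x)            # key holder recovers the features
```

The point of the example is the asymmetry: the rotation is trivial to undo with the key `theta`, but a single rotated vector by itself does not reveal which phase was applied.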

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Title IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Authors Anonymous
Abstract The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning architectures to parallelize and accelerate the training process. However, modern methods for scalable reinforcement learning (RL) often trade off between the throughput of samples that an RL agent can learn from (sample throughput) and the quality of learning from each sample (sample efficiency). In these scalable RL architectures, as one increases sample throughput (e.g., by increasing parallelization in IMPALA (Espeholt et al., 2018)), sample efficiency drops significantly. To address this, we propose a new distributed reinforcement learning algorithm, IMPACT. IMPACT extends PPO with three changes: a target network for stabilizing the surrogate objective, a circular buffer, and truncated importance sampling. In discrete action-space environments, we show that IMPACT attains higher reward and, simultaneously, achieves up to a 30% reduction in training wall-clock time relative to IMPALA. For continuous control environments, IMPACT trains faster than existing scalable agents while preserving the sample efficiency of synchronous PPO.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=BJeGlJStPr
PDF https://openreview.net/pdf?id=BJeGlJStPr
PWC https://paperswithcode.com/paper/impact-importance-weighted-asynchronous
Repo
Framework
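Two of the three ingredients, the target network and truncated importance sampling, can be sketched together as a PPO-style surrogate whose ratio is taken against a slowly updated target policy and whose behaviour-policy mismatch is handled by a clipped importance weight. This is a hypothetical rendering of the abstract, not the paper's exact objective:

```python
import numpy as np

def impact_surrogate(logp_theta, logp_target, logp_behavior, advantages,
                     eps=0.2, rho_bar=1.0):
    """PPO-like clipped surrogate taken against a target network, with the
    behaviour-policy mismatch handled by a truncated importance weight.
    All arguments are per-sample arrays of log-probabilities / advantages."""
    rho = np.minimum(rho_bar, np.exp(logp_target - logp_behavior))  # truncated IS
    ratio = np.exp(logp_theta - logp_target)                        # vs. target net
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(rho * np.minimum(unclipped, clipped))
```

Using a target network instead of the behaviour policy as the anchor of the ratio is what lets asynchronous workers fill the circular buffer without destabilizing the surrogate.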

Progressive Upsampling Audio Synthesis via Effective Adversarial Training

Title Progressive Upsampling Audio Synthesis via Effective Adversarial Training
Authors Anonymous
Abstract This paper proposes a novel generative model called PUGAN, which progressively synthesizes high-quality audio as a raw waveform. PUGAN leverages the recently proposed idea of progressively generating higher-resolution images by stacking multiple encoder-decoder architectures. To effectively apply it to raw audio generation, we propose two novel modules: (1) a neural upsampling layer and (2) a sinc convolutional layer. Compared to the existing state-of-the-art model WaveGAN, which uses a single decoder architecture, our model generates audio signals and converts them to higher resolutions in a progressive manner, while using significantly fewer parameters, e.g., 20x fewer for 44.1kHz output. Our experiments show that the audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human evaluation.
Tasks Audio Generation
Published 2020-01-01
URL https://openreview.net/forum?id=Skg9jnVFvH
PDF https://openreview.net/pdf?id=Skg9jnVFvH
PWC https://paperswithcode.com/paper/progressive-upsampling-audio-synthesis-via
Repo
Framework
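The abstract's "neural upsampling layer" plausibly pairs an upsampling step with a learned convolution, as in progressive image GANs. A minimal PyTorch sketch under that assumption (the module name, kernel size, and nonlinearity are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class NeuralUpsample1d(nn.Module):
    """Doubles the temporal resolution of a raw-waveform feature map:
    nearest-neighbour upsampling followed by a learned 1-D convolution."""
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (batch, C, T)
        return torch.relu(self.conv(self.up(x)))          # (batch, C, 2T)

block = NeuralUpsample1d(channels=64)
print(block(torch.randn(1, 64, 4096)).shape)              # torch.Size([1, 64, 8192])
```

Stacking such blocks would double the output sampling rate at each stage, matching the progressive generation scheme the abstract describes.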

Accelerating SGD with momentum for over-parameterized learning

Title Accelerating SGD with momentum for over-parameterized learning
Authors Anonymous
Abstract Nesterov SGD is widely used for training modern neural networks and other machine learning models. Yet, its advantages over SGD have not been theoretically clarified. Indeed, as we show in this paper, both theoretically and empirically, Nesterov SGD with any parameter selection does not in general provide acceleration over ordinary SGD. Furthermore, Nesterov SGD may diverge for step sizes that ensure convergence of ordinary SGD. This is in contrast to the classical results in the deterministic setting, where the same step size ensures accelerated convergence of Nesterov’s method over optimal gradient descent. To address the non-acceleration issue, we introduce a compensation term to Nesterov SGD. The resulting algorithm, which we call MaSS, converges for the same step sizes as SGD. We prove that MaSS obtains an accelerated convergence rate over SGD for any mini-batch size in the linear setting. For full batch, the convergence rate of MaSS matches the well-known accelerated rate of Nesterov’s method. We also analyze the practically important question of how the convergence rate and optimal hyper-parameters depend on the mini-batch size, demonstrating three distinct regimes: linear scaling, diminishing returns, and saturation. Experimental evaluation of MaSS on several standard architectures of deep networks, including ResNet and convolutional networks, shows improved performance over SGD, Nesterov SGD, and Adam.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1gixp4FPH
PDF https://openreview.net/pdf?id=r1gixp4FPH
PWC https://paperswithcode.com/paper/accelerating-sgd-with-momentum-for-over
Repo
Framework
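For reference, the updates can be written with two coupled sequences, where Nesterov SGD is the special case $\eta_2 = 0$ and the extra gradient term plays the role of the compensation. This rendering is schematic; the exact coefficients, their signs, and their dependence on the mini-batch size are as derived in the paper:

$$w_{t+1} = u_t - \eta_1\, \tilde{\nabla} f(u_t), \qquad u_{t+1} = w_{t+1} + \gamma\,(w_{t+1} - w_t) + \eta_2\, \tilde{\nabla} f(u_t),$$

where $\tilde{\nabla} f$ denotes a mini-batch stochastic gradient.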

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Title Estimating Gradients for Discrete Random Variables by Sampling without Replacement
Authors Anonymous
Abstract We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder, and a structured prediction problem show that ours is the only estimator that is consistently among the best in both high and low entropy settings.
Tasks Structured Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=rklEj2EFvB
PDF https://openreview.net/pdf?id=rklEj2EFvB
PWC https://paperswithcode.com/paper/estimating-gradients-for-discrete-random
Repo
Framework
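The construction relies on drawing k distinct categories in one pass; a standard way to do that is the Gumbel-top-k trick, illustrated below (the paper's estimator adds Rao-Blackwellized reweighting on top of such samples):

```python
import numpy as np

def sample_without_replacement(logits: np.ndarray, k: int,
                               rng: np.random.Generator) -> np.ndarray:
    """Gumbel-top-k: perturb each logit with independent Gumbel noise and
    keep the k largest. The resulting ordered k-tuple is distributed as
    sequential sampling without replacement from softmax(logits)."""
    gumbel = rng.gumbel(size=logits.shape)
    return np.argsort(logits + gumbel)[::-1][:k]

rng = np.random.default_rng(0)
print(sample_without_replacement(np.log([0.5, 0.3, 0.2]), k=2, rng=rng))
```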

Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks

Title Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks
Authors Anonymous
Abstract Recent work by Jacot et al. (2018) showed that training a neural network of any kind with gradient descent in parameter space is equivalent to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result to show that the output of a neural network trained using full-batch gradient descent can be approximated by a linear model for wide networks. In parallel, a recent line of studies (Schoenholz et al. (2017), Hayou et al. (2019)) suggested that a special initialization known as the Edge of Chaos leads to good performance. In this paper, we bridge the gap between these two concepts and show the impact of the initialization and the activation function on the NTK as the network depth becomes large. We provide experiments illustrating our theoretical results.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1eCy0NtDH
PDF https://openreview.net/pdf?id=r1eCy0NtDH
PWC https://paperswithcode.com/paper/mean-field-behaviour-of-neural-tangent-kernel
Repo
Framework
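For context, the NTK of a network $f(x;\theta)$ is the Gram matrix of its parameter gradients, and under gradient flow on a training loss $\mathcal{L}$ the network output evolves by kernel gradient descent with respect to this kernel:

$$\Theta(x, x') = \Big\langle \frac{\partial f(x;\theta)}{\partial \theta},\, \frac{\partial f(x';\theta)}{\partial \theta} \Big\rangle, \qquad \frac{\partial f_t(x)}{\partial t} = -\sum_{i=1}^{n} \Theta(x, x_i)\, \frac{\partial \mathcal{L}}{\partial f_t(x_i)}.$$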

Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills

Title Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills
Authors Anonymous
Abstract Learning useful and reusable skills, or sub-task primitives, is a long-standing problem in sensorimotor control. This is challenging because it is hard to define what constitutes a useful skill. Instead of direct manual supervision, which is tedious and prone to bias, in this work our goal is to extract reusable skills from a collection of human demonstrations collected directly for several end-tasks. We propose a weakly-supervised approach for trajectory segmentation following the classic work on multiple instance learning. Our approach is end-to-end trainable, works directly from high-dimensional input (e.g., images), and only requires knowledge of which skill primitives are present during training, without any need for segmentation or ordering of primitives. We evaluate our approach via rigorous experimentation across four environments ranging from simulation to real-world robots, from procedurally generated to human-collected demonstrations, and from discrete to continuous action spaces. Finally, we leverage the generated skill segmentation to demonstrate preliminary evidence of zero-shot transfer to new combinations of skills. Result videos at https://sites.google.com/view/trajectory-segmentation/
Tasks Multiple Instance Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HygkpxStvr
PDF https://openreview.net/pdf?id=HygkpxStvr
PWC https://paperswithcode.com/paper/weakly-supervised-trajectory-segmentation-for
Repo
Framework
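Reading "multiple instance learning" literally, each demonstration is a bag of frames and the only label is which skills appear somewhere in it, so a per-frame scorer max-pooled over time can be trained from trajectory-level labels alone. A minimal PyTorch sketch under that assumption (the model shape and pooling choice are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class MILSkillDetector(nn.Module):
    """Per-timestep skill logits, max-pooled over time so that only
    trajectory-level 'which skills are present' labels are needed."""
    def __init__(self, feat_dim: int, num_skills: int):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, num_skills)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:  # (T, feat_dim)
        frame_logits = self.scorer(traj)                    # (T, num_skills)
        return frame_logits.max(dim=0).values               # (num_skills,)

model = MILSkillDetector(feat_dim=32, num_skills=4)
traj = torch.randn(100, 32)                  # 100 frames of features
bag_labels = torch.tensor([1., 0., 1., 0.])  # skills present in the demo
loss = nn.functional.binary_cross_entropy_with_logits(model(traj), bag_labels)
loss.backward()
```

After training, the per-frame logits themselves give the segmentation: the frames where a skill's logit peaks are where that skill occurs.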

Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories

Title Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories
Authors Anonymous
Abstract We address the problem of learning to discover 3D parts for objects in unseen categories. Learning the geometric prior of parts and transferring this prior to unseen categories pose fundamental challenges for data-driven shape segmentation approaches. Formulating the task as a contextual bandit problem, we propose a learning-based iterative grouping framework which learns a grouping policy to progressively merge small part proposals into bigger ones in a bottom-up fashion. At the core of our approach is restricting the local context for extracting part-level features, which guarantees generalizability to novel categories. On a recently proposed large-scale fine-grained 3D part dataset, PartNet, we demonstrate that our method can transfer knowledge of parts learned from 3 training categories to 21 unseen testing categories without seeing any annotated samples. Quantitative comparisons against four strong shape segmentation baselines show that our method achieves state-of-the-art performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkl8dlHYvB
PDF https://openreview.net/pdf?id=rkl8dlHYvB
PWC https://paperswithcode.com/paper/learning-to-group-a-bottom-up-framework-for
Repo
Framework
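The grouping policy amounts to a greedy bottom-up merge loop: score candidate pairs of part proposals with a learned policy and merge the best pair until no merge looks confident. A schematic sketch (the `score_pair` policy and the set-of-point-indices part representation are placeholders, not the paper's):

```python
def bottom_up_group(proposals, score_pair, threshold=0.5):
    """Iteratively merge the highest-scoring pair of part proposals.
    `score_pair(a, b)` is a learned policy scoring a candidate merge."""
    parts = list(proposals)
    while len(parts) > 1:
        (i, j), best = max(
            (((i, j), score_pair(parts[i], parts[j]))
             for i in range(len(parts)) for j in range(i + 1, len(parts))),
            key=lambda t: t[1])
        if best < threshold:                    # no merge is confident enough
            break
        merged = parts[i] | parts[j]            # union of point sets
        parts = [p for k, p in enumerate(parts) if k not in (i, j)] + [merged]
    return parts

# Toy policy: only merge while the combined part stays small.
print(bottom_up_group([{0, 1}, {2}, {3, 4}],
                      lambda a, b: 1.0 if len(a) + len(b) <= 3 else 0.0))
```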

Model-Agnostic Feature Selection with Additional Mutual Information

Title Model-Agnostic Feature Selection with Additional Mutual Information
Authors Anonymous
Abstract Answering questions about data can require understanding which parts of an input X influence the response Y. Such an understanding can be built by testing relationships between variables through a machine learning model. For example, conditional randomization tests help determine whether a variable relates to the response given the rest of the variables. However, randomization tests require users to specify test statistics. We formalize a class of proper test statistics that are guaranteed to select a feature when it provides information about the response even when the rest of the features are known. We show that f-divergences provide a broad class of proper test statistics. Within this class, the KL-divergence yields an easy-to-compute proper test statistic that relates to the additional mutual information (AMI). Questions of feature importance can also be asked at the level of an individual sample. We show that estimators from the same AMI test can be used to find important features in a particular instance, and we provide an example showing that perfect predictive models are insufficient for instance-wise feature selection. We evaluate our method on several simulation experiments, on a genomic dataset, on a clinical dataset for hospital readmission, and on a subset of classes in ImageNet. Our method outperforms several baselines on various simulated datasets, identifies biologically significant genes, selects the most important predictors of a hospital readmission event, and identifies distinguishing features in an image-classification task.
Tasks Feature Importance, Feature Selection, Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=HJg_tkBtwS
PDF https://openreview.net/pdf?id=HJg_tkBtwS
PWC https://paperswithcode.com/paper/model-agnostic-feature-selection-with
Repo
Framework
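A conditional randomization test with a pluggable statistic is short to write down: re-evaluate the statistic with feature j resampled from its conditional distribution given the remaining features, and compare against the observed value. A hypothetical sketch (`stat` and `sample_xj_given_rest` stand in for the paper's AMI estimator and conditional model):

```python
def crt_pvalue(stat, x, y, j, sample_xj_given_rest, n_null=200):
    """Conditional randomization test for feature j.
    stat(x, y): a proper test statistic (e.g. a KL/AMI estimate);
    sample_xj_given_rest(x): draws X_j from its conditional given X_{-j};
    x: (n, d) numpy array of features, y: (n,) responses."""
    observed = stat(x, y)
    exceed = 0
    for _ in range(n_null):
        x_null = x.copy()
        x_null[:, j] = sample_xj_given_rest(x)   # break the X_j -> Y link
        exceed += stat(x_null, y) >= observed
    # small p-value => feature j carries information beyond the rest
    return (1 + exceed) / (1 + n_null)
```

The "proper" requirement in the abstract is exactly what makes this loop sound: a proper statistic is guaranteed to separate the observed value from the null draws whenever X_j adds information about Y.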

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Title GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
Authors Anonymous
Abstract Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties at the same time. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; and (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than that of the existing state-of-the-art approach, GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
Tasks Density Estimation, Drug Discovery, Graph Generation
Published 2020-01-01
URL https://openreview.net/forum?id=S1esMkHYPr
PDF https://openreview.net/pdf?id=S1esMkHYPr
PWC https://paperswithcode.com/paper/graphaf-a-flow-based-autoregressive-model-for
Repo
Framework
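The valency check in the iterative sampling process is simple to illustrate: reject a sampled bond if it would push either endpoint past its valence budget, and resample. A toy sketch, not the paper's code (the valence limits are the usual organic-chemistry defaults):

```python
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def bond_is_valid(atoms, used_valence, i, j, order):
    """Would a bond of this order between atoms i and j exceed
    either atom's remaining valence budget?"""
    return (used_valence[i] + order <= MAX_VALENCE[atoms[i]] and
            used_valence[j] + order <= MAX_VALENCE[atoms[j]])

atoms = ["C", "O"]
used = [3, 0]                                      # carbon already has 3 bonds
print(bond_is_valid(atoms, used, 0, 1, order=2))   # False: only 1 valence left
print(bond_is_valid(atoms, used, 0, 1, order=1))   # True
```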

The Curious Case of Neural Text Degeneration

Title The Curious Case of Neural Text Degeneration
Authors Anonymous
Abstract Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g. to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration — output text that is bland, incoherent, or gets stuck in repetitive loops. To address this, we propose Nucleus Sampling, a simple but effective method to draw considerably higher quality text out of neural language models. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass. To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation and (3) Nucleus Sampling is the best decoding strategy for generating long-form text that is both high-quality — as measured by human evaluation — and as diverse as human-written text.
Tasks Language Modelling, Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=rygGQyrFvH
PDF https://openreview.net/pdf?id=rygGQyrFvH
PWC https://paperswithcode.com/paper/the-curious-case-of-neural-text-degeneration-1
Repo
Framework
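Nucleus (top-p) sampling is compact enough to state exactly: sort the vocabulary by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample from that set. A minimal numpy version of the decoding rule described in the abstract:

```python
import numpy as np

def nucleus_sample(probs: np.ndarray, p: float = 0.95,
                   rng: np.random.Generator = np.random.default_rng()) -> int:
    """Sample from the smallest set of tokens whose total probability
    mass reaches p (the 'nucleus'), truncating the unreliable tail."""
    order = np.argsort(probs)[::-1]                    # high to low
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # smallest prefix >= p
    nucleus = order[:cutoff]
    renormed = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renormed))

print(nucleus_sample(np.array([0.5, 0.3, 0.1, 0.05, 0.05]), p=0.9))
```

Because the cutoff depends on the shape of each step's distribution, the nucleus shrinks when the model is confident and grows when it is uncertain, unlike a fixed top-k.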

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Title Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Authors Anonymous
Abstract Recent theoretical work has established connections between over-parametrized neural networks and linearized models governed by the Neural Tangent Kernels (NTKs). NTK theory leads to concrete convergence and generalization results, yet the empirical performance of neural networks is observed to exceed that of their linearized models, suggesting the insufficiency of this theory. Towards closing this gap, we investigate the training of over-parametrized neural networks that are beyond the NTK regime yet still governed by the Taylor expansion of the network. We bring forward the idea of randomizing the neural networks, which allows them to escape their NTK and couple with quadratic models. We show that the optimization landscape of randomized two-layer networks is benign and amenable to saddle-escaping algorithms. We prove concrete generalization and expressivity results for these randomized networks, which lead to sample complexity bounds (for learning certain simple functions) that match the NTK and can additionally be better by a dimension factor under mild distributional assumptions. We demonstrate that our randomization technique can be generalized systematically beyond the quadratic case, by using it to find networks that are coupled with higher-order terms in their Taylor series.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkllGyBFPH
PDF https://openreview.net/pdf?id=rkllGyBFPH
PWC https://paperswithcode.com/paper/beyond-linearization-on-quadratic-and-higher-1
Repo
Framework
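For context, the regimes discussed here correspond to truncations of the Taylor expansion of the network around its initialization $\theta_0$: the NTK regime keeps only the linear term, while this paper couples training to the quadratic term and beyond:

$$f_\theta(x) \approx f_{\theta_0}(x) + \langle \nabla_\theta f_{\theta_0}(x),\, \theta - \theta_0 \rangle + \tfrac{1}{2}\,(\theta - \theta_0)^\top \nabla^2_\theta f_{\theta_0}(x)\,(\theta - \theta_0) + \cdots$$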

Question Generation from Paragraphs: A Tale of Two Hierarchical Models

Title Question Generation from Paragraphs: A Tale of Two Hierarchical Models
Authors Anonymous
Abstract Automatic question generation from paragraphs is an important and challenging problem, particularly due to the long context of a paragraph. In this paper, we propose and study two hierarchical models for the task of question generation from paragraphs. Specifically, we propose (a) a novel hierarchical BiLSTM model with selective attention and (b) a novel hierarchical Transformer architecture, both of which learn hierarchical representations of paragraphs. We model a paragraph in terms of its constituent sentences, and a sentence in terms of its constituent words. While the introduction of the attention mechanism benefits the hierarchical BiLSTM model, the hierarchical Transformer, with its inherent attention and positional encoding mechanisms, also performs better than a flat Transformer model. We conduct an empirical evaluation on the widely used SQuAD and MS MARCO datasets using standard metrics. The results demonstrate the overall effectiveness of the hierarchical models over their flat counterparts. Qualitatively, our hierarchical models are able to generate fluent and relevant questions.
Tasks Question Generation
Published 2020-01-01
URL https://openreview.net/forum?id=BJeVXgBKDH
PDF https://openreview.net/pdf?id=BJeVXgBKDH
PWC https://paperswithcode.com/paper/question-generation-from-paragraphs-a-tale-of
Repo
Framework
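The two-level structure, words encoded within each sentence and sentences encoded within the paragraph, can be sketched with two stacked BiLSTMs. A schematic PyTorch version (mean-pooling and all sizes are illustrative; the paper's models add selective attention on top):

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Words -> sentence vectors -> paragraph-contextual sentence reps,
    each level a BiLSTM."""
    def __init__(self, vocab: int, emb: int = 128, hid: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.sent_lstm = nn.LSTM(2 * hid, hid, bidirectional=True, batch_first=True)

    def forward(self, paragraph: torch.Tensor) -> torch.Tensor:
        # paragraph: (num_sents, max_words) of token ids
        word_states, _ = self.word_lstm(self.embed(paragraph))
        sent_vecs = word_states.mean(dim=1)          # pool words -> sentence vector
        sent_states, _ = self.sent_lstm(sent_vecs.unsqueeze(0))
        return sent_states.squeeze(0)                # (num_sents, 2*hid)

enc = HierarchicalEncoder(vocab=1000)
out = enc(torch.randint(0, 1000, (5, 12)))           # 5 sentences, 12 tokens each
print(out.shape)                                     # torch.Size([5, 512])
```

A question decoder would then attend over these sentence-level representations instead of over every word in the paragraph, which is what keeps the long context tractable.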

Variational lower bounds on mutual information based on nonextensive statistical mechanics

Title Variational lower bounds on mutual information based on nonextensive statistical mechanics
Authors Anonymous
Abstract This paper aims to address the limitations of mutual information estimators based on variational optimization. By redefining the cost using generalized functions from nonextensive statistical mechanics, we raise the upper bound of previous estimators and enable control of the bias-variance trade-off. Variational estimators outperform previous methods, especially in the high-dependence, high-dimensional scenarios found in machine learning setups. Despite their performance, these estimators either exhibit a high variance or are upper-bounded by log(batch size). Our approach, inspired by nonextensive statistical mechanics, uses different generalizations for the logarithm and the exponential in the partition function. This enables the estimator to capture changes in mutual information over a wider range of dimensions and correlations of the input variables, whereas previous estimators saturate.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJgR5lSFwr
PDF https://openreview.net/pdf?id=HJgR5lSFwr
PWC https://paperswithcode.com/paper/variational-lower-bounds-on-mutual
Repo
Framework
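The generalized functions in question are plausibly the Tsallis q-deformed logarithm and exponential of nonextensive statistical mechanics (the abstract does not name them, so this is an inference from "generalizations for the logarithm and the exponential"):

$$\ln_q(x) = \frac{x^{1-q} - 1}{1-q}, \qquad \exp_q(x) = \big[\,1 + (1-q)\,x\,\big]_+^{\frac{1}{1-q}},$$

both of which reduce to the ordinary $\ln$ and $\exp$ as $q \to 1$; the deformation parameter $q$ is what gives the estimator its extra knob for trading bias against variance.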

Modeling question asking using neural program generation

Title Modeling question asking using neural program generation
Authors Anonymous
Abstract People ask questions that are far richer, more informative, and more creative than current AI systems. We propose a neural program generation framework for modeling human question asking, which represents questions as formal programs and generates programs with an encoder-decoder based deep neural network. From extensive experiments using an information-search game, we show that our method can ask optimal questions in synthetic settings, and predict which questions humans are likely to ask in unconstrained settings. We also propose a novel grammar-based question generation framework trained with reinforcement learning, which is able to generate creative questions without supervised data.
Tasks Question Generation
Published 2020-01-01
URL https://openreview.net/forum?id=SylR-CEKDS
PDF https://openreview.net/pdf?id=SylR-CEKDS
PWC https://paperswithcode.com/paper/modeling-question-asking-using-neural-program-1
Repo
Framework