April 1, 2020

Paper Group NANR 93

PNAT: Non-autoregressive Transformer by Position Learning


Title	PNAT: Non-autoregressive Transformer by Position Learning
Authors	Anonymous
Abstract	Non-autoregressive generation is a new paradigm for text generation. Previous work rarely models the positions of generated words explicitly. However, position modeling of output words is an essential problem in non-autoregressive text generation. In this paper, we propose PNAT, which explicitly models positions of output words as latent variables in text generation. The proposed PNAT is simple yet effective. Experimental results show that PNAT gives very promising results in machine translation and paraphrase generation tasks, outperforming many strong baselines.
Tasks	Machine Translation, Paraphrase Generation, Text Generation
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe932EYwS
PDF	https://openreview.net/pdf?id=BJe932EYwS
PWC	https://paperswithcode.com/paper/pnat-non-autoregressive-transformer-by
Repo
Framework

Diagnosing the Environment Bias in Vision-and-Language Navigation


Title	Diagnosing the Environment Bias in Vision-and-Language Navigation
Authors	Anonymous
Abstract	Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations. These step-by-step navigational instructions are extremely useful in navigating new environments which the agent does not know about previously. Most recent works that study VLN observe a significant performance drop when tested on unseen environments (i.e., environments not used in training), indicating that the neural agent models are highly biased towards training environments. Although this issue is considered one of the major challenges in VLN research, it is still under-studied and needs a clearer explanation. In this work, we design novel diagnosis experiments via environment re-splitting and feature replacement, looking into possible reasons for this environment bias. We observe that it is neither the language nor the underlying navigational graph, but the low-level visual appearance conveyed by ResNet features, that directly affects the agent model and contributes to this environment bias in results. Based on this observation, we explore several kinds of semantic representations which contain less low-level visual information, so that an agent trained with these features generalizes better to unseen testing environments. Without modifying the baseline agent model and its training method, our explored semantic features significantly decrease the performance gap between seen and unseen environments on multiple datasets (i.e., 8.6% to 0.2% on R2R, 23.9% to 0.1% on R4R, and 3.74 to 0.17 on CVDN) and achieve unseen results competitive with previous state-of-the-art models.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=S1eYKlrYvr
PDF	https://openreview.net/pdf?id=S1eYKlrYvr
PWC	https://paperswithcode.com/paper/diagnosing-the-environment-bias-in-vision-and
Repo
Framework

Differentially Private Meta-Learning


Title	Differentially Private Meta-Learning
Authors	Anonymous
Abstract	Parameter-transfer is a well-known and versatile approach for meta-learning, with applications including few-shot learning, federated learning with personalization, and reinforcement learning. However, parameter-transfer algorithms often require sharing models that have been trained on the samples from specific tasks, thus leaving the task-owners susceptible to breaches of privacy. We conduct the first formal study of privacy in this setting and formalize the notion of task-global differential privacy as a practical relaxation of more commonly studied threat models. We then propose a new differentially private algorithm for gradient-based parameter transfer that not only satisfies this privacy requirement but also retains provable transfer learning guarantees in convex settings. Empirically, we apply our analysis to the problems of federated learning with personalization and few-shot classification, showing that allowing the relaxation to task-global privacy from the more commonly studied notion of local privacy leads to dramatically increased performance in recurrent neural language modeling and image classification.
Tasks	Few-Shot Learning, Image Classification, Language Modelling, Meta-Learning, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rJgqMRVYvr
PDF	https://openreview.net/pdf?id=rJgqMRVYvr
PWC	https://paperswithcode.com/paper/differentially-private-meta-learning-1
Repo
Framework

Learning Underlying Physical Properties From Observations For Trajectory Prediction


Title	Learning Underlying Physical Properties From Observations For Trajectory Prediction
Authors	Anonymous
Abstract	In this work we present an approach that combines deep learning with the laws of Newtonian physics for accurate trajectory predictions in physical games. Our model learns to estimate the physical properties and forces that generated the given observations, learns the relationships between the player’s available actions and the estimated physical properties, and uses these extracted forces for predictions. We show the advantages of using physical laws together with deep learning by evaluating it against two baseline models that automatically discover features from the data without such knowledge. We evaluate our model’s ability to extract physical properties and to generalize to unseen trajectories in two games with a shooting mechanism. We also evaluate our model’s capability to transfer learned knowledge from a 2D game to predictions in a 3D game with similar physics. We show that by using physical laws together with deep learning we achieve better human-interpretability of learned physical properties, transfer of knowledge to a game with similar physics, and very accurate predictions for previously unseen data.
Tasks	Trajectory Prediction
Published	2020-01-01
URL	https://openreview.net/forum?id=BJgZBxBYPB
PDF	https://openreview.net/pdf?id=BJgZBxBYPB
PWC	https://paperswithcode.com/paper/learning-underlying-physical-properties-from
Repo
Framework

Universal Approximation with Certified Networks


Title	Universal Approximation with Certified Networks
Authors	Anonymous
Abstract	Training neural networks to be certifiably robust is a powerful defense against adversarial attacks. However, while promising, state-of-the-art results with certified training are far from satisfactory. Currently, it is very difficult to train a neural network that is both accurate and certified on realistic datasets and specifications (e.g., robustness). Given this difficulty, a pressing existential question is: given a dataset and a specification, is there a network that is both certified and accurate with respect to these? While the evidence suggests “no”, we prove that for realistic datasets and specifications, such a network does exist and its certification can be established by propagating lower and upper bounds of each neuron through the network (interval analysis) – the most relaxed yet computationally efficient convex relaxation. Our result can be seen as a Universal Approximation Theorem for interval-certified ReLU networks. To the best of our knowledge, this is the first work to prove the existence of accurate, interval-certified networks.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=B1gX8kBtPr
PDF	https://openreview.net/pdf?id=B1gX8kBtPr
PWC	https://paperswithcode.com/paper/universal-approximation-with-certified
Repo
Framework
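
As a rough illustration of the interval analysis mentioned in the abstract above (propagating lower and upper bounds of each neuron through the network), the following sketch pushes an input box through a tiny ReLU network. The network, its weights, the perturbation radius `eps`, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def affine_bounds(weights: Sequence[Sequence[float]],
                  bias: Sequence[float],
                  inputs: Sequence[Interval]) -> List[Interval]:
    """Bounds of y = W x + b when each x_i lies in its interval."""
    outputs = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, inputs):
            # A positive weight maps lower to lower; a negative weight swaps them.
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                lo += w * x_hi
                hi += w * x_lo
        outputs.append((lo, hi))
    return outputs


def relu_bounds(inputs: Sequence[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in inputs]


def interval_forward(layers, inputs: Sequence[Interval]) -> List[Interval]:
    """Propagate a box through affine layers with ReLU between them."""
    bounds = list(inputs)
    for i, (weights, bias) in enumerate(layers):
        bounds = affine_bounds(weights, bias, bounds)
        if i < len(layers) - 1:  # no ReLU after the output layer
            bounds = relu_bounds(bounds)
    return bounds


# Hypothetical 2-2-1 network and an L-infinity box of radius eps around x.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
x = [0.5, 0.5]
eps = 0.1
box = [(v - eps, v + eps) for v in x]

out_lo, out_hi = interval_forward(layers, box)[0]
# If the output is read as a logit, a strictly positive lower bound means
# the sign of the prediction cannot change anywhere inside the box.
certified_positive = out_lo > 0.0
```

The bounds are sound but generally loose, since each neuron is bounded independently of the others.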

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence


Title	Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Authors	Anonymous
Abstract	Policy gradient methods with actor-critic schemes demonstrate tremendous empirical successes, especially when the actors and critics are parameterized by neural networks. However, it remains less clear whether such “neural” policy gradient methods converge to globally optimal policies and whether they even converge at all. We answer both questions affirmatively in the overparameterized regime. In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. Also, we show that neural vanilla policy gradient converges sublinearly to a stationary point. Meanwhile, by relating the suboptimality of the stationary points to the representation power of neural actor and critic classes, we prove the global optimality of all stationary points under mild regularity conditions. Particularly, we show that a key to the global optimality and convergence is the “compatibility” between the actor and critic, which is ensured by sharing neural architectures and random initializations across the actor and critic. To the best of our knowledge, our analysis establishes the first global optimality and convergence guarantees for neural policy gradient methods.
Tasks	Policy Gradient Methods
Published	2020-01-01
URL	https://openreview.net/forum?id=BJgQfkSYDS
PDF	https://openreview.net/pdf?id=BJgQfkSYDS
PWC	https://paperswithcode.com/paper/neural-policy-gradient-methods-global-1
Repo
Framework

Stochastic AUC Maximization with Deep Neural Networks


Title	Stochastic AUC Maximization with Deep Neural Networks
Authors	Anonymous
Abstract	Stochastic AUC maximization has garnered increasing interest because it is a better fit for imbalanced data classification. However, existing works are limited to stochastic AUC maximization with a linear predictive model, which restricts its predictive power when dealing with extremely complex data. In this paper, we consider the stochastic AUC maximization problem with a deep neural network as the predictive model. Building on the saddle point reformulation of a surrogate loss of AUC, the problem can be cast into a non-convex concave min-max problem. The main contribution made in this paper is to make stochastic AUC maximization more practical for deep neural networks and big data, with theoretical insights as well. In particular, we propose to explore the Polyak-Łojasiewicz (PL) condition, which has been proved and observed in deep learning and which enables us to develop new stochastic algorithms with an even faster convergence rate and a more practical step size scheme. An AdaGrad-style algorithm is also analyzed under the PL condition with an adaptive convergence rate. Our experimental results demonstrate the effectiveness of the proposed algorithms.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HJepXaVYDr
PDF	https://openreview.net/pdf?id=HJepXaVYDr
PWC	https://paperswithcode.com/paper/stochastic-auc-maximization-with-deep-neural-1
Repo
Framework
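
For readers unfamiliar with the quantity being maximized in the abstract above, the sketch below computes the empirical AUC as the fraction of correctly ranked positive-negative pairs, together with a pairwise square surrogate of the kind commonly used to make AUC differentiable. The function names and toy scores are hypothetical, and treating the square loss as the surrogate is an assumption; the abstract does not say which surrogate the paper uses.

```python
from typing import List, Sequence, Tuple


def split_scores(scores: Sequence[float],
                 labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (label 1) and negative examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    return pos, neg


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = split_scores(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_square_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed surrogate: mean of (1 - (s_pos - s_neg))^2 over all pairs."""
    pos, neg = split_scores(scores, labels)
    total = sum((1.0 - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Toy data (hypothetical): two positives, two negatives.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]

auc = empirical_auc(scores, labels)
loss = pairwise_square_loss(scores, labels)
```

Because both quantities are defined over pairs, a naive evaluation costs time proportional to the number of positives times the number of negatives. Reformulations such as the min-max problem the abstract refers to avoid forming pairs explicitly.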

V4D: 4D Convolutional Neural Networks for Video-level Representations Learning


Title	V4D: 4D Convolutional Neural Networks for Video-level Representations Learning
Authors	Anonymous
Abstract	Most existing 3D CNN structures for video representation learning are clip-based methods, and do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, namely V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, while preserving 3D spatio-temporal representations with residual connections. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
Tasks	Representation Learning, Video Recognition
Published	2020-01-01
URL	https://openreview.net/forum?id=SJeLopEYDH
PDF	https://openreview.net/pdf?id=SJeLopEYDH
PWC	https://paperswithcode.com/paper/v4d-4d-covolutional-neural-networks-for-video
Repo
Framework

Encoder-Decoder Based Convolutional Neural Network with Multi-Scale-Aware Modules for Crowd Counting


Title	Encoder-Decoder Based Convolutional Neural Network with Multi-Scale-Aware Modules for Crowd Counting
Authors	Pongpisit Thanasutives, Ken-ichi Fukui, Masayuki Numao, Boonserm Kijsirikul
Abstract	In this paper, we propose two modified neural network architectures, based on SFANet and SegNet respectively, for accurate and efficient crowd counting. Inspired by SFANet, the first model is equipped with two novel multi-scale-aware modules called ASSP and CAN. This model is called M-SFANet. The encoder of M-SFANet is enhanced with ASSP, which contains parallel atrous convolutions with different sampling rates and is hence able to extract multi-scale features of the target object and incorporate larger context. To further deal with scale variation throughout an input image, we leverage a contextual module called CAN, which adaptively encodes the scales of the contextual information. The combination yields an effective model for counting in both dense and sparse crowd scenes. Following the SFANet decoder structure, the M-SFANet decoder has dual paths, for density map generation and attention map generation. The second model is called M-SegNet. For M-SegNet, we simply replace the bilinear upsampling used in SFANet with the max unpooling originally from SegNet, obtaining a faster model that still provides competitive counting performance. Designed for high-speed surveillance applications, M-SegNet has no additional multi-scale-aware module, so as not to increase its complexity. Both models are encoder-decoder based architectures and end-to-end trainable. We also conduct extensive experiments on four crowd counting datasets and one vehicle counting dataset to show that these modifications yield algorithms that can outperform some state-of-the-art crowd counting methods.
Tasks	Crowd Counting
Published	2020-03-13
URL	https://arxiv.org/abs/2003.05586
PDF	https://arxiv.org/abs/2003.05586
PWC	https://paperswithcode.com/paper/encoder-decoder-based-convolutional-neural-1
Repo
Framework
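
The abstract above relies on parallel atrous (dilated) convolutions with different sampling rates to capture context at several scales. The sketch below shows the idea in one dimension: the same three-tap kernel covers a wider span of the input as the rate grows. The signal, kernel, rates, and function name are illustrative assumptions, and the real modules operate on 2D feature maps inside a CNN.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   rate: int) -> List[float]:
    """'Valid' 1D atrous convolution (cross-correlation form).

    Kernel taps are spaced `rate` samples apart, so the kernel spans
    (len(kernel) - 1) * rate + 1 input samples with no extra weights.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(k * signal[start + j * rate] for j, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


# Hypothetical input: a ramp, filtered by a simple difference kernel.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]

# Parallel branches with different sampling rates, as in a pyramid of
# atrous convolutions; each branch sees a different amount of context.
branches: Dict[int, List[float]] = {
    rate: dilated_conv1d(signal, kernel, rate) for rate in (1, 2, 3)
}
# rate 1 compares samples 2 apart, rate 2 samples 4 apart, rate 3 samples 6 apart.
```

In a full network the branch outputs would typically be padded to a common size and concatenated. That step is omitted here.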

Ergodic Inference: Accelerate Convergence by Optimisation


Title	Ergodic Inference: Accelerate Convergence by Optimisation
Authors	Anonymous
Abstract	Statistical inference methods are fundamentally important in machine learning. Most state-of-the-art inference algorithms are variants of Markov chain Monte Carlo (MCMC) or variational inference (VI). However, both methods struggle with limitations in practice: MCMC methods can be computationally demanding; VI methods may have large bias. In this work, we aim to improve upon MCMC and VI with a novel hybrid method based on the idea of reducing the simulation bias of finite-length MCMC chains using gradient-based optimisation. The proposed method can generate low-bias samples by increasing the length of the MCMC simulation and optimising the MCMC hyper-parameters, which offers an attractive balance between approximation bias and computational efficiency. We show that our method produces promising results on popular benchmarks when compared to recent hybrid methods of MCMC and VI.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HkxZVlHYvH
PDF	https://openreview.net/pdf?id=HkxZVlHYvH
PWC	https://paperswithcode.com/paper/ergodic-inference-accelerate-convergence-by
Repo
Framework

Data-Independent Neural Pruning via Coresets


Title	Data-Independent Neural Pruning via Coresets
Authors	Anonymous
Abstract	Previous work showed empirically that large neural networks can be significantly reduced in size while preserving their accuracy. Model compression became a central research topic, as it is crucial for deployment of neural networks on devices with limited computational and memory resources. The majority of the compression methods are based on heuristics and offer no worst-case guarantees on the trade-off between the compression rate and the approximation error for an arbitrary new sample. We propose the first efficient, data-independent neural pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample. Our method is based on the coreset framework, which finds a small weighted subset of points that provably approximates the original inputs. Specifically, we approximate the output of a layer of neurons by a coreset of neurons in the previous layer and discard the rest. We apply this framework in a layer-by-layer fashion from the top to the bottom. Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x\in \mathbb{R}^d$, including an adversarial one. We demonstrate the effectiveness of our method on popular network architectures. In particular, our coresets yield 90% compression of the LeNet-300-100 architecture on MNIST while improving the accuracy.
Tasks	Model Compression
Published	2020-01-01
URL	https://openreview.net/forum?id=H1gmHaEKwB
PDF	https://openreview.net/pdf?id=H1gmHaEKwB
PWC	https://paperswithcode.com/paper/data-independent-neural-pruning-via-coresets
Repo
Framework

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery


Title	Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
Authors	Anonymous
Abstract	Reinforcement learning requires manual specification of a reward function to learn a task. While in principle this reward function only needs to specify the task goal, in practice reinforcement learning can be very time-consuming or even infeasible unless the reward function is shaped so as to provide a smooth gradient towards a successful outcome. This shaping is difficult to specify by hand, particularly when the task is learned from raw observations, such as images. In this paper, we study how we can automatically learn dynamical distances: a measure of the expected number of time steps to reach a given goal state from any other state. These dynamical distances can be used to provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently. We show that dynamical distances can be used in a semi-supervised regime, where unsupervised interaction with the environment is used to learn the dynamical distances, while a small amount of preference supervision is used to determine the task goal, without any manually engineered reward function or goal examples. We evaluate our method both on a real-world robot and in simulation. We show that our method can learn to turn a valve with a real-world 9-DoF hand, using raw image observations and just ten preference labels, without any other supervision. Videos of the learned skills can be found on the project website: https://sites.google.com/view/skills-via-distance-learning.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1lmhaVtvr
PDF	https://openreview.net/pdf?id=H1lmhaVtvr
PWC	https://paperswithcode.com/paper/dynamical-distance-learning-for-semi
Repo
Framework
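
The abstract above defines a dynamical distance as the expected number of time steps needed to reach a goal state from another state. A minimal tabular sketch of that definition follows: every ordered pair of states within a trajectory yields a step count, and counts are averaged per (state, goal) pair. The state names, the trajectories, and the averaging table are assumptions for illustration. The paper works from raw image observations, where a learned regressor would take the place of the table.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

State = Hashable


def distance_pairs(trajectory: Sequence[State]) -> List[Tuple[State, State, int]]:
    """All (state, later state, steps between them) triples in one trajectory."""
    return [
        (trajectory[i], trajectory[j], j - i)
        for i in range(len(trajectory))
        for j in range(i, len(trajectory))
    ]


def estimate_distances(
    trajectories: Sequence[Sequence[State]],
) -> Dict[Tuple[State, State], float]:
    """Average observed step counts per (state, goal) pair.

    This is a tabular stand-in for a learned distance function and only
    reflects the behaviour that generated the trajectories.
    """
    samples: Dict[Tuple[State, State], List[int]] = defaultdict(list)
    for trajectory in trajectories:
        for state, goal, steps in distance_pairs(trajectory):
            samples[(state, goal)].append(steps)
    return {pair: sum(v) / len(v) for pair, v in samples.items()}


# Hypothetical rollouts over three abstract states.
trajectories = [["a", "b", "c"], ["a", "c"]]
distances = estimate_distances(trajectories)

# The negated distance to a chosen goal can serve as a shaped reward.
goal = "c"
reward_from_a = -distances[("a", goal)]
```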

High Fidelity Speech Synthesis with Adversarial Networks


Title	High Fidelity Speech Synthesis with Adversarial Networks
Authors	Anonymous
Abstract	Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at http://tiny.cc/gantts.
Tasks	Speech Synthesis
Published	2020-01-01
URL	https://openreview.net/forum?id=r1gfQgSFDr
PDF	https://openreview.net/pdf?id=r1gfQgSFDr
PWC	https://paperswithcode.com/paper/high-fidelity-speech-synthesis-with
Repo
Framework

An Explicitly Relational Neural Network Architecture


Title	An Explicitly Relational Neural Network Architecture
Authors	Anonymous
Abstract	With a view to bridging the gap between deep learning and symbolic AI, we present a novel end-to-end neural network architecture that learns to form propositional representations with an explicitly relational structure from raw pixel data. In order to evaluate and analyse the architecture, we introduce a family of simple visual relational reasoning tasks of varying complexity. We show that the proposed architecture, when pre-trained on a curriculum of such tasks, learns to generate reusable representations that better facilitate subsequent learning on previously unseen tasks when compared to a number of baseline architectures. The workings of a successfully trained model are visualised to shed some light on how the architecture functions.
Tasks	Relational Reasoning
Published	2020-01-01
URL	https://openreview.net/forum?id=S1l6ITVKPS
PDF	https://openreview.net/pdf?id=S1l6ITVKPS
PWC	https://paperswithcode.com/paper/an-explicitly-relational-neural-network-1
Repo
Framework

Synthetic vs Real: Deep Learning on Controlled Noise


Title	Synthetic vs Real: Deep Learning on Controlled Noise
Authors	Anonymous
Abstract	Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic noise, and real-world noise has never been systematically studied in a controlled setting. To this end, this paper establishes a benchmark of real-world noisy labels at 10 controlled noise levels. As real-world noise possesses unique properties, to understand the difference, we conduct a large-scale study across a variety of noise levels and types, architectures, methods, and training settings. Our study shows that: (1) Deep Neural Networks (DNNs) generalize much better on real-world noise. (2) DNNs may not learn patterns first on real-world noisy data. (3) When networks are fine-tuned, ImageNet architectures generalize well on noisy data. (4) Real-world noise appears to be less harmful, yet it is more difficult for robust DNN methods to improve. (5) Robust learning methods that work well on synthetic noise may not work as well on real-world noise, and vice versa. We hope our benchmark, as well as our findings, will facilitate deep learning research on noisy data.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Syx-bCEFPS
PDF	https://openreview.net/pdf?id=Syx-bCEFPS
PWC	https://paperswithcode.com/paper/synthetic-vs-real-deep-learning-on-controlled
Repo
Framework