Paper Group AWR 110
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?. Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs. RIDI: Robust IMU Double Integration. Neural Natural Language Inference Models Enhanced with External Knowled …
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
Title | Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning |
Authors | Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine |
Abstract | Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum. |
Tasks | |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06782v1 |
http://arxiv.org/pdf/1711.06782v1.pdf | |
PWC | https://paperswithcode.com/paper/leave-no-trace-learning-to-reset-for-safe-and |
Repo | https://github.com/brain-research/LeaveNoTrace |
Framework | none |
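To make the abort mechanism concrete, here is a minimal sketch of one forward attempt followed by a learned reset, assuming hypothetical `env`, `forward_agent`, and `reset_agent` objects (none of these names come from the authors' code): before each forward action, the reset policy's Q-value is checked and the attempt is aborted when it falls below a threshold.

```python
# Illustrative sketch only; env and the two agents are hypothetical stand-ins.
def attempt_with_reset(env, obs, forward_agent, reset_agent, q_min=0.3, max_steps=200):
    """One forward attempt followed by a learned reset, starting from the
    current observation obs; no manual environment reset is performed."""
    for _ in range(max_steps):
        action = forward_agent.act(obs)
        # Early abort: a low reset Q-value suggests the proposed action may
        # lead to a state the reset policy cannot undo.
        if reset_agent.q_value(obs, action) < q_min:
            break
        obs, reward, done, _ = env.step(action)
        if done:
            break
    # Reset phase: the learned reset policy tries to undo the attempt.
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(reset_agent.act(obs))
        if done:
            return obs, True    # reset succeeded autonomously
    return obs, False           # fall back to requesting a manual reset
```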
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Title | What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? |
Authors | Alex Kendall, Yarin Gal |
Abstract | There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks. |
Tasks | Semantic Segmentation |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.04977v2 |
http://arxiv.org/pdf/1703.04977v2.pdf | |
PWC | https://paperswithcode.com/paper/what-uncertainties-do-we-need-in-bayesian |
Repo | https://github.com/pmorerio/dl-uncertainty |
Framework | tf |
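The learned loss attenuation the abstract mentions can be illustrated with the standard heteroscedastic regression loss, where the network predicts a per-pixel mean and log-variance. The sketch below is a generic PyTorch rendering of that idea, not the authors' TensorFlow code, and all variable names are illustrative.

```python
import torch

def heteroscedastic_loss(pred_mean, pred_log_var, target):
    # exp(-log_var) attenuates the squared residual where the model predicts
    # high observation noise; the +log_var term penalizes predicting huge
    # noise everywhere, so the attenuation is learned rather than trivial.
    residual = (pred_mean - target) ** 2
    return (0.5 * torch.exp(-pred_log_var) * residual + 0.5 * pred_log_var).mean()

# Toy usage: a depth-regression head would output two maps per pixel.
mean = torch.randn(4, 1, 32, 32, requires_grad=True)
log_var = torch.zeros(4, 1, 32, 32, requires_grad=True)
target = torch.randn(4, 1, 32, 32)
heteroscedastic_loss(mean, log_var, target).backward()
```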
Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs
Title | Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs |
Authors | Xing Di, Vishal M. Patel |
Abstract | Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) Synthesis of facial sketch from attributes using a CVAE architecture, (2) Enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) Synthesis of face from sketch using another GAN-based network. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed attribute-based three stage face synthesis method. |
Tasks | Face Generation |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00077v1 |
http://arxiv.org/pdf/1801.00077v1.pdf | |
PWC | https://paperswithcode.com/paper/face-synthesis-from-visual-attributes-via |
Repo | https://github.com/DetionDX/Attribute2Sketch2Face |
Framework | pytorch |
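A high-level sketch of how the three stages chain together, with each stage as a placeholder callable; the module names and signatures below are assumptions for illustration, not the authors' architecture.

```python
import torch

def synthesize_face(attributes, cvae_decoder, sketch_refiner, sketch_to_face, z_dim=64):
    """attributes: (B, A) attribute vector; the three callables stand in for
    the CVAE decoder and the two GAN generators described in the abstract."""
    z = torch.randn(attributes.size(0), z_dim)        # latent noise for the CVAE
    coarse_sketch = cvae_decoder(z, attributes)       # stage 1: attributes -> coarse sketch
    sharp_sketch = sketch_refiner(coarse_sketch)      # stage 2: GAN-based sketch sharpening
    face = sketch_to_face(sharp_sketch, attributes)   # stage 3: sketch -> face image
    return face
```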
RIDI: Robust IMU Double Integration
Title | RIDI: Robust IMU Double Integration |
Authors | Hang Yan, Qi Shan, Yasutaka Furukawa |
Abstract | This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations demonstrate that our algorithm achieves results surprisingly comparable to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research. |
Tasks | |
Published | 2017-12-25 |
URL | http://arxiv.org/abs/1712.09004v2 |
http://arxiv.org/pdf/1712.09004v2.pdf | |
PWC | https://paperswithcode.com/paper/ridi-robust-imu-double-integration |
Repo | https://github.com/higerra/ridi_imu |
Framework | none |
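The double-integration step can be sketched in a few lines of NumPy; the slowly-varying bias estimate below is a deliberate simplification of the paper's correction, and the array names are illustrative.

```python
import numpy as np

def double_integrate(acc, regressed_vel, dt):
    """acc: (N, 3) linear accelerations; regressed_vel: (N, 3) velocities
    predicted from the IMU history; dt: sample period in seconds."""
    raw_vel = np.cumsum(acc * dt, axis=0)
    # Treat the drift between integrated and regressed velocity as coming from
    # a slowly varying acceleration bias, and remove it.
    bias = np.gradient(raw_vel - regressed_vel, dt, axis=0)
    vel = np.cumsum((acc - bias) * dt, axis=0)
    pos = np.cumsum(vel * dt, axis=0)
    return pos
```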
Neural Natural Language Inference Models Enhanced with External Knowledge
Title | Neural Natural Language Inference Models Enhanced with External Knowledge |
Authors | Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei |
Abstract | Modeling natural language inference is a very challenging task. With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance. Although relatively large annotated data exist, can machines learn all the knowledge needed to perform natural language inference (NLI) from these data? If not, how can neural-network-based NLI models benefit from external knowledge, and how can we build NLI models to leverage it? In this paper, we enrich the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models improve neural NLI models, achieving state-of-the-art performance on the SNLI and MultiNLI datasets. |
Tasks | Natural Language Inference |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04289v3 |
http://arxiv.org/pdf/1711.04289v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-natural-language-inference-models |
Repo | https://github.com/lukecq1231/kim |
Framework | none |
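One common way external lexical knowledge enters a co-attention NLI model, in the spirit of this paper, is as a bias on the word-by-word alignment scores. The sketch below assumes a precomputed relation tensor (e.g., WordNet synonymy/antonymy/hypernymy indicators) and illustrative shapes; it is not the authors' exact formulation.

```python
import torch

def knowledge_enriched_attention(premise, hypothesis, relations, w):
    """premise: (P, d); hypothesis: (H, d); relations: (P, H, k) knowledge
    features per word pair; w: (k,) learned weights for those features."""
    scores = premise @ hypothesis.t()          # (P, H) content-based scores
    scores = scores + relations @ w            # bias alignment with knowledge
    attn_p2h = torch.softmax(scores, dim=1)    # premise word -> hypothesis words
    attn_h2p = torch.softmax(scores, dim=0)    # hypothesis word -> premise words
    aligned_h = attn_p2h @ hypothesis          # (P, d) soft-aligned hypothesis
    aligned_p = attn_h2p.t() @ premise         # (H, d) soft-aligned premise
    return aligned_h, aligned_p
```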
Hindsight Experience Replay
Title | Hindsight Experience Replay |
Authors | Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba |
Abstract | Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary, and therefore avoids the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task. |
Tasks | |
Published | 2017-07-05 |
URL | http://arxiv.org/abs/1707.01495v3 |
http://arxiv.org/pdf/1707.01495v3.pdf | |
PWC | https://paperswithcode.com/paper/hindsight-experience-replay |
Repo | https://github.com/sjYoondeltar/IQN_example |
Framework | tf |
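The core of the method is goal relabeling in the replay buffer. The sketch below implements the 'future' relabeling strategy from the paper in plain Python, with illustrative data structures (a list of transitions and a sparse reward function) rather than the authors' code.

```python
import random

def her_relabel(episode, reward_fn, k_future=4):
    """episode: list of (state, action, next_state, goal) tuples;
    reward_fn(next_state, goal) returns the sparse 0/1 reward."""
    replay = []
    for t, (s, a, s_next, g) in enumerate(episode):
        replay.append((s, a, reward_fn(s_next, g), s_next, g))
        # Hindsight: pretend a state achieved later in the episode was the goal,
        # so at least some stored transitions receive a non-zero reward.
        future = episode[t:]
        for _ in range(k_future):
            _, _, achieved, _ = random.choice(future)
            replay.append((s, a, reward_fn(s_next, achieved), s_next, achieved))
    return replay
```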
Grounded Language Learning in a Simulated 3D World
Title | Grounded Language Learning in a Simulated 3D World |
Authors | Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom |
Abstract | We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. The agent’s comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06551v2 |
http://arxiv.org/pdf/1706.06551v2.pdf | |
PWC | https://paperswithcode.com/paper/grounded-language-learning-in-a-simulated-3d |
Repo | https://github.com/SophiaAr/OpenAI-final-project |
Framework | tf |
HP-GAN: Probabilistic 3D human motion prediction via GAN
Title | HP-GAN: Probabilistic 3D human motion prediction via GAN |
Authors | Emad Barsoum, John Kender, Zicheng Liu |
Abstract | Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GAN), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is a real human motion. We test our algorithm on two of the largest skeleton datasets: NTURGB-D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input have a greater than 50% probability of being judged as a real human sequence. We will release all the code used in this paper on GitHub. |
Tasks | Autonomous Vehicles, Motion Estimation, motion prediction, Synthetic Data Generation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09561v1 |
http://arxiv.org/pdf/1711.09561v1.pdf | |
PWC | https://paperswithcode.com/paper/hp-gan-probabilistic-3d-human-motion |
Repo | https://github.com/ebarsoum/hpgan |
Framework | tf |
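Since the model is trained with WGAN-GP, the usual gradient penalty applies, here conditioned on the pose history. The sketch below is the generic penalty from the WGAN-GP literature adapted to pose sequences, not the authors' custom loss, and all shapes are assumptions.

```python
import torch

def gradient_penalty(critic, real_seq, fake_seq, prev_poses, lam=10.0):
    """real_seq, fake_seq: (B, T, D) future pose sequences; prev_poses: the
    conditioning history the critic sees alongside each sequence."""
    eps = torch.rand(real_seq.size(0), 1, 1, device=real_seq.device)
    interp = (eps * real_seq + (1 - eps) * fake_seq).requires_grad_(True)
    score = critic(prev_poses, interp)
    grads, = torch.autograd.grad(score.sum(), interp, create_graph=True)
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    # Penalize deviation of the critic's gradient norm from 1 on interpolants.
    return lam * ((grad_norm - 1.0) ** 2).mean()
```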
Detail-revealing Deep Video Super-resolution
Title | Detail-revealing Deep Video Super-resolution |
Authors | Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia |
Abstract | Previous CNN-based video super-resolution approaches need to align multiple frames to the reference. In this paper, we show that proper frame alignment and motion compensation is crucial for achieving high quality results. We accordingly propose a 'sub-pixel motion compensation' (SPMC) layer in a CNN framework. Analysis and experiments show the suitability of this layer in video SR. The final end-to-end, scalable CNN framework effectively incorporates the SPMC layer and fuses multiple frames to reveal image details. Our implementation can generate visually and quantitatively high-quality results, superior to the current state of the art, without the need for parameter tuning. |
Tasks | Image Super-Resolution, Motion Compensation, Super-Resolution, Video Super-Resolution |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02738v1 |
http://arxiv.org/pdf/1704.02738v1.pdf | |
PWC | https://paperswithcode.com/paper/detail-revealing-deep-video-super-resolution |
Repo | https://github.com/jiangsutx/SPMC_VideoSR |
Framework | tf |
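The essence of sub-pixel motion compensation is warping a neighboring frame toward the reference with estimated optical flow using differentiable bilinear sampling. The sketch below shows that warping step only (the SPMC layer in the paper also folds in the upscaling), with illustrative names.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) displacements in pixels (x, y)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                              # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                           # (B, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)
```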
Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales
Title | Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales |
Authors | Hendrik J. Weideman, Zachary M. Jablons, Jason Holmberg, Kiirsten Flynn, John Calambokidis, Reny B. Tyson, Jason B. Allen, Randall S. Wells, Krista Hupman, Kim Urian, Charles V. Stewart |
Abstract | We address the problem of identifying individual cetaceans from images showing the trailing edge of their fins. Given the trailing edge from an unknown individual, we produce a ranking of known individuals from a database. The nicks and notches along the trailing edge define an individual’s unique signature. We define a representation based on integral curvature that is robust to changes in viewpoint and pose, and captures the pattern of nicks and notches in a local neighborhood at multiple scales. We explore two ranking methods that use this representation. The first uses a dynamic programming time-warping algorithm to align two representations, and interprets the alignment cost as a measure of similarity. This algorithm also exploits learned spatial weights to downweight matches from regions of unstable curvature. The second interprets the representation as a feature descriptor. Feature keypoints are defined at the local extrema of the representation. Descriptors for the set of known individuals are stored in a tree structure, which allows us to perform queries given the descriptors from an unknown trailing edge. We evaluate the top-k accuracy on two real-world datasets to demonstrate the effectiveness of the curvature representation, achieving top-1 accuracy scores of approximately 95% and 80% for bottlenose dolphins and humpback whales, respectively. |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07785v1 |
http://arxiv.org/pdf/1708.07785v1.pdf | |
PWC | https://paperswithcode.com/paper/integral-curvature-representation-and |
Repo | https://github.com/omallo/kaggle-whale |
Framework | pytorch |
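The first ranking method aligns two curvature profiles with dynamic time warping and treats the accumulated alignment cost as a dissimilarity. A bare-bones version, without the learned spatial weights the paper uses, looks like this:

```python
import numpy as np

def dtw_cost(curv_a, curv_b):
    """curv_a, curv_b: 1-D arrays of integral curvature along a trailing edge."""
    n, m = len(curv_a), len(curv_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(curv_a[i - 1] - curv_b[j - 1])
            # Classic DTW recurrence: extend the cheapest of the three moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```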
Deep Learning Techniques for Music Generation – A Survey
Title | Deep Learning Techniques for Music Generation – A Survey |
Authors | Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet |
Abstract | This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: (1) Objective: what musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint), and for what destination and use, to be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file)? (2) Representation: what concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, or beat), in what format (e.g., MIDI, piano roll, or text), and how is the representation to be encoded (e.g., scalar, one-hot, or many-hot)? (3) Architecture: what type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)? (4) Challenge: what are the limitations and open challenges (e.g., variability, interactivity, and creativity)? (5) Strategy: how do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)? For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section includes some discussion and prospects. |
Tasks | Music Generation |
Published | 2017-09-05 |
URL | https://arxiv.org/abs/1709.01620v4 |
https://arxiv.org/pdf/1709.01620v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-techniques-for-music-generation |
Repo | https://github.com/thomount/AI_compose |
Framework | none |
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
Title | GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |
Authors | Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich |
Abstract | Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning. |
Tasks | |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02257v4 |
http://arxiv.org/pdf/1711.02257v4.pdf | |
PWC | https://paperswithcode.com/paper/gradnorm-gradient-normalization-for-adaptive |
Repo | https://github.com/hav4ik/Hydra |
Framework | pytorch |
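A single GradNorm update can be sketched as follows, under stated simplifications: gradient norms are measured on one shared parameter tensor, and the task-weight update uses plain gradient descent. Names and the surrounding training loop are illustrative, not the authors' implementation.

```python
import torch

def gradnorm_step(task_losses, init_losses, weights, shared_param, alpha=1.5, lr=0.025):
    """task_losses: list of scalar losses; init_losses: their values at step 0;
    weights: 1-D tensor of task weights with requires_grad=True;
    shared_param: the shared layer weight whose gradients are balanced."""
    norms = []
    for w_i, loss_i in zip(weights, task_losses):
        g, = torch.autograd.grad(w_i * loss_i, shared_param,
                                 retain_graph=True, create_graph=True)
        norms.append(g.norm())
    norms = torch.stack(norms)
    with torch.no_grad():
        # Relative inverse training rate: tasks that have improved least get
        # larger target gradient norms.
        ratios = torch.tensor([l.item() / l0 for l, l0 in zip(task_losses, init_losses)])
        target = norms.mean() * (ratios / ratios.mean()) ** alpha
    gradnorm_loss = (norms - target).abs().sum()
    w_grad, = torch.autograd.grad(gradnorm_loss, weights, retain_graph=True)
    with torch.no_grad():
        weights -= lr * w_grad
        weights *= len(weights) / weights.sum()   # renormalize to sum to #tasks
    return weights
```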
Mean Actor Critic
Title | Mean Actor Critic |
Authors | Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman |
Abstract | We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms. |
Tasks | Atari Games |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00503v2 |
http://arxiv.org/pdf/1709.00503v2.pdf | |
PWC | https://paperswithcode.com/paper/mean-actor-critic |
Repo | https://github.com/kavosh8/MAC |
Framework | tf |
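The gradient estimate MAC uses can be written directly as an actor loss that sums over every discrete action, weighting each critic value by the policy probability. The sketch below shows only that loss, with the actor and critic networks left as placeholders.

```python
import torch

def mac_policy_loss(action_logits, q_values):
    """action_logits: (B, A) from the policy network; q_values: (B, A) critic
    estimates, treated as fixed targets for the actor update."""
    probs = torch.softmax(action_logits, dim=-1)
    # Maximize E_{a ~ pi(.|s)}[Q(s, a)] summed over all actions, not just the
    # executed one; negate to obtain a loss to minimize.
    return -(probs * q_values.detach()).sum(dim=-1).mean()
```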
FreezeOut: Accelerate Training by Progressively Freezing Layers
Title | FreezeOut: Accelerate Training by Progressively Freezing Layers |
Authors | Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston |
Abstract | The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut |
Tasks | |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04983v2 |
http://arxiv.org/pdf/1706.04983v2.pdf | |
PWC | https://paperswithcode.com/paper/freezeout-accelerate-training-by |
Repo | https://github.com/ajbrock/FreezeOut |
Framework | pytorch |
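The freezing schedule itself is easy to sketch: each layer is assigned a freeze point between t0 and the end of training, and once training progress passes that point the layer's parameters stop receiving gradients. The linear spacing of freeze points below is a simplification, and the per-layer cosine learning-rate annealing the paper pairs with freezing is omitted.

```python
import torch.nn as nn

def apply_freezeout(layers, progress, t0=0.5):
    """layers: list of nn.Module blocks ordered from input to output;
    progress: fraction of the training run completed, in [0, 1];
    t0: fraction of training after which the first layer freezes."""
    n = len(layers)
    for i, layer in enumerate(layers):
        freeze_at = t0 + (1.0 - t0) * i / max(n - 1, 1)
        frozen = progress >= freeze_at
        for p in layer.parameters():
            # Frozen layers are excluded from the backward pass.
            p.requires_grad = not frozen
        if frozen:
            layer.eval()   # also stop updating batch-norm running statistics
```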
Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation
Title | Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation |
Authors | Guanghan Ning, Zhi Zhang, Zhihai He |
Abstract | Human pose estimation using deep neural networks aims to map input images with large variations into multiple body keypoints, which must satisfy a set of geometric constraints and inter-dependency imposed by the human body model. This is a very challenging nonlinear manifold learning process in a very high dimensional feature space. We believe that the deep neural network, which is inherently an algebraic computation system, is not the most efficient way to capture highly sophisticated human knowledge, for example the highly coupled geometric characteristics and inter-dependency between keypoints in human poses. In this work, we propose to explore how external knowledge can be effectively represented and injected into deep neural networks to guide their training process using learned projections that impose proper priors. Specifically, we use the stacked hourglass design and inception-resnet module to construct a fractal network to regress human pose images into heatmaps with no explicit graphical modeling. We encode external knowledge with visual features which are able to characterize the constraints of human body models and evaluate the fitness of intermediate network output. We then inject these external features into the neural network using a projection matrix learned with an auxiliary cost function. The effectiveness of the proposed inception-resnet module and the benefit of guided learning with knowledge projection are evaluated on two widely used benchmarks. Our approach achieves state-of-the-art performance on both datasets. |
Tasks | Pose Estimation |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02407v2 |
http://arxiv.org/pdf/1705.02407v2.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-guided-deep-fractal-neural-networks |
Repo | https://github.com/Guanghan/GNet-pose |
Framework | none |
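The knowledge-injection step can be pictured as a learned projection of external fitness features added back into the network's intermediate representation. The module below is a speculative, simplified rendering of that idea with assumed shapes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KnowledgeProjection(nn.Module):
    """Project external knowledge features into feature-map space and add them
    as a per-channel bias to the intermediate representation."""
    def __init__(self, knowledge_dim, feature_channels):
        super().__init__()
        self.proj = nn.Linear(knowledge_dim, feature_channels)

    def forward(self, features, knowledge):
        """features: (B, C, H, W) intermediate features; knowledge: (B, K)."""
        bias = self.proj(knowledge).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return features + bias
```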