July 29, 2019

3259 words 16 mins read

Paper Group AWR 110

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?. Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs. RIDI: Robust IMU Double Integration. Neural Natural Language Inference Models Enhanced with External Knowled …

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

Title Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
Authors Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine
Abstract Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.
Tasks
Published 2017-11-18
URL http://arxiv.org/abs/1711.06782v1
PDF http://arxiv.org/pdf/1711.06782v1.pdf
PWC https://paperswithcode.com/paper/leave-no-trace-learning-to-reset-for-safe-and
Repo https://github.com/brain-research/LeaveNoTrace
Framework none
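
The abort mechanism the abstract describes is easy to state in code. Below is a minimal, hypothetical sketch (the names `env`, `forward_policy`, `reset_policy`, and `reset_q_value` are illustrative stand-ins, not the API of the linked repo): before committing to a forward action, the agent consults the reset policy's value estimate and hands control to the reset policy when that value drops below a threshold.

```python
# Minimal sketch of the early-abort idea; all callables are stand-ins.

def safe_episode(env, forward_policy, reset_policy, reset_q_value,
                 q_min=0.5, max_steps=200):
    """Run one forward attempt, aborting to the reset policy whenever the
    reset value function predicts the next state may be hard to undo."""
    state = env.observation()
    for _ in range(max_steps):
        action = forward_policy(state)
        # Uncertainty-aware safety abort: if the reset policy does not
        # expect to recover from the proposed action, stop the attempt.
        if reset_q_value(state, action) < q_min:
            break
        state, reward, done = env.step(action)
        if done:
            break
    # Hand control to the learned reset policy instead of a manual reset.
    while not env.is_reset(state):
        state, _, _ = env.step(reset_policy(state))
    return state
```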

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

Title What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Authors Alex Kendall, Yarin Gal
Abstract There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.
Tasks Semantic Segmentation
Published 2017-03-15
URL http://arxiv.org/abs/1703.04977v2
PDF http://arxiv.org/pdf/1703.04977v2.pdf
PWC https://paperswithcode.com/paper/what-uncertainties-do-we-need-in-bayesian
Repo https://github.com/pmorerio/dl-uncertainty
Framework tf
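
The "learned attenuation" mentioned at the end of the abstract has a compact regression form: the network predicts a per-pixel log variance alongside its output, and noisy pixels are down-weighted accordingly. A small PyTorch sketch follows; the tensor shapes and the depth-regression framing are illustrative.

```python
import torch

def heteroscedastic_regression_loss(y_pred, log_var, y_true):
    """Regression loss with learned aleatoric attenuation.

    Noisy pixels are down-weighted by exp(-log_var), while the log_var/2
    term keeps the model from predicting infinite uncertainty everywhere.
    """
    precision = torch.exp(-log_var)
    return (0.5 * precision * (y_true - y_pred) ** 2 + 0.5 * log_var).mean()

# Toy usage with random tensors standing in for a depth-regression head.
y_pred = torch.randn(4, 1, 8, 8, requires_grad=True)
log_var = torch.zeros(4, 1, 8, 8, requires_grad=True)
y_true = torch.randn(4, 1, 8, 8)
loss = heteroscedastic_regression_loss(y_pred, log_var, y_true)
loss.backward()
```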

Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs

Title Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs
Authors Xing Di, Vishal M. Patel
Abstract Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) Synthesis of facial sketch from attributes using a CVAE architecture, (2) Enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) Synthesis of face from sketch using another GAN-based network. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed attribute-based three stage face synthesis method.
Tasks Face Generation
Published 2017-12-30
URL http://arxiv.org/abs/1801.00077v1
PDF http://arxiv.org/pdf/1801.00077v1.pdf
PWC https://paperswithcode.com/paper/face-synthesis-from-visual-attributes-via
Repo https://github.com/DetionDX/Attribute2Sketch2Face
Framework pytorch

RIDI: Robust IMU Double Integration

Title RIDI: Robust IMU Double Integration
Authors Hang Yan, Qi Shan, Yasutaka Furukawa
Abstract This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations demonstrate that our algorithm achieves results surprisingly comparable to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research.
Tasks
Published 2017-12-25
URL http://arxiv.org/abs/1712.09004v2
PDF http://arxiv.org/pdf/1712.09004v2.pdf
PWC https://paperswithcode.com/paper/ridi-robust-imu-double-integration
Repo https://github.com/higerra/ridi_imu
Framework none
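
The double-integration step and the role of the regressed velocities can be illustrated with a deliberately crude 1-D sketch. This is not the authors' correction scheme (the paper estimates the bias more carefully); it only shows how a velocity regressed from IMU history can be used to remove the low-frequency acceleration bias that would otherwise grow quadratically under double integration.

```python
import numpy as np

def double_integrate(accel, dt, v0=0.0, p0=0.0):
    """Integrate acceleration twice to get velocity and position (1-D for clarity)."""
    vel = v0 + np.cumsum(accel) * dt
    pos = p0 + np.cumsum(vel) * dt
    return vel, pos

def correct_acceleration(accel, regressed_vel, dt):
    """Crude stand-in for the RIDI correction: the derivative of the gap
    between the naively integrated velocity and the velocity regressed
    from IMU history estimates the (slowly varying) acceleration bias,
    which is then subtracted before double integration."""
    vel_raw, _ = double_integrate(accel, dt)
    bias = np.gradient(vel_raw - regressed_vel, dt)
    return accel - bias

# Toy data: the phone moves at a steady 1 m/s but the accelerometer
# reports a constant 0.05 m/s^2 bias.
dt, n = 0.01, 1000
accel = np.full(n, 0.05)
regressed_vel = np.ones(n)   # what a trained velocity regressor might output
corrected = correct_acceleration(accel, regressed_vel, dt)
vel, pos = double_integrate(corrected, dt, v0=1.0)
```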

Neural Natural Language Inference Models Enhanced with External Knowledge

Title Neural Natural Language Inference Models Enhanced with External Knowledge
Authors Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei
Abstract Modeling natural language inference is a very challenging task. With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance. Although relatively large annotated corpora exist, can machines learn all the knowledge needed to perform natural language inference (NLI) from these data? If not, how can neural-network-based NLI models benefit from external knowledge, and how can we build NLI models to leverage it? In this paper, we enrich state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models improve neural NLI models and achieve state-of-the-art performance on the SNLI and MultiNLI datasets.
Tasks Natural Language Inference
Published 2017-11-12
URL http://arxiv.org/abs/1711.04289v3
PDF http://arxiv.org/pdf/1711.04289v3.pdf
PWC https://paperswithcode.com/paper/neural-natural-language-inference-models
Repo https://github.com/lukecq1231/kim
Framework none
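
One way to read "enriching neural NLI models with external knowledge" is as an extra term in the co-attention between premise and hypothesis. The sketch below is an illustrative simplification in PyTorch, assuming precomputed lexical-relation features (e.g., WordNet synonym/antonym/hypernym indicators) for every token pair; the scalar weight `w` and the fusion-by-addition choice are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def knowledge_enriched_attention(premise, hypothesis, relation_feats, w=1.0):
    """Co-attention between premise and hypothesis token encodings, biased
    by external lexical-relation features.

    premise:        (B, Lp, D) encoded premise tokens
    hypothesis:     (B, Lh, D) encoded hypothesis tokens
    relation_feats: (B, Lp, Lh) precomputed relation score per token pair
    """
    scores = torch.bmm(premise, hypothesis.transpose(1, 2))   # (B, Lp, Lh)
    scores = scores + w * relation_feats                      # knowledge bias
    attended_hyp = torch.bmm(F.softmax(scores, dim=2), hypothesis)
    attended_prem = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), premise)
    return attended_hyp, attended_prem
```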

Hindsight Experience Replay

Title Hindsight Experience Replay
Authors Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba
Abstract Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary, and therefore avoids the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task.
Tasks
Published 2017-07-05
URL http://arxiv.org/abs/1707.01495v3
PDF http://arxiv.org/pdf/1707.01495v3.pdf
PWC https://paperswithcode.com/paper/hindsight-experience-replay
Repo https://github.com/sjYoondeltar/IQN_example
Framework tf
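
The core of Hindsight Experience Replay is goal relabelling: every transition is stored again with its goal replaced by a state the agent actually reached later, so a binary reward that is almost always zero becomes informative. A minimal Python sketch of the "future" relabelling strategy (the tuple layout and function names are illustrative, not the paper's code):

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Relabel an episode with the 'future' strategy.

    `episode` is a list of (state, action, next_state, goal, achieved_goal)
    tuples; `reward_fn(achieved, goal)` returns 1.0 on success, else 0.0.
    """
    replay = []
    for t, (s, a, s2, g, ag) in enumerate(episode):
        replay.append((s, a, s2, g, reward_fn(ag, g)))
        future = episode[t:]
        for _ in range(k):
            # Pretend a later achieved state was the goal all along.
            _, _, _, _, future_ag = random.choice(future)
            replay.append((s, a, s2, future_ag, reward_fn(ag, future_ag)))
    return replay
```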

Grounded Language Learning in a Simulated 3D World

Title Grounded Language Learning in a Simulated 3D World
Authors Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom
Abstract We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. The agent’s comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world.
Tasks
Published 2017-06-20
URL http://arxiv.org/abs/1706.06551v2
PDF http://arxiv.org/pdf/1706.06551v2.pdf
PWC https://paperswithcode.com/paper/grounded-language-learning-in-a-simulated-3d
Repo https://github.com/SophiaAr/OpenAI-final-project
Framework tf

HP-GAN: Probabilistic 3D human motion prediction via GAN

Title HP-GAN: Probabilistic 3D human motion prediction via GAN
Authors Emad Barsoum, John Kender, Zicheng Liu
Abstract Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GAN), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is a real human motion. We test our algorithm on two of the largest skeleton datasets: NTURGB-D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input have more than 50% probabilities of being judged as a real human sequence. We will release all the code used in this paper to Github.
Tasks Autonomous Vehicles, Motion Estimation, motion prediction, Synthetic Data Generation
Published 2017-11-27
URL http://arxiv.org/abs/1711.09561v1
PDF http://arxiv.org/pdf/1711.09561v1.pdf
PWC https://paperswithcode.com/paper/hp-gan-probabilistic-3d-human-motion
Repo https://github.com/ebarsoum/hpgan
Framework tf
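
The sampling behaviour described in the abstract, multiple plausible futures from one observed sequence, comes from conditioning the generator on both the past poses and a random vector z. A toy PyTorch stand-in (layer sizes, the GRU encoder/decoder, and joint counts are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class PoseFutureGenerator(nn.Module):
    """Toy conditional generator: encode the past pose sequence, concatenate
    a noise vector z, and decode a future pose sequence."""
    def __init__(self, n_joints=25, hidden=128, z_dim=32, horizon=30):
        super().__init__()
        self.encoder = nn.GRU(n_joints * 3, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden + z_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)
        self.horizon = horizon

    def forward(self, past, z):
        _, h = self.encoder(past)                 # summarize the past poses
        cond = torch.cat([h[-1], z], dim=-1)      # condition on noise as well
        steps = cond.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.decoder(steps)
        return self.head(out)                     # (B, horizon, n_joints * 3)

# Sampling several plausible futures from the same 10-frame input.
gen = PoseFutureGenerator()
past = torch.randn(1, 10, 25 * 3)
futures = [gen(past, torch.randn(1, 32)) for _ in range(5)]
```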

Detail-revealing Deep Video Super-resolution

Title Detail-revealing Deep Video Super-resolution
Authors Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia
Abstract Previous CNN-based video super-resolution approaches need to align multiple frames to the reference. In this paper, we show that proper frame alignment and motion compensation are crucial for achieving high-quality results. We accordingly propose a 'sub-pixel motion compensation' (SPMC) layer in a CNN framework. Analysis and experiments show the suitability of this layer in video SR. The final end-to-end, scalable CNN framework effectively incorporates the SPMC layer and fuses multiple frames to reveal image details. Our implementation generates visually and quantitatively high-quality results, superior to the current state of the art, without the need for parameter tuning.
Tasks Image Super-Resolution, Motion Compensation, Super-Resolution, Video Super-Resolution
Published 2017-04-10
URL http://arxiv.org/abs/1704.02738v1
PDF http://arxiv.org/pdf/1704.02738v1.pdf
PWC https://paperswithcode.com/paper/detail-revealing-deep-video-super-resolution
Repo https://github.com/jiangsutx/SPMC_VideoSR
Framework tf
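
The motion-compensation step underlying the SPMC layer is backward warping of a neighboring frame toward the reference using a flow field. The PyTorch sketch below shows only that warping; the actual SPMC layer additionally places the samples onto the high-resolution grid, which is omitted here.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """Backward-warp a frame with a per-pixel flow field.

    frame: (B, C, H, W), flow: (B, 2, H, W) in pixels (x, y displacement).
    """
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)           # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

frame = torch.rand(1, 3, 32, 32)
flow = torch.zeros(1, 2, 32, 32)   # zero flow: output equals the input frame
warped = warp_with_flow(frame, flow)
```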

Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales

Title Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales
Authors Hendrik J. Weideman, Zachary M. Jablons, Jason Holmberg, Kiirsten Flynn, John Calambokidis, Reny B. Tyson, Jason B. Allen, Randall S. Wells, Krista Hupman, Kim Urian, Charles V. Stewart
Abstract We address the problem of identifying individual cetaceans from images showing the trailing edge of their fins. Given the trailing edge from an unknown individual, we produce a ranking of known individuals from a database. The nicks and notches along the trailing edge define an individual’s unique signature. We define a representation based on integral curvature that is robust to changes in viewpoint and pose, and captures the pattern of nicks and notches in a local neighborhood at multiple scales. We explore two ranking methods that use this representation. The first uses a dynamic programming time-warping algorithm to align two representations, and interprets the alignment cost as a measure of similarity. This algorithm also exploits learned spatial weights to downweight matches from regions of unstable curvature. The second interprets the representation as a feature descriptor. Feature keypoints are defined at the local extrema of the representation. Descriptors for the set of known individuals are stored in a tree structure, which allows us to perform queries given the descriptors from an unknown trailing edge. We evaluate the top-k accuracy on two real-world datasets to demonstrate the effectiveness of the curvature representation, achieving top-1 accuracy scores of approximately 95% and 80% for bottlenose dolphins and humpback whales, respectively.
Tasks
Published 2017-08-25
URL http://arxiv.org/abs/1708.07785v1
PDF http://arxiv.org/pdf/1708.07785v1.pdf
PWC https://paperswithcode.com/paper/integral-curvature-representation-and
Repo https://github.com/omallo/kaggle-whale
Framework pytorch
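
The first ranking method, dynamic-time-warping alignment of curvature profiles, is straightforward to sketch. The NumPy example below computes a plain DTW cost between two 1-D curvature signatures and ranks hypothetical known individuals by it; the learned spatial weights the paper uses to down-weight unstable regions are omitted.

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping alignment cost between two 1-D curvature profiles."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy query: rank two known individuals by alignment cost to an unknown edge.
query = np.sin(np.linspace(0, 3, 200))
known = {"dolphin_a": np.sin(np.linspace(0, 3, 180)),
         "dolphin_b": np.cos(np.linspace(0, 3, 180))}
ranking = sorted(known, key=lambda name: dtw_cost(query, known[name]))
```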

Deep Learning Techniques for Music Generation – A Survey

Title Deep Learning Techniques for Music Generation – A Survey
Authors Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet
Abstract This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.
Tasks Music Generation
Published 2017-09-05
URL https://arxiv.org/abs/1709.01620v4
PDF https://arxiv.org/pdf/1709.01620v4.pdf
PWC https://paperswithcode.com/paper/deep-learning-techniques-for-music-generation
Repo https://github.com/thomount/AI_compose
Framework none

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

Title GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
Authors Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich
Abstract Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02257v4
PDF http://arxiv.org/pdf/1711.02257v4.pdf
PWC https://paperswithcode.com/paper/gradnorm-gradient-normalization-for-adaptive
Repo https://github.com/hav4ik/Hydra
Framework pytorch
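
A rough sketch of the GradNorm update: measure each task's gradient norm on the shared parameters, compare it to a common target scaled by the task's relative inverse training rate raised to the power alpha, and adjust the task weights to close the gap. The sign-based update and the single shared tensor below are simplifications of the paper's procedure, kept only to show the mechanics.

```python
import torch

def gradnorm_weights(losses, initial_losses, weights, shared_param,
                     alpha=1.5, lr=0.025):
    """One simplified GradNorm-style step over the task weights.

    losses:         list of current task losses (tensors with a graph)
    initial_losses: list of the losses recorded at the start of training
    weights:        list of current task weights (floats)
    shared_param:   a shared parameter tensor the tasks backprop through
    """
    # Gradient norm of each weighted task loss on the shared parameters.
    norms = torch.stack([
        torch.autograd.grad(w * loss, shared_param, retain_graph=True)[0].norm()
        for w, loss in zip(weights, losses)
    ])
    with torch.no_grad():
        train_rates = torch.stack([l / l0 for l, l0 in zip(losses, initial_losses)])
        rel_rates = train_rates / train_rates.mean()
        target = norms.mean() * rel_rates.pow(alpha)
        # Nudge each weight toward its target gradient norm (sign update is
        # a simplification of differentiating |G_i - target_i| w.r.t. w_i).
        new_w = [w - lr * torch.sign(n - t) for w, n, t in zip(weights, norms, target)]
        total = sum(new_w)
        # Renormalize so the weights sum to the number of tasks.
        new_w = [len(new_w) * w / total for w in new_w]
    return new_w
```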

Mean Actor Critic

Title Mean Actor Critic
Authors Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman
Abstract We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
Tasks Atari Games
Published 2017-09-01
URL http://arxiv.org/abs/1709.00503v2
PDF http://arxiv.org/pdf/1709.00503v2.pdf
PWC https://paperswithcode.com/paper/mean-actor-critic
Repo https://github.com/kavosh8/MAC
Framework tf
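
The defining change in Mean Actor-Critic is in the policy objective: rather than weighting the log-probability of the sampled action by its return, the actor maximizes the critic's value averaged over all actions under the current policy. A minimal PyTorch sketch (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def mac_policy_loss(logits, q_values):
    """Mean Actor-Critic policy objective.

    logits:   (B, A) unnormalized action preferences from the actor
    q_values: (B, A) critic estimates for every action (treated as fixed)
    """
    probs = F.softmax(logits, dim=-1)
    # Expected action value under the policy, averaged over the batch.
    return -(probs * q_values.detach()).sum(dim=-1).mean()

# Toy batch: 3 states, 4 discrete actions.
logits = torch.randn(3, 4, requires_grad=True)
q_values = torch.randn(3, 4)
loss = mac_policy_loss(logits, q_values)
loss.backward()
```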

FreezeOut: Accelerate Training by Progressively Freezing Layers

Title FreezeOut: Accelerate Training by Progressively Freezing Layers
Authors Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston
Abstract The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut
Tasks
Published 2017-06-15
URL http://arxiv.org/abs/1706.04983v2
PDF http://arxiv.org/pdf/1706.04983v2.pdf
PWC https://paperswithcode.com/paper/freezeout-accelerate-training-by
Repo https://github.com/ajbrock/FreezeOut
Framework pytorch
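
The mechanism is a per-layer schedule: the earliest layers stop receiving gradient updates first, and later layers follow as training progresses. The sketch below is a simplified reading of the paper's schedule (the paper anneals per-layer learning rates to zero rather than toggling `requires_grad`, and reports a cubic variant of the cutoffs working best).

```python
import torch.nn as nn

def apply_freezeout(layers, progress, t0=0.5, cubic=True):
    """Freeze layers one by one as training progresses, earliest first.

    Each layer i of L stops training once `progress` (fraction of the run
    completed) passes its cutoff t_i, which ramps from t0 up to 1.0.
    """
    L = len(layers)
    for i, layer in enumerate(layers):
        t_i = t0 + (1.0 - t0) * i / max(L - 1, 1)
        if cubic:
            t_i = t_i ** 3          # cubic variant: earlier layers freeze sooner
        frozen = progress >= t_i
        for p in layer.parameters():
            p.requires_grad = not frozen
        if frozen:
            layer.eval()            # also stop updating batch-norm statistics

# Example: halfway through training, the earliest blocks are already frozen.
blocks = [nn.Linear(8, 8) for _ in range(5)]
apply_freezeout(blocks, progress=0.5)
```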

Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation

Title Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation
Authors Guanghan Ning, Zhi Zhang, Zhihai He
Abstract Human pose estimation using deep neural networks aims to map input images with large variations into multiple body keypoints, which must satisfy a set of geometric constraints and inter-dependencies imposed by the human body model. This is a very challenging nonlinear manifold learning process in a very high-dimensional feature space. We believe that the deep neural network, which is inherently an algebraic computation system, is not the most efficient way to capture highly sophisticated human knowledge, for example the highly coupled geometric characteristics and interdependence between keypoints in human poses. In this work, we explore how external knowledge can be effectively represented and injected into deep neural networks to guide their training process using learned projections that impose proper priors. Specifically, we use the stacked hourglass design and the inception-resnet module to construct a fractal network that regresses human pose images into heatmaps with no explicit graphical modeling. We encode external knowledge with visual features which are able to characterize the constraints of human body models and evaluate the fitness of intermediate network output. We then inject these external features into the neural network using a projection matrix learned with an auxiliary cost function. The effectiveness of the proposed inception-resnet module and the benefit of guided learning with knowledge projection are evaluated on two widely used benchmarks. Our approach achieves state-of-the-art performance on both datasets.
Tasks Pose Estimation
Published 2017-05-05
URL http://arxiv.org/abs/1705.02407v2
PDF http://arxiv.org/pdf/1705.02407v2.pdf
PWC https://paperswithcode.com/paper/knowledge-guided-deep-fractal-neural-networks
Repo https://github.com/Guanghan/GNet-pose
Framework none
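
The knowledge-injection step can be pictured as a learned projection of external, body-model-derived features added into an intermediate feature map of the pose network. The PyTorch sketch below is an illustrative simplification: the dimensions, fusion by addition, and the omission of the auxiliary cost used to learn the projection are all assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class KnowledgeProjection(nn.Module):
    """Map external knowledge features through a learned projection and add
    them to an intermediate feature map (the auxiliary cost the paper uses
    to learn the projection is omitted here)."""
    def __init__(self, ext_dim=64, feat_channels=256):
        super().__init__()
        self.project = nn.Linear(ext_dim, feat_channels)

    def forward(self, features, external):
        # features: (B, C, H, W), external: (B, ext_dim)
        injected = self.project(external)[:, :, None, None]
        return features + injected

inject = KnowledgeProjection()
feats = torch.randn(2, 256, 16, 16)
ext = torch.randn(2, 64)
fused = inject(feats, ext)
```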