Paper Group AWR 110
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?. Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs. RIDI: Robust IMU Double Integration. Neural Natural Language Inference Models Enhanced with External Knowled …
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
Title | Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning |
Authors | Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine |
Abstract | Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum. |
Tasks | |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06782v1 |
http://arxiv.org/pdf/1711.06782v1.pdf | |
PWC | https://paperswithcode.com/paper/leave-no-trace-learning-to-reset-for-safe-and |
Repo | https://github.com/brain-research/LeaveNoTrace |
Framework | none |
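To make the abort mechanism concrete, here is a minimal sketch of one forward attempt followed by a learned reset, assuming hypothetical `env`, `forward_agent`, and `reset_agent` objects (none of these names come from the authors' code): before each forward action, the reset policy's Q-value is checked and the attempt is aborted when it falls below a threshold.

```python
# Illustrative sketch only; env and the two agents are hypothetical stand-ins.
def attempt_with_reset(env, obs, forward_agent, reset_agent, q_min=0.3, max_steps=200):
    """One forward attempt followed by a learned reset, starting from the
    current observation obs; no manual environment reset is performed."""
    for _ in range(max_steps):
        action = forward_agent.act(obs)
        # Early abort: a low reset Q-value suggests the proposed action may
        # lead to a state the reset policy cannot undo.
        if reset_agent.q_value(obs, action) < q_min:
            break
        obs, reward, done, _ = env.step(action)
        if done:
            break
    # Reset phase: the learned reset policy tries to undo the attempt.
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(reset_agent.act(obs))
        if done:
            return obs, True    # reset succeeded autonomously
    return obs, False           # fall back to requesting a manual reset
```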
What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Title | What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? |
Authors | Alex Kendall, Yarin Gal |
Abstract | There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks. |
Tasks | Semantic Segmentation |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.04977v2 |
http://arxiv.org/pdf/1703.04977v2.pdf | |
PWC | https://paperswithcode.com/paper/what-uncertainties-do-we-need-in-bayesian |
Repo | https://github.com/pmorerio/dl-uncertainty |
Framework | tf |
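The learned loss attenuation the abstract mentions can be illustrated with the standard heteroscedastic regression loss, where the network predicts a per-pixel mean and log-variance. The sketch below is a generic PyTorch rendering of that idea, not the authors' TensorFlow code, and all variable names are illustrative.

```python
import torch

def heteroscedastic_loss(pred_mean, pred_log_var, target):
    # exp(-log_var) attenuates the squared residual where the model predicts
    # high observation noise; the +log_var term penalizes predicting huge
    # noise everywhere, so the attenuation is learned rather than trivial.
    residual = (pred_mean - target) ** 2
    return (0.5 * torch.exp(-pred_log_var) * residual + 0.5 * pred_log_var).mean()

# Toy usage: a depth-regression head would output two maps per pixel.
mean = torch.randn(4, 1, 32, 32, requires_grad=True)
log_var = torch.zeros(4, 1, 32, 32, requires_grad=True)
target = torch.randn(4, 1, 32, 32)
heteroscedastic_loss(mean, log_var, target).backward()
```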
Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs
Title | Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs |
Authors | Xing Di, Vishal M. Patel |
Abstract | Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) Synthesis of facial sketch from attributes using a CVAE architecture, (2) Enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) Synthesis of face from sketch using another GAN-based network. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed attribute-based three stage face synthesis method. |
Tasks | Face Generation |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00077v1 |
http://arxiv.org/pdf/1801.00077v1.pdf | |
PWC | https://paperswithcode.com/paper/face-synthesis-from-visual-attributes-via |
Repo | https://github.com/DetionDX/Attribute2Sketch2Face |
Framework | pytorch |
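A high-level sketch of how the three stages chain together, with each stage as a placeholder callable; the module names and signatures below are assumptions for illustration, not the authors' architecture.

```python
import torch

def synthesize_face(attributes, cvae_decoder, sketch_refiner, sketch_to_face, z_dim=64):
    """attributes: (B, A) attribute vector; the three callables stand in for
    the CVAE decoder and the two GAN generators described in the abstract."""
    z = torch.randn(attributes.size(0), z_dim)        # latent noise for the CVAE
    coarse_sketch = cvae_decoder(z, attributes)       # stage 1: attributes -> coarse sketch
    sharp_sketch = sketch_refiner(coarse_sketch)      # stage 2: GAN-based sketch sharpening
    face = sketch_to_face(sharp_sketch, attributes)   # stage 3: sketch -> face image
    return face
```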
RIDI: Robust IMU Double Integration
Title | RIDI: Robust IMU Double Integration |
Authors | Hang Yan, Qi Shan, Yasutaka Furukawa |
Abstract | This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations demonstrate that our algorithm achieves results surprisingly comparable to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research. |
Tasks | |
Published | 2017-12-25 |
URL | http://arxiv.org/abs/1712.09004v2 |
http://arxiv.org/pdf/1712.09004v2.pdf | |
PWC | https://paperswithcode.com/paper/ridi-robust-imu-double-integration |
Repo | https://github.com/higerra/ridi_imu |
Framework | none |
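The double-integration step can be sketched in a few lines of NumPy; the slowly-varying bias estimate below is a deliberate simplification of the paper's correction, and the array names are illustrative.

```python
import numpy as np

def double_integrate(acc, regressed_vel, dt):
    """acc: (N, 3) linear accelerations; regressed_vel: (N, 3) velocities
    predicted from the IMU history; dt: sample period in seconds."""
    raw_vel = np.cumsum(acc * dt, axis=0)
    # Treat the drift between integrated and regressed velocity as coming from
    # a slowly varying acceleration bias, and remove it.
    bias = np.gradient(raw_vel - regressed_vel, dt, axis=0)
    vel = np.cumsum((acc - bias) * dt, axis=0)
    pos = np.cumsum(vel * dt, axis=0)
    return pos
```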
Neural Natural Language Inference Models Enhanced with External Knowledge
Title | Neural Natural Language Inference Models Enhanced with External Knowledge |
Authors | Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei |
Abstract | Modeling natural language inference is a very challenging task. With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance. Although relatively large annotated data exist, can machines learn all the knowledge needed to perform natural language inference (NLI) from these data? If not, how can neural-network-based NLI models benefit from external knowledge, and how can we build NLI models to leverage it? In this paper, we enrich the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models improve neural NLI models, achieving state-of-the-art performance on the SNLI and MultiNLI datasets. |
Tasks | Natural Language Inference |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04289v3 |
http://arxiv.org/pdf/1711.04289v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-natural-language-inference-models |
Repo | https://github.com/lukecq1231/kim |
Framework | none |
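One common way external lexical knowledge enters a co-attention NLI model, in the spirit of this paper, is as a bias on the word-by-word alignment scores. The sketch below assumes a precomputed relation tensor (e.g., WordNet synonymy/antonymy/hypernymy indicators) and illustrative shapes; it is not the authors' exact formulation.

```python
import torch

def knowledge_enriched_attention(premise, hypothesis, relations, w):
    """premise: (P, d); hypothesis: (H, d); relations: (P, H, k) knowledge
    features per word pair; w: (k,) learned weights for those features."""
    scores = premise @ hypothesis.t()          # (P, H) content-based scores
    scores = scores + relations @ w            # bias alignment with knowledge
    attn_p2h = torch.softmax(scores, dim=1)    # premise word -> hypothesis words
    attn_h2p = torch.softmax(scores, dim=0)    # hypothesis word -> premise words
    aligned_h = attn_p2h @ hypothesis          # (P, d) soft-aligned hypothesis
    aligned_p = attn_h2p.t() @ premise         # (H, d) soft-aligned premise
    return aligned_h, aligned_p
```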
Hindsight Experience Replay
Title | Hindsight Experience Replay |
Authors | Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba |
Abstract | Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary, and therefore avoids the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task. |
Tasks | |
Published | 2017-07-05 |
URL | http://arxiv.org/abs/1707.01495v3 |
http://arxiv.org/pdf/1707.01495v3.pdf | |
PWC | https://paperswithcode.com/paper/hindsight-experience-replay |
Repo | https://github.com/sjYoondeltar/IQN_example |
Framework | tf |
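The core of the method is goal relabeling in the replay buffer. The sketch below implements the 'future' relabeling strategy from the paper in plain Python, with illustrative data structures (a list of transitions and a sparse reward function) rather than the authors' code.

```python
import random

def her_relabel(episode, reward_fn, k_future=4):
    """episode: list of (state, action, next_state, goal) tuples;
    reward_fn(next_state, goal) returns the sparse 0/1 reward."""
    replay = []
    for t, (s, a, s_next, g) in enumerate(episode):
        replay.append((s, a, reward_fn(s_next, g), s_next, g))
        # Hindsight: pretend a state achieved later in the episode was the goal,
        # so at least some stored transitions receive a non-zero reward.
        future = episode[t:]
        for _ in range(k_future):
            _, _, achieved, _ = random.choice(future)
            replay.append((s, a, reward_fn(s_next, achieved), s_next, achieved))
    return replay
```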
Grounded Language Learning in a Simulated 3D World
Title | Grounded Language Learning in a Simulated 3D World |
Authors | Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom |
Abstract | We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. The agent’s comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06551v2 |
http://arxiv.org/pdf/1706.06551v2.pdf | |
PWC | https://paperswithcode.com/paper/grounded-language-learning-in-a-simulated-3d |
Repo | https://github.com/SophiaAr/OpenAI-final-project |
Framework | tf |
HP-GAN: Probabilistic 3D human motion prediction via GAN
Title | HP-GAN: Probabilistic 3D human motion prediction via GAN |
Authors | Emad Barsoum, John Kender, Zicheng Liu |
Abstract | Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GAN), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is a real human motion. We test our algorithm on two of the largest skeleton datasets: NTURGB-D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input have a greater than 50% probability of being judged as a real human sequence. We will release all the code used in this paper on GitHub. |
Tasks | Autonomous Vehicles, Motion Estimation, motion prediction, Synthetic Data Generation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09561v1 |
http://arxiv.org/pdf/1711.09561v1.pdf | |
PWC | https://paperswithcode.com/paper/hp-gan-probabilistic-3d-human-motion |
Repo | https://github.com/ebarsoum/hpgan |
Framework | tf |
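Since the model is trained with WGAN-GP, the usual gradient penalty applies, here conditioned on the pose history. The sketch below is the generic penalty from the WGAN-GP literature adapted to pose sequences, not the authors' custom loss, and all shapes are assumptions.

```python
import torch

def gradient_penalty(critic, real_seq, fake_seq, prev_poses, lam=10.0):
    """real_seq, fake_seq: (B, T, D) future pose sequences; prev_poses: the
    conditioning history the critic sees alongside each sequence."""
    eps = torch.rand(real_seq.size(0), 1, 1, device=real_seq.device)
    interp = (eps * real_seq + (1 - eps) * fake_seq).requires_grad_(True)
    score = critic(prev_poses, interp)
    grads, = torch.autograd.grad(score.sum(), interp, create_graph=True)
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    # Penalize deviation of the critic's gradient norm from 1 on interpolants.
    return lam * ((grad_norm - 1.0) ** 2).mean()
```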
Detail-revealing Deep Video Super-resolution
Title | Detail-revealing Deep Video Super-resolution |
Authors | Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia |
Abstract | Previous CNN-based video super-resolution approaches need to align multiple frames to the reference. In this paper, we show that proper frame alignment and motion compensation is crucial for achieving high quality results. We accordingly propose a 'sub-pixel motion compensation' (SPMC) layer in a CNN framework. Analysis and experiments show the suitability of this layer in video SR. The final end-to-end, scalable CNN framework effectively incorporates the SPMC layer and fuses multiple frames to reveal image details. Our implementation can generate visually and quantitatively high-quality results, superior to the current state of the art, without the need for parameter tuning. |
Tasks | Image Super-Resolution, Motion Compensation, Super-Resolution, Video Super-Resolution |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02738v1 |
http://arxiv.org/pdf/1704.02738v1.pdf | |
PWC | https://paperswithcode.com/paper/detail-revealing-deep-video-super-resolution |
Repo | https://github.com/jiangsutx/SPMC_VideoSR |
Framework | tf |
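The essence of sub-pixel motion compensation is warping a neighboring frame toward the reference with estimated optical flow using differentiable bilinear sampling. The sketch below shows that warping step only (the SPMC layer in the paper also folds in the upscaling), with illustrative names.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) displacements in pixels (x, y)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                              # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                           # (B, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)
```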
Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales
Title | Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales |
Authors | Hendrik J. Weideman, Zachary M. Jablons, Jason Holmberg, Kiirsten Flynn, John Calambokidis, Reny B. Tyson, Jason B. Allen, Randall S. Wells, Krista Hupman, Kim Urian, Charles V. Stewart |
Abstract | We address the problem of identifying individual cetaceans from images showing the trailing edge of their fins. Given the trailing edge from an unknown individual, we produce a ranking of known individuals from a database. The nicks and notches along the trailing edge define an individual’s unique signature. We define a representation based on integral curvature that is robust to changes in viewpoint and pose, and captures the pattern of nicks and notches in a local neighborhood at multiple scales. We explore two ranking methods that use this representation. The first uses a dynamic programming time-warping algorithm to align two representations, and interprets the alignment cost as a measure of similarity. This algorithm also exploits learned spatial weights to downweight matches from regions of unstable curvature. The second interprets the representation as a feature descriptor. Feature keypoints are defined at the local extrema of the representation. Descriptors for the set of known individuals are stored in a tree structure, which allows us to perform queries given the descriptors from an unknown trailing edge. We evaluate the top-k accuracy on two real-world datasets to demonstrate the effectiveness of the curvature representation, achieving top-1 accuracy scores of approximately 95% and 80% for bottlenose dolphins and humpback whales, respectively. |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07785v1 |
http://arxiv.org/pdf/1708.07785v1.pdf | |
PWC | https://paperswithcode.com/paper/integral-curvature-representation-and |
Repo | https://github.com/omallo/kaggle-whale |
Framework | pytorch |
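The first ranking method aligns two curvature profiles with dynamic time warping and treats the accumulated alignment cost as a dissimilarity. A bare-bones version, without the learned spatial weights the paper uses, looks like this:

```python
import numpy as np

def dtw_cost(curv_a, curv_b):
    """curv_a, curv_b: 1-D arrays of integral curvature along a trailing edge."""
    n, m = len(curv_a), len(curv_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(curv_a[i - 1] - curv_b[j - 1])
            # Classic DTW recurrence: extend the cheapest of the three moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```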
Deep Learning Techniques for Music Generation – A Survey
Title | Deep Learning Techniques for Music Generation – A Survey |
Authors | Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet |
Abstract | This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: (1) Objective: what musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint), and for what destination and use, to be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file)? (2) Representation: what concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, or beat), in what format (e.g., MIDI, piano roll, or text), and how is the representation to be encoded (e.g., scalar, one-hot, or many-hot)? (3) Architecture: what type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)? (4) Challenge: what are the limitations and open challenges (e.g., variability, interactivity, and creativity)? (5) Strategy: how do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)? For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section includes some discussion and prospects. |
Tasks | Music Generation |
Published | 2017-09-05 |
URL | https://arxiv.org/abs/1709.01620v4 |
https://arxiv.org/pdf/1709.01620v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-techniques-for-music-generation |
Repo | https://github.com/thomount/AI_compose |
Framework | none |
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
Title | GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |
Authors | Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich |
Abstract | Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning. |
Tasks | |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02257v4 |
http://arxiv.org/pdf/1711.02257v4.pdf | |
PWC | https://paperswithcode.com/paper/gradnorm-gradient-normalization-for-adaptive |
Repo | https://github.com/hav4ik/Hydra |
Framework | pytorch |
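A single GradNorm update can be sketched as follows, under stated simplifications: gradient norms are measured on one shared parameter tensor, and the task-weight update uses plain gradient descent. Names and the surrounding training loop are illustrative, not the authors' implementation.

```python
import torch

def gradnorm_step(task_losses, init_losses, weights, shared_param, alpha=1.5, lr=0.025):
    """task_losses: list of scalar losses; init_losses: their values at step 0;
    weights: 1-D tensor of task weights with requires_grad=True;
    shared_param: the shared layer weight whose gradients are balanced."""
    norms = []
    for w_i, loss_i in zip(weights, task_losses):
        g, = torch.autograd.grad(w_i * loss_i, shared_param,
                                 retain_graph=True, create_graph=True)
        norms.append(g.norm())
    norms = torch.stack(norms)
    with torch.no_grad():
        # Relative inverse training rate: tasks that have improved least get
        # larger target gradient norms.
        ratios = torch.tensor([l.item() / l0 for l, l0 in zip(task_losses, init_losses)])
        target = norms.mean() * (ratios / ratios.mean()) ** alpha
    gradnorm_loss = (norms - target).abs().sum()
    w_grad, = torch.autograd.grad(gradnorm_loss, weights, retain_graph=True)
    with torch.no_grad():
        weights -= lr * w_grad
        weights *= len(weights) / weights.sum()   # renormalize to sum to #tasks
    return weights
```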
Mean Actor Critic
Title | Mean Actor Critic |
Authors | Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman |
Abstract | We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms. |
Tasks | Atari Games |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00503v2 |
http://arxiv.org/pdf/1709.00503v2.pdf | |
PWC | https://paperswithcode.com/paper/mean-actor-critic |
Repo | https://github.com/kavosh8/MAC |
Framework | tf |
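The gradient estimate MAC uses can be written directly as an actor loss that sums over every discrete action, weighting each critic value by the policy probability. The sketch below shows only that loss, with the actor and critic networks left as placeholders.

```python
import torch

def mac_policy_loss(action_logits, q_values):
    """action_logits: (B, A) from the policy network; q_values: (B, A) critic
    estimates, treated as fixed targets for the actor update."""
    probs = torch.softmax(action_logits, dim=-1)
    # Maximize E_{a ~ pi(.|s)}[Q(s, a)] summed over all actions, not just the
    # executed one; negate to obtain a loss to minimize.
    return -(probs * q_values.detach()).sum(dim=-1).mean()
```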
FreezeOut: Accelerate Training by Progressively Freezing Layers
Title | FreezeOut: Accelerate Training by Progressively Freezing Layers |
Authors | Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston |
Abstract | The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut |
Tasks | |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04983v2 |
http://arxiv.org/pdf/1706.04983v2.pdf | |
PWC | https://paperswithcode.com/paper/freezeout-accelerate-training-by |
Repo | https://github.com/ajbrock/FreezeOut |
Framework | pytorch |
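The freezing schedule itself is easy to sketch: each layer is assigned a freeze point between t0 and the end of training, and once training progress passes that point the layer's parameters stop receiving gradients. The linear spacing of freeze points below is a simplification, and the per-layer cosine learning-rate annealing the paper pairs with freezing is omitted.

```python
import torch.nn as nn

def apply_freezeout(layers, progress, t0=0.5):
    """layers: list of nn.Module blocks ordered from input to output;
    progress: fraction of the training run completed, in [0, 1];
    t0: fraction of training after which the first layer freezes."""
    n = len(layers)
    for i, layer in enumerate(layers):
        freeze_at = t0 + (1.0 - t0) * i / max(n - 1, 1)
        frozen = progress >= freeze_at
        for p in layer.parameters():
            # Frozen layers are excluded from the backward pass.
            p.requires_grad = not frozen
        if frozen:
            layer.eval()   # also stop updating batch-norm running statistics
```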
Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation
Title | Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation |
Authors | Guanghan Ning, Zhi Zhang, Zhihai He |
Abstract | Human pose estimation using deep neural networks aims to map input images with large variations into multiple body keypoints, which must satisfy a set of geometric constraints and inter-dependency imposed by the human body model. This is a very challenging nonlinear manifold learning process in a very high dimensional feature space. We believe that the deep neural network, which is inherently an algebraic computation system, is not the most efficient way to capture highly sophisticated human knowledge, for example the highly coupled geometric characteristics and inter-dependency between keypoints in human poses. In this work, we propose to explore how external knowledge can be effectively represented and injected into deep neural networks to guide their training process using learned projections that impose proper priors. Specifically, we use the stacked hourglass design and inception-resnet module to construct a fractal network to regress human pose images into heatmaps with no explicit graphical modeling. We encode external knowledge with visual features which are able to characterize the constraints of human body models and evaluate the fitness of intermediate network output. We then inject these external features into the neural network using a projection matrix learned with an auxiliary cost function. The effectiveness of the proposed inception-resnet module and the benefit of guided learning with knowledge projection are evaluated on two widely used benchmarks. Our approach achieves state-of-the-art performance on both datasets. |
Tasks | Pose Estimation |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02407v2 |
http://arxiv.org/pdf/1705.02407v2.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-guided-deep-fractal-neural-networks |
Repo | https://github.com/Guanghan/GNet-pose |
Framework | none |
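The knowledge-injection step can be pictured as a learned projection of external fitness features added back into the network's intermediate representation. The module below is a speculative, simplified rendering of that idea with assumed shapes, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KnowledgeProjection(nn.Module):
    """Project external knowledge features into feature-map space and add them
    as a per-channel bias to the intermediate representation."""
    def __init__(self, knowledge_dim, feature_channels):
        super().__init__()
        self.proj = nn.Linear(knowledge_dim, feature_channels)

    def forward(self, features, knowledge):
        """features: (B, C, H, W) intermediate features; knowledge: (B, K)."""
        bias = self.proj(knowledge).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return features + bias
```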