Paper Group ANR 398
An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms
Title | An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms |
Authors | Sumit K. Mandal, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande, Umit Y. Ogras |
Abstract | Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, the systems-on-chip (SoCs) at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal since many potential new applications are unknown at design-time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions. |
Tasks | Imitation Learning |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09526v1 |
https://arxiv.org/pdf/2003.09526v1.pdf | |
PWC | https://paperswithcode.com/paper/an-energy-aware-online-learning-framework-for |
Repo | |
Framework | |
Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image
Title | Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image |
Authors | Taewon Kim, Yeseong Park, Youngbin Park, Il Hong Suh |
Abstract | For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment. |
Tasks | Representation Learning, Robotic Grasping |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.11903v1 |
https://arxiv.org/pdf/2002.11903v1.pdf | |
PWC | https://paperswithcode.com/paper/acceleration-of-actor-critic-deep |
Repo | |
Framework | |
Sparse Graphical Memory for Robust Planning
Title | Sparse Graphical Memory for Robust Planning |
Authors | Michael Laskin, Scott Emmons, Ajay Jain, Thanard Kurutach, Pieter Abbeel, Deepak Pathak |
Abstract | To operate effectively in the real world, artificial agents must act from raw sensory input such as images and achieve diverse goals across long time-horizons. On the one hand, recent strides in deep reinforcement and imitation learning have demonstrated impressive ability to learn goal-conditioned policies from high-dimensional image input, though only for short-horizon tasks. On the other hand, classical graphical methods like A* search are able to solve long-horizon tasks, but assume that the graph structure is abstracted away from raw sensory input and can only be constructed with task-specific priors. We wish to combine the strengths of deep learning and classical planning to solve long-horizon tasks from raw sensory input. To this end, we introduce Sparse Graphical Memory (SGM), a new data structure that stores observations and feasible transitions in a sparse memory. SGM can be combined with goal-conditioned RL or imitative agents to solve long-horizon tasks across a diverse set of domains. We show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks. Project video and code are available at https://mishalaskin.github.io/sgm/ |
Tasks | Imitation Learning, Visual Navigation |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06417v1 |
https://arxiv.org/pdf/2003.06417v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-graphical-memory-for-robust-planning |
Repo | |
Framework | |
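The abstract above describes a sparse memory of observations and feasible transitions that a planner can search. The following is a minimal sketch of that idea, not the authors' implementation: observations are stood in for by 2-D points, the `merge_radius` and `edge_radius` thresholds are assumptions (the abstract does not state the sparsification criterion), and planning uses Dijkstra's algorithm rather than A* search.

```python
import heapq
import math


class SparseGraphMemory:
    """Toy sparse memory: stores an observation only if no stored node covers it."""

    def __init__(self, merge_radius, edge_radius):
        self.merge_radius = merge_radius  # assumption: closer than this counts as redundant
        self.edge_radius = edge_radius    # assumption: closer than this counts as a feasible transition
        self.nodes = []                   # stored observations (2-D points in this sketch)
        self.edges = {}                   # node index -> {neighbor index: cost}

    def add(self, obs):
        """Store obs unless an existing node already covers it; return its node index."""
        for i, node in enumerate(self.nodes):
            if math.dist(obs, node) < self.merge_radius:
                return i
        idx = len(self.nodes)
        self.nodes.append(obs)
        self.edges[idx] = {}
        for j in range(idx):
            d = math.dist(obs, self.nodes[j])
            if d <= self.edge_radius:
                self.edges[idx][j] = d
                self.edges[j][idx] = d
        return idx

    def plan(self, start, goal):
        """Dijkstra over stored nodes; returns a list of node indices, or None."""
        best = {start: 0.0}
        parent = {start: None}
        frontier = [(0.0, start)]
        while frontier:
            cost, node = heapq.heappop(frontier)
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            if cost > best[node]:
                continue  # stale queue entry
            for nxt, weight in self.edges[node].items():
                new_cost = cost + weight
                if new_cost < best.get(nxt, math.inf):
                    best[nxt] = new_cost
                    parent[nxt] = node
                    heapq.heappush(frontier, (new_cost, nxt))
        return None


memory = SparseGraphMemory(merge_radius=0.5, edge_radius=1.5)
observations = [(0, 0), (0.1, 0), (1, 0), (2, 0), (2.1, 0.1), (3, 0)]
ids = [memory.add(p) for p in observations]
path = memory.plan(ids[0], ids[-1])
```

In this sketch the waypoints returned by `plan` would be handed one at a time to a short-horizon goal-conditioned policy, which is how the abstract describes combining the memory with an RL or imitative agent.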
Control of the Final-Phase of Closed-Loop Visual Grasping using Image-Based Visual Servoing
Title | Control of the Final-Phase of Closed-Loop Visual Grasping using Image-Based Visual Servoing |
Authors | Jesse Haviland, Feras Dayoub, Peter Corke |
Abstract | This paper considers the final approach phase of visual-closed-loop grasping where the RGB-D camera is no longer able to provide valid depth information. Many current robotic grasping controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor’s minimum object distance is violated just before grasping. To overcome this we propose the use of image-based visual servoing (IBVS) to guide the robot to the object-relative grasp pose using camera RGB information. IBVS robustly moves the camera to a goal pose defined implicitly in terms of an image-plane feature configuration. In this work, the goal image feature coordinates are predicted from RGB-D data to enable RGB-only tracking once depth data becomes unavailable – this enables more reliable grasping of previously unseen moving objects. Experimental results are provided. |
Tasks | Robotic Grasping |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05650v2 |
https://arxiv.org/pdf/2001.05650v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-target-feature-configuration-of |
Repo | |
Framework | |
Robustness to Programmable String Transformations via Augmented Abstract Training
Title | Robustness to Programmable String Transformations via Augmented Abstract Training |
Authors | Yuhao Zhang, Aws Albarghouthi, Loris D’Antoni |
Abstract | Deep neural networks for natural language processing tasks are vulnerable to adversarial input perturbations. In this paper, we present a versatile language for programmatically specifying string transformations – e.g., insertions, deletions, substitutions, swaps, etc. – that are relevant to the task at hand. We then present an approach to adversarially training models that are robust to such user-defined string transformations. Our approach combines the advantages of search-based techniques for adversarial training with abstraction-based techniques. Specifically, we show how to decompose a set of user-defined string transformations into two component specifications, one that benefits from search and another from abstraction. We use our technique to train models on the AG and SST2 datasets and show that the resulting models are robust to combinations of user-defined transformations mimicking spelling mistakes and other meaning-preserving transformations. |
Tasks | |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09579v1 |
https://arxiv.org/pdf/2002.09579v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-to-programmable-string |
Repo | |
Framework | |
Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations
Title | Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations |
Authors | Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei |
Abstract | Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure, where demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen start and goal state combinations. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections to have the capacity to compose behaviors from different demonstration trajectories together. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in the real world. Additional results and videos are available at https://sites.google.com/view/gti2020/ . |
Tasks | Imitation Learning |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06085v1 |
https://arxiv.org/pdf/2003.06085v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-generalize-across-long-horizon |
Repo | |
Framework | |
RADIOGAN: Deep Convolutional Conditional Generative adversarial Network To Generate PET Images
Title | RADIOGAN: Deep Convolutional Conditional Generative adversarial Network To Generate PET Images |
Authors | Amine Amyar, Su Ruan, Pierre Vera, Pierre Decazes, Romain Modzelewski |
Abstract | One of the major challenges in medical imaging is the lack of data. Classical data augmentation methods have proven useful but remain limited because of the huge variation in images. Generative adversarial networks (GANs) are a promising way to address this problem; however, it is challenging to train one model to generate different classes of lesions. In this paper, we propose a deep convolutional conditional generative adversarial network to generate maximum intensity projection (MIP) positron emission tomography (PET) images, which are 2D images that represent a 3D volume for fast interpretation, conditioned on different lesion classes or non-lesion (normal). The advantage of our proposed method is that a single model is capable of generating different classes of lesions when trained on a small sample size for each class, and it shows very promising results. In addition, we show that a walk through the latent space can be used as a tool to evaluate the generated images. |
Tasks | Data Augmentation |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08663v1 |
https://arxiv.org/pdf/2003.08663v1.pdf | |
PWC | https://paperswithcode.com/paper/radiogan-deep-convolutional-conditional |
Repo | |
Framework | |
Provable Representation Learning for Imitation Learning via Bi-level Optimization
Title | Provable Representation Learning for Imitation Learning via Bi-level Optimization |
Authors | Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi |
Abstract | A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts’ trajectories are available. We formulate representation learning as a bi-level optimization problem where the “outer” optimization tries to learn the joint representation and the “inner” optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory. |
Tasks | Imitation Learning, Representation Learning |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10544v1 |
https://arxiv.org/pdf/2002.10544v1.pdf | |
PWC | https://paperswithcode.com/paper/provable-representation-learning-for-1 |
Repo | |
Framework | |
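The bi-level structure described in the abstract above can be written compactly. The notation here is an assumption for illustration and may differ from the paper's: $\phi$ is the shared representation, $\pi_t$ the task-specific policy for expert $t$, $\ell_t$ the imitation loss on that expert's trajectories, and $T$ the number of experts.

```latex
\min_{\phi \in \Phi} \; \frac{1}{T} \sum_{t=1}^{T} \; \min_{\pi_t \in \Pi} \; \ell_t\!\left(\pi_t \circ \phi\right)
```

The outer minimization over $\phi$ learns the joint representation, and each inner minimization over $\pi_t$ fits the task-specific parameters on top of it.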
Bridging Knowledge Graphs to Generate Scene Graphs
Title | Bridging Knowledge Graphs to Generate Scene Graphs |
Authors | Alireza Zareian, Svebor Karaman, Shih-Fu Chang |
Abstract | Scene graphs are powerful representations that encode images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning. On the other hand, commonsense knowledge graphs are rich repositories that encode how the world is structured, and how general concepts interact. In this paper, we present a unified formulation of these two constructs, where a scene graph is seen as an image-conditioned instantiation of a commonsense knowledge graph. Based on this new perspective, we re-formulate scene graph generation as the inference of a bridge between the scene and commonsense graphs, where each entity or predicate instance in the scene graph has to be linked to its corresponding entity or predicate class in the commonsense graph. To this end, we propose a heterogeneous graph inference framework that exploits the rich structure within the scene and commonsense graphs at the same time. Through extensive experiments, we show the proposed method achieves significant improvement over the state of the art. |
Tasks | Graph Generation, Knowledge Graphs, Scene Graph Generation |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.02314v1 |
https://arxiv.org/pdf/2001.02314v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-knowledge-graphs-to-generate-scene |
Repo | |
Framework | |
Human AI interaction loop training: New approach for interactive reinforcement learning
Title | Human AI interaction loop training: New approach for interactive reinforcement learning |
Authors | Neda Navidi |
Abstract | Reinforcement Learning (RL) provides effective results in various decision-making tasks, with an agent learning from a stand-alone reward function. However, it presents unique challenges when environment state and action spaces are large, as well as in the determination of rewards. This complexity, which comes from the high dimensionality and continuity of the environments considered here, calls for a large number of learning trials to learn about the environment through RL. Imitation Learning (IL) offers a promising solution to those challenges by using a teacher. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. A human teacher and an agent learner are considered in this study. The teacher takes part in training the agent to deal with the environment, tackle a specific objective, and achieve a predefined goal. Within that paradigm, however, existing IL approaches have the drawback of requiring extensive demonstration information in long-horizon problems. This paper proposes a novel approach that combines IL with different types of RL methods, namely state-action-reward-state-action (SARSA) and asynchronous advantage actor-critic (A3C) agents, to overcome the problems of both stand-alone systems. It addresses how to effectively leverage teacher feedback, whether direct binary or indirect detailed, so that the agent learner can learn sequential decision-making policies. The results of this study on various OpenAI Gym environments show that this algorithmic method can be incorporated in different combinations and significantly decreases both human effort and tedious exploration. |
Tasks | Decision Making, Imitation Learning |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04203v1 |
https://arxiv.org/pdf/2003.04203v1.pdf | |
PWC | https://paperswithcode.com/paper/human-ai-interaction-loop-training-new |
Repo | |
Framework | |
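To make the combination of SARSA with binary teacher feedback concrete, here is a minimal tabular sketch. It is not the paper's algorithm: adding the teacher signal to the reward through `feedback` and `feedback_weight` is an assumption made for illustration, and the state and action names are hypothetical. Only the underlying SARSA rule, Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)], is standard.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99,
                 feedback=0, feedback_weight=0.5):
    """One on-policy SARSA step.

    `feedback` is a hypothetical binary teacher signal (+1 approve, -1 reject,
    0 silent) that is simply added to the environment reward in this sketch.
    """
    shaped_reward = r + feedback_weight * feedback
    td_target = shaped_reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)  # unseen state-action pairs start at 0.0
new_value = sarsa_update(q, "s0", "right", r=1.0, s_next="s1", a_next="right",
                         alpha=0.5, gamma=0.9, feedback=+1)
```

Because SARSA bootstraps from the action actually taken next (`a_next`), teacher feedback that changes the agent's behavior also changes the values it learns.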
PLOP: Probabilistic poLynomial Objects trajectory Planning for autonomous driving
Title | PLOP: Probabilistic poLynomial Objects trajectory Planning for autonomous driving |
Authors | Thibault Buhet, Emilie Wirbel, Xavier Perrotton |
Abstract | To navigate safely in an urban environment, an autonomous vehicle (ego vehicle) needs to understand and anticipate its surroundings, in particular the behavior of other road users (neighbors). However, multiple choices are often acceptable (e.g. turn right or left, or different ways of avoiding an obstacle). We focus here on predicting multiple feasible future trajectories both for the ego vehicle and neighbors through a probabilistic framework. We use a conditional imitation learning algorithm, conditioned by a navigation command for the ego vehicle (e.g. “turn right”). It takes as input the ego car front camera image, a Lidar point cloud in a bird's-eye view grid, and present and past object detections, and outputs possible trajectories for the ego vehicle and its neighbors, as well as a semantic segmentation used for an auxiliary loss. We evaluate our method on the publicly available dataset nuScenes, showing state-of-the-art performance and investigating the impact of our architecture choices. |
Tasks | Autonomous Driving, Imitation Learning, Semantic Segmentation |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.08744v1 |
https://arxiv.org/pdf/2003.08744v1.pdf | |
PWC | https://paperswithcode.com/paper/plop-probabilistic-polynomial-objects |
Repo | |
Framework | |
Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Title | Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate |
Authors | Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang |
Abstract | Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks. Different from reinforcement learning, GAIL learns both policy and reward function from expert (human) demonstration. Despite its empirical success, it remains unclear whether GAIL with neural networks converges to the globally optimal solution. The major difficulty comes from the nonconvex-nonconcave minimax optimization structure. To bridge the gap between practice and theory, we analyze a gradient-based algorithm with alternating updates and establish its sublinear convergence to the globally optimal solution. To the best of our knowledge, our analysis establishes the global optimality and convergence rate of GAIL with neural networks for the first time. |
Tasks | Imitation Learning |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03709v1 |
https://arxiv.org/pdf/2003.03709v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-imitation-learning-2 |
Repo | |
Framework | |
Improving Efficiency in Large-Scale Decentralized Distributed Training
Title | Improving Efficiency in Large-Scale Decentralized Distributed Training |
Authors | Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny |
Abstract | Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Decentralized Parallel SGD (AD-PSGD) are a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks. One drawback of (A)D-PSGD is that the spectral gap of the mixing matrix decreases when the number of learners in the system increases, which hampers convergence. In this paper, we investigate techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost. We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task. On an IBM P9 supercomputer, our system is able to train an LSTM acoustic model in 2.28 hours with 7.5% WER on the Hub5-2000 Switchboard (SWB) test set and 13.3% WER on the CallHome (CH) test set using 64 V100 GPUs and in 1.98 hours with 7.7% WER on SWB and 13.3% WER on CH using 128 V100 GPUs, the fastest training time reported to date. |
Tasks | Speech Recognition |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01119v1 |
https://arxiv.org/pdf/2002.01119v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-efficiency-in-large-scale |
Repo | |
Framework | |
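The claim that the spectral gap shrinks as learners are added can be checked on a simple case. The sketch below assumes a ring topology in which each learner averages itself with its two neighbors using uniform weights of 1/3. The abstract does not specify the topology or weights, so both are assumptions. The resulting mixing matrix is circulant, so its eigenvalues have the closed form (1 + 2 cos(2πk/n)) / 3.

```python
import math


def ring_spectral_gap(n):
    """Spectral gap 1 - |lambda_2| of an n-learner ring with uniform 1/3 weights.

    The mixing matrix is circulant, so its eigenvalues are
    (1 + 2*cos(2*pi*k/n)) / 3 for k = 0..n-1; k = 0 gives the eigenvalue 1.
    """
    if n < 3:
        raise ValueError("a ring with distinct left and right neighbors needs n >= 3")
    second = max(abs((1 + 2 * math.cos(2 * math.pi * k / n)) / 3)
                 for k in range(1, n))
    return 1 - second


# Print the gap for growing learner counts to see it shrink.
for n in (4, 8, 16, 64):
    print(n, round(ring_spectral_gap(n), 4))
```

A smaller gap means local models take more mixing rounds to reach consensus, which is the convergence penalty the abstract refers to.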
RDFFrames: Knowledge Graph Access for Machine Learning Tools
Title | RDFFrames: Knowledge Graph Access for Machine Learning Tools |
Authors | Aisha Mohamed, Ghadeer Abuoda, Abdurrahman Ghanem, Zoi Kaoudi, Ashraf Aboulnaga |
Abstract | Knowledge graphs represented as RDF datasets are becoming increasingly popular, and they are an integral part of many machine learning applications. A rich ecosystem of data management systems and tools that support RDF has evolved over the years to facilitate high performance storage and retrieval of RDF data, most notably RDF database management systems that support the SPARQL query language. Surprisingly, machine learning tools for knowledge graphs typically do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of the expected data model and the programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as the basic query operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we introduce RDFFrames, a framework that provides such an interface. RDFFrames enables the user to make a sequence of calls in a programming language such as Python to define the data to be extracted from a knowledge graph stored in an RDF database system. It then translates these calls into compact SPARQL queries, executes these queries on the database system, and returns the results in a standard tabular format. Thus, RDFFrames combines the usability of a machine learning software stack with the performance of an RDF database system. |
Tasks | Knowledge Graphs |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03614v1 |
https://arxiv.org/pdf/2002.03614v1.pdf | |
PWC | https://paperswithcode.com/paper/rdfframes-knowledge-graph-access-for-machine |
Repo | |
Framework | |
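The idea of turning a sequence of navigational calls into one SPARQL query can be illustrated with a small sketch. This is not the RDFFrames API: the `NavigationalQuery` class, its method names, and the `ex:` vocabulary are all hypothetical, and the sketch only builds the query string without executing it against a database.

```python
class NavigationalQuery:
    """Hypothetical builder: records navigation steps, then emits one SPARQL query."""

    def __init__(self, start_var, rdf_type, prefixes=None):
        self.prefixes = dict(prefixes or {})
        self.columns = [start_var]
        self.patterns = [f"?{start_var} a {rdf_type} ."]
        self.filters = []

    def expand(self, src_var, predicate, new_var):
        """Follow `predicate` from src_var and bind the target to a new column."""
        self.patterns.append(f"?{src_var} {predicate} ?{new_var} .")
        self.columns.append(new_var)
        return self  # returning self lets calls be chained

    def filter(self, condition):
        """Keep only rows satisfying a SPARQL boolean expression."""
        self.filters.append(f"FILTER ({condition})")
        return self

    def to_sparql(self):
        """Translate the recorded steps into a single SELECT query string."""
        header = "".join(f"PREFIX {name}: <{iri}>\n"
                         for name, iri in self.prefixes.items())
        select = " ".join(f"?{c}" for c in self.columns)
        body = "\n  ".join(self.patterns + self.filters)
        return f"{header}SELECT {select}\nWHERE {{\n  {body}\n}}"


query = (
    NavigationalQuery("movie", "ex:Film", prefixes={"ex": "http://example.org/"})
    .expand("movie", "ex:starring", "actor")
    .expand("actor", "ex:birthPlace", "country")
    .filter("?country = ex:USA")
)
sparql = query.to_sparql()
print(sparql)
```

Each selected variable becomes one column of the result, which is what makes the output of such a query map naturally onto the tabular format the abstract mentions.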
Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition
Title | Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition |
Authors | Nishai Kooverjee, Steven James, Terence van Zyl |
Abstract | Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases. |
Tasks | Transfer Learning |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00448v1 |
https://arxiv.org/pdf/2001.00448v1.pdf | |
PWC | https://paperswithcode.com/paper/inter-and-intra-domain-knowledge-transfer-for |
Repo | |
Framework | |