February 1, 2020

Paper Group AWR 76

“Good Robot!”: Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer

Title “Good Robot!”: Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer
Authors Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao Wu, Heeyeon Kwon, Chris Paxton, Gregory D. Hager
Abstract In order to effectively learn multi-step tasks, robots must be able to understand the context by which task progress is defined. In reinforcement learning, much of this information is provided to the learner by the reward function. However, comparatively little work has examined how the reward function captures - or fails to capture - task context in robotics, particularly in long-horizon tasks where failure is highly consequential. To address this issue, we describe the Schedule for Positive Task (SPOT) Reward and the SPOT-Q reinforcement learning algorithm, which efficiently learn multi-step block manipulation tasks in both simulation and real-world environments. SPOT-Q is remarkably effective compared to past benchmarks. It successfully completes simulated trials of a variety of tasks including stacking cubes (98%), clearing toys by pushing and grasping arranged in random (100%) and adversarial (95%) patterns, and creating rows of cubes (93%). Furthermore, we demonstrate direct sim to real transfer. By directly loading the simulation-trained model on the real robot, we are able to create real stacks in 90% of trials and rows in 80% of trials with no additional real-world fine-tuning. Our system is also quite efficient - models train within 1-10k actions, depending on the task. As a result, our algorithm makes learning complex, multi-step tasks both efficient and practical for real world manipulation tasks. Code is available at https://github.com/jhu-lcsr/good_robot .
Tasks
Published 2019-09-25
URL https://arxiv.org/abs/1909.11730v2
PDF https://arxiv.org/pdf/1909.11730v2.pdf
PWC https://paperswithcode.com/paper/good-robot-efficient-reinforcement-learning
Repo https://github.com/jhu-lcsr/good_robot
Framework pytorch

White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks

Title White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks
Authors Yotam Gil, Yoav Chai, Or Gorodissky, Jonathan Berant
Abstract Adversarial examples are important for understanding the behavior of neural models, and can improve their robustness through adversarial training. Recent work in natural language processing generated adversarial examples by assuming white-box access to the attacked model, and optimizing the input directly against it (Ebrahimi et al., 2018). In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another more efficient neural network. We train a model to emulate the behavior of a white-box attack and show that it generalizes well across examples. Moreover, it reduces adversarial example generation time by 19x-39x. We also show that our approach transfers to a black-box setting, by attacking the Google Perspective API and exposing its vulnerability. Our attack flips the API-predicted label in 42% of the generated examples, while humans maintain high accuracy in predicting the gold label.
Tasks
Published 2019-04-04
URL http://arxiv.org/abs/1904.02405v1
PDF http://arxiv.org/pdf/1904.02405v1.pdf
PWC https://paperswithcode.com/paper/white-to-black-efficient-distillation-of
Repo https://github.com/orgoro/white-2-black
Framework tf

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

Title PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
Authors Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan
Abstract As 3D point clouds become the representation of choice for multiple vision and graphics applications, the ability to synthesize or reconstruct high-resolution, high-fidelity point clouds becomes crucial. Despite the recent success of deep learning models in discriminative tasks of point clouds, generating point clouds remains challenging. This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. Specifically, we learn a two-level hierarchy of distributions where the first level is the distribution of shapes and the second level is the distribution of points given a shape. This formulation allows us to both sample shapes and sample an arbitrary number of points from a shape. Our generative model, named PointFlow, learns each level of the distribution with a continuous normalizing flow. The invertibility of normalizing flows enables the computation of the likelihood during training and allows us to train our model in the variational inference framework. Empirically, we demonstrate that PointFlow achieves state-of-the-art performance in point cloud generation. We additionally show that our model can faithfully reconstruct point clouds and learn useful representations in an unsupervised manner. The code will be available at https://github.com/stevenygd/PointFlow.
Tasks Point Cloud Generation
Published 2019-06-28
URL https://arxiv.org/abs/1906.12320v3
PDF https://arxiv.org/pdf/1906.12320v3.pdf
PWC https://paperswithcode.com/paper/pointflow-3d-point-cloud-generation-with
Repo https://github.com/AnTao97/UnsupervisedPointCloudReconstruction
Framework pytorch

LOST: A flexible framework for semi-automatic image annotation

Title LOST: A flexible framework for semi-automatic image annotation
Authors Jonas Jäger, Gereon Reus, Joachim Denzler, Viviane Wolff, Klaus Fricke-Neuderth
Abstract State-of-the-art computer vision approaches rely on huge amounts of annotated data. The collection of such data is a time consuming process since it is mainly performed by humans. The literature shows that semi-automatic annotation approaches can significantly speed up the annotation process by the automatic generation of annotation proposals to support the annotator. In this paper we present a framework that allows for a quick and flexible design of semi-automatic annotation pipelines. We show that a good design of the process will speed up the collection of annotations. Our contribution is a new approach to image annotation that allows for the combination of different annotation tools and machine learning algorithms in one process. We further present potential applications of our approach. The source code of our framework called LOST (Label Objects and Save Time) is available at: https://github.com/l3p-cv/lost.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07486v2
PDF https://arxiv.org/pdf/1910.07486v2.pdf
PWC https://paperswithcode.com/paper/lost-a-flexible-framework-for-semi-automatic
Repo https://github.com/l3p-cv/lost
Framework none

CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

Title CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets
Authors Hai Shu, Zhe Qu
Abstract A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding to each dataset, and an additive noise matrix. Existing decomposition methods claim that their common matrices capture the common pattern of the two datasets. However, their so-called common pattern only denotes the common latent factors but ignores the common information between the two coefficient matrices of these latent factors. We propose a novel method, called the common and distinctive pattern analysis (CDPA), which appropriately defines the two patterns by further incorporating the common and distinctive information of the coefficient matrices. A consistent estimation approach is developed for high-dimensional settings, and shows reasonably good finite-sample performance in simulations. The superiority of CDPA over state-of-the-art methods is corroborated in both simulated data and two real-data examples from the Human Connectome Project and The Cancer Genome Atlas. A Python package implementing the CDPA method is available at https://github.com/shu-hai/CDPA.
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.09989v2
PDF https://arxiv.org/pdf/1912.09989v2.pdf
PWC https://paperswithcode.com/paper/cdpa-common-and-distinctive-pattern-analysis
Repo https://github.com/shu-hai/CDPA
Framework none

Unsupervised Domain Adaptation on Reading Comprehension

Title Unsupervised Domain Adaptation on Reading Comprehension
Authors Yu Cao, Meng Fang, Baosheng Yu, Joey Tianyi Zhou
Abstract Reading comprehension (RC) has been studied in a variety of datasets with the boosted performance brought by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate this issue, we investigate unsupervised domain adaptation on RC, wherein a model is trained on a labeled source domain and then applied to a target domain with only unlabeled samples. We first show that even with the powerful BERT contextual representation, the performance is still unsatisfactory when the model trained on one dataset is directly applied to another target dataset. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset along with confidence filtering to generate reliable pseudo-labeled samples in the target domain for self-training. In addition, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves comparable accuracy to supervised models on multiple large-scale benchmark datasets.
Tasks Domain Adaptation, Reading Comprehension, Unsupervised Domain Adaptation
Published 2019-11-13
URL https://arxiv.org/abs/1911.06137v3
PDF https://arxiv.org/pdf/1911.06137v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-on-reading
Repo https://github.com/caoyu1991/CASe
Framework pytorch
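
The confidence filtering step mentioned in the CASe abstract above can be illustrated with a minimal sketch. The `Prediction` type, the `filter_pseudo_labels` helper, and the 0.75 threshold are all hypothetical names and values chosen for illustration; they are not taken from the authors' repository, and the real method also involves BERT fine-tuning and conditional adversarial learning, which are not shown.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    """A model prediction on one unlabeled target-domain sample (hypothetical type)."""
    sample_id: str
    answer: str
    confidence: float  # model probability for the predicted answer, in [0, 1]


def filter_pseudo_labels(preds: List[Prediction], threshold: float) -> List[Prediction]:
    """Keep only predictions confident enough to be reused as pseudo-labels."""
    return [p for p in preds if p.confidence >= threshold]


# Toy predictions from a source-trained model on target-domain questions.
preds = [
    Prediction("q1", "Paris", 0.93),
    Prediction("q2", "1987", 0.41),
    Prediction("q3", "blue whale", 0.78),
]

# The threshold value is an assumption; in practice it would be tuned.
pseudo_labeled = filter_pseudo_labels(preds, threshold=0.75)
print([p.sample_id for p in pseudo_labeled])
```

The retained samples would then be added to the self-training set, while low-confidence predictions such as `q2` are left unlabeled.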

COCO-GAN: Generation by Parts via Conditional Coordinating

Title COCO-GAN: Generation by Parts via Conditional Coordinating
Authors Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen
Abstract Humans can only interact with part of the surrounding environment due to biological restrictions. Therefore, we learn to reason the spatial relationships across a series of observations to piece together the surrounding environment. Inspired by such behavior and the fact that machines also have computational constraints, we propose COnditional COordinate GAN (COCO-GAN), in which the generator generates images by parts based on their spatial coordinates as the condition. On the other hand, the discriminator learns to justify realism across multiple assembled patches by global coherence, local appearance, and edge-crossing continuity. Although full images are never generated during training, we show that COCO-GAN can produce state-of-the-art-quality full images during inference. We further demonstrate a variety of novel applications enabled by teaching the network to be aware of coordinates. First, we perform extrapolation to the learned coordinate manifold and generate off-the-boundary patches. Combining these with the originally generated full image, COCO-GAN can produce images that are larger than training samples, which we call “beyond-boundary generation”. We then showcase panorama generation within a cylindrical coordinate system that inherently preserves horizontally cyclic topology. On the computation side, COCO-GAN has a built-in divide-and-conquer paradigm that reduces memory requisition during training and inference, provides high parallelism, and can generate parts of images on demand.
Tasks Face Generation, Image Generation
Published 2019-03-30
URL https://arxiv.org/abs/1904.00284v4
PDF https://arxiv.org/pdf/1904.00284v4.pdf
PWC https://paperswithcode.com/paper/coco-gan-generation-by-parts-via-conditional
Repo https://github.com/hubert0527/COCO-GAN
Framework tf

Deep motion estimation for parallel inter-frame prediction in video compression

Title Deep motion estimation for parallel inter-frame prediction in video compression
Authors André Nortje, Herman A. Engelbrecht, Herman Kamper
Abstract Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames. We propose to learn binary motion codes that are encoded based on an input video sequence. These codes are not limited to 2D translations, but can capture complex motion (warping, rotation and occlusion). Our motion codes are learned as part of a single neural network which also learns to compress and decode them. This approach supports parallel video frame decoding instead of the sequential motion estimation and compensation of flow-based methods. We also introduce 3D dynamic bit assignment to adapt to object displacements caused by motion, yielding additional bit savings. By replacing the optical flow-based block-motion algorithms found in an existing video codec with our learned inter-frame prediction model, our approach outperforms the standard H.264 and H.265 video codecs at low bitrates.
Tasks Motion Estimation, Optical Flow Estimation, Video Compression
Published 2019-12-11
URL https://arxiv.org/abs/1912.05193v1
PDF https://arxiv.org/pdf/1912.05193v1.pdf
PWC https://paperswithcode.com/paper/deep-motion-estimation-for-parallel-inter
Repo https://github.com/adnortje/deepvideo
Framework pytorch

Graph-Partitioning-Based Diffusion Convolution Recurrent Neural Network for Large-Scale Traffic Forecasting

Title Graph-Partitioning-Based Diffusion Convolution Recurrent Neural Network for Large-Scale Traffic Forecasting
Authors Tanwi Mallick, Prasanna Balaprakash, Eric Rask, Jane Macfarlane
Abstract Traffic forecasting approaches are critical to developing adaptive strategies for mobility. Traffic patterns have complex spatial and temporal dependencies that make accurate forecasting on large highway networks a challenging task. Recently, diffusion convolutional recurrent neural networks (DCRNNs) have achieved state-of-the-art results in traffic forecasting by capturing the spatiotemporal dynamics of the traffic. Despite the promising results, adopting DCRNN for large highway networks remains elusive because of computational and memory bottlenecks. We present an approach to apply DCRNN for a large highway network. We use a graph-partitioning approach to decompose a large highway network into smaller networks and train them simultaneously on a cluster with graphics processing units (GPUs). For the first time, we forecast the traffic of the entire California highway network with 11,160 traffic sensor locations simultaneously. We show that our approach can be trained within 3 hours of wall-clock time using 64 GPUs to forecast speed with high accuracy. Further improvements in the accuracy are attained by including overlapping sensor locations from nearby partitions and finding high-performing hyperparameter configurations for the DCRNN using DeepHyper, a hyperparameter tuning package. We demonstrate that a single DCRNN model can be used to train and forecast the speed and flow simultaneously and the results preserve fundamental traffic flow dynamics. We expect our approach for modeling a large highway network in short wall-clock time to serve as a potential core capability in advanced highway traffic monitoring systems, where forecasts can be used to adjust traffic management strategies proactively given anticipated future conditions.
Tasks Graph Partitioning
Published 2019-09-24
URL https://arxiv.org/abs/1909.11197v2
PDF https://arxiv.org/pdf/1909.11197v2.pdf
PWC https://paperswithcode.com/paper/graph-partitioning-based-diffusion
Repo https://github.com/liyaguang/DCRNN
Framework tf
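
The idea of "including overlapping sensor locations from nearby partitions" from the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the partition assignment is taken as given (the graph partitioner itself is not shown), the one-hop overlap rule and the `add_overlap` helper are hypothetical, and nothing here reflects the actual code in the linked repository.

```python
from typing import Dict, List, Set


def add_overlap(adjacency: Dict[int, List[int]], partition: Dict[int, int]) -> Dict[int, Set[int]]:
    """Return, per partition id, its own sensors plus one-hop neighbors from other partitions."""
    parts: Dict[int, Set[int]] = {}
    for node, part_id in partition.items():
        parts.setdefault(part_id, set()).add(node)
        for neighbor in adjacency.get(node, []):
            # A neighbor that lives in another partition becomes an overlap sensor.
            if partition[neighbor] != part_id:
                parts[part_id].add(neighbor)
    return parts


# Toy chain of six sensors, 0-1-2-3-4-5, split into two partitions.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

overlapped = add_overlap(adjacency, partition)
print(overlapped)
```

Each partition's sensor set could then be used to train one model independently, with the boundary sensors (2 and 3 in the toy chain) seen by both models.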

Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel

Title Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel
Authors Xin Qiu, Elliot Meyerson, Risto Miikkulainen
Abstract Neural Networks (NNs) have been extensively used for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction is not enough: the uncertainty (i.e. risk or confidence) of that prediction must also be estimated. Standard NNs, which are most often used in such tasks, do not provide uncertainty information. Existing approaches address this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually do not predict as accurately as standard NNs. In this paper, a new framework (RIO) is developed that makes it possible to estimate uncertainty in any pretrained standard NN. The behavior of the NN is captured by modeling its prediction residuals with a Gaussian Process, whose kernel includes both the NN’s input and its output. The framework is evaluated in twelve real-world datasets, where it is found to (1) provide reliable estimates of uncertainty, (2) reduce the error of the point predictions, and (3) scale well to large datasets. Given that RIO can be applied to any standard NN without modifications to model architecture or training pipeline, it provides an important ingredient for building real-world NN applications.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00588v3
PDF https://arxiv.org/pdf/1906.00588v3.pdf
PWC https://paperswithcode.com/paper/190600588
Repo https://github.com/leaf-ai/rio-paper
Framework tf
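
A rough sketch of the residual-modeling idea in the RIO abstract above: fit a Gaussian Process to the residuals of a pretrained network, using a kernel over both the network's input and its output. The sum-of-RBFs form of the kernel, the fixed length scales and noise level, and all function names are assumptions made for illustration; the authors' implementation in the linked repository may differ. The example uses only the standard library so it runs without extra dependencies.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: Sequence[float], b: Sequence[float], length_scale: float = 1.0) -> float:
    """Squared-exponential kernel between two vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def io_kernel(x1: Sequence[float], y1: float, x2: Sequence[float], y2: float) -> float:
    """Assumed I/O kernel: RBF on the NN inputs plus RBF on the NN outputs."""
    return rbf(x1, x2) + rbf([y1], [y2])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_posterior(x, nn_pred, xs, nn_preds, targets, noise=1e-2) -> Tuple[float, float]:
    """GP posterior mean and variance of the residual at a new (input, NN output) pair."""
    residuals = [t - p for t, p in zip(targets, nn_preds)]
    n = len(xs)
    gram = [[io_kernel(xs[i], nn_preds[i], xs[j], nn_preds[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [io_kernel(x, nn_pred, xs[i], nn_preds[i]) for i in range(n)]
    mean = sum(k * w for k, w in zip(k_star, solve(gram, residuals)))
    var = io_kernel(x, nn_pred, x, nn_pred) - sum(k * w for k, w in zip(k_star, solve(gram, k_star)))
    return mean, var


# Toy data: a stand-in "pretrained NN" that is biased low by 0.5 everywhere.
xs = [[0.0], [1.0], [2.0], [3.0]]
targets = [0.0, 1.0, 2.0, 3.0]
nn_preds = [t - 0.5 for t in targets]

mean, var = residual_posterior([1.0], 0.5, xs, nn_preds, targets)
corrected = 0.5 + mean  # NN prediction plus estimated residual
print(corrected, var)
```

The posterior mean corrects the point prediction, and the posterior variance gives the uncertainty estimate; far from the training data the variance falls back to the prior.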

Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Title Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies
Authors Domingo Esteban, Leonel Rozo, Darwin G. Caldwell
Abstract A common strategy to deal with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn as well as reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all these individual (composable) policies need to be learned before tackling the learning process of the complex task through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require performing the subtasks concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during the learning process. The results of the experiments show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their corresponding subtasks.
Tasks Hierarchical Reinforcement Learning
Published 2019-05-23
URL https://arxiv.org/abs/1905.09668v2
PDF https://arxiv.org/pdf/1905.09668v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-reinforcement-learning-for-1
Repo https://github.com/domingoesteban/hiu_sac
Framework pytorch
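
One way to picture a compound Gaussian policy built from the means and covariances of composable policies, as the abstract above describes, is an activation-weighted product of diagonal Gaussians. The specific precision-weighted formula below is an assumption chosen for illustration, and `compose_gaussians` is a hypothetical helper; the abstract does not state the exact combination rule, so this should not be read as the authors' method.

```python
from typing import List, Tuple


def compose_gaussians(
    means: List[List[float]],
    variances: List[List[float]],
    activations: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Per-dimension activation-weighted product of diagonal Gaussians (assumed rule).

    Each argument has one row per composable policy and one column per action dimension.
    """
    dims = len(means[0])
    out_mean: List[float] = []
    out_var: List[float] = []
    for d in range(dims):
        # Precision of the compound policy: activation-weighted sum of precisions.
        precision = sum(w[d] / v[d] for w, v in zip(activations, variances))
        weighted = sum(w[d] * m[d] / v[d] for w, m, v in zip(activations, means, variances))
        out_var.append(1.0 / precision)
        out_mean.append(weighted / precision)
    return out_mean, out_var


# Two one-dimensional composable policies with equal activation.
mean, var = compose_gaussians(
    means=[[0.0], [2.0]],
    variances=[[1.0], [1.0]],
    activations=[[1.0], [1.0]],
)
print(mean, var)
```

With equal activations the compound mean sits between the two policies; setting one activation to zero recovers the other policy unchanged.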

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Title Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Authors Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang
Abstract Weakly supervised referring expression grounding aims at localizing the referential object in an image according to the linguistic query, where the mapping between the referential object and query is unknown in the training stage. To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction. Specifically, we first extract the subject, location and context features to represent the proposals and the query respectively. Then, we design the adaptive grounding module to compute the matching score between each proposal and query by a hierarchical attention model. Finally, based on attention score and proposal features, we reconstruct the input query with a collaborative loss of language reconstruction loss, adaptive reconstruction loss, and attribute classification loss. This adaptive mechanism helps our model to alleviate the variance of different referring expressions. Experiments on four large-scale datasets show ARN outperforms existing state-of-the-art methods by a large margin. Qualitative results demonstrate that the proposed ARN can better handle the situation where multiple objects of a particular category are situated together.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10568v1
PDF https://arxiv.org/pdf/1908.10568v1.pdf
PWC https://paperswithcode.com/paper/adaptive-reconstruction-network-for-weakly
Repo https://github.com/GingL/ARN
Framework pytorch

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding

Title Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
Authors Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang
Abstract Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage. In referring expressions, people usually describe a target entity in terms of its relationship with other contextual entities as well as visual attributes. However, previous weakly supervised REG methods rarely pay attention to the relationship between the entities. In this paper, we propose a knowledge-guided pairwise reconstruction network (KPRN), which models the relationship between the target entity (subject) and contextual entity (object) as well as grounds these two entities. Specifically, we first design a knowledge extraction module to guide the proposal selection of subject and object. The prior knowledge is obtained in a specific form of semantic similarities between each proposal and the subject/object. Second, guided by such knowledge, we design the subject and object attention module to construct the subject-object proposal pairs. The subject attention excludes the unrelated proposals from the candidate proposals. The object attention selects the most suitable proposal as the contextual proposal. Third, we introduce a pairwise attention and an adaptive weighting scheme to learn the correspondence between these proposal pairs and the query. Finally, a pairwise reconstruction module is used to measure the grounding for weakly supervised learning. Extensive experiments on four large-scale datasets show our method outperforms existing state-of-the-art methods by a large margin.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02860v1
PDF https://arxiv.org/pdf/1909.02860v1.pdf
PWC https://paperswithcode.com/paper/knowledge-guided-pairwise-reconstruction
Repo https://github.com/GingL/KPRN
Framework pytorch

Few-Shot Adaptive Gaze Estimation

Title Few-Shot Adaptive Gaze Estimation
Authors Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz
Abstract Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks. Yet there is a need to lower gaze errors further to enable applications requiring higher quality. Further gains can be achieved by personalizing gaze networks, ideally with few calibration samples. However, over-parameterized neural networks are not amenable to learning from few examples as they can quickly over-fit. We embrace these challenges and propose a novel framework for Few-shot Adaptive GaZE Estimation (FAZE) for learning person-specific gaze networks with very few (less than or equal to 9) calibration samples. FAZE learns a rotation-aware latent representation of gaze via a disentangling encoder-decoder architecture along with a highly adaptable gaze estimator trained using meta-learning. It is capable of adapting to any new person to yield significant performance gains with as few as 3 samples, yielding state-of-the-art performance of 3.18 degrees on GazeCapture, a 19% improvement over prior art. We open-source our code at https://github.com/NVlabs/few_shot_gaze
Tasks Calibration, Gaze Estimation, Meta-Learning
Published 2019-05-06
URL https://arxiv.org/abs/1905.01941v2
PDF https://arxiv.org/pdf/1905.01941v2.pdf
PWC https://paperswithcode.com/paper/few-shot-adaptive-gaze-estimation
Repo https://github.com/NVlabs/few_shot_gaze
Framework pytorch

BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems

Title BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems
Authors Jiawei Shao, Jun Zhang
Abstract The emergence of various intelligent mobile applications demands the deployment of powerful deep learning models at resource-constrained mobile devices. The device-edge co-inference framework provides a promising solution by splitting a neural network at a mobile device and an edge computing server. In order to balance the on-device computation and the communication overhead, the splitting point needs to be carefully picked, while the intermediate feature needs to be compressed before transmission. Existing studies decoupled the design of model splitting, feature compression, and communication, which may lead to excessive resource consumption of the mobile device. In this paper, we introduce an end-to-end architecture, named BottleNet++, that consists of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission. The encoder and decoder essentially implement joint source-channel coding via convolutional neural networks (CNNs), while explicitly considering the effect of channel noise. By exploiting the strong sparsity and the fault-tolerant property of the intermediate feature in a deep neural network (DNN), BottleNet++ achieves a much higher compression ratio than existing methods. Furthermore, by providing the channel condition to the encoder as an input, our method enjoys a strong generalization ability in different channel conditions. Compared with merely transmitting intermediate data without feature compression, BottleNet++ achieves up to 64x bandwidth reduction over the additive white Gaussian noise channel and up to 256x bit compression ratio in the binary erasure channel, with less than 2% reduction in accuracy. With a higher compression ratio, BottleNet++ enables splitting a DNN at earlier layers, which leads to up to 3x reduction in on-device computation compared with other compression methods.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1910.14315v4
PDF https://arxiv.org/pdf/1910.14315v4.pdf
PWC https://paperswithcode.com/paper/bottlenet-an-end-to-end-approach-for-feature
Repo https://github.com/shaojiawei07/BottleNetPlusPlus
Framework pytorch
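
The "non-trainable channel layer" in the BottleNet++ abstract above sits between the encoder and decoder and simulates the two channels the abstract names. A minimal sketch of those two channel models follows; the function names, the use of `None` to mark an erasure, and the toy encoder outputs are all assumptions for illustration, and the learned encoder and decoder are not shown.

```python
import random
from typing import List, Optional


def awgn_channel(symbols: List[float], noise_std: float, rng: random.Random) -> List[float]:
    """Additive white Gaussian noise: add zero-mean Gaussian noise to each symbol."""
    return [s + rng.gauss(0.0, noise_std) for s in symbols]


def binary_erasure_channel(
    bits: List[int], erasure_prob: float, rng: random.Random
) -> List[Optional[int]]:
    """Erase each bit independently with probability erasure_prob (None marks an erasure)."""
    return [None if rng.random() < erasure_prob else b for b in bits]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical encoder outputs: real-valued symbols for AWGN, bits for erasure.
encoded_symbols = [0.5, -1.0, 0.25, 0.0]
encoded_bits = [1, 0, 1, 1, 0, 0, 1, 0]

received_symbols = awgn_channel(encoded_symbols, noise_std=0.1, rng=rng)
received_bits = binary_erasure_channel(encoded_bits, erasure_prob=0.25, rng=rng)
print(received_symbols)
print(received_bits)
```

Because the channel has no trainable parameters, the encoder and decoder around it can be trained end to end while being exposed to the channel's noise or erasures.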