January 30, 2020


Paper Group ANR 246

Hand Pose Estimation: A Survey. Hierarchical Variational Imitation Learning of Control Programs. Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data. Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes. Do Attention Heads in BERT Track Syntactic Dependencies? …

Hand Pose Estimation: A Survey


Title	Hand Pose Estimation: A Survey
Authors	Bardia Doosti
Abstract	The success of Deep Convolutional Neural Networks (CNNs) in recent years in almost all Computer Vision tasks on the one hand, and the popularity of low-cost consumer depth cameras on the other, have made Hand Pose Estimation a hot topic in the computer vision field. In this report, we will first explain the hand pose estimation problem and will review major approaches solving this problem, especially the two different problems of using depth maps or RGB images. We will survey the most important papers in each field and will discuss the strengths and weaknesses of each. Finally, we will explain the biggest datasets in this field in detail and list 22 datasets with all their properties. To the best of our knowledge this is the most complete list of all the datasets in the hand pose estimation field.
Tasks	Hand Pose Estimation, Pose Estimation
Published	2019-03-03
URL	https://arxiv.org/abs/1903.01013v2
PDF	https://arxiv.org/pdf/1903.01013v2.pdf
PWC	https://paperswithcode.com/paper/hand-pose-estimation-a-survey
Repo
Framework

Hierarchical Variational Imitation Learning of Control Programs


Title	Hierarchical Variational Imitation Learning of Control Programs
Authors	Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken Goldberg, Pieter Abbeel, Ion Stoica
Abstract	Autonomous agents can learn by imitating teacher demonstrations of the intended behavior. Hierarchical control policies are ubiquitously useful for such learning, having the potential to break down structured tasks into simpler sub-tasks, thereby improving data efficiency and generalization. In this paper, we propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP), a program-like structure in which procedures can invoke sub-procedures to perform sub-tasks. Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations. Samples from this learned distribution then guide the training of the hierarchical control policy. We identify and demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods. Training PHP with variational inference outperforms LSTM baselines in terms of data efficiency and generalization, requiring less than half as much data to achieve a 24% error rate in executing the bubble sort algorithm, and to achieve no error in executing Karel programs.
Tasks	Imitation Learning
Published	2019-12-29
URL	https://arxiv.org/abs/1912.12612v1
PDF	https://arxiv.org/pdf/1912.12612v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-variational-imitation-learning
Repo
Framework

Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data


Title	Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data
Authors	Boyi Liu, Lujia Wang, Ming Liu, Cheng-Zhong Xu
Abstract	Humans are capable of learning a new behavior by observing others perform the skill. Similarly, robots can also implement this by imitation learning. Furthermore, with external guidance, humans can master the new behavior more efficiently. So, how can robots achieve this? To address the issue, we present a novel framework named FIL. It provides a heterogeneous knowledge fusion mechanism for cloud robotic systems. Then, a knowledge fusion algorithm in FIL is proposed. It enables the cloud to fuse heterogeneous knowledge from local robots and generate guide models for robots with service requests. After that, we introduce a knowledge transfer scheme to facilitate local robots acquiring knowledge from the cloud. With FIL, a robot is capable of utilizing knowledge from other robots to improve the accuracy and efficiency of its imitation learning. Compared with transfer learning and meta-learning, FIL is more suitable to be deployed in cloud robotic systems. Finally, we conduct experiments of a self-driving task for robots (cars). The experimental results demonstrate that the shared model generated by FIL increases imitation learning efficiency of local robots in cloud robotic systems.
Tasks	Imitation Learning, Meta-Learning, Transfer Learning
Published	2019-12-24
URL	https://arxiv.org/abs/1912.12204v1
PDF	https://arxiv.org/pdf/1912.12204v1.pdf
PWC	https://paperswithcode.com/paper/federated-imitation-learning-a-novel
Repo
Framework

Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes


Title	Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes
Authors	Javier Zapata, Sang-Yun Oh, Alexander Petersen
Abstract	The covariance structure of multivariate functional data can be highly complex, especially if the multivariate dimension is large, making extension of statistical methods for standard multivariate data to the functional data setting quite challenging. For example, Gaussian graphical models have recently been extended to the setting of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. However, a key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. The methodology in this paper addresses the general problem of covariance modeling for multivariate functional data, and functional Gaussian graphical models in particular. As a first step, a new notion of separability for multivariate functional data is proposed, termed partial separability, leading to a novel Karhunen-Loève-type expansion for such data. Next, the partial separability structure is shown to be particularly useful in order to provide a well-defined Gaussian graphical model that can be identified with a sequence of finite-dimensional graphical models, each of fixed dimension. This motivates a simple and efficient estimation procedure through application of the joint graphical lasso. Empirical performance of the method for graphical model estimation is assessed through simulation and analysis of functional brain connectivity during a motor task.
Tasks	Gaussian Processes
Published	2019-10-07
URL	https://arxiv.org/abs/1910.03134v2
PDF	https://arxiv.org/pdf/1910.03134v2.pdf
PWC	https://paperswithcode.com/paper/partial-separability-and-functional-graphical
Repo
Framework

Do Attention Heads in BERT Track Syntactic Dependencies?


Title	Do Attention Heads in BERT Track Syntactic Dependencies?
Authors	Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman
Abstract	We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations. We employ two methods—taking the maximum attention weight and computing the maximum spanning tree—to extract implicit dependency relations from the attention weights of each layer/head, and compare them to the ground-truth Universal Dependency (UD) trees. We show that, for some UD relation types, there exist heads that can recover the dependency type significantly better than baselines on parsed English text, suggesting that some self-attention heads act as a proxy for syntactic structure. We also analyze BERT fine-tuned on two datasets—the syntax-oriented CoLA and the semantics-oriented MNLI—to investigate whether fine-tuning affects the patterns of their self-attention, but we do not observe substantial differences in the overall dependency relations extracted using our methods. Our results suggest that these models have some specialist attention heads that track individual dependency types, but no generalist head that performs holistic parsing significantly better than a trivial baseline, and that analyzing attention weights directly may not reveal much of the syntactic knowledge that BERT-style models are known to learn.
Tasks
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12246v1
PDF	https://arxiv.org/pdf/1911.12246v1.pdf
PWC	https://paperswithcode.com/paper/do-attention-heads-in-bert-track-syntactic
Repo
Framework
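
The "maximum attention weight" extraction described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the toy attention matrix, and the choice to exclude self-attention and to skip the root token when scoring are all assumptions made for illustration.

```python
def predict_heads(attention):
    """For each token i, predict as its syntactic head the token j (j != i)
    that receives the largest attention weight attention[i][j].
    Ties are broken in favour of the larger index j."""
    heads = []
    for i, row in enumerate(attention):
        candidates = [(weight, j) for j, weight in enumerate(row) if j != i]
        heads.append(max(candidates)[1])
    return heads


def head_accuracy(predicted, gold):
    """Fraction of tokens whose predicted head matches the gold head.
    Tokens whose gold head is None (the root) are skipped."""
    scored = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    return sum(p == g for p, g in scored) / len(scored)


# Hypothetical attention weights of one head for "the dog chased cats".
attention = [
    [0.1, 0.7, 0.1, 0.1],  # "the"    attends mostly to "dog"
    [0.2, 0.1, 0.6, 0.1],  # "dog"    attends mostly to "chased"
    [0.1, 0.5, 0.2, 0.2],  # "chased" is the root of the sentence
    [0.5, 0.1, 0.3, 0.1],  # "cats"   attends mostly to "the" (a miss)
]
gold_heads = [1, 2, None, 2]

predicted_heads = predict_heads(attention)
accuracy = head_accuracy(predicted_heads, gold_heads)
print(predicted_heads, accuracy)
```

Running the same scoring for every layer and head, and comparing against a positional baseline, would give the kind of per-head comparison the abstract describes; the spanning-tree variant is not shown here.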

One-Shot Imitation Filming of Human Motion Videos


Title	One-Shot Imitation Filming of Human Motion Videos
Authors	Chong Huang, Yuanjie Dang, Peng Chen, Xin Yang, Kwang-Ting Cheng
Abstract	Imitation learning has been applied to mimic the operation of a human cameraman in several autonomous cinematography systems. To imitate different filming styles, existing methods train multiple models, where each model handles a particular style and requires a significant number of training samples. As a result, existing methods can hardly generalize to unseen styles. In this paper, we propose a framework, which can imitate a filming style by “seeing” only a single demonstration video of the same style, i.e., one-shot imitation filming. This is done by two key enabling techniques: 1) feature extraction of the filming style from the demo video, and 2) filming style transfer from the demo video to the new situation. We implement the approach with deep neural network and deploy it to a 6 degrees of freedom (DOF) real drone cinematography system by first predicting the future camera motions, and then converting them to the drone’s control commands via an odometer. Our experimental results on extensive datasets and showcases exhibit significant improvements in our approach over conventional baselines and our approach can successfully mimic the footage with an unseen style.
Tasks	Imitation Learning, Style Transfer
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10609v1
PDF	https://arxiv.org/pdf/1912.10609v1.pdf
PWC	https://paperswithcode.com/paper/one-shot-imitation-filming-of-human-motion
Repo
Framework

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks


Title	Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks
Authors	Michele Alberti, Angela Botros, Narayan Schuez, Rolf Ingold, Marcus Liwicki, Mathias Seuret
Abstract	In this work, we investigate the application of trainable and spectrally initializable matrix transformations on the feature maps produced by convolution operations. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers, ImageNet) images to historical documents (CB55) and handwriting recognition (GPDS). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases by an average of 2.2 across all datasets. In addition, we show that the benefit of spectral initialization leads to significantly faster convergence, as opposed to randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open-source.
Tasks
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05045v2
PDF	https://arxiv.org/pdf/1911.05045v2.pdf
PWC	https://paperswithcode.com/paper/trainable-spectrally-initializable-matrix
Repo
Framework
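
To make "spectrally initializable" concrete, here is a minimal sketch of a square matrix initialized to a spectral transform. The abstract does not name the transform, so the orthonormal DCT-II used here is an assumption, as are the function names. Plain Python lists stand in for the PyTorch modules mentioned in the abstract; in a trainable setting the matrix entries would become learnable parameters starting from these values.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        # The first row uses a different scale so that all rows have unit norm.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def transform(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Spectral initialization: the weights start as a DCT instead of random values.
weights = dct_matrix(4)

# A constant signal has all of its energy in the first (DC) coefficient.
coeffs = transform(weights, [1.0, 1.0, 1.0, 1.0])
print([round(c, 6) for c in coeffs])
```

Because the matrix is orthonormal at initialization, applying it preserves the norm of the input; that property is not guaranteed once the entries are updated by training.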

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory


Title	Tightening Bounds for Variational Inference by Revisiting Perturbation Theory
Authors	Robert Bamler, Cheng Zhang, Manfred Opper, Stephan Mandt
Abstract	Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, vaguely in terms of the log ratio of the true posterior and its variational approximation. While first order terms give the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it inapt for stochastic optimization. In this paper, we present a similar yet alternative way of deriving corrections to the ELBO that resemble perturbation theory, but that result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
Tasks	Gaussian Processes, Stochastic Optimization
Published	2019-09-30
URL	https://arxiv.org/abs/1910.00069v1
PDF	https://arxiv.org/pdf/1910.00069v1.pdf
PWC	https://paperswithcode.com/paper/tightening-bounds-for-variational-inference
Repo
Framework
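
The relationship between the classical variational bound and higher-order corrections can be sketched as follows. The notation (V, V_0, K) is introduced here for illustration and may differ from the paper's; the derivation uses only Jensen's inequality and the fact that an odd-order Taylor polynomial of the exponential lies below the exponential everywhere.

```latex
% Log ratio of the joint and the variational distribution (notation assumed here):
V(z) = \log p(x, z) - \log q(z), \qquad p(x) = \mathbb{E}_{q}\!\left[ e^{V(z)} \right].

% Classical bound (ELBO) via Jensen's inequality:
\log p(x) = \log \mathbb{E}_{q}\!\left[ e^{V(z)} \right] \;\ge\; \mathbb{E}_{q}\!\left[ V(z) \right].

% For odd K and any constant V_0, e^{u} \ge \sum_{k=0}^{K} u^{k}/k! for all real u, so
p(x) = e^{V_0}\, \mathbb{E}_{q}\!\left[ e^{V(z) - V_0} \right]
\;\ge\; e^{V_0}\, \mathbb{E}_{q}\!\left[ \sum_{k=0}^{K} \frac{\left( V(z) - V_0 \right)^{k}}{k!} \right].

% K = 1 gives p(x) \ge e^{V_0} \left( 1 + \mathbb{E}_{q}[V] - V_0 \right),
% which is maximized at V_0 = \mathbb{E}_{q}[V] and recovers the ELBO:
p(x) \;\ge\; e^{\mathbb{E}_{q}[V(z)]}.
```

Higher odd orders add moments of V under q to the right-hand side while keeping it a valid lower bound on p(x), which is the property the abstract says traditional perturbation theory lacks.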

Learning To Reach Goals Without Reinforcement Learning


Title	Learning To Reach Goals Without Reinforcement Learning
Authors	Dibya Ghosh, Abhishek Gupta, Justin Fu, Ashwin Reddy, Coline Devin, Benjamin Eysenbach, Sergey Levine
Abstract	Imitation learning algorithms provide a simple and straightforward approach for training control policies via supervised learning. By maximizing the likelihood of good actions provided by an expert demonstrator, supervised imitation learning can produce effective policies without the algorithmic complexities and optimization challenges of reinforcement learning, at the cost of requiring an expert demonstrator to provide the demonstrations. In this paper, we ask: can we take insights from imitation learning to design algorithms that can effectively acquire optimal policies from scratch without any expert demonstrations? The key observation that makes this possible is that, in the multi-task setting, trajectories that are generated by a suboptimal policy can still serve as optimal examples for other tasks. In particular, when tasks correspond to different goals, every trajectory is a successful demonstration for the goal state that it actually reaches. We propose a simple algorithm for learning goal-reaching behaviors without any demonstrations, complicated user-provided reward functions, or complex reinforcement learning methods. Our method simply maximizes the likelihood of actions the agent actually took in its own previous rollouts, conditioned on the goal being the state that it actually reached. Although related variants of this approach have been proposed previously in imitation learning with demonstrations, we show how this approach can effectively learn goal-reaching policies from scratch. We present a theoretical result linking self-supervised imitation learning and reinforcement learning, and empirical results showing that it performs competitively with more complex reinforcement learning methods on a range of challenging goal reaching problems, while yielding advantages in terms of stability and use of offline data.
Tasks	Imitation Learning
Published	2019-12-12
URL	https://arxiv.org/abs/1912.06088v2
PDF	https://arxiv.org/pdf/1912.06088v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-reach-goals-without-reinforcement-1
Repo
Framework
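
The relabeling idea in the abstract, treating each rollout as a successful demonstration for the state it actually reached, can be sketched in a few lines. This is a toy tabular illustration, not the authors' implementation: the function names, the 1-D chain environment, and the counting-based policy fit are assumptions.

```python
from collections import Counter, defaultdict


def relabel(states, actions):
    """Turn one rollout into (state, goal, action) training examples, where
    the goal is the final state the rollout actually reached."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    goal = states[-1]
    return [(state, goal, action) for state, action in zip(states, actions)]


def fit_tabular_policy(examples):
    """Maximum-likelihood fit of a tabular goal-conditioned policy: for each
    (state, goal) pair, keep the most frequently observed action."""
    counts = defaultdict(Counter)
    for state, goal, action in examples:
        counts[(state, goal)][action] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Two hypothetical rollouts on a 1-D chain of integer states.
rollouts = [
    ([0, 1, 2], ["right", "right"]),
    ([2, 1, 0], ["left", "left"]),
]

examples = [ex for states, actions in rollouts for ex in relabel(states, actions)]
policy = fit_tabular_policy(examples)
print(policy[(1, 2)], policy[(1, 0)])
```

In the setting the abstract describes, the fitted policy would then be used to collect further rollouts, which are relabeled and added to the dataset in turn; that outer loop is omitted here.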

Incentive-aware Contextual Pricing with Non-parametric Market Noise


Title	Incentive-aware Contextual Pricing with Non-parametric Market Noise
Authors	Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang
Abstract	We consider a dynamic pricing problem for repeated contextual second-price auctions with strategic buyers whose goals are to maximize their long-term time discounted utility. The seller has very limited information about buyers’ overall demand curves, which depends on $d$-dimensional context vectors characterizing auctioned items, and a non-parametric market noise distribution that captures buyers’ idiosyncratic tastes. The noise distribution and the relationship between the context vectors and buyers’ demand curves are both unknown to the seller. We focus on designing the seller’s learning policy to set contextual reserve prices where the seller’s goal is to minimize his regret for revenue. We first propose a pricing policy when buyers are truthful and show that it achieves a $T$-period regret bound of $\tilde{\mathcal{O}}(\sqrt{dT})$ against a clairvoyant policy that has full information of the buyers’ demand. Next, under the setting where buyers bid strategically to maximize their long-term discounted utility, we develop a variant of our first policy that is robust to strategic (corrupted) bids. This policy incorporates randomized “isolation” periods, during which a buyer is randomly chosen to solely participate in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. Because of this nice property, our robust policy enjoys a $T$-period regret of $\tilde{\mathcal{O}}(\sqrt{dT})$, matching that under the truthful setting up to a constant factor that depends on the utility discount factor.
Tasks
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03508v1
PDF	https://arxiv.org/pdf/1911.03508v1.pdf
PWC	https://paperswithcode.com/paper/incentive-aware-contextual-pricing-with-non
Repo
Framework
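
As a small illustration of the quantities involved, the sketch below computes the seller's revenue in a second-price auction with a reserve price and the resulting revenue regret against a benchmark sequence of reserves. It assumes truthful bids and hand-picked reserves; the function names and numbers are hypothetical, and nothing here implements the paper's learning policy.

```python
def second_price_revenue(bids, reserve):
    """Revenue of a second-price auction with a reserve price.
    The item sells only if the highest bid meets the reserve; the winner
    then pays the larger of the second-highest bid and the reserve."""
    eligible = sorted((b for b in bids if b >= reserve), reverse=True)
    if not eligible:
        return 0.0
    if len(eligible) == 1:
        return float(reserve)
    return float(eligible[1])


def revenue_regret(bid_rounds, policy_reserves, benchmark_reserves):
    """Cumulative revenue gap between a benchmark and a pricing policy."""
    total = 0.0
    for bids, r_policy, r_bench in zip(bid_rounds, policy_reserves, benchmark_reserves):
        total += second_price_revenue(bids, r_bench) - second_price_revenue(bids, r_policy)
    return total


# Three hypothetical rounds with two bidders each.
bid_rounds = [[5, 3], [4, 1], [2, 6]]
policy_reserves = [4, 5, 1]      # reserves chosen by some learning policy
benchmark_reserves = [4, 3, 5]   # reserves chosen by an informed benchmark

regret = revenue_regret(bid_rounds, policy_reserves, benchmark_reserves)
print(regret)
```

In the second round the policy's reserve exceeds every bid and the item goes unsold, while in the third it is too low to lift the price above the second-highest bid; both effects contribute to the regret.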

Imitation Learning via Off-Policy Distribution Matching


Title	Imitation Learning via Off-Policy Distribution Matching
Authors	Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
Abstract	When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly data-inefficient or alter the original objective in a manner that can drastically change its optimum. In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. In addition to the data-efficiency that this provides, we are able to show that this objective also renders the use of a separate RL optimization unnecessary. Rather, an imitation policy may be learned directly from this objective without the use of explicit rewards. We call the resulting algorithm ValueDICE and evaluate it on a suite of popular imitation learning benchmarks, finding that it can achieve state-of-the-art sample efficiency and performance.
Tasks	Imitation Learning
Published	2019-12-10
URL	https://arxiv.org/abs/1912.05032v1
PDF	https://arxiv.org/pdf/1912.05032v1.pdf
PWC	https://paperswithcode.com/paper/imitation-learning-via-off-policy-1
Repo
Framework
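
For readers unfamiliar with distribution matching, one common way to write the objective is shown below. The choice of the KL divergence and the symbols d^pi, d^exp and f are assumptions made for illustration; the abstract does not specify them. The second line is the standard Donsker-Varadhan representation of the KL divergence, which turns the divergence into an optimization over a function f.

```latex
% Distribution matching: make the policy's state-action distribution d^{\pi}
% close to the expert's distribution d^{\mathrm{exp}} (KL chosen here as an example).
\min_{\pi} \; D_{\mathrm{KL}}\!\left( d^{\pi} \,\|\, d^{\mathrm{exp}} \right)

% Donsker-Varadhan representation of the KL divergence:
D_{\mathrm{KL}}\!\left( d^{\pi} \,\|\, d^{\mathrm{exp}} \right)
= \sup_{f} \; \mathbb{E}_{(s,a) \sim d^{\pi}}\!\left[ f(s,a) \right]
- \log \mathbb{E}_{(s,a) \sim d^{\mathrm{exp}}}\!\left[ e^{f(s,a)} \right]
```

The first expectation is taken under the policy's own distribution, which is where the on-policy data requirement mentioned in the abstract comes from.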

Furnishing Your Room by What You See: An End-to-End Furniture Set Retrieval Framework with Rich Annotated Benchmark Dataset


Title	Furnishing Your Room by What You See: An End-to-End Furniture Set Retrieval Framework with Rich Annotated Benchmark Dataset
Authors	Bingyuan Liu, Jiantao Zhang, Xiaoting Zhang, Wei Zhang, Chuanhui Yu, Yuan Zhou
Abstract	Understanding interior scenes has attracted enormous interest in the computer vision community. However, few works focus on understanding the furniture within the scenes, and a large-scale dataset to advance the field is also lacking. In this paper, we first fill the gap by presenting DeepFurniture, a richly annotated large indoor scene dataset, including 24k indoor images, 170k furniture instances and 20k unique furniture identities. On the dataset, we introduce a new benchmark, named furniture set retrieval. Given an indoor photo as input, the task requires detecting all the furniture instances and searching for a matched set of furniture identities. To address this challenging task, we propose a feature and context embedding based framework. It contains 3 major contributions: (1) An improved Mask-RCNN model with an additional mask-based classifier is introduced for better utilizing the mask information to relieve the occlusion problems in furniture detection context. (2) A multi-task style Siamese network is proposed to train the feature embedding model for retrieval, which is composed of a classification subnet supervised by self-clustered pseudo attributes and a verification subnet to estimate whether the input pair is matched. (3) In order to model the relationship of the furniture entities in an interior design, a context embedding model is employed to re-rank the retrieval results. Extensive experiments demonstrate the effectiveness of each module and the overall system.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09299v2
PDF	https://arxiv.org/pdf/1911.09299v2.pdf
PWC	https://paperswithcode.com/paper/furnishing-your-room-by-what-you-see-an-end
Repo
Framework

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks


Title	High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
Authors	Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
Abstract	Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question if such handcrafted architectures are necessary and instead propose a different approach: finding minimal inductive bias for video prediction while maximizing network capacity. We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.
Tasks	Optical Flow Estimation, Video Prediction
Published	2019-11-05
URL	https://arxiv.org/abs/1911.01655v1
PDF	https://arxiv.org/pdf/1911.01655v1.pdf
PWC	https://paperswithcode.com/paper/high-fidelity-video-prediction-with-large
Repo
Framework

Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models


Title	Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models
Authors	Amirreza Shirani, Bowen Xu, David Lo, Thamar Solorio, Amin Alipour
Abstract	Domain-specific community question answering is becoming an integral part of professions. Finding related questions and answers in these communities can significantly improve the effectiveness and efficiency of information seeking. Stack Overflow is one of the most popular communities that is being used by millions of programmers. In this paper, we analyze the problem of predicting knowledge unit (question thread) relatedness in Stack Overflow. In particular, we formulate the question relatedness task as a multi-class classification problem with four degrees of relatedness. We present a large-scale dataset with more than 300K pairs. To the best of our knowledge, this dataset is the largest domain-specific dataset for Question-Question relatedness. We present the steps that we took to collect, clean, process, and assure the quality of the dataset. The proposed dataset Stack Overflow is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the prediction of relatedness in technical community question-answering forums. We adopt a neural network architecture and a traditional model for this task that effectively utilize information from different parts of knowledge units to compute the relatedness between them. These models can be used to benchmark novel models, as they perform well in our task and in a closely similar task.
Tasks	Community Question Answering, Question Answering
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01966v2
PDF	https://arxiv.org/pdf/1905.01966v2.pdf
PWC	https://paperswithcode.com/paper/question-relatedness-on-stack-overflow-the
Repo
Framework

Time-Space tradeoff in deep learning models for crop classification on satellite multi-spectral image time series


Title	Time-Space tradeoff in deep learning models for crop classification on satellite multi-spectral image time series
Authors	Vivien Sainte Fare Garnot, Loic Landrieu, Sebastien Giordano, Nesrine Chehata
Abstract	In this article, we investigate several structured deep learning models for crop type classification on multi-spectral time series. In particular, our aim is to assess the respective importance of spatial and temporal structures in such data. With this objective, we consider several designs of convolutional, recurrent, and hybrid neural networks, and assess their performance on a large dataset of freely available Sentinel-2 imagery. We find that the best-performing approaches are hybrid configurations for which most of the parameters (up to 90%) are allocated to modeling the temporal structure of the data. Our results thus constitute a set of guidelines for the design of bespoke deep learning models for crop type classification.
Tasks	Crop Classification, Time Series
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10503v1
PDF	http://arxiv.org/pdf/1901.10503v1.pdf
PWC	https://paperswithcode.com/paper/time-space-tradeoff-in-deep-learning-models
Repo
Framework