October 17, 2019


Paper Group ANR 868

Navigation by Imitation in a Pedestrian-Rich Environment. Latent Convolutional Models. A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions. Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. Dual Objective Approach …


Title	Navigation by Imitation in a Pedestrian-Rich Environment
Authors	Jing Bi, Tianyou Xiao, Qiuyue Sun, Chenliang Xu
Abstract	Deep neural networks trained on demonstrations of human actions give robot the ability to perform self-driving on the road. However, navigation in a pedestrian-rich environment, such as a campus setup, is still challenging—one needs to take frequent interventions to the robot and take control over the robot from early steps leading to a mistake. An arduous burden is, hence, placed on the learning framework design and data acquisition. In this paper, we propose a new learning-from-intervention Dataset Aggregation (DAgger) algorithm to overcome the limitations brought by applying imitation learning to navigation in the pedestrian-rich environment. Our new learning algorithm implements an error backtrack function that is able to effectively learn from expert interventions. Combining our new learning algorithm with deep convolutional neural networks and a hierarchically-nested policy-selection mechanism, we show that our robot is able to map pixels direct to control commands and navigate successfully in real world without explicitly modeling the pedestrian behaviors or the world model.
Tasks	Imitation Learning
Published	2018-11-01
URL	http://arxiv.org/abs/1811.00506v1
PDF	http://arxiv.org/pdf/1811.00506v1.pdf
PWC	https://paperswithcode.com/paper/navigation-by-imitation-in-a-pedestrian-rich
Repo
Framework
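
The abstract above builds on Dataset Aggregation (DAgger). As a rough illustration of the plain DAgger loop only (not the paper's learning-from-intervention variant or its error backtrack function), here is a minimal sketch on a hypothetical 1-D corridor. The environment, `expert_action`, `TablePolicy`, and all parameters are assumptions made up for this example, and the learner's own policy is rolled out from the first iteration with no expert mixing.

```python
GOAL = 5  # hypothetical 1-D corridor: states are integers, the goal is state 5


def expert_action(state):
    """Stand-in expert (an assumption): always step toward GOAL."""
    return 1 if state < GOAL else -1


class TablePolicy:
    """Toy learner: majority vote over the expert labels seen for each state."""

    def __init__(self, default_action=-1):
        self.table = {}
        self.default_action = default_action  # used in states never labeled

    def fit(self, dataset):
        votes = {}
        for state, action in dataset:
            votes.setdefault(state, []).append(action)
        self.table = {s: max(set(a), key=a.count) for s, a in votes.items()}

    def act(self, state):
        return self.table.get(state, self.default_action)


def rollout(policy, start, horizon):
    """Run the policy; return (visited non-goal states, reached_goal)."""
    visited, state = [], start
    for _ in range(horizon):
        if state == GOAL:
            return visited, True
        visited.append(state)
        state += policy.act(state)
    return visited, state == GOAL


def dagger(iterations=3, horizon=12, starts=range(11)):
    """Plain DAgger: roll out the learner, label its states with the expert,
    aggregate the labels, and retrain on the whole aggregated dataset."""
    dataset, policy = [], TablePolicy()
    for _ in range(iterations):
        for start in starts:
            visited, _ = rollout(policy, start, horizon)
            dataset.extend((s, expert_action(s)) for s in visited)
        policy.fit(dataset)
    return policy
```

The point of the sketch is that labels are collected on the states the learner itself visits (including ones the expert would never enter), which is what distinguishes DAgger from training once on expert demonstrations.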

Latent Convolutional Models


Title	Latent Convolutional Models
Authors	ShahRukh Athar, Evgeny Burnaev, Victor Lempitsky
Abstract	We present a new latent model of natural images that can be learned on large-scale datasets. The learning process provides a latent embedding for every image in the training dataset, as well as a deep convolutional network that maps the latent space to the image space. After training, the new model provides a strong and universal image prior for a variety of image restoration tasks such as large-hole inpainting, superresolution, and colorization. To model high-resolution natural images, our approach uses latent spaces of very high dimensionality (one to two orders of magnitude higher than previous latent image models). To tackle this high dimensionality, we use latent spaces with a special manifold structure (convolutional manifolds) parameterized by a ConvNet of a certain architecture. In the experiments, we compare the learned latent models with latent models learned by autoencoders, advanced variants of generative adversarial networks, and a strong baseline system using simpler parameterization of the latent space. Our model outperforms the competing approaches over a range of restoration tasks.
Tasks	Colorization, Image Restoration
Published	2018-06-16
URL	http://arxiv.org/abs/1806.06284v2
PDF	http://arxiv.org/pdf/1806.06284v2.pdf
PWC	https://paperswithcode.com/paper/latent-convolutional-models
Repo
Framework

A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions


Title	A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions
Authors	Dimitrios Kollias, Stefanos Zafeiriou
Abstract	Over the past few years many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis, in terms of valence and arousal (VA). These approaches can only be effective, if they are developed using large, appropriately annotated databases, showing behaviors of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild has been the first, large-scale, in-the-wild database (including around 1,200,000 frames of 300 videos), annotated in terms of VA. In the vast majority of existing emotion databases, their annotation is limited to either primary expressions, or valence-arousal, or action units. In this paper, we first annotate a part (around $234,000$ frames) of the Aff-Wild database in terms of $8$ AUs and another part (around $288,000$ frames) in terms of the $7$ basic emotion categories, so that parts of this database are annotated in terms of VA, as well as AUs, or primary expressions. Then, we set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) a discriminator of a generative adversarial network (GAN). On the other hand, image generation is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the new annotated parts of the Aff-Wild database.
Tasks	Emotion Recognition, Image Generation, Multi-Task Learning
Published	2018-11-11
URL	https://arxiv.org/abs/1811.07771v2
PDF	https://arxiv.org/pdf/1811.07771v2.pdf
PWC	https://paperswithcode.com/paper/a-multi-task-learning-generation-framework
Repo
Framework

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes


Title	Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes
Authors	Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu
Abstract	Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resources. In this paper, an optical flow based moving object detection framework is proposed to address this problem. We utilize homography matrices to construct a background model online in the form of optical flow. When separating moving foregrounds from scenes, a dual-mode judge mechanism is designed to heighten the system’s adaptation to challenging situations. In the experiments, two evaluation metrics are redefined to reflect the performance of methods more properly. We quantitatively and qualitatively validate the effectiveness and feasibility of our method with videos in various scene conditions. The experimental results show that our method adapts itself to different situations and outperforms the state-of-the-art methods, indicating the advantages of optical flow based methods.
Tasks	Object Detection, Optical Flow Estimation
Published	2018-07-13
URL	http://arxiv.org/abs/1807.04890v1
PDF	http://arxiv.org/pdf/1807.04890v1.pdf
PWC	https://paperswithcode.com/paper/optical-flow-based-real-time-moving-object
Repo
Framework

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes


Title	Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
Authors	Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths
Abstract	Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task. Bayesian hierarchical modeling provides a theoretical framework for formalizing meta-learning as inference for a set of parameters that are shared across tasks. Here, we reformulate the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as a method for probabilistic inference in a hierarchical Bayesian model. In contrast to prior methods for meta-learning via hierarchical Bayes, MAML is naturally applicable to complex function approximators through its use of a scalable gradient descent procedure for posterior inference. Furthermore, the identification of MAML as hierarchical Bayes provides a way to understand the algorithm’s operation as a meta-learning procedure, as well as an opportunity to make use of computational strategies for efficient inference. We use this opportunity to propose an improvement to the MAML algorithm that makes use of techniques from approximate inference and curvature estimation.
Tasks	Meta-Learning
Published	2018-01-26
URL	http://arxiv.org/abs/1801.08930v1
PDF	http://arxiv.org/pdf/1801.08930v1.pdf
PWC	https://paperswithcode.com/paper/recasting-gradient-based-meta-learning-as
Repo
Framework
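
To make the MAML procedure referenced in the abstract above concrete, here is a minimal sketch of its inner and outer gradient steps on hypothetical 1-D quadratic tasks. It illustrates MAML itself, not the paper's hierarchical Bayes derivation or its proposed improvement. The task family, the function name `maml_1d`, and the step sizes are assumptions, and the same loss is reused for adaptation and evaluation to keep the gradients analytic.

```python
def maml_1d(task_centers, inner_lr=0.1, outer_lr=0.5, meta_steps=200, theta=0.0):
    """MAML on toy tasks with loss L_c(theta) = 0.5 * (theta - c) ** 2.

    theta is the initialization shared across tasks; each task adapts it
    with one gradient step, and the meta-update differentiates through
    that step.
    """
    for _ in range(meta_steps):
        meta_grad = 0.0
        for c in task_centers:
            # Inner loop: one gradient step on the task loss.
            adapted = theta - inner_lr * (theta - c)
            # Outer gradient: d/dtheta of 0.5 * (adapted - c) ** 2,
            # using d(adapted)/d(theta) = 1 - inner_lr (chain rule).
            meta_grad += (adapted - c) * (1.0 - inner_lr)
        theta -= outer_lr * meta_grad / len(task_centers)
    return theta
```

For these symmetric quadratic tasks the learned initialization settles at the mean of the task centers, for example `maml_1d([-1.0, 2.0, 5.0])` approaches 2.0.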

Dual Objective Approach Using A Convolutional Neural Network for Magnetic Resonance Elastography


Title	Dual Objective Approach Using A Convolutional Neural Network for Magnetic Resonance Elastography
Authors	Ligin Solamen, Yipeng Shi, Justice Amoh
Abstract	Traditionally, nonlinear inversion, direct inversion, or wave estimation methods have been used for reconstructing images from MRE displacement data. In this work, we propose a convolutional neural network architecture that can map MRE displacement data directly into elastograms, circumventing the costly and computationally intensive classical approaches. In addition to the mean squared error reconstruction objective, we also introduce a secondary loss inspired by the MRE mechanical models for training the neural network. Our network is demonstrated to be effective for generating MRE images that compare well with equivalents from the nonlinear inversion method.
Tasks
Published	2018-12-02
URL	http://arxiv.org/abs/1812.00441v1
PDF	http://arxiv.org/pdf/1812.00441v1.pdf
PWC	https://paperswithcode.com/paper/dual-objective-approach-using-a-convolutional
Repo
Framework

Deep Learning as Feature Encoding for Emotion Recognition


Title	Deep Learning as Feature Encoding for Emotion Recognition
Authors	Bhalaji Nagarajan, V Ramana Murthy Oruganti
Abstract	Deep learning is popular as an end-to-end framework extracting the prominent features and performing the classification also. In this paper, we extensively investigate deep networks as an alternate to feature encoding technique of low level descriptors for emotion recognition on the benchmark EmoDB dataset. Fusion performance with such obtained encoded features with other available features is also investigated. Highest performance to date in the literature is observed.
Tasks	Emotion Recognition
Published	2018-10-30
URL	http://arxiv.org/abs/1810.12613v2
PDF	http://arxiv.org/pdf/1810.12613v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-as-feature-encoding-for-emotion
Repo
Framework

Convergence of Value Aggregation for Imitation Learning


Title	Convergence of Value Aggregation for Imitation Learning
Authors	Ching-An Cheng, Byron Boots
Abstract	Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
Tasks	Imitation Learning
Published	2018-01-22
URL	http://arxiv.org/abs/1801.07292v1
PDF	http://arxiv.org/pdf/1801.07292v1.pdf
PWC	https://paperswithcode.com/paper/convergence-of-value-aggregation-for
Repo
Framework

SafeRoute: Learning to Navigate Streets Safely in an Urban Environment


Title	SafeRoute: Learning to Navigate Streets Safely in an Urban Environment
Authors	Sharon Levy, Wenhan Xiong, Elizabeth Belding, William Yang Wang
Abstract	Recent studies show that 85% of women have changed their traveled route to avoid harassment and assault. Despite this, current mapping tools do not empower users with information to take charge of their personal safety. We propose SafeRoute, a novel solution to the problem of navigating cities and avoiding street harassment and crime. Unlike other street navigation applications, SafeRoute introduces a new type of path generation via deep reinforcement learning. This enables us to successfully optimize for multi-criteria path-finding and incorporate representation learning within our framework. Our agent learns to pick favorable streets to create a safe and short path with a reward function that incorporates safety and efficiency. Given access to recent crime reports in many urban cities, we train our model for experiments in Boston, New York, and San Francisco. We test our model on areas of these cities, specifically the populated downtown regions where tourists and those unfamiliar with the streets walk. We evaluate SafeRoute and successfully improve over state-of-the-art methods by up to 17% in local average distance from crimes while decreasing path length by up to 7%.
Tasks	Representation Learning
Published	2018-11-03
URL	http://arxiv.org/abs/1811.01147v1
PDF	http://arxiv.org/pdf/1811.01147v1.pdf
PWC	https://paperswithcode.com/paper/saferoute-learning-to-navigate-streets-safely
Repo
Framework

A Structured Model For Action Detection


Title	A Structured Model For Action Detection
Authors	Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
Abstract	A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand. While this is an obviously attractive approach, it is not applicable in all scenarios. We claim that action detection is one such challenging problem - the models that need to be trained are large, and labeled data is expensive to obtain. To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. In particular, we augment a standard I3D network with a tracking module to aggregate long term motion patterns, and use a graph convolutional network to reason about interactions between actors and objects. Evaluated on the challenging AVA dataset, the proposed approach improves over the I3D baseline by 5.5% mAP and over the state-of-the-art by 4.8% mAP.
Tasks	Action Detection, Video Understanding
Published	2018-12-09
URL	https://arxiv.org/abs/1812.03544v5
PDF	https://arxiv.org/pdf/1812.03544v5.pdf
PWC	https://paperswithcode.com/paper/a-structured-model-for-action-detection
Repo
Framework

The Minimax Learning Rate of Normal and Ising Undirected Graphical Models


Title	The Minimax Learning Rate of Normal and Ising Undirected Graphical Models
Authors	Luc Devroye, Abbas Mehrabian, Tommy Reddad
Abstract	Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field.
Tasks
Published	2018-06-18
URL	http://arxiv.org/abs/1806.06887v1
PDF	http://arxiv.org/pdf/1806.06887v1.pdf
PWC	https://paperswithcode.com/paper/the-minimax-learning-rate-of-normal-and-ising
Repo
Framework
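
For a feel of how the rates in the abstract above scale, the short sketch below evaluates them numerically. The function name `minimax_rate` and its `external_field` flag are made up for this example, and the constant factor mentioned in the abstract is ignored, so the values are only the order of the rate.

```python
import math


def minimax_rate(m, d, n, external_field=True):
    """Order of the rate from the abstract, ignoring constant factors.

    m: number of edges, d: number of vertices, n: number of i.i.d. samples.
    With an external field the rate is min{1, sqrt((m + d) / n)};
    without one it is min{1, sqrt(m / n)}.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample count")
    complexity = m + d if external_field else m
    return min(1.0, math.sqrt(complexity / n))
```

With too few samples the bound saturates at 1 (total variation distance never exceeds 1), and once `n` exceeds `m + d` the rate decays like the inverse square root of `n`.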

Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation


Title	Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation
Authors	Miguel Campo, Cheng-Kang Hsieh, Matt Nickens, JJ Espinoza, Abhinav Taliyan, Julie Rieger, Jean Ho, Bettina Sherick
Abstract	Audience discovery is an important activity at major movie studios. Deep models that use convolutional networks to extract frame-by-frame features of a movie trailer and represent it in a form that is suitable for prediction are now possible thanks to the availability of pre-built feature extractors trained on large image datasets. Using these pre-built feature extractors, we are able to process hundreds of publicly available movie trailers, extract frame-by-frame low level features (e.g., a face, an object, etc) and create video-level representations. We use the video-level representations to train a hybrid Collaborative Filtering model that combines video features with historical movie attendance records. The trained model not only makes accurate attendance and audience prediction for existing movies, but also successfully profiles new movies six to eight months prior to their release.
Tasks
Published	2018-07-12
URL	http://arxiv.org/abs/1807.04465v1
PDF	http://arxiv.org/pdf/1807.04465v1.pdf
PWC	https://paperswithcode.com/paper/competitive-analysis-system-for-theatrical
Repo
Framework

LSTM-based Whisper Detection


Title	LSTM-based Whisper Detection
Authors	Zeynab Raeesy, Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister
Abstract	This article presents a whisper speech detector in the far-field domain. The proposed system consists of a long short-term memory (LSTM) neural network trained on log-filterbank energy (LFBE) acoustic features. This model is trained and evaluated on recordings of human interactions with voice-controlled, far-field devices in whisper and normal phonation modes. We compare multiple inference approaches for utterance-level classification by examining trajectories of the LSTM posteriors. In addition, we engineer a set of features based on the signal characteristics inherent to whisper speech, and evaluate their effectiveness in further separating whisper from normal speech. A benchmarking of these features using multilayer perceptrons (MLP) and LSTMs suggests that the proposed features, in combination with LFBE features, can help us further improve our classifiers. We prove that, with enough data, the LSTM model is indeed as capable of learning whisper characteristics from LFBE features alone compared to a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech. In addition, we prove that the LSTM classifier's accuracy can be further improved with the incorporation of the proposed engineered features.
Tasks
Published	2018-09-20
URL	http://arxiv.org/abs/1809.07832v1
PDF	http://arxiv.org/pdf/1809.07832v1.pdf
PWC	https://paperswithcode.com/paper/lstm-based-whisper-detection
Repo
Framework

Information-Maximizing Sampling to Promote Tracking-by-Detection


Title	Information-Maximizing Sampling to Promote Tracking-by-Detection
Authors	Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba
Abstract	The performance of an adaptive tracking-by-detection algorithm not only depends on the classification and updating processes but also on the sampling. Typically, such trackers select their samples from the vicinity of the last predicted object location, or from its expected location using a pre-defined motion model, which does not exploit the contents of the samples nor the information provided by the classifier. We introduced the idea of most informative sampling, in which the sampler attempts to select samples that trouble the classifier of a discriminative tracker. We then proposed an active discriminative co-tracker that embeds an adversarial sampler to increase its robustness against various tracking challenges. Experiments show that our proposed tracker outperforms state-of-the-art trackers on various benchmark videos.
Tasks
Published	2018-06-07
URL	http://arxiv.org/abs/1806.02523v1
PDF	http://arxiv.org/pdf/1806.02523v1.pdf
PWC	https://paperswithcode.com/paper/information-maximizing-sampling-to-promote
Repo
Framework

Beyond the One Step Greedy Approach in Reinforcement Learning


Title	Beyond the One Step Greedy Approach in Reinforcement Learning
Authors	Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Abstract	The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g, $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
Tasks
Published	2018-02-10
URL	http://arxiv.org/abs/1802.03654v3
PDF	http://arxiv.org/pdf/1802.03654v3.pdf
PWC	https://paperswithcode.com/paper/beyond-the-one-step-greedy-approach-in-1
Repo
Framework
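
One common reading of multiple-step lookahead policy improvement is to pick, in each state, the first action of an optimal h-step plan that bootstraps from the current value estimate. The sketch below assumes that reading and a small deterministic MDP invented for illustration; the function names and the MDP encoding are not taken from the paper, which formulates its own variants.

```python
def bellman_backup(values, transitions, rewards, gamma):
    """Apply the Bellman optimality operator once to a value table."""
    return [
        max(
            rewards[s][a] + gamma * values[transitions[s][a]]
            for a in range(len(transitions[s]))
        )
        for s in range(len(values))
    ]


def h_step_greedy(values, transitions, rewards, gamma, h):
    """Return the h-step greedy policy with respect to `values`.

    transitions[s][a] is the (deterministic) next state and rewards[s][a]
    the immediate reward. h = 1 recovers the usual one-step greedy policy.
    """
    lookahead = values
    for _ in range(h - 1):
        lookahead = bellman_backup(lookahead, transitions, rewards, gamma)
    policy = []
    for s in range(len(values)):
        q = [
            rewards[s][a] + gamma * lookahead[transitions[s][a]]
            for a in range(len(transitions[s]))
        ]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


# Hypothetical 4-state chain: action 0 stays (small reward), action 1 moves
# right; only the move from state 2 into the absorbing state 3 pays off.
TRANSITIONS = [[0, 1], [1, 2], [2, 3], [3, 3]]
REWARDS = [[0.1, 0.0], [0.1, 0.0], [0.1, 10.0], [0.0, 0.0]]
```

Starting from an all-zero value table with `gamma = 0.9`, the one-step greedy policy stays put in states 0 and 1 because it cannot see the distant reward, while `h = 3` looks far enough ahead to move right from every non-absorbing state.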