Paper Group ANR 868
Navigation by Imitation in a Pedestrian-Rich Environment
Title | Navigation by Imitation in a Pedestrian-Rich Environment |
Authors | Jing Bi, Tianyou Xiao, Qiuyue Sun, Chenliang Xu |
Abstract | Deep neural networks trained on demonstrations of human actions give robots the ability to drive themselves on the road. However, navigation in a pedestrian-rich environment, such as a campus setup, is still challenging: one needs to intervene frequently and take control of the robot from the early steps leading to a mistake. An arduous burden is, hence, placed on the learning framework design and data acquisition. In this paper, we propose a new learning-from-intervention Dataset Aggregation (DAgger) algorithm to overcome the limitations brought by applying imitation learning to navigation in the pedestrian-rich environment. Our new learning algorithm implements an error backtrack function that is able to effectively learn from expert interventions. Combining our new learning algorithm with deep convolutional neural networks and a hierarchically-nested policy-selection mechanism, we show that our robot is able to map pixels directly to control commands and navigate successfully in the real world without explicitly modeling the pedestrian behaviors or the world model. |
Tasks | Imitation Learning |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00506v1 |
http://arxiv.org/pdf/1811.00506v1.pdf | |
PWC | https://paperswithcode.com/paper/navigation-by-imitation-in-a-pedestrian-rich |
Repo | |
Framework | |
Latent Convolutional Models
Title | Latent Convolutional Models |
Authors | ShahRukh Athar, Evgeny Burnaev, Victor Lempitsky |
Abstract | We present a new latent model of natural images that can be learned on large-scale datasets. The learning process provides a latent embedding for every image in the training dataset, as well as a deep convolutional network that maps the latent space to the image space. After training, the new model provides a strong and universal image prior for a variety of image restoration tasks such as large-hole inpainting, superresolution, and colorization. To model high-resolution natural images, our approach uses latent spaces of very high dimensionality (one to two orders of magnitude higher than previous latent image models). To tackle this high dimensionality, we use latent spaces with a special manifold structure (convolutional manifolds) parameterized by a ConvNet of a certain architecture. In the experiments, we compare the learned latent models with latent models learned by autoencoders, advanced variants of generative adversarial networks, and a strong baseline system using simpler parameterization of the latent space. Our model outperforms the competing approaches over a range of restoration tasks. |
Tasks | Colorization, Image Restoration |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06284v2 |
http://arxiv.org/pdf/1806.06284v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-convolutional-models |
Repo | |
Framework | |
A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions
Title | A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions |
Authors | Dimitrios Kollias, Stefanos Zafeiriou |
Abstract | Over the past few years many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis, in terms of valence and arousal (VA). These approaches can only be effective, if they are developed using large, appropriately annotated databases, showing behaviors of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild has been the first, large-scale, in-the-wild database (including around 1,200,000 frames of 300 videos), annotated in terms of VA. In the vast majority of existing emotion databases, their annotation is limited to either primary expressions, or valence-arousal, or action units. In this paper, we first annotate a part (around $234,000$ frames) of the Aff-Wild database in terms of $8$ AUs and another part (around $288,000$ frames) in terms of the $7$ basic emotion categories, so that parts of this database are annotated in terms of VA, as well as AUs, or primary expressions. Then, we set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) a discriminator of a generative adversarial network (GAN). On the other hand, image generation is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the new annotated parts of the Aff-Wild database. |
Tasks | Emotion Recognition, Image Generation, Multi-Task Learning |
Published | 2018-11-11 |
URL | https://arxiv.org/abs/1811.07771v2 |
https://arxiv.org/pdf/1811.07771v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-task-learning-generation-framework |
Repo | |
Framework | |
Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes
Title | Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes |
Authors | Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu |
Abstract | Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic backgrounds, changing foreground appearance and limited computational resources. In this paper, an optical flow based moving object detection framework is proposed to address this problem. We use homography matrices to construct a background model online in the form of optical flow. When separating moving foregrounds from scenes, a dual-mode judge mechanism is designed to heighten the system’s adaptation to challenging situations. In the experiments, two evaluation metrics are redefined to reflect the performance of methods more properly. We quantitatively and qualitatively validate the effectiveness and feasibility of our method with videos in various scene conditions. The experimental results show that our method adapts itself to different situations and outperforms the state-of-the-art methods, indicating the advantages of optical flow based methods. |
Tasks | Object Detection, Optical Flow Estimation |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04890v1 |
http://arxiv.org/pdf/1807.04890v1.pdf | |
PWC | https://paperswithcode.com/paper/optical-flow-based-real-time-moving-object |
Repo | |
Framework | |
Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
Title | Recasting Gradient-Based Meta-Learning as Hierarchical Bayes |
Authors | Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths |
Abstract | Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task. Bayesian hierarchical modeling provides a theoretical framework for formalizing meta-learning as inference for a set of parameters that are shared across tasks. Here, we reformulate the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as a method for probabilistic inference in a hierarchical Bayesian model. In contrast to prior methods for meta-learning via hierarchical Bayes, MAML is naturally applicable to complex function approximators through its use of a scalable gradient descent procedure for posterior inference. Furthermore, the identification of MAML as hierarchical Bayes provides a way to understand the algorithm’s operation as a meta-learning procedure, as well as an opportunity to make use of computational strategies for efficient inference. We use this opportunity to propose an improvement to the MAML algorithm that makes use of techniques from approximate inference and curvature estimation. |
Tasks | Meta-Learning |
Published | 2018-01-26 |
URL | http://arxiv.org/abs/1801.08930v1 |
http://arxiv.org/pdf/1801.08930v1.pdf | |
PWC | https://paperswithcode.com/paper/recasting-gradient-based-meta-learning-as |
Repo | |
Framework | |
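The two nested optimization loops that MAML relies on can be illustrated with a toy example. The sketch below is not the paper's method or its proposed improvement; it assumes hypothetical scalar tasks with loss L_i(theta) = 0.5 * (theta - c_i)^2, for which the MAML meta-gradient can be written by hand. The function name `maml_scalar` and all parameter values are illustrative assumptions.

```python
def maml_scalar(task_targets, alpha=0.1, beta=0.5, steps=200, theta=0.0):
    """Exact MAML on toy tasks with loss L_i(theta) = 0.5 * (theta - c_i)**2.

    alpha is the inner (task-level) step size, beta the outer (meta) step size.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            # Inner loop: one gradient step on the task loss, since dL_i/dtheta = theta - c.
            adapted = theta - alpha * (theta - c)
            # Outer gradient via the chain rule:
            # dL_i(adapted)/dtheta = (adapted - c) * d(adapted)/dtheta,
            # and d(adapted)/dtheta = 1 - alpha for this quadratic loss.
            meta_grad += (adapted - c) * (1.0 - alpha)
        # Meta-update on the shared initialization, averaged over tasks.
        theta -= beta * meta_grad / len(task_targets)
    return theta


if __name__ == "__main__":
    print(maml_scalar([1.0, 2.0, 6.0]))
```

For these quadratic tasks the shared initialization settles at the mean of the task targets, which is the point from which a single inner step helps every task equally.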
Dual Objective Approach Using A Convolutional Neural Network for Magnetic Resonance Elastography
Title | Dual Objective Approach Using A Convolutional Neural Network for Magnetic Resonance Elastography |
Authors | Ligin Solamen, Yipeng Shi, Justice Amoh |
Abstract | Traditionally, nonlinear inversion, direct inversion, or wave estimation methods have been used for reconstructing images from MRE displacement data. In this work, we propose a convolutional neural network architecture that can map MRE displacement data directly into elastograms, circumventing the costly and computationally intensive classical approaches. In addition to the mean squared error reconstruction objective, we also introduce a secondary loss inspired by the MRE mechanical models for training the neural network. Our network is demonstrated to be effective for generating MRE images that compare well with equivalents from the nonlinear inversion method. |
Tasks | |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00441v1 |
http://arxiv.org/pdf/1812.00441v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-objective-approach-using-a-convolutional |
Repo | |
Framework | |
Deep Learning as Feature Encoding for Emotion Recognition
Title | Deep Learning as Feature Encoding for Emotion Recognition |
Authors | Bhalaji Nagarajan, V Ramana Murthy Oruganti |
Abstract | Deep learning is popular as an end-to-end framework that both extracts the prominent features and performs the classification. In this paper, we extensively investigate deep networks as an alternative to feature encoding techniques for low-level descriptors for emotion recognition on the benchmark EmoDB dataset. The fusion performance of the encoded features obtained in this way with other available features is also investigated. The highest performance to date in the literature is observed. |
Tasks | Emotion Recognition |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12613v2 |
http://arxiv.org/pdf/1810.12613v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-as-feature-encoding-for-emotion |
Repo | |
Framework | |
Convergence of Value Aggregation for Imitation Learning
Title | Convergence of Value Aggregation for Imitation Learning |
Authors | Ching-An Cheng, Byron Boots |
Abstract | Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems. |
Tasks | Imitation Learning |
Published | 2018-01-22 |
URL | http://arxiv.org/abs/1801.07292v1 |
http://arxiv.org/pdf/1801.07292v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-of-value-aggregation-for |
Repo | |
Framework | |
SafeRoute: Learning to Navigate Streets Safely in an Urban Environment
Title | SafeRoute: Learning to Navigate Streets Safely in an Urban Environment |
Authors | Sharon Levy, Wenhan Xiong, Elizabeth Belding, William Yang Wang |
Abstract | Recent studies show that 85% of women have changed their traveled route to avoid harassment and assault. Despite this, current mapping tools do not empower users with information to take charge of their personal safety. We propose SafeRoute, a novel solution to the problem of navigating cities and avoiding street harassment and crime. Unlike other street navigation applications, SafeRoute introduces a new type of path generation via deep reinforcement learning. This enables us to successfully optimize for multi-criteria path-finding and incorporate representation learning within our framework. Our agent learns to pick favorable streets to create a safe and short path with a reward function that incorporates safety and efficiency. Given access to recent crime reports in many urban cities, we train our model for experiments in Boston, New York, and San Francisco. We test our model on areas of these cities, specifically the populated downtown regions where tourists and those unfamiliar with the streets walk. We evaluate SafeRoute and successfully improve over state-of-the-art methods by up to 17% in local average distance from crimes while decreasing path length by up to 7%. |
Tasks | Representation Learning |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01147v1 |
http://arxiv.org/pdf/1811.01147v1.pdf | |
PWC | https://paperswithcode.com/paper/saferoute-learning-to-navigate-streets-safely |
Repo | |
Framework | |
A Structured Model For Action Detection
Title | A Structured Model For Action Detection |
Authors | Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid |
Abstract | A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand. While this is an obviously attractive approach, it is not applicable in all scenarios. We claim that action detection is one such challenging problem - the models that need to be trained are large, and labeled data is expensive to obtain. To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. In particular, we augment a standard I3D network with a tracking module to aggregate long term motion patterns, and use a graph convolutional network to reason about interactions between actors and objects. Evaluated on the challenging AVA dataset, the proposed approach improves over the I3D baseline by 5.5% mAP and over the state-of-the-art by 4.8% mAP. |
Tasks | Action Detection, Video Understanding |
Published | 2018-12-09 |
URL | https://arxiv.org/abs/1812.03544v5 |
https://arxiv.org/pdf/1812.03544v5.pdf | |
PWC | https://paperswithcode.com/paper/a-structured-model-for-action-detection |
Repo | |
Framework | |
The Minimax Learning Rate of Normal and Ising Undirected Graphical Models
Title | The Minimax Learning Rate of Normal and Ising Undirected Graphical Models |
Authors | Luc Devroye, Abbas Mehrabian, Tommy Reddad |
Abstract | Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06887v1 |
http://arxiv.org/pdf/1806.06887v1.pdf | |
PWC | https://paperswithcode.com/paper/the-minimax-learning-rate-of-normal-and-ising |
Repo | |
Framework | |
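To get a feel for the two rates stated in the abstract, they can be evaluated numerically. The sketch below simply computes the expressions min{1, sqrt((m + d)/n)} and min{1, sqrt(m/n)}; the abstract's unspecified constant factor is left out, and the function name `ising_rate` and its `external_field` flag are hypothetical names chosen for illustration.

```python
import math


def ising_rate(m: int, d: int, n: int, external_field: bool = True) -> float:
    """Rate expression from the abstract, without the unspecified constant factor.

    m: number of edges, d: number of vertices, n: number of i.i.d. samples.
    With no external magnetic field the abstract's rate depends on m only.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    complexity = m + d if external_field else m
    return min(1.0, math.sqrt(complexity / n))


if __name__ == "__main__":
    # Example: a cycle on d = 10 vertices has m = 10 edges.
    for n in (10, 100, 2000):
        print(n, ising_rate(m=10, d=10, n=n))
```

The expression stays at 1 until the sample size exceeds m + d, after which it decays like the inverse square root of n.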
Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation
Title | Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation |
Authors | Miguel Campo, Cheng-Kang Hsieh, Matt Nickens, JJ Espinoza, Abhinav Taliyan, Julie Rieger, Jean Ho, Bettina Sherick |
Abstract | Audience discovery is an important activity at major movie studios. Deep models that use convolutional networks to extract frame-by-frame features of a movie trailer and represent it in a form that is suitable for prediction are now possible thanks to the availability of pre-built feature extractors trained on large image datasets. Using these pre-built feature extractors, we are able to process hundreds of publicly available movie trailers, extract frame-by-frame low level features (e.g., a face, an object, etc) and create video-level representations. We use the video-level representations to train a hybrid Collaborative Filtering model that combines video features with historical movie attendance records. The trained model not only makes accurate attendance and audience prediction for existing movies, but also successfully profiles new movies six to eight months prior to their release. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04465v1 |
http://arxiv.org/pdf/1807.04465v1.pdf | |
PWC | https://paperswithcode.com/paper/competitive-analysis-system-for-theatrical |
Repo | |
Framework | |
LSTM-based Whisper Detection
Title | LSTM-based Whisper Detection |
Authors | Zeynab Raeesy, Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister |
Abstract | This article presents a whisper speech detector in the far-field domain. The proposed system consists of a long short-term memory (LSTM) neural network trained on log-filterbank energy (LFBE) acoustic features. This model is trained and evaluated on recordings of human interactions with voice-controlled, far-field devices in whisper and normal phonation modes. We compare multiple inference approaches for utterance-level classification by examining trajectories of the LSTM posteriors. In addition, we engineer a set of features based on the signal characteristics inherent to whisper speech, and evaluate their effectiveness in further separating whisper from normal speech. A benchmarking of these features using multilayer perceptrons (MLP) and LSTMs suggests that the proposed features, in combination with LFBE features, can help us further improve our classifiers. We prove that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech. In addition, we prove that the LSTM classifier's accuracy can be further improved with the incorporation of the proposed engineered features. |
Tasks | |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07832v1 |
http://arxiv.org/pdf/1809.07832v1.pdf | |
PWC | https://paperswithcode.com/paper/lstm-based-whisper-detection |
Repo | |
Framework | |
Information-Maximizing Sampling to Promote Tracking-by-Detection
Title | Information-Maximizing Sampling to Promote Tracking-by-Detection |
Authors | Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba |
Abstract | The performance of an adaptive tracking-by-detection algorithm not only depends on the classification and updating processes but also on the sampling. Typically, such trackers select their samples from the vicinity of the last predicted object location, or from its expected location using a pre-defined motion model, which does not exploit the contents of the samples nor the information provided by the classifier. We introduced the idea of most informative sampling, in which the sampler attempts to select samples that trouble the classifier of a discriminative tracker. We then proposed an active discriminative co-tracker that embeds an adversarial sampler to increase its robustness against various tracking challenges. Experiments show that our proposed tracker outperforms state-of-the-art trackers on various benchmark videos. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02523v1 |
http://arxiv.org/pdf/1806.02523v1.pdf | |
PWC | https://paperswithcode.com/paper/information-maximizing-sampling-to-promote |
Repo | |
Framework | |
Beyond the One Step Greedy Approach in Reinforcement Learning
Title | Beyond the One Step Greedy Approach in Reinforcement Learning |
Authors | Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor |
Abstract | The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study. |
Tasks | |
Published | 2018-02-10 |
URL | http://arxiv.org/abs/1802.03654v3 |
http://arxiv.org/pdf/1802.03654v3.pdf | |
PWC | https://paperswithcode.com/paper/beyond-the-one-step-greedy-approach-in-1 |
Repo | |
Framework | |
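Multiple-step lookahead policy improvement can be made concrete on a small tabular MDP. The sketch below uses one common formalization, assumed here rather than taken from the paper: the h-step greedy policy is the one-step greedy policy with respect to the value obtained by applying the Bellman optimality operator h - 1 times. The MDP encoding (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the reward) and the three-state chain are hypothetical.

```python
def q_values(P, R, gamma, v, s):
    """Action values at state s for a one-step lookahead on value estimate v."""
    return [
        R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
        for a in range(len(P[s]))
    ]


def bellman_optimality(P, R, gamma, v):
    """One application of the Bellman optimality operator to v."""
    return [max(q_values(P, R, gamma, v, s)) for s in range(len(P))]


def h_step_greedy(P, R, gamma, v, h):
    """Greedy policy with an h-step lookahead; h = 1 is the usual greedy step."""
    if h < 1:
        raise ValueError("h must be at least 1")
    for _ in range(h - 1):
        v = bellman_optimality(P, R, gamma, v)
    policy = []
    for s in range(len(P)):
        q = q_values(P, R, gamma, v, s)
        # Ties are broken in favour of the lowest action index.
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


# Hypothetical 3-state chain: action 0 = stay, action 1 = move right.
# Staying in state 0 pays 0.1; moving from state 1 to state 2 pays 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
R = [[0.1, 0.0], [0.0, 1.0], [0.0, 0.0]]

if __name__ == "__main__":
    v0 = [0.0, 0.0, 0.0]
    print(h_step_greedy(P, R, 0.9, v0, h=1))
    print(h_step_greedy(P, R, 0.9, v0, h=2))
```

In this chain the one-step greedy policy stays in state 0 to collect the small immediate reward, while a two-step lookahead already sees the larger reward behind state 1 and moves right.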