January 27, 2020

3216 words 16 mins read

Paper Group ANR 1326

Weather event severity prediction using buoy data and machine learning. Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation. Measuring Mother-Infant Emotions By Audio Sensing. Aggregated Hold-Out. Improving Visual Feature Extraction in Glacial Environments. Online Learning with Diverse User Preferences. NeoNav: Impro …

Weather event severity prediction using buoy data and machine learning

Title Weather event severity prediction using buoy data and machine learning
Authors Vikas Ramachandra
Abstract In this paper, we predict the severity of extreme weather events (tropical storms, hurricanes, etc.) using buoy time series variables such as wind speed and air temperature. The prediction/forecasting method is based on various forecasting and machine learning models and proceeds in the following steps. Data sources for the buoys and weather events are identified, aggregated, and merged. Missing data are imputed using Kalman filters as well as splines for multivariate time series. Statistical tests are then run to ascertain increasing trends in weather event severity. Finally, we use machine learning to predict/forecast event severity from the buoy variables, and report good accuracies for the models built.
Tasks Imputation, Time Series
Published 2019-11-17
URL https://arxiv.org/abs/1911.09001v1
PDF https://arxiv.org/pdf/1911.09001v1.pdf
PWC https://paperswithcode.com/paper/weather-event-severity-prediction-using-buoy
Repo
Framework
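To make the imputation step concrete, here is a minimal sketch of spline-based gap filling on multivariate buoy-style series using pandas. The column names, sampling frequency, and cubic order are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the spline-based gap-filling step; "wind_speed" and
# "air_temp" are assumed column names, not taken from the paper.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=200, freq="h")
df = pd.DataFrame({
    "wind_speed": rng.gamma(2.0, 4.0, size=200),
    "air_temp": 15 + 5 * np.sin(np.linspace(0, 8, 200)),
}, index=idx)

# Knock out some observations to simulate buoy dropouts.
df.iloc[rng.choice(200, size=30, replace=False), 0] = np.nan

# Cubic-spline interpolation per variable (one simple stand-in for the
# Kalman-filter/spline imputation the abstract mentions; requires scipy).
imputed = df.interpolate(method="spline", order=3, limit_direction="both")
print(imputed.isna().sum())
```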

Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation

Title Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation
Authors David Watkins-Valls, Jingxi Xu, Nicholas Waytowich, Peter Allen
Abstract We present a robot navigation system that uses an imitation learning framework to successfully navigate in complex environments. Our framework takes a pre-built 3D scan of a real environment and trains an agent from pre-generated expert trajectories to navigate to any position given a panoramic view of the goal and the current visual input, without relying on a map, compass, odometry, GPS, or the relative position of the target at runtime. Our end-to-end trained agent uses RGB and depth (RGBD) information and can handle large environments (up to $1031m^2$) across multiple rooms (up to $40$) and generalizes to unseen targets. We show that, compared to several baselines using deep reinforcement learning and RGBD SLAM, our method (1) requires fewer training examples and less training time, (2) reaches the goal location with higher accuracy, (3) produces better solutions with shorter paths for long-range navigation tasks, and (4) generalizes to unseen environments given an RGBD map of the environment.
Tasks Imitation Learning, Robot Navigation, Visual Navigation
Published 2019-09-20
URL https://arxiv.org/abs/1909.09295v1
PDF https://arxiv.org/pdf/1909.09295v1.pdf
PWC https://paperswithcode.com/paper/learning-your-way-without-map-or-compass
Repo
Framework
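The core of such an imitation learning framework is a supervised policy trained on expert state-action pairs. Below is a minimal behavioral-cloning sketch in PyTorch; the network shape, the 4-channel RGBD input, and the 6-way discrete action space are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(            # shared encoder for RGBD views
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # An 84x84 input yields 64 maps of 9x9 per view; two views concatenated.
        self.head = nn.Linear(2 * 64 * 9 * 9, n_actions)

    def forward(self, obs, goal):
        z = torch.cat([self.encoder(obs), self.encoder(goal)], dim=1)
        return self.head(z)                      # logits over discrete actions

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

obs = torch.randn(8, 4, 84, 84)        # batch of RGBD observations (stand-ins)
goal = torch.randn(8, 4, 84, 84)       # panoramic goal views (stand-ins)
expert_actions = torch.randint(0, 6, (8,))

# One supervised step on expert trajectories -- the essence of behavior cloning.
loss = nn.functional.cross_entropy(policy(obs, goal), expert_actions)
opt.zero_grad(); loss.backward(); opt.step()
```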

Measuring Mother-Infant Emotions By Audio Sensing

Title Measuring Mother-Infant Emotions By Audio Sensing
Authors Xuewen Yao, Dong He, Tiancheng Jing, Kaya de Barbaro
Abstract It has been suggested in the developmental psychology literature that the communication of affect between mothers and their infants correlates with infants’ socioemotional and cognitive development. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on the mother’s speech. To build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a convolutional neural network model that classifies 6 different emotions with 70% accuracy. We applied our model to the mothers’ speech and found the predicted dominant emotions were angry and sad, which did not match the mothers’ actual emotional states. Based on our own observations, we concluded that emotional speech databases made with the help of actors do not generalize well to real-life settings, suggesting an active learning or unsupervised approach in the future.
Tasks Active Learning
Published 2019-12-10
URL https://arxiv.org/abs/1912.05920v1
PDF https://arxiv.org/pdf/1912.05920v1.pdf
PWC https://paperswithcode.com/paper/measuring-mother-infant-emotions-by-audio
Repo
Framework
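For reference, a 6-class speech-emotion CNN operating on mel-spectrogram patches might look like the toy PyTorch sketch below. The input size, layer widths, and random tensors standing in for RAVDESS clips are all assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(6),                  # 6 emotion classes, as in the abstract
)

specs = torch.randn(4, 1, 64, 128)     # stand-in mel-spectrogram patches
labels = torch.randint(0, 6, (4,))
loss = nn.functional.cross_entropy(model(specs), labels)
loss.backward()
```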

Aggregated Hold-Out

Title Aggregated Hold-Out
Authors Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle
Abstract Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the Y variable or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation.
Tasks
Published 2019-09-11
URL https://arxiv.org/abs/1909.04890v1
PDF https://arxiv.org/pdf/1909.04890v1.pdf
PWC https://paperswithcode.com/paper/aggregated-hold-out
Repo
Framework
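A small sketch of the Agghoo procedure under illustrative choices (ridge regression, a four-point hyperparameter grid, V = 5 splits): run hold-out model selection on several independent splits, then average the selected predictors.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=300)

def agghoo_predict(X, y, X_new, alphas=(0.01, 0.1, 1.0, 10.0), V=5):
    preds = []
    for v in range(V):
        # One hold-out split per aggregation round.
        X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.2,
                                                  random_state=v)
        # Select the rule that minimizes hold-out risk...
        fits = [Ridge(alpha=a).fit(X_tr, y_tr) for a in alphas]
        best = min(fits, key=lambda m: mean_squared_error(y_ho, m.predict(X_ho)))
        preds.append(best.predict(X_new))
    # ...then average the selected predictors across splits.
    return np.mean(preds, axis=0)

print(agghoo_predict(X, y, X[:3]))
```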

Improving Visual Feature Extraction in Glacial Environments

Title Improving Visual Feature Extraction in Glacial Environments
Authors Steven D. Morad, Jeremy Nash, Shoya Higa, Russell Smith, Aaron Parness, Kobus Barnard
Abstract Glacial science could benefit tremendously from autonomous robots, but previous glacial robots have had perception issues in these colorless and featureless environments, specifically with visual feature extraction. This translates to failures in visual odometry and visual navigation. Glaciologists use near-infrared imagery to reveal the underlying heterogeneous spatial structure of snow and ice, and we theorize that this hidden near-infrared structure could produce more and higher-quality features than are available in visible light. We took a custom camera rig to Igloo Cave at Mt. St. Helens to test our theory. The rig contains two identical machine vision cameras, one of which was outfitted with multiple filters to see only near-infrared light. We extracted features from short video clips taken inside the cave using three popular feature extractors (FAST, SIFT, and SURF), and quantified the number of features and their quality for visual navigation by comparing the resulting orientation estimates to ground truth. Our main contribution is the use of NIR longpass filters to improve the quantity and quality of visual features in icy terrain, irrespective of the feature extractor used.
Tasks Visual Navigation, Visual Odometry
Published 2019-08-27
URL https://arxiv.org/abs/1908.10425v2
PDF https://arxiv.org/pdf/1908.10425v2.pdf
PWC https://paperswithcode.com/paper/improving-visual-feature-extraction-in
Repo
Framework
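A sketch of the feature-count comparison using OpenCV's FAST and SIFT detectors is below (SURF is omitted because it ships only in non-free opencv-contrib builds). The synthetic low-contrast and textured images are stand-ins for the paired visible/NIR frames from the cave footage.

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
visible = rng.integers(120, 136, size=(480, 640), dtype=np.uint8)  # low contrast
nir = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)        # richer texture

fast = cv2.FastFeatureDetector_create()
sift = cv2.SIFT_create()

for name, img in [("visible", visible), ("nir", nir)]:
    kp_fast = fast.detect(img, None)
    kp_sift = sift.detect(img, None)
    print(f"{name}: FAST={len(kp_fast)} SIFT={len(kp_sift)}")
```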

Online Learning with Diverse User Preferences

Title Online Learning with Diverse User Preferences
Authors Chao Gan, Jing Yang, Ruida Zhou, Cong Shen
Abstract In this paper, we investigate the impact of diverse user preferences on learning under the stochastic multi-armed bandit (MAB) framework. We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant. Our intuition is that to achieve sub-linear regret, the number of times an optimal arm is pulled should scale linearly in time; when all arms are optimal for certain users and pulled frequently, the estimated arm statistics can quickly converge to their true values, thus dramatically reducing the need for exploration. We cast the problem into a stochastic linear bandits model, where both the user preferences and the states of the arms are modeled as independent and identically distributed (i.i.d.) d-dimensional random vectors. After receiving the user preference vector at the beginning of each time slot, the learner pulls an arm and receives a reward as the inner product of the preference vector and the arm state vector. We also assume that the state of the pulled arm is revealed to the learner once it is pulled. We propose a Weighted Upper Confidence Bound (W-UCB) algorithm and show that it can achieve a constant regret when the user preferences are sufficiently diverse. The performance of W-UCB under general setups is also completely characterized and validated with synthetic data.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07924v3
PDF http://arxiv.org/pdf/1901.07924v3.pdf
PWC https://paperswithcode.com/paper/online-learning-with-diverse-user-preferences
Repo
Framework
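To make the reward model concrete, here is a generic LinUCB-style simulation of the setting: at each slot the learner observes a preference vector, pulls an arm, and receives a linear reward. The weighting scheme of the actual W-UCB algorithm is not specified in the abstract, so this is a plain UCB stand-in on the same reward model, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000
arm_states = rng.normal(size=(K, d))          # unknown mean state of each arm

A = [np.eye(d) for _ in range(K)]             # per-arm design matrices
b = [np.zeros(d) for _ in range(K)]
alpha = 1.0

for t in range(T):
    theta_user = rng.dirichlet(np.ones(d))    # i.i.d. user preference vector
    # Upper confidence bound on <preference, estimated arm state>.
    ucb = []
    for k in range(K):
        theta_hat = np.linalg.solve(A[k], b[k])
        bonus = alpha * np.sqrt(theta_user @ np.linalg.solve(A[k], theta_user))
        ucb.append(theta_user @ theta_hat + bonus)
    k = int(np.argmax(ucb))
    reward = theta_user @ arm_states[k] + 0.1 * rng.normal()
    # Simplification: update from the realized reward only (the paper also
    # reveals the pulled arm's state directly).
    A[k] += np.outer(theta_user, theta_user)
    b[k] += reward * theta_user

print("estimated arm 0 state:", np.linalg.solve(A[0], b[0]).round(2))
```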

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

Title NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations
Authors Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu
Abstract We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned by optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of an over-regularized latent space, thus significantly boosting model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms state-of-the-art models in terms of success rate, data efficiency, and generalization.
Tasks Visual Navigation
Published 2019-06-17
URL https://arxiv.org/abs/1906.07207v3
PDF https://arxiv.org/pdf/1906.07207v3.pdf
PWC https://paperswithcode.com/paper/visual-navigation-by-generating-next-expected
Repo
Framework
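Structurally, the NEO generator is a conditional VAE. The compressed sketch below uses toy feature sizes and replaces the mixture-of-Gaussians prior with a standard-normal one for brevity, so it is only a rough outline of the model, not the paper's implementation.

```python
import torch
import torch.nn as nn

feat, zdim = 128, 32
enc = nn.Linear(2 * feat, 2 * zdim)        # q(z | current, target) -> mu, logvar
dec = nn.Linear(zdim + feat, feat)         # p(next_obs | z, current)

cur, target, next_obs = (torch.randn(16, feat) for _ in range(3))
mu, logvar = enc(torch.cat([cur, target], 1)).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
recon = dec(torch.cat([z, cur], 1))

recon_loss = nn.functional.mse_loss(recon, next_obs)   # NEO reconstruction
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl
loss.backward()
```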

Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues

Title Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues
Authors Arturo Gomez Chavez, Qingwen Xu, Christian A. Mueller, Sören Schwertfeger, Andreas Birk
Abstract Underwater robot interventions require a high level of safety and reliability. A major challenge is the robust and accurate acquisition of localization estimates, a prerequisite for more complex tasks such as floating manipulation and mapping. State-of-the-art navigation in commercial operations, such as oil & gas production (OGP), relies on costly instrumentation, which can be partially replaced or assisted by visual navigation methods, especially in deep-sea scenarios where equipment deployment carries high costs and risks. Our work presents a multimodal approach that adapts state-of-the-art methods from on-land robotics, i.e., dense point cloud generation in combination with plane representation and registration, to boost underwater localization performance. A two-stage navigation scheme is proposed that initially generates a coarse probabilistic map of the workspace, which is used in the second stage to filter noise from the computed point clouds and planes. Furthermore, an adaptive decision-making approach is introduced that determines which perception cues to incorporate into the localization filter to optimize accuracy and computational performance. Our approach is investigated first in simulation and then validated with data from field trials in OGP monitoring and maintenance scenarios.
Tasks Decision Making, Point Cloud Generation, Visual Navigation
Published 2019-06-12
URL https://arxiv.org/abs/1906.04888v1
PDF https://arxiv.org/pdf/1906.04888v1.pdf
PWC https://paperswithcode.com/paper/adaptive-navigation-scheme-for-optimal-deep
Repo
Framework
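A toy version of the second-stage filtering idea: discard points that fall in low-probability cells of the coarse occupancy map built in stage one. The 1 m grid resolution and the 0.3 probability threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(5000, 3))            # noisy point cloud (m)
res = 1.0                                               # coarse 1 m voxels

# Stage 1: coarse probabilistic map (here: normalized hit counts per voxel).
cells = np.floor(points / res).astype(int)
keys, counts = np.unique(cells, axis=0, return_counts=True)
prob = {tuple(k): c / counts.max() for k, c in zip(keys, counts)}

# Stage 2: keep only points whose voxel is likely occupied.
mask = np.array([prob[tuple(c)] > 0.3 for c in cells])
filtered = points[mask]
print(f"kept {len(filtered)} of {len(points)} points")
```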

Object Detection in Video with Spatial-temporal Context Aggregation

Title Object Detection in Video with Spatial-temporal Context Aggregation
Authors Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang
Abstract Recent cutting-edge feature aggregation paradigms for video object detection rely on inferring feature correspondence. The feature correspondence estimation problem is fundamentally difficult due to poor image quality, motion blur, etc., and the resulting estimates are unstable. To avoid this problem, we propose a simple but effective feature aggregation framework that operates at the object proposal level. It learns to enhance each proposal’s feature by modeling semantic and spatio-temporal relationships among object proposals, both within a frame and across adjacent frames. Experiments are carried out on the ImageNet VID dataset. Without any bells and whistles, our method obtains 80.3% mAP, surpassing the previous state of the art. The proposed feature aggregation mechanism improves the single-frame Faster R-CNN baseline by 5.8% mAP. Moreover, without any temporal post-processing, our method outperforms the previous state of the art by 1.4% mAP.
Tasks Object Detection, Video Object Detection
Published 2019-07-11
URL https://arxiv.org/abs/1907.04988v1
PDF https://arxiv.org/pdf/1907.04988v1.pdf
PWC https://paperswithcode.com/paper/object-detection-in-video-with-spatial
Repo
Framework
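A minimal sketch of proposal-level aggregation: each proposal feature is enhanced by attending over proposals pooled from the current and adjacent frames. The 256-dimensional features and single-head dot-product attention are assumptions; the paper's relation module is more elaborate.

```python
import torch
import torch.nn as nn

dim = 256
proj_q, proj_k, proj_v = (nn.Linear(dim, dim) for _ in range(3))

cur = torch.randn(30, dim)              # proposals in the current frame
neighbors = torch.randn(90, dim)        # proposals from adjacent frames
bank = torch.cat([cur, neighbors], 0)

attn = torch.softmax(proj_q(cur) @ proj_k(bank).T / dim ** 0.5, dim=-1)
enhanced = cur + attn @ proj_v(bank)    # residual semantic/spatio-temporal cue
print(enhanced.shape)                   # torch.Size([30, 256])
```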

Sharing Attention Weights for Fast Transformer

Title Sharing Attention Weights for Fast Transformer
Authors Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu
Abstract Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache.
Tasks Machine Translation
Published 2019-06-26
URL https://arxiv.org/abs/1906.11024v1
PDF https://arxiv.org/pdf/1906.11024v1.pdf
PWC https://paperswithcode.com/paper/sharing-attention-weights-for-fast
Repo
Framework
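The sharing mechanism can be sketched in a few lines: attention probabilities computed in one layer are reused in the next, skipping that layer's query-key product. Shapes below are toy values; the jointly learned sharing policy is omitted.

```python
import torch
import torch.nn as nn

d, T = 64, 10
x = torch.randn(1, T, d)
wq, wk, wv1, wv2 = (nn.Linear(d, d) for _ in range(4))

# Layer l: full dot-product attention.
probs = torch.softmax(wq(x) @ wk(x).transpose(1, 2) / d ** 0.5, dim=-1)
h1 = probs @ wv1(x)

# Layer l+1: reuse the same attention weights; only a new value projection.
h2 = probs @ wv2(h1)
print(h2.shape)
```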

Measuring Effectiveness of Video Advertisements

Title Measuring Effectiveness of Video Advertisements
Authors James Hahn, Adriana Kovashka
Abstract Advertisements are unavoidable in modern society. Times Square is notorious for its incessant display of advertisements. Its popularity is worldwide, and smaller cities possess miniature versions of the display, such as Pittsburgh and its digital works in Oakland on Forbes Avenue. Tokyo’s Ginza district recently rose to popularity due to its upscale shops and constant onslaught of advertisements to pedestrians. Advertisements arise in other mediums as well. For example, they help popular streaming services, such as Spotify, Hulu, and YouTube TV, gather significant streams of revenue to reduce the cost of monthly subscriptions for consumers. Ads provide an additional source of money for companies and entire industries to allocate resources toward alternative business motives. They are attractive to companies and nearly unavoidable for consumers. One challenge for advertisers is examining an advertisement’s effectiveness or usefulness in conveying a message to their targeted demographics. Rather than constructing a single, static image of content, a video advertisement possesses hundreds of frames of data with varying scenes, actors, objects, and complexity. Therefore, measuring the effectiveness of video advertisements is important to impacting a billion-dollar industry. This paper explores the combination of human-annotated features and common video processing techniques to predict effectiveness ratings of advertisements collected from YouTube. The task is framed as binary (effective vs. non-effective), four-way, and five-way machine learning classification. We present the first accuracy and inference findings on this small dataset, which is also some of the first research on advertisements. Accuracies of 84%, 65%, and 55% are reached on the binary, four-way, and five-way tasks respectively.
Tasks
Published 2019-01-15
URL http://arxiv.org/abs/1901.07366v2
PDF http://arxiv.org/pdf/1901.07366v2.pdf
PWC https://paperswithcode.com/paper/measuring-effectiveness-of-video
Repo
Framework
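The setup reduces to standard supervised classification over combined feature vectors. A sketch under assumed stand-in features and labels (the actual annotations and dataset are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))        # e.g. annotated sentiment + motion stats
y = rng.integers(0, 2, size=200)      # effective vs. non-effective labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```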

LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds

Title LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
Authors Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh
Abstract We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e., a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered meshes from the input point clouds. To bridge the gap between the artist-defined geometry and the captured point clouds, our autoencoder models pose-dependent deviations from the template geometry. During training, instead of using explicit correspondences, such as key points or pose supervision, our method leverages LBS deformations to bootstrap the learning process. To avoid poor local minima caused by erroneous point-to-point correspondences, we utilize a structured Chamfer distance based on part segmentations, which are learned concurrently using self-supervision. We demonstrate qualitative results on real captured hands, and report quantitative evaluations on the FAUST benchmark for body registration. Our method achieves performance that is superior to other unsupervised approaches and comparable to methods using supervised examples.
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.10037v1
PDF http://arxiv.org/pdf/1904.10037v1.pdf
PWC https://paperswithcode.com/paper/lbs-autoencoder-self-supervised-fitting-of
Repo
Framework
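The structured Chamfer distance is the piece most easily shown in code: points are matched only within the same part segment, which is what steers the fit away from spurious correspondences. In this sketch the part labels are random stand-ins for the self-supervised segmentation.

```python
import torch

def chamfer(a, b):
    # Symmetric Chamfer distance between two point sets (N,3) and (M,3).
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

scan = torch.randn(500, 3)
mesh = torch.randn(400, 3)
scan_part = torch.randint(0, 4, (500,))   # stand-in part segmentation labels
mesh_part = torch.randint(0, 4, (400,))

# Structured variant: sum Chamfer terms per matching part only.
loss = sum(chamfer(scan[scan_part == p], mesh[mesh_part == p])
           for p in range(4) if (scan_part == p).any() and (mesh_part == p).any())
print(loss)
```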

Large-Scale Pedestrian Retrieval Competition

Title Large-Scale Pedestrian Retrieval Competition
Authors Da Li, Zhang Zhang
Abstract The Large-Scale Pedestrian Retrieval Competition (LSPRC) focuses on person retrieval, an important end application in intelligent surveillance vision systems. Person retrieval aims at searching for a target of interest given specific visual attributes or images. Low image quality, varied camera viewpoints, large pose variations, and occlusions in real scenes make it a challenging problem. By providing large-scale surveillance data from real scenes and standard evaluation methods that are closer to real applications, the competition aims to improve the robustness of related algorithms and better address the complicated situations encountered in practice. LSPRC includes two tasks, i.e., Attribute-based Pedestrian Retrieval (PR-A) and Re-IDentification-based Pedestrian Retrieval (PR-ID). The standard evaluation index, mean Average Precision (mAP), is used to measure the performance of the two tasks under various scales, poses, and occlusions. In addition, a system-level evaluation method is introduced, in which the algorithms for the two tasks are integrated, together with a pedestrian detection algorithm, into a large-scale video parsing platform (named ISEE).
Tasks Pedestrian Detection, Person Retrieval
Published 2019-03-06
URL http://arxiv.org/abs/1903.02137v1
PDF http://arxiv.org/pdf/1903.02137v1.pdf
PWC https://paperswithcode.com/paper/large-scale-pedestrian-retrieval-competition
Repo
Framework
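Both tasks are scored with mean Average Precision; a minimal implementation of that metric over synthetic rankings is sketched below (scores and relevance labels are stand-ins for gallery rankings).

```python
import numpy as np

def average_precision(relevant, scores):
    # Rank gallery items by score, then average precision at each hit.
    order = np.argsort(-scores)
    rel = relevant[order]
    hits = np.cumsum(rel)
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    return (precision_at_k * rel).sum() / max(rel.sum(), 1)

rng = np.random.default_rng(0)
aps = [average_precision(rng.integers(0, 2, 100).astype(float),
                         rng.normal(size=100)) for _ in range(50)]
print("mAP:", np.mean(aps))
```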

Ab Antiquo: Proto-language Reconstruction with RNNs

Title Ab Antiquo: Proto-language Reconstruction with RNNs
Authors Carlo Meloni, Shauli Ravfogel, Yoav Goldberg
Abstract Historical linguists have identified regularities in the process of historical sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform the conventional methods applied to this task so far. Error analysis reveals variability in the neural models’ ability to capture different phonological changes, correlating with the complexity of the changes. Analysis of the learned embeddings reveals that the models acquire phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented in historical linguistics.
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02477v1
PDF https://arxiv.org/pdf/1908.02477v1.pdf
PWC https://paperswithcode.com/paper/ab-antiquo-proto-language-reconstruction-with
Repo
Framework
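A compact encoder-decoder sketch of the task: encode a concatenated cognate string, then decode the proto-form character by character. The vocabulary size, dimensions, and plain GRU without attention are illustrative simplifications of the paper's model.

```python
import torch
import torch.nn as nn

vocab, dim = 40, 64
emb = nn.Embedding(vocab, dim)
encoder = nn.GRU(dim, dim, batch_first=True)
decoder = nn.GRU(dim, dim, batch_first=True)
out = nn.Linear(dim, vocab)

cognates = torch.randint(0, vocab, (8, 20))   # daughter-language cognate string
proto = torch.randint(0, vocab, (8, 10))      # target proto-word characters

_, h = encoder(emb(cognates))                 # summary of the cognate set
logits, _ = decoder(emb(proto[:, :-1]), h)    # teacher-forced decoding
loss = nn.functional.cross_entropy(out(logits).transpose(1, 2), proto[:, 1:])
loss.backward()
```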

Laplacian-regularized graph bandits: Algorithms and theoretical analysis

Title Laplacian-regularized graph bandits: Algorithms and theoretical analysis
Authors Kaige Yang, Xiaowen Dong, Laura Toni
Abstract We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and a scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over the $\tilde{\mathcal{O}}(d \sqrt{T})$ of LinUCB (Li et al., 2010), where the user relationship is not taken into account. In terms of network regret (the sum of cumulative regret over $n$ users), the proposed algorithm scales as $\tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, a significant improvement over the $\tilde{\mathcal{O}}(nd\sqrt{T})$ of the state-of-the-art algorithm Gob.Lin (Cesa-Bianchi et al., 2013). To improve scalability, we further propose a simplified algorithm whose computational complexity is linear in the number of users, while maintaining the same regret. Finally, we present a finite-time analysis of the proposed algorithms, and demonstrate their advantage over state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.
Tasks
Published 2019-07-12
URL https://arxiv.org/abs/1907.05632v3
PDF https://arxiv.org/pdf/1907.05632v3.pdf
PWC https://paperswithcode.com/paper/laplacian-regularized-graph-bandits
Repo
Framework
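The Laplacian-regularized estimate at the heart of such graph bandits can be sketched as a joint ridge-style solve that pulls neighboring users' parameter vectors together. The toy graph, the combinatorial (rather than random-walk) Laplacian, and the closed-form solve are illustrative; confidence bounds and arm selection are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 4, 3, 200
W = np.ones((n, n)) - np.eye(n)            # fully connected toy user graph
L = np.diag(W.sum(1)) - W                  # combinatorial graph Laplacian

truth = rng.normal(size=(n, d))            # unknown per-user parameters
X = rng.normal(size=(T, d))                # arm feature at each round
users = rng.integers(0, n, T)              # which user arrived each round
y = np.einsum("td,td->t", X, truth[users]) + 0.1 * rng.normal(size=T)

# Solve min_theta sum_t (y_t - x_t^T theta_{u_t})^2 + lam * tr(theta^T L theta)
lam = 1.0
A = np.kron(lam * L, np.eye(d))            # Laplacian coupling across users
b = np.zeros(n * d)
for t in range(T):
    u = users[t]
    A[u*d:(u+1)*d, u*d:(u+1)*d] += np.outer(X[t], X[t])
    b[u*d:(u+1)*d] += y[t] * X[t]
theta_hat = np.linalg.solve(A + 1e-6 * np.eye(n * d), b).reshape(n, d)
print(np.round(theta_hat - truth, 2))      # estimation error per user
```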