January 27, 2020

3216 words 16 mins read

Paper Group ANR 1326

Weather event severity prediction using buoy data and machine learning. Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation. Measuring Mother-Infant Emotions By Audio Sensing. Aggregated Hold-Out. Improving Visual Feature Extraction in Glacial Environments. Online Learning with Diverse User Preferences. NeoNav: Impro …

Weather event severity prediction using buoy data and machine learning

Title Weather event severity prediction using buoy data and machine learning
Authors Vikas Ramachandra
Abstract In this paper, we predict the severity of extreme weather events (tropical storms, hurricanes, etc.) using buoy time series variables such as wind speed and air temperature. The prediction/forecasting method is based on various forecasting and machine learning models and proceeds in the following steps. Data sources for the buoys and weather events are identified, aggregated, and merged. Missing data are imputed using Kalman filters as well as splines for multivariate time series. Statistical tests are then run to ascertain increasing trends in weather event severity. Finally, we use machine learning to predict/forecast event severity from the buoy variables, and report good accuracies for the models built.
Tasks Imputation, Time Series
Published 2019-11-17
URL https://arxiv.org/abs/1911.09001v1
PDF https://arxiv.org/pdf/1911.09001v1.pdf
PWC https://paperswithcode.com/paper/weather-event-severity-prediction-using-buoy
Repo
Framework
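To make the imputation step concrete, here is a minimal sketch of spline-based gap filling on multivariate buoy-style series using pandas. The column names, sampling frequency, and cubic order are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the spline-based gap-filling step; "wind_speed" and
# "air_temp" are assumed column names, not taken from the paper.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=200, freq="h")
df = pd.DataFrame({
    "wind_speed": rng.gamma(2.0, 4.0, size=200),
    "air_temp": 15 + 5 * np.sin(np.linspace(0, 8, 200)),
}, index=idx)

# Knock out some observations to simulate buoy dropouts.
df.iloc[rng.choice(200, size=30, replace=False), 0] = np.nan

# Cubic-spline interpolation per variable (one simple stand-in for the
# Kalman-filter/spline imputation the abstract mentions; requires scipy).
imputed = df.interpolate(method="spline", order=3, limit_direction="both")
print(imputed.isna().sum())
```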

Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation

Title Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation
Authors David Watkins-Valls, Jingxi Xu, Nicholas Waytowich, Peter Allen
Abstract We present a robot navigation system that uses an imitation learning framework to successfully navigate in complex environments. Our framework takes a pre-built 3D scan of a real environment and trains an agent from pre-generated expert trajectories to navigate to any position given a panoramic view of the goal and the current visual input, without relying on a map, compass, odometry, GPS, or the relative position of the target at runtime. Our end-to-end trained agent uses RGB and depth (RGBD) information and can handle large environments (up to $1031m^2$) across multiple rooms (up to $40$) and generalizes to unseen targets. We show that, compared to several baselines using deep reinforcement learning and RGBD SLAM, our method (1) requires fewer training examples and less training time, (2) reaches the goal location with higher accuracy, (3) produces better solutions with shorter paths for long-range navigation tasks, and (4) generalizes to unseen environments given an RGBD map of the environment.
Tasks Imitation Learning, Robot Navigation, Visual Navigation
Published 2019-09-20
URL https://arxiv.org/abs/1909.09295v1
PDF https://arxiv.org/pdf/1909.09295v1.pdf
PWC https://paperswithcode.com/paper/learning-your-way-without-map-or-compass
Repo
Framework
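The core of such an imitation learning framework is a supervised policy trained on expert state-action pairs. Below is a minimal behavioral-cloning sketch in PyTorch; the network shape, the 4-channel RGBD input, and the 6-way discrete action space are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(            # shared encoder for RGBD views
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # An 84x84 input yields 64 maps of 9x9 per view; two views concatenated.
        self.head = nn.Linear(2 * 64 * 9 * 9, n_actions)

    def forward(self, obs, goal):
        z = torch.cat([self.encoder(obs), self.encoder(goal)], dim=1)
        return self.head(z)                      # logits over discrete actions

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

obs = torch.randn(8, 4, 84, 84)        # batch of RGBD observations (stand-ins)
goal = torch.randn(8, 4, 84, 84)       # panoramic goal views (stand-ins)
expert_actions = torch.randint(0, 6, (8,))

# One supervised step on expert trajectories -- the essence of behavior cloning.
loss = nn.functional.cross_entropy(policy(obs, goal), expert_actions)
opt.zero_grad(); loss.backward(); opt.step()
```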

Measuring Mother-Infant Emotions By Audio Sensing

Title Measuring Mother-Infant Emotions By Audio Sensing
Authors Xuewen Yao, Dong He, Tiancheng Jing, Kaya de Barbaro
Abstract It has been suggested in the developmental psychology literature that the communication of affect between mothers and their infants correlates with infants’ socioemotional and cognitive development. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on the mother’s speech. To build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a convolutional neural network model that classifies 6 different emotions with 70% accuracy. We applied our model to the mothers’ speech and found the predicted dominant emotions were angry and sad, which did not match the mothers’ actual emotional states. Based on our own observations, we concluded that emotional speech databases made with the help of actors do not generalize well to real-life settings, suggesting an active learning or unsupervised approach in the future.
Tasks Active Learning
Published 2019-12-10
URL https://arxiv.org/abs/1912.05920v1
PDF https://arxiv.org/pdf/1912.05920v1.pdf
PWC https://paperswithcode.com/paper/measuring-mother-infant-emotions-by-audio
Repo
Framework
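For reference, a 6-class speech-emotion CNN operating on mel-spectrogram patches might look like the toy PyTorch sketch below. The input size, layer widths, and random tensors standing in for RAVDESS clips are all assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(6),                  # 6 emotion classes, as in the abstract
)

specs = torch.randn(4, 1, 64, 128)     # stand-in mel-spectrogram patches
labels = torch.randint(0, 6, (4,))
loss = nn.functional.cross_entropy(model(specs), labels)
loss.backward()
```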

Aggregated Hold-Out

Title Aggregated Hold-Out
Authors Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle
Abstract Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the Y variable or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation.
Tasks
Published 2019-09-11
URL https://arxiv.org/abs/1909.04890v1
PDF https://arxiv.org/pdf/1909.04890v1.pdf
PWC https://paperswithcode.com/paper/aggregated-hold-out
Repo
Framework
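A small sketch of the Agghoo procedure under illustrative choices (ridge regression, a four-point hyperparameter grid, V = 5 splits): run hold-out model selection on several independent splits, then average the selected predictors.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=300)

def agghoo_predict(X, y, X_new, alphas=(0.01, 0.1, 1.0, 10.0), V=5):
    preds = []
    for v in range(V):
        # One hold-out split per aggregation round.
        X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.2,
                                                  random_state=v)
        # Select the rule that minimizes hold-out risk...
        fits = [Ridge(alpha=a).fit(X_tr, y_tr) for a in alphas]
        best = min(fits, key=lambda m: mean_squared_error(y_ho, m.predict(X_ho)))
        preds.append(best.predict(X_new))
    # ...then average the selected predictors across splits.
    return np.mean(preds, axis=0)

print(agghoo_predict(X, y, X[:3]))
```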

Improving Visual Feature Extraction in Glacial Environments

Title Improving Visual Feature Extraction in Glacial Environments
Authors Steven D. Morad, Jeremy Nash, Shoya Higa, Russell Smith, Aaron Parness, Kobus Barnard
Abstract Glacial science could benefit tremendously from autonomous robots, but previous glacial robots have had perception issues in these colorless and featureless environments, specifically with visual feature extraction. This translates to failures in visual odometry and visual navigation. Glaciologists use near-infrared imagery to reveal the underlying heterogeneous spatial structure of snow and ice, and we theorize that this hidden near-infrared structure could produce more and higher-quality features than are available in visible light. We took a custom camera rig to Igloo Cave at Mt. St. Helens to test our theory. The rig contains two identical machine vision cameras, one of which was outfitted with multiple filters to see only near-infrared light. We extracted features from short video clips taken inside the cave using three popular feature extractors (FAST, SIFT, and SURF), and quantified the number of features and their quality for visual navigation by comparing the resulting orientation estimates to ground truth. Our main contribution is the use of NIR longpass filters to improve the quantity and quality of visual features in icy terrain, irrespective of the feature extractor used.
Tasks Visual Navigation, Visual Odometry
Published 2019-08-27
URL https://arxiv.org/abs/1908.10425v2
PDF https://arxiv.org/pdf/1908.10425v2.pdf
PWC https://paperswithcode.com/paper/improving-visual-feature-extraction-in
Repo
Framework
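A sketch of the feature-count comparison using OpenCV's FAST and SIFT detectors is below (SURF is omitted because it ships only in non-free opencv-contrib builds). The synthetic low-contrast and textured images are stand-ins for the paired visible/NIR frames from the cave footage.

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
visible = rng.integers(120, 136, size=(480, 640), dtype=np.uint8)  # low contrast
nir = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)        # richer texture

fast = cv2.FastFeatureDetector_create()
sift = cv2.SIFT_create()

for name, img in [("visible", visible), ("nir", nir)]:
    kp_fast = fast.detect(img, None)
    kp_sift = sift.detect(img, None)
    print(f"{name}: FAST={len(kp_fast)} SIFT={len(kp_sift)}")
```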

Online Learning with Diverse User Preferences

Title Online Learning with Diverse User Preferences
Authors Chao Gan, Jing Yang, Ruida Zhou, Cong Shen
Abstract In this paper, we investigate the impact of diverse user preferences on learning under the stochastic multi-armed bandit (MAB) framework. We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant. Our intuition is that to achieve sub-linear regret, the number of times an optimal arm is pulled should scale linearly in time; when all arms are optimal for certain users and pulled frequently, the estimated arm statistics can quickly converge to their true values, thus dramatically reducing the need for exploration. We cast the problem into a stochastic linear bandits model, where both the user preferences and the states of the arms are modeled as independent and identically distributed (i.i.d.) d-dimensional random vectors. After receiving the user preference vector at the beginning of each time slot, the learner pulls an arm and receives a reward as the inner product of the preference vector and the arm state vector. We also assume that the state of the pulled arm is revealed to the learner once it is pulled. We propose a Weighted Upper Confidence Bound (W-UCB) algorithm and show that it can achieve a constant regret when the user preferences are sufficiently diverse. The performance of W-UCB under general setups is also completely characterized and validated with synthetic data.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07924v3
PDF http://arxiv.org/pdf/1901.07924v3.pdf
PWC https://paperswithcode.com/paper/online-learning-with-diverse-user-preferences
Repo
Framework
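To make the reward model concrete, here is a generic LinUCB-style simulation of the setting: at each slot the learner observes a preference vector, pulls an arm, and receives a linear reward. The weighting scheme of the actual W-UCB algorithm is not specified in the abstract, so this is a plain UCB stand-in on the same reward model, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000
arm_states = rng.normal(size=(K, d))          # unknown mean state of each arm

A = [np.eye(d) for _ in range(K)]             # per-arm design matrices
b = [np.zeros(d) for _ in range(K)]
alpha = 1.0

for t in range(T):
    theta_user = rng.dirichlet(np.ones(d))    # i.i.d. user preference vector
    # Upper confidence bound on <preference, estimated arm state>.
    ucb = []
    for k in range(K):
        theta_hat = np.linalg.solve(A[k], b[k])
        bonus = alpha * np.sqrt(theta_user @ np.linalg.solve(A[k], theta_user))
        ucb.append(theta_user @ theta_hat + bonus)
    k = int(np.argmax(ucb))
    reward = theta_user @ arm_states[k] + 0.1 * rng.normal()
    # Simplification: update from the realized reward only (the paper also
    # reveals the pulled arm's state directly).
    A[k] += np.outer(theta_user, theta_user)
    b[k] += reward * theta_user

print("estimated arm 0 state:", np.linalg.solve(A[0], b[0]).round(2))
```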

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

Title NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations
Authors Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu
Abstract We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned by optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of an over-regularized latent space, thus significantly boosting model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms state-of-the-art models in terms of success rate, data efficiency, and generalization.
Tasks Visual Navigation
Published 2019-06-17
URL https://arxiv.org/abs/1906.07207v3
PDF https://arxiv.org/pdf/1906.07207v3.pdf
PWC https://paperswithcode.com/paper/visual-navigation-by-generating-next-expected
Repo
Framework
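Structurally, the NEO generator is a conditional VAE. The compressed sketch below uses toy feature sizes and replaces the mixture-of-Gaussians prior with a standard-normal one for brevity, so it is only a rough outline of the model, not the paper's implementation.

```python
import torch
import torch.nn as nn

feat, zdim = 128, 32
enc = nn.Linear(2 * feat, 2 * zdim)        # q(z | current, target) -> mu, logvar
dec = nn.Linear(zdim + feat, feat)         # p(next_obs | z, current)

cur, target, next_obs = (torch.randn(16, feat) for _ in range(3))
mu, logvar = enc(torch.cat([cur, target], 1)).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
recon = dec(torch.cat([z, cur], 1))

recon_loss = nn.functional.mse_loss(recon, next_obs)   # NEO reconstruction
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl
loss.backward()
```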

Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues

Title Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues
Authors Arturo Gomez Chavez, Qingwen Xu, Christian A. Mueller, Sören Schwertfeger, Andreas Birk
Abstract Underwater robot interventions require a high level of safety and reliability. A major challenge is the robust and accurate acquisition of localization estimates, a prerequisite for more complex tasks such as floating manipulation and mapping. State-of-the-art navigation in commercial operations, such as oil & gas production (OGP), relies on costly instrumentation, which can be partially replaced or assisted by visual navigation methods, especially in deep-sea scenarios where equipment deployment carries high costs and risks. Our work presents a multimodal approach that adapts state-of-the-art methods from on-land robotics, i.e., dense point cloud generation in combination with plane representation and registration, to boost underwater localization performance. A two-stage navigation scheme is proposed that initially generates a coarse probabilistic map of the workspace, which is used in the second stage to filter noise from the computed point clouds and planes. Furthermore, an adaptive decision-making approach is introduced that determines which perception cues to incorporate into the localization filter to optimize accuracy and computational performance. Our approach is investigated first in simulation and then validated with data from field trials in OGP monitoring and maintenance scenarios.
Tasks Decision Making, Point Cloud Generation, Visual Navigation
Published 2019-06-12
URL https://arxiv.org/abs/1906.04888v1
PDF https://arxiv.org/pdf/1906.04888v1.pdf
PWC https://paperswithcode.com/paper/adaptive-navigation-scheme-for-optimal-deep
Repo
Framework
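A toy version of the second-stage filtering idea: discard points that fall in low-probability cells of the coarse occupancy map built in stage one. The 1 m grid resolution and the 0.3 probability threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(5000, 3))            # noisy point cloud (m)
res = 1.0                                               # coarse 1 m voxels

# Stage 1: coarse probabilistic map (here: normalized hit counts per voxel).
cells = np.floor(points / res).astype(int)
keys, counts = np.unique(cells, axis=0, return_counts=True)
prob = {tuple(k): c / counts.max() for k, c in zip(keys, counts)}

# Stage 2: keep only points whose voxel is likely occupied.
mask = np.array([prob[tuple(c)] > 0.3 for c in cells])
filtered = points[mask]
print(f"kept {len(filtered)} of {len(points)} points")
```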

Object Detection in Video with Spatial-temporal Context Aggregation

Title Object Detection in Video with Spatial-temporal Context Aggregation
Authors Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang
Abstract Recent cutting-edge feature aggregation paradigms for video object detection rely on inferring feature correspondence. The feature correspondence estimation problem is fundamentally difficult due to poor image quality, motion blur, etc., and the resulting estimates are unstable. To avoid this problem, we propose a simple but effective feature aggregation framework that operates at the object proposal level. It learns to enhance each proposal’s feature by modeling semantic and spatio-temporal relationships among object proposals, both within a frame and across adjacent frames. Experiments are carried out on the ImageNet VID dataset. Without any bells and whistles, our method obtains 80.3% mAP, surpassing the previous state of the art. The proposed feature aggregation mechanism improves the single-frame Faster R-CNN baseline by 5.8% mAP. Moreover, without any temporal post-processing, our method outperforms the previous state of the art by 1.4% mAP.
Tasks Object Detection, Video Object Detection
Published 2019-07-11
URL https://arxiv.org/abs/1907.04988v1
PDF https://arxiv.org/pdf/1907.04988v1.pdf
PWC https://paperswithcode.com/paper/object-detection-in-video-with-spatial
Repo
Framework
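A minimal sketch of proposal-level aggregation: each proposal feature is enhanced by attending over proposals pooled from the current and adjacent frames. The 256-dimensional features and single-head dot-product attention are assumptions; the paper's relation module is more elaborate.

```python
import torch
import torch.nn as nn

dim = 256
proj_q, proj_k, proj_v = (nn.Linear(dim, dim) for _ in range(3))

cur = torch.randn(30, dim)              # proposals in the current frame
neighbors = torch.randn(90, dim)        # proposals from adjacent frames
bank = torch.cat([cur, neighbors], 0)

attn = torch.softmax(proj_q(cur) @ proj_k(bank).T / dim ** 0.5, dim=-1)
enhanced = cur + attn @ proj_v(bank)    # residual semantic/spatio-temporal cue
print(enhanced.shape)                   # torch.Size([30, 256])
```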

Sharing Attention Weights for Fast Transformer

Title Sharing Attention Weights for Fast Transformer
Authors Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu
Abstract Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache.
Tasks Machine Translation
Published 2019-06-26
URL https://arxiv.org/abs/1906.11024v1
PDF https://arxiv.org/pdf/1906.11024v1.pdf
PWC https://paperswithcode.com/paper/sharing-attention-weights-for-fast
Repo
Framework
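The sharing mechanism can be sketched in a few lines: attention probabilities computed in one layer are reused in the next, skipping that layer's query-key product. Shapes below are toy values; the jointly learned sharing policy is omitted.

```python
import torch
import torch.nn as nn

d, T = 64, 10
x = torch.randn(1, T, d)
wq, wk, wv1, wv2 = (nn.Linear(d, d) for _ in range(4))

# Layer l: full dot-product attention.
probs = torch.softmax(wq(x) @ wk(x).transpose(1, 2) / d ** 0.5, dim=-1)
h1 = probs @ wv1(x)

# Layer l+1: reuse the same attention weights; only a new value projection.
h2 = probs @ wv2(h1)
print(h2.shape)
```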

Measuring Effectiveness of Video Advertisements

Title Measuring Effectiveness of Video Advertisements
Authors James Hahn, Adriana Kovashka
Abstract Advertisements are unavoidable in modern society. Times Square is notorious for its incessant display of advertisements. Its popularity is worldwide, and smaller cities possess miniature versions of the display, such as Pittsburgh and its digital works in Oakland on Forbes Avenue. Tokyo’s Ginza district recently rose to popularity due to its upscale shops and constant onslaught of advertisements to pedestrians. Advertisements arise in other mediums as well. For example, they help popular streaming services, such as Spotify, Hulu, and YouTube TV, gather significant streams of revenue to reduce the cost of monthly subscriptions for consumers. Ads provide an additional source of money for companies and entire industries to allocate resources toward alternative business motives. They are attractive to companies and nearly unavoidable for consumers. One challenge for advertisers is examining an advertisement’s effectiveness or usefulness in conveying a message to their targeted demographics. Rather than constructing a single, static image of content, a video advertisement possesses hundreds of frames of data with varying scenes, actors, objects, and complexity. Therefore, measuring the effectiveness of video advertisements is important to impacting a billion-dollar industry. This paper explores the combination of human-annotated features and common video processing techniques to predict effectiveness ratings of advertisements collected from YouTube. The task is framed as binary (effective vs. non-effective), four-way, and five-way machine learning classification. We present the first accuracy and inference findings on this small dataset, which is also some of the first research on advertisements. Accuracies of 84%, 65%, and 55% are reached on the binary, four-way, and five-way tasks respectively.
Tasks
Published 2019-01-15
URL http://arxiv.org/abs/1901.07366v2
PDF http://arxiv.org/pdf/1901.07366v2.pdf
PWC https://paperswithcode.com/paper/measuring-effectiveness-of-video
Repo
Framework
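The setup reduces to standard supervised classification over combined feature vectors. A sketch under assumed stand-in features and labels (the actual annotations and dataset are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))        # e.g. annotated sentiment + motion stats
y = rng.integers(0, 2, size=200)      # effective vs. non-effective labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```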

LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds

Title LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
Authors Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh
Abstract We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e., a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered meshes from the input point clouds. To bridge the gap between the artist-defined geometry and the captured point clouds, our autoencoder models pose-dependent deviations from the template geometry. During training, instead of using explicit correspondences, such as key points or pose supervision, our method leverages LBS deformations to bootstrap the learning process. To avoid poor local minima caused by erroneous point-to-point correspondences, we utilize a structured Chamfer distance based on part segmentations, which are learned concurrently using self-supervision. We demonstrate qualitative results on real captured hands, and report quantitative evaluations on the FAUST benchmark for body registration. Our method achieves performance that is superior to other unsupervised approaches and comparable to methods using supervised examples.
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.10037v1
PDF http://arxiv.org/pdf/1904.10037v1.pdf
PWC https://paperswithcode.com/paper/lbs-autoencoder-self-supervised-fitting-of
Repo
Framework
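The structured Chamfer distance is the piece most easily shown in code: points are matched only within the same part segment, which is what steers the fit away from spurious correspondences. In this sketch the part labels are random stand-ins for the self-supervised segmentation.

```python
import torch

def chamfer(a, b):
    # Symmetric Chamfer distance between two point sets (N,3) and (M,3).
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

scan = torch.randn(500, 3)
mesh = torch.randn(400, 3)
scan_part = torch.randint(0, 4, (500,))   # stand-in part segmentation labels
mesh_part = torch.randint(0, 4, (400,))

# Structured variant: sum Chamfer terms per matching part only.
loss = sum(chamfer(scan[scan_part == p], mesh[mesh_part == p])
           for p in range(4) if (scan_part == p).any() and (mesh_part == p).any())
print(loss)
```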

Large-Scale Pedestrian Retrieval Competition

Title Large-Scale Pedestrian Retrieval Competition
Authors Da Li, Zhang Zhang
Abstract The Large-Scale Pedestrian Retrieval Competition (LSPRC) focuses on person retrieval, an important end application in intelligent surveillance vision systems. Person retrieval aims at searching for a target of interest given specific visual attributes or images. Low image quality, varied camera viewpoints, large pose variations, and occlusions in real scenes make it a challenging problem. By providing large-scale surveillance data from real scenes and standard evaluation methods that are closer to real applications, the competition aims to improve the robustness of related algorithms and better address the complicated situations encountered in practice. LSPRC includes two tasks, i.e., Attribute-based Pedestrian Retrieval (PR-A) and Re-IDentification-based Pedestrian Retrieval (PR-ID). The standard evaluation index, mean Average Precision (mAP), is used to measure the performance of the two tasks under various scales, poses, and occlusions. In addition, a system-level evaluation method is introduced, in which the algorithms for the two tasks are integrated, together with a pedestrian detection algorithm, into a large-scale video parsing platform (named ISEE).
Tasks Pedestrian Detection, Person Retrieval
Published 2019-03-06
URL http://arxiv.org/abs/1903.02137v1
PDF http://arxiv.org/pdf/1903.02137v1.pdf
PWC https://paperswithcode.com/paper/large-scale-pedestrian-retrieval-competition
Repo
Framework
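Both tasks are scored with mean Average Precision; a minimal implementation of that metric over synthetic rankings is sketched below (scores and relevance labels are stand-ins for gallery rankings).

```python
import numpy as np

def average_precision(relevant, scores):
    # Rank gallery items by score, then average precision at each hit.
    order = np.argsort(-scores)
    rel = relevant[order]
    hits = np.cumsum(rel)
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    return (precision_at_k * rel).sum() / max(rel.sum(), 1)

rng = np.random.default_rng(0)
aps = [average_precision(rng.integers(0, 2, 100).astype(float),
                         rng.normal(size=100)) for _ in range(50)]
print("mAP:", np.mean(aps))
```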

Ab Antiquo: Proto-language Reconstruction with RNNs

Title Ab Antiquo: Proto-language Reconstruction with RNNs
Authors Carlo Meloni, Shauli Ravfogel, Yoav Goldberg
Abstract Historical linguists have identified regularities in the process of historical sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform the conventional methods applied to this task so far. Error analysis reveals variability in the neural models’ ability to capture different phonological changes, correlating with the complexity of the changes. Analysis of the learned embeddings reveals that the models acquire phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented in historical linguistics.
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02477v1
PDF https://arxiv.org/pdf/1908.02477v1.pdf
PWC https://paperswithcode.com/paper/ab-antiquo-proto-language-reconstruction-with
Repo
Framework
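A compact encoder-decoder sketch of the task: encode a concatenated cognate string, then decode the proto-form character by character. The vocabulary size, dimensions, and plain GRU without attention are illustrative simplifications of the paper's model.

```python
import torch
import torch.nn as nn

vocab, dim = 40, 64
emb = nn.Embedding(vocab, dim)
encoder = nn.GRU(dim, dim, batch_first=True)
decoder = nn.GRU(dim, dim, batch_first=True)
out = nn.Linear(dim, vocab)

cognates = torch.randint(0, vocab, (8, 20))   # daughter-language cognate string
proto = torch.randint(0, vocab, (8, 10))      # target proto-word characters

_, h = encoder(emb(cognates))                 # summary of the cognate set
logits, _ = decoder(emb(proto[:, :-1]), h)    # teacher-forced decoding
loss = nn.functional.cross_entropy(out(logits).transpose(1, 2), proto[:, 1:])
loss.backward()
```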

Laplacian-regularized graph bandits: Algorithms and theoretical analysis

Title Laplacian-regularized graph bandits: Algorithms and theoretical analysis
Authors Kaige Yang, Xiaowen Dong, Laura Toni
Abstract We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and a scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over the $\tilde{\mathcal{O}}(d \sqrt{T})$ of LinUCB (Li et al., 2010), where the user relationship is not taken into account. In terms of network regret (the sum of cumulative regret over $n$ users), the proposed algorithm scales as $\tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, a significant improvement over the $\tilde{\mathcal{O}}(nd\sqrt{T})$ of the state-of-the-art algorithm Gob.Lin (Cesa-Bianchi et al., 2013). To improve scalability, we further propose a simplified algorithm whose computational complexity is linear in the number of users, while maintaining the same regret. Finally, we present a finite-time analysis of the proposed algorithms, and demonstrate their advantage over state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.
Tasks
Published 2019-07-12
URL https://arxiv.org/abs/1907.05632v3
PDF https://arxiv.org/pdf/1907.05632v3.pdf
PWC https://paperswithcode.com/paper/laplacian-regularized-graph-bandits
Repo
Framework
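The Laplacian-regularized estimate at the heart of such graph bandits can be sketched as a joint ridge-style solve that pulls neighboring users' parameter vectors together. The toy graph, the combinatorial (rather than random-walk) Laplacian, and the closed-form solve are illustrative; confidence bounds and arm selection are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 4, 3, 200
W = np.ones((n, n)) - np.eye(n)            # fully connected toy user graph
L = np.diag(W.sum(1)) - W                  # combinatorial graph Laplacian

truth = rng.normal(size=(n, d))            # unknown per-user parameters
X = rng.normal(size=(T, d))                # arm feature at each round
users = rng.integers(0, n, T)              # which user arrived each round
y = np.einsum("td,td->t", X, truth[users]) + 0.1 * rng.normal(size=T)

# Solve min_theta sum_t (y_t - x_t^T theta_{u_t})^2 + lam * tr(theta^T L theta)
lam = 1.0
A = np.kron(lam * L, np.eye(d))            # Laplacian coupling across users
b = np.zeros(n * d)
for t in range(T):
    u = users[t]
    A[u*d:(u+1)*d, u*d:(u+1)*d] += np.outer(X[t], X[t])
    b[u*d:(u+1)*d] += y[t] * X[t]
theta_hat = np.linalg.solve(A + 1e-6 * np.eye(n * d), b).reshape(n, d)
print(np.round(theta_hat - truth, 2))      # estimation error per user
```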