Paper Group ANR 1326
Weather event severity prediction using buoy data and machine learning
Title | Weather event severity prediction using buoy data and machine learning |
Authors | Vikas Ramachandra |
Abstract | In this paper, we predict the severity of extreme weather events (tropical storms, hurricanes, etc.) using buoy time series variables such as wind speed and air temperature. The prediction/forecasting method is based on various forecasting and machine learning models. The following steps are used. Data sources for the buoys and weather events are identified, aggregated and merged. For missing data imputation, we use Kalman filters as well as splines for multivariate time series. Then, statistical tests are run to ascertain increasing trends in weather event severity. Next, we use machine learning to predict/forecast event severity from the buoy variables, and report good accuracies for the models built. |
Tasks | Imputation, Time Series |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.09001v1 |
PDF | https://arxiv.org/pdf/1911.09001v1.pdf |
PWC | https://paperswithcode.com/paper/weather-event-severity-prediction-using-buoy |
Repo | |
Framework | |
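To make the imputation step above concrete, here is a minimal sketch of spline- and Kalman-based gap filling for a single buoy variable. The series, parameters, and noise model are invented for illustration; the paper specifies its imputation only as "Kalman filters as well as splines".

```python
import numpy as np
import pandas as pd

# Hypothetical hourly buoy wind-speed series with a simulated sensor outage.
idx = pd.date_range("2019-01-01", periods=200, freq="h")
wind = pd.Series(np.sin(np.linspace(0, 12, 200)) * 10 + 20, index=idx)
wind.iloc[40:55] = np.nan

# Spline imputation (pandas delegates spline interpolation to scipy).
wind_spline = wind.interpolate(method="spline", order=3)

def kalman_impute(y, q=0.1, r=1.0):
    """Minimal scalar Kalman filter (random-walk state) that carries the
    state estimate across gaps -- a sketch, not the paper's exact model."""
    x = y[np.isfinite(y)][0]          # start at the first observed value
    p, out = 1.0, []
    for obs in y:
        p += q                        # predict step
        if np.isfinite(obs):          # update only when a reading exists
            k = p / (p + r)
            x += k * (obs - x)
            p *= 1 - k
        out.append(x)
    return np.array(out)

wind_kalman = kalman_impute(wind.to_numpy())
```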
Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation
Title | Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation |
Authors | David Watkins-Valls, Jingxi Xu, Nicholas Waytowich, Peter Allen |
Abstract | We present a robot navigation system that uses an imitation learning framework to successfully navigate in complex environments. Our framework takes a pre-built 3D scan of a real environment and trains an agent from pre-generated expert trajectories to navigate to any position given a panoramic view of the goal and the current visual input, without relying on a map, compass, odometry, GPS, or the relative position of the target at runtime. Our end-to-end trained agent uses RGB and depth (RGBD) information and can handle large environments (up to $1031m^2$) across multiple rooms (up to $40$) and generalizes to unseen targets. We show that, compared to several baselines using deep reinforcement learning and RGBD SLAM, our method (1) requires fewer training examples and less training time, (2) reaches the goal location with higher accuracy, (3) produces better solutions with shorter paths for long-range navigation tasks, and (4) generalizes to unseen environments given an RGBD map of the environment. |
Tasks | Imitation Learning, Robot Navigation, Visual Navigation |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09295v1 |
PDF | https://arxiv.org/pdf/1909.09295v1.pdf |
PWC | https://paperswithcode.com/paper/learning-your-way-without-map-or-compass |
Repo | |
Framework | |
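The end-to-end agent maps the current RGBD view plus a panoramic goal view to an action. A toy torch sketch of that input/output contract follows; the architecture, channel counts, and image sizes are invented for illustration and are not the authors' network.

```python
import torch
import torch.nn as nn

class TargetDrivenPolicy(nn.Module):
    """Toy stand-in: fuse the current RGBD view with a panoramic goal view."""
    def __init__(self, n_actions=4):
        super().__init__()
        # Shared conv encoder; 4 input channels = RGB + depth.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64 * 2, n_actions)

    def forward(self, obs_rgbd, goal_rgbd):
        z = torch.cat([self.encoder(obs_rgbd), self.encoder(goal_rgbd)], dim=-1)
        return self.head(z)  # action logits, trainable by imitation (cross-entropy)

policy = TargetDrivenPolicy()
logits = policy(torch.rand(1, 4, 128, 128), torch.rand(1, 4, 128, 256))
```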
Measuring Mother-Infant Emotions By Audio Sensing
Title | Measuring Mother-Infant Emotions By Audio Sensing |
Authors | Xuewen Yao, Dong He, Tiancheng Jing, Kaya de Barbaro |
Abstract | It has been suggested in the developmental psychology literature that the communication of affect between mothers and their infants correlates with the socioemotional and cognitive development of infants. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on the mothers’ speech. To build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a convolutional neural network model that classifies 6 different emotions at 70% accuracy. We applied our model to the mothers’ speech and found that the dominant predicted emotions were angry and sad, which did not match the actual recordings. Based on our own observations, we concluded that emotional speech databases made with the help of actors do not generalize well to real-life settings, suggesting an active learning or unsupervised approach for future work. |
Tasks | Active Learning |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.05920v1 |
PDF | https://arxiv.org/pdf/1912.05920v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-mother-infant-emotions-by-audio |
Repo | |
Framework | |
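A minimal sketch of the kind of pipeline the abstract describes: MFCC features extracted from an audio clip feeding a small CNN with six emotion classes. The file name, feature parameters, and architecture are illustrative, not the authors' model.

```python
import librosa
import torch
import torch.nn as nn

# MFCCs are a common front end for speech emotion recognition;
# the sampling rate and n_mfcc below are illustrative choices.
def mfcc_features(path, sr=16000, n_mfcc=40):
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.tensor(m, dtype=torch.float32).unsqueeze(0)  # (1, n_mfcc, T)

# Toy CNN over the MFCC "image"; not the authors' architecture.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 6),  # 6 emotion classes, as in the abstract
)

x = mfcc_features("clip.wav").unsqueeze(0)  # hypothetical file -> (1, 1, n_mfcc, T)
logits = model(x)
```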
Aggregated Hold-Out
Title | Aggregated Hold-Out |
Authors | Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle |
Abstract | Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the Y variable or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04890v1 |
PDF | https://arxiv.org/pdf/1909.04890v1.pdf |
PWC | https://paperswithcode.com/paper/aggregated-hold-out |
Repo | |
Framework | |
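The Agghoo procedure itself is easy to sketch: draw several independent train/hold-out splits, select the empirically best rule on each hold-out, and average the selected predictors. A minimal regression version with scikit-learn, where the ridge-penalty grid stands in for an arbitrary family of learning rules:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def agghoo_predict(X, y, X_test, alphas=(0.01, 0.1, 1.0, 10.0), n_splits=10, seed=0):
    """Aggregated hold-out: average the estimators chosen by hold-out
    over several independent splits (cf. the abstract above)."""
    rng = np.random.RandomState(seed)
    preds = []
    for _ in range(n_splits):
        X_tr, X_ho, y_tr, y_ho = train_test_split(
            X, y, test_size=0.2, random_state=rng.randint(2**31 - 1))
        # Hold-out selection: pick the candidate with the lowest empirical risk.
        models = [Ridge(alpha=a).fit(X_tr, y_tr) for a in alphas]
        risks = [np.mean((m.predict(X_ho) - y_ho) ** 2) for m in models]
        preds.append(models[int(np.argmin(risks))].predict(X_test))
    # Aggregation step: average the selected learning rules.
    return np.mean(preds, axis=0)

X = np.random.randn(200, 5); y = X @ np.ones(5) + np.random.randn(200)
print(agghoo_predict(X, y, X[:5]))
```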
Improving Visual Feature Extraction in Glacial Environments
Title | Improving Visual Feature Extraction in Glacial Environments |
Authors | Steven D. Morad, Jeremy Nash, Shoya Higa, Russell Smith, Aaron Parness, Kobus Barnard |
Abstract | Glacial science could benefit tremendously from autonomous robots, but previous glacial robots have had perception issues in these colorless and featureless environments, specifically with visual feature extraction. This translates to failures in visual odometry and visual navigation. Glaciologists use near-infrared imagery to reveal the underlying heterogeneous spatial structure of snow and ice, and we theorize that this hidden near-infrared structure could produce more, and higher-quality, features than are available in visible light. We took a custom camera rig to Igloo Cave at Mt. St. Helens to test our theory. The camera rig contains two identical machine vision cameras, one of which was outfitted with multiple filters to see only near-infrared light. We extracted features from short video clips taken inside the cave using three popular feature extractors (FAST, SIFT, and SURF). We quantified the number of features and their quality for visual navigation by comparing the resulting orientation estimates to ground truth. Our main contribution is the use of NIR longpass filters to improve the quantity and quality of visual features in icy terrain, irrespective of the feature extractor used. |
Tasks | Visual Navigation, Visual Odometry |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10425v2 |
PDF | https://arxiv.org/pdf/1908.10425v2.pdf |
PWC | https://paperswithcode.com/paper/improving-visual-feature-extraction-in |
Repo | |
Framework | |
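Counting keypoints in paired visible and NIR frames, as the paper does, can be sketched with OpenCV. The file names are placeholders; SURF is omitted because it lives in the non-free opencv-contrib build.

```python
import cv2

# Hypothetical frame pair from the visible-light and NIR-filtered cameras.
visible = cv2.imread("visible_frame.png", cv2.IMREAD_GRAYSCALE)
nir = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE)

# Two of the three detectors used in the paper (SURF omitted, see above).
detectors = {
    "FAST": cv2.FastFeatureDetector_create(),
    "SIFT": cv2.SIFT_create(),
}

for name, det in detectors.items():
    kp_vis = det.detect(visible, None)
    kp_nir = det.detect(nir, None)
    # The paper's claim, restated: NIR imagery should yield more keypoints.
    print(f"{name}: visible={len(kp_vis)} NIR={len(kp_nir)}")
```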
Online Learning with Diverse User Preferences
Title | Online Learning with Diverse User Preferences |
Authors | Chao Gan, Jing Yang, Ruida Zhou, Cong Shen |
Abstract | In this paper, we investigate the impact of diverse user preferences on learning under the stochastic multi-armed bandit (MAB) framework. We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant. Our intuition is that to achieve sub-linear regret, the number of times an optimal arm is pulled should scale linearly in time; when all arms are optimal for certain users and pulled frequently, the estimated arm statistics can quickly converge to their true values, thus reducing the need for exploration dramatically. We cast the problem into a stochastic linear bandit model, where both the user preferences and the states of the arms are modeled as independent and identically distributed (i.i.d.) d-dimensional random vectors. After receiving the user preference vector at the beginning of each time slot, the learner pulls an arm and receives a reward equal to the inner product of the preference vector and the arm state vector. We also assume that the state of the pulled arm is revealed to the learner once it is pulled. We propose a Weighted Upper Confidence Bound (W-UCB) algorithm and show that it can achieve a constant regret when the user preferences are sufficiently diverse. The performance of W-UCB under general setups is also completely characterized and validated with synthetic data. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07924v3 |
PDF | http://arxiv.org/pdf/1901.07924v3.pdf |
PWC | https://paperswithcode.com/paper/online-learning-with-diverse-user-preferences |
Repo | |
Framework | |
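The abstract does not give the W-UCB weighting itself, so the following is only a generic UCB-style sketch of the interaction loop it describes: a preference vector arrives each slot, the learner pulls an arm, the reward is the inner product, and the pulled arm's state is revealed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 5000
theta = rng.normal(size=(K, d))        # true mean arm states (unknown to learner)

state_sum = np.zeros((K, d))           # running sums of revealed arm states
pulls = np.zeros(K)

for t in range(1, T + 1):
    x = rng.normal(size=d)             # user preference vector for this slot
    est = state_sum / np.maximum(pulls, 1)[:, None]   # sample-mean arm states
    bonus = np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
    ucb = est @ x + bonus * np.linalg.norm(x)         # optimistic score per arm
    # Pull each arm once first, then act optimistically.
    a = int(np.argmax(ucb)) if pulls.min() > 0 else int(pulls.argmin())
    s = theta[a] + rng.normal(scale=0.1, size=d)      # revealed arm state
    reward = x @ s                     # inner-product reward, as in the abstract
    state_sum[a] += s
    pulls[a] += 1
```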
NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations
Title | NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations |
Authors | Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu |
Abstract | We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of an over-regularized latent space, thus significantly boosting the model’s generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization. |
Tasks | Visual Navigation |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07207v3 |
PDF | https://arxiv.org/pdf/1906.07207v3.pdf |
PWC | https://paperswithcode.com/paper/visual-navigation-by-generating-next-expected |
Repo | |
Framework | |
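A heavily simplified sketch of the generative core: a latent code conditioned on the current observation and the target view, a decoder that produces the next expected observation (NEO), and a policy head. The paper's mixture-of-Gaussians prior is replaced by a single Gaussian here, and all sizes are invented.

```python
import torch
import torch.nn as nn

class NeoNavSketch(nn.Module):
    """Simplified CVAE: latent conditioned on (observation, target),
    decoder generates the next expected observation (NEO). The paper
    uses a mixture-of-Gaussians prior; a single Gaussian keeps this short."""
    def __init__(self, obs_dim=256, z_dim=32, n_actions=4):
        super().__init__()
        self.post = nn.Linear(obs_dim * 2, z_dim * 2)       # q(z | obs, target)
        self.decode = nn.Linear(z_dim + obs_dim, obs_dim)   # p(NEO | z, obs)
        self.policy = nn.Linear(z_dim, n_actions)

    def forward(self, obs, target):
        mu, logvar = self.post(torch.cat([obs, target], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        neo = self.decode(torch.cat([z, obs], -1))
        return neo, self.policy(z), mu, logvar

m = NeoNavSketch()
neo, logits, mu, logvar = m(torch.rand(1, 256), torch.rand(1, 256))
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()  # KL term of the ELBO
```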
Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues
Title | Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues |
Authors | Arturo Gomez Chavez, Qingwen Xu, Christian A. Mueller, Sören Schwertfeger, Andreas Birk |
Abstract | Underwater robot interventions require a high level of safety and reliability. A major challenge to address is the robust and accurate acquisition of localization estimates, as it is a prerequisite for more complex tasks, e.g. floating manipulation and mapping. State-of-the-art navigation in commercial operations, such as oil & gas production (OGP), relies on costly instrumentation, which can be partially replaced or assisted by visual navigation methods, especially in deep-sea scenarios where equipment deployment has high costs and risks. Our work presents a multimodal approach that adapts state-of-the-art methods from on-land robotics, i.e., dense point cloud generation in combination with plane representation and registration, to boost underwater localization performance. A two-stage navigation scheme is proposed that initially generates a coarse probabilistic map of the workspace, which is used to filter noise from computed point clouds and planes in the second stage. Furthermore, an adaptive decision-making approach is introduced that determines which perception cues to incorporate into the localization filter to optimize accuracy and computation performance. Our approach is investigated first in simulation and then validated with data from field trials in OGP monitoring and maintenance scenarios. |
Tasks | Decision Making, Point Cloud Generation, Visual Navigation |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04888v1 |
PDF | https://arxiv.org/pdf/1906.04888v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-navigation-scheme-for-optimal-deep |
Repo | |
Framework | |
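The plane-representation stage could be sketched with Open3D's RANSAC plane segmentation, iteratively peeling dominant planes off the dense point cloud. The file name and thresholds are placeholders, not the authors' configuration.

```python
import open3d as o3d

# Hypothetical dense point cloud from the stereo/dense-reconstruction pipeline.
pcd = o3d.io.read_point_cloud("workspace_scan.ply")

# Extract dominant planes with RANSAC as a stand-in for the paper's
# plane-representation stage; thresholds are illustrative.
planes = []
rest = pcd
for _ in range(3):  # pull out up to three dominant planes
    model, inliers = rest.segment_plane(
        distance_threshold=0.02, ransac_n=3, num_iterations=1000)
    planes.append((model, rest.select_by_index(inliers)))
    rest = rest.select_by_index(inliers, invert=True)  # remove inliers, repeat
    print("plane a,b,c,d:", model)
```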
Object Detection in Video with Spatial-temporal Context Aggregation
Title | Object Detection in Video with Spatial-temporal Context Aggregation |
Authors | Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang |
Abstract | Recent cutting-edge feature aggregation paradigms for video object detection rely on inferring feature correspondence. The feature correspondence estimation problem is fundamentally difficult due to poor image quality, motion blur, etc., and the resulting estimates are unstable. To avoid this problem, we propose a simple but effective feature aggregation framework which operates at the object proposal level. It learns to enhance each proposal’s feature via modeling semantic and spatio-temporal relationships among object proposals, both within a frame and across adjacent frames. Experiments are carried out on the ImageNet VID dataset. Without any bells and whistles, our method obtains 80.3% mAP, surpassing the previous state of the art. The proposed feature aggregation mechanism improves the single-frame Faster RCNN baseline by 5.8% mAP. Moreover, without any temporal post-processing, our method outperforms the previous state of the art by 1.4% mAP. |
Tasks | Object Detection, Video Object Detection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.04988v1 |
PDF | https://arxiv.org/pdf/1907.04988v1.pdf |
PWC | https://paperswithcode.com/paper/object-detection-in-video-with-spatial |
Repo | |
Framework | |
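The proposal-level aggregation can be sketched as attention over pooled proposal features from the current and adjacent frames, followed by a residual refinement. The shapes are illustrative, and this is a generic relation-style update rather than the authors' exact module.

```python
import torch
import torch.nn.functional as F

# Pooled proposal features: N proposals in the current frame, M drawn
# from the same and neighboring frames (shapes are illustrative).
cur = torch.rand(32, 256)      # (N, C)
support = torch.rand(96, 256)  # (M, C), within-frame + adjacent frames

# Relation-style enhancement: each proposal attends over the support set
# and is refined by the aggregated feature (residual update).
attn = F.softmax(cur @ support.T / 256 ** 0.5, dim=-1)  # (N, M)
enhanced = cur + attn @ support                         # (N, C)
```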
Sharing Attention Weights for Fast Transformer
Title | Sharing Attention Weights for Fast Transformer |
Authors | Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu |
Abstract | Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache. |
Tasks | Machine Translation |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11024v1 |
PDF | https://arxiv.org/pdf/1906.11024v1.pdf |
PWC | https://paperswithcode.com/paper/sharing-attention-weights-for-fast |
Repo | |
Framework | |
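The core trick, reusing attention weights computed in one layer for an adjacent layer, fits in a few lines. In the paper the sharing pattern is learned jointly with the MT model; it is fixed here, and the projection matrices are random placeholders.

```python
import torch
import torch.nn.functional as F

def attention_probs(q, k):
    # Standard scaled dot-product attention weights.
    return F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)

x = torch.rand(1, 10, 64)            # (batch, seq, d)
Wq, Wk, Wv1, Wv2 = (torch.rand(64, 64) for _ in range(4))

# Layer l computes attention weights as usual...
probs = attention_probs(x @ Wq, x @ Wk)
out1 = probs @ (x @ Wv1)

# ...and an adjacent layer re-uses those weights instead of re-computing
# its own dot-product attention, skipping the QK^T softmax entirely.
out2 = probs @ (out1 @ Wv2)
```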
Measuring Effectiveness of Video Advertisements
Title | Measuring Effectiveness of Video Advertisements |
Authors | James Hahn, Adriana Kovashka |
Abstract | Advertisements are unavoidable in modern society. Times Square is notorious for its incessant display of advertisements. Its popularity is worldwide, and smaller cities possess miniature versions of the display, such as Pittsburgh and its digital works in Oakland on Forbes Avenue. Tokyo’s Ginza district recently rose to popularity due to its upscale shops and constant onslaught of advertisements to pedestrians. Advertisements arise in other mediums as well. For example, they help popular streaming services, such as Spotify, Hulu, and Youtube TV, gather significant streams of revenue to reduce the cost of monthly subscriptions for consumers. Ads provide an additional source of money for companies and entire industries to allocate resources toward alternative business motives. They are attractive to companies and nearly unavoidable for consumers. One challenge for advertisers is examining an advertisement’s effectiveness or usefulness in conveying a message to their targeted demographics. Rather than constructing a single, static image of content, a video advertisement possesses hundreds of frames of data with varying scenes, actors, objects, and complexity. Therefore, measuring the effectiveness of video advertisements is important to impacting a billion-dollar industry. This paper explores the combination of human-annotated features and common video processing techniques to predict effectiveness ratings of advertisements collected from Youtube. The task is framed as a binary (effective vs. non-effective), four-way, and five-way machine learning classification task. We present the first findings in terms of accuracy and inference on this dataset, among the first advertising research conducted on a small dataset. Accuracies of 84%, 65%, and 55% are reached on the binary, four-way, and five-way tasks respectively. |
Tasks | |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.07366v2 |
PDF | http://arxiv.org/pdf/1901.07366v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-effectiveness-of-video |
Repo | |
Framework | |
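The three classification granularities can be sketched with a single scikit-learn pipeline over concatenated human-annotated and video features. The features below are random placeholders, and the mapping from 1-5 ratings to the binary and four-way targets is our guess, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical design matrix: human-annotated features (e.g. topic,
# sentiment) concatenated with simple video statistics.
X = np.random.rand(300, 20)
ratings = np.random.randint(1, 6, size=300)   # 1-5 effectiveness ratings

# The same features feed all three granularities of the prediction task.
targets = {
    "binary": (ratings >= 4).astype(int),      # effective vs. non-effective
    "four-way": np.clip(ratings, 2, 5) - 2,    # our guessed bucketing
    "five-way": ratings - 1,
}
for name, y in targets.items():
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```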
LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
Title | LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds |
Authors | Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh |
Abstract | We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e. a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered meshes from the input point clouds. To bridge the gap between the artist-defined geometry and the captured point clouds, our autoencoder models pose-dependent deviations from the template geometry. During training, instead of using explicit correspondences, such as key points or pose supervision, our method leverages LBS deformations to bootstrap the learning process. To avoid poor local minima arising from erroneous point-to-point correspondences, we utilize a structured Chamfer distance based on part segmentations, which are learned concurrently using self-supervision. We demonstrate qualitative results on real captured hands, and report quantitative evaluations on the FAUST benchmark for body registration. Our method achieves performance that is superior to other unsupervised approaches and comparable to methods using supervised examples. |
Tasks | |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.10037v1 |
PDF | http://arxiv.org/pdf/1904.10037v1.pdf |
PWC | https://paperswithcode.com/paper/lbs-autoencoder-self-supervised-fitting-of |
Repo | |
Framework | |
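The structured Chamfer distance is straightforward to sketch in numpy: nearest-neighbor matching is restricted to points sharing a part label, which avoids the erroneous cross-part correspondences plain Chamfer can produce. Part labels are given here; the paper learns them concurrently via self-supervision.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets (N,3) and (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def structured_chamfer(a, a_parts, b, b_parts):
    """Match points only within the same part label."""
    total = 0.0
    for p in np.unique(a_parts):
        sa, sb = a[a_parts == p], b[b_parts == p]
        if len(sa) and len(sb):       # skip parts absent from either set
            total += chamfer(sa, sb)
    return total

scan = np.random.rand(500, 3)
mesh = np.random.rand(400, 3)
scan_parts = np.random.randint(0, 4, 500)   # part labels: learned in the paper
mesh_parts = np.random.randint(0, 4, 400)
print(structured_chamfer(scan, scan_parts, mesh, mesh_parts))
```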
Large-Scale Pedestrian Retrieval Competition
Title | Large-Scale Pedestrian Retrieval Competition |
Authors | Da Li, Zhang Zhang |
Abstract | The Large-Scale Pedestrian Retrieval Competition (LSPRC) mainly focuses on person retrieval, an important end application in intelligent surveillance vision systems. Person retrieval aims at searching for a target of interest with specific visual attributes or images. Low image quality, various camera viewpoints, large pose variations, and occlusions in real scenes make it a challenging problem. By providing large-scale surveillance data from real scenes and standard evaluation methods that are closer to real applications, the competition aims to improve the robustness of related algorithms and better handle the complicated situations arising in real applications. LSPRC includes two kinds of tasks, i.e., Attribute based Pedestrian Retrieval (PR-A) and Re-IDentification (ReID) based Pedestrian Retrieval (PR-ID). The standard evaluation index, i.e., mean Average Precision (mAP), is used to measure the performance of the two tasks under various scales, poses, and occlusions. In addition, a system-level evaluation method is introduced, in which the algorithms for the two tasks are integrated, together with a pedestrian detection algorithm, into a large-scale video parsing platform (named ISEE). |
Tasks | Pedestrian Detection, Person Retrieval |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02137v1 |
PDF | http://arxiv.org/pdf/1903.02137v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-pedestrian-retrieval-competition |
Repo | |
Framework | |
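The mAP evaluation index used by both tracks can be sketched with scikit-learn's per-query average precision, averaged over queries. The relevance labels and scores below are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical retrieval output: per query, a relevance label and a score
# for every gallery item (e.g., attribute match or ReID similarity).
rng = np.random.default_rng(0)
aps = []
for _ in range(50):                               # 50 queries
    y_true = rng.integers(0, 2, size=1000)        # relevant gallery items
    y_score = y_true * 0.5 + rng.random(1000)     # imperfect ranking scores
    aps.append(average_precision_score(y_true, y_score))
print("mAP:", np.mean(aps))                       # mean over queries
```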
Ab Antiquo: Proto-language Reconstruction with RNNs
Title | Ab Antiquo: Proto-language Reconstruction with RNNs |
Authors | Carlo Meloni, Shauli Ravfogel, Yoav Goldberg |
Abstract | Historical linguists have identified regularities in the process of historic sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform the conventional methods applied to this task so far. Error analysis reveals variability in the neural models’ ability to capture different phonological changes, correlating with the complexity of the changes. Analysis of the learned embeddings reveals that the models acquire phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02477v1 |
PDF | https://arxiv.org/pdf/1908.02477v1.pdf |
PWC | https://paperswithcode.com/paper/ab-antiquo-proto-language-reconstruction-with |
Repo | |
Framework | |
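The task reduces to character-level sequence-to-sequence learning: daughter-language cognates in, proto-form out. A toy GRU encoder-decoder in torch shows the shape of such a model; the vocabulary, sizes, and use of teacher forcing are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ProtoReconstructor(nn.Module):
    """Toy character-level encoder-decoder: daughter cognates in,
    proto-form out. Sizes and vocabulary are illustrative."""
    def __init__(self, vocab=64, emb=32, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, cognates, proto_in):
        # cognates: concatenated daughter forms with separators, shape (B, S)
        _, h = self.encoder(self.embed(cognates))
        dec, _ = self.decoder(self.embed(proto_in), h)  # teacher forcing
        return self.out(dec)   # (B, T, vocab) logits over proto characters

model = ProtoReconstructor()
logits = model(torch.randint(0, 64, (8, 30)), torch.randint(0, 64, (8, 12)))
```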
Laplacian-regularized graph bandits: Algorithms and theoretical analysis
Title | Laplacian-regularized graph bandits: Algorithms and theoretical analysis |
Authors | Kaige Yang, Xiaowen Dong, Laura Toni |
Abstract | We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over the $\tilde{\mathcal{O}}(d \sqrt{T})$ of LinUCB (Li et al., 2010), where user relationships are not taken into account. In terms of network regret (the sum of cumulative regret over $n$ users), the proposed algorithm scales as $\tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, a significant improvement over the $\tilde{\mathcal{O}}(nd\sqrt{T})$ of the state-of-the-art algorithm GOB.Lin (Cesa-Bianchi et al., 2013). To improve scalability, we further propose a simplified algorithm whose computational complexity is linear in the number of users, while maintaining the same regret. Finally, we present a finite-time analysis of the proposed algorithms, and demonstrate their advantage over state-of-the-art graph-based bandit algorithms on both synthetic and real-world data. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05632v3 |
PDF | https://arxiv.org/pdf/1907.05632v3.pdf |
PWC | https://paperswithcode.com/paper/laplacian-regularized-graph-bandits |
Repo | |
Framework | |
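The smoothness prior enters through the random-walk graph Laplacian $L_{rw} = I - D^{-1}A$. A short numpy sketch of building it and evaluating the penalty $\mathrm{tr}(\Theta^\top L_{rw} \Theta)$ that couples the users' parameter vectors; the graph and parameters below are hypothetical.

```python
import numpy as np

# Hypothetical user graph: adjacency matrix for n = 4 users.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

# Random-walk graph Laplacian: L_rw = I - D^{-1} A.
D_inv = np.diag(1.0 / A.sum(axis=1))
L_rw = np.eye(n) - D_inv @ A

# User parameter matrix Theta (n users x d features). The smoothness
# penalty tr(Theta^T L_rw Theta) is small when connected users have
# similar preference vectors -- the prior the algorithm exploits.
d = 3
theta = np.random.randn(n, d)
penalty = np.trace(theta.T @ L_rw @ theta)
print("smoothness penalty:", penalty)
```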