July 28, 2019

Paper Group ANR 164

DDD17: End-To-End DAVIS Driving Dataset


Title	DDD17: End-To-End DAVIS Driving Dataset
Authors	Jonathan Binas, Daniel Neil, Shih-Chii Liu, Tobi Delbruck
Abstract	Event cameras, such as dynamic vision sensors (DVS), and dynamic and active-pixel vision sensors (DAVIS) can supplement other autonomous driving sensors by providing a concurrent stream of standard active pixel sensor (APS) images and DVS temporal contrast events. The APS stream is a sequence of standard grayscale global-shutter image sensor frames. The DVS events represent brightness changes occurring at a particular moment, with a jitter of about a millisecond under most lighting conditions. They have a dynamic range of >120 dB and effective frame rates >1 kHz at data rates comparable to 30 fps (frames/second) image sensors. To overcome some of the limitations of current image acquisition technology, we investigate in this work the use of the combined DVS and APS streams in end-to-end driving applications. The dataset DDD17 accompanying this paper is the first open dataset of annotated DAVIS driving recordings. DDD17 has over 12 h of a 346x260 pixel DAVIS sensor recording highway and city driving in daytime, evening, night, dry and wet weather conditions, along with vehicle speed, GPS position, driver steering, throttle, and brake captured from the car’s on-board diagnostics interface. As an example application, we performed a preliminary end-to-end learning study of using a convolutional neural network that is trained to predict the instantaneous steering angle from DVS and APS visual data.
Tasks	Autonomous Driving
Published	2017-11-04
URL	http://arxiv.org/abs/1711.01458v1
PDF	http://arxiv.org/pdf/1711.01458v1.pdf
PWC	https://paperswithcode.com/paper/ddd17-end-to-end-davis-driving-dataset
Repo
Framework
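
The abstract describes DVS output as a stream of brightness-change events rather than frames. A minimal sketch of how such events could be accumulated into a frame-like array over a time window is given below. The event layout `(x, y, t, polarity)` and the function name `accumulate_events` are assumptions made for illustration; they are not taken from the DDD17 file format or its tooling. Only the 346x260 sensor size comes from the abstract.

```python
def accumulate_events(events, width, height, t_start, t_end):
    """Sum event polarities per pixel for events with t_start <= t < t_end.

    `events` is an iterable of (x, y, t, polarity) tuples (hypothetical layout):
    x, y are pixel coordinates, t is a timestamp in seconds, and polarity is
    True for a brightness increase and False for a decrease.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        in_window = t_start <= t < t_end
        in_bounds = 0 <= x < width and 0 <= y < height
        if in_window and in_bounds:
            frame[y][x] += 1 if polarity else -1
    return frame


# Toy events; the last one falls outside the 1 ms window.
events = [
    (0, 0, 0.0001, True),
    (0, 0, 0.0002, True),
    (1, 0, 0.0003, False),
    (2, 1, 0.0050, True),
]
frame = accumulate_events(events, width=346, height=260, t_start=0.0, t_end=0.001)
```

Signed accumulation is only one possible choice; counting events regardless of polarity, or keeping separate ON and OFF channels, would work equally well in this sketch.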

Solving Almost all Systems of Random Quadratic Equations


Title	Solving Almost all Systems of Random Quadratic Equations
Authors	Gang Wang, Georgios B. Giannakis, Yousef Saad, Jie Chen
Abstract	This paper deals with finding an $n$-dimensional solution $x$ to a system of quadratic equations of the form $y_i=\langle a_i,x\rangle^2$ for $1\le i \le m$, which is also known as phase retrieval and is NP-hard in general. We put forth a novel procedure for minimizing the amplitude-based least-squares empirical loss, that starts with a weighted maximal correlation initialization obtainable with a few power or Lanczos iterations, followed by successive refinements based upon a sequence of iteratively reweighted (generalized) gradient iterations. The two (both the initialization and gradient flow) stages distinguish themselves from prior contributions by the inclusion of a fresh (re)weighting regularization technique. The overall algorithm is conceptually simple, numerically scalable, and easy-to-implement. For certain random measurement models, the novel procedure is shown capable of finding the true solution $x$ in time proportional to reading the data $\{(a_i;y_i)\}_{1\le i \le m}$. This holds with high probability and without extra assumption on the signal $x$ to be recovered, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns in the signal vector, namely, $m>cn$. Empirically, the upshots of this contribution are: i) (almost) $100\%$ perfect signal recovery in the high-dimensional (say e.g., $n\ge 2,000$) regime given only an information-theoretic limit number of noiseless equations, namely, $m=2n-1$ in the real-valued Gaussian case; and, ii) (nearly) optimal statistical accuracy in the presence of additive noise of bounded support. Finally, substantial numerical tests using both synthetic data and real images corroborate markedly improved signal recovery performance and computational efficiency of our novel procedure relative to state-of-the-art approaches.
Tasks
Published	2017-05-29
URL	http://arxiv.org/abs/1705.10407v1
PDF	http://arxiv.org/pdf/1705.10407v1.pdf
PWC	https://paperswithcode.com/paper/solving-almost-all-systems-of-random
Repo
Framework
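
To make the measurement model in the abstract concrete, the sketch below generates real-valued Gaussian measurements $y_i=\langle a_i,x\rangle^2$ with $m=2n-1$ and evaluates an amplitude-based least-squares loss. It does not implement the paper's weighted initialization or reweighted gradient iterations. The function names and the $1/(2m)$ normalization of the loss are assumptions made for illustration.

```python
import math
import random


def inner(a, x):
    """Real inner product <a, x>."""
    return sum(ai * xi for ai, xi in zip(a, x))


def quadratic_measurements(A, x):
    """y_i = <a_i, x>^2 for each measurement vector a_i (rows of A)."""
    return [inner(a, x) ** 2 for a in A]


def amplitude_loss(A, y, z):
    """Amplitude-based least squares: average of (|<a_i, z>| - sqrt(y_i))^2 / 2.

    The 1/(2m) scaling is an assumed convention, not taken from the abstract.
    """
    residuals = [abs(inner(a, z)) - math.sqrt(yi) for a, yi in zip(A, y)]
    return sum(r * r for r in residuals) / (2 * len(y))


rng = random.Random(0)
n = 3
m = 2 * n - 1  # number of equations quoted in the abstract for the real Gaussian case
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = quadratic_measurements(A, x_true)

# The measurements cannot distinguish x from -x, so both give (numerically) zero loss.
x_flipped = [-v for v in x_true]
loss_true = amplitude_loss(A, y, x_true)
loss_flipped = amplitude_loss(A, y, x_flipped)
loss_zero = amplitude_loss(A, y, [0.0] * n)
```

The sign ambiguity shown here is why recovery in the real-valued case is usually judged up to a global sign.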

Embedded-Graph Theory


Title	Embedded-Graph Theory
Authors	Atsushi Yokoyama
Abstract	In this paper, we propose a new type of graph, denoted as “embedded-graph”, and its theory, which employs a distributed representation to describe the relations on the graph edges. Embedded-graphs can express linguistic and complicated relations, which cannot be expressed by the existing edge-graphs or weighted-graphs. We introduce the mathematical definition of embedded-graph, translation, edge distance, and graph similarity. We can transform an embedded-graph into a weighted-graph and a weighted-graph into an edge-graph by the translation method and by threshold calculation, respectively. The edge distance of an embedded-graph is a distance based on the components of a target vector, and it is calculated through cosine similarity with the target vector. The graph similarity is obtained considering the relations with linguistic complexity. In addition, we provide some examples and data structures for embedded-graphs in this paper.
Tasks	Graph Similarity
Published	2017-09-14
URL	http://arxiv.org/abs/1709.04710v1
PDF	http://arxiv.org/pdf/1709.04710v1.pdf
PWC	https://paperswithcode.com/paper/embedded-graph-theory
Repo
Framework
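
The two transformations mentioned in the abstract (embedded-graph to weighted-graph, then weighted-graph to edge-graph by thresholding) can be sketched as follows. The abstract does not give the formal definitions, so this sketch assumes the edge weight is simply the cosine similarity between an edge's vector and a target vector; the dictionary layout and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def to_weighted_graph(embedded_edges, target):
    """Map each edge's relation vector to a scalar weight (assumed: cosine with target)."""
    return {edge: cosine_similarity(vec, target) for edge, vec in embedded_edges.items()}


def to_edge_graph(weighted_edges, threshold):
    """Keep only the edges whose weight reaches the threshold."""
    return {edge for edge, weight in weighted_edges.items() if weight >= threshold}


# Each edge carries a small relation vector instead of a single weight.
embedded_edges = {
    ("alice", "bob"): [1.0, 0.0],
    ("bob", "carol"): [0.0, 1.0],
    ("alice", "carol"): [1.0, 1.0],
}
target = [1.0, 0.0]
weighted = to_weighted_graph(embedded_edges, target)
edges = to_edge_graph(weighted, threshold=0.5)
```

With this target vector, the edge whose relation is orthogonal to the target drops out after thresholding, while the other two remain.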

Registration and Fusion of Multi-Spectral Images Using a Novel Edge Descriptor


Title	Registration and Fusion of Multi-Spectral Images Using a Novel Edge Descriptor
Authors	Nati Ofir, Shai Silberstein, Dani Rozenbaum, Yosi Keller, Sharon Duvdevani Bar
Abstract	In this paper we introduce a fully end-to-end approach for multi-spectral image registration and fusion. Our method for fusion combines images from different spectral channels into a single fused image by different approaches for low and high frequency signals. A prerequisite of fusion is a stage of geometric alignment between the spectral bands, commonly referred to as registration. Unfortunately, common methods for image registration of a single spectral channel do not yield reasonable results on images from different modalities. For that end, we introduce a new algorithm for multi-spectral image registration, based on a novel edge descriptor of feature points. Our method achieves an accurate alignment of a level that allows us to further fuse the images. As our experiments show, we produce a high quality of multi-spectral image registration and fusion under many challenging scenarios.
Tasks	Image Registration
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01543v5
PDF	http://arxiv.org/pdf/1711.01543v5.pdf
PWC	https://paperswithcode.com/paper/registration-and-fusion-of-multi-spectral
Repo
Framework

Fashion Forward: Forecasting Visual Style in Fashion


Title	Fashion Forward: Forecasting Visual Style in Fashion
Authors	Ziad Al-Halah, Rainer Stiefelhagen, Kristen Grauman
Abstract	What is the future of fashion? Tackling this question from a data-driven vision perspective, we propose to forecast visual style trends before they occur. We introduce the first approach to predict the future popularity of styles discovered from fashion images in an unsupervised manner. Using these styles as a basis, we train a forecasting model to represent their trends over time. The resulting model can hypothesize new mixtures of styles that will become popular in the future, discover style dynamics (trendy vs. classic), and name the key visual attributes that will dominate tomorrow’s fashion. We demonstrate our idea applied to three datasets encapsulating 80,000 fashion products sold across six years on Amazon. Results indicate that fashion forecasting benefits greatly from visual analysis, much more than textual or meta-data cues surrounding products.
Tasks
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06394v2
PDF	http://arxiv.org/pdf/1705.06394v2.pdf
PWC	https://paperswithcode.com/paper/fashion-forward-forecasting-visual-style-in
Repo
Framework

Enhanced Local Binary Patterns for Automatic Face Recognition


Title	Enhanced Local Binary Patterns for Automatic Face Recognition
Authors	Pavel Král, Ladislav Lenc, Antonín Vrba
Abstract	This paper presents a novel automatic face recognition approach based on local binary patterns. This descriptor considers a local neighbourhood of a pixel to compute the feature vector values. This method is not very robust to handle image noise, variances and different illumination conditions. We address these issues by proposing a novel descriptor which considers more pixels and different neighbourhoods to compute the feature vector values. The proposed method is evaluated on two benchmark corpora, namely UFI and FERET face datasets. We experimentally show that our approach outperforms state-of-the-art methods and is efficient particularly in the real conditions where the above mentioned issues are obvious. We further show that the proposed method handles well one training sample issue and is also robust to the image resolution.
Tasks	Face Recognition
Published	2017-02-10
URL	http://arxiv.org/abs/1702.03349v2
PDF	http://arxiv.org/pdf/1702.03349v2.pdf
PWC	https://paperswithcode.com/paper/enhanced-local-binary-patterns-for-automatic
Repo
Framework
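
For context, the baseline local binary pattern operator that the abstract builds on can be sketched as below: each of the 8 neighbours of a pixel is compared with the centre, and the comparison results are packed into one byte. This is the standard 3x3 operator, not the enhanced descriptor proposed in the paper; the clockwise bit ordering and the `>=` comparison are common conventions assumed here.

```python
def lbp_code(patch):
    """Basic 8-bit LBP code of a 3x3 patch given as a list of three rows."""
    center = patch[1][1]
    # Neighbour positions, clockwise starting at the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
corner = [[9, 0, 0], [0, 5, 0], [0, 0, 0]]
image = [[1, 2, 3, 4, 5],
         [5, 4, 3, 2, 1],
         [1, 3, 5, 3, 1],
         [2, 2, 2, 2, 2]]
hist = lbp_histogram(image)
```

A face descriptor is then typically built by concatenating such histograms over image regions; how the paper's variant changes the neighbourhood is not specified in the abstract.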

Using $k$-way Co-occurrences for Learning Word Embeddings


Title	Using $k$-way Co-occurrences for Learning Word Embeddings
Authors	Danushka Bollegala, Yuichi Yoshida, Ken-ichi Kawarabayashi
Abstract	Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning have used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover $k(\geq\!\!2)$-way co-occurrences among a set of $k$-words. Specifically, we prove a theoretical relationship between the joint probability of $k(\geq\!\!2)$ words, and the sum of $\ell_2$ norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises $k$-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller $k$ values, $k$-way embeddings perform comparably or better than $2$-way embeddings in a range of tasks.
Tasks	Learning Word Embeddings, Word Embeddings
Published	2017-09-05
URL	http://arxiv.org/abs/1709.01199v1
PDF	http://arxiv.org/pdf/1709.01199v1.pdf
PWC	https://paperswithcode.com/paper/using-k-way-co-occurrences-for-learning-word
Repo
Framework
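
The notion of a $k$-way co-occurrence can be illustrated by counting how often each set of $k$ distinct words appears in the same context. The sketch below treats a whole sentence as the context, which is an assumption; the abstract does not say how contexts are delimited, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def k_way_cooccurrences(sentences, k):
    """Count, for every set of k distinct words, the sentences containing all of them.

    Each key is a tuple of k words in sorted order, so word order within a
    sentence does not matter.
    """
    counts = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))
        for combo in combinations(vocab, k):
            counts[combo] += 1
    return counts


sentences = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["a", "dog", "sat"],
]
pairs = k_way_cooccurrences(sentences, 2)
triples = k_way_cooccurrences(sentences, 3)
```

Even in this toy corpus the 3-way counts are sparser than the 2-way counts, which is the data sparsity issue the abstract refers to.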

Estimation of classrooms occupancy using a multi-layer perceptron


Title	Estimation of classrooms occupancy using a multi-layer perceptron
Authors	Eugénio Rodrigues, Luísa Dias Pereira, Adélio Rodrigues Gaspar, Álvaro Gomes, Manuel Carlos Gameiro da Silva
Abstract	This paper presents a multi-layer perceptron model for the estimation of classrooms number of occupants from sensed indoor environmental data: relative humidity, air temperature, and carbon dioxide concentration. The modelling datasets were collected from two classrooms in the Secondary School of Pombal, Portugal. The number of occupants and occupation periods were obtained from class attendance reports. However, post-class occupancy was unknown and the developed model is used to reconstruct the classrooms occupancy by filling the unreported periods. Different model structure and environment variables combination were tested. The model with best accuracy had as input vector 10 variables of five averaged time intervals of relative humidity and carbon dioxide concentration. The model presented a mean square error of 1.99, coefficient of determination of 0.96 with a significance of p-value < 0.001, and a mean absolute error of 1 occupant. These results show promising estimation capabilities in uncertain indoor environment conditions.
Tasks
Published	2017-02-07
URL	http://arxiv.org/abs/1702.02125v1
PDF	http://arxiv.org/pdf/1702.02125v1.pdf
PWC	https://paperswithcode.com/paper/estimation-of-classrooms-occupancy-using-a
Repo
Framework
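
The three error measures quoted in the abstract (mean square error, coefficient of determination, mean absolute error) can be computed as in the sketch below. The occupancy numbers are made-up toy values, not data from the paper, and the coefficient of determination is taken here as $1 - SS_{res}/SS_{tot}$, which may differ from the exact definition the authors used.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between observed and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy occupant counts (illustrative only).
observed = [0, 10, 20, 30]
estimated = [1, 9, 22, 30]
mse = mean_squared_error(observed, estimated)
mae = mean_absolute_error(observed, estimated)
r2 = r_squared(observed, estimated)
```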

A Hybrid Architecture for Multi-Party Conversational Systems


Title	A Hybrid Architecture for Multi-Party Conversational Systems
Authors	Maira Gatti de Bayser, Paulo Cavalin, Renan Souza, Alan Braz, Heloisa Candello, Claudio Pinhanez, Jean-Pierre Briot
Abstract	Multi-party Conversational Systems are systems with natural language interaction between one or more people or systems. From the moment that an utterance is sent to a group, to the moment that it is replied in the group by a member, several activities must be done by the system: utterance understanding, information search, reasoning, among others. In this paper we present the challenges of designing and building multi-party conversational systems, the state of the art, our proposed hybrid architecture using both rules and machine learning and some insights after implementing and evaluating one on the finance domain.
Tasks
Published	2017-05-03
URL	http://arxiv.org/abs/1705.01214v2
PDF	http://arxiv.org/pdf/1705.01214v2.pdf
PWC	https://paperswithcode.com/paper/a-hybrid-architecture-for-multi-party
Repo
Framework

From Plants to Landmarks: Time-invariant Plant Localization that uses Deep Pose Regression in Agricultural Fields


Title	From Plants to Landmarks: Time-invariant Plant Localization that uses Deep Pose Regression in Agricultural Fields
Authors	Florian Kraemer, Alexander Schaefer, Andreas Eitel, Johan Vertens, Wolfram Burgard
Abstract	Agricultural robots are expected to increase yields in a sustainable way and automate precision tasks, such as weeding and plant monitoring. At the same time, they move in a continuously changing, semi-structured field environment, in which features can hardly be found and reproduced at a later time. Challenges for Lidar and visual detection systems stem from the fact that plants can be very small, overlapping and have a steadily changing appearance. Therefore, a popular way to localize vehicles with high accuracy is based on expensive global navigation satellite systems and not on natural landmarks. The contribution of this work is a novel image-based plant localization technique that uses the time-invariant stem emerging point as a reference. Our approach is based on a fully convolutional neural network that learns landmark localization from RGB and NIR image input in an end-to-end manner. The network performs pose regression to generate a plant location likelihood map. Our approach allows us to cope with visual variances of plants both for different species and different growth stages. We achieve high localization accuracies as shown in detailed evaluations of a sugar beet cultivation phase. In experiments with our BoniRob we demonstrate that detections can be robustly reproduced with centimeter accuracy.
Tasks
Published	2017-09-14
URL	http://arxiv.org/abs/1709.04751v1
PDF	http://arxiv.org/pdf/1709.04751v1.pdf
PWC	https://paperswithcode.com/paper/from-plants-to-landmarks-time-invariant-plant
Repo
Framework

Ranking and Selection as Stochastic Control


Title	Ranking and Selection as Stochastic Control
Authors	Yijie Peng, Edwin K. P. Chong, Chun-Hung Chen, Michael C. Fu
Abstract	Under a Bayesian framework, we formulate the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem, and derive the associated Bellman equation. Using value function approximation, we derive an approximately optimal allocation policy. We show that this policy is not only computationally efficient but also possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions. Moreover, the proposed allocation policy is easily generalizable in the approximate dynamic programming paradigm.
Tasks
Published	2017-10-07
URL	http://arxiv.org/abs/1710.02619v1
PDF	http://arxiv.org/pdf/1710.02619v1.pdf
PWC	https://paperswithcode.com/paper/ranking-and-selection-as-stochastic-control
Repo
Framework

Multitask Learning with CTC and Segmental CRF for Speech Recognition


Title	Multitask Learning with CTC and Segmental CRF for Speech Recognition
Authors	Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Abstract	Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a “continuation” of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.
Tasks	Speech Recognition
Published	2017-02-21
URL	http://arxiv.org/abs/1702.06378v4
PDF	http://arxiv.org/pdf/1702.06378v4.pdf
PWC	https://paperswithcode.com/paper/multitask-learning-with-ctc-and-segmental-crf
Repo
Framework
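
The multitask objective in the abstract is an interpolation between two losses computed on a shared encoder. A minimal sketch of that interpolation is shown below, together with the usual CTC rule for turning per-frame labels into an output sequence. The weight name `lam`, the choice of which loss it multiplies, and the blank symbol `"-"` are assumptions; the loss values themselves would come from real SCRF and CTC implementations, which are not shown.

```python
def interpolated_loss(ctc_loss, scrf_loss, lam):
    """Convex combination lam * CTC + (1 - lam) * SCRF, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * scrf_loss


def ctc_collapse(frame_labels, blank="-"):
    """Standard CTC decoding rule: merge repeated labels, then drop blanks."""
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


loss = interpolated_loss(ctc_loss=2.0, scrf_loss=4.0, lam=0.25)
decoded = ctc_collapse(list("aa-ab-"))
```

Setting `lam` to 0 or 1 recovers single-task training with one of the two losses, which makes the interpolation weight a convenient knob for ablations.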

Competing Bandits: Learning under Competition


Title	Competing Bandits: Learning under Competition
Authors	Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
Abstract	Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition: how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents which choose among the competing systems. In our model, we consider competition between two multi-armed bandit algorithms faced with the same bandit instance. Users arrive one by one and choose among the two algorithms, so that each algorithm makes progress if and only if it is chosen. We ask whether and to what extent competition incentivizes the adoption of better bandit algorithms. We investigate this issue for several models of user response, as we vary the degree of rationality and competitiveness in the model. Our findings are closely related to the “competition vs. innovation” relationship, a well-studied theme in economics.
Tasks
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08533v2
PDF	http://arxiv.org/pdf/1702.08533v2.pdf
PWC	https://paperswithcode.com/paper/competing-bandits-learning-under-competition
Repo
Framework

Learning RBM with a DC programming Approach


Title	Learning RBM with a DC programming Approach
Authors	Vidyadhar Upadhya, P. S. Sastry
Abstract	By exploiting the property that the RBM log-likelihood function is the difference of convex functions, we formulate a stochastic variant of the difference of convex functions (DC) programming to minimize the negative log-likelihood. Interestingly, the traditional contrastive divergence algorithm is a special case of the above formulation and the hyperparameters of the two algorithms can be chosen such that the amount of computation per mini-batch is identical. We show that for a given computational budget the proposed algorithm almost always reaches a higher log-likelihood more rapidly, compared to the standard contrastive divergence algorithm. Further, we modify this algorithm to use the centered gradients and show that it is more efficient and effective compared to the standard centered gradient algorithm on benchmark datasets.
Tasks
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07149v2
PDF	http://arxiv.org/pdf/1709.07149v2.pdf
PWC	https://paperswithcode.com/paper/learning-rbm-with-a-dc-programming-approach
Repo
Framework

Training Deep Networks to be Spatially Sensitive


Title	Training Deep Networks to be Spatially Sensitive
Authors	Nicholas Kolkin, Gregory Shakhnarovich, Eli Shechtman
Abstract	In many computer vision tasks, for example saliency prediction or semantic segmentation, the desired output is a foreground map that predicts pixels where some criteria is satisfied. Despite the inherently spatial nature of this task commonly used learning objectives do not incorporate the spatial relationships between misclassified pixels and the underlying ground truth. The Weighted F-measure, a recently proposed evaluation metric, does reweight errors spatially, and has been shown to closely correlate with human evaluation of quality, and stably rank predictions with respect to noisy ground truths (such as a sloppy human annotator might generate). However it suffers from computational complexity which makes it intractable as an optimization objective for gradient descent, which must be evaluated thousands or millions of times while learning a model’s parameters. We propose a differentiable and efficient approximation of this metric. By incorporating spatial information into the objective we can use a simpler model than competing methods without sacrificing accuracy, resulting in faster inference speeds and alleviating the need for pre/post-processing. We match (or improve) performance on several tasks compared to prior state of the art by traditional metrics, and in many cases significantly improve performance by the weighted F-measure.
Tasks	Saliency Prediction, Semantic Segmentation
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02212v1
PDF	http://arxiv.org/pdf/1708.02212v1.pdf
PWC	https://paperswithcode.com/paper/training-deep-networks-to-be-spatially
Repo
Framework
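
As background for the abstract's discussion of F-measure objectives, the sketch below computes the ordinary $F_\beta$ score on soft (probabilistic) foreground predictions, using products instead of hard counts so the expression stays smooth in the predictions. This is the plain, unweighted F-measure; it is neither the spatially Weighted F-measure nor the approximation proposed in the paper. The function name and the small `eps` guard are assumptions.

```python
def soft_f_beta(pred, truth, beta=1.0, eps=1e-8):
    """F-beta score on flattened maps: pred in [0, 1], truth in {0, 1}.

    Uses soft true positives sum(p * g), so
    F = (1 + beta^2) * TP / (beta^2 * sum(g) + sum(p)).
    """
    true_positive = sum(p * g for p, g in zip(pred, truth))
    total_pred = sum(pred)
    total_truth = sum(truth)
    beta_sq = beta * beta
    return (1.0 + beta_sq) * true_positive / (beta_sq * total_truth + total_pred + eps)


truth = [1, 0, 1, 0]
perfect = soft_f_beta([1.0, 0.0, 1.0, 0.0], truth)
half = soft_f_beta([1.0, 1.0, 0.0, 0.0], truth)
empty = soft_f_beta([0.0, 0.0, 0.0, 0.0], truth)
```

Note that every wrongly predicted pixel costs the same here regardless of where it lies relative to the ground-truth region, which is the limitation the spatial weighting described in the abstract is meant to address.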