Paper Group ANR 119
Stochastic Gradient Descent in Continuous Time. A Sub-Quadratic Exact Medoid Algorithm. Matching models across abstraction levels with Gaussian Processes. Towards Label Imbalance in Multi-label Classification with Many Labels. Areas of Attention for Image Captioning. Multi-Class Multi-Object Tracking using Changing Point Detection. A Communication- …
Stochastic Gradient Descent in Continuous Time
Title | Stochastic Gradient Descent in Continuous Time |
Authors | Justin Sirignano, Konstantinos Spiliopoulos |
Abstract | Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $\theta_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(\theta_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions). |
Tasks | |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05545v4 |
http://arxiv.org/pdf/1611.05545v4.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-in-continuous |
Repo | |
Framework | |
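A minimal sketch of an SGDCT-style online update, discretized with Euler-Maruyama: the parameter follows a noisy descent direction driven by a continuous stream of data, here for estimating the drift parameter of an Ornstein-Uhlenbeck process. The OU model, the learning-rate schedule, and all constants are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

# Hedged sketch: online drift estimation for dX_t = -theta* X_t dt + sigma dW_t.
# The update theta += alpha * grad_theta f(X_t; theta) * (dX_t - f(X_t; theta) dt)
# is a discretized stand-in for the continuous-time parameter SDE described
# in the abstract; model and constants are illustrative only.

rng = np.random.default_rng(0)
theta_true, sigma, dt, T = 1.5, 1.0, 1e-3, 200.0
n_steps = int(T / dt)

x, theta = 0.0, 0.0                      # observed state and online estimate
for k in range(n_steps):
    t = k * dt
    dW = rng.normal(scale=np.sqrt(dt))
    dx = -theta_true * x * dt + sigma * dW     # observed increment of the stream
    drift_model = -theta * x                   # model drift f(x; theta) = -theta * x
    grad = -x                                  # d f(x; theta) / d theta
    alpha = 2.0 / (1.0 + t)                    # decaying learning rate
    theta += alpha * grad * (dx - drift_model * dt)
    x += dx

print(f"estimated theta = {theta:.2f}  (true value {theta_true})")
```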
A Sub-Quadratic Exact Medoid Algorithm
Title | A Sub-Quadratic Exact Medoid Algorithm |
Authors | James Newling, François Fleuret |
Abstract | We present a new algorithm, trimed, for obtaining the medoid of a set, that is the element of the set which minimises the mean distance to all other elements. The algorithm is shown to have, under certain assumptions, expected run time O(N^(3/2)) in R^d where N is the set size, making it the first sub-quadratic exact medoid algorithm for d>1. Experiments show that it performs very well on spatial network data, frequently requiring two orders of magnitude fewer distance calculations than state-of-the-art approximate algorithms. As an application, we show how trimed can be used as a component in an accelerated K-medoids algorithm, and then how it can be relaxed to obtain further computational gains with only a minor loss in cluster quality. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06950v4 |
http://arxiv.org/pdf/1605.06950v4.pdf | |
PWC | https://paperswithcode.com/paper/a-sub-quadratic-exact-medoid-algorithm |
Repo | |
Framework | |
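For reference, a brute-force O(N^2) medoid computation that illustrates the problem trimed solves; trimed's triangle-inequality-based elimination, which gives the O(N^(3/2)) expected run time, is not reproduced here.

```python
import numpy as np

# Hedged sketch: the medoid is the element minimising the mean distance to all
# other elements.  This naive O(N^2) implementation only states the problem;
# it is not the trimed algorithm itself.

def medoid_bruteforce(points: np.ndarray) -> int:
    """Return the index of the medoid of `points` (shape [N, d])."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return int(np.argmin(dists.mean(axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
print("medoid index:", medoid_bruteforce(X))
```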
Matching models across abstraction levels with Gaussian Processes
Title | Matching models across abstraction levels with Gaussian Processes |
Authors | Giulio Caravagna, Luca Bortolussi, Guido Sanguinetti |
Abstract | Biological systems are often modelled at different levels of abstraction depending on the particular aims/resources of a study. Such different models often provide qualitatively concordant predictions over specific parametrisations, but it is generally unclear whether model predictions are quantitatively in agreement, and whether such agreement holds for different parametrisations. Here we present a generally applicable statistical machine learning methodology to automatically reconcile the predictions of different models across abstraction levels. Our approach is based on defining a correction map, a random function which modifies the output of a model in order to match the statistics of the output of a different model of the same system. We use two biological examples to give a proof-of-principle demonstration of the methodology, and discuss its advantages and potential further applications. |
Tasks | Gaussian Processes |
Published | 2016-05-07 |
URL | http://arxiv.org/abs/1605.02190v1 |
http://arxiv.org/pdf/1605.02190v1.pdf | |
PWC | https://paperswithcode.com/paper/matching-models-across-abstraction-levels |
Repo | |
Framework | |
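A toy sketch of the correction-map idea: learn a random function with GP regression that maps a coarse model's output onto a finer model's output at shared parametrisations. The two "models" below are hypothetical stand-ins, not the biological examples from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hedged sketch: a GP "correction map" from parametrisation to the discrepancy
# between a fine and a coarse abstraction of the same hypothetical system.

def coarse_model(k):
    return 2.0 * k                                            # cheap abstraction

def fine_model(k, rng):
    return 2.0 * k + 0.3 * np.sin(4 * k) + rng.normal(scale=0.05)  # detailed model

rng = np.random.default_rng(0)
params = np.linspace(0.0, 2.0, 30)
coarse = np.array([coarse_model(k) for k in params])
fine = np.array([fine_model(k, rng) for k in params])

# Correction map: parametrisation -> (fine output - coarse output)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(params.reshape(-1, 1), fine - coarse)

k_new = np.array([[0.85]])
corrected = coarse_model(k_new[0, 0]) + gp.predict(k_new)[0]
print(f"coarse: {coarse_model(0.85):.3f}  corrected: {corrected:.3f}")
```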
Towards Label Imbalance in Multi-label Classification with Many Labels
Title | Towards Label Imbalance in Multi-label Classification with Many Labels |
Authors | Li Li, Houfeng Wang |
Abstract | In multi-label classification, an instance may be associated with a set of labels simultaneously. Recently, the research on multi-label classification has largely shifted its focus to the other end of the spectrum, where the number of labels is assumed to be extremely large. The existing works focus on how to design scalable algorithms that offer fast training procedures and have a small memory footprint. However, they ignore, and even compound, another challenge: the label imbalance problem. To address this drawback, we propose a novel Representation-based Multi-label Learning with Sampling (RMLS) approach. To the best of our knowledge, we are the first to tackle the imbalance problem in multi-label classification with many labels. Our experiments with real-world datasets demonstrate the effectiveness of the proposed approach. |
Tasks | Multi-Label Classification, Multi-Label Learning |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01304v1 |
http://arxiv.org/pdf/1604.01304v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-label-imbalance-in-multi-label |
Repo | |
Framework | |
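The abstract does not spell out RMLS itself, so the sketch below only quantifies the label-imbalance problem the paper targets: with many labels, per-label frequencies become highly skewed. The synthetic data and the imbalance-ratio measure are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: measure how skewed label frequencies get when the label
# space is large (the imbalance problem the paper addresses).

rng = np.random.default_rng(0)
n_instances, n_labels = 10000, 500
# Power-law label frequencies: a few head labels, many rare tail labels.
freq = 0.5 / np.arange(1, n_labels + 1)
Y = (rng.random((n_instances, n_labels)) < freq).astype(int)

positives = Y.sum(axis=0)
imbalance_ratio = positives.max() / np.maximum(positives, 1)
print(f"most frequent label: {positives.max()} positives")
print(f"median imbalance ratio across labels: {np.median(imbalance_ratio):.1f}")
```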
Areas of Attention for Image Captioning
Title | Areas of Attention for Image Captioning |
Authors | Marco Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek |
Abstract | We propose “Areas of Attention”, a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise interactions. In contrast to previous attention-based approaches that associate image regions only with the RNN state, our method allows a direct association between caption words and image regions. During training these associations are inferred from image-level captions, akin to weakly-supervised object detector training. These associations help to improve captioning by localizing the corresponding regions during testing. We also propose and compare different ways of generating attention areas: CNN activation grids, object proposals, and spatial transformer networks applied in a convolutional fashion. Spatial transformers give the best results. They allow for image-specific attention areas, and can be trained jointly with the rest of the network. Our attention mechanism and spatial transformer attention areas together yield state-of-the-art results on the MSCOCO dataset. |
Tasks | Image Captioning, Language Modelling |
Published | 2016-12-03 |
URL | http://arxiv.org/abs/1612.01033v2 |
http://arxiv.org/pdf/1612.01033v2.pdf | |
PWC | https://paperswithcode.com/paper/areas-of-attention-for-image-captioning |
Repo | |
Framework | |
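A small sketch of the three pairwise interactions the abstract describes: attention scores couple image regions with the current caption word and with the RNN state, and a softmax over regions yields the attention weights. The bilinear parametrisation and the dimensions are illustrative assumptions, not the exact layers used in the paper.

```python
import numpy as np

# Hedged sketch: region-word, region-state and word-state interactions
# combined into attention scores over image regions.

rng = np.random.default_rng(0)
n_regions, d_region, d_word, d_state = 6, 8, 8, 8

R = rng.normal(size=(n_regions, d_region))   # region features (e.g. CNN grid cells)
w = rng.normal(size=d_word)                  # embedding of the current word
h = rng.normal(size=d_state)                 # RNN hidden state

W_rw = rng.normal(size=(d_region, d_word)) * 0.1   # region-word interaction
W_rh = rng.normal(size=(d_region, d_state)) * 0.1  # region-state interaction
W_wh = rng.normal(size=(d_word, d_state)) * 0.1    # word-state interaction

scores = R @ W_rw @ w + R @ W_rh @ h + (w @ W_wh @ h)  # last term is a shared offset
attn = np.exp(scores - scores.max())
attn /= attn.sum()
context = attn @ R                                      # attended region feature
print("attention over regions:", np.round(attn, 3))
```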
Multi-Class Multi-Object Tracking using Changing Point Detection
Title | Multi-Class Multi-Object Tracking using Changing Point Detection |
Authors | Byungjae Lee, Enkhbayar Erdenee, Songguo Jin, Phill Kyu Rhee |
Abstract | This paper presents a robust multi-class multi-object tracking (MCMOT) method formulated within a Bayesian filtering framework. Multi-object tracking for unlimited object classes is conducted by combining detection responses with a changing point detection (CPD) algorithm. The CPD model is used to observe abrupt or abnormal changes due to drift and occlusion, based on spatiotemporal characteristics of track states. An ensemble of a convolutional neural network (CNN) based object detector and a Lucas-Kanade tracker (KLT) based motion detector is employed to compute the likelihoods of foreground regions as the detection responses of different object classes. Extensive experiments are performed on recently introduced challenging benchmark videos: the ImageNet VID and MOT benchmark datasets. The comparison to state-of-the-art video tracking techniques shows very encouraging results. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08434v1 |
http://arxiv.org/pdf/1608.08434v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-class-multi-object-tracking-using |
Repo | |
Framework | |
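A simple illustration of the change point detection role described above: flag the frame where a track's detection-confidence signal shifts abruptly, as happens under drift or occlusion. The CUSUM-style test, the baseline estimate, and the threshold are illustrative choices; the paper's actual CPD model is not specified in the abstract.

```python
import numpy as np

# Hedged sketch: detect a downward mean shift in a per-track confidence signal.

def detect_change(confidences, threshold=3.0):
    """Return the first index where a CUSUM statistic exceeds `threshold`."""
    x = np.asarray(confidences, dtype=float)
    mu, sigma = x[:10].mean(), x[:10].std() + 1e-6   # baseline from early frames
    s = 0.0
    for t, v in enumerate(x):
        s = max(0.0, s + (mu - v) / sigma - 0.5)     # accumulate downward drift
        if s > threshold:
            return t                                  # likely occlusion / drift onset
    return None

rng = np.random.default_rng(0)
conf = np.r_[np.full(40, 0.9), np.full(20, 0.3)] + rng.normal(0, 0.05, 60)
print("change detected at frame:", detect_change(conf))
```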
A Communication-Efficient Parallel Method for Group-Lasso
Title | A Communication-Efficient Parallel Method for Group-Lasso |
Authors | Binghong Chen, Jun Zhu |
Abstract | Group-Lasso (gLasso) identifies important explanatory factors in predicting the response variable by considering the grouping structure over input variables. However, most existing algorithms for gLasso are not scalable to deal with large-scale datasets, which are becoming a norm in many applications. In this paper, we present a divide-and-conquer based parallel algorithm (DC-gLasso) to scale up gLasso in the tasks of regression with grouping structures. DC-gLasso only needs two iterations to collect and aggregate the local estimates on subsets of the data, and is provably correct to recover the true model under certain conditions. We further extend it to deal with overlappings between groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso can significantly improve the time efficiency without sacrificing regression accuracy. |
Tasks | |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02222v1 |
http://arxiv.org/pdf/1612.02222v1.pdf | |
PWC | https://paperswithcode.com/paper/a-communication-efficient-parallel-method-for |
Repo | |
Framework | |
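A sketch in the spirit of the divide-and-conquer scheme: fit a group lasso independently on data subsets, then aggregate the local estimates by averaging. The tiny proximal-gradient solver, the averaging rule, and all constants are illustrative simplifications; the paper's exact two-iteration collect/aggregate procedure and its theory are not reproduced.

```python
import numpy as np

# Hedged sketch: proximal-gradient group lasso on each subset, then average.

def group_lasso(X, y, groups, lam=0.1, n_iter=500):
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        for g in groups:                            # block soft-thresholding
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm == 0 else max(0.0, 1 - step * lam / norm) * w[g]
    return w

rng = np.random.default_rng(0)
n, d = 2000, 12
groups = [list(range(i, i + 3)) for i in range(0, d, 3)]     # 4 groups of 3
w_true = np.zeros(d); w_true[0:3] = [1.0, -2.0, 1.5]         # only group 0 active
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Divide: fit on disjoint subsets; conquer: average the local estimates.
splits = np.array_split(np.arange(n), 4)
w_avg = np.mean([group_lasso(X[idx], y[idx], groups) for idx in splits], axis=0)
print("aggregated estimate (group 0):", np.round(w_avg[:3], 2))
```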
A Non-Parametric Learning Approach to Identify Online Human Trafficking
Title | A Non-Parametric Learning Approach to Identify Online Human Trafficking |
Authors | Hamidreza Alvari, Paulo Shakarian, J. E. Kelly Snyder |
Abstract | Human trafficking is among the most challenging law enforcement problems and demands a persistent fight from across the globe. In this study, we leverage readily available data from the website “Backpage”, used for classified advertisements, to discern potential patterns of human trafficking activities which manifest online and to identify the advertisements most likely related to trafficking. Due to the lack of ground truth, we rely on two human analysts, one a human trafficking victim survivor and one from law enforcement, to hand-label a small portion of the crawled data. We then present a semi-supervised learning approach that is trained on the available labeled and unlabeled data and evaluated on unseen data, with further verification by the experts. |
Tasks | |
Published | 2016-07-29 |
URL | http://arxiv.org/abs/1607.08691v2 |
http://arxiv.org/pdf/1607.08691v2.pdf | |
PWC | https://paperswithcode.com/paper/a-non-parametric-learning-approach-to |
Repo | |
Framework | |
Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences
Title | Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences |
Authors | Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter |
Abstract | Understanding continuous human actions is a non-trivial but important problem in computer vision. Although there exists a large corpus of work in the recognition of action sequences, most approaches suffer from problems relating to vast variations in motions, action combinations, and scene contexts. In this paper, we introduce a novel method for semantic segmentation and recognition of long and complex manipulation action tasks, such as “preparing a breakfast” or “making a sandwich”. We represent manipulations with our recently introduced “Semantic Event Chain” (SEC) concept, which captures the underlying spatiotemporal structure of an action invariant to motion, velocity, and scene context. Solely based on the spatiotemporal interactions between manipulated objects and hands in the extracted SEC, the framework automatically parses individual manipulation streams performed either sequentially or concurrently. Using event chains, our method further extracts basic primitive elements of each parsed manipulation. Without requiring any prior object knowledge, the proposed framework can also extract object-like scene entities that exhibit the same role in semantically similar manipulations. We conduct extensive experiments on various recent datasets to validate the robustness of the framework. |
Tasks | Semantic Segmentation |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05693v1 |
http://arxiv.org/pdf/1610.05693v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-decomposition-and-recognition-of |
Repo | |
Framework | |
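A highly simplified sketch of the Semantic Event Chain idea: track the pairwise touching relations between scene entities over time and keep only the frames where some relation changes. The binary encoding and the toy "hand picks up cup from table" sequence are illustrative; the paper's full SEC formalism is richer than this.

```python
import numpy as np

# Hedged sketch: 0 = not touching, 1 = touching; columns of the chain are
# the frames where at least one pairwise relation changes.

entities = ["hand", "cup", "table"]
pairs = [(0, 1), (0, 2), (1, 2)]          # hand-cup, hand-table, cup-table

frames = [
    [0, 0, 1],   # cup rests on table
    [0, 0, 1],
    [1, 0, 1],   # hand touches cup
    [1, 0, 0],   # cup lifted off the table
    [1, 0, 0],
]

def semantic_event_chain(frames):
    """Keep only frames where at least one pairwise relation changes."""
    chain = [frames[0]]
    for f in frames[1:]:
        if f != chain[-1]:
            chain.append(f)
    return np.array(chain).T              # rows: pairs, columns: events

sec = semantic_event_chain(frames)
for (i, j), row in zip(pairs, sec):
    print(f"{entities[i]}-{entities[j]}: {row}")
```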
Conditional Sparse Linear Regression
Title | Conditional Sparse Linear Regression |
Authors | Brendan Juba |
Abstract | Machine learning and statistics typically focus on building models that capture the vast majority of the data, possibly ignoring a small subset of data as “noise” or “outliers.” By contrast, here we consider the problem of jointly identifying a significant (but perhaps small) segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit. We contend that such tasks are of interest both because the models themselves may be able to achieve better predictions in such special cases, but also because they may aid our understanding of the data. We give algorithms for such problems under the sup norm, when this unknown segment of the population is described by a k-DNF condition and the regression fit is s-sparse for constant k and s. For the variants of this problem when the regression fit is not so sparse or using expected error, we also give a preliminary algorithm and highlight the question as a challenge for future work. |
Tasks | |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05152v1 |
http://arxiv.org/pdf/1608.05152v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-sparse-linear-regression |
Repo | |
Framework | |
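To make the problem statement concrete (not the paper's algorithm), the sketch below brute-forces a very special case: width-1 Boolean conditions and 1-sparse fits, scored by sup-norm error on the selected segment. The synthetic data, condition class, and sparsity level are toy illustrative choices.

```python
import itertools
import numpy as np

# Hedged sketch of conditional sparse regression as a search problem:
# find a condition and a sparse fit with small sup-norm error on the segment.

rng = np.random.default_rng(0)
n, n_bool, n_real = 400, 4, 3
B = rng.integers(0, 2, size=(n, n_bool))          # Boolean condition attributes
X = rng.normal(size=(n, n_real))                  # real-valued regressors
y = rng.normal(scale=2.0, size=n)                 # mostly unstructured response
seg = B[:, 2] == 1                                # hidden segment: b2 is true
y[seg] = 3.0 * X[seg, 0] + rng.normal(scale=0.05, size=seg.sum())

best = None
for j, val in itertools.product(range(n_bool), (0, 1)):   # width-1 conditions
    mask = B[:, j] == val
    if mask.sum() < 20:
        continue
    for f in range(n_real):                               # 1-sparse fits
        w = np.linalg.lstsq(X[mask, f:f + 1], y[mask], rcond=None)[0][0]
        sup_err = np.max(np.abs(y[mask] - w * X[mask, f]))
        if best is None or sup_err < best[0]:
            best = (sup_err, j, val, f, w)

sup_err, j, val, f, w = best
print(f"condition b{j}=={val}, feature x{f}, coefficient {w:.2f}, sup error {sup_err:.2f}")
```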
An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection
Title | An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection |
Authors | Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-yi Lee, Lin-shan Lee |
Abstract | In this work we aim to discover high quality speech features and linguistic units directly from unlabeled speech data in a zero resource scenario. The results are evaluated using the metrics and corpora proposed in the Zero Resource Speech Challenge organized at Interspeech 2015. A Multi-layered Acoustic Tokenizer (MAT) was proposed for automatic discovery of multiple sets of acoustic tokens from the given corpus. Each acoustic token set is specified by a set of hyperparameters that describe the model configuration. These sets of acoustic tokens carry different characteristics of the given corpus and the language behind it, and thus can be mutually reinforcing. The multiple sets of token labels are then used as the targets of a Multi-target Deep Neural Network (MDNN) trained on low-level acoustic features. Bottleneck features extracted from the MDNN are then used as the feedback input to the MAT and the MDNN itself in the next iteration. We call this iterative deep learning framework the Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN), which generates both high quality speech features for Track 1 of the Challenge and acoustic tokens for Track 2 of the Challenge. In addition, we performed extra experiments on the same corpora on the application of query-by-example spoken term detection. The experimental results showed that the iterative deep learning framework of MAT-DNN improved the detection performance due to better underlying speech features and acoustic tokens. |
Tasks | |
Published | 2016-02-01 |
URL | http://arxiv.org/abs/1602.00426v1 |
http://arxiv.org/pdf/1602.00426v1.pdf | |
PWC | https://paperswithcode.com/paper/an-iterative-deep-learning-framework-for |
Repo | |
Framework | |
DialPort: Connecting the Spoken Dialog Research Community to Real User Data
Title | DialPort: Connecting the Spoken Dialog Research Community to Real User Data |
Authors | Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi |
Abstract | This paper describes a new spoken dialog portal that connects systems produced by the spoken dialog academic research community and gives them access to real users. We introduce a distributed, multi-modal, multi-agent prototype dialog framework that affords easy integration with various remote resources, ranging from end-to-end dialog systems to external knowledge APIs. To date, the DialPort portal has successfully connected to the multi-domain spoken dialog system at Cambridge University, the NOAA (National Oceanic and Atmospheric Administration) weather API and the Yelp API. |
Tasks | |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02562v1 |
http://arxiv.org/pdf/1606.02562v1.pdf | |
PWC | https://paperswithcode.com/paper/dialport-connecting-the-spoken-dialog |
Repo | |
Framework | |
Graph-Guided Banding of the Covariance Matrix
Title | Graph-Guided Banding of the Covariance Matrix |
Authors | Jacob Bien |
Abstract | Regularization has become a primary tool for developing reliable estimators of the covariance matrix in high-dimensional settings. To curb the curse of dimensionality, numerous methods assume that the population covariance (or inverse covariance) matrix is sparse, while making no particular structural assumptions on the desired pattern of sparsity. A highly-related, yet complementary, literature studies the specific setting in which the measured variables have a known ordering, in which case a banded population matrix is often assumed. While the banded approach is conceptually and computationally easier than asking for “patternless sparsity,” it is only applicable in very specific situations (such as when data are measured over time or one-dimensional space). This work proposes a generalization of the notion of bandedness that greatly expands the range of problems in which banded estimators apply. We develop convex regularizers occupying the broad middle ground between the former approach of “patternless sparsity” and the latter reliance on having a known ordering. Our framework defines bandedness with respect to a known graph on the measured variables. Such a graph is available in diverse situations, and we provide a theoretical, computational, and applied treatment of two new estimators. An R package, called ggb, implements these new methods. |
Tasks | |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00451v2 |
http://arxiv.org/pdf/1606.00451v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-guided-banding-of-the-covariance-matrix |
Repo | |
Framework | |
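One way to picture "bandedness with respect to a known graph" is a hard taper of the sample covariance by graph distance, with classical banding recovered when the graph is a path. This is only an illustration of the notion the abstract generalises; the paper's actual convex regularizers (and the ggb R package) are more sophisticated than this.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Hedged sketch: keep covariance entries whose variables are within a given
# graph distance of each other, zero the rest.

rng = np.random.default_rng(0)
p, n = 6, 200

# A path graph 0-1-2-3-4-5: classical banding is the special case of this.
A = np.zeros((p, p))
for i in range(p - 1):
    A[i, i + 1] = A[i + 1, i] = 1
D = shortest_path(A, unweighted=True)        # graph distances between variables

# True covariance decays with graph distance; draw a sample and estimate.
Sigma = 0.6 ** D
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)

bandwidth = 2
S_banded = np.where(D <= bandwidth, S, 0.0)  # keep entries within graph bandwidth
print("zeroed entries:", int((S_banded == 0).sum()))
```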
Vertical stratification of forest canopy for segmentation of under-story trees within small-footprint airborne LiDAR point clouds
Title | Vertical stratification of forest canopy for segmentation of under-story trees within small-footprint airborne LiDAR point clouds |
Authors | Hamid Hamraz, Marco A. Contreras, Jun Zhang |
Abstract | An airborne LiDAR point cloud representing a forest contains 3D data from which vertical stand structure, even of understory layers, can be derived. This paper presents a tree segmentation approach for multi-story stands that stratifies the point cloud into canopy layers and segments individual tree crowns within each layer using a digital surface model based tree segmentation method. The novelty of the approach is the stratification procedure that separates the point cloud into an overstory and multiple understory tree canopy layers by analyzing vertical distributions of LiDAR points within overlapping locales. The procedure makes no a priori assumptions about the shape and size of the tree crowns and can, independent of the tree segmentation method, be utilized to vertically stratify tree crowns of forest canopies. We applied the proposed approach to the University of Kentucky Robinson Forest, a natural deciduous forest with complex and highly variable terrain and vegetation structure. The segmentation results showed that using the stratification procedure strongly improved the detection of understory trees (from 46% to 68%) at the cost of introducing a fair number of over-segmented understory trees (increased from 1% to 16%), while barely affecting the overall segmentation quality of overstory trees. Results of vertical stratification of the canopy showed that the point density of understory canopy layers was suboptimal for performing a reasonable tree segmentation, suggesting that acquiring denser LiDAR point clouds would allow further improvements in segmenting understory trees. As shown by inspecting correlations of the results with forest structure, the segmentation approach is applicable to a variety of forest types. |
Tasks | |
Published | 2016-12-31 |
URL | http://arxiv.org/abs/1701.00169v4 |
http://arxiv.org/pdf/1701.00169v4.pdf | |
PWC | https://paperswithcode.com/paper/vertical-stratification-of-forest-canopy-for |
Repo | |
Framework | |
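A sketch of the stratification idea: within a small horizontal locale, inspect the vertical distribution of LiDAR returns and split canopy layers at a pronounced gap in the height histogram. The bin size, gap criterion, and synthetic two-story point cloud are illustrative choices only.

```python
import numpy as np

# Hedged sketch: split overstory and understory returns at the widest
# near-empty run of bins in the height histogram of one locale.

rng = np.random.default_rng(0)
understory = rng.normal(loc=4.0, scale=1.0, size=300)    # heights in metres
overstory = rng.normal(loc=18.0, scale=2.5, size=700)
heights = np.concatenate([understory, overstory])

counts, edges = np.histogram(heights, bins=np.arange(0, 30, 1.0))
occupied = np.flatnonzero(counts > 2)
gaps = np.diff(occupied)
if gaps.size and gaps.max() > 1:
    i = np.argmax(gaps)
    split_height = edges[occupied[i] + 1 + gaps[i] // 2]
    print(f"split canopy layers at ~{split_height:.1f} m")
else:
    print("no clear vertical gap in this locale")
```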
Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval
Title | Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval |
Authors | Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis |
Abstract | Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users’ text descriptions. However, direct inference of 3D structure from images requires learning from large scale annotated data. Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description. We present a generic framework for retrieving images from a textual description of a scene by matching images with these generated abstract object layouts. Images are ranked by matching object detection outputs (bounding boxes) to 2D layout candidates (also represented by bounding boxes) which are obtained by projecting the 3D scenes with sampled camera directions. We validate our approach using public indoor scene datasets and show that our method outperforms baselines built upon object occurrence histograms and learned 2D pairwise relations. |
Tasks | Image Retrieval, Object Detection |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09392v2 |
http://arxiv.org/pdf/1611.09392v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-holistic-3d-scene-abstractions-for |
Repo | |
Framework | |
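A sketch of the ranking step described in the abstract: score an image by how well its detected object boxes match a projected 2D layout candidate. IoU-based greedy matching is an illustrative choice; the paper's actual matching cost and camera-sampling procedure are not reproduced here.

```python
import numpy as np

# Hedged sketch: greedy IoU matching between detections and a layout candidate.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def layout_match_score(detections, layout):
    """Greedily match layout boxes to detections and average the IoUs."""
    used, scores = set(), []
    for lb in layout:
        best, best_iou = None, 0.0
        for i, db in enumerate(detections):
            overlap = iou(lb, db)
            if i not in used and overlap > best_iou:
                best, best_iou = i, overlap
        if best is not None:
            used.add(best)
        scores.append(best_iou)
    return float(np.mean(scores)) if scores else 0.0

dets = [[10, 10, 60, 80], [70, 40, 120, 100]]        # detector output boxes
layout = [[12, 8, 58, 82], [75, 45, 115, 95]]        # one projected 3D layout candidate
print(f"match score: {layout_match_score(dets, layout):.2f}")
```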