Paper Group ANR 119
Stochastic Gradient Descent in Continuous Time. A Sub-Quadratic Exact Medoid Algorithm. Matching models across abstraction levels with Gaussian Processes. Towards Label Imbalance in Multi-label Classification with Many Labels. Areas of Attention for Image Captioning. Multi-Class Multi-Object Tracking using Changing Point Detection. A Communication- …
Stochastic Gradient Descent in Continuous Time
Title | Stochastic Gradient Descent in Continuous Time |
Authors | Justin Sirignano, Konstantinos Spiliopoulos |
Abstract | Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $\theta_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(\theta_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions). |
Tasks | |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05545v4 |
http://arxiv.org/pdf/1611.05545v4.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-in-continuous |
Repo | |
Framework | |
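A minimal sketch of an SGDCT-style online update, discretized with Euler-Maruyama: the parameter follows a noisy descent direction driven by a continuous stream of data, here for estimating the drift parameter of an Ornstein-Uhlenbeck process. The OU model, the learning-rate schedule, and all constants are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

# Hedged sketch: online drift estimation for dX_t = -theta* X_t dt + sigma dW_t.
# The update theta += alpha * grad_theta f(X_t; theta) * (dX_t - f(X_t; theta) dt)
# is a discretized stand-in for the continuous-time parameter SDE described
# in the abstract; model and constants are illustrative only.

rng = np.random.default_rng(0)
theta_true, sigma, dt, T = 1.5, 1.0, 1e-3, 200.0
n_steps = int(T / dt)

x, theta = 0.0, 0.0                      # observed state and online estimate
for k in range(n_steps):
    t = k * dt
    dW = rng.normal(scale=np.sqrt(dt))
    dx = -theta_true * x * dt + sigma * dW     # observed increment of the stream
    drift_model = -theta * x                   # model drift f(x; theta) = -theta * x
    grad = -x                                  # d f(x; theta) / d theta
    alpha = 2.0 / (1.0 + t)                    # decaying learning rate
    theta += alpha * grad * (dx - drift_model * dt)
    x += dx

print(f"estimated theta = {theta:.2f}  (true value {theta_true})")
```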
A Sub-Quadratic Exact Medoid Algorithm
Title | A Sub-Quadratic Exact Medoid Algorithm |
Authors | James Newling, François Fleuret |
Abstract | We present a new algorithm, trimed, for obtaining the medoid of a set, that is the element of the set which minimises the mean distance to all other elements. The algorithm is shown to have, under certain assumptions, expected run time O(N^(3/2)) in R^d where N is the set size, making it the first sub-quadratic exact medoid algorithm for d>1. Experiments show that it performs very well on spatial network data, frequently requiring two orders of magnitude fewer distance calculations than state-of-the-art approximate algorithms. As an application, we show how trimed can be used as a component in an accelerated K-medoids algorithm, and then how it can be relaxed to obtain further computational gains with only a minor loss in cluster quality. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06950v4 |
http://arxiv.org/pdf/1605.06950v4.pdf | |
PWC | https://paperswithcode.com/paper/a-sub-quadratic-exact-medoid-algorithm |
Repo | |
Framework | |
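For reference, a brute-force O(N^2) medoid computation that illustrates the problem trimed solves; trimed's triangle-inequality-based elimination, which gives the O(N^(3/2)) expected run time, is not reproduced here.

```python
import numpy as np

# Hedged sketch: the medoid is the element minimising the mean distance to all
# other elements.  This naive O(N^2) implementation only states the problem;
# it is not the trimed algorithm itself.

def medoid_bruteforce(points: np.ndarray) -> int:
    """Return the index of the medoid of `points` (shape [N, d])."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return int(np.argmin(dists.mean(axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
print("medoid index:", medoid_bruteforce(X))
```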
Matching models across abstraction levels with Gaussian Processes
Title | Matching models across abstraction levels with Gaussian Processes |
Authors | Giulio Caravagna, Luca Bortolussi, Guido Sanguinetti |
Abstract | Biological systems are often modelled at different levels of abstraction depending on the particular aims/resources of a study. Such different models often provide qualitatively concordant predictions over specific parametrisations, but it is generally unclear whether model predictions are quantitatively in agreement, and whether such agreement holds for different parametrisations. Here we present a generally applicable statistical machine learning methodology to automatically reconcile the predictions of different models across abstraction levels. Our approach is based on defining a correction map, a random function which modifies the output of a model in order to match the statistics of the output of a different model of the same system. We use two biological examples to give a proof-of-principle demonstration of the methodology, and discuss its advantages and potential further applications. |
Tasks | Gaussian Processes |
Published | 2016-05-07 |
URL | http://arxiv.org/abs/1605.02190v1 |
http://arxiv.org/pdf/1605.02190v1.pdf | |
PWC | https://paperswithcode.com/paper/matching-models-across-abstraction-levels |
Repo | |
Framework | |
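A toy sketch of the correction-map idea: learn a random function with GP regression that maps a coarse model's output onto a finer model's output at shared parametrisations. The two "models" below are hypothetical stand-ins, not the biological examples from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hedged sketch: a GP "correction map" from parametrisation to the discrepancy
# between a fine and a coarse abstraction of the same hypothetical system.

def coarse_model(k):
    return 2.0 * k                                            # cheap abstraction

def fine_model(k, rng):
    return 2.0 * k + 0.3 * np.sin(4 * k) + rng.normal(scale=0.05)  # detailed model

rng = np.random.default_rng(0)
params = np.linspace(0.0, 2.0, 30)
coarse = np.array([coarse_model(k) for k in params])
fine = np.array([fine_model(k, rng) for k in params])

# Correction map: parametrisation -> (fine output - coarse output)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(params.reshape(-1, 1), fine - coarse)

k_new = np.array([[0.85]])
corrected = coarse_model(k_new[0, 0]) + gp.predict(k_new)[0]
print(f"coarse: {coarse_model(0.85):.3f}  corrected: {corrected:.3f}")
```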
Towards Label Imbalance in Multi-label Classification with Many Labels
Title | Towards Label Imbalance in Multi-label Classification with Many Labels |
Authors | Li Li, Houfeng Wang |
Abstract | In multi-label classification, an instance may be associated with a set of labels simultaneously. Recently, the research on multi-label classification has largely shifted its focus to the other end of the spectrum, where the number of labels is assumed to be extremely large. The existing works focus on how to design scalable algorithms that offer fast training procedures and have a small memory footprint. However, they ignore, and even compound, another challenge: the label imbalance problem. To address this drawback, we propose a novel Representation-based Multi-label Learning with Sampling (RMLS) approach. To the best of our knowledge, we are the first to tackle the imbalance problem in multi-label classification with many labels. Our experiments with real-world datasets demonstrate the effectiveness of the proposed approach. |
Tasks | Multi-Label Classification, Multi-Label Learning |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01304v1 |
http://arxiv.org/pdf/1604.01304v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-label-imbalance-in-multi-label |
Repo | |
Framework | |
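The abstract does not spell out RMLS itself, so the sketch below only quantifies the label-imbalance problem the paper targets: with many labels, per-label frequencies become highly skewed. The synthetic data and the imbalance-ratio measure are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: measure how skewed label frequencies get when the label
# space is large (the imbalance problem the paper addresses).

rng = np.random.default_rng(0)
n_instances, n_labels = 10000, 500
# Power-law label frequencies: a few head labels, many rare tail labels.
freq = 0.5 / np.arange(1, n_labels + 1)
Y = (rng.random((n_instances, n_labels)) < freq).astype(int)

positives = Y.sum(axis=0)
imbalance_ratio = positives.max() / np.maximum(positives, 1)
print(f"most frequent label: {positives.max()} positives")
print(f"median imbalance ratio across labels: {np.median(imbalance_ratio):.1f}")
```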
Areas of Attention for Image Captioning
Title | Areas of Attention for Image Captioning |
Authors | Marco Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek |
Abstract | We propose “Areas of Attention”, a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise interactions. In contrast to previous attention-based approaches that associate image regions only with the RNN state, our method allows a direct association between caption words and image regions. During training these associations are inferred from image-level captions, akin to weakly-supervised object detector training. These associations help to improve captioning by localizing the corresponding regions during testing. We also propose and compare different ways of generating attention areas: CNN activation grids, object proposals, and spatial transformer networks applied in a convolutional fashion. Spatial transformers give the best results. They allow for image-specific attention areas, and can be trained jointly with the rest of the network. Our attention mechanism and spatial transformer attention areas together yield state-of-the-art results on the MSCOCO dataset. |
Tasks | Image Captioning, Language Modelling |
Published | 2016-12-03 |
URL | http://arxiv.org/abs/1612.01033v2 |
http://arxiv.org/pdf/1612.01033v2.pdf | |
PWC | https://paperswithcode.com/paper/areas-of-attention-for-image-captioning |
Repo | |
Framework | |
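A small sketch of the three pairwise interactions the abstract describes: attention scores couple image regions with the current caption word and with the RNN state, and a softmax over regions yields the attention weights. The bilinear parametrisation and the dimensions are illustrative assumptions, not the exact layers used in the paper.

```python
import numpy as np

# Hedged sketch: region-word, region-state and word-state interactions
# combined into attention scores over image regions.

rng = np.random.default_rng(0)
n_regions, d_region, d_word, d_state = 6, 8, 8, 8

R = rng.normal(size=(n_regions, d_region))   # region features (e.g. CNN grid cells)
w = rng.normal(size=d_word)                  # embedding of the current word
h = rng.normal(size=d_state)                 # RNN hidden state

W_rw = rng.normal(size=(d_region, d_word)) * 0.1   # region-word interaction
W_rh = rng.normal(size=(d_region, d_state)) * 0.1  # region-state interaction
W_wh = rng.normal(size=(d_word, d_state)) * 0.1    # word-state interaction

scores = R @ W_rw @ w + R @ W_rh @ h + (w @ W_wh @ h)  # last term is a shared offset
attn = np.exp(scores - scores.max())
attn /= attn.sum()
context = attn @ R                                      # attended region feature
print("attention over regions:", np.round(attn, 3))
```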
Multi-Class Multi-Object Tracking using Changing Point Detection
Title | Multi-Class Multi-Object Tracking using Changing Point Detection |
Authors | Byungjae Lee, Enkhbayar Erdenee, Songguo Jin, Phill Kyu Rhee |
Abstract | This paper presents a robust multi-class multi-object tracking (MCMOT) method formulated within a Bayesian filtering framework. Multi-object tracking for unlimited object classes is conducted by combining detection responses with a changing point detection (CPD) algorithm. The CPD model is used to observe abrupt or abnormal changes due to drift and occlusion, based on spatiotemporal characteristics of track states. An ensemble of a convolutional neural network (CNN) based object detector and a Lucas-Kanade tracker (KLT) based motion detector is employed to compute the likelihoods of foreground regions as the detection responses of different object classes. Extensive experiments are performed on recently introduced challenging benchmark videos: the ImageNet VID and MOT benchmark datasets. The comparison to state-of-the-art video tracking techniques shows very encouraging results. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08434v1 |
http://arxiv.org/pdf/1608.08434v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-class-multi-object-tracking-using |
Repo | |
Framework | |
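A simple illustration of the change point detection role described above: flag the frame where a track's detection-confidence signal shifts abruptly, as happens under drift or occlusion. The CUSUM-style test, the baseline estimate, and the threshold are illustrative choices; the paper's actual CPD model is not specified in the abstract.

```python
import numpy as np

# Hedged sketch: detect a downward mean shift in a per-track confidence signal.

def detect_change(confidences, threshold=3.0):
    """Return the first index where a CUSUM statistic exceeds `threshold`."""
    x = np.asarray(confidences, dtype=float)
    mu, sigma = x[:10].mean(), x[:10].std() + 1e-6   # baseline from early frames
    s = 0.0
    for t, v in enumerate(x):
        s = max(0.0, s + (mu - v) / sigma - 0.5)     # accumulate downward drift
        if s > threshold:
            return t                                  # likely occlusion / drift onset
    return None

rng = np.random.default_rng(0)
conf = np.r_[np.full(40, 0.9), np.full(20, 0.3)] + rng.normal(0, 0.05, 60)
print("change detected at frame:", detect_change(conf))
```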
A Communication-Efficient Parallel Method for Group-Lasso
Title | A Communication-Efficient Parallel Method for Group-Lasso |
Authors | Binghong Chen, Jun Zhu |
Abstract | Group-Lasso (gLasso) identifies important explanatory factors in predicting the response variable by considering the grouping structure over input variables. However, most existing algorithms for gLasso are not scalable to deal with large-scale datasets, which are becoming a norm in many applications. In this paper, we present a divide-and-conquer based parallel algorithm (DC-gLasso) to scale up gLasso in the tasks of regression with grouping structures. DC-gLasso only needs two iterations to collect and aggregate the local estimates on subsets of the data, and is provably correct to recover the true model under certain conditions. We further extend it to deal with overlappings between groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso can significantly improve the time efficiency without sacrificing regression accuracy. |
Tasks | |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02222v1 |
http://arxiv.org/pdf/1612.02222v1.pdf | |
PWC | https://paperswithcode.com/paper/a-communication-efficient-parallel-method-for |
Repo | |
Framework | |
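A sketch in the spirit of the divide-and-conquer scheme: fit a group lasso independently on data subsets, then aggregate the local estimates by averaging. The tiny proximal-gradient solver, the averaging rule, and all constants are illustrative simplifications; the paper's exact two-iteration collect/aggregate procedure and its theory are not reproduced.

```python
import numpy as np

# Hedged sketch: proximal-gradient group lasso on each subset, then average.

def group_lasso(X, y, groups, lam=0.1, n_iter=500):
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        for g in groups:                            # block soft-thresholding
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm == 0 else max(0.0, 1 - step * lam / norm) * w[g]
    return w

rng = np.random.default_rng(0)
n, d = 2000, 12
groups = [list(range(i, i + 3)) for i in range(0, d, 3)]     # 4 groups of 3
w_true = np.zeros(d); w_true[0:3] = [1.0, -2.0, 1.5]         # only group 0 active
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Divide: fit on disjoint subsets; conquer: average the local estimates.
splits = np.array_split(np.arange(n), 4)
w_avg = np.mean([group_lasso(X[idx], y[idx], groups) for idx in splits], axis=0)
print("aggregated estimate (group 0):", np.round(w_avg[:3], 2))
```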
A Non-Parametric Learning Approach to Identify Online Human Trafficking
Title | A Non-Parametric Learning Approach to Identify Online Human Trafficking |
Authors | Hamidreza Alvari, Paulo Shakarian, J. E. Kelly Snyder |
Abstract | Human trafficking is among the most challenging law enforcement problems and demands a persistent fight from across the globe. In this study, we leverage readily available data from the website “Backpage”, used for classified advertisements, to discern potential patterns of human trafficking activities which manifest online and to identify the advertisements most likely related to trafficking. Due to the lack of ground truth, we rely on two human analysts, one a human trafficking victim survivor and one from law enforcement, to hand-label a small portion of the crawled data. We then present a semi-supervised learning approach that is trained on the available labeled and unlabeled data and evaluated on unseen data, with further verification by the experts. |
Tasks | |
Published | 2016-07-29 |
URL | http://arxiv.org/abs/1607.08691v2 |
http://arxiv.org/pdf/1607.08691v2.pdf | |
PWC | https://paperswithcode.com/paper/a-non-parametric-learning-approach-to |
Repo | |
Framework | |
Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences
Title | Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences |
Authors | Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter |
Abstract | Understanding continuous human actions is a non-trivial but important problem in computer vision. Although there exists a large corpus of work in the recognition of action sequences, most approaches suffer from problems relating to vast variations in motions, action combinations, and scene contexts. In this paper, we introduce a novel method for semantic segmentation and recognition of long and complex manipulation action tasks, such as “preparing a breakfast” or “making a sandwich”. We represent manipulations with our recently introduced “Semantic Event Chain” (SEC) concept, which captures the underlying spatiotemporal structure of an action invariant to motion, velocity, and scene context. Solely based on the spatiotemporal interactions between manipulated objects and hands in the extracted SEC, the framework automatically parses individual manipulation streams performed either sequentially or concurrently. Using event chains, our method further extracts basic primitive elements of each parsed manipulation. Without requiring any prior object knowledge, the proposed framework can also extract object-like scene entities that exhibit the same role in semantically similar manipulations. We conduct extensive experiments on various recent datasets to validate the robustness of the framework. |
Tasks | Semantic Segmentation |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05693v1 |
http://arxiv.org/pdf/1610.05693v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-decomposition-and-recognition-of |
Repo | |
Framework | |
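A highly simplified sketch of the Semantic Event Chain idea: track the pairwise touching relations between scene entities over time and keep only the frames where some relation changes. The binary encoding and the toy "hand picks up cup from table" sequence are illustrative; the paper's full SEC formalism is richer than this.

```python
import numpy as np

# Hedged sketch: 0 = not touching, 1 = touching; columns of the chain are
# the frames where at least one pairwise relation changes.

entities = ["hand", "cup", "table"]
pairs = [(0, 1), (0, 2), (1, 2)]          # hand-cup, hand-table, cup-table

frames = [
    [0, 0, 1],   # cup rests on table
    [0, 0, 1],
    [1, 0, 1],   # hand touches cup
    [1, 0, 0],   # cup lifted off the table
    [1, 0, 0],
]

def semantic_event_chain(frames):
    """Keep only frames where at least one pairwise relation changes."""
    chain = [frames[0]]
    for f in frames[1:]:
        if f != chain[-1]:
            chain.append(f)
    return np.array(chain).T              # rows: pairs, columns: events

sec = semantic_event_chain(frames)
for (i, j), row in zip(pairs, sec):
    print(f"{entities[i]}-{entities[j]}: {row}")
```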
Conditional Sparse Linear Regression
Title | Conditional Sparse Linear Regression |
Authors | Brendan Juba |
Abstract | Machine learning and statistics typically focus on building models that capture the vast majority of the data, possibly ignoring a small subset of data as “noise” or “outliers.” By contrast, here we consider the problem of jointly identifying a significant (but perhaps small) segment of a population in which there is a highly sparse linear regression fit, together with the coefficients for the linear fit. We contend that such tasks are of interest both because the models themselves may be able to achieve better predictions in such special cases, but also because they may aid our understanding of the data. We give algorithms for such problems under the sup norm, when this unknown segment of the population is described by a k-DNF condition and the regression fit is s-sparse for constant k and s. For the variants of this problem when the regression fit is not so sparse or using expected error, we also give a preliminary algorithm and highlight the question as a challenge for future work. |
Tasks | |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05152v1 |
http://arxiv.org/pdf/1608.05152v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-sparse-linear-regression |
Repo | |
Framework | |
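To make the problem statement concrete (not the paper's algorithm), the sketch below brute-forces a very special case: width-1 Boolean conditions and 1-sparse fits, scored by sup-norm error on the selected segment. The synthetic data, condition class, and sparsity level are toy illustrative choices.

```python
import itertools
import numpy as np

# Hedged sketch of conditional sparse regression as a search problem:
# find a condition and a sparse fit with small sup-norm error on the segment.

rng = np.random.default_rng(0)
n, n_bool, n_real = 400, 4, 3
B = rng.integers(0, 2, size=(n, n_bool))          # Boolean condition attributes
X = rng.normal(size=(n, n_real))                  # real-valued regressors
y = rng.normal(scale=2.0, size=n)                 # mostly unstructured response
seg = B[:, 2] == 1                                # hidden segment: b2 is true
y[seg] = 3.0 * X[seg, 0] + rng.normal(scale=0.05, size=seg.sum())

best = None
for j, val in itertools.product(range(n_bool), (0, 1)):   # width-1 conditions
    mask = B[:, j] == val
    if mask.sum() < 20:
        continue
    for f in range(n_real):                               # 1-sparse fits
        w = np.linalg.lstsq(X[mask, f:f + 1], y[mask], rcond=None)[0][0]
        sup_err = np.max(np.abs(y[mask] - w * X[mask, f]))
        if best is None or sup_err < best[0]:
            best = (sup_err, j, val, f, w)

sup_err, j, val, f, w = best
print(f"condition b{j}=={val}, feature x{f}, coefficient {w:.2f}, sup error {sup_err:.2f}")
```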
An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection
Title | An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection |
Authors | Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-yi Lee, Lin-shan Lee |
Abstract | In this work we aim to discover high quality speech features and linguistic units directly from unlabeled speech data in a zero resource scenario. The results are evaluated using the metrics and corpora proposed in the Zero Resource Speech Challenge organized at Interspeech 2015. A Multi-layered Acoustic Tokenizer (MAT) was proposed for automatic discovery of multiple sets of acoustic tokens from the given corpus. Each acoustic token set is specified by a set of hyperparameters that describe the model configuration. These sets of acoustic tokens carry different characteristics of the given corpus and the language behind it, and thus can be mutually reinforcing. The multiple sets of token labels are then used as the targets of a Multi-target Deep Neural Network (MDNN) trained on low-level acoustic features. Bottleneck features extracted from the MDNN are then used as the feedback input to the MAT and the MDNN itself in the next iteration. We call this iterative deep learning framework the Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN), which generates both high quality speech features for Track 1 of the Challenge and acoustic tokens for Track 2 of the Challenge. In addition, we performed extra experiments on the same corpora on the application of query-by-example spoken term detection. The experimental results showed that the iterative deep learning framework of MAT-DNN improved the detection performance due to better underlying speech features and acoustic tokens. |
Tasks | |
Published | 2016-02-01 |
URL | http://arxiv.org/abs/1602.00426v1 |
http://arxiv.org/pdf/1602.00426v1.pdf | |
PWC | https://paperswithcode.com/paper/an-iterative-deep-learning-framework-for |
Repo | |
Framework | |
DialPort: Connecting the Spoken Dialog Research Community to Real User Data
Title | DialPort: Connecting the Spoken Dialog Research Community to Real User Data |
Authors | Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi |
Abstract | This paper describes a new spoken dialog portal that connects systems produced by the spoken dialog academic research community and gives them access to real users. We introduce a distributed, multi-modal, multi-agent prototype dialog framework that affords easy integration with various remote resources, ranging from end-to-end dialog systems to external knowledge APIs. To date, the DialPort portal has successfully connected to the multi-domain spoken dialog system at Cambridge University, the NOAA (National Oceanic and Atmospheric Administration) weather API and the Yelp API. |
Tasks | |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02562v1 |
http://arxiv.org/pdf/1606.02562v1.pdf | |
PWC | https://paperswithcode.com/paper/dialport-connecting-the-spoken-dialog |
Repo | |
Framework | |
Graph-Guided Banding of the Covariance Matrix
Title | Graph-Guided Banding of the Covariance Matrix |
Authors | Jacob Bien |
Abstract | Regularization has become a primary tool for developing reliable estimators of the covariance matrix in high-dimensional settings. To curb the curse of dimensionality, numerous methods assume that the population covariance (or inverse covariance) matrix is sparse, while making no particular structural assumptions on the desired pattern of sparsity. A highly-related, yet complementary, literature studies the specific setting in which the measured variables have a known ordering, in which case a banded population matrix is often assumed. While the banded approach is conceptually and computationally easier than asking for “patternless sparsity,” it is only applicable in very specific situations (such as when data are measured over time or one-dimensional space). This work proposes a generalization of the notion of bandedness that greatly expands the range of problems in which banded estimators apply. We develop convex regularizers occupying the broad middle ground between the former approach of “patternless sparsity” and the latter reliance on having a known ordering. Our framework defines bandedness with respect to a known graph on the measured variables. Such a graph is available in diverse situations, and we provide a theoretical, computational, and applied treatment of two new estimators. An R package, called ggb, implements these new methods. |
Tasks | |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00451v2 |
http://arxiv.org/pdf/1606.00451v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-guided-banding-of-the-covariance-matrix |
Repo | |
Framework | |
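One way to picture "bandedness with respect to a known graph" is a hard taper of the sample covariance by graph distance, with classical banding recovered when the graph is a path. This is only an illustration of the notion the abstract generalises; the paper's actual convex regularizers (and the ggb R package) are more sophisticated than this.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Hedged sketch: keep covariance entries whose variables are within a given
# graph distance of each other, zero the rest.

rng = np.random.default_rng(0)
p, n = 6, 200

# A path graph 0-1-2-3-4-5: classical banding is the special case of this.
A = np.zeros((p, p))
for i in range(p - 1):
    A[i, i + 1] = A[i + 1, i] = 1
D = shortest_path(A, unweighted=True)        # graph distances between variables

# True covariance decays with graph distance; draw a sample and estimate.
Sigma = 0.6 ** D
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)

bandwidth = 2
S_banded = np.where(D <= bandwidth, S, 0.0)  # keep entries within graph bandwidth
print("zeroed entries:", int((S_banded == 0).sum()))
```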
Vertical stratification of forest canopy for segmentation of under-story trees within small-footprint airborne LiDAR point clouds
Title | Vertical stratification of forest canopy for segmentation of under-story trees within small-footprint airborne LiDAR point clouds |
Authors | Hamid Hamraz, Marco A. Contreras, Jun Zhang |
Abstract | An airborne LiDAR point cloud representing a forest contains 3D data from which vertical stand structure, even of understory layers, can be derived. This paper presents a tree segmentation approach for multi-story stands that stratifies the point cloud into canopy layers and segments individual tree crowns within each layer using a digital surface model based tree segmentation method. The novelty of the approach is the stratification procedure that separates the point cloud into an overstory and multiple understory tree canopy layers by analyzing vertical distributions of LiDAR points within overlapping locales. The procedure makes no a priori assumptions about the shape and size of the tree crowns and can, independent of the tree segmentation method, be utilized to vertically stratify tree crowns of forest canopies. We applied the proposed approach to the University of Kentucky Robinson Forest, a natural deciduous forest with complex and highly variable terrain and vegetation structure. The segmentation results showed that using the stratification procedure strongly improved the detection of understory trees (from 46% to 68%) at the cost of introducing a fair number of over-segmented understory trees (increased from 1% to 16%), while barely affecting the overall segmentation quality of overstory trees. Results of vertical stratification of the canopy showed that the point density of understory canopy layers was suboptimal for performing a reasonable tree segmentation, suggesting that acquiring denser LiDAR point clouds would allow further improvements in segmenting understory trees. As shown by inspecting correlations of the results with forest structure, the segmentation approach is applicable to a variety of forest types. |
Tasks | |
Published | 2016-12-31 |
URL | http://arxiv.org/abs/1701.00169v4 |
http://arxiv.org/pdf/1701.00169v4.pdf | |
PWC | https://paperswithcode.com/paper/vertical-stratification-of-forest-canopy-for |
Repo | |
Framework | |
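A sketch of the stratification idea: within a small horizontal locale, inspect the vertical distribution of LiDAR returns and split canopy layers at a pronounced gap in the height histogram. The bin size, gap criterion, and synthetic two-story point cloud are illustrative choices only.

```python
import numpy as np

# Hedged sketch: split overstory and understory returns at the widest
# near-empty run of bins in the height histogram of one locale.

rng = np.random.default_rng(0)
understory = rng.normal(loc=4.0, scale=1.0, size=300)    # heights in metres
overstory = rng.normal(loc=18.0, scale=2.5, size=700)
heights = np.concatenate([understory, overstory])

counts, edges = np.histogram(heights, bins=np.arange(0, 30, 1.0))
occupied = np.flatnonzero(counts > 2)
gaps = np.diff(occupied)
if gaps.size and gaps.max() > 1:
    i = np.argmax(gaps)
    split_height = edges[occupied[i] + 1 + gaps[i] // 2]
    print(f"split canopy layers at ~{split_height:.1f} m")
else:
    print("no clear vertical gap in this locale")
```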
Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval
Title | Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval |
Authors | Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis |
Abstract | Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users’ text descriptions. However, direct inference of 3D structure from images requires learning from large scale annotated data. Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description. We present a generic framework for retrieving images from a textual description of a scene by matching images with these generated abstract object layouts. Images are ranked by matching object detection outputs (bounding boxes) to 2D layout candidates (also represented by bounding boxes) which are obtained by projecting the 3D scenes with sampled camera directions. We validate our approach using public indoor scene datasets and show that our method outperforms baselines built upon object occurrence histograms and learned 2D pairwise relations. |
Tasks | Image Retrieval, Object Detection |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09392v2 |
http://arxiv.org/pdf/1611.09392v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-holistic-3d-scene-abstractions-for |
Repo | |
Framework | |
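A sketch of the ranking step described in the abstract: score an image by how well its detected object boxes match a projected 2D layout candidate. IoU-based greedy matching is an illustrative choice; the paper's actual matching cost and camera-sampling procedure are not reproduced here.

```python
import numpy as np

# Hedged sketch: greedy IoU matching between detections and a layout candidate.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def layout_match_score(detections, layout):
    """Greedily match layout boxes to detections and average the IoUs."""
    used, scores = set(), []
    for lb in layout:
        best, best_iou = None, 0.0
        for i, db in enumerate(detections):
            overlap = iou(lb, db)
            if i not in used and overlap > best_iou:
                best, best_iou = i, overlap
        if best is not None:
            used.add(best)
        scores.append(best_iou)
    return float(np.mean(scores)) if scores else 0.0

dets = [[10, 10, 60, 80], [70, 40, 120, 100]]        # detector output boxes
layout = [[12, 8, 58, 82], [75, 45, 115, 95]]        # one projected 3D layout candidate
print(f"match score: {layout_match_score(dets, layout):.2f}")
```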