April 2, 2020

Paper Group ANR 246

RetinaTrack: Online Single Stage Joint Detection and Tracking. Loss-annealed GAIL for sample efficient and stable Imitation Learning. Large-scale benchmark study of survival prediction methods using multi-omics data. Learning and Testing Variable Partitions. Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff. Sequential Transfer …

RetinaTrack: Online Single Stage Joint Detection and Tracking

Title RetinaTrack: Online Single Stage Joint Detection and Tracking
Authors Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang
Abstract Traditionally, multi-object tracking and object detection are performed using separate systems, with most prior works focusing exclusively on one of these aspects over the other. Tracking systems clearly benefit from having access to accurate detections; however, there is also ample evidence in the literature that detectors can benefit from tracking which, for example, can help to smooth predictions over time. In this paper we focus on the tracking-by-detection paradigm for autonomous driving, where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach such that it is amenable to instance-level embedding training. We show, via evaluations on the Waymo Open Dataset, that we outperform a recent state of the art tracking algorithm while requiring significantly less computation. We believe that our simple yet effective approach can serve as a strong baseline for future work in this area.
Tasks Autonomous Driving, Multi-Object Tracking, Object Detection, Object Tracking
Published 2020-03-30
URL https://arxiv.org/abs/2003.13870v1
PDF https://arxiv.org/pdf/2003.13870v1.pdf
PWC https://paperswithcode.com/paper/retinatrack-online-single-stage-joint
Repo
Framework

Loss-annealed GAIL for sample efficient and stable Imitation Learning

Title Loss-annealed GAIL for sample efficient and stable Imitation Learning
Authors Rohit Jena, Katia Sycara
Abstract Imitation learning is the problem of learning a policy from an expert policy without access to a reward signal. Often, the expert policy is only available in the form of expert demonstrations. Behavior cloning (BC) and GAIL are two popular methods for performing imitation learning in this setting. Behavior cloning converges in a few training iterations, but doesn’t reach peak performance and suffers from compounding errors due to its supervised training framework and i.i.d. assumption. GAIL attempts to tackle this problem by accounting for the temporal dependencies between states while matching occupancy measures of the expert and the policy. Although GAIL has shown successes in a number of environments, it requires a large number of environment interactions. Given their complementary benefits, existing work has proposed or attempted to combine the two methods, without much success. We look at some of the limitations of existing ideas that try to combine BC and GAIL, and present an algorithm that combines the best of both worlds to enable faster and stable training while not compromising on performance. Our algorithm is embarrassingly simple to implement and seamlessly integrates with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm both in low dimensional control tasks in a limited data setting, and in high dimensional grid world environments.
Tasks Imitation Learning
Published 2020-01-21
URL https://arxiv.org/abs/2001.07798v1
PDF https://arxiv.org/pdf/2001.07798v1.pdf
PWC https://paperswithcode.com/paper/loss-annealed-gail-for-sample-efficient-and
Repo
Framework
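
The abstract does not spell out the annealing schedule, so the following is only a minimal sketch of the general idea of blending a behavior cloning loss with a GAIL policy loss under a decaying weight. The exponential schedule, the constant `decay`, and the names `annealed_weight` and `combined_loss` are assumptions made for illustration, not the authors' algorithm.

```python
def annealed_weight(step: int, decay: float = 0.99) -> float:
    """Weight on the behavior cloning term (hypothetical exponential schedule)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return decay ** step


def combined_loss(bc_loss: float, gail_loss: float, step: int,
                  decay: float = 0.99) -> float:
    """Convex combination that shifts from the BC loss toward the GAIL loss."""
    alpha = annealed_weight(step, decay)
    return alpha * bc_loss + (1.0 - alpha) * gail_loss
```

At step 0 the objective is pure behavior cloning, and as the weight decays the GAIL term dominates. Any monotonically decreasing schedule would fit the same pattern.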

Large-scale benchmark study of survival prediction methods using multi-omics data

Title Large-scale benchmark study of survival prediction methods using multi-omics data
Authors Moritz Herrmann, Philipp Probst, Roman Hornung, Vindi Jurinovic, Anne-Laure Boulesteix
Abstract Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables (often in addition to classical clinical variables), are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions by means of a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets from the database “The Cancer Genome Atlas”, containing from 35 to 1,000 observations and from 60,000 to 100,000 variables. The considered outcome was the (censored) survival time. Twelve methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier-score served as performance metrics. The results show that, although multi-omics data can improve the prediction performance, this is not generally the case. Only the method block forest slightly outperformed the Cox model on average over all datasets. Taking into account the multi-omics structure improves the predictive performance and protects variables in low-dimensional groups - especially clinical variables - from not being included in the model. All analyses are reproducible using freely available R code.
Tasks
Published 2020-03-07
URL https://arxiv.org/abs/2003.03621v1
PDF https://arxiv.org/pdf/2003.03621v1.pdf
PWC https://paperswithcode.com/paper/large-scale-benchmark-study-of-survival
Repo
Framework
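
The Kaplan-Meier estimate named above as a reference method can be sketched in a few lines. This is a plain-Python illustration of the standard product-limit estimator, not the R code used in the study; the function name `kaplan_meier` and the input layout (parallel lists of times and event indicators) are assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # events and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Censored observations never lower the curve directly; they only shrink the risk set used at later event times.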

Learning and Testing Variable Partitions

Title Learning and Testing Variable Partitions
Authors Andrej Bogdanov, Baoxiang Wang
Abstract Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that (1) Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly} (1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard. (2) When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. (3) When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context.
Tasks
Published 2020-03-29
URL https://arxiv.org/abs/2003.12990v1
PDF https://arxiv.org/pdf/2003.12990v1.pdf
PWC https://paperswithcode.com/paper/learning-and-testing-variable-partitions
Repo
Framework
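
To make the notion of a 2-partition concrete: a function splits exactly as $F_1(\mathbf{X}_1) + F_2(\mathbf{X}_2)$ if and only if $F(x) + F(y) = F(x|y) + F(y|x)$ for all inputs, where $x|y$ takes the $\mathbf{X}_1$ coordinates from $x$ and the rest from $y$. The sketch below checks that identity on random samples. It is a simple illustrative check, not the tester or learner analyzed in the paper, and the name `looks_2_partitioned` and its parameters are assumptions.

```python
import random


def looks_2_partitioned(f, n, part, alphabet, trials=200, seed=0):
    """Randomized check of F(x) + F(y) == F(x|y) + F(y|x).

    f        -- function taking a list of n symbols, returning an integer
    part     -- indices of the first block; the rest form the second block
    alphabet -- list of symbols each variable may take
    Returns False as soon as a violating pair is found, True otherwise.
    """
    rng = random.Random(seed)
    part = set(part)
    for _ in range(trials):
        x = [rng.choice(alphabet) for _ in range(n)]
        y = [rng.choice(alphabet) for _ in range(n)]
        xy = [x[i] if i in part else y[i] for i in range(n)]
        yx = [y[i] if i in part else x[i] for i in range(n)]
        if f(x) + f(y) != f(xy) + f(yx):
            return False
    return True
```

A `True` result only means no violation was sampled, so this check has one-sided error in the same sense as the abstract: exactly partitionable functions are never rejected.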

Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff

Title Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff
Authors Vahid Jamali, Antonia Tulino, Jaime Llorca, Elza Erkip
Abstract Semi-supervised classification, one of the most prominent fields in machine learning, studies how to combine the statistical knowledge of the often abundant unlabeled data with the often limited labeled data in order to maximize overall classification accuracy. In this context, the process of actively choosing the data to be labeled is referred to as active learning. In this paper, we initiate the non-asymptotic analysis of the optimal policy for semi-supervised classification with actively obtained labeled data. Considering a general Bayesian classification model, we provide the first characterization of the jointly optimal active learning and semi-supervised classification policy, in terms of the cost-performance tradeoff driven by the label query budget (number of data items to be labeled) and overall classification accuracy. Leveraging recent results on the Rényi entropy, we derive tight information-theoretic bounds on such active learning cost-performance tradeoff.
Tasks Active Learning
Published 2020-02-05
URL https://arxiv.org/abs/2002.02025v1
PDF https://arxiv.org/pdf/2002.02025v1.pdf
PWC https://paperswithcode.com/paper/renyi-entropy-bounds-on-the-active-learning
Repo
Framework
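
For readers unfamiliar with the quantity behind these bounds, the Rényi entropy of order $\alpha$ of a discrete distribution is $H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha$, with the Shannon entropy as the limit $\alpha \to 1$. The sketch below computes it in nats. The function name `renyi_entropy` is an assumption, and the code illustrates only the definition, not the bounds derived in the paper.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha (in nats) of a discrete distribution."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    support = [p for p in probs if p > 0]
    if alpha == 1:
        # Shannon entropy, the limit of H_alpha as alpha -> 1.
        return -sum(p * math.log(p) for p in support)
    if math.isinf(alpha):
        # Min-entropy, the limit as alpha -> infinity.
        return -math.log(max(support))
    return math.log(sum(p ** alpha for p in support)) / (1.0 - alpha)
```

For a uniform distribution every order gives the same value, the log of the support size; for other distributions the entropy is non-increasing in `alpha`.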

Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability

Title Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability
Authors Robin Hirt, Akash Srivastava, Carlos Berg, Niklas Kühl
Abstract In networks of independent entities that face similar predictive tasks, transfer machine learning makes it possible to re-use and improve neural nets using distributed data sets without exposing raw data. As the number of data sets in business networks grows and not every neural net transfer is successful, indicators are needed for the impact of a transfer on the target performance, that is, its transferability. We perform an empirical study on a unique real-world use case comprising sales data from six different restaurants. We train and transfer neural nets across these restaurant sales data and measure their transferability. Moreover, we calculate potential indicators for transferability based on divergences of data, data projections and a novel metric for neural net similarity. We obtain significant negative correlations between the transferability and the tested indicators. Our findings make it possible to choose the transfer path based on these indicators, which improves model performance whilst simultaneously requiring fewer model transfers.
Tasks
Published 2020-03-29
URL https://arxiv.org/abs/2003.13070v1
PDF https://arxiv.org/pdf/2003.13070v1.pdf
PWC https://paperswithcode.com/paper/sequential-transfer-machine-learning-in
Repo
Framework

Neural Message Passing on High Order Paths

Title Neural Message Passing on High Order Paths
Authors Daniel Flam-Shepherd, Tony Wu, Pascal Friederich, Alan Aspuru-Guzik
Abstract Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher order paths. This allows for information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction.
Tasks Molecular Property Prediction
Published 2020-02-24
URL https://arxiv.org/abs/2002.10413v1
PDF https://arxiv.org/pdf/2002.10413v1.pdf
PWC https://paperswithcode.com/paper/neural-message-passing-on-high-order-paths
Repo
Framework

Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Title Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Authors Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi
Abstract Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual with audio information leads to better results than using a single source of information. From a visual point of view, a human emotion can be recognized by analyzing the facial expression of the person. More precisely, the human emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system that is able to recognize emotions with a high accuracy rate and in real time, based on deep Convolutional Neural Networks. In order to increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data.
Tasks Emotion Recognition
Published 2020-02-29
URL https://arxiv.org/abs/2003.00351v1
PDF https://arxiv.org/pdf/2003.00351v1.pdf
PWC https://paperswithcode.com/paper/emotion-recognition-system-from-speech-and
Repo
Framework

On Biased Compression for Distributed Learning

Title On Biased Compression for Distributed Learning
Authors Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
Abstract In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. Our distributed SGD method enjoys the ergodic rate $\mathcal{O}\left(\frac{\delta L \exp(-K) }{\mu} + \frac{(C + D)}{K\mu}\right)$, where $\delta$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose a new highly performing biased compressor, a combination of Top-$k$ and natural dithering, which in our experiments outperforms all other compression techniques.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12410v1
PDF https://arxiv.org/pdf/2002.12410v1.pdf
PWC https://paperswithcode.com/paper/on-biased-compression-for-distributed
Repo
Framework
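
As a concrete example of a biased compressor, the Top-$k$ operator mentioned in the abstract keeps the $k$ entries of largest magnitude and zeroes the rest. The sketch below is a plain-Python illustration with an assumed function name `top_k`; it does not include natural dithering or the combined compressor proposed in the paper.

```python
def top_k(vector, k):
    """Keep the k entries of largest magnitude and set the others to zero."""
    if not 0 <= k <= len(vector):
        raise ValueError("k must be between 0 and len(vector)")
    # Indices sorted by absolute value, largest first; ties keep input order.
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]
```

The operator is deterministic, so its output does not equal the input in expectation, which is what makes it biased. It is nevertheless contractive: the squared error of the compressed vector is at most $(1 - k/d)$ times the squared norm of the input, where $d$ is the dimension.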

Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models

Title Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models
Authors Alessandro Berlati, Oliver Scheel, Luigi Di Stefano, Federico Tombari
Abstract Ambiguity is inherently present in many machine learning tasks, but it is seldom accounted for, especially in sequential models, as most only output a single prediction. In this work we propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data, which is of special importance, as often multiple futures are equally likely. Our approach can be applied to the most common recurrent architectures and can be used with any loss function. Additionally, we introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties and coincides with our intuitive understanding of correctness in the presence of multiple labels. We test our method on several experiments and across diverse tasks dealing with time series data, such as trajectory forecasting and maneuver prediction, achieving promising results.
Tasks Time Series
Published 2020-03-10
URL https://arxiv.org/abs/2003.10381v1
PDF https://arxiv.org/pdf/2003.10381v1.pdf
PWC https://paperswithcode.com/paper/ambiguity-in-sequential-data-predicting
Repo
Framework
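
Multiple-hypothesis models are commonly trained with a relaxed winner-takes-all objective, in which the hypothesis closest to the target receives most of the loss weight. The sketch below shows that objective for a single target. It assumes the relaxation used in earlier MHP work (weight `1 - epsilon` on the best hypothesis, `epsilon` shared among the rest); the sequential extension and the metric proposed in this paper are not reproduced, and the name `relaxed_wta_loss` is hypothetical.

```python
def relaxed_wta_loss(hypotheses, target, loss_fn, epsilon=0.05):
    """Relaxed winner-takes-all loss over a list of hypotheses."""
    losses = [loss_fn(h, target) for h in hypotheses]
    m = len(losses)
    if m == 0:
        raise ValueError("at least one hypothesis is required")
    if m == 1:
        return losses[0]
    best = min(range(m), key=losses.__getitem__)
    share = epsilon / (m - 1)   # weight given to each non-winning hypothesis
    return sum(((1.0 - epsilon) if i == best else share) * loss
               for i, loss in enumerate(losses))
```

Because `loss_fn` is passed in, the same wrapper works with any per-hypothesis loss, which matches the abstract's claim that the approach can be used with any loss function.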

Non-Parametric Learning of Gaifman Models

Title Non-Parametric Learning of Gaifman Models
Authors Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli, Sriraam Natarajan
Abstract We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning.
Tasks
Published 2020-01-02
URL https://arxiv.org/abs/2001.00528v2
PDF https://arxiv.org/pdf/2001.00528v2.pdf
PWC https://paperswithcode.com/paper/non-parametric-learning-of-gaifman-models
Repo
Framework

Progressive Growing of Neural ODEs

Title Progressive Growing of Neural ODEs
Authors Hammad A. Ayyubi, Yi Yao, Ajay Divakaran
Abstract Neural Ordinary Differential Equations (NODEs) have proven to be a powerful modeling tool for approximating (interpolation) and forecasting (extrapolation) irregularly sampled time series data. However, their performance degrades substantially when applied to real-world data, especially long-term data with complex behaviors (e.g., long-term trend across years, mid-term seasonality across months, and short-term local variation across days). To address the modeling of such complex data with different behaviors at different frequencies (time spans), we propose a novel progressive learning paradigm of NODEs for long-term time series forecasting. Specifically, following the principle of curriculum learning, we gradually increase the complexity of data and network capacity as training progresses. Our experiments with both synthetic data and real traffic data (PeMS Bay Area traffic data) show that our training methodology consistently improves the performance of vanilla NODEs by over 64%.
Tasks Time Series, Time Series Forecasting
Published 2020-03-08
URL https://arxiv.org/abs/2003.03695v1
PDF https://arxiv.org/pdf/2003.03695v1.pdf
PWC https://paperswithcode.com/paper/progressive-growing-of-neural-odes
Repo
Framework

Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks

Title Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Authors Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu
Abstract Deep neural networks often consist of a great number of trainable parameters for extracting powerful features from given datasets. On one hand, massive trainable parameters significantly enhance the performance of these deep networks. On the other hand, they bring the problem of over-fitting. To this end, dropout based methods disable some elements in the output feature maps during the training phase to reduce the co-adaptation of neurons. Although the generalization ability of the resulting models can be enhanced by these approaches, the conventional binary dropout is not the optimal solution. Therefore, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks and propose a feature distortion method (Disout) for addressing the aforementioned problem. In the training period, randomly selected elements in the feature maps will be replaced with specific values by exploiting the generalization error bound. The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated on several benchmark image datasets.
Tasks
Published 2020-02-23
URL https://arxiv.org/abs/2002.11022v1
PDF https://arxiv.org/pdf/2002.11022v1.pdf
PWC https://paperswithcode.com/paper/beyond-dropout-feature-map-distortion-to
Repo
Framework

A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection

Title A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection
Authors Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju
Abstract Chinese keyword spotting is a challenging task as there is no visual blank between Chinese words. Unlike English words, which are naturally separated by visual blanks, Chinese words are generally separated only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, which is inspired by Mask R-CNN. We propose to predict the keyword masks guided by text line detection. First, proposals of text lines are generated by Faster R-CNN; then, text line masks and keyword masks are predicted by segmentation in the proposals. In this way, the text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method.
Tasks Keyword Spotting
Published 2020-01-03
URL https://arxiv.org/abs/2001.00722v2
PDF https://arxiv.org/pdf/2001.00722v2.pdf
PWC https://paperswithcode.com/paper/a-multi-oriented-chinese-keyword-spotter
Repo
Framework

Sparse Covariance Estimation in Logit Mixture Models

Title Sparse Covariance Estimation in Logit Mixture Models
Authors Youssef M Aboutaleb, Mazen Danaf, Yifei Xie, Moshe Ben-Akiva
Abstract This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models. Researchers typically specify covariance matrices in logit mixture models under one of two extreme assumptions: either an unrestricted full covariance matrix (allowing correlations between all random coefficients), or a restricted diagonal matrix (allowing no correlations at all). Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances. We propose a new estimator, called MISC, that uses a mixed-integer optimization (MIO) program to find an optimal block diagonal structure specification for the covariance matrix, corresponding to subsets of correlated coefficients, for any desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws from the unrestricted full covariance matrix. The optimal sparsity level of the covariance matrix is determined using out-of-sample validation. We demonstrate the ability of MISC to correctly recover the true covariance structure from synthetic data. In an empirical illustration using a stated preference survey on modes of transportation, we use MISC to obtain a sparse covariance matrix indicating how preferences for attributes are related to one another.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.05034v1
PDF https://arxiv.org/pdf/2001.05034v1.pdf
PWC https://paperswithcode.com/paper/sparse-covariance-estimation-in-logit-mixture
Repo
Framework
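
The two extreme specifications described in the abstract, and everything in between, can be viewed as block diagonal masks over the full covariance matrix. The sketch below applies such a mask for a given grouping of coefficients. It only illustrates the structure being searched over; the mixed-integer program that MISC uses to choose the blocks is not shown, and the name `block_diagonal_mask` is an assumption.

```python
def block_diagonal_mask(cov, blocks):
    """Zero every covariance between coefficients in different blocks.

    cov    -- full covariance matrix as a list of rows
    blocks -- list of index lists that together partition range(len(cov))
    """
    n = len(cov)
    label = {}
    for b, block in enumerate(blocks):
        for i in block:
            if i in label:
                raise ValueError("index %d appears in more than one block" % i)
            label[i] = b
    if sorted(label) != list(range(n)):
        raise ValueError("blocks must cover every coefficient exactly once")
    return [[cov[i][j] if label[i] == label[j] else 0.0 for j in range(n)]
            for i in range(n)]
```

A single block containing every index reproduces the unrestricted full matrix, while one block per coefficient gives the diagonal specification.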