Paper Group ANR 246
RetinaTrack: Online Single Stage Joint Detection and Tracking. Loss-annealed GAIL for sample efficient and stable Imitation Learning. Large-scale benchmark study of survival prediction methods using multi-omics data. Learning and Testing Variable Partitions. Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff. Sequential Transfer …
RetinaTrack: Online Single Stage Joint Detection and Tracking
Title | RetinaTrack: Online Single Stage Joint Detection and Tracking |
Authors | Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang |
Abstract | Traditionally, multi-object tracking and object detection are performed by separate systems, with most prior work focusing exclusively on one of these aspects over the other. Tracking systems clearly benefit from having access to accurate detections; conversely, there is ample evidence in the literature that detectors can benefit from tracking, which, for example, can help to smooth predictions over time. In this paper we focus on the tracking-by-detection paradigm for autonomous driving, where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach such that it is amenable to instance-level embedding training. We show, via evaluations on the Waymo Open Dataset, that we outperform a recent state of the art tracking algorithm while requiring significantly less computation. We believe that our simple yet effective approach can serve as a strong baseline for future work in this area. |
Tasks | Autonomous Driving, Multi-Object Tracking, Object Detection, Object Tracking |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13870v1 |
https://arxiv.org/pdf/2003.13870v1.pdf | |
PWC | https://paperswithcode.com/paper/retinatrack-online-single-stage-joint |
Repo | |
Framework | |
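A minimal sketch of the idea described in the RetinaTrack abstract above: a single-stage detection head extended with a per-anchor instance-embedding branch, so that class scores, boxes, and association embeddings are produced jointly. All layer sizes are assumptions for illustration; this is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a RetinaNet-style head extended with an
# instance-embedding branch, so each anchor yields class scores, box offsets and an
# embedding vector for data association. Shapes and layer sizes are assumptions.
import torch
import torch.nn as nn


class JointDetectionTrackingHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=9, num_classes=3, embed_dim=64):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)
        # Extra branch: one embedding per anchor, trained with a metric/triplet-style loss.
        self.emb = nn.Conv2d(in_channels, num_anchors * embed_dim, 3, padding=1)
        self.num_anchors, self.embed_dim = num_anchors, embed_dim

    def forward(self, feat):
        b, _, h, w = feat.shape
        cls = self.cls(feat)                       # (B, A*C, H, W) class logits
        box = self.box(feat)                       # (B, A*4, H, W) box regression
        emb = self.emb(feat).view(b, self.num_anchors, self.embed_dim, h, w)
        emb = nn.functional.normalize(emb, dim=2)  # unit-norm embedding per anchor
        return cls, box, emb


head = JointDetectionTrackingHead()
cls, box, emb = head(torch.randn(1, 256, 32, 32))
print(cls.shape, box.shape, emb.shape)
```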
Loss-annealed GAIL for sample efficient and stable Imitation Learning
Title | Loss-annealed GAIL for sample efficient and stable Imitation Learning |
Authors | Rohit Jena, Katia Sycara |
Abstract | Imitation learning is the problem of learning a policy from an expert policy without access to a reward signal. Often, the expert policy is only available in the form of expert demonstrations. Behavior cloning and GAIL are two popular methods for performing imitation learning in this setting. Behavior cloning converges in a few training iterations but does not reach peak performance and suffers from compounding errors due to its supervised training framework and i.i.d. assumption. GAIL attempts to tackle this problem by accounting for the temporal dependencies between states while matching occupancy measures of the expert and the policy. Although GAIL has shown success in a number of environments, it requires a large number of environment interactions. Given their complementary benefits, prior work has suggested or attempted combining the two methods, without much success. We examine some of the limitations of existing ideas that try to combine BC and GAIL, and present an algorithm that combines the best of both worlds to enable faster and more stable training without compromising performance. Our algorithm is embarrassingly simple to implement and integrates seamlessly with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm both in low-dimensional control tasks in a limited data setting, and in high-dimensional grid world environments. |
Tasks | Imitation Learning |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07798v1 |
https://arxiv.org/pdf/2001.07798v1.pdf | |
PWC | https://paperswithcode.com/paper/loss-annealed-gail-for-sample-efficient-and |
Repo | |
Framework | |
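A rough sketch of the high-level idea in the abstract above: anneal between a behavior-cloning loss and an adversarial (GAIL-style) policy loss over training. The linear schedule and the placeholder losses are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: anneal between a BC loss and a GAIL-style policy loss. The linear schedule
# and the surrogate losses below are assumptions, not the paper's exact method.
import torch


def annealing_weight(step, total_steps):
    """Weight on the BC loss: starts at 1.0, decays linearly to 0.0."""
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(bc_loss, gail_policy_loss, step, total_steps):
    lam = annealing_weight(step, total_steps)
    return lam * bc_loss + (1.0 - lam) * gail_policy_loss


# Toy usage with placeholder scalar losses.
for step in range(0, 1001, 250):
    bc = torch.tensor(0.5)    # stand-in for -log pi(a_expert | s_expert)
    gail = torch.tensor(1.2)  # stand-in for the adversarial policy objective
    print(step, combined_loss(bc, gail, step, total_steps=1000).item())
```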
Large-scale benchmark study of survival prediction methods using multi-omics data
Title | Large-scale benchmark study of survival prediction methods using multi-omics data |
Authors | Moritz Herrmann, Philipp Probst, Roman Hornung, Vindi Jurinovic, Anne-Laure Boulesteix |
Abstract | Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables (often in addition to classical clinical variables), are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions by means of a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets from the database “The Cancer Genome Atlas”, containing from 35 to 1,000 observations and from 60,000 to 100,000 variables. The considered outcome was the (censored) survival time. Twelve methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results show that, although multi-omics data can improve the prediction performance, this is not generally the case. Only the method block forest slightly outperformed the Cox model on average over all datasets. Taking into account the multi-omics structure improves the predictive performance and protects variables in low-dimensional groups, especially clinical variables, from being excluded from the model. All analyses are reproducible using freely available R code. |
Tasks | |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.03621v1 |
https://arxiv.org/pdf/2003.03621v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-benchmark-study-of-survival |
Repo | |
Framework | |
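A small sketch of the evaluation protocol described in the abstract above: repeated 5-fold cross-validation of a survival model scored with a concordance index. For simplicity it uses a plain Cox model on synthetic data and Harrell's C from lifelines as a stand-in for Uno's C-index; the benchmarked multi-omics learners and the integrated Brier score are not reproduced.

```python
# Sketch of the benchmark protocol: repeated 5-fold CV with a concordance index.
# Synthetic toy data and Harrell's C stand in for the real datasets and Uno's C.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "marker": rng.normal(0, 1, n),
})
df["time"] = rng.exponential(scale=np.exp(-0.02 * df["age"] - 0.5 * df["marker"]) * 50)
df["event"] = rng.integers(0, 2, n)  # toy censoring indicator

scores = []
for train_idx, test_idx in RepeatedKFold(n_splits=5, n_repeats=2, random_state=0).split(df):
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    cph = CoxPHFitter().fit(train, duration_col="time", event_col="event")
    risk = cph.predict_partial_hazard(test)
    # Higher risk should correspond to shorter survival, hence the minus sign.
    scores.append(concordance_index(test["time"], -risk, test["event"]))

print("mean C-index:", np.mean(scores))
```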
Learning and Testing Variable Partitions
Title | Learning and Testing Variable Partitions |
Authors | Andrej Bogdanov, Baoxiang Wang |
Abstract | Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard. 2. When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. 3. When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12990v1 |
https://arxiv.org/pdf/2003.12990v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-testing-variable-partitions |
Repo | |
Framework | |
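A hedged sketch related to the abstract above (not the paper's tester): a candidate 2-partition $(\mathbf{X}_A, \mathbf{X}_B)$ makes $F$ exactly additive iff $F(x) + F(y) = F(x_A, y_B) + F(y_A, x_B)$ for all inputs $x, y$, so a sampled violation rate gives a rough empirical proxy for the partition cost under the Hamming metric.

```python
# Sketch (assumption, not the paper's algorithm): estimate how far F is from being
# additive across a candidate 2-partition (A, B) by sampling the identity
#   F(x) + F(y) == F(x_A, y_B) + F(y_A, x_B).
import random


def violation_rate(F, n, A, trials=2000, alphabet=(0, 1)):
    B = [i for i in range(n) if i not in set(A)]
    bad = 0
    for _ in range(trials):
        x = [random.choice(alphabet) for _ in range(n)]
        y = [random.choice(alphabet) for _ in range(n)]
        xy, yx = list(x), list(y)
        for i in B:          # xy takes x on A and y on B; yx takes y on A and x on B
            xy[i] = y[i]
            yx[i] = x[i]
        if F(x) + F(y) != F(xy) + F(yx):
            bad += 1
    return bad / trials


# Example: F is additive across A = {0, 1} and B = {2, 3}, so the rate is 0.
F = lambda v: (v[0] ^ v[1]) + 3 * (v[2] + v[3])
print(violation_rate(F, n=4, A=[0, 1]))
```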
Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff
Title | Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff |
Authors | Vahid Jamali, Antonia Tulino, Jaime Llorca, Elza Erkip |
Abstract | Semi-supervised classification, one of the most prominent fields in machine learning, studies how to combine the statistical knowledge of the often abundant unlabeled data with the often limited labeled data in order to maximize overall classification accuracy. In this context, the process of actively choosing the data to be labeled is referred to as active learning. In this paper, we initiate the non-asymptotic analysis of the optimal policy for semi-supervised classification with actively obtained labeled data. Considering a general Bayesian classification model, we provide the first characterization of the jointly optimal active learning and semi-supervised classification policy, in terms of the cost-performance tradeoff driven by the label query budget (number of data items to be labeled) and overall classification accuracy. Leveraging recent results on the Rényi entropy, we derive tight information-theoretic bounds on such active learning cost-performance tradeoff. |
Tasks | Active Learning |
Published | 2020-02-05 |
URL | https://arxiv.org/abs/2002.02025v1 |
https://arxiv.org/pdf/2002.02025v1.pdf | |
PWC | https://paperswithcode.com/paper/renyi-entropy-bounds-on-the-active-learning |
Repo | |
Framework | |
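For reference only (this is the standard definition, not a result of the paper): the Rényi entropy of order $\alpha$ that the bounds are stated in terms of is

```latex
% Reference definition: Renyi entropy of order alpha > 0, alpha != 1, for a discrete
% distribution p = (p_1, ..., p_m); it recovers the Shannon entropy as alpha -> 1.
\[
  H_\alpha(p) \;=\; \frac{1}{1-\alpha} \log \sum_{i=1}^{m} p_i^{\alpha},
  \qquad
  \lim_{\alpha \to 1} H_\alpha(p) \;=\; -\sum_{i=1}^{m} p_i \log p_i .
\]
```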
Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability
Title | Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability |
Authors | Robin Hirt, Akash Srivastava, Carlos Berg, Niklas Kühl |
Abstract | In networks of independent entities that face similar predictive tasks, transfer machine learning makes it possible to re-use and improve neural nets using distributed data sets without exposing the raw data. As the number of data sets in business networks grows and not every neural net transfer is successful, indicators are needed for its impact on the target performance, i.e., its transferability. We perform an empirical study on a unique real-world use case comprising sales data from six different restaurants. We train and transfer neural nets across these restaurant sales data and measure their transferability. Moreover, we calculate potential indicators for transferability based on divergences of data, data projections and a novel metric for neural net similarity. We obtain significant negative correlations between the transferability and the tested indicators. Our findings make it possible to choose the transfer path based on these indicators, which improves model performance while requiring fewer model transfers. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13070v1 |
https://arxiv.org/pdf/2003.13070v1.pdf | |
PWC | https://paperswithcode.com/paper/sequential-transfer-machine-learning-in |
Repo | |
Framework | |
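An illustrative sketch of one of the indicator computations the abstract mentions: a divergence between two data sets' target distributions, correlated against observed transfer gains. The histogram-KL estimator and the toy numbers below are assumptions, not the paper's exact metrics (which also include data projections and a neural-net similarity measure).

```python
# Sketch (illustrative only): correlate a simple data divergence with transfer gains.
import numpy as np
from scipy.stats import entropy, pearsonr


def histogram_kl(a, b, bins=20):
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    eps = 1e-9
    return entropy(p + eps, q + eps)  # KL(p || q)


rng = np.random.default_rng(1)
source_sales = [rng.normal(mu, 1.0, 500) for mu in (0.0, 0.5, 2.0, 3.0)]
target_sales = rng.normal(0.0, 1.0, 500)

divergences = [histogram_kl(s, target_sales) for s in source_sales]
transfer_gain = [0.30, 0.22, 0.10, 0.05]  # hypothetical improvement from each transfer

r, p = pearsonr(divergences, transfer_gain)
print("Pearson r between divergence and transfer gain:", round(r, 3))
```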
Neural Message Passing on High Order Paths
Title | Neural Message Passing on High Order Paths |
Authors | Daniel Flam-Shepherd, Tony Wu, Pascal Friederich, Alan Aspuru-Guzik |
Abstract | Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first-order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher-order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher-order paths. This allows information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction. |
Tasks | Molecular Property Prediction |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10413v1 |
https://arxiv.org/pdf/2002.10413v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-message-passing-on-high-order-paths |
Repo | |
Framework | |
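A simplified sketch of the idea in the abstract above: aggregate node features over higher-order neighbourhoods. Adjacency powers count walks rather than the simple paths used by the paper, so this is only an illustration of passing messages beyond first-order neighbours.

```python
# Sketch (simplified): aggregate node features from 1-hop, 2-hop, ..., k-hop
# connections via adjacency powers (walks, not the paper's simple paths).
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # toy molecular graph
X = np.eye(4)                               # one-hot node features


def high_order_aggregate(A, X, max_order=3):
    """Concatenate messages aggregated from each order of connection."""
    messages, Ak = [], np.eye(A.shape[0])
    for _ in range(max_order):
        Ak = Ak @ A                          # k-th adjacency power
        deg = Ak.sum(axis=1, keepdims=True)
        messages.append((Ak / np.maximum(deg, 1)) @ X)   # normalised aggregation
    return np.concatenate(messages, axis=1)


H = high_order_aggregate(A, X)
print(H.shape)   # (num_nodes, max_order * feature_dim) -> (4, 12)
```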
Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Title | Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks |
Authors | Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi |
Abstract | Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual and audio information leads to better results than using a single source of information. From a visual point of view, a human emotion can be recognized by analyzing the facial expression of the person. More precisely, the human emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system that is able to recognize emotions with a high accuracy rate and in real time, based on deep Convolutional Neural Networks. In order to increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data. |
Tasks | Emotion Recognition |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00351v1 |
https://arxiv.org/pdf/2003.00351v1.pdf | |
PWC | https://paperswithcode.com/paper/emotion-recognition-system-from-speech-and |
Repo | |
Framework | |
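An illustrative architecture sketch for the fusion idea described above (not the authors' exact network): two small CNN branches, one for the face image and one for the speech spectrogram, whose features are fused by concatenation before the emotion classifier. All input shapes and layer sizes are assumptions.

```python
# Sketch: audio-visual late fusion with two CNN branches. Sizes are assumptions.
import torch
import torch.nn as nn


class AudioVisualEmotionNet(nn.Module):
    def __init__(self, num_emotions=7):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
            )
        self.visual = branch()   # input: grayscale face crop
        self.audio = branch()    # input: log-mel spectrogram
        self.classifier = nn.Linear(128, num_emotions)

    def forward(self, face, spectrogram):
        fused = torch.cat([self.visual(face), self.audio(spectrogram)], dim=1)
        return self.classifier(fused)


model = AudioVisualEmotionNet()
logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 128))
print(logits.shape)  # (2, 7)
```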
On Biased Compression for Distributed Learning
Title | On Biased Compression for Distributed Learning |
Authors | Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan |
Abstract | In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. Our distributed SGD method enjoys the ergodic rate $\mathcal{O}\left(\frac{\delta L \exp(-K)}{\mu} + \frac{(C + D)}{K\mu}\right)$, where $\delta$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose a new highly performing biased compressor, a combination of Top-$k$ and natural dithering, which in our experiments outperforms all other compression techniques. |
Tasks | |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12410v1 |
https://arxiv.org/pdf/2002.12410v1.pdf | |
PWC | https://paperswithcode.com/paper/on-biased-compression-for-distributed |
Repo | |
Framework | |
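A minimal sketch of the Top-$k$ operator named in the abstract above: a biased compressor that keeps only the $k$ largest-magnitude coordinates of a gradient and zeroes out the rest. The natural-dithering component it is combined with is not reproduced here.

```python
# Sketch of the biased Top-k compression operator.
import numpy as np


def top_k(g, k):
    """Keep the k entries of g with largest absolute value; zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out


g = np.array([0.1, -3.0, 0.05, 2.0, -0.2])
print(top_k(g, k=2))   # only -3.0 and 2.0 survive
# Note the bias: E[top_k(g)] != g, unlike unbiased random sparsifiers.
```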
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models
Title | Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models |
Authors | Alessandro Berlati, Oliver Scheel, Luigi Di Stefano, Federico Tombari |
Abstract | Ambiguity is inherently present in many machine learning tasks, but it is seldom accounted for, especially in sequential models, as most output only a single prediction. In this work we propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data, which is of special importance, as often multiple futures are equally likely. Our approach can be applied to the most common recurrent architectures and can be used with any loss function. Additionally, we introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties and coincides with our intuitive understanding of correctness in the presence of multiple labels. We test our method on several experiments and across diverse tasks dealing with time series data, such as trajectory forecasting and maneuver prediction, achieving promising results. |
Tasks | Time Series |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.10381v1 |
https://arxiv.org/pdf/2003.10381v1.pdf | |
PWC | https://paperswithcode.com/paper/ambiguity-in-sequential-data-predicting |
Repo | |
Framework | |
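A minimal sketch of the core Multiple Hypothesis Prediction idea the abstract builds on: the model emits M candidate futures and only the best-matching hypothesis receives the gradient (winner takes all). The relaxed weighting used in practice and the paper's ambiguity metric are not reproduced.

```python
# Sketch of a winner-takes-all loss over M candidate sequence predictions.
import torch


def winner_takes_all_loss(hypotheses, target):
    """hypotheses: (B, M, T, D) candidate futures; target: (B, T, D) observed future."""
    per_hyp = ((hypotheses - target.unsqueeze(1)) ** 2).mean(dim=(2, 3))  # (B, M)
    best, _ = per_hyp.min(dim=1)       # pick the closest hypothesis per sample
    return best.mean()


hyps = torch.randn(8, 3, 10, 2, requires_grad=True)   # e.g. 3 candidate trajectories
future = torch.randn(8, 10, 2)
loss = winner_takes_all_loss(hyps, future)
loss.backward()
print(loss.item())
```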
Non-Parametric Learning of Gaifman Models
Title | Non-Parametric Learning of Gaifman Models |
Authors | Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli, Sriraam Natarajan |
Abstract | We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning. |
Tasks | |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00528v2 |
https://arxiv.org/pdf/2001.00528v2.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-learning-of-gaifman-models |
Repo | |
Framework | |
Progressive Growing of Neural ODEs
Title | Progressive Growing of Neural ODEs |
Authors | Hammad A. Ayyubi, Yi Yao, Ajay Divakaran |
Abstract | Neural Ordinary Differential Equations (NODEs) have proven to be a powerful modeling tool for approximating (interpolation) and forecasting (extrapolation) irregularly sampled time series data. However, their performance degrades substantially when applied to real-world data, especially long-term data with complex behaviors (e.g., long-term trend across years, mid-term seasonality across months, and short-term local variation across days). To address the modeling of such complex data with different behaviors at different frequencies (time spans), we propose a novel progressive learning paradigm of NODEs for long-term time series forecasting. Specifically, following the principle of curriculum learning, we gradually increase the complexity of data and network capacity as training progresses. Our experiments with both synthetic data and real traffic data (PeMS Bay Area traffic data) show that our training methodology consistently improves the performance of vanilla NODEs by over 64%. |
Tasks | Time Series, Time Series Forecasting |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03695v1 |
https://arxiv.org/pdf/2003.03695v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-growing-of-neural-odes |
Repo | |
Framework | |
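A rough sketch of the curriculum idea described in the abstract above: train in stages on progressively longer, more complex windows of the series. A plain GRU forecaster stands in for a Neural ODE, and the staging schedule is an assumption, not the paper's exact recipe.

```python
# Sketch: progressive (curriculum) training on increasingly long windows of a series.
import torch
import torch.nn as nn

series = torch.sin(torch.linspace(0, 20, 400)).unsqueeze(-1)   # toy time series


class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (B, T, 1) -> one-step-ahead predictions
        out, _ = self.rnn(x)
        return self.head(out)


model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for horizon in (50, 150, 400):            # progressively longer training windows
    x = series[:horizon - 1].unsqueeze(0)
    y = series[1:horizon].unsqueeze(0)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"horizon {horizon}: loss {loss.item():.4f}")
```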
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Title | Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks |
Authors | Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu |
Abstract | Deep neural networks often consist of a great number of trainable parameters for extracting powerful features from given datasets. On the one hand, massive trainable parameters significantly enhance the performance of these deep networks. On the other hand, they bring the problem of over-fitting. To this end, dropout-based methods disable some elements in the output feature maps during the training phase to reduce the co-adaptation of neurons. Although the generalization ability of the resulting models can be enhanced by these approaches, the conventional binary dropout is not the optimal solution. Therefore, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks and propose a feature distortion method (Disout) for addressing the aforementioned problem. In the training period, randomly selected elements in the feature maps are replaced with specific values by exploiting the generalization error bound. The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated on several benchmark image datasets. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11022v1 |
https://arxiv.org/pdf/2002.11022v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-dropout-feature-map-distortion-to |
Repo | |
Framework | |
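A sketch of the mechanism described in the abstract above: during training, randomly selected elements of a feature map are replaced with specific values instead of being zeroed out as in dropout. The replacement values used here (uniform noise scaled by `alpha`) are an assumption for illustration; the paper derives them from a generalization bound.

```python
# Sketch of a feature-distortion layer; replacement values are an assumption.
import torch
import torch.nn as nn


class FeatureDistortion(nn.Module):
    def __init__(self, prob=0.1, alpha=1.0):
        super().__init__()
        self.prob, self.alpha = prob, alpha

    def forward(self, x):
        if not self.training:
            return x
        mask = torch.rand_like(x) < self.prob                  # elements to distort
        distortion = self.alpha * (torch.rand_like(x) - 0.5)   # replacement values
        return torch.where(mask, distortion, x)


layer = FeatureDistortion(prob=0.2)
layer.train()
print(layer(torch.randn(1, 8, 4, 4)).shape)
```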
A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection
Title | A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection |
Authors | Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju |
Abstract | Chinese keyword spotting is a challenging task, as there is no visual blank between Chinese words. Unlike English words, which are split naturally by visual blanks, Chinese words are generally split only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, which is inspired by Mask R-CNN. We propose to predict the keyword masks guided by text line detection. First, proposals of text lines are generated by Faster R-CNN; then, text line masks and keyword masks are predicted by segmentation within the proposals. In this way, the text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method. |
Tasks | Keyword Spotting |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.00722v2 |
https://arxiv.org/pdf/2001.00722v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-oriented-chinese-keyword-spotter |
Repo | |
Framework | |
Sparse Covariance Estimation in Logit Mixture Models
Title | Sparse Covariance Estimation in Logit Mixture Models |
Authors | Youssef M Aboutaleb, Mazen Danaf, Yifei Xie, Moshe Ben-Akiva |
Abstract | This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models. Researchers typically specify covariance matrices in logit mixture models under one of two extreme assumptions: either an unrestricted full covariance matrix (allowing correlations between all random coefficients), or a restricted diagonal matrix (allowing no correlations at all). Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances. We propose a new estimator, called MISC, that uses a mixed-integer optimization (MIO) program to find an optimal block diagonal structure specification for the covariance matrix, corresponding to subsets of correlated coefficients, for any desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws from the unrestricted full covariance matrix. The optimal sparsity level of the covariance matrix is determined using out-of-sample validation. We demonstrate the ability of MISC to correctly recover the true covariance structure from synthetic data. In an empirical illustration using a stated preference survey on modes of transportation, we use MISC to obtain a sparse covariance matrix indicating how preferences for attributes are related to one another. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05034v1 |
https://arxiv.org/pdf/2001.05034v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-covariance-estimation-in-logit-mixture |
Repo | |
Framework | |
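A small sketch of the core object described in the abstract above: restricting a full covariance matrix of random coefficients to a block-diagonal structure induced by a candidate grouping of the coefficients. The MIO search over groupings and the MCMC estimation of the unrestricted matrix are not reproduced; the matrix and groups below are toy values.

```python
# Sketch: apply a block-diagonal restriction to a covariance matrix of coefficients.
import numpy as np


def block_diagonal_restriction(cov, groups):
    """Zero out covariances between coefficients that are not in the same group."""
    mask = np.zeros_like(cov, dtype=bool)
    for g in groups:
        mask[np.ix_(g, g)] = True
    return np.where(mask, cov, 0.0)


full_cov = np.array([[1.0, 0.6, 0.1, 0.0],
                     [0.6, 1.5, 0.0, 0.2],
                     [0.1, 0.0, 0.8, 0.5],
                     [0.0, 0.2, 0.5, 1.2]])
sparse_cov = block_diagonal_restriction(full_cov, groups=[[0, 1], [2, 3]])
print(sparse_cov)
```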