Paper Group ANR 246
RetinaTrack: Online Single Stage Joint Detection and Tracking. Loss-annealed GAIL for sample efficient and stable Imitation Learning. Large-scale benchmark study of survival prediction methods using multi-omics data. Learning and Testing Variable Partitions. Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff. Sequential Transfer …
RetinaTrack: Online Single Stage Joint Detection and Tracking
Title | RetinaTrack: Online Single Stage Joint Detection and Tracking |
Authors | Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang |
Abstract | Traditionally, multi-object tracking and object detection are performed by separate systems, with most prior work focusing exclusively on one of these aspects over the other. Tracking systems clearly benefit from having access to accurate detections; conversely, there is ample evidence in the literature that detectors can benefit from tracking, which, for example, can help to smooth predictions over time. In this paper we focus on the tracking-by-detection paradigm for autonomous driving, where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach such that it is amenable to instance-level embedding training. We show, via evaluations on the Waymo Open Dataset, that we outperform a recent state of the art tracking algorithm while requiring significantly less computation. We believe that our simple yet effective approach can serve as a strong baseline for future work in this area. |
Tasks | Autonomous Driving, Multi-Object Tracking, Object Detection, Object Tracking |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13870v1 |
https://arxiv.org/pdf/2003.13870v1.pdf | |
PWC | https://paperswithcode.com/paper/retinatrack-online-single-stage-joint |
Repo | |
Framework | |
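A minimal sketch of the idea described in the RetinaTrack abstract above: a single-stage detection head extended with a per-anchor instance-embedding branch, so that class scores, boxes, and association embeddings are produced jointly. All layer sizes are assumptions for illustration; this is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a RetinaNet-style head extended with an
# instance-embedding branch, so each anchor yields class scores, box offsets and an
# embedding vector for data association. Shapes and layer sizes are assumptions.
import torch
import torch.nn as nn


class JointDetectionTrackingHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=9, num_classes=3, embed_dim=64):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)
        # Extra branch: one embedding per anchor, trained with a metric/triplet-style loss.
        self.emb = nn.Conv2d(in_channels, num_anchors * embed_dim, 3, padding=1)
        self.num_anchors, self.embed_dim = num_anchors, embed_dim

    def forward(self, feat):
        b, _, h, w = feat.shape
        cls = self.cls(feat)                       # (B, A*C, H, W) class logits
        box = self.box(feat)                       # (B, A*4, H, W) box regression
        emb = self.emb(feat).view(b, self.num_anchors, self.embed_dim, h, w)
        emb = nn.functional.normalize(emb, dim=2)  # unit-norm embedding per anchor
        return cls, box, emb


head = JointDetectionTrackingHead()
cls, box, emb = head(torch.randn(1, 256, 32, 32))
print(cls.shape, box.shape, emb.shape)
```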
Loss-annealed GAIL for sample efficient and stable Imitation Learning
Title | Loss-annealed GAIL for sample efficient and stable Imitation Learning |
Authors | Rohit Jena, Katia Sycara |
Abstract | Imitation learning is the problem of learning a policy from an expert policy without access to a reward signal. Often, the expert policy is only available in the form of expert demonstrations. Behavior cloning and GAIL are two popular methods for performing imitation learning in this setting. Behavior cloning converges in a few training iterations but does not reach peak performance and suffers from compounding errors due to its supervised training framework and i.i.d. assumption. GAIL attempts to tackle this problem by accounting for the temporal dependencies between states while matching occupancy measures of the expert and the policy. Although GAIL has shown success in a number of environments, it requires a large number of environment interactions. Given their complementary benefits, prior work has suggested or attempted combining the two methods, without much success. We examine some of the limitations of existing ideas that try to combine BC and GAIL, and present an algorithm that combines the best of both worlds to enable faster and more stable training without compromising performance. Our algorithm is embarrassingly simple to implement and integrates seamlessly with different policy gradient algorithms. We demonstrate the effectiveness of the algorithm both in low-dimensional control tasks in a limited data setting, and in high-dimensional grid world environments. |
Tasks | Imitation Learning |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07798v1 |
https://arxiv.org/pdf/2001.07798v1.pdf | |
PWC | https://paperswithcode.com/paper/loss-annealed-gail-for-sample-efficient-and |
Repo | |
Framework | |
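A rough sketch of the high-level idea in the abstract above: anneal between a behavior-cloning loss and an adversarial (GAIL-style) policy loss over training. The linear schedule and the placeholder losses are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: anneal between a BC loss and a GAIL-style policy loss. The linear schedule
# and the surrogate losses below are assumptions, not the paper's exact method.
import torch


def annealing_weight(step, total_steps):
    """Weight on the BC loss: starts at 1.0, decays linearly to 0.0."""
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(bc_loss, gail_policy_loss, step, total_steps):
    lam = annealing_weight(step, total_steps)
    return lam * bc_loss + (1.0 - lam) * gail_policy_loss


# Toy usage with placeholder scalar losses.
for step in range(0, 1001, 250):
    bc = torch.tensor(0.5)    # stand-in for -log pi(a_expert | s_expert)
    gail = torch.tensor(1.2)  # stand-in for the adversarial policy objective
    print(step, combined_loss(bc, gail, step, total_steps=1000).item())
```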
Large-scale benchmark study of survival prediction methods using multi-omics data
Title | Large-scale benchmark study of survival prediction methods using multi-omics data |
Authors | Moritz Herrmann, Philipp Probst, Roman Hornung, Vindi Jurinovic, Anne-Laure Boulesteix |
Abstract | Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables (often in addition to classical clinical variables), are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions by means of a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets from the database “The Cancer Genome Atlas”, containing from 35 to 1,000 observations and from 60,000 to 100,000 variables. The considered outcome was the (censored) survival time. Twelve methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results show that, although multi-omics data can improve the prediction performance, this is not generally the case. Only the method block forest slightly outperformed the Cox model on average over all datasets. Taking into account the multi-omics structure improves the predictive performance and protects variables in low-dimensional groups, especially clinical variables, from being excluded from the model. All analyses are reproducible using freely available R code. |
Tasks | |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.03621v1 |
https://arxiv.org/pdf/2003.03621v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-benchmark-study-of-survival |
Repo | |
Framework | |
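A small sketch of the evaluation protocol described in the abstract above: repeated 5-fold cross-validation of a survival model scored with a concordance index. For simplicity it uses a plain Cox model on synthetic data and Harrell's C from lifelines as a stand-in for Uno's C-index; the benchmarked multi-omics learners and the integrated Brier score are not reproduced.

```python
# Sketch of the benchmark protocol: repeated 5-fold CV with a concordance index.
# Synthetic toy data and Harrell's C stand in for the real datasets and Uno's C.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "marker": rng.normal(0, 1, n),
})
df["time"] = rng.exponential(scale=np.exp(-0.02 * df["age"] - 0.5 * df["marker"]) * 50)
df["event"] = rng.integers(0, 2, n)  # toy censoring indicator

scores = []
for train_idx, test_idx in RepeatedKFold(n_splits=5, n_repeats=2, random_state=0).split(df):
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    cph = CoxPHFitter().fit(train, duration_col="time", event_col="event")
    risk = cph.predict_partial_hazard(test)
    # Higher risk should correspond to shorter survival, hence the minus sign.
    scores.append(concordance_index(test["time"], -risk, test["event"]))

print("mean C-index:", np.mean(scores))
```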
Learning and Testing Variable Partitions
Title | Learning and Testing Variable Partitions |
Authors | Andrej Bogdanov, Baoxiang Wang |
Abstract | Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard. 2. When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. 3. When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12990v1 |
https://arxiv.org/pdf/2003.12990v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-testing-variable-partitions |
Repo | |
Framework | |
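A hedged sketch related to the abstract above (not the paper's tester): a candidate 2-partition $(\mathbf{X}_A, \mathbf{X}_B)$ makes $F$ exactly additive iff $F(x) + F(y) = F(x_A, y_B) + F(y_A, x_B)$ for all inputs $x, y$, so a sampled violation rate gives a rough empirical proxy for the partition cost under the Hamming metric.

```python
# Sketch (assumption, not the paper's algorithm): estimate how far F is from being
# additive across a candidate 2-partition (A, B) by sampling the identity
#   F(x) + F(y) == F(x_A, y_B) + F(y_A, x_B).
import random


def violation_rate(F, n, A, trials=2000, alphabet=(0, 1)):
    B = [i for i in range(n) if i not in set(A)]
    bad = 0
    for _ in range(trials):
        x = [random.choice(alphabet) for _ in range(n)]
        y = [random.choice(alphabet) for _ in range(n)]
        xy, yx = list(x), list(y)
        for i in B:          # xy takes x on A and y on B; yx takes y on A and x on B
            xy[i] = y[i]
            yx[i] = x[i]
        if F(x) + F(y) != F(xy) + F(yx):
            bad += 1
    return bad / trials


# Example: F is additive across A = {0, 1} and B = {2, 3}, so the rate is 0.
F = lambda v: (v[0] ^ v[1]) + 3 * (v[2] + v[3])
print(violation_rate(F, n=4, A=[0, 1]))
```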
Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff
Title | Rényi Entropy Bounds on the Active Learning Cost-Performance Tradeoff |
Authors | Vahid Jamali, Antonia Tulino, Jaime Llorca, Elza Erkip |
Abstract | Semi-supervised classification, one of the most prominent fields in machine learning, studies how to combine the statistical knowledge of the often abundant unlabeled data with the often limited labeled data in order to maximize overall classification accuracy. In this context, the process of actively choosing the data to be labeled is referred to as active learning. In this paper, we initiate the non-asymptotic analysis of the optimal policy for semi-supervised classification with actively obtained labeled data. Considering a general Bayesian classification model, we provide the first characterization of the jointly optimal active learning and semi-supervised classification policy, in terms of the cost-performance tradeoff driven by the label query budget (number of data items to be labeled) and overall classification accuracy. Leveraging recent results on the Rényi entropy, we derive tight information-theoretic bounds on such active learning cost-performance tradeoff. |
Tasks | Active Learning |
Published | 2020-02-05 |
URL | https://arxiv.org/abs/2002.02025v1 |
https://arxiv.org/pdf/2002.02025v1.pdf | |
PWC | https://paperswithcode.com/paper/renyi-entropy-bounds-on-the-active-learning |
Repo | |
Framework | |
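For reference only (this is the standard definition, not a result of the paper): the Rényi entropy of order $\alpha$ that the bounds are stated in terms of is

```latex
% Reference definition: Renyi entropy of order alpha > 0, alpha != 1, for a discrete
% distribution p = (p_1, ..., p_m); it recovers the Shannon entropy as alpha -> 1.
\[
  H_\alpha(p) \;=\; \frac{1}{1-\alpha} \log \sum_{i=1}^{m} p_i^{\alpha},
  \qquad
  \lim_{\alpha \to 1} H_\alpha(p) \;=\; -\sum_{i=1}^{m} p_i \log p_i .
\]
```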
Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability
Title | Sequential Transfer Machine Learning in Networks: Measuring the Impact of Data and Neural Net Similarity on Transferability |
Authors | Robin Hirt, Akash Srivastava, Carlos Berg, Niklas Kühl |
Abstract | In networks of independent entities that face similar predictive tasks, transfer machine learning makes it possible to re-use and improve neural nets using distributed data sets without exposing the raw data. As the number of data sets in business networks grows and not every neural net transfer is successful, indicators are needed for its impact on the target performance, i.e., its transferability. We perform an empirical study on a unique real-world use case comprising sales data from six different restaurants. We train and transfer neural nets across these restaurant sales data and measure their transferability. Moreover, we calculate potential indicators for transferability based on divergences of data, data projections and a novel metric for neural net similarity. We obtain significant negative correlations between the transferability and the tested indicators. Our findings make it possible to choose the transfer path based on these indicators, which improves model performance while requiring fewer model transfers. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13070v1 |
https://arxiv.org/pdf/2003.13070v1.pdf | |
PWC | https://paperswithcode.com/paper/sequential-transfer-machine-learning-in |
Repo | |
Framework | |
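An illustrative sketch of one of the indicator computations the abstract mentions: a divergence between two data sets' target distributions, correlated against observed transfer gains. The histogram-KL estimator and the toy numbers below are assumptions, not the paper's exact metrics (which also include data projections and a neural-net similarity measure).

```python
# Sketch (illustrative only): correlate a simple data divergence with transfer gains.
import numpy as np
from scipy.stats import entropy, pearsonr


def histogram_kl(a, b, bins=20):
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    eps = 1e-9
    return entropy(p + eps, q + eps)  # KL(p || q)


rng = np.random.default_rng(1)
source_sales = [rng.normal(mu, 1.0, 500) for mu in (0.0, 0.5, 2.0, 3.0)]
target_sales = rng.normal(0.0, 1.0, 500)

divergences = [histogram_kl(s, target_sales) for s in source_sales]
transfer_gain = [0.30, 0.22, 0.10, 0.05]  # hypothetical improvement from each transfer

r, p = pearsonr(divergences, transfer_gain)
print("Pearson r between divergence and transfer gain:", round(r, 3))
```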
Neural Message Passing on High Order Paths
Title | Neural Message Passing on High Order Paths |
Authors | Daniel Flam-Shepherd, Tony Wu, Pascal Friederich, Alan Aspuru-Guzik |
Abstract | Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first-order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher-order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher-order paths. This allows information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction. |
Tasks | Molecular Property Prediction |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10413v1 |
https://arxiv.org/pdf/2002.10413v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-message-passing-on-high-order-paths |
Repo | |
Framework | |
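A simplified sketch of the idea in the abstract above: aggregate node features over higher-order neighbourhoods. Adjacency powers count walks rather than the simple paths used by the paper, so this is only an illustration of passing messages beyond first-order neighbours.

```python
# Sketch (simplified): aggregate node features from 1-hop, 2-hop, ..., k-hop
# connections via adjacency powers (walks, not the paper's simple paths).
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # toy molecular graph
X = np.eye(4)                               # one-hot node features


def high_order_aggregate(A, X, max_order=3):
    """Concatenate messages aggregated from each order of connection."""
    messages, Ak = [], np.eye(A.shape[0])
    for _ in range(max_order):
        Ak = Ak @ A                          # k-th adjacency power
        deg = Ak.sum(axis=1, keepdims=True)
        messages.append((Ak / np.maximum(deg, 1)) @ X)   # normalised aggregation
    return np.concatenate(messages, axis=1)


H = high_order_aggregate(A, X)
print(H.shape)   # (num_nodes, max_order * feature_dim) -> (4, 12)
```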
Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Title | Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks |
Authors | Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi |
Abstract | Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual and audio information leads to better results than using a single source of information. From a visual point of view, a human emotion can be recognized by analyzing the facial expression of the person. More precisely, the human emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system that is able to recognize emotions with a high accuracy rate and in real time, based on deep Convolutional Neural Networks. In order to increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data. |
Tasks | Emotion Recognition |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00351v1 |
https://arxiv.org/pdf/2003.00351v1.pdf | |
PWC | https://paperswithcode.com/paper/emotion-recognition-system-from-speech-and |
Repo | |
Framework | |
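An illustrative architecture sketch for the fusion idea described above (not the authors' exact network): two small CNN branches, one for the face image and one for the speech spectrogram, whose features are fused by concatenation before the emotion classifier. All input shapes and layer sizes are assumptions.

```python
# Sketch: audio-visual late fusion with two CNN branches. Sizes are assumptions.
import torch
import torch.nn as nn


class AudioVisualEmotionNet(nn.Module):
    def __init__(self, num_emotions=7):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
            )
        self.visual = branch()   # input: grayscale face crop
        self.audio = branch()    # input: log-mel spectrogram
        self.classifier = nn.Linear(128, num_emotions)

    def forward(self, face, spectrogram):
        fused = torch.cat([self.visual(face), self.audio(spectrogram)], dim=1)
        return self.classifier(fused)


model = AudioVisualEmotionNet()
logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 128))
print(logits.shape)  # (2, 7)
```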
On Biased Compression for Distributed Learning
Title | On Biased Compression for Distributed Learning |
Authors | Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan |
Abstract | In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. Our distributed SGD method enjoys the ergodic rate $\mathcal{O}\left(\frac{\delta L \exp(-K)}{\mu} + \frac{(C + D)}{K\mu}\right)$, where $\delta$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose a new highly performing biased compressor, a combination of Top-$k$ and natural dithering, which in our experiments outperforms all other compression techniques. |
Tasks | |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12410v1 |
https://arxiv.org/pdf/2002.12410v1.pdf | |
PWC | https://paperswithcode.com/paper/on-biased-compression-for-distributed |
Repo | |
Framework | |
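A minimal sketch of the Top-$k$ operator named in the abstract above: a biased compressor that keeps only the $k$ largest-magnitude coordinates of a gradient and zeroes out the rest. The natural-dithering component it is combined with is not reproduced here.

```python
# Sketch of the biased Top-k compression operator.
import numpy as np


def top_k(g, k):
    """Keep the k entries of g with largest absolute value; zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out


g = np.array([0.1, -3.0, 0.05, 2.0, -0.2])
print(top_k(g, k=2))   # only -3.0 and 2.0 survive
# Note the bias: E[top_k(g)] != g, unlike unbiased random sparsifiers.
```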
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models
Title | Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models |
Authors | Alessandro Berlati, Oliver Scheel, Luigi Di Stefano, Federico Tombari |
Abstract | Ambiguity is inherently present in many machine learning tasks, but it is seldom accounted for, especially in sequential models, as most output only a single prediction. In this work we propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data, which is of special importance, as often multiple futures are equally likely. Our approach can be applied to the most common recurrent architectures and can be used with any loss function. Additionally, we introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties and coincides with our intuitive understanding of correctness in the presence of multiple labels. We test our method on several experiments and across diverse tasks dealing with time series data, such as trajectory forecasting and maneuver prediction, achieving promising results. |
Tasks | Time Series |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.10381v1 |
https://arxiv.org/pdf/2003.10381v1.pdf | |
PWC | https://paperswithcode.com/paper/ambiguity-in-sequential-data-predicting |
Repo | |
Framework | |
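A minimal sketch of the core Multiple Hypothesis Prediction idea the abstract builds on: the model emits M candidate futures and only the best-matching hypothesis receives the gradient (winner takes all). The relaxed weighting used in practice and the paper's ambiguity metric are not reproduced.

```python
# Sketch of a winner-takes-all loss over M candidate sequence predictions.
import torch


def winner_takes_all_loss(hypotheses, target):
    """hypotheses: (B, M, T, D) candidate futures; target: (B, T, D) observed future."""
    per_hyp = ((hypotheses - target.unsqueeze(1)) ** 2).mean(dim=(2, 3))  # (B, M)
    best, _ = per_hyp.min(dim=1)       # pick the closest hypothesis per sample
    return best.mean()


hyps = torch.randn(8, 3, 10, 2, requires_grad=True)   # e.g. 3 candidate trajectories
future = torch.randn(8, 10, 2)
loss = winner_takes_all_loss(hyps, future)
loss.backward()
print(loss.item())
```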
Non-Parametric Learning of Gaifman Models
Title | Non-Parametric Learning of Gaifman Models |
Authors | Devendra Singh Dhami, Siwen Yan, Gautam Kunapuli, Sriraam Natarajan |
Abstract | We consider the problem of structure learning for Gaifman models and learn relational features that can be used to derive feature representations from a knowledge base. These relational features are first-order rules that are then partially grounded and counted over local neighborhoods of a Gaifman model to obtain the feature representations. We propose a method for learning these relational features for a Gaifman model by using relational tree distances. Our empirical evaluation on real data sets demonstrates the superiority of our approach over classical rule-learning. |
Tasks | |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00528v2 |
https://arxiv.org/pdf/2001.00528v2.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-learning-of-gaifman-models |
Repo | |
Framework | |
Progressive Growing of Neural ODEs
Title | Progressive Growing of Neural ODEs |
Authors | Hammad A. Ayyubi, Yi Yao, Ajay Divakaran |
Abstract | Neural Ordinary Differential Equations (NODEs) have proven to be a powerful modeling tool for approximating (interpolation) and forecasting (extrapolation) irregularly sampled time series data. However, their performance degrades substantially when applied to real-world data, especially long-term data with complex behaviors (e.g., long-term trend across years, mid-term seasonality across months, and short-term local variation across days). To address the modeling of such complex data with different behaviors at different frequencies (time spans), we propose a novel progressive learning paradigm of NODEs for long-term time series forecasting. Specifically, following the principle of curriculum learning, we gradually increase the complexity of data and network capacity as training progresses. Our experiments with both synthetic data and real traffic data (PeMS Bay Area traffic data) show that our training methodology consistently improves the performance of vanilla NODEs by over 64%. |
Tasks | Time Series, Time Series Forecasting |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03695v1 |
https://arxiv.org/pdf/2003.03695v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-growing-of-neural-odes |
Repo | |
Framework | |
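A rough sketch of the curriculum idea described in the abstract above: train in stages on progressively longer, more complex windows of the series. A plain GRU forecaster stands in for a Neural ODE, and the staging schedule is an assumption, not the paper's exact recipe.

```python
# Sketch: progressive (curriculum) training on increasingly long windows of a series.
import torch
import torch.nn as nn

series = torch.sin(torch.linspace(0, 20, 400)).unsqueeze(-1)   # toy time series


class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (B, T, 1) -> one-step-ahead predictions
        out, _ = self.rnn(x)
        return self.head(out)


model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for horizon in (50, 150, 400):            # progressively longer training windows
    x = series[:horizon - 1].unsqueeze(0)
    y = series[1:horizon].unsqueeze(0)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"horizon {horizon}: loss {loss.item():.4f}")
```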
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Title | Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks |
Authors | Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu |
Abstract | Deep neural networks often consist of a great number of trainable parameters for extracting powerful features from given datasets. On the one hand, massive trainable parameters significantly enhance the performance of these deep networks. On the other hand, they bring the problem of over-fitting. To this end, dropout-based methods disable some elements in the output feature maps during the training phase to reduce the co-adaptation of neurons. Although the generalization ability of the resulting models can be enhanced by these approaches, the conventional binary dropout is not the optimal solution. Therefore, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks and propose a feature distortion method (Disout) for addressing the aforementioned problem. In the training period, randomly selected elements in the feature maps are replaced with specific values by exploiting the generalization error bound. The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated on several benchmark image datasets. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11022v1 |
https://arxiv.org/pdf/2002.11022v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-dropout-feature-map-distortion-to |
Repo | |
Framework | |
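A sketch of the mechanism described in the abstract above: during training, randomly selected elements of a feature map are replaced with specific values instead of being zeroed out as in dropout. The replacement values used here (uniform noise scaled by `alpha`) are an assumption for illustration; the paper derives them from a generalization bound.

```python
# Sketch of a feature-distortion layer; replacement values are an assumption.
import torch
import torch.nn as nn


class FeatureDistortion(nn.Module):
    def __init__(self, prob=0.1, alpha=1.0):
        super().__init__()
        self.prob, self.alpha = prob, alpha

    def forward(self, x):
        if not self.training:
            return x
        mask = torch.rand_like(x) < self.prob                  # elements to distort
        distortion = self.alpha * (torch.rand_like(x) - 0.5)   # replacement values
        return torch.where(mask, distortion, x)


layer = FeatureDistortion(prob=0.2)
layer.train()
print(layer(torch.randn(1, 8, 4, 4)).shape)
```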
A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection
Title | A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection |
Authors | Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju |
Abstract | Chinese keyword spotting is a challenging task, as there is no visual blank between Chinese words. Unlike English words, which are split naturally by visual blanks, Chinese words are generally split only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, which is inspired by Mask R-CNN. We propose to predict the keyword masks guided by text line detection. First, proposals of text lines are generated by Faster R-CNN; then, text line masks and keyword masks are predicted by segmentation within the proposals. In this way, the text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method. |
Tasks | Keyword Spotting |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.00722v2 |
https://arxiv.org/pdf/2001.00722v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-oriented-chinese-keyword-spotter |
Repo | |
Framework | |
Sparse Covariance Estimation in Logit Mixture Models
Title | Sparse Covariance Estimation in Logit Mixture Models |
Authors | Youssef M Aboutaleb, Mazen Danaf, Yifei Xie, Moshe Ben-Akiva |
Abstract | This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models. Researchers typically specify covariance matrices in logit mixture models under one of two extreme assumptions: either an unrestricted full covariance matrix (allowing correlations between all random coefficients), or a restricted diagonal matrix (allowing no correlations at all). Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances. We propose a new estimator, called MISC, that uses a mixed-integer optimization (MIO) program to find an optimal block diagonal structure specification for the covariance matrix, corresponding to subsets of correlated coefficients, for any desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws from the unrestricted full covariance matrix. The optimal sparsity level of the covariance matrix is determined using out-of-sample validation. We demonstrate the ability of MISC to correctly recover the true covariance structure from synthetic data. In an empirical illustration using a stated preference survey on modes of transportation, we use MISC to obtain a sparse covariance matrix indicating how preferences for attributes are related to one another. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05034v1 |
https://arxiv.org/pdf/2001.05034v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-covariance-estimation-in-logit-mixture |
Repo | |
Framework | |
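A small sketch of the core object described in the abstract above: restricting a full covariance matrix of random coefficients to a block-diagonal structure induced by a candidate grouping of the coefficients. The MIO search over groupings and the MCMC estimation of the unrestricted matrix are not reproduced; the matrix and groups below are toy values.

```python
# Sketch: apply a block-diagonal restriction to a covariance matrix of coefficients.
import numpy as np


def block_diagonal_restriction(cov, groups):
    """Zero out covariances between coefficients that are not in the same group."""
    mask = np.zeros_like(cov, dtype=bool)
    for g in groups:
        mask[np.ix_(g, g)] = True
    return np.where(mask, cov, 0.0)


full_cov = np.array([[1.0, 0.6, 0.1, 0.0],
                     [0.6, 1.5, 0.0, 0.2],
                     [0.1, 0.0, 0.8, 0.5],
                     [0.0, 0.2, 0.5, 1.2]])
sparse_cov = block_diagonal_restriction(full_cov, groups=[[0, 1], [2, 3]])
print(sparse_cov)
```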