Paper Group ANR 267
Geometric Fusion via Joint Delay Embeddings. Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos. Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning. multi-patch aggregation models for resampling detection. Longitudinal Support Vector Machines for High Dimensional …
Geometric Fusion via Joint Delay Embeddings
Title | Geometric Fusion via Joint Delay Embeddings |
Authors | Elchanan Solomon, Paul Bendich |
Abstract | We introduce geometric and topological methods to develop a new framework for fusing multi-sensor time series. This framework consists of two steps: (1) a joint delay embedding, which reconstructs a high-dimensional state space in which our sensors correspond to observation functions, and (2) a simple orthogonalization scheme, which accounts for tangencies between such observation functions, and produces a more diversified geometry on the embedding space. We conclude with some synthetic and real-world experiments demonstrating that our framework outperforms traditional metric fusion methods. |
Tasks | Time Series |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.11201v1 |
PDF | https://arxiv.org/pdf/2002.11201v1.pdf |
PWC | https://paperswithcode.com/paper/geometric-fusion-via-joint-delay-embeddings |
Repo | |
Framework | |
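The first step named in the abstract above, a joint delay embedding, can be illustrated with a minimal sketch. It assumes the standard Takens-style construction (each sensor is turned into lagged vectors, which are then concatenated across sensors); the function names, `dim`, and `lag` are hypothetical and not taken from the paper. The orthogonalization step is not shown.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int, lag: int) -> List[List[float]]:
    """Return delay vectors [x[t], x[t+lag], ..., x[t+(dim-1)*lag]] for one sensor."""
    span = (dim - 1) * lag
    return [[series[t + k * lag] for k in range(dim)]
            for t in range(len(series) - span)]


def joint_delay_embed(sensors: Sequence[Sequence[float]], dim: int, lag: int) -> List[List[float]]:
    """Concatenate the delay vectors of all sensors at each shared time index."""
    embedded = [delay_embed(s, dim, lag) for s in sensors]
    n_points = min(len(e) for e in embedded)
    return [[value for e in embedded for value in e[t]] for t in range(n_points)]


# Two toy sensors observing the same underlying system.
sensor_a = [0.0, 1.0, 2.0, 3.0, 4.0]
sensor_b = [10.0, 11.0, 12.0, 13.0, 14.0]
points = joint_delay_embed([sensor_a, sensor_b], dim=2, lag=2)
```

Each row of `points` is one point in the reconstructed state space, with `dim` coordinates contributed by every sensor.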
Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos
Title | Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos |
Authors | Daniel Cores, Víctor M. Brea, Manuel Mucientes |
Abstract | This paper addresses the problem of how to exploit spatio-temporal information available in videos to improve the object detection precision. We propose a two-stage object detector called FANet based on short-term spatio-temporal feature aggregation to give a first detection set, and long-term object linking to refine these detections. Firstly, we generate a set of short tubelet proposals containing the object in $N$ consecutive frames. Then, we aggregate RoI pooled deep features through the tubelet using a temporal pooling operator that summarizes the information with a fixed size output independent of the number of input frames. On top of that, we define a double head implementation that we feed with spatio-temporal aggregated information for spatio-temporal object classification, and with spatial information extracted from the current frame for object localization and spatial classification. Furthermore, we also specialize each head branch architecture to better perform in each task taking into account the input data. Finally, a long-term linking method builds long tubes using the previously calculated short tubelets to overcome detection errors. We have evaluated our model on the widely used ImageNet VID dataset, achieving an 80.9% mAP, which is the new state-of-the-art result for single models. Also, on the challenging small object detection dataset USC-GRAD-STDdb, our proposal outperforms the single frame baseline by 5.4% mAP. |
Tasks | Object Classification, Object Detection, Object Localization, Small Object Detection |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00451v1 |
PDF | https://arxiv.org/pdf/2004.00451v1.pdf |
PWC | https://paperswithcode.com/paper/spatio-temporal-tubelet-feature-aggregation |
Repo | |
Framework | |
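The abstract above describes a temporal pooling operator whose output size does not depend on the number of input frames. A minimal sketch of that property follows, assuming an element-wise maximum over frames; the abstract does not say which operator FANet uses, so the choice of max and the name `temporal_max_pool` are illustrative assumptions.

```python
from typing import List, Sequence


def temporal_max_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over per-frame feature vectors (assumed operator)."""
    if not frame_features:
        raise ValueError("need at least one frame")
    # zip(*...) groups the i-th feature of every frame together.
    return [max(column) for column in zip(*frame_features)]


# Tubelets of different lengths yield outputs of the same size.
two_frames = [[0.1, 0.9, 0.3], [0.4, 0.2, 0.5]]
four_frames = two_frames + [[0.7, 0.1, 0.0], [0.2, 0.3, 0.6]]
pooled_two = temporal_max_pool(two_frames)
pooled_four = temporal_max_pool(four_frames)
```

Both results have three entries, one per feature, whether the tubelet spans two frames or four.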
Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning
Title | Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning |
Authors | Shengchao Yan, Jingwei Zhang, Daniel Buescher, Wolfram Burgard |
Abstract | Traffic signal controllers play an essential role in the traffic system, while the current majority of them are not sufficiently flexible or adaptive to make optimal traffic schedules. In this paper we present an approach to learn policies for the signal controllers using deep reinforcement learning. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to find the bound for the proposed equity factor. Moreover, we introduce the adaptive discounting approach that greatly stabilizes learning, which helps to keep high flexibility of green light duration. The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) on a wide range of traffic situations. A video of our experimental results can be found at: https://youtu.be/3rc5-ac3XX0 |
Tasks | |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04046v1 |
PDF | https://arxiv.org/pdf/2003.04046v1.pdf |
PWC | https://paperswithcode.com/paper/efficiency-and-equity-are-both-essential-a |
Repo | |
Framework | |
multi-patch aggregation models for resampling detection
Title | multi-patch aggregation models for resampling detection |
Authors | Mohit Lamba, Kaushik Mitra |
Abstract | Images captured nowadays are of varying dimensions, with smartphones and DSLRs allowing users to choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well for images of varying dimensions. However, in our experiments, we observed that many state-of-the-art forensic algorithms are sensitive to image size and their performance quickly degenerates when operated on images of diverse dimensions, despite re-training them using multiple image sizes. To handle this issue, we propose a novel pooling strategy called ITERATIVE POOLING. This pooling strategy can dynamically adjust input tensors in a discrete manner without much loss of information as in ROI Max-pooling. This pooling strategy can be used with any of the existing deep models and, for demonstration purposes, we show its utility on Resnet-18 for the case of resampling detection, a fundamental operation for any sort of image manipulation. Compared to existing strategies and Max-pooling, it gives up to 7-8% improvement on public datasets. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01364v1 |
PDF | https://arxiv.org/pdf/2003.01364v1.pdf |
PWC | https://paperswithcode.com/paper/multi-patch-aggregation-models-for-resampling |
Repo | |
Framework | |
Longitudinal Support Vector Machines for High Dimensional Time Series
Title | Longitudinal Support Vector Machines for High Dimensional Time Series |
Authors | Kristiaan Pelckmans, Hong-Li Zeng |
Abstract | We consider the problem of learning a classifier from observed functional data. Here, each data-point takes the form of a single time-series and contains numerous features. Assuming that each such series comes with a binary label, the problem of learning to predict the label of a new incoming time-series is considered. Hereto, the notion of margin underlying the classical support vector machine is extended to the continuous version for such data. The longitudinal support vector machine is also a convex optimization problem and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this innovative algorithm for analyzing such long-term multivariate data. |
Tasks | Time Series |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09763v1 |
PDF | https://arxiv.org/pdf/2002.09763v1.pdf |
PWC | https://paperswithcode.com/paper/longitudinal-support-vector-machines-for-high |
Repo | |
Framework | |
Learning to Cluster Faces via Confidence and Connectivity Estimation
Title | Learning to Cluster Faces via Confidence and Connectivity Estimation |
Authors | Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin |
Abstract | Face clustering is an essential tool for exploiting the unlabeled face data, and has a wide range of applications including face annotation and retrieval. Recent works show that supervised clustering can result in noticeable performance gain. However, they usually involve heuristic steps and require numerous overlapped subgraphs, severely restricting their accuracy and efficiency. In this paper, we propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Instead, we transform the clustering problem into two sub-problems. Specifically, two graph convolutional networks, named GCN-V and GCN-E, are designed to estimate the confidence of vertices and the connectivity of edges, respectively. With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters. Experiments on two large-scale benchmarks show that our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods. |
Tasks | Connectivity Estimation |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00445v1 |
PDF | https://arxiv.org/pdf/2004.00445v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-cluster-faces-via-confidence-and |
Repo | |
Framework | |
Handling Concept Drifts in Regression Problems – the Error Intersection Approach
Title | Handling Concept Drifts in Regression Problems – the Error Intersection Approach |
Authors | Lucas Baier, Marcel Hofmann, Niklas Kühl, Marisa Mohr, Gerhard Satzger |
Abstract | Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time, a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which depicts a strategy to switch between the application of simple and complex machine learning models for regression tasks. We assume that the approach plays out the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. the weather phenomena of blizzards, resulting in a sudden decrease of taxi demand. We are able to show that our suggested approach outperforms all regarded baselines significantly. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00438v1 |
PDF | https://arxiv.org/pdf/2004.00438v1.pdf |
PWC | https://paperswithcode.com/paper/handling-concept-drifts-in-regression |
Repo | |
Framework | |
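The switching strategy described in the abstract above (fall back to a simple model during a drift, return to the complex model otherwise) can be sketched as follows. The switching rule here is an assumption suggested by the paper's title: compare the recent mean absolute errors of the two models and use whichever is currently lower. The function names and the `window` parameter are hypothetical, and the numbers are made up for illustration.

```python
from typing import Sequence


def mean_abs_error(preds: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute error over paired predictions and observations."""
    if not actuals:
        raise ValueError("need at least one observation")
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)


def choose_model(simple_preds: Sequence[float],
                 complex_preds: Sequence[float],
                 actuals: Sequence[float],
                 window: int = 3) -> str:
    """Pick the model whose error over the last `window` steps is lower (assumed rule)."""
    recent = actuals[-window:]
    simple_err = mean_abs_error(simple_preds[-window:], recent)
    complex_err = mean_abs_error(complex_preds[-window:], recent)
    return "simple" if simple_err < complex_err else "complex"


# Typical demand: the complex model tracks the data closely.
typical = choose_model([90, 95, 100], [101, 109, 106], [100, 110, 105])
# Sudden drop in demand (a drift): the complex model overshoots badly.
drift = choose_model([30, 20, 12], [100, 105, 100], [20, 15, 10])
```

In this toy run `typical` selects the complex model and `drift` selects the simple one.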
Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics
Title | Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics |
Authors | Wenjie Zheng |
Abstract | Traditional methods to infer compartmental epidemic models with time-varying dynamics can only capture continuous changes in the dynamic. However, many changes are discontinuous due to sudden interventions, such as city lockdown and opening of field hospitals. To model the discontinuities, this study introduces the tool of total variation regularization, which regulates the temporal changes of the dynamic parameters, such as the transmission rate. To recover the ground truth dynamic, this study designs a novel yet straightforward optimization algorithm, dubbed iterative Nelder-Mead, which repeatedly applies the Nelder-Mead algorithm. Experiments on the simulated data show that the proposed approach can qualitatively reproduce the discontinuities of the underlying dynamics. To extend this research to real data as well as to help researchers worldwide to fight against COVID-19, the author releases his research platform as an open-source package. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00412v1 |
PDF | https://arxiv.org/pdf/2004.00412v1.pdf |
PWC | https://paperswithcode.com/paper/total-variation-regularization-for |
Repo | |
Framework | |
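To see why total variation regularization suits discontinuous dynamics, the sketch below compares the total variation of a parameter sequence with a quadratic smoothness penalty. It uses the standard discrete definition of total variation (sum of absolute successive differences); the quadratic penalty, the function names, and the toy transmission rates are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Sequence


def total_variation(values: Sequence[float]) -> float:
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def squared_variation(values: Sequence[float]) -> float:
    """Sum of squared differences, a common smoothness penalty shown for contrast."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


# A transmission rate that drops abruptly (e.g. at a lockdown) ...
step = [0.5, 0.5, 0.5, 0.25, 0.25]
# ... versus one that declines gradually between the same endpoints.
ramp = [0.5, 0.4375, 0.375, 0.3125, 0.25]

tv_step, tv_ramp = total_variation(step), total_variation(ramp)      # 0.25 and 0.25
sq_step, sq_ramp = squared_variation(step), squared_variation(ramp)  # 0.0625 and 0.015625
```

Total variation charges the abrupt drop and the gradual decline equally, while the quadratic penalty charges the jump four times as much. A fit regularized by total variation is therefore not pushed to smear a sudden change over several time steps.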
Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data
Title | Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data |
Authors | Heeyoung Kwak, Minwoo Lee, Seunghyun Yoon, Jooyoung Chang, Sangmin Park, Kyomin Jung |
Abstract | Adverse Drug Reaction (ADR) is a significant public health concern world-wide. Numerous graph-based methods have been applied to biomedical graphs for predicting ADRs in pre-marketing phases. ADR detection in post-market surveillance is no less important than pre-marketing assessment, and ADR detection with large-scale clinical data has attracted much attention in recent years. However, there are not many studies considering graph structures from clinical data for detecting an ADR signal, which is a pair of a prescription and a diagnosis that might be a potential ADR. In this study, we develop a novel graph-based framework for ADR signal detection using healthcare claims data. We construct a Drug-disease graph with nodes representing the medical codes. The edges are given as the relationships between two codes, computed using the data. We apply Graph Neural Network to predict ADR signals, using labels from the Side Effect Resource database. The model shows improved AUROC and AUPRC performance of 0.795 and 0.775, compared to other algorithms, showing that it successfully learns node representations expressive of those relationships. Furthermore, our model predicts ADR pairs that do not exist in the established ADR database, showing its capability to supplement the ADR database. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00407v1 |
PDF | https://arxiv.org/pdf/2004.00407v1.pdf |
PWC | https://paperswithcode.com/paper/drug-disease-graph-predicting-adverse-drug |
Repo | |
Framework | |
Image Demoireing with Learnable Bandpass Filters
Title | Image Demoireing with Learnable Bandpass Filters |
Authors | Bolun Zheng, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis |
Abstract | Image demoireing is a multi-faceted image restoration task involving both texture and color restoration. In this paper, we propose a novel multiscale bandpass convolutional neural network (MBCNN) to address this problem. As an end-to-end solution, MBCNN respectively solves the two sub-problems. For texture restoration, we propose a learnable bandpass filter (LBF) to learn the frequency prior for moire texture removal. For color restoration, we propose a two-step tone mapping strategy, which first applies a global tone mapping to correct for a global color shift, and then performs local fine tuning of the color per pixel. Through an ablation study, we demonstrate the effectiveness of the different components of MBCNN. Experimental results on two public datasets show that our method outperforms state-of-the-art methods by a large margin (more than 2dB in terms of PSNR). |
Tasks | Image Restoration |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00406v1 |
PDF | https://arxiv.org/pdf/2004.00406v1.pdf |
PWC | https://paperswithcode.com/paper/image-demoireing-with-learnable-bandpass |
Repo | |
Framework | |
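The margin reported in the abstract above is stated in PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE), applied to flat lists of pixel values. The function names and sample pixels are illustrative only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


score = psnr([0, 128, 255, 64], [0, 130, 250, 64])
ratio = mse_ratio_for_gain(2.0)
```

Because PSNR is logarithmic in the mean squared error, a gain of 2 dB corresponds to the MSE falling by a factor of roughly 1.58.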
The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes
Title | The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes |
Authors | Edward Verenich, Alvaro Velasquez, M. G. Sarwar Murshed, Faraz Hussain |
Abstract | The use of transfer learning with deep neural networks has increasingly become widespread for deploying well-tested computer vision systems to newer domains, especially those with limited datasets. We describe a transfer learning use case for a domain with a data-starved regime, having fewer than 100 labeled target samples. We evaluate the effectiveness of convolutional feature extraction and fine-tuning of overparameterized models with respect to the size of target training data, as well as their generalization performance on data with covariate shift, or out-of-distribution (OOD) data. Our experiments show that both overparameterization and feature reuse contribute to successful application of transfer learning in training image classifiers in data-starved regimes. |
Tasks | Transfer Learning |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.04117v1 |
PDF | https://arxiv.org/pdf/2003.04117v1.pdf |
PWC | https://paperswithcode.com/paper/the-utility-of-feature-reuse-transfer |
Repo | |
Framework | |
A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection
Title | A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection |
Authors | Songyan Xue, Yi Ma, Na Yi, Terence E. Dodgson |
Abstract | In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key to the ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit antennas) along the feed-forward propagation, and hence the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied on the entire network as a whole, but rather at the modular level. For this reason, MNNet has the advantage of much lower learning complexity than other deep-learning based MIMO detection approaches. |
Tasks | Quantization |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00404v1 |
PDF | https://arxiv.org/pdf/2004.00404v1.pdf |
PWC | https://paperswithcode.com/paper/a-modular-neural-network-based-deep-learning |
Repo | |
Framework | |
Two-shot Spatially-varying BRDF and Shape Estimation
Title | Two-shot Spatially-varying BRDF and Shape Estimation |
Authors | Mark Boss, Varun Jampani, Kihwan Kim, Hendrik P. A. Lensch, Jan Kautz |
Abstract | Capturing the shape and spatially-varying appearance (SVBRDF) of an object from images is a challenging task that has applications in both computer vision and graphics. Traditional optimization-based approaches often need a large number of images taken from multiple views in a controlled environment. Newer deep learning-based approaches require only a few input images, but the reconstruction quality is not on par with optimization techniques. We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF. The previous predictions guide each estimation, and a joint refinement network later refines both SVBRDF and shape. We follow a practical mobile image capture setting and use unaligned two-shot flash and no-flash images as input. Both our two-shot image capture and network inference can run on mobile hardware. We also create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials. Extensive experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images. Comparisons with recent approaches demonstrate the superior performance of the proposed approach. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00403v1 |
PDF | https://arxiv.org/pdf/2004.00403v1.pdf |
PWC | https://paperswithcode.com/paper/two-shot-spatially-varying-brdf-and-shape |
Repo | |
Framework | |
More Grounded Image Captioning by Distilling Image-Text Matching Model
Title | More Grounded Image Captioning by Distilling Image-Text Matching Model |
Authors | Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang |
Abstract | Visual attention not only improves the performance of image captioners, but also serves as a visual interpretation to qualitatively measure the caption rationality and model transparency. Specifically, we expect that a captioner can fix its attentive gaze on the correct objects while generating the corresponding words. This ability is also known as grounded image captioning. However, the grounding accuracy of existing captioners is far from satisfactory. To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision. To this end, we propose a Part-of-Speech (POS) enhanced image-text matching model (SCAN; Lee et al., 2018): POS-SCAN, as the effective knowledge distillation for more grounded image captioning. The benefits are two-fold: 1) given a sentence and an image, POS-SCAN can ground the objects more accurately than SCAN; 2) POS-SCAN serves as a word-region alignment regularization for the captioner’s visual attention module. By showing benchmark experimental results, we demonstrate that conventional image captioners equipped with POS-SCAN can significantly improve the grounding accuracy without strong supervision. Last but not least, we explore the indispensable Self-Critical Sequence Training (SCST; Rennie et al., 2017) in the context of grounded image captioning and show that the image-text matching score can serve as a reward for more grounded captioning (code: https://github.com/YuanEZhou/Grounded-Image-Captioning). |
Tasks | Image Captioning, Text Matching |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00390v1 |
PDF | https://arxiv.org/pdf/2004.00390v1.pdf |
PWC | https://paperswithcode.com/paper/more-grounded-image-captioning-by-distilling |
Repo | |
Framework | |
A New Challenge: Approaching Tetris Link with AI
Title | A New Challenge: Approaching Tetris Link with AI |
Authors | Matthias Muller-Brockhausen, Mike Preuss, Aske Plaat |
Abstract | Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that is still lacking any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: reinforcement learning and Monte Carlo tree search. We document our approaches and report on their relative performance in a tournament. Curiously, the heuristic approach is stronger than the planning/learning approaches. However, experienced human players easily win the majority of the matches against the heuristic planning AIs. We, therefore, surmise that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00377v1 |
PDF | https://arxiv.org/pdf/2004.00377v1.pdf |
PWC | https://paperswithcode.com/paper/a-new-challenge-approaching-tetris-link-with |
Repo | |
Framework | |