April 2, 2020

Paper Group ANR 267

Geometric Fusion via Joint Delay Embeddings. Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos. Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning. multi-patch aggregation models for resampling detection. Longitudinal Support Vector Machines for High Dimensional …

Geometric Fusion via Joint Delay Embeddings


Title	Geometric Fusion via Joint Delay Embeddings
Authors	Elchanan Solomon, Paul Bendich
Abstract	We introduce geometric and topological methods to develop a new framework for fusing multi-sensor time series. This framework consists of two steps: (1) a joint delay embedding, which reconstructs a high-dimensional state space in which our sensors correspond to observation functions, and (2) a simple orthogonalization scheme, which accounts for tangencies between such observation functions, and produces a more diversified geometry on the embedding space. We conclude with some synthetic and real-world experiments demonstrating that our framework outperforms traditional metric fusion methods.
Tasks	Time Series
Published	2020-02-25
URL	https://arxiv.org/abs/2002.11201v1
PDF	https://arxiv.org/pdf/2002.11201v1.pdf
PWC	https://paperswithcode.com/paper/geometric-fusion-via-joint-delay-embeddings
Repo
Framework
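
Step (1) of the abstract above, the joint delay embedding, can be illustrated with a minimal sketch. It assumes a standard Takens-style delay embedding applied to each sensor and concatenated per time index. The function names, the embedding dimension `dim`, and the lag `tau` are hypothetical choices for illustration, not taken from the paper, and the orthogonalization of step (2) is not shown.

```python
import math

def delay_embed(x, dim, tau):
    """Return delay vectors [x[t], x[t+tau], ..., x[t+(dim-1)*tau]] for each valid t."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return [[x[t + i * tau] for i in range(dim)] for t in range(n)]

def joint_delay_embed(series, dim, tau):
    """Concatenate the per-sensor delay vectors at each shared time index (assumed scheme)."""
    embedded = [delay_embed(s, dim, tau) for s in series]
    n = min(len(e) for e in embedded)
    return [[v for e in embedded for v in e[t]] for t in range(n)]

# Two synthetic "sensors" observing the same underlying oscillation.
t_grid = [4 * math.pi * i / 199 for i in range(200)]
sensors = [[math.sin(t) for t in t_grid], [math.cos(2 * t) for t in t_grid]]
points = joint_delay_embed(sensors, dim=3, tau=5)
print(len(points), len(points[0]))  # 190 6
```

Each row of `points` is one point in the reconstructed state space: three delayed samples from the first sensor followed by three from the second.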

Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos


Title	Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos
Authors	Daniel Cores, Víctor M. Brea, Manuel Mucientes
Abstract	This paper addresses the problem of how to exploit spatio-temporal information available in videos to improve object detection precision. We propose a two-stage object detector called FANet, based on short-term spatio-temporal feature aggregation to give a first detection set, and long-term object linking to refine these detections. First, we generate a set of short tubelet proposals containing the object in $N$ consecutive frames. Then, we aggregate RoI pooled deep features through the tubelet using a temporal pooling operator that summarizes the information with a fixed size output independent of the number of input frames. On top of that, we define a double head implementation that we feed with spatio-temporal aggregated information for spatio-temporal object classification, and with spatial information extracted from the current frame for object localization and spatial classification. Furthermore, we also specialize each head branch architecture to better perform in each task taking into account the input data. Finally, a long-term linking method builds long tubes using the previously calculated short tubelets to overcome detection errors. We have evaluated our model on the widely used ImageNet VID dataset, achieving an 80.9% mAP, which is the new state-of-the-art result for single models. Also, on the challenging small object detection dataset USC-GRAD-STDdb, our proposal outperforms the single frame baseline by 5.4% mAP.
Tasks	Object Classification, Object Detection, Object Localization, Small Object Detection
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00451v1
PDF	https://arxiv.org/pdf/2004.00451v1.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-tubelet-feature-aggregation
Repo
Framework
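
The temporal pooling operator mentioned in the abstract above produces a fixed-size output regardless of how many frames the tubelet spans. A minimal sketch of that property, assuming an element-wise maximum over per-frame feature vectors (the abstract does not name the operator, so `max` and the function name `temporal_pool` are assumptions):

```python
def temporal_pool(frame_features, op=max):
    """Reduce N per-frame feature vectors to one vector of the same width.

    frame_features: list of N equal-length lists (one RoI feature vector per frame).
    op: reduction applied across frames for each feature position (assumed: max).
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    return [op(values) for values in zip(*frame_features)]

# The output width is 2 whether the tubelet has 3 frames or 5.
short_tubelet = [[1, 5], [3, 2], [0, 4]]
long_tubelet = short_tubelet + [[7, 1], [2, 2]]
print(temporal_pool(short_tubelet))  # [3, 5]
print(temporal_pool(long_tubelet))   # [7, 5]
```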

Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning


Title	Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning
Authors	Shengchao Yan, Jingwei Zhang, Daniel Buescher, Wolfram Burgard
Abstract	Traffic signal controllers play an essential role in the traffic system, while the current majority of them are not sufficiently flexible or adaptive to make optimal traffic schedules. In this paper we present an approach to learn policies for the signal controllers using deep reinforcement learning. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to find the bound for the proposed equity factor. Moreover, we introduce the adaptive discounting approach that greatly stabilizes learning, which helps to keep high flexibility of green light duration. The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) on a wide range of traffic situations. A video of our experimental results can be found at: https://youtu.be/3rc5-ac3XX0
Tasks
Published	2020-03-09
URL	https://arxiv.org/abs/2003.04046v1
PDF	https://arxiv.org/pdf/2003.04046v1.pdf
PWC	https://paperswithcode.com/paper/efficiency-and-equity-are-both-essential-a
Repo
Framework

multi-patch aggregation models for resampling detection


Title	multi-patch aggregation models for resampling detection
Authors	Mohit Lamba, Kaushik Mitra
Abstract	Images captured nowadays are of varying dimensions, with smartphones and DSLRs allowing users to choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well for images of varying dimensions. However, in our experiments, we observed that many state-of-the-art forensic algorithms are sensitive to image size, and their performance quickly degenerates when operated on images of diverse dimensions despite re-training them using multiple image sizes. To handle this issue, we propose a novel pooling strategy called ITERATIVE POOLING. This pooling strategy can dynamically adjust input tensors in a discrete manner without much loss of information, unlike ROI Max-pooling. This pooling strategy can be used with any of the existing deep models and, for demonstration purposes, we show its utility on Resnet-18 for the case of resampling detection, a fundamental operation for any sort of image manipulation. Compared to existing strategies and Max-pooling, it gives up to 7-8% improvement on public datasets.
Tasks
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01364v1
PDF	https://arxiv.org/pdf/2003.01364v1.pdf
PWC	https://paperswithcode.com/paper/multi-patch-aggregation-models-for-resampling
Repo
Framework

Longitudinal Support Vector Machines for High Dimensional Time Series


Title	Longitudinal Support Vector Machines for High Dimensional Time Series
Authors	Kristiaan Pelckmans, Hong-Li Zeng
Abstract	We consider the problem of learning a classifier from observed functional data. Here, each data point takes the form of a single time series and contains numerous features. Assuming that each such series comes with a binary label, the problem of learning to predict the label of a newly arriving time series is considered. To this end, the notion of margin underlying the classical support vector machine is extended to a continuous version for such data. The longitudinal support vector machine is also a convex optimization problem, and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this innovative algorithm for analyzing such long-term multivariate data.
Tasks	Time Series
Published	2020-02-22
URL	https://arxiv.org/abs/2002.09763v1
PDF	https://arxiv.org/pdf/2002.09763v1.pdf
PWC	https://paperswithcode.com/paper/longitudinal-support-vector-machines-for-high
Repo
Framework

Learning to Cluster Faces via Confidence and Connectivity Estimation


Title	Learning to Cluster Faces via Confidence and Connectivity Estimation
Authors	Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin
Abstract	Face clustering is an essential tool for exploiting the unlabeled face data, and has a wide range of applications including face annotation and retrieval. Recent works show that supervised clustering can result in noticeable performance gain. However, they usually involve heuristic steps and require numerous overlapped subgraphs, severely restricting their accuracy and efficiency. In this paper, we propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Instead, we transform the clustering problem into two sub-problems. Specifically, two graph convolutional networks, named GCN-V and GCN-E, are designed to estimate the confidence of vertices and the connectivity of edges, respectively. With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters. Experiments on two large-scale benchmarks show that our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
Tasks	Connectivity Estimation
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00445v1
PDF	https://arxiv.org/pdf/2004.00445v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-cluster-faces-via-confidence-and
Repo
Framework

Handling Concept Drifts in Regression Problems – the Error Intersection Approach


Title	Handling Concept Drifts in Regression Problems – the Error Intersection Approach
Authors	Lucas Baier, Marcel Hofmann, Niklas Kühl, Marisa Mohr, Gerhard Satzger
Abstract	Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time, a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which depicts a strategy to switch between the application of simple and complex machine learning models for regression tasks. We assume that the approach plays out the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. the weather phenomena of blizzards, resulting in a sudden decrease of taxi demand. We are able to show that our suggested approach outperforms all regarded baselines significantly.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00438v1
PDF	https://arxiv.org/pdf/2004.00438v1.pdf
PWC	https://paperswithcode.com/paper/handling-concept-drifts-in-regression
Repo
Framework
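
The switching strategy described in the abstract above can be sketched as follows. The abstract does not specify the switching rule, so this sketch assumes one plausible reading of "error intersection": use whichever model has the lower mean absolute error over a recent window. The class name `ErrorSwitcher`, the `window` parameter, and the rule itself are hypothetical.

```python
from collections import deque

class ErrorSwitcher:
    """Pick the simple or complex model based on recent absolute errors (assumed rule)."""

    def __init__(self, window=3):
        self.simple_err = deque(maxlen=window)
        self.complex_err = deque(maxlen=window)

    def update(self, y_true, simple_pred, complex_pred):
        """Record the latest absolute error of each model."""
        self.simple_err.append(abs(y_true - simple_pred))
        self.complex_err.append(abs(y_true - complex_pred))

    def choose(self):
        """Return 'simple' while its recent mean error is lower, else 'complex'."""
        if not self.simple_err:
            return "complex"  # default before any feedback is available
        simple_mean = sum(self.simple_err) / len(self.simple_err)
        complex_mean = sum(self.complex_err) / len(self.complex_err)
        return "simple" if simple_mean < complex_mean else "complex"

switcher = ErrorSwitcher(window=3)
switcher.update(y_true=100, simple_pred=90, complex_pred=99)   # typical day
before_drift = switcher.choose()
for _ in range(3):                                             # sudden demand drop
    switcher.update(y_true=20, simple_pred=25, complex_pred=80)
after_drift = switcher.choose()
print(before_drift, after_drift)  # complex simple
```

The bounded window lets the switcher return to the complex model once its recent errors fall below the simple model's again.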

Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics


Title	Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics
Authors	Wenjie Zheng
Abstract	Traditional methods to infer compartmental epidemic models with time-varying dynamics can only capture continuous changes in the dynamic. However, many changes are discontinuous due to sudden interventions, such as city lockdown and opening of field hospitals. To model the discontinuities, this study introduces the tool of total variation regularization, which regulates the temporal changes of the dynamic parameters, such as the transmission rate. To recover the ground truth dynamic, this study designs a novel yet straightforward optimization algorithm, dubbed iterative Nelder-Mead, which repeatedly applies the Nelder-Mead algorithm. Experiments on the simulated data show that the proposed approach can qualitatively reproduce the discontinuities of the underlying dynamics. To extend this research to real data as well as to help researchers worldwide to fight against COVID-19, the author releases his research platform as an open-source package.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00412v1
PDF	https://arxiv.org/pdf/2004.00412v1.pdf
PWC	https://paperswithcode.com/paper/total-variation-regularization-for
Repo
Framework
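
Total variation regularization, as referred to in the abstract above, penalizes the sum of absolute changes between consecutive values of a time-varying parameter, which favors piecewise-constant trajectories with a few sharp jumps. A minimal sketch of the penalty, assuming a generic fit error and a weight `lam` (both hypothetical; the paper's actual objective and its iterative Nelder-Mead solver are not reproduced here):

```python
import math

def total_variation(params):
    """Sum of absolute differences between consecutive parameter values."""
    return sum(abs(b - a) for a, b in zip(params, params[1:]))

def regularized_loss(fit_error, params, lam):
    """Fit error plus a weighted total variation penalty (assumed form)."""
    return fit_error + lam * total_variation(params)

# A transmission rate with one sudden drop (e.g. a lockdown) versus a jittery one.
one_jump = [0.5, 0.5, 0.5, 0.2, 0.2, 0.2]
jittery = [0.5, 0.4, 0.5, 0.2, 0.3, 0.2]

assert total_variation(one_jump) < total_variation(jittery)
assert math.isclose(total_variation(one_jump), 0.3)
```

With equal fit error, the penalty prefers `one_jump`, which is how a discontinuity can survive regularization while small fluctuations are suppressed.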

Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data


Title	Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data
Authors	Heeyoung Kwak, Minwoo Lee, Seunghyun Yoon, Jooyoung Chang, Sangmin Park, Kyomin Jung
Abstract	Adverse Drug Reaction (ADR) is a significant public health concern worldwide. Numerous graph-based methods have been applied to biomedical graphs for predicting ADRs in pre-marketing phases. ADR detection in post-market surveillance is no less important than pre-marketing assessment, and ADR detection with large-scale clinical data has attracted much attention in recent years. However, there are not many studies considering graph structures from clinical data for detecting an ADR signal, which is a pair of a prescription and a diagnosis that might be a potential ADR. In this study, we develop a novel graph-based framework for ADR signal detection using healthcare claims data. We construct a Drug-disease graph with nodes representing the medical codes. The edges are given as the relationships between two codes, computed using the data. We apply a Graph Neural Network to predict ADR signals, using labels from the Side Effect Resource database. The model shows improved AUROC and AUPRC performance of 0.795 and 0.775, compared to other algorithms, showing that it successfully learns node representations expressive of those relationships. Furthermore, our model predicts ADR pairs that do not exist in the established ADR database, showing its capability to supplement the ADR database.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00407v1
PDF	https://arxiv.org/pdf/2004.00407v1.pdf
PWC	https://paperswithcode.com/paper/drug-disease-graph-predicting-adverse-drug
Repo
Framework

Image Demoireing with Learnable Bandpass Filters


Title	Image Demoireing with Learnable Bandpass Filters
Authors	Bolun Zheng, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis
Abstract	Image demoireing is a multi-faceted image restoration task involving both texture and color restoration. In this paper, we propose a novel multiscale bandpass convolutional neural network (MBCNN) to address this problem. As an end-to-end solution, MBCNN respectively solves the two sub-problems. For texture restoration, we propose a learnable bandpass filter (LBF) to learn the frequency prior for moire texture removal. For color restoration, we propose a two-step tone mapping strategy, which first applies a global tone mapping to correct for a global color shift, and then performs local fine tuning of the color per pixel. Through an ablation study, we demonstrate the effectiveness of the different components of MBCNN. Experimental results on two public datasets show that our method outperforms state-of-the-art methods by a large margin (more than 2dB in terms of PSNR).
Tasks	Image Restoration
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00406v1
PDF	https://arxiv.org/pdf/2004.00406v1.pdf
PWC	https://paperswithcode.com/paper/image-demoireing-with-learnable-bandpass
Repo
Framework

The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes


Title	The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes
Authors	Edward Verenich, Alvaro Velasquez, M. G. Sarwar Murshed, Faraz Hussain
Abstract	The use of transfer learning with deep neural networks has increasingly become widespread for deploying well-tested computer vision systems to newer domains, especially those with limited datasets. We describe a transfer learning use case for a domain with a data-starved regime, having fewer than 100 labeled target samples. We evaluate the effectiveness of convolutional feature extraction and fine-tuning of overparameterized models with respect to the size of target training data, as well as their generalization performance on data with covariate shift, or out-of-distribution (OOD) data. Our experiments show that both overparameterization and feature reuse contribute to successful application of transfer learning in training image classifiers in data-starved regimes.
Tasks	Transfer Learning
Published	2020-02-29
URL	https://arxiv.org/abs/2003.04117v1
PDF	https://arxiv.org/pdf/2003.04117v1.pdf
PWC	https://paperswithcode.com/paper/the-utility-of-feature-reuse-transfer
Repo
Framework

A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection


Title	A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection
Authors	Songyan Xue, Yi Ma, Na Yi, Terence E. Dodgson
Abstract	In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key to ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit antennas) along the feed-forward propagation, and likewise the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied on the entire network as a whole, but rather at the modular level. For this reason, MNNet has the advantage of much lower learning complexity than other deep-learning based MIMO detection approaches.
Tasks	Quantization
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00404v1
PDF	https://arxiv.org/pdf/2004.00404v1.pdf
PWC	https://paperswithcode.com/paper/a-modular-neural-network-based-deep-learning
Repo
Framework

Two-shot Spatially-varying BRDF and Shape Estimation


Title	Two-shot Spatially-varying BRDF and Shape Estimation
Authors	Mark Boss, Varun Jampani, Kihwan Kim, Hendrik P. A. Lensch, Jan Kautz
Abstract	Capturing the shape and spatially-varying appearance (SVBRDF) of an object from images is a challenging task that has applications in both computer vision and graphics. Traditional optimization-based approaches often need a large number of images taken from multiple views in a controlled environment. Newer deep learning-based approaches require only a few input images, but the reconstruction quality is not on par with optimization techniques. We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF. The previous predictions guide each estimation, and a joint refinement network later refines both SVBRDF and shape. We follow a practical mobile image capture setting and use unaligned two-shot flash and no-flash images as input. Both our two-shot image capture and network inference can run on mobile hardware. We also create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials. Extensive experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images. Comparisons with recent approaches demonstrate the superior performance of the proposed approach.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00403v1
PDF	https://arxiv.org/pdf/2004.00403v1.pdf
PWC	https://paperswithcode.com/paper/two-shot-spatially-varying-brdf-and-shape
Repo
Framework

More Grounded Image Captioning by Distilling Image-Text Matching Model


Title	More Grounded Image Captioning by Distilling Image-Text Matching Model
Authors	Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
Abstract	Visual attention not only improves the performance of image captioners, but also serves as a visual interpretation to qualitatively measure the caption rationality and model transparency. Specifically, we expect that a captioner can fix its attentive gaze on the correct objects while generating the corresponding words. This ability is also known as grounded image captioning. However, the grounding accuracy of existing captioners is far from satisfactory. To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision. To this end, we propose a Part-of-Speech (POS) enhanced image-text matching model (SCAN; Lee et al., 2018): POS-SCAN, as the effective knowledge distillation for more grounded image captioning. The benefits are two-fold: 1) given a sentence and an image, POS-SCAN can ground the objects more accurately than SCAN; 2) POS-SCAN serves as a word-region alignment regularization for the captioner’s visual attention module. By showing benchmark experimental results, we demonstrate that conventional image captioners equipped with POS-SCAN can significantly improve the grounding accuracy without strong supervision. Last but not least, we explore the indispensable Self-Critical Sequence Training (SCST; Rennie et al., 2017) in the context of grounded image captioning and show that the image-text matching score can serve as a reward for more grounded captioning. Code: https://github.com/YuanEZhou/Grounded-Image-Captioning
Tasks	Image Captioning, Text Matching
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00390v1
PDF	https://arxiv.org/pdf/2004.00390v1.pdf
PWC	https://paperswithcode.com/paper/more-grounded-image-captioning-by-distilling
Repo
Framework

A New Challenge: Approaching Tetris Link with AI


Title	A New Challenge: Approaching Tetris Link with AI
Authors	Matthias Muller-Brockhausen, Mike Preuss, Aske Plaat
Abstract	Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that is still lacking any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: Reinforcement Learning and Monte Carlo tree search. We document our approaches and report on their relative performance in a tournament. Curiously, the heuristic approach is stronger than the planning/learning approaches. However, experienced human players easily win the majority of the matches against the heuristic planning AIs. We, therefore, surmise that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00377v1
PDF	https://arxiv.org/pdf/2004.00377v1.pdf
PWC	https://paperswithcode.com/paper/a-new-challenge-approaching-tetris-link-with
Repo
Framework