April 2, 2020

# Paper Group ANR 267

Geometric Fusion via Joint Delay Embeddings. Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos. Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning. multi-patch aggregation models for resampling detection. Longitudinal Support Vector Machines for High Dimensional …

#### Geometric Fusion via Joint Delay Embeddings

Title Geometric Fusion via Joint Delay Embeddings
Authors Elchanan Solomon, Paul Bendich
Abstract We introduce geometric and topological methods to develop a new framework for fusing multi-sensor time series. This framework consists of two steps: (1) a joint delay embedding, which reconstructs a high-dimensional state space in which our sensors correspond to observation functions, and (2) a simple orthogonalization scheme, which accounts for tangencies between such observation functions, and produces a more diversified geometry on the embedding space. We conclude with some synthetic and real-world experiments demonstrating that our framework outperforms traditional metric fusion methods.
Published 2020-02-25
URL https://arxiv.org/abs/2002.11201v1
PDF https://arxiv.org/pdf/2002.11201v1.pdf
PWC https://paperswithcode.com/paper/geometric-fusion-via-joint-delay-embeddings
Repo
Framework
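
Step (1) of the abstract above, the joint delay embedding, can be illustrated with a minimal sketch. It assumes a standard Takens-style delay embedding applied to each sensor and concatenated per time index. The function names, the shared `dim` and `lag` parameters, and the toy data are all hypothetical and are not taken from the paper. The orthogonalization step (2) is not shown.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int, lag: int) -> List[List[float]]:
    """Return delay vectors [x[t], x[t+lag], ..., x[t+(dim-1)*lag]] for one sensor."""
    span = (dim - 1) * lag
    return [[series[t + k * lag] for k in range(dim)]
            for t in range(len(series) - span)]


def joint_delay_embed(sensors: Sequence[Sequence[float]], dim: int, lag: int) -> List[List[float]]:
    """Concatenate the delay vectors of every sensor at each shared time index."""
    per_sensor = [delay_embed(s, dim, lag) for s in sensors]
    n = min(len(e) for e in per_sensor)  # truncate to the shortest embedding
    return [[v for e in per_sensor for v in e[t]] for t in range(n)]


# Two toy sensors observed at the same five time steps (made-up values).
sensor_a = [0, 1, 2, 3, 4]
sensor_b = [10, 11, 12, 13, 14]
points = joint_delay_embed([sensor_a, sensor_b], dim=2, lag=2)
print(points)
```

Each output point lives in a space of dimension `dim` times the number of sensors, so every sensor acts as one observation function on a common reconstructed state space.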

#### Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos

Title Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos
Authors Daniel Cores, Víctor M. Brea, Manuel Mucientes
Abstract This paper addresses the problem of how to exploit the spatio-temporal information available in videos to improve object detection precision. We propose a two-stage object detector called FANet, based on short-term spatio-temporal feature aggregation to give a first detection set, and long-term object linking to refine these detections. First, we generate a set of short tubelet proposals containing the object in $N$ consecutive frames. Then, we aggregate RoI-pooled deep features through the tubelet using a temporal pooling operator that summarizes the information with a fixed-size output independent of the number of input frames. On top of that, we define a double-head implementation that we feed with spatio-temporal aggregated information for spatio-temporal object classification, and with spatial information extracted from the current frame for object localization and spatial classification. Furthermore, we specialize each head branch architecture to perform better in its task, taking into account the input data. Finally, a long-term linking method builds long tubes from the previously calculated short tubelets to overcome detection errors. We have evaluated our model on the widely used ImageNet VID dataset, achieving an 80.9% mAP, which is the new state-of-the-art result for single models. Also, on the challenging small object detection dataset USC-GRAD-STDdb, our proposal outperforms the single-frame baseline by 5.4% mAP.
Tasks Object Classification, Object Detection, Object Localization, Small Object Detection
Published 2020-04-01
URL https://arxiv.org/abs/2004.00451v1
PDF https://arxiv.org/pdf/2004.00451v1.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-tubelet-feature-aggregation
Repo
Framework
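
The abstract above mentions a temporal pooling operator whose output size does not depend on the number of frames in a tubelet. The sketch below shows one operator with that property, an element-wise maximum over frames. The choice of max pooling is an assumption made for illustration, since the abstract does not name the operator, and the function name and feature values are hypothetical.

```python
from typing import List, Sequence


def temporal_max_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-frame feature vectors into one vector by element-wise max.

    The output length equals the feature length, whatever the frame count.
    """
    if not frame_features:
        raise ValueError("tubelet must contain at least one frame")
    return [max(column) for column in zip(*frame_features)]


# A 2-frame and a 4-frame tubelet, each with 3-dimensional RoI features (made up).
pooled_2 = temporal_max_pool([[0.1, 0.9, 0.3], [0.4, 0.2, 0.8]])
pooled_4 = temporal_max_pool([[0.1, 0.9, 0.3], [0.4, 0.2, 0.8],
                              [0.5, 0.1, 0.0], [0.0, 0.0, 0.95]])
print(len(pooled_2), len(pooled_4))
```

Both results have length 3, so a downstream classification head can consume tubelets of any length.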

#### Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning

Title Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning
Authors Shengchao Yan, Jingwei Zhang, Daniel Buescher, Wolfram Burgard
Abstract Traffic signal controllers play an essential role in the traffic system, while the current majority of them are not sufficiently flexible or adaptive to make optimal traffic schedules. In this paper we present an approach to learn policies for the signal controllers using deep reinforcement learning. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to find the bound for the proposed equity factor. Moreover, we introduce the adaptive discounting approach that greatly stabilizes learning, which helps to keep high flexibility of green light duration. The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) on a wide range of traffic situations. A video of our experimental results can be found at: https://youtu.be/3rc5-ac3XX0
Published 2020-03-09
URL https://arxiv.org/abs/2003.04046v1
PDF https://arxiv.org/pdf/2003.04046v1.pdf
PWC https://paperswithcode.com/paper/efficiency-and-equity-are-both-essential-a
Repo
Framework

#### Multi-patch aggregation models for resampling detection

Title Multi-patch aggregation models for resampling detection
Authors Mohit Lamba, Kaushik Mitra
Abstract Images captured nowadays are of varying dimensions, with smartphones and DSLRs allowing users to choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well for images of varying dimensions. However, in our experiments, we observed that many state-of-the-art forensic algorithms are sensitive to image size, and their performance quickly degrades when operated on images of diverse dimensions, despite re-training them using multiple image sizes. To handle this issue, we propose a novel pooling strategy called ITERATIVE POOLING. This pooling strategy can dynamically adjust input tensors in a discrete manner without the substantial loss of information incurred by ROI Max-pooling. It can be used with any existing deep model, and for demonstration purposes we show its utility on ResNet-18 for the case of resampling detection, a fundamental operation in any sort of image manipulation. Compared to existing strategies and Max-pooling, it gives up to 7-8% improvement on public datasets.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01364v1
PDF https://arxiv.org/pdf/2003.01364v1.pdf
PWC https://paperswithcode.com/paper/multi-patch-aggregation-models-for-resampling
Repo
Framework

#### Longitudinal Support Vector Machines for High Dimensional Time Series

Title Longitudinal Support Vector Machines for High Dimensional Time Series
Authors Kristiaan Pelckmans, Hong-Li Zeng
Abstract We consider the problem of learning a classifier from observed functional data. Here, each data point takes the form of a single time series and contains numerous features. Assuming that each such series comes with a binary label, we consider the problem of learning to predict the label of a newly arriving time series. To this end, the notion of *margin* underlying the classical support vector machine is extended to a continuous version for such data. The longitudinal support vector machine is also a convex optimization problem, and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this algorithm for analyzing such long-term multivariate data.
Published 2020-02-22
URL https://arxiv.org/abs/2002.09763v1
PDF https://arxiv.org/pdf/2002.09763v1.pdf
PWC https://paperswithcode.com/paper/longitudinal-support-vector-machines-for-high
Repo
Framework

#### Learning to Cluster Faces via Confidence and Connectivity Estimation

Title Learning to Cluster Faces via Confidence and Connectivity Estimation
Authors Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin
Abstract Face clustering is an essential tool for exploiting the unlabeled face data, and has a wide range of applications including face annotation and retrieval. Recent works show that supervised clustering can result in noticeable performance gain. However, they usually involve heuristic steps and require numerous overlapped subgraphs, severely restricting their accuracy and efficiency. In this paper, we propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Instead, we transform the clustering problem into two sub-problems. Specifically, two graph convolutional networks, named GCN-V and GCN-E, are designed to estimate the confidence of vertices and the connectivity of edges, respectively. With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters. Experiments on two large-scale benchmarks show that our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00445v1
PDF https://arxiv.org/pdf/2004.00445v1.pdf
PWC https://paperswithcode.com/paper/learning-to-cluster-faces-via-confidence-and
Repo
Framework

#### Handling Concept Drifts in Regression Problems – the Error Intersection Approach

Title Handling Concept Drifts in Regression Problems – the Error Intersection Approach
Authors Lucas Baier, Marcel Hofmann, Niklas Kühl, Marisa Mohr, Gerhard Satzger
Abstract Machine learning models are omnipresent for predictions on big data. One challenge of deployed models is the change of the data over time, a phenomenon called concept drift. If not handled correctly, a concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which describes a strategy to switch between simple and complex machine learning models for regression tasks. We assume that the approach exploits the individual strengths of each model, switching to the simpler model if a drift occurs and switching back to the complex model for typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. weather phenomena such as blizzards, which result in a sudden decrease of taxi demand. We are able to show that our suggested approach significantly outperforms all considered baselines.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00438v1
PDF https://arxiv.org/pdf/2004.00438v1.pdf
PWC https://paperswithcode.com/paper/handling-concept-drifts-in-regression
Repo
Framework
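
The switching idea in the abstract above can be sketched as follows. This is not the authors' algorithm: it assumes a simple rule that, at each step, picks whichever model had the lower absolute error over a trailing window, defaulting to the complex model on ties. The function name, the `window` parameter, and the demand values are hypothetical.

```python
from typing import List, Sequence


def choose_models(y_true: Sequence[float],
                  pred_simple: Sequence[float],
                  pred_complex: Sequence[float],
                  window: int = 3) -> List[str]:
    """For each step t, pick the model with the lower error on steps [t-window, t)."""
    choices = []
    for t in range(len(y_true)):
        lo = max(0, t - window)
        err_simple = sum(abs(y_true[i] - pred_simple[i]) for i in range(lo, t))
        err_complex = sum(abs(y_true[i] - pred_complex[i]) for i in range(lo, t))
        # Ties (including the empty window at t = 0) fall back to the complex model.
        choices.append("simple" if err_simple < err_complex else "complex")
    return choices


# Made-up demand with a sudden drop at t = 3 and a recovery at t = 6.
demand = [10, 10, 10, 2, 2, 2, 10, 10]
simple = [10, 10, 10, 10, 2, 2, 2, 10]      # repeats the last observed value
complex_ = [10, 10, 10, 10, 10, 10, 10, 10]  # keeps predicting the usual level
choices = choose_models(demand, simple, complex_)
print(choices)
```

In this toy run the selector moves to the simple model shortly after the drop, once its trailing error falls below that of the complex model.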

#### Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics

Title Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics
Authors Wenjie Zheng
Abstract Traditional methods to infer compartmental epidemic models with time-varying dynamics can only capture continuous changes in the dynamic. However, many changes are discontinuous due to sudden interventions, such as city lockdown and opening of field hospitals. To model the discontinuities, this study introduces the tool of total variation regularization, which regulates the temporal changes of the dynamic parameters, such as the transmission rate. To recover the ground truth dynamic, this study designs a novel yet straightforward optimization algorithm, dubbed iterative Nelder-Mead, which repeatedly applies the Nelder-Mead algorithm. Experiments on the simulated data show that the proposed approach can qualitatively reproduce the discontinuities of the underlying dynamics. To extend this research to real data as well as to help researchers worldwide to fight against COVID-19, the author releases his research platform as an open-source package.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00412v1
PDF https://arxiv.org/pdf/2004.00412v1.pdf
PWC https://paperswithcode.com/paper/total-variation-regularization-for
Repo
Framework
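
The role of total variation regularization described above can be illustrated with a small comparison. The sketch contrasts a total variation penalty with a squared-difference smoothness penalty on two hypothetical transmission-rate trajectories. The trajectories and function names are made up, and the squared-difference penalty is included only as a point of comparison, not as something the paper uses.

```python
from typing import Sequence


def total_variation(beta: Sequence[float]) -> float:
    """Sum of absolute changes between consecutive parameter values."""
    return sum(abs(b - a) for a, b in zip(beta, beta[1:]))


def squared_variation(beta: Sequence[float]) -> float:
    """Sum of squared changes, a common smoothness penalty shown for contrast."""
    return sum((b - a) ** 2 for a, b in zip(beta, beta[1:]))


# Transmission rate dropping from 0.5 to 0.2 (made-up values).
step = [0.5, 0.5, 0.5, 0.2, 0.2]  # sudden intervention
ramp = [0.5, 0.4, 0.3, 0.2, 0.2]  # gradual change

print(total_variation(step), total_variation(ramp))
print(squared_variation(step), squared_variation(ramp))
```

The total variation penalty charges the sudden drop and the gradual ramp about the same, while the squared penalty charges the sudden drop roughly three times as much. This is why a total variation term does not discourage discontinuous changes the way a smoothness term does.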

#### Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data

Title Drug-disease Graph: Predicting Adverse Drug Reaction Signals via Graph Neural Network with Clinical Data
Authors Heeyoung Kwak, Minwoo Lee, Seunghyun Yoon, Jooyoung Chang, Sangmin Park, Kyomin Jung
Published 2020-04-01
URL https://arxiv.org/abs/2004.00407v1
PDF https://arxiv.org/pdf/2004.00407v1.pdf
Repo
Framework

#### Image Demoireing with Learnable Bandpass Filters

Title Image Demoireing with Learnable Bandpass Filters
Authors Bolun Zheng, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis
Abstract Image demoireing is a multi-faceted image restoration task involving both texture and color restoration. In this paper, we propose a novel multiscale bandpass convolutional neural network (MBCNN) to address this problem. As an end-to-end solution, MBCNN respectively solves the two sub-problems. For texture restoration, we propose a learnable bandpass filter (LBF) to learn the frequency prior for moire texture removal. For color restoration, we propose a two-step tone mapping strategy, which first applies a global tone mapping to correct for a global color shift, and then performs local fine tuning of the color per pixel. Through an ablation study, we demonstrate the effectiveness of the different components of MBCNN. Experimental results on two public datasets show that our method outperforms state-of-the-art methods by a large margin (more than 2dB in terms of PSNR).
Published 2020-04-01
URL https://arxiv.org/abs/2004.00406v1
PDF https://arxiv.org/pdf/2004.00406v1.pdf
PWC https://paperswithcode.com/paper/image-demoireing-with-learnable-bandpass
Repo
Framework

#### The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes

Title The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes
Authors Edward Verenich, Alvaro Velasquez, M. G. Sarwar Murshed, Faraz Hussain
Abstract The use of transfer learning with deep neural networks has increasingly become widespread for deploying well-tested computer vision systems to newer domains, especially those with limited datasets. We describe a transfer learning use case for a domain with a data-starved regime, having fewer than 100 labeled target samples. We evaluate the effectiveness of convolutional feature extraction and fine-tuning of overparameterized models with respect to the size of target training data, as well as their generalization performance on data with covariate shift, or out-of-distribution (OOD) data. Our experiments show that both overparameterization and feature reuse contribute to successful application of transfer learning in training image classifiers in data-starved regimes.
Published 2020-02-29
URL https://arxiv.org/abs/2003.04117v1
PDF https://arxiv.org/pdf/2003.04117v1.pdf
PWC https://paperswithcode.com/paper/the-utility-of-feature-reuse-transfer
Repo
Framework

#### A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection

Title A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection
Authors Songyan Xue, Yi Ma, Na Yi, Terence E. Dodgson
Abstract In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is essentially a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key to the ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit antennas) along the feed-forward propagation, and with it the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity, with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied to the entire network as a whole, but rather at the modular level. For this reason, MNNet has much lower learning complexity than other deep-learning based MIMO detection approaches.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00404v1
PDF https://arxiv.org/pdf/2004.00404v1.pdf
PWC https://paperswithcode.com/paper/a-modular-neural-network-based-deep-learning
Repo
Framework

#### Two-shot Spatially-varying BRDF and Shape Estimation

Title Two-shot Spatially-varying BRDF and Shape Estimation
Authors Mark Boss, Varun Jampani, Kihwan Kim, Hendrik P. A. Lensch, Jan Kautz
Abstract Capturing the shape and spatially-varying appearance (SVBRDF) of an object from images is a challenging task that has applications in both computer vision and graphics. Traditional optimization-based approaches often need a large number of images taken from multiple views in a controlled environment. Newer deep learning-based approaches require only a few input images, but the reconstruction quality is not on par with optimization techniques. We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF. The previous predictions guide each estimation, and a joint refinement network later refines both SVBRDF and shape. We follow a practical mobile image capture setting and use unaligned two-shot flash and no-flash images as input. Both our two-shot image capture and network inference can run on mobile hardware. We also create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials. Extensive experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images. Comparisons with recent approaches demonstrate the superior performance of the proposed approach.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00403v1
PDF https://arxiv.org/pdf/2004.00403v1.pdf
PWC https://paperswithcode.com/paper/two-shot-spatially-varying-brdf-and-shape
Repo
Framework

#### More Grounded Image Captioning by Distilling Image-Text Matching Model

Title More Grounded Image Captioning by Distilling Image-Text Matching Model
Authors Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
Abstract Visual attention not only improves the performance of image captioners, but also serves as a visual interpretation to qualitatively measure the caption rationality and model transparency. Specifically, we expect that a captioner can fix its attentive gaze on the correct objects while generating the corresponding words. This ability is also known as grounded image captioning. However, the grounding accuracy of existing captioners is far from satisfactory. To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision. To this end, we propose a Part-of-Speech (POS) enhanced image-text matching model (SCAN; Lee et al., 2018): POS-SCAN, as the effective knowledge distillation for more grounded image captioning. The benefits are two-fold: 1) given a sentence and an image, POS-SCAN can ground the objects more accurately than SCAN; 2) POS-SCAN serves as a word-region alignment regularization for the captioner’s visual attention module. By showing benchmark experimental results, we demonstrate that conventional image captioners equipped with POS-SCAN can significantly improve the grounding accuracy without strong supervision. Last but not least, we explore the indispensable Self-Critical Sequence Training (SCST; Rennie et al., 2017) in the context of grounded image captioning and show that the image-text matching score can serve as a reward for more grounded captioning (code: https://github.com/YuanEZhou/Grounded-Image-Captioning).
Published 2020-04-01
URL https://arxiv.org/abs/2004.00390v1
PDF https://arxiv.org/pdf/2004.00390v1.pdf
PWC https://paperswithcode.com/paper/more-grounded-image-captioning-by-distilling
Repo
Framework

#### A New Challenge: Approaching Tetris Link with AI

Title A New Challenge: Approaching Tetris Link with AI
Authors Matthias Muller-Brockhausen, Mike Preuss, Aske Plaat
Abstract Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that still lacks any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: Reinforcement Learning and Monte Carlo tree search. We document our approaches and report on their relative performance in a tournament. Curiously, the heuristic approach is stronger than the planning/learning approaches. However, experienced human players easily win the majority of the matches against the heuristic planning AIs. We therefore surmise that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon.