Paper Group AWR 17
Variational Wasserstein Barycenters for Geometric Clustering. MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection. PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation. Building a COVID-19 Vulnerability Index. End-to-End Fast Training of Communication Links Without a Channel Model via Online Meta-Learning. Adversarial Monte Carlo Meta-Learning of Optimal Prediction Procedures. Multi-Step Model-Agnostic Meta-Learning: Convergence and Improved Algorithms. Meta-Transfer Learning for Zero-Shot Super-Resolution. SynFi: Automatic Synthetic Fingerprint Generation. EndoL2H: Deep Super-Resolution for Capsule Endoscopy. Cross-domain Detection via Graph-induced Prototype Alignment. Training-Set Distillation for Real-Time UAV Object Tracking. GEDDnet: A Network for Gaze Estimation with Dilation and Decomposition. Universal Differential Equations for Scientific Machine Learning. Unsupervised Sentiment Analysis for Code-mixed Data.
Variational Wasserstein Barycenters for Geometric Clustering
Title | Variational Wasserstein Barycenters for Geometric Clustering |
Authors | Liang Mi, Tianshu Yu, Jose Bento, Wen Zhang, Baoxin Li, Yalin Wang |
Abstract | We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps via a variational principle. We discuss the metric properties of WBs and explore their connections, especially the connections of Monge WBs, to K-means clustering and co-clustering. We also discuss the feasibility of Monge WBs on unbalanced measures and spherical domains. We propose two new problems – regularized K-means and Wasserstein barycenter compression – and demonstrate the use of variational WBs (VWBs) in solving these clustering-related problems. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10543v1 |
https://arxiv.org/pdf/2002.10543v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-wasserstein-barycenters-for |
Repo | https://github.com/icemiliang/pyvot |
Framework | pytorch |
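As a concrete illustration of the K-means connection above: Lloyd's algorithm can be read as computing a sparse-support barycenter of an empirical measure under Monge-style (one-to-one) assignments. The NumPy sketch below is a minimal illustration under that reading, not the paper's variational solver; the function name, uniform sample weights, and nearest-centroid assignment are assumptions.

```python
import numpy as np

def lloyd_barycenter(points, k, iters=50, seed=0):
    """Approximate a k-point barycenter of an empirical measure (K-means view)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Monge-style assignment: each sample is transported to one centroid.
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Move each support point to the mean of the mass assigned to it.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels
```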
MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection
Title | MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection |
Authors | Abdullah Rashwan, Rishav Agarwal, Agastya Kalra, Pascal Poupart |
Abstract | We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets to anchor-based object detection, for which we predict object centers and regress the top-left and bottom-right corners. Second, we use MatrixNets for corner-based object detection by predicting top-left and bottom-right corners. Each corner predicts the center location of the object. We also enhance corner-based detection by replacing the embedding layer with center regression. Our final architecture achieves an mAP of 47.8 on MS COCO, higher than its CornerNet counterpart by +5.6 mAP, while also closing the gap between single-stage and two-stage detectors. The code is available at https://github.com/arashwan/matrixnet. |
Tasks | Object Detection |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03194v1 |
https://arxiv.org/pdf/2001.03194v1.pdf | |
PWC | https://paperswithcode.com/paper/matrixnets-a-new-scale-and-aspect-ratio-aware |
Repo | https://github.com/arashwan/matrixnet |
Framework | pytorch |
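A rough sketch of the scale- and aspect-ratio-aware layer assignment described above: an object's width and height independently select a (row, column) index in the layer matrix, so tall, wide, and square objects land on different specialized layers. The base size of 24 pixels and the index range are illustrative assumptions, not values from the paper.

```python
import math

def assign_matrix_layer(w, h, base=24, max_idx=4):
    """Map object width/height (pixels) to an (i, j) xNet layer index,
    where i and j track width and height downsampling separately."""
    i = min(max(round(math.log2(w / base)), 0), max_idx)
    j = min(max(round(math.log2(h / base)), 0), max_idx)
    return i, j

# A 100x24 box and a 24x100 box land on different (transposed) layers:
print(assign_matrix_layer(100, 24), assign_matrix_layer(24, 100))  # (2, 0) (0, 2)
```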
PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation
Title | PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation |
Authors | Kancharagunta Kishan Babu, Shiv Ram Dubey |
Abstract | In many real-world scenarios, it is difficult to capture images in the visible light spectrum (VIS) due to bad lighting conditions. However, images can be captured in such scenarios using Near-Infrared (NIR) and Thermal (THM) cameras. NIR and THM images contain limited detail, so there is a need to transform the images from THM/NIR to VIS for better understanding. This is a non-trivial task due to the large domain discrepancies and the lack of abundant datasets. Generative Adversarial Networks (GANs) can transform images from one domain to another. Most available GAN-based methods use a combination of adversarial and pixel-wise losses (like L1 or L2) as the objective function for training. With such objective functions, the quality of the transformed images in THM/NIR to VIS transformation is still not up to the mark, so better objective functions are needed to improve the quality, fine details and realism of the transformed images. A new model for THM/NIR to VIS image transformation called Perceptual Cyclic-Synthesized Generative Adversarial Network (PCSGAN) is introduced to address these issues. PCSGAN combines perceptual (i.e., feature-based) losses with pixel-wise and adversarial losses. Both quantitative and qualitative measures are used to judge the performance of the PCSGAN model on the WHU-IIP face and RGB-NIR scene datasets. The proposed PCSGAN outperforms state-of-the-art image transformation models, including Pix2pix, DualGAN, CycleGAN, PS2GAN, and PAN, in terms of the SSIM, MSE, PSNR and LPIPS evaluation measures. The code is available at: \url{https://github.com/KishanKancharagunta/PCSGAN}. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.07082v1 |
https://arxiv.org/pdf/2002.07082v1.pdf | |
PWC | https://paperswithcode.com/paper/pcsgan-perceptual-cyclic-synthesized |
Repo | https://github.com/KishanKancharagunta/PCSGAN |
Framework | pytorch |
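The objective described above combines adversarial, pixel-wise, and perceptual terms. The PyTorch sketch below is a hedged approximation of such a combined loss: the VGG19 feature cut, the loss weights, and the adversarial form are assumptions for illustration, not PCSGAN's exact configuration (which also includes cyclic-synthesized terms).

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class CombinedLoss(nn.Module):
    def __init__(self, lambda_pix=10.0, lambda_per=1.0):
        super().__init__()
        # Frozen VGG features provide the perceptual ("feature-based") term.
        self.vgg = vgg19(pretrained=True).features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.l1 = nn.L1Loss()
        self.lambda_pix, self.lambda_per = lambda_pix, lambda_per

    def forward(self, fake, real, disc_score_on_fake):
        adv = -disc_score_on_fake.mean()               # adversarial term (generator side)
        pix = self.l1(fake, real)                      # pixel-wise L1
        per = self.l1(self.vgg(fake), self.vgg(real))  # perceptual feature distance
        return adv + self.lambda_pix * pix + self.lambda_per * per
```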
Building a COVID-19 Vulnerability Index
Title | Building a COVID-19 Vulnerability Index |
Authors | Dave DeCaprio, Joseph Gartner, Thadeus Burgess, Sarthak Kothari, Shaayan Sayed, Carol J. McCall |
Abstract | COVID-19 is an acute respiratory disease that has been classified as a pandemic by the World Health Organization. Information regarding this particular disease is limited; however, it is known to have high mortality rates, particularly among individuals with preexisting medical conditions. Models that identify individuals at the greatest risk for severe complications due to COVID-19 will be useful for outreach campaigns aimed at mitigating the disease's worst effects. While information specific to COVID-19 is limited, a model using complications due to other upper respiratory infections can be used as a proxy to help identify those individuals who are at the greatest risk. We present the results for three models predicting such complications, with each model having varying levels of predictive effectiveness at the expense of ease of implementation. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07347v2 |
https://arxiv.org/pdf/2003.07347v2.pdf | |
PWC | https://paperswithcode.com/paper/building-a-covid-19-vulnerability-index |
Repo | https://github.com/closedloop-ai/cv19index |
Framework | none |
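To make the proxy-label idea above concrete, here is a minimal scikit-learn sketch on synthetic data: a classifier is trained on complications from other upper respiratory infections and then used to score individuals. The features, labels, and estimator are purely illustrative and unrelated to the released cv19index models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # e.g. age, comorbidity count, prior admissions, ...
# Proxy outcome: severe complications from other upper respiratory infections.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1.5).astype(int)

model = LogisticRegression().fit(X, y)
risk = model.predict_proba(X[:5])[:, 1]  # vulnerability scores in [0, 1]
print(risk)
```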
End-to-End Fast Training of Communication Links Without a Channel Model via Online Meta-Learning
Title | End-to-End Fast Training of Communication Links Without a Channel Model via Online Meta-Learning |
Authors | Sangwoo Park, Osvaldo Simeone, Joonhyuk Kang |
Abstract | When a channel model is not available, the end-to-end training of encoder and decoder on a fading noisy channel generally requires the repeated use of the channel and of a feedback link. An important limitation of this approach is that training generally has to be carried out from scratch for each new channel. To cope with this problem, prior works considered joint training over multiple channels with the aim of finding a single pair of encoder and decoder that works well on a class of channels. In this paper, we propose to overcome the limitations of joint training via meta-learning. The proposed approach is based on a meta-training phase in which the online gradient-based meta-learning of the decoder is coupled with the joint training of the encoder via the transmission of pilots and the use of a feedback link. Accounting for channel variations during the meta-training phase, this work demonstrates the advantages of meta-learning in terms of the number of pilots as compared to conventional methods when the feedback link is only available for meta-training and not at run time. |
Tasks | Meta-Learning |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01479v1 |
https://arxiv.org/pdf/2003.01479v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-fast-training-of-communication |
Repo | https://github.com/kclip/meta-autoencoder-without-channel-model |
Framework | pytorch |
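The inner step of the scheme above, adapting the decoder from a handful of pilots, can be sketched as a MAML-style functional update in PyTorch. The function name, the cross-entropy loss, and the single inner step are assumptions for illustration; the paper's full procedure also meta-trains the encoder via the feedback link.

```python
import torch
import torch.nn as nn

def adapt_decoder(decoder, pilots_rx, pilots_msg, inner_lr=0.1, steps=1):
    """Adapt decoder weights on received pilots; returns adapted parameters."""
    params = {k: v.clone() for k, v in decoder.named_parameters()}
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = torch.func.functional_call(decoder, params, (pilots_rx,))
        loss = loss_fn(logits, pilots_msg)
        # create_graph=True keeps the graph so an outer meta-update can
        # backpropagate through this adaptation during meta-training.
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {k: v - inner_lr * g for (k, v), g in zip(params.items(), grads)}
    return params
```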
Adversarial Monte Carlo Meta-Learning of Optimal Prediction Procedures
Title | Adversarial Monte Carlo Meta-Learning of Optimal Prediction Procedures |
Authors | Alex Luedtke, Incheoul Chung, Oleg Sofrygin |
Abstract | We frame the meta-learning of prediction procedures as a search for an optimal strategy in a two-player game. In this game, Nature selects a prior over distributions that generate labeled data consisting of features and an associated outcome, and the Predictor observes data sampled from a distribution drawn from this prior. The Predictor’s objective is to learn a function that maps from a new feature to an estimate of the associated outcome. We establish that, under reasonable conditions, the Predictor has an optimal strategy that is equivariant to shifts and rescalings of the outcome and is invariant to permutations of the observations and to shifts, rescalings, and permutations of the features. We introduce a neural network architecture that satisfies these properties. The proposed strategy performs favorably compared to standard practice in both parametric and nonparametric experiments. |
Tasks | Meta-Learning |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11275v1 |
https://arxiv.org/pdf/2002.11275v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-monte-carlo-meta-learning-of |
Repo | https://github.com/alexluedtke12/amc-meta-learning-of-optimal-prediction-procedures |
Framework | pytorch |
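One of the invariances established above, invariance to permutations of the observations, has a standard architectural expression: pool per-observation features with a symmetric operation. The deep-sets-style sketch below is a generic instantiation with assumed layer sizes, not the paper's exact network.

```python
import torch
import torch.nn as nn

class PermutationInvariantPredictor(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))
        self.rho = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, 1))

    def forward(self, observations):             # (n_obs, d_in)
        pooled = self.phi(observations).mean(0)   # mean pooling erases ordering
        return self.rho(pooled)
```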
Multi-Step Model-Agnostic Meta-Learning: Convergence and Improved Algorithms
Title | Multi-Step Model-Agnostic Meta-Learning: Convergence and Improved Algorithms |
Authors | Kaiyi Ji, Junjie Yang, Yingbin Liang |
Abstract | As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness. However, the convergence of general multi-step MAML remains unexplored. In this paper, we develop a new theoretical framework under which we characterize the convergence rate and the computational complexity of multi-step MAML. Our results indicate that $N$-step MAML attains convergence with complexity that increases linearly with $N$ under a properly chosen inner stepsize. We then take a further step to develop a more efficient Hessian-free MAML. We first show that the existing zeroth-order Hessian estimator contains a constant-level estimation error, so that the MAML algorithm can be unstable. To address this issue, we propose a novel Hessian estimator via a gradient-based Gaussian smoothing method, and show that it achieves a much smaller estimation bias and variance; the resulting algorithm achieves the same performance guarantee as the original MAML under mild conditions. Our experiments validate our theory and demonstrate the effectiveness of the proposed Hessian estimator. |
Tasks | Meta-Learning |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07836v2 |
https://arxiv.org/pdf/2002.07836v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-step-model-agnostic-meta-learning |
Repo | https://github.com/JunjieYang97/GGS-MAML-RL |
Framework | pytorch |
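The object of the convergence analysis above, the $N$-step inner loop, looks as follows in a functional PyTorch sketch; each extra step adds one gradient evaluation, matching the linear-in-$N$ complexity. The stepsize and loss are illustrative, and the Gaussian-smoothed Hessian estimator is omitted.

```python
import torch

def n_step_inner_loop(model, loss_fn, x, y, N=5, alpha=0.01):
    """Run the N-step MAML inner loop and return the adapted parameters."""
    params = {k: v for k, v in model.named_parameters()}
    for _ in range(N):
        out = torch.func.functional_call(model, params, (x,))
        grads = torch.autograd.grad(loss_fn(out, y), list(params.values()),
                                    create_graph=True)  # graph kept for the outer step
        params = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}
    return params
```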
Meta-Transfer Learning for Zero-Shot Super-Resolution
Title | Meta-Transfer Learning for Zero-Shot Super-Resolution |
Authors | Jae Woong Soh, Sunwoo Cho, Nam Ik Cho |
Abstract | Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance on external datasets, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific data conditions under which they were trained; for instance, the low-resolution (LR) image should be a “bicubic” downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, it requires thousands of gradient updates, i.e., a long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results (see Figure 1 in the paper). With our method, the network can quickly adapt to a given image condition, so it can be applied to a large spectrum of image conditions with a fast adaptation process. |
Tasks | Image Super-Resolution, Meta-Learning, Super-Resolution, Transfer Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12213v1 |
https://arxiv.org/pdf/2002.12213v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-transfer-learning-for-zero-shot-super |
Repo | https://github.com/JWSoh/MZSR |
Framework | tf |
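The fast adaptation described above can be sketched as follows: starting from the meta-learned initialization, the network takes one (or a few) gradient steps on the test image itself, using a re-downscaled copy of the LR input as training data. The bicubic kernel, L1 loss, and SGD settings below are assumptions; MZSR also handles non-bicubic blur kernels.

```python
import torch
import torch.nn.functional as F

def zero_shot_adapt(model, lr_image, scale=2, inner_lr=0.02, steps=1):
    """One-step internal learning on the test image, then super-resolve it."""
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(steps):
        lr_child = F.interpolate(lr_image, scale_factor=1 / scale, mode='bicubic')
        loss = F.l1_loss(model(lr_child), lr_image)  # LR image is its own HR label
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(lr_image)  # adapted network applied to the actual input
```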
SynFi: Automatic Synthetic Fingerprint Generation
Title | SynFi: Automatic Synthetic Fingerprint Generation |
Authors | M. Sadegh Riazi, Seyed M. Chavoshian, Farinaz Koushanfar |
Abstract | Authentication and identification methods based on human fingerprints are ubiquitous in several systems ranging from government organizations to consumer products. The performance and reliability of such systems directly rely on the volume of data on which they have been verified. Unfortunately, large fingerprint databases are not publicly available due to privacy and security concerns. In this paper, we introduce a new approach to automatically generate high-fidelity synthetic fingerprints at scale. Our approach relies on (i) Generative Adversarial Networks to estimate the probability distribution of human fingerprints and (ii) Super-Resolution methods to synthesize fine-grained textures. We rigorously test our system and show that our methodology is the first to generate fingerprints that are computationally indistinguishable from real ones, a task that prior art could not accomplish. |
Tasks | Super-Resolution |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.08900v1 |
https://arxiv.org/pdf/2002.08900v1.pdf | |
PWC | https://paperswithcode.com/paper/synfi-automatic-synthetic-fingerprint |
Repo | https://github.com/MohammadChavosh/synthetic-fingerprint-generation |
Framework | pytorch |
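Once both stages are trained, the pipeline described above reduces to a few lines: sample from the GAN, then refine with super-resolution. Everything below (model handles, latent size) is a placeholder sketch of that composition.

```python
import torch

@torch.no_grad()
def synthesize_fingerprints(generator, sr_model, n, z_dim=128, device='cpu'):
    """Stage (i): GAN sampling; stage (ii): SR refinement of ridge texture."""
    z = torch.randn(n, z_dim, device=device)  # latent codes
    coarse = generator(z)                     # coarse synthetic fingerprints
    return sr_model(coarse)                   # fine-grained textures added
```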
EndoL2H: Deep Super-Resolution for Capsule Endoscopy
Title | EndoL2H: Deep Super-Resolution for Capsule Endoscopy |
Authors | Yasin Almalioglu, Abdulkadir Gokce, Kagan Incetan, Muhammed Ali Simsek, Kivanc Ararat, Richard J. Chen, Nicholas J. Durr, Faisal Mahmood, Mehmet Turan |
Abstract | Wireless capsule endoscopy is the preferred modality for diagnosis and assessment of small bowel disease. However, its poor resolution is a limitation for both subjective and automated diagnostics. Enhanced-resolution endoscopy has been shown to improve the adenoma detection rate for conventional endoscopy and is likely to do the same for capsule endoscopy. In this work, we propose and quantitatively validate a novel framework to learn a mapping from low-to-high-resolution endoscopic images. We use conditional adversarial networks and spatial attention to improve the resolution by up to a factor of 8x. Our quantitative study demonstrates the superiority of our proposed approach over Super-Resolution Generative Adversarial Network (SRGAN) and bicubic interpolation. For qualitative analysis, visual Turing tests were performed by 16 gastroenterologists to confirm the clinical utility of the proposed approach. Our approach is generally applicable to any endoscopic capsule system and has the potential to improve diagnosis and better harness computational approaches for polyp detection and characterization. Our code and trained models are available at https://github.com/akgokce/EndoL2H. |
Tasks | Super-Resolution |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05459v1 |
https://arxiv.org/pdf/2002.05459v1.pdf | |
PWC | https://paperswithcode.com/paper/endol2h-deep-super-resolution-for-capsule |
Repo | https://github.com/akgokce/EndoL2H |
Framework | pytorch |
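The spatial-attention ingredient mentioned above is commonly implemented as a lightweight gating map over spatial locations. The CBAM-style block below is one generic formulation with an assumed kernel size, not necessarily EndoL2H's exact layer.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                     # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)     # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)    # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                       # reweight each spatial location
```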
Cross-domain Detection via Graph-induced Prototype Alignment
Title | Cross-domain Detection via Graph-induced Prototype Alignment |
Authors | Minghao Xu, Hang Wang, Bingbing Ni, Qi Tian, Wenjun Zhang |
Abstract | Applying the knowledge of an object detector trained on a specific domain directly to a new domain is risky, as the gap between the two domains can severely degrade the model's performance. Furthermore, since different instances commonly embody distinct modal information in the object detection scenario, feature alignment between the source and target domains is hard to realize. To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment via elaborate prototype representations. In a nutshell, more precise instance-level features are obtained through graph-based information propagation among region proposals and, on that basis, the prototype representation of each class is derived for category-level domain alignment. In addition, to alleviate the negative effect of class imbalance on domain adaptation, we design a Class-reweighted Contrastive Loss to harmonize the adaptation training process. Combined with Faster R-CNN, the proposed framework conducts feature alignment in a two-stage manner. Comprehensive results on various cross-domain detection tasks demonstrate that our approach outperforms existing methods by a remarkable margin. Our code is available at https://github.com/ChrisAllenMing/GPA-detection. |
Tasks | Domain Adaptation, Object Detection |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12849v1 |
https://arxiv.org/pdf/2003.12849v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-detection-via-graph-induced |
Repo | https://github.com/ChrisAllenMing/GPA-detection |
Framework | pytorch |
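The category-level alignment at the heart of GPA can be sketched as pulling same-class prototypes of the two domains together. The MSE form below is a simplification for illustration; the paper's graph-based propagation over region proposals and its Class-reweighted Contrastive Loss are omitted.

```python
import torch
import torch.nn.functional as F

def prototype_alignment_loss(feats_s, labels_s, feats_t, labels_t, num_classes):
    """Align per-class prototypes across domains (labels_t would be
    pseudo-labels in practice, since the target domain is unlabeled)."""
    loss = feats_s.new_zeros(())
    for c in range(num_classes):
        if (labels_s == c).any() and (labels_t == c).any():
            proto_s = feats_s[labels_s == c].mean(dim=0)  # source prototype
            proto_t = feats_t[labels_t == c].mean(dim=0)  # target prototype
            loss = loss + F.mse_loss(proto_s, proto_t)
    return loss
```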
Training-Set Distillation for Real-Time UAV Object Tracking
Title | Training-Set Distillation for Real-Time UAV Object Tracking |
Authors | Fan Li, Changhong Fu, Fuling Lin, Yiming Li, Peng Lu |
Abstract | The correlation filter (CF) has recently exhibited promising performance in visual object tracking for unmanned aerial vehicles (UAVs). Such an online learning method heavily depends on the quality of the training set, yet complicated aerial scenarios such as occlusion or out-of-view targets can reduce its reliability. In this work, a novel time-slot-based distillation approach is proposed to efficiently and effectively optimize the training set's quality on the fly. A cooperative energy minimization function is established to score the historical samples adaptively. To accelerate the scoring process, frames with highly confident tracking results are employed as keyframes to divide the tracking process into multiple time slots. After the establishment of a new slot, the weighted fusion of the previous samples generates one key-sample, in order to reduce the number of samples to be scored. Besides, when the current time slot exceeds the maximum number of frames that can be scored, the sample with the lowest score is discarded. Consequently, the training set can be efficiently and reliably distilled. Comprehensive tests on two well-known UAV benchmarks prove the effectiveness of our method with real-time speed on a single CPU. |
Tasks | Object Tracking, Visual Object Tracking |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05326v1 |
https://arxiv.org/pdf/2003.05326v1.pdf | |
PWC | https://paperswithcode.com/paper/training-set-distillation-for-real-time-uav |
Repo | https://github.com/vision4robotics/TSD-Tracker |
Framework | none |
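Two mechanical pieces of the approach above, weighted fusion of a slot's samples into one key-sample and discarding the lowest-scored samples, can be sketched as follows. The scoring itself (the cooperative energy minimization) is abstracted into the given `scores`, and the array representation of samples is an assumption.

```python
import numpy as np

def fuse_key_sample(samples, scores):
    """Weighted fusion of one time slot's samples into a single key-sample."""
    w = np.asarray(scores, dtype=float)
    w /= w.sum()
    return np.tensordot(w, np.stack(samples), axes=1)

def prune_training_set(samples, scores, max_size):
    """Keep only the highest-scored samples once the budget is exceeded."""
    keep = np.argsort(scores)[::-1][:max_size]
    return [samples[i] for i in keep], [scores[i] for i in keep]
```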
GEDDnet: A Network for Gaze Estimation with Dilation and Decomposition
Title | GEDDnet: A Network for Gaze Estimation with Dilation and Decomposition |
Authors | Zhaokang Chen, Bertram E. Shi |
Abstract | Appearance-based gaze estimation from RGB images provides relatively unconstrained gaze tracking from commonly available hardware. The accuracy of subject-independent models is limited partly by small intra-subject and large inter-subject variations in appearance, and partly by a latent subject-dependent bias. To improve estimation accuracy, we propose to use dilated convolutions in a deep convolutional neural network to capture subtle changes in the eye images, and a novel gaze decomposition method that decomposes the gaze angle into the sum of a subject-independent gaze estimate from the image and a subject-dependent bias. To further reduce estimation error, we propose a calibration method that estimates the bias from a few images taken as the subject gazes at only a few or even just a single gaze target. This significantly reduces calibration time and complexity. Experiments on four datasets, including a new dataset we collected containing large variations in head pose and face location, indicate that even without calibration the estimator already outperforms state-of-the-art methods by more than 6.3%. The proposed calibration method is robust to the location of the calibration target and reduces estimation error significantly (by up to 35.6%), achieving state-of-the-art performance with much less calibration data than required by previously proposed methods. |
Tasks | Calibration, Gaze Estimation |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09284v1 |
https://arxiv.org/pdf/2001.09284v1.pdf | |
PWC | https://paperswithcode.com/paper/geddnet-a-network-for-gaze-estimation-with |
Repo | https://github.com/czk32611/GEDDnet |
Framework | tf |
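The decomposition above makes calibration a simple averaging problem: the subject-dependent bias is the mean gap between true and predicted gaze over a few (or even one) calibration targets. The sketch below assumes a `predict_gaze` function returning yaw/pitch angles; both the name and the angle format are illustrative.

```python
import numpy as np

def calibrate_bias(predict_gaze, calib_images, target_angles):
    """Estimate the subject-dependent bias from a few calibration samples."""
    residuals = [np.asarray(t) - predict_gaze(img)
                 for img, t in zip(calib_images, target_angles)]
    return np.mean(residuals, axis=0)

# At run time: corrected_gaze = predict_gaze(image) + bias
```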
Universal Differential Equations for Scientific Machine Learning
Title | Universal Differential Equations for Scientific Machine Learning |
Authors | Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan |
Abstract | In the context of science, the well-known adage “a picture is worth a thousand words” might well be “a model is worth a thousand datasets.” Scientific models, such as Newtonian physics or biological gene regulatory networks, are human-driven simplifications of complex phenomena that serve as surrogates for the countless experiments that validated the models. Recently, machine learning has been able to overcome the inaccuracies of approximate modeling by directly learning the entire set of nonlinear interactions from data. However, without any predetermined structure from the scientific basis behind the problem, machine learning approaches are flexible but data-expensive, requiring large databases of homogeneous labeled training data. A central challenge is reconciling data that is at odds with simplified models without requiring “big data”. In this work we develop a new methodology, universal differential equations (UDEs), which augments scientific models with machine-learnable structures for scientifically-based learning. We show how UDEs can be utilized to discover previously unknown governing equations, accurately extrapolate beyond the original data, and accelerate model simulation, all in a time and data-efficient manner. This advance is coupled with open-source software that allows for training UDEs which incorporate physical constraints, delayed interactions, implicitly-defined events, and intrinsic stochasticity in the model. Our examples show how a diverse set of computationally-difficult modeling issues across scientific disciplines, from automatically discovering biological mechanisms to accelerating climate simulations by 15,000x, can be handled by training UDEs. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04385v1 |
https://arxiv.org/pdf/2001.04385v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-differential-equations-for |
Repo | https://github.com/ChrisRackauckas/universal_differential_equations |
Framework | none |
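Structurally, a universal differential equation keeps a model's known mechanistic terms and inserts a trainable network for the unknown ones. The paper's software stack is Julia, with adaptive solvers and adjoint sensitivities; the forward-Euler PyTorch sketch below (Lotka-Volterra-style known terms with assumed coefficients) illustrates only the structure.

```python
import torch
import torch.nn as nn

class LotkaVolterraUDE(nn.Module):
    """du/dt = known mechanistic terms + learned interaction term."""
    def __init__(self, alpha=1.3, delta=1.8):
        super().__init__()
        self.alpha, self.delta = alpha, delta
        self.unknown = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 2))

    def forward(self, u0, dt=0.01, steps=1000):
        u, traj = u0, [u0]
        for _ in range(steps):
            known = torch.stack([self.alpha * u[0], -self.delta * u[1]])
            u = u + dt * (known + self.unknown(u))  # known physics + learned residual
            traj.append(u)
        return torch.stack(traj)
```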
Unsupervised Sentiment Analysis for Code-mixed Data
Title | Unsupervised Sentiment Analysis for Code-mixed Data |
Authors | Siddharth Yadav, Tanmoy Chakraborty |
Abstract | Code-mixing is the practice of alternating between two or more languages. Mostly observed in multilingual societies, its occurrence is increasing, and so is its importance. A major part of sentiment analysis research has been monolingual, and most such models perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis of code-mixed text. Our methods can handle code-mixed text through zero-shot learning. They beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score, achieving a 0.58 F1-score (without a parallel corpus) and a 0.62 F1-score (with a parallel corpus) on the same benchmark in a zero-shot way, compared to a 0.68 F1-score in supervised settings. Our code is publicly available. |
Tasks | Sentiment Analysis, Zero-Shot Learning |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.11384v1 |
https://arxiv.org/pdf/2001.11384v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-sentiment-analysis-for-code |
Repo | https://github.com/sedflix/unsacmt |
Framework | none |
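The zero-shot transfer above can be sketched in a few lines: train a sentiment classifier on monolingual sentence vectors from a cross-lingual embedding space, then score code-mixed text embedded in the same space, using no code-mixed labels at all. The `embed` function (e.g. aligned fastText or LASER vectors) is assumed, not provided by the repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_zero_shot(embed, mono_texts, mono_labels):
    """Fit a classifier on monolingual data in a shared embedding space."""
    X = np.stack([embed(t) for t in mono_texts])
    return LogisticRegression(max_iter=1000).fit(X, mono_labels)

def predict_code_mixed(model, embed, cm_texts):
    """Zero-shot: knowledge transfers only through the shared space."""
    return model.predict(np.stack([embed(t) for t in cm_texts]))
```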