February 2, 2020

# Paper Group AWR 12

Deep Generative Learning via Variational Gradient Flow. Robust statistics and no-reference image quality assessment in Curvelet domain. Deep Learning for Visual Tracking: A Comprehensive Survey. SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines. Multi-Domain Adversarial Learning. Learning the Model Update for S …

#### Deep Generative Learning via Variational Gradient Flow

Title Deep Generative Learning via Variational Gradient Flow
Authors Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, Shunkang Zhang
Abstract We propose a general framework to learn deep generative models via **V**ariational **Gr**adient Fl**ow** (VGrow) on probability spaces. The evolving distribution that asymptotically converges to the target distribution is governed by a vector field, which is the negative gradient of the first variation of the $f$-divergence between them. We prove that the evolving distribution coincides with the pushforward distribution through the infinitesimal time composition of residual maps that are perturbations of the identity map along the vector field. The vector field depends on the density ratio of the pushforward distribution and the target distribution, which can be consistently learned from a binary classification problem. Connections between the proposed VGrow method and other popular methods, such as VAE, GAN and flow-based methods, are established in this framework, yielding new insights into deep generative learning. We also evaluate several commonly used divergences, including the Kullback-Leibler, Jensen-Shannon and Jeffrey divergences, as well as our newly discovered 'logD' divergence, which serves as the objective function of the logD-trick GAN. Experimental results on benchmark datasets demonstrate that VGrow can generate high-fidelity images in a stable and efficient manner, achieving performance competitive with state-of-the-art GANs.
Published 2019-01-24
URL https://arxiv.org/abs/1901.08469v3
PDF https://arxiv.org/pdf/1901.08469v3.pdf
PWC https://paperswithcode.com/paper/deep-generative-learning-via-variational
Repo https://github.com/xjtuygao/VGrow
Framework pytorch
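
The abstract states that the vector field depends on a density ratio that can be learned by binary classification. Below is a minimal NumPy sketch of that idea, not the authors' code: a logistic-regression classifier separates target samples (label 1) from pushforward samples (label 0), and with equal sample sizes its logit estimates log p(x)/q(x). The 1D Gaussians, the linear logit, and the single particle step along ∇ log(p/q) (the steepest-descent direction for a reverse-KL objective) are illustrative assumptions.

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, steps=3000):
    """Fit P(y=1 | x) = sigmoid(w0 + w1 * x) by plain gradient descent."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
    return w


def log_density_ratio(target, pushforward):
    """Estimate log p(x)/q(x) from samples; assumes equal sample sizes."""
    x = np.concatenate([target, pushforward])
    y = np.concatenate([np.ones(len(target)), np.zeros(len(pushforward))])
    return fit_logistic(x, y)  # logit coefficients (w0, w1)


rng = np.random.default_rng(0)
target = rng.normal(1.0, 1.0, 20_000)      # samples from p = N(1, 1)
particles = rng.normal(0.0, 1.0, 20_000)   # samples from q = N(0, 1)

w0, w1 = log_density_ratio(target, particles)
# Analytically log p/q = x - 0.5 here, so w0 should be near -0.5 and w1 near 1.

# One residual step x <- x + s * grad log(p/q)(x). With a linear logit the
# gradient is the constant w1, so every particle moves by s * w1.
step = 0.5
particles_next = particles + step * w1
```

Repeating the classify-then-move loop with a richer classifier is the general pattern; the linear logit only works here because the two Gaussians share a variance.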

#### Robust statistics and no-reference image quality assessment in Curvelet domain

Title Robust statistics and no-reference image quality assessment in Curvelet domain
Authors Ramon Giostri Campos, Evandro Ottoni Teatini Salles
Abstract This paper uses robust statistics and the curvelet transform to learn a general-purpose no-reference (NR) image quality assessment (IQA) model. The new approach, here called M1, competes with the Curvelet Quality Assessment proposed in 2014 (Curvelet2014). The central idea is to use descriptors based on robust statistics to extract features and predict the human opinion about degraded images. To show the consistency of the method, the model is tested on three datasets: LIVE IQA, TID2013 and CSIQ. The Wilcoxon test is used to verify the statistical significance of the results and to enable an accurate comparison between the new model M1 and Curvelet2014. The results show a gain when robust statistics are used as descriptors.
Tasks Image Quality Assessment, No-Reference Image Quality Assessment
Published 2019-02-11
URL http://arxiv.org/abs/1902.03842v1
PDF http://arxiv.org/pdf/1902.03842v1.pdf
PWC https://paperswithcode.com/paper/robust-statistics-and-no-reference-image
Repo https://github.com/rgiostri/robustcurvelet
Framework none

#### Deep Learning for Visual Tracking: A Comprehensive Survey

Title Deep Learning for Visual Tracking: A Comprehensive Survey
Authors Seyed Mojtaba Marvasti-Zadeh, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei
Abstract Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which numerous methods have been developed and have demonstrated significant progress in recent years, predominantly recent deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from six key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, and the exploitation of correlation filter advantages. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, and LaSOT. Finally, by conducting critical analyses of these state-of-the-art methods both quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. It may serve as a practical guide for practitioners weighing which method(s) to choose and under what conditions. It also facilitates a discussion on ongoing issues and sheds light on promising research directions.
Published 2019-12-02
URL https://arxiv.org/abs/1912.00535v1
PDF https://arxiv.org/pdf/1912.00535v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-visual-tracking-a
Repo https://github.com/MMarvasti/Deep-Learning-for-Visual-Tracking-Survey
Framework none

#### SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

Title SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines
Authors Yinda Xu, Zeyu Wang, Zuoxin Li, Yuan Ye, Gang Yu
Abstract The visual tracking problem requires efficiently performing robust classification and accurate target state estimation for a given target at the same time. Previous methods have proposed various ways of target state estimation, yet few of them took the particularity of the visual tracking problem itself into consideration. After a careful analysis, we propose a set of practical guidelines of target state estimation for high-performance generic object tracker design. Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branches (G1), a classification score without ambiguity (G2), tracking without prior knowledge (G3), and an estimation quality score (G4). Extensive analysis and ablation studies demonstrate the effectiveness of our proposed guidelines. Without bells and whistles, our SiamFC++ tracker achieves state-of-the-art performance on five challenging benchmarks (OTB2015, VOT2018, LaSOT, GOT-10k, TrackingNet), which proves both the tracking and generalization ability of the tracker. In particular, on the large-scale TrackingNet dataset, SiamFC++ achieves a previously unseen AUC score of 75.4 while running at over 90 FPS, which is far above the real-time requirement. Code and models are available at https://github.com/MegviiDetection/video_analyst.
Published 2019-11-14
URL https://arxiv.org/abs/1911.06188v3
PDF https://arxiv.org/pdf/1911.06188v3.pdf
PWC https://paperswithcode.com/paper/siamfc-towards-robust-and-accurate-visual
Repo https://github.com/MegviiDetection/video_analyst
Framework none
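
Guideline G4 pairs the classification score with an estimation quality score. The abstract does not give the formula, so the sketch below assumes an FCOS-style centerness computed from the distances (l, t, r, b) between a location and the four edges of its predicted box, multiplied into the classification score to rank candidates. The function names are hypothetical.

```python
import numpy as np


def centerness(ltrb):
    """FCOS-style quality: 1.0 at the box centre, falling toward 0 at the edges.

    ltrb: array of shape (N, 4) with distances to the left, top, right and
    bottom edges of the predicted box (all assumed positive).
    """
    l, t, r, b = ltrb.T
    return np.sqrt((np.minimum(l, r) / np.maximum(l, r)) *
                   (np.minimum(t, b) / np.maximum(t, b)))


def select_box(cls_scores, ltrb):
    """Rank candidates by classification score times quality score."""
    final = cls_scores * centerness(ltrb)
    return int(np.argmax(final)), final


cls_scores = np.array([0.90, 0.85])
ltrb = np.array([[2.0, 2.0, 18.0, 18.0],     # location near a corner of its box
                 [10.0, 10.0, 10.0, 10.0]])  # location exactly at the centre
best, final = select_box(cls_scores, ltrb)
```

Here the off-centre candidate has the higher raw classification score (0.90) but a quality of 1/9, so the centred candidate (0.85 × 1.0) is selected.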

#### Multi-Domain Adversarial Learning

Title Multi-Domain Adversarial Learning
Authors Alice Schoenauer-Sebag, Louise Heinrich, Marc Schoenauer, Michele Sebag, Lani F. Wu, Steve J. Altschuler
Abstract Multi-domain learning (MDL) aims at obtaining a model with minimal average risk across multiple domains. Our empirical motivation is automated microscopy data, where cultured cells are imaged after being exposed to known and unknown chemical perturbations, and each dataset displays significant experimental bias. This paper presents a multi-domain adversarial learning approach, MuLANN, to leverage multiple datasets with overlapping but distinct class sets, in a semi-supervised setting. Our contributions include: i) a bound on the average- and worst-domain risk in MDL, obtained using the H-divergence; ii) a new loss to accommodate semi-supervised multi-domain learning and domain adaptation; iii) the experimental validation of the approach, improving on the state of the art on two standard image benchmarks, and a novel bioimage dataset, Cell.
Published 2019-03-21
URL http://arxiv.org/abs/1903.09239v1
PDF http://arxiv.org/pdf/1903.09239v1.pdf
Repo https://github.com/AltschulerWu-Lab/MuLANN
Framework pytorch

#### Learning the Model Update for Siamese Trackers

Title Learning the Model Update for Siamese Trackers
Authors Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan
Abstract Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which given the initial template, the accumulated template and the template of the current frame aims to estimate the optimal template for the next frame. The UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score.
Published 2019-08-02
URL https://arxiv.org/abs/1908.00855v2
PDF https://arxiv.org/pdf/1908.00855v2.pdf
PWC https://paperswithcode.com/paper/learning-the-model-update-for-siamese
Repo https://github.com/zhanglichao/updatenet
Framework pytorch
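
The abstract notes that the standard linear update causes an exponential decay of information over time. A short sketch of that handcrafted baseline (not UpdateNet itself), assuming the common form T̃_t = (1 − γ) T̃_{t−1} + γ T_t with a fixed update rate γ; encoding each frame's template as a one-hot vector exposes the weight each frame ends up with.

```python
import numpy as np


def linear_update(accumulated, current, gamma):
    """Handcrafted running-average template update (the baseline UpdateNet replaces)."""
    return (1.0 - gamma) * accumulated + gamma * current


def frame_weights(num_frames, gamma):
    """Contribution of each frame's template to the final accumulated template.

    Frame k is encoded as the one-hot vector e_k, so the accumulated template
    directly lists the weights. Frame 0 is the initial template.
    """
    eye = np.eye(num_frames + 1)
    acc = eye[0]
    for k in range(1, num_frames + 1):
        acc = linear_update(acc, eye[k], gamma)
    return acc


w = frame_weights(num_frames=5, gamma=0.2)
# w[0] = 0.8**5 and w[k] = 0.2 * 0.8**(5 - k): older frames decay exponentially.
```

Even the initial template, which is the only annotated one, keeps just 0.8^5 ≈ 0.33 of the weight after five updates; this is the fixed decay pattern that a learned update can adapt instead.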

#### Black-box Adversarial Attacks with Bayesian Optimization

Title Black-box Adversarial Attacks with Bayesian Optimization
Authors Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter
Abstract We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop query-efficient adversarial attacks. We alleviate the issues surrounding BO with regard to optimizing high-dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to state-of-the-art black-box adversarial attacks albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% with respect to state-of-the-art methods.
Published 2019-09-30
URL https://arxiv.org/abs/1909.13857v1
PDF https://arxiv.org/pdf/1909.13857v1.pdf
Repo https://github.com/snu-mllab/parsimonious-blackbox-attack
Framework tf
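
The abstract mentions dimension upsampling as the way to make BO tractable on high-dimensional inputs. A sketch of one plausible version, assuming the optimizer searches over a small d × d perturbation per channel that is upsampled by nearest-neighbour repetition to the image size, squashed with `tanh` into an L∞ ball of radius ε, and clipped to the valid pixel range. The BO loop itself is omitted, and the function names and the `tanh` bounding are assumptions, not the paper's.

```python
import numpy as np


def upsample_perturbation(low_res, height, width):
    """Nearest-neighbour upsampling of a (C, d, d) perturbation to (C, H, W)."""
    c, d, _ = low_res.shape
    assert height % d == 0 and width % d == 0, "image size must be a multiple of d"
    return np.repeat(np.repeat(low_res, height // d, axis=1), width // d, axis=2)


def perturb(image, low_res, eps):
    """Build the candidate image that one black-box query would evaluate."""
    delta = eps * np.tanh(upsample_perturbation(low_res, *image.shape[1:]))  # |delta| <= eps
    return np.clip(image + delta, 0.0, 1.0)  # stay a valid image in [0, 1]


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(3, 32, 32))
z = rng.normal(size=(3, 4, 4))  # 48 search dimensions instead of 3072
candidate = perturb(image, z, eps=0.05)
```

A BO loop would propose `z`, query the model's loss on `perturb(image, z, eps)`, and update its surrogate; keeping `z` small is what keeps the surrogate model tractable.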

#### Decoding the Rejuvenating Effects of Mechanical Loading on Skeletal Maturation using in Vivo Imaging and Deep Learning

Title Decoding the Rejuvenating Effects of Mechanical Loading on Skeletal Maturation using in Vivo Imaging and Deep Learning
Authors Pouyan Asgharzadeh, Oliver Röhrle, Bettina M. Willie, Annette I. Birkhold
Abstract Throughout the process of aging, deterioration of bone macro- and micro-architecture, as well as material decomposition, results in a loss of strength and therefore in an increased likelihood of fractures. To date, the precise contributions of age-related changes in bone (re)modeling and (de)mineralization dynamics and their effect on the loss of functional integrity are not completely understood. Here, we present an image-based deep learning approach to quantitatively describe the dynamic effects of short-term aging and adaptive response to treatment in proximal mouse tibia and fibula. Our approach allowed us to perform an end-to-end age prediction based on μCT images to determine the dynamic biological process of tissue maturation during a two-week period, therefore permitting a short-term bone aging prediction with 95% accuracy. In a second application, our radiomics analysis reveals that two weeks of in vivo mechanical loading are associated with an underlying rejuvenating effect of 5 days. Additionally, by quantitatively analyzing the learning process, we could, for the first time, identify the localization of the age-relevant encoded information and demonstrate 89% load-induced similarity of these locations in the loaded tibia with younger bones. These data suggest that our method enables identifying a general prognostic phenotype of a certain bone age as well as a temporal and localized loading-treatment effect on this apparent bone age. Future translational applications of this method may provide an improved decision-support method for osteoporosis treatment at low cost.
Published 2019-05-20
URL https://arxiv.org/abs/1905.08099v1
PDF https://arxiv.org/pdf/1905.08099v1.pdf
PWC https://paperswithcode.com/paper/decoding-the-rejuvenating-effects-of
Framework tf

#### Value Iteration Networks on Multiple Levels of Abstraction

Title Value Iteration Networks on Multiple Levels of Abstraction
Authors Daniel Schleich, Tobias Klamt, Sven Behnke
Abstract Learning-based methods are promising for planning robot motion without the extensive search that many non-learning approaches require. Recently, Value Iteration Networks (VINs) have received much interest since, in contrast to standard CNN-based architectures, they learn goal-directed behaviors that generalize well to unseen domains. However, VINs are restricted to small and low-dimensional domains, limiting their applicability to real-world planning problems. To address this issue, we propose to extend VINs to representations with multiple levels of abstraction. While the vicinity of the robot is represented in sufficient detail, the representation gets spatially coarser with increasing distance from the robot. The information loss caused by the decreasing resolution is compensated by increasing the number of features representing a cell. We show that our approach is capable of solving significantly larger 2D grid world planning tasks than the original VIN implementation. In contrast to a multiresolution coarse-to-fine VIN implementation which does not employ additional descriptive features, our approach is capable of solving challenging environments, which demonstrates that the proposed method learns to encode useful information in the additional features. As an application for solving real-world planning tasks, we successfully employ our method to plan omnidirectional driving for a search-and-rescue robot in cluttered terrain.
Published 2019-05-27
URL https://arxiv.org/abs/1905.11068v2
PDF https://arxiv.org/pdf/1905.11068v2.pdf
PWC https://paperswithcode.com/paper/value-iteration-networks-on-multiple-levels
Repo https://github.com/AIS-Bonn/abstract_vin
Framework pytorch
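
VINs embed a differentiable approximation of value iteration inside the network. For reference, here is plain (non-learned) value iteration on a 2D grid with unit step cost and 4-connected moves, the computation a VIN's recurrent convolution imitates. The grid, costs, and obstacle encoding are illustrative assumptions, and the paper's multi-level abstraction is not shown.

```python
import numpy as np


def value_iteration(free, goal, iters=100):
    """Value of each cell = minus the number of steps needed to reach `goal`.

    free: boolean grid, True where the robot may move.
    Obstacles and unreachable cells keep the value -inf.
    """
    v = np.full(free.shape, -np.inf)
    v[goal] = 0.0
    for _ in range(iters):
        padded = np.pad(v, 1, constant_values=-np.inf)
        # best neighbour value among up, down, left, right
        best = np.maximum.reduce([padded[:-2, 1:-1], padded[2:, 1:-1],
                                  padded[1:-1, :-2], padded[1:-1, 2:]])
        new_v = np.where(free, best - 1.0, -np.inf)  # each move costs 1
        new_v[goal] = 0.0
        if np.array_equal(new_v, v):  # converged
            break
        v = new_v
    return v


free = np.ones((5, 5), dtype=bool)
free[2, :4] = False  # wall with a gap in the last column
v = value_iteration(free, goal=(4, 0))
```

Each iteration propagates values by one cell, so the number of iterations (the recurrence depth in a VIN) must grow with the planning horizon. This is one reason plain VINs struggle on large maps and why coarser far-away representations help.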

#### Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling

Title Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling
Authors Qian Wang, Toby P. Breckon
Abstract Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained on labeled samples in the source domain suffer a significant performance drop when directly applied to samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue, which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space, so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate that our approach outperforms contemporary state-of-the-art methods.
Published 2019-11-18
URL https://arxiv.org/abs/1911.07982v1
PDF https://arxiv.org/pdf/1911.07982v1.pdf
Framework none
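
The abstract's premise is that target samples cluster well in deep feature space, so clustering can guide pseudo-labeling, and only reliable pseudo-labels should be kept. A simplified sketch, assuming k-means on target features initialised from source class means (so cluster c keeps the meaning "class c") and keeping, per class, the fraction of members closest to their centre. This is not the paper's structured-prediction formulation; the names, the toy data, and the `keep` ratio are hypothetical.

```python
import numpy as np


def _assign(x, centres):
    """Distances to every centre and the index of the nearest one."""
    dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
    return dist, dist.argmin(axis=1)


def select_pseudo_labels(src_x, src_y, tgt_x, num_classes, keep=0.5, iters=10):
    """Cluster target features; return (indices, pseudo-labels) of confident samples."""
    # One centre per class from labeled source features.
    centres = np.stack([src_x[src_y == c].mean(axis=0) for c in range(num_classes)])
    for _ in range(iters):  # k-means refinement on the unlabeled target domain
        _, labels = _assign(tgt_x, centres)
        for c in range(num_classes):
            if np.any(labels == c):
                centres[c] = tgt_x[labels == c].mean(axis=0)
    dist, labels = _assign(tgt_x, centres)
    own = dist[np.arange(len(tgt_x)), labels]  # distance to the assigned centre
    chosen = []
    for c in range(num_classes):  # select per class so no class is left out
        members = np.flatnonzero(labels == c)
        n_keep = int(np.ceil(keep * len(members)))
        chosen.extend(members[np.argsort(own[members])[:n_keep]].tolist())
    chosen = np.array(chosen, dtype=int)
    return chosen, labels[chosen]


rng = np.random.default_rng(0)
src_y = np.repeat([0, 1], 100)
tgt_y = np.repeat([0, 1], 100)  # ground truth, only used to check the result
means = np.array([[0.0, 0.0], [4.0, 4.0]])
src_x = means[src_y] + rng.normal(scale=0.5, size=(200, 2))
tgt_x = means[tgt_y] + 1.0 + rng.normal(scale=0.5, size=(200, 2))  # domain shift
idx, pseudo = select_pseudo_labels(src_x, src_y, tgt_x, num_classes=2)
```

In a full pipeline, the selected pairs would train the classifier or projection for the next round, with `keep` typically increased round by round.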

#### Human Keypoint Detection by Progressive Context Refinement

Title Human Keypoint Detection by Progressive Context Refinement
Authors Jing Zhang, Zhe Chen, Dacheng Tao
Abstract Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance of person instances. In this paper, we find that context information plays an important role in addressing these issues, and propose a novel method named progressive context refinement (PCR) for human keypoint detection. First, we devise a simple but effective context-aware module (CAM) that can efficiently integrate spatial and channel context information to aid feature learning for locating hard keypoints. Then, we construct the PCR model by stacking several CAMs sequentially with shortcuts and employ multi-task learning to progressively refine the context information and predictions. In addition, to maximize PCR's potential for the aforementioned hard case inference, we propose a hard-negative person detection mining strategy together with a joint-training strategy by exploiting the unlabeled COCO dataset and an external dataset. Extensive experiments on the COCO keypoint detection benchmark demonstrate the superiority of PCR over representative state-of-the-art (SOTA) methods. Our single model achieves performance comparable to the winner of the 2018 COCO Keypoint Detection Challenge. The final ensemble model sets a new SOTA on this benchmark.
Published 2019-10-27
URL https://arxiv.org/abs/1910.12223v1
PDF https://arxiv.org/pdf/1910.12223v1.pdf
PWC https://paperswithcode.com/paper/human-keypoint-detection-by-progressive
Repo https://github.com/zccyman/pose_estimation/tree/master/lib/models/pose_pcr.py
Framework pytorch

#### Neural Networks for Full Phase-space Reweighting and Parameter Tuning

Title Neural Networks for Full Phase-space Reweighting and Parameter Tuning
Authors Anders Andreassen, Benjamin Nachman
Abstract Precise scientific analysis in collider-based particle physics is possible because of complex simulations that connect fundamental theories to observable quantities. The significant computational cost of these programs limits the scope, precision, and accuracy of Standard Model measurements and searches for new phenomena. We therefore introduce Deep neural networks using Classification for Tuning and Reweighting (DCTR), a neural network-based approach to reweight and fit simulations using all kinematic and flavor information – the full phase space. DCTR can perform tasks that are currently not possible with existing methods, such as estimating non-perturbative fragmentation uncertainties. The core idea behind the new approach is to exploit powerful high-dimensional classifiers to reweight phase space as well as to identify the best parameters for describing data. Numerical examples from $e^+e^-\rightarrow\text{jets}$ demonstrate the fidelity of these methods for simulation parameters that have a big and broad impact on phase space as well as those that have a minimal and/or localized impact. The high fidelity of the full phase-space reweighting enables a new paradigm for simulations, parameter tuning, and model systematic uncertainties across particle physics and possibly beyond.
Published 2019-07-18
URL https://arxiv.org/abs/1907.08209v3
PDF https://arxiv.org/pdf/1907.08209v3.pdf
PWC https://paperswithcode.com/paper/neural-networks-for-full-phase-space
Repo https://github.com/bnachman/DCTR
Framework none

#### A Review of Reinforcement Learning for Autonomous Building Energy Management

Title A Review of Reinforcement Learning for Autonomous Building Energy Management
Authors Karl Mason, Santiago Grijalva
Abstract The area of building energy management has received a significant amount of interest in recent years. This area is concerned with combining advancements in sensor technologies, communications and advanced control algorithms to optimize energy utilization. Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management. This research gives a comprehensive review of the literature relating to the application of reinforcement learning to developing autonomous building energy management systems. The main directions and challenges for future research in reinforcement learning are also outlined.
Published 2019-03-12
URL http://arxiv.org/abs/1903.05196v2
PDF http://arxiv.org/pdf/1903.05196v2.pdf
PWC https://paperswithcode.com/paper/a-review-of-reinforcement-learning-for
Repo https://github.com/tucane/MM-Project
Framework none

#### Reg R-CNN: Lesion Detection and Grading under Noisy Labels

Title Reg R-CNN: Lesion Detection and Grading under Noisy Labels
Authors Gregor N. Ramien, Paul F. Jaeger, Simon A. A. Kohl, Klaus H. Maier-Hein
Abstract For the task of concurrently detecting and categorizing objects, the medical imaging community commonly adopts methods developed on natural images. Current state-of-the-art object detectors consist of two stages: the first stage generates region proposals, and the second stage subsequently categorizes them. Unlike in natural images, however, for anatomical structures of interest such as tumors, the appearance in the image (e.g., scale or intensity) links to a malignancy grade that lies on a continuous ordinal scale. While classification models discard this ordinal relation between grades by discretizing the continuous scale to an unordered bag of categories, regression models are trained with distance metrics, which preserve the relation. This advantage becomes all the more important in the setting of label confusions on ambiguous data sets, which is the usual case with medical images. To this end, we propose Reg R-CNN, which replaces the second-stage classification model of a current object detector with a regression model. We show the superiority of our approach on a public data set with 1026 patients and a series of toy experiments. Code will be available at github.com/MIC-DKFZ/RegRCNN.
Published 2019-07-22
URL https://arxiv.org/abs/1907.12915v3
PDF https://arxiv.org/pdf/1907.12915v3.pdf
Repo https://github.com/MIC-DKFZ/RegRCNN
Framework pytorch
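
The abstract argues that classification discards the ordinal relation between grades, while regression losses preserve it. A tiny illustration with made-up grades and predictions (nothing here comes from the paper): a one-hot cross-entropy assigns the same loss to any wrong grade held with the same confidence, whereas a smooth-L1 regression loss grows with the distance to the true grade.

```python
import numpy as np


def cross_entropy(probs, true_grade):
    """Classification loss: only the probability of the true grade matters."""
    return -np.log(probs[true_grade])


def smooth_l1(pred, target, beta=1.0):
    """Regression loss on the continuous grade scale (Huber / smooth L1)."""
    diff = abs(pred - target)
    return 0.5 * diff ** 2 / beta if diff < beta else diff - 0.5 * beta


true_grade = 1
# Two classifiers that are equally unsure about the true grade but put the
# remaining mass on a neighbouring grade (2) or on a distant one (4).
near = np.array([0.0, 0.3, 0.7, 0.0, 0.0])
far = np.array([0.0, 0.3, 0.0, 0.0, 0.7])
ce_near, ce_far = cross_entropy(near, true_grade), cross_entropy(far, true_grade)
reg_near, reg_far = smooth_l1(2.0, true_grade), smooth_l1(4.0, true_grade)
```

Both classifiers receive the loss −log 0.3, while the regression loss is 0.5 for the neighbouring grade and 2.5 for the distant one, which is the distance-aware behaviour the abstract attributes to regression.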

#### Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights

Title Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
Authors Christopher Schymura, Dorothea Kolossa
Abstract Data fusion plays an important role in many technical applications that require efficient processing of multimodal sensory observations. A prominent example is audiovisual signal processing, which has gained increasing attention in automatic speech recognition, speaker localization and related tasks. If appropriately combined with acoustic information, additional visual cues can help to improve the performance in these applications, especially under adverse acoustic conditions. A dynamic weighting of acoustic and visual streams based on instantaneous sensor reliability measures is an efficient approach to data fusion in this context. This paper presents a framework that extends the well-established theory of nonlinear dynamical systems with the notion of dynamic stream weights for an arbitrary number of sensory observations. It comprises a recursive state estimator based on the Gaussian filtering paradigm, which incorporates dynamic stream weights into a framework closely related to the extended Kalman filter. Additionally, a convex optimization approach to estimate oracle dynamic stream weights in fully observed dynamical systems utilizing a Dirichlet prior is presented. This serves as a basis for a generic parameter learning framework of dynamic stream weight estimators. The proposed system is application-independent and can be easily adapted to specific tasks and requirements. Audiovisual speaker tracking serves as an exemplary application in this work. The experiments demonstrate improved tracking performance of the dynamic stream weight-based estimation framework over state-of-the-art methods.