April 2, 2020

Paper Group ANR 164

Cooperative Observation of Targets moving over a Planar Graph with Prediction of Positions. Optimal HDR and Depth from Dual Cameras. Video Face Super-Resolution with Motion-Adaptive Feedback Cell. Semi-supervised Learning via Conditional Rotation Angle Estimation. De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Pred …

Cooperative Observation of Targets moving over a Planar Graph with Prediction of Positions

Title Cooperative Observation of Targets moving over a Planar Graph with Prediction of Positions
Authors José E. B. Maia, Levi P. Figueredo
Abstract Consider a team with two types of agents: targets and observers. Observers are aerial UAVs that observe targets moving on land with their movements restricted to the paths that form a planar graph on the surface. Observers have limited range of vision and targets do not avoid observers. The objective is to maximize the integral of the number of targets observed in the observation interval. Taking advantage of the fact that the future positions of targets in the short term are predictable, we show in this article a modified hill climbing algorithm that surpasses its previous versions in this new setting of the CTO problem.
Published 2020-02-13
URL https://arxiv.org/abs/2002.05294v1
PDF https://arxiv.org/pdf/2002.05294v1.pdf
PWC https://paperswithcode.com/paper/cooperative-observation-of-targets-moving

Optimal HDR and Depth from Dual Cameras

Title Optimal HDR and Depth from Dual Cameras
Authors Pradyumna Chari, Anil Kumar Vadathya, Kaushik Mitra
Abstract Dual camera systems have assisted in the proliferation of various applications, such as optical zoom, low-light imaging and High Dynamic Range (HDR) imaging. In this work, we explore an optimal method for capturing the scene HDR and disparity map using dual camera setups. Hasinoff et al. (2010) have developed a noise optimal framework for HDR capture from a single camera. We generalize this to the dual camera set-up for estimating both HDR and disparity map. It may seem that dual camera systems can capture HDR in a shorter time. However, disparity estimation is a necessary step, which requires overlap among the images captured by the two cameras. This may lead to an increase in the capture time. To address this conflicting requirement, we propose a novel framework to find the optimal exposure and ISO sequence by minimizing the capture time under the constraints of an upper bound on the disparity error and a lower bound on the per-exposure SNR. We show that the resulting optimization problem is non-convex in general and propose an appropriate initialization technique. To obtain the HDR and disparity map from the optimal capture sequence, we propose a pipeline which alternates between estimating the camera ICRFs and the scene disparity map. We demonstrate that our optimal capture sequence leads to better results than other possible capture sequences. Our results are also close to those obtained by capturing the full stereo stack spanning the entire dynamic range. Finally, we present for the first time a stereo HDR dataset consisting of dense ISO and exposure stacks captured from a smartphone dual camera. The dataset consists of 6 scenes, with an average of 142 exposure-ISO image sequences per scene.
Tasks Disparity Estimation
Published 2020-03-12
URL https://arxiv.org/abs/2003.05907v1
PDF https://arxiv.org/pdf/2003.05907v1.pdf
PWC https://paperswithcode.com/paper/optimal-hdr-and-depth-from-dual-cameras

Video Face Super-Resolution with Motion-Adaptive Feedback Cell

Title Video Face Super-Resolution with Motion-Adaptive Feedback Cell
Authors Jingwei Xin, Nannan Wang, Jie Li, Xinbo Gao, Zhifeng Li
Abstract Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low-resolution (LR) frames is used to generate a single high-resolution (HR) frame, and sliding a window over the entire video to select LR frames yields a series of HR frames. However, due to the complex temporal dependency between frames, the quality of the reconstructed HR frames becomes worse as the number of LR input frames increases. The reason is that these methods lack the ability to model complex temporal dependencies and struggle to provide accurate motion estimation and compensation for the VSR process, which makes the performance degrade drastically when the motion in frames is complex. In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way. Our approach efficiently utilizes the information of the inter-frame motion, so the dependence of the network on motion estimation and compensation methods can be avoided. In addition, benefiting from the excellent nature of MAFC, the network can achieve better performance in the case of extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms the state-of-the-art methods.
Tasks Motion Compensation, Motion Estimation, Multi-Frame Super-Resolution, Super-Resolution, Video Super-Resolution
Published 2020-02-15
URL https://arxiv.org/abs/2002.06378v1
PDF https://arxiv.org/pdf/2002.06378v1.pdf
PWC https://paperswithcode.com/paper/video-face-super-resolution-with-motion

Semi-supervised Learning via Conditional Rotation Angle Estimation

Title Semi-supervised Learning via Conditional Rotation Angle Estimation
Authors Hai-Ming Xu, Lingqiao Liu, Dong Gong
Abstract Self-supervised learning (SlfSL), aiming at learning feature representations through ingeniously designed pretext tasks without human annotation, has achieved compelling progress in the past few years. Very recently, SlfSL has also been identified as a promising solution for semi-supervised learning (SemSL) since it offers a new paradigm to utilize unlabeled data. This work further explores this direction by proposing to couple SlfSL with SemSL. Our insight is that the prediction target in SemSL can be modeled as the latent factor in the predictor for the SlfSL target. Marginalizing over the latent factor naturally derives a new formulation which marries the prediction targets of these two learning processes. By implementing this idea through a simple-but-effective SlfSL approach – rotation angle prediction, we create a new SemSL approach called Conditional Rotation Angle Estimation (CRAE). Specifically, CRAE features a module which predicts the image rotation angle conditioned on the candidate image class. Through experimental evaluation, we show that CRAE achieves superior performance over the other existing ways of combining SlfSL and SemSL. To further boost CRAE, we propose two extensions to strengthen the coupling between the SemSL target and the SlfSL target in basic CRAE. We show that this leads to an improved CRAE method which can achieve state-of-the-art SemSL performance.
Published 2020-01-09
URL https://arxiv.org/abs/2001.02865v1
PDF https://arxiv.org/pdf/2001.02865v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-via-conditional
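
One way to read the marginalization described in the abstract above, written out as a formula. The symbols are assumptions made for illustration and are not taken from the paper: $x$ is an image, $r$ is the rotation angle applied to it, and $y$ is the latent class label.

```latex
p(r \mid x) = \sum_{y} p(r \mid x, y)\, p(y \mid x)
```

Under this reading, $p(y \mid x)$ would play the role of the SemSL classifier and $p(r \mid x, y)$ the rotation predictor conditioned on the candidate class.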

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors

Title De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
Authors Arindam Banerjee, Tiancong Chen, Yingxue Zhou
Abstract In spite of several notable efforts, explaining the generalization of deterministic deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches usually need to bound the Lipschitz constant of such deep nets but such bounds have been shown to increase substantially with the number of training samples yielding vacuous generalization bounds [Nagarajan and Kolter, 2019a]. In this paper, we present new de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. The bounds depend on a trade-off between the $L_2$-norm of the weights and the effective curvature ('flatness') of the predictor, avoid any dependency on the Lipschitz constant, and yield meaningful (decreasing) bounds with increase in training set size. Our analysis first develops a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs). We then consider non-smooth predictors which for any given input realize as a smooth predictor, e.g., ReLU-nets become some LDN for a given input, but the realized smooth predictor can be different for different inputs. For such non-smooth predictors, we introduce a new PAC-Bayes analysis that maintains distributions over the structure as well as parameters of smooth predictors, e.g., LDNs corresponding to ReLU-nets, which after de-randomization yields a bound for the deterministic non-smooth predictor. We present empirical results to illustrate the efficacy of our bounds over changing training set size and randomness in labels.
Published 2020-02-23
URL https://arxiv.org/abs/2002.09956v1
PDF https://arxiv.org/pdf/2002.09956v1.pdf
PWC https://paperswithcode.com/paper/de-randomized-pac-bayes-margin-bounds

Removing Disparate Impact of Differentially Private Stochastic Gradient Descent on Model Accuracy

Title Removing Disparate Impact of Differentially Private Stochastic Gradient Descent on Model Accuracy
Authors Depeng Xu, Wei Du, Xintao Wu
Abstract When we enforce differential privacy in machine learning, the utility-privacy trade-off is different w.r.t. each group. Gradient clipping and random noise addition disproportionately affect underrepresented and complex classes and subgroups, which results in inequality in utility loss. In this work, we analyze the inequality in utility loss by differential privacy and propose a modified differentially private stochastic gradient descent (DPSGD), called DPSGD-F, to remove the potential disparate impact of differential privacy on the protected group. DPSGD-F adjusts the contribution of samples in a group depending on the group clipping bias such that differential privacy has no disparate impact on group utility. Our experimental evaluation shows how group sample size and group clipping bias affect the impact of differential privacy in DPSGD, and how adaptive clipping for each group helps to mitigate the disparate impact caused by differential privacy in DPSGD-F.
Published 2020-03-08
URL https://arxiv.org/abs/2003.03699v1
PDF https://arxiv.org/pdf/2003.03699v1.pdf
PWC https://paperswithcode.com/paper/removing-disparate-impact-of-differentially
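
A minimal sketch of the two operations the abstract above names, gradient clipping and random noise addition, as they appear in standard DPSGD. It does not implement DPSGD-F; the function names, the `noise_multiplier` parameter and the list-of-floats gradient representation are assumptions for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Noisy average of clipped per-sample gradients (standard DPSGD)."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Gaussian noise with standard deviation proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


# Hypothetical toy batch: one large and one small per-sample gradient.
batch = [[3.0, 4.0], [0.3, 0.4]]
update = dpsgd_gradient(batch, max_norm=1.0, noise_multiplier=1.1,
                        rng=random.Random(0))
```

With a clipping bound of 1.0, the gradient of norm 5 is shrunk to norm 1 while the gradient of norm 0.5 passes through unchanged, so clipping distorts some samples far more than others.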

On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition

Title On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition
Authors Miju Ahn, Nicole Eikmeier, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Kathryn Leonard, Deanna Needell, R. W. M. A. Madushani, Elena Sizikova, Chuntian Wang
Abstract There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employs the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, information contained in the temporal dimension of the data is often neglected or underutilized. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors, thereby preserving the temporal information. The viability of NNCPD is demonstrated through application to both synthetic and real data, where significantly improved results are obtained compared to those of typical NMF-based methods. The advantages of NNCPD over such approaches are studied and discussed. To the best of our knowledge, this is the first time that NNCPD has been utilized for the purpose of dynamic topic modeling, and our findings will be transformative for both applications and further developments.
Published 2020-01-02
URL https://arxiv.org/abs/2001.00631v1
PDF https://arxiv.org/pdf/2001.00631v1.pdf
PWC https://paperswithcode.com/paper/on-large-scale-dynamic-topic-modeling-with
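
A small sketch of what "a sum of outer products of nonnegative vectors" means for a three-way tensor. It only rebuilds a tensor from given factors and does not fit them; the factor names `A`, `B`, `C` and their interpretation as document, word and time factors are assumptions for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r].

    Column r of each factor matrix holds one of the three vectors whose
    outer product forms the r-th rank-one component.
    """
    rank = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Hypothetical nonnegative factors with two components ("topics").
A = [[1, 0], [0, 2]]  # documents x components
B = [[1, 1], [0, 3]]  # words x components
C = [[1, 0], [2, 1]]  # time steps x components
X = cp_reconstruct(A, B, C)
```

Because the time factor `C` is a factor in its own right, each component carries an explicit profile over time, which a slice-by-slice matrix factorization does not provide directly.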

SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information

Title SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
Authors Jing Yu Koh, Duc Thanh Nguyen, Quang-Trung Truong, Sai-Kit Yeung, Alexander Binder
Abstract Fully-automatic execution is the ultimate goal for many Computer Vision applications. However, this objective is not always realistic in tasks associated with high failure costs, such as medical applications. For these tasks, a compromise between fully-automatic execution and user interactions is often preferred due to desirable accuracy and performance. Semi-automatic methods require minimal effort from experts by allowing them to provide cues that guide computer algorithms. Inspired by the practicality and applicability of the semi-automatic approach, this paper proposes a novel deep neural network architecture, namely SideInfNet that effectively integrates features learnt from images with side information extracted from user annotations to produce high quality semantic segmentation results. To evaluate our method, we applied the proposed network to three semantic segmentation tasks and conducted extensive experiments on benchmark datasets. Experimental results and comparison with prior work have verified the superiority of our model, suggesting the generality and effectiveness of the model in semi-automatic semantic segmentation.
Tasks Semantic Segmentation
Published 2020-02-07
URL https://arxiv.org/abs/2002.02634v3
PDF https://arxiv.org/pdf/2002.02634v3.pdf
PWC https://paperswithcode.com/paper/sideinfnet-a-deep-neural-network-for-semi

RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training

Title RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training
Authors Jean Lahoud, Bernard Ghanem
Abstract Although well-known large-scale datasets, such as ImageNet, have driven image understanding forward, most of these datasets require extensive manual annotation and are thus not easily scalable. This limits the advancement of image understanding techniques. The impact of these large-scale datasets can be observed in almost every vision task and technique in the form of pre-training for initialization. In this work, we propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method. In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors. These labels, denoted by HN-labels, represent different height and normal patches, which allow mining of local semantic information that is useful in the task of semantic RGB segmentation. We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training, while using 25x fewer images and without requiring any manual labeling. We pre-train a semantic segmentation network with our HN-labels, which resembles our final task more than pre-training on a less related task, e.g. classification with ImageNet. We evaluate on two datasets (NYUv2 and CamVid), and we show how the similarity in tasks is advantageous not only in speeding up the pre-training process, but also in achieving better final semantic segmentation accuracy than ImageNet pre-training.
Tasks Semantic Segmentation
Published 2020-02-06
URL https://arxiv.org/abs/2002.02200v1
PDF https://arxiv.org/pdf/2002.02200v1.pdf
PWC https://paperswithcode.com/paper/rgb-based-semantic-segmentation-using-self

Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network

Title Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network
Authors Jialin Gao, Zhixiang Shi, Jiani Li, Guanshuo Wang, Yufeng Yuan, Shiming Ge, Xi Zhou
Abstract Accurate temporal action proposals play an important role in detecting actions from untrimmed videos. The existing approaches have difficulties in capturing global contextual information and simultaneously localizing actions with different durations. To this end, we propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals. In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling. This embedded module enhances the RapNet in terms of its multi-granularity temporal proposal generation ability, given predefined anchor boxes. We further introduce a two-stage adjustment scheme to refine the proposal boundaries and measure their confidence in containing an action with snippet-level actionness. Extensive experiments on the challenging ActivityNet and THUMOS14 benchmarks demonstrate that our RapNet generates more accurate proposals than the existing state-of-the-art methods.
Tasks Temporal Action Proposal Generation
Published 2020-03-09
URL https://arxiv.org/abs/2003.04145v1
PDF https://arxiv.org/pdf/2003.04145v1.pdf
PWC https://paperswithcode.com/paper/accurate-temporal-action-proposal-generation

Learning landmark guided embeddings for animal re-identification

Title Learning landmark guided embeddings for animal re-identification
Authors Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh
Abstract Re-identification of individual animals in images can be ambiguous due to subtle variations in body markings between different individuals and no constraints on the poses of animals in the wild. Person re-identification is a similar task and it has been approached with a deep convolutional neural network (CNN) that learns discriminative embeddings for images of people. However, learning discriminative features for an individual animal is more challenging than for a person’s appearance due to the relatively small size of ecological datasets compared to labelled datasets of people’s identities. We propose to improve embedding learning by exploiting body landmark information explicitly. Body landmarks are provided to the input of a CNN as confidence heatmaps that can be obtained from a separate body landmark predictor. The model is encouraged to use heatmaps by learning an auxiliary task of reconstructing input heatmaps. Body landmarks guide a feature extraction network to learn the representation of a distinctive pattern and its position on the body. We evaluate the proposed method on a large synthetic dataset and a small real dataset. Our method outperforms the same model without body landmarks input by 26% and 18% on the synthetic and the real datasets respectively. The method is robust to noise in input coordinates and can tolerate an error in coordinates up to 10% of the image size.
Tasks Person Re-Identification
Published 2020-01-09
URL https://arxiv.org/abs/2001.02801v1
PDF https://arxiv.org/pdf/2001.02801v1.pdf
PWC https://paperswithcode.com/paper/learning-landmark-guided-embeddings-for
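
A minimal sketch of turning one landmark coordinate into a confidence heatmap, the input form the abstract above describes. The Gaussian shape, the `sigma` parameter and the function name are assumptions for illustration; the abstract does not say how the heatmaps are produced beyond coming from a separate landmark predictor.

```python
import math


def landmark_heatmap(height, width, cx, cy, sigma):
    """Return a height x width grid peaking at 1.0 at column cx, row cy."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


# Hypothetical landmark at column 2, row 3 of a 5 x 5 image.
heatmap = landmark_heatmap(5, 5, cx=2, cy=3, sigma=1.0)
```

One such grid per landmark could then be stacked with the image channels as extra network input.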

Neural Fuzzy Extractors: A Secure Way to Use Artificial Neural Networks for Biometric User Authentication

Title Neural Fuzzy Extractors: A Secure Way to Use Artificial Neural Networks for Biometric User Authentication
Authors Abhishek Jana, Md Kamruzzaman Sarker, Monireh Ebrahimi, Pascal Hitzler, George T Amariucai
Abstract Powered by new advances in sensor development and artificial intelligence, the decreasing cost of computation, and the pervasiveness of handheld computation devices, biometric user authentication (and identification) is rapidly becoming ubiquitous. Modern approaches to biometric authentication, based on sophisticated machine learning techniques, cannot avoid storing either trained-classifier details or explicit user biometric data, thus exposing users’ credentials to falsification. In this paper, we introduce a secure way to handle user-specific information involved with the use of vector-space classifiers or artificial neural networks for biometric authentication. Our proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the coupling of pre-existing classifiers with fuzzy extractors, through an artificial-neural-network-based buffer called an expander, with minimal or no performance degradation. The NFE thus offers all the performance advantages of modern deep-learning-based classifiers, and all the security of standard fuzzy extractors. We demonstrate the NFE retrofit to a classic artificial neural network for a simple scenario of fingerprint-based user authentication.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08433v1
PDF https://arxiv.org/pdf/2003.08433v1.pdf
PWC https://paperswithcode.com/paper/neural-fuzzy-extractors-a-secure-way-to-use

Newtonian Monte Carlo: single-site MCMC meets second-order gradient methods

Title Newtonian Monte Carlo: single-site MCMC meets second-order gradient methods
Authors Nimar S. Arora, Nazanin Khosravani Tehrani, Kinjal Divesh Shah, Michael Tingley, Yucen Lily Li, Narjes Torabi, David Noursi, Sepehr Akhavan Masouleh, Eric Lippert, Erik Meijer
Abstract Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step. Structured relational models are a good candidate for this style of inference. In the single-site context, second order methods become feasible because the typical cubic costs associated with these methods are now restricted to the dimension of each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a method to improve MCMC convergence by analyzing the first and second order gradients of the target density to determine a suitable proposal density at each point. Existing first order gradient-based methods suffer from the problem of determining an appropriate step size. Too small a step size and it will take a large number of steps to converge, while a very large step size will cause it to overshoot the high density region. NMC is similar to the Newton-Raphson update in optimization where the second order gradient is used to automatically scale the step size in each dimension. However, our objective is to find a parameterized proposal density rather than the maxima. As a further improvement on existing first and second order methods, we show that random variables with constrained supports don’t need to be transformed before taking a gradient step. We demonstrate the efficiency of NMC on a number of different domains. For statistical models where the prior is conjugate to the likelihood, our method recovers the posterior quite trivially in one step. However, we also show results on fairly large non-conjugate models, where NMC performs better than adaptive first order methods such as NUTS or other inexact scalable inference methods such as Stochastic Variational Inference or bootstrapping.
Published 2020-01-15
URL https://arxiv.org/abs/2001.05567v1
PDF https://arxiv.org/pdf/2001.05567v1.pdf
PWC https://paperswithcode.com/paper/newtonian-monte-carlo-single-site-mcmc-meets
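
A minimal one-dimensional sketch of the Newton-Raphson idea the abstract above alludes to, assuming an unconstrained real-valued coordinate. The Gaussian proposal, whose mean is the Newton step and whose variance is the negative inverse second derivative, and the function name are assumptions for illustration, not the paper's implementation.

```python
def newton_gaussian_proposal(x, grad, hess):
    """Return (mean, variance) of a Gaussian proposal built at point x.

    grad and hess are the first and second derivatives of the log density
    at x. The log density must be locally concave (hess < 0).
    """
    if hess >= 0:
        raise ValueError("second derivative must be negative")
    mean = x - grad / hess   # Newton-Raphson step
    variance = -1.0 / hess   # curvature sets the scale, no step size to tune
    return mean, variance


# Toy target: a Gaussian log density with mean 2.0 and variance 0.25,
# so grad = -(x - 2.0) / 0.25 and hess = -1 / 0.25.
x0 = 10.0
mean, variance = newton_gaussian_proposal(x0, -(x0 - 2.0) / 0.25, -1.0 / 0.25)
```

For this Gaussian target the proposal built from any starting point has exactly the target's mean and variance, which is consistent with the abstract's remark that conjugate models are recovered in one step.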

Image Quality Transfer Enhances Contrast and Resolution of Low-Field Brain MRI in African Paediatric Epilepsy Patients

Title Image Quality Transfer Enhances Contrast and Resolution of Low-Field Brain MRI in African Paediatric Epilepsy Patients
Authors Matteo Figini, Hongxiang Lin, Godwin Ogbole, Felice D'Arco, Stefano B. Blumberg, David W. Carmichael, Ryutaro Tanno, Enrico Kaden, Biobele J. Brown, Ikeoluwa Lagunju, Helen J. Cross, Delmiro Fernandez-Reyes, Daniel C. Alexander
Abstract 1.5T or 3T scanners are the current standard for clinical MRI, but low-field (<1T) scanners are still common in many lower- and middle-income countries for reasons of cost and robustness to power failures. Compared to modern high-field scanners, low-field scanners provide images with lower signal-to-noise ratio at equivalent resolution, leaving practitioners to compensate by using large slice thickness and incomplete spatial coverage. Furthermore, the contrast between different types of brain tissue may be substantially reduced even at equal signal-to-noise ratio, which limits diagnostic value. Recently the paradigm of Image Quality Transfer has been applied to enhance 0.36T structural images aiming to approximate the resolution, spatial coverage, and contrast of typical 1.5T or 3T images. A variant of the neural network U-Net was trained using low-field images simulated from the publicly available 3T Human Connectome Project dataset. Here we present qualitative results from real and simulated clinical low-field brain images showing the potential value of IQT to enhance the clinical utility of readily accessible low-field MRIs in the management of epilepsy.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07216v2
PDF https://arxiv.org/pdf/2003.07216v2.pdf
PWC https://paperswithcode.com/paper/image-quality-transfer-enhances-contrast-and

GANs May Have No Nash Equilibria

Title GANs May Have No Nash Equilibria
Authors Farzan Farnia, Asuman Ozdaglar
Abstract Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator, designed to learn the distribution of data. While GANs have achieved state-of-the-art performance in several benchmark learning tasks, GAN minimax optimization still poses great theoretical and empirical challenges. GANs trained using first-order optimization methods commonly fail to converge to a stable solution where the players cannot improve their objective, i.e., the Nash equilibrium of the underlying game. Such issues raise the question of the existence of Nash equilibrium solutions in the GAN zero-sum game. In this work, we show through several theoretical and numerical results that indeed GAN zero-sum games may not have any local Nash equilibria. To characterize an equilibrium notion applicable to GANs, we consider the equilibrium of a new zero-sum game with an objective function given by a proximal operator applied to the original objective, a solution we call the proximal equilibrium. Unlike the Nash equilibrium, the proximal equilibrium captures the sequential nature of GANs, in which the generator moves first followed by the discriminator. We prove that the optimal generative model in Wasserstein GAN problems provides a proximal equilibrium. Inspired by these results, we propose a new approach, which we call proximal training, for solving GAN problems. We discuss several numerical experiments demonstrating the existence of proximal equilibrium solutions in GAN minimax problems.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09124v1
PDF https://arxiv.org/pdf/2002.09124v1.pdf
PWC https://paperswithcode.com/paper/gans-may-have-no-nash-equilibria