April 3, 2020

# Paper Group AWR 44

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Pairwise Discriminative Neural PLDA for Speaker Verification. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Lipschitz Lifelong Reinforcement Learning. Learning Compositional Neural Information Fusion for Human Parsing. Depth Based Semantic Scene Completion …

#### NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Title NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Authors Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
Abstract We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\theta, \phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
Published 2020-03-19
URL https://arxiv.org/abs/2003.08934v1
PDF https://arxiv.org/pdf/2003.08934v1.pdf
Repo https://github.com/bmild/nerf
Framework tf
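
The compositing step mentioned in the abstract (projecting colors and densities sampled along a camera ray into a pixel) can be sketched with the standard quadrature rule for volume rendering. This is a minimal illustration, not the authors' implementation: the function name `render_ray` and the sample values in the usage note are assumptions, and the densities and colors would come from the network in practice.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray.

    densities: non-negative volume density at each sample
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between each sample and the next one
    Returns the accumulated pixel color and the remaining transmittance.
    """
    transmittance = 1.0  # fraction of light that still reaches this sample
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha  # light left after passing the segment
    return tuple(pixel), transmittance
```

For example, `render_ray([0.0, 50.0], [(0, 1, 0), (1, 0, 0)], [1.0, 1.0])` skips the empty first sample and returns a pixel that is almost entirely the red of the dense second sample. Every operation is differentiable in the densities and colors, which is what allows the representation to be optimized from images alone.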

#### Pairwise Discriminative Neural PLDA for Speaker Verification

Title Pairwise Discriminative Neural PLDA for Speaker Verification
Authors Shreyas Ramoji, Prashant Krishnan V, Prachi Singh, Sriram Ganapathy
Abstract The state-of-the-art approach to speaker verification involves the extraction of discriminative embeddings like x-vectors followed by a generative model back-end using probabilistic linear discriminant analysis (PLDA). In this paper, we propose a pairwise neural discriminative model for the task of speaker verification which operates on a pair of speaker embeddings such as x-vectors/i-vectors and outputs a score that can be considered as a scaled log-likelihood ratio. We construct a differentiable cost function which approximates the speaker verification loss, namely the minimum detection cost. The pre-processing steps of linear discriminant analysis (LDA), unit length normalization and within-class covariance normalization are all modeled as layers of a neural model, and the speaker verification cost functions can be back-propagated through these layers during training. We also explore regularization techniques to prevent overfitting, which is a major concern in using discriminative back-end models for verification tasks. The experiments are performed on the NIST SRE 2018 development and evaluation datasets. We observe average relative improvements of 8% in the CMN2 condition and 30% in the VAST condition over the PLDA baseline system.
Published 2020-01-20
URL https://arxiv.org/abs/2001.07034v2
PDF https://arxiv.org/pdf/2001.07034v2.pdf
PWC https://paperswithcode.com/paper/pairwise-discriminative-neural-plda-for
Repo https://github.com/iiscleap/NeuralPlda
Framework pytorch
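
One common way to make a detection cost differentiable is to replace the hard miss and false-alarm counts with sigmoids. The sketch below illustrates that idea only; the sigmoid smoothing, the names `soft_detection_cost`, `alpha` and `beta`, and their default values are assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_detection_cost(scores, labels, threshold, beta=1.0, alpha=10.0):
    """Smooth surrogate for P_miss + beta * P_fa at a given threshold.

    scores:    trial scores (higher means "same speaker")
    labels:    1 for target trials, 0 for non-target trials
    threshold: decision threshold on the score
    beta:      relative weight of false alarms (hypothetical default)
    alpha:     sharpness of the sigmoid; larger values approach hard counts
    """
    target = [s for s, y in zip(scores, labels) if y == 1]
    nontarget = [s for s, y in zip(scores, labels) if y == 0]
    # Soft fraction of target trials scored below the threshold.
    p_miss = sum(sigmoid(alpha * (threshold - s)) for s in target) / len(target)
    # Soft fraction of non-target trials scored above the threshold.
    p_fa = sum(sigmoid(alpha * (s - threshold)) for s in nontarget) / len(nontarget)
    return p_miss + beta * p_fa
```

With well-separated scores the cost is close to zero, and with the labels swapped it approaches `1 + beta`. Because the surrogate is smooth in both the scores and the threshold, gradients can flow back into whatever model produced the scores.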

#### BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

Title BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Authors Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
Abstract In this paper, we propose a novel model compression approach to effectively compress BERT by progressive module replacing. Our approach first divides the original BERT into several modules and builds their compact substitutes. Then, we randomly replace the original modules with their substitutes to train the compact modules to mimic the behavior of the original modules. We progressively increase the probability of replacement throughout training. In this way, our approach brings a deeper level of interaction between the original and compact models, and smooths the training process. Compared to previous knowledge distillation approaches for BERT compression, our approach leverages only one loss function and one hyper-parameter, liberating human effort from hyper-parameter tuning. Our approach outperforms existing knowledge distillation approaches on the GLUE benchmark, showing a new perspective on model compression.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02925v3
PDF https://arxiv.org/pdf/2002.02925v3.pdf
PWC https://paperswithcode.com/paper/bert-of-theseus-compressing-bert-by
Repo https://github.com/JetRunner/BERT-of-Theseus
Framework pytorch
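
The replacement procedure described in the abstract can be sketched as a forward pass that flips a coin per module. This is a toy illustration in which modules are plain Python callables; the linear schedule and the names `replacement_probability`, `forward`, `p_start` and `p_end` are assumptions, not the authors' code.

```python
import random


def replacement_probability(step, total_steps, p_start=0.5, p_end=1.0):
    """Linearly raise the replacement probability from p_start to p_end."""
    frac = min(1.0, step / total_steps)
    return p_start + (p_end - p_start) * frac


def forward(x, original_modules, compact_modules, p, rng):
    """Run x through the module chain.

    At each position the compact substitute is used with probability p,
    otherwise the original module is used.
    """
    for original, compact in zip(original_modules, compact_modules):
        module = compact if rng.random() < p else original
        x = module(x)
    return x


if __name__ == "__main__":
    originals = [lambda v: v + 1, lambda v: v * 2]
    compacts = [lambda v: v + 10, lambda v: v * 3]
    rng = random.Random(0)
    for step in (0, 50, 100):
        p = replacement_probability(step, total_steps=100)
        print(step, p, forward(1, originals, compacts, p, rng))
```

Once `p` reaches 1.0 every position uses its substitute, so the compact modules form a complete standalone model. In a real setup only the compact modules would receive gradient updates while the original modules stay frozen.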

#### Lipschitz Lifelong Reinforcement Learning

Title Lipschitz Lifelong Reinforcement Learning
Authors Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman
Abstract We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.
Published 2020-01-15
URL https://arxiv.org/abs/2001.05411v2
PDF https://arxiv.org/pdf/2001.05411v2.pdf
PWC https://paperswithcode.com/paper/lipschitz-lifelong-reinforcement-learning-1
Repo https://github.com/SuReLI/llrl
Framework none

#### Learning Compositional Neural Information Fusion for Human Parsing

Title Learning Compositional Neural Information Fusion for Human Parsing
Authors Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao
Abstract This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., by estimating and considering the confidence of the sources. The whole model is end-to-end differentiable, explicitly modeling information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23 fps. Our code and results have been released to help ease future research in this direction.
Published 2020-01-19
URL https://arxiv.org/abs/2001.06804v1
PDF https://arxiv.org/pdf/2001.06804v1.pdf
PWC https://paperswithcode.com/paper/learning-compositional-neural-information-1
Repo https://github.com/ZzzjzzZ/CompositionalHumanParsing
Framework none

#### Depth Based Semantic Scene Completion with Position Importance Aware Loss

Title Depth Based Semantic Scene Completion with Position Importance Aware Loss
Authors Yu Liu, Jie Li, Xia Yuan, Chunxia Zhao, Roland Siegwart, Ian Reid, Cesar Cadena
Abstract Semantic Scene Completion (SSC) refers to the task of inferring the 3D semantic segmentation of a scene while simultaneously completing the 3D shapes. We propose PALNet, a novel hybrid network for SSC based on single depth. PALNet utilizes a two-stream network to extract both 2D and 3D features from multiple stages, using fine-grained depth information to efficiently capture the context as well as the geometric cues of the scene. Current methods for SSC treat all parts of the scene equally, causing unnecessary attention to the interior of objects. To address this problem, we propose Position Aware Loss (PA-Loss), which is position importance aware while training the network. Specifically, PA-Loss considers Local Geometric Anisotropy to determine the importance of different positions within the scene. It is beneficial for recovering key details like the boundaries of objects and the corners of the scene. Comprehensive experiments on two benchmark datasets demonstrate the effectiveness of the proposed method and its superior performance. Models and a video demo can be found at: https://github.com/UniLauX/PALNet.
Tasks 3D Semantic Segmentation, Semantic Segmentation
Published 2020-01-29
URL https://arxiv.org/abs/2001.10709v2
PDF https://arxiv.org/pdf/2001.10709v2.pdf
PWC https://paperswithcode.com/paper/depth-based-semantic-scene-completion-with
Repo https://github.com/UniLauX/PALNet
Framework pytorch

#### Thermal to Visible Face Recognition Using Deep Autoencoders

Title Thermal to Visible Face Recognition Using Deep Autoencoders
Authors Alperen Kantarcı, Hazım Kemal Ekenel
Abstract Visible face recognition systems achieve nearly perfect recognition accuracies using deep learning. However, in the absence of light, these systems perform poorly. One way to deal with this problem is thermal to visible cross-domain face matching. This is a desired technology because of its usefulness in night-time surveillance. Nevertheless, due to differences between the two domains, it is a very challenging face recognition problem. In this paper, we present a deep autoencoder based system to learn the mapping between visible and thermal face images. We also assess the impact of alignment in thermal to visible face recognition. For this purpose, we manually annotate the facial landmarks on the Carl and EURECOM datasets. The proposed approach is extensively tested on three publicly available datasets: Carl, UND-X1, and EURECOM. Experimental results show that the proposed approach improves the state of the art significantly. We observe that alignment increases the performance by around 2%. Annotated facial landmark positions in this study can be downloaded from the following link: github.com/Alpkant/Thermal-to-Visible-Face-Recognition-Using-Deep-Autoencoders.
Published 2020-02-10
URL https://arxiv.org/abs/2002.04219v1
PDF https://arxiv.org/pdf/2002.04219v1.pdf
PWC https://paperswithcode.com/paper/thermal-to-visible-face-recognition-using
Repo https://github.com/Alpkant/Thermal-to-Visible-Face-Recognition-Using-Deep-Autoencoders
Framework none

#### A deep-learning based Bayesian approach to seismic imaging and uncertainty quantification

Title A deep-learning based Bayesian approach to seismic imaging and uncertainty quantification
Authors Ali Siahkoohi, Gabrio Rizzuti, Felix J. Herrmann
Abstract Uncertainty quantification is essential when dealing with ill-conditioned inverse problems due to the inherent nonuniqueness of the solution. Bayesian approaches allow us to determine how likely an estimation of the unknown parameters is by formulating the posterior distribution. Unfortunately, it is often not possible to formulate a prior distribution that precisely encodes our prior knowledge about the unknown. Furthermore, adherence to handcrafted priors may greatly bias the outcome of the Bayesian analysis. To address this issue, we propose to use the functional form of a randomly initialized convolutional neural network as an implicit structured prior, which is shown to promote natural images and exclude images with unnatural noise. In order to incorporate the model uncertainty into the final estimate, we sample the posterior distribution using stochastic gradient Langevin dynamics and perform Bayesian model averaging on the obtained samples. Our synthetic numerical experiment verifies that deep priors combined with Bayesian model averaging are able to partially circumvent imaging artifacts and reduce the risk of overfitting in the presence of extreme noise. Finally, we present the pointwise variance of the estimates as a measure of uncertainty, which coincides with regions that are more difficult to image.
Published 2020-01-13
URL https://arxiv.org/abs/2001.04567v2
PDF https://arxiv.org/pdf/2001.04567v2.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-based-bayesian-approach-to
Repo https://github.com/alisiahkoohi/seismic-imaging-with-SGLD
Framework pytorch
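
The sampling-then-averaging recipe in the abstract can be sketched on a one-dimensional toy posterior. This is an illustration under stated assumptions, not the authors' seismic imaging code: the target here is a Gaussian with known gradient, the gradient is exact rather than a minibatch estimate (so strictly this is plain Langevin dynamics), and the names `langevin_samples` and `summarize` are hypothetical.

```python
import math
import random


def langevin_samples(grad_neg_log_post, theta0, step, n_steps, burn_in, rng):
    """Draw samples with the Langevin update.

    theta <- theta - (step / 2) * grad U(theta) + N(0, step),
    where U is the negative log posterior. In SGLD the gradient would be
    estimated from a random minibatch of the data.
    """
    theta = theta0
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))
        theta = theta - 0.5 * step * grad_neg_log_post(theta) + noise
        if t >= burn_in:  # discard early iterates before the chain settles
            samples.append(theta)
    return samples


def summarize(samples):
    """Posterior mean (the model average) and variance (the uncertainty)."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var


if __name__ == "__main__":
    # Toy posterior N(2, 1): U(theta) = (theta - 2)^2 / 2, so grad U = theta - 2.
    rng = random.Random(0)
    draws = langevin_samples(lambda th: th - 2.0, 0.0, 0.1, 20000, 1000, rng)
    print(summarize(draws))
```

Averaging the samples plays the role of Bayesian model averaging, and the sample variance is the scalar analogue of the pointwise variance the abstract uses as an uncertainty measure.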

#### Multiplicative Gaussian Particle Filter

Title Multiplicative Gaussian Particle Filter
Authors Xuan Su, Wee Sun Lee, Zhen Zhang
Abstract We propose a new sampling-based approach for approximate inference in filtering problems. Instead of approximating conditional distributions with a finite set of states, as done in particle filters, our approach approximates the distribution with a weighted sum of functions from a set of continuous functions. Central to the approach is the use of sampling to approximate multiplications in the Bayes filter. We provide a theoretical analysis, giving conditions under which sampling gives a good approximation. We next specialize to the case of weighted sums of Gaussians, and show how properties of Gaussians enable closed-form transition and efficient multiplication. Lastly, we conduct preliminary experiments on a robot localization problem and compare performance with the particle filter to demonstrate the potential of the proposed method.
Published 2020-02-29
URL https://arxiv.org/abs/2003.00218v1
PDF https://arxiv.org/pdf/2003.00218v1.pdf
PWC https://paperswithcode.com/paper/multiplicative-gaussian-particle-filter
Repo https://github.com/suxuann/mgpf
Framework tf
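
The Gaussian property that makes multiplication cheap is that the product of two Gaussian densities is itself a scaled Gaussian. The sketch below shows the one-dimensional case only; the function names are illustrative and it is not taken from the authors' repository.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def multiply_gaussians(mean1, var1, mean2, var2):
    """Return (mean, var, scale) such that, for every x,

    N(x; mean1, var1) * N(x; mean2, var2) = scale * N(x; mean, var).
    """
    var = 1.0 / (1.0 / var1 + 1.0 / var2)        # precisions add
    mean = var * (mean1 / var1 + mean2 / var2)   # precision-weighted mean
    scale = gaussian_pdf(mean1, mean2, var1 + var2)
    return mean, var, scale
```

For instance, `multiply_gaussians(0.0, 1.0, 2.0, 1.0)` gives a mean of 1.0 and a variance of 0.5. In a weighted sum of Gaussians, the `scale` factor is what updates each component's weight after a multiplication.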

#### An Investigation into the Stochasticity of Batch Whitening

Title An Investigation into the Stochasticity of Batch Whitening
Authors Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
Abstract Batch Normalization (BN) is extensively employed in various network architectures by performing standardization within mini-batches. A full understanding of the process has been a central target in the deep learning community. Unlike existing works, which usually only analyze the standardization operation, this paper investigates the more general Batch Whitening (BW). Our work originates from the observation that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs). We attribute this phenomenon to the stochasticity that BW introduces. We quantitatively investigate the stochasticity of different whitening transformations and show that it correlates well with the optimization behaviors during training. We also investigate how stochasticity relates to the estimation of population statistics during inference. Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios. Our proposed BW algorithm improves residual networks by a significant margin on ImageNet classification. We also show that the stochasticity of BW can improve GAN performance, though at the cost of training stability.
Published 2020-03-27
URL https://arxiv.org/abs/2003.12327v1
PDF https://arxiv.org/pdf/2003.12327v1.pdf
PWC https://paperswithcode.com/paper/an-investigation-into-the-stochasticity-of
Repo https://github.com/huangleiBuaa/StochasticityBW
Framework pytorch

#### A New Meta-Baseline for Few-Shot Learning

Title A New Meta-Baseline for Few-Shot Learning
Authors Yinbo Chen, Xiaolong Wang, Zhuang Liu, Huijuan Xu, Trevor Darrell
Abstract Meta-learning has become a popular framework for few-shot learning in recent years, with the goal of learning a model from collections of few-shot classification tasks. While more and more novel meta-learning models are being proposed, our research has uncovered simple baselines that have been overlooked. We present Meta-Baseline, a method that pre-trains a classifier on all base classes and then meta-learns on a nearest-centroid based few-shot classification algorithm; it outperforms recent state-of-the-art methods by a large margin. Why does this simple method work so well? In the meta-learning stage, we observe that a model generalizing better on unseen tasks from base classes can have decreasing performance on tasks from novel classes, indicating a potential objective discrepancy. We find that both pre-training and inheriting a good few-shot classification metric from the pre-trained classifier are important for Meta-Baseline, which potentially helps the model better utilize the pre-trained representations with stronger transferability. Furthermore, we investigate when we need meta-learning in this Meta-Baseline. Our work sets up a new solid benchmark for this field and sheds light on further understanding the phenomena in the meta-learning framework for few-shot learning.
Published 2020-03-09
URL https://arxiv.org/abs/2003.04390v2
PDF https://arxiv.org/pdf/2003.04390v2.pdf
PWC https://paperswithcode.com/paper/a-new-meta-baseline-for-few-shot-learning
Repo https://github.com/cyvius96/few-shot-meta-baseline
Framework pytorch
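
Nearest-centroid few-shot classification, the algorithm the abstract builds on, can be sketched in a few lines. Here the embeddings are given directly as lists of floats, whereas in practice they would come from the pre-trained classifier's feature extractor. The use of cosine similarity and the names `centroid`, `cosine` and `classify` are assumptions for illustration.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(query, support):
    """Assign the query to the class whose centroid is most similar.

    support maps each class label to the embeddings of its few labeled shots.
    """
    centroids = {label: centroid(vecs) for label, vecs in support.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))
```

Given `support = {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}`, the call `classify([0.8, 0.2], support)` selects `"cat"`. No parameters are fitted per task: the only learned component is the embedding function.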

#### Filter Grafting for Deep Neural Networks

Title Filter Grafting for Deep Neural Networks
Authors Fanxu Meng, Hao Cheng, Ke Li, Zhixin Xu, Rongrong Ji, Xing Sun, Guangming Lu
Abstract This paper proposes a new learning paradigm called filter grafting, which aims to improve the representation capability of Deep Neural Networks (DNNs). The motivation is that DNNs have unimportant (invalid) filters (e.g., l1 norm close to 0). These filters limit the potential of DNNs since they are identified as having little effect on the network. While filter pruning removes these invalid filters for efficiency, filter grafting re-activates them to boost accuracy. The activation is performed by grafting external information (weights) into invalid filters. To better perform the grafting process, we develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks. After the grafting operation, the network has very few invalid filters compared with its untouched state, empowering the model with more representation capacity. We also perform extensive experiments on classification and recognition tasks to show the superiority of our method. For example, the grafted MobileNetV2 outperforms the non-grafted MobileNetV2 by about 7 percent on the CIFAR-100 dataset. Code is available at https://github.com/fxmeng/filter-grafting.git.
Published 2020-01-15
URL https://arxiv.org/abs/2001.05868v3
PDF https://arxiv.org/pdf/2001.05868v3.pdf
PWC https://paperswithcode.com/paper/filter-grafting-for-deep-neural-networks
Repo https://github.com/fxmeng/filter-grafting
Framework pytorch
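
The basic idea of spotting invalid filters by their l1 norm and writing external weights into them can be sketched as follows. This is a deliberately simplified, filter-level illustration: it does not implement the entropy-based criterion or the adaptive weighting strategy from the abstract, and the names `l1_norm`, `graft` and the `threshold` default are assumptions.

```python
def l1_norm(filt):
    """Sum of absolute weights of one (flattened) filter."""
    return sum(abs(w) for w in filt)


def graft(filters, donor_filters, threshold=1e-3):
    """Replace near-zero filters with the donor network's filters.

    filters:       flattened filters of one layer in the network to improve
    donor_filters: filters at the same positions in a second network
    threshold:     filters with an l1 norm below this count as invalid
    Returns the grafted filters and the indices that were replaced.
    """
    grafted = []
    replaced = []
    for index, (own, donor) in enumerate(zip(filters, donor_filters)):
        if l1_norm(own) < threshold:
            grafted.append(list(donor))  # re-activate with external weights
            replaced.append(index)
        else:
            grafted.append(list(own))    # keep filters that already matter
    return grafted, replaced
```

For example, grafting `[[0.5, -0.2], [0.0, 0.0001]]` with donor `[[0.1, 0.1], [0.3, -0.4]]` keeps the first filter and replaces only the second, whose l1 norm is below the threshold.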

#### Self-Supervised Linear Motion Deblurring

Title Self-Supervised Linear Motion Deblurring
Authors Peidong Liu, Joel Janai, Marc Pollefeys, Torsten Sattler, Andreas Geiger
Abstract Motion-blurred images challenge many computer vision algorithms, e.g., feature detection, motion estimation, or object recognition. Deep convolutional neural networks are state-of-the-art for image deblurring. However, obtaining training data with corresponding sharp and blurry image pairs can be difficult. In this paper, we present a differentiable reblur model for self-supervised motion deblurring, which enables the network to learn from real-world blurry image sequences without relying on sharp images for supervision. Our key insight is that motion cues obtained from consecutive images yield sufficient information to inform the deblurring task. We therefore formulate deblurring as an inverse rendering problem, taking into account the physical image formation process: we first predict two deblurred images from which we estimate the corresponding optical flow. Using these predictions, we re-render the blurred images and minimize the difference with respect to the original blurry inputs. We use both synthetic and real datasets for experimental evaluations. Our experiments demonstrate that self-supervised single image deblurring is feasible and leads to visually compelling results.
Tasks Deblurring, Motion Estimation, Object Recognition, Optical Flow Estimation
Published 2020-02-10
URL https://arxiv.org/abs/2002.04070v1
PDF https://arxiv.org/pdf/2002.04070v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-linear-motion-deblurring
Repo https://github.com/ethliup/SelfDeblur
Framework pytorch

#### A System for Real-Time Interactive Analysis of Deep Learning Training

Title A System for Real-Time Interactive Analysis of Deep Learning Training
Authors Shital Shah, Roland Fernandez, Steven Drucker
Abstract Performing diagnosis or exploratory analysis during the training of deep learning models is challenging but often necessary for making a sequence of decisions guided by incremental observations. Currently available systems for this purpose are limited to monitoring only the logged data that must be specified before the training process starts. Each time new information is desired, a stop-change-restart cycle is required in the training process. These limitations make interactive exploration and diagnosis tasks difficult, imposing long, tedious iterations during model development. We present a new system that enables users to perform interactive queries on live processes, generating real-time information that can be rendered in multiple formats on multiple surfaces as several desired visualizations simultaneously. To achieve this, we model various exploratory inspection and diagnostic tasks for deep learning training processes as specifications for streams using a map-reduce paradigm with which many data scientists are already familiar. Our design achieves generality and extensibility by defining composable primitives, a fundamentally different approach from that used by currently available systems. The open source implementation of our system is available as the TensorWatch project at https://github.com/microsoft/tensorwatch.
Published 2020-01-05
URL https://arxiv.org/abs/2001.01215v2
PDF https://arxiv.org/pdf/2001.01215v2.pdf
PWC https://paperswithcode.com/paper/a-system-for-real-time-interactive-analysis
Repo https://github.com/microsoft/tensorwatch
Framework tf

#### Comparing Rewinding and Fine-tuning in Neural Network Pruning

Title Comparing Rewinding and Fine-tuning in Neural Network Pruning
Authors Alex Renda, Jonathan Frankle, Michael Carbin
Abstract Many neural network pruning algorithms proceed in three steps: train the network to completion, remove unwanted structure to compress the network, and retrain the remaining structure to recover lost accuracy. The standard retraining technique, fine-tuning, trains the unpruned weights from their final trained values using a small fixed learning rate. In this paper, we compare fine-tuning to alternative retraining techniques. Weight rewinding (as proposed by Frankle et al. (2019)) rewinds unpruned weights to their values from earlier in training and retrains them from there using the original training schedule. Learning rate rewinding (which we propose) trains the unpruned weights from their final values using the same learning rate schedule as weight rewinding. Both rewinding techniques outperform fine-tuning, forming the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.