April 3, 2020

# Paper Group ANR 73

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Online LiDAR-SLAM for Legged Robots with Robust Registration and Deep-Learned Loop Closure. Decentralized SGD with Over-the-Air Computation. MaxUp: A Simple Way to Improve Generalization of Neural Network Training. Neural Kernels Without Tangents. On the Convergence o …

#### Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Title Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition
Authors Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong
Abstract Teacher-student (T/S) learning has been shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: the teacher’s token posteriors as soft labels and one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing from either the teacher’s soft token posteriors or the one-hot ground-truth label, in AT/S, the student always learns from both the teacher and the ground truth, with a pair of adaptive weights assigned to the soft and one-hot labels quantifying the confidence in each of the knowledge sources. The confidence scores are dynamically estimated at each decoder step as a function of the soft and one-hot labels. With 3400 hours of parallel close-talk and far-field Microsoft Cortana data for domain adaptation, T/S and AT/S achieve 6.3% and 10.3% relative word error rate improvement over a strong E2E model trained with the same amount of far-field data.
Published 2020-01-06
URL https://arxiv.org/abs/2001.01798v1
PDF https://arxiv.org/pdf/2001.01798v1.pdf
Repo
Framework
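
The AT/S idea in the abstract above (always mixing the teacher's soft posteriors with the one-hot ground truth, using per-step weights) can be illustrated with a small NumPy sketch. The abstract does not say how the confidence weights are computed, so the choice below (teacher weight equal to the probability the teacher assigns to the true token) is purely an assumption for illustration, and `adaptive_ts_loss` is a hypothetical name, not the authors' code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def adaptive_ts_loss(student_logits, teacher_probs, target_ids):
    """Cross-entropy of the student against a per-step mix of soft and one-hot labels.

    student_logits: (steps, vocab) raw student scores
    teacher_probs:  (steps, vocab) teacher token posteriors (soft labels)
    target_ids:     (steps,) ground-truth token indices
    """
    num_steps, vocab = student_logits.shape
    one_hot = np.eye(vocab)[target_ids]
    # ASSUMPTION: confidence in the teacher at each decoder step is the
    # probability it gives to the ground-truth token; the rest goes to the
    # one-hot label. The paper's actual weighting function may differ.
    w_teacher = teacher_probs[np.arange(num_steps), target_ids][:, None]
    w_truth = 1.0 - w_teacher
    mixed_labels = w_teacher * teacher_probs + w_truth * one_hot
    log_student = np.log(softmax(student_logits) + 1e-12)
    return float(-(mixed_labels * log_student).sum(axis=-1).mean())
```

Because the two weights sum to one at every step, the mixed label remains a valid probability distribution, so the loss is an ordinary cross-entropy.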

#### Online LiDAR-SLAM for Legged Robots with Robust Registration and Deep-Learned Loop Closure

Title Online LiDAR-SLAM for Legged Robots with Robust Registration and Deep-Learned Loop Closure
Authors Milad Ramezani, Georgi Tinchev, Egor Iuganov, Maurice Fallon
Abstract In this paper, we present a factor-graph LiDAR-SLAM system which incorporates a state-of-the-art deeply learned feature-based loop closure detector to enable a legged robot to localize and map in industrial environments. These facilities can be badly lit and comprised of indistinct metallic structures; thus our system uses only LiDAR sensing and was developed to run on the quadruped robot’s navigation PC. Point clouds are accumulated using an inertial-kinematic state estimator before being aligned using ICP registration. To close loops, we use a loop proposal mechanism which matches individual segments between clouds. We trained a descriptor offline to match these segments. The efficiency of our method comes from carefully designing the network architecture to minimize the number of parameters such that this deep learning method can be deployed in real time using only the CPU of a legged robot, a major contribution of this work. The set of odometry and loop closure factors is updated using pose graph optimization. Finally, we present an efficient risk alignment prediction method which verifies the reliability of the registrations. Experimental results at an industrial facility demonstrated the robustness and flexibility of our system, including autonomously following paths derived from the SLAM map.
Published 2020-01-28
URL https://arxiv.org/abs/2001.10249v1
PDF https://arxiv.org/pdf/2001.10249v1.pdf
PWC https://paperswithcode.com/paper/online-lidar-slam-for-legged-robots-with
Repo
Framework
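
The ICP registration mentioned in the abstract above alternates between matching points and solving for the rigid transform that best aligns the matched pairs. The sketch below shows only the second step (the closed-form SVD solution, often called the Kabsch method) and assumes correspondences are already known; the nearest-neighbour matching, the outer iteration, and everything specific to the paper's system are omitted. The function name `rigid_align` is hypothetical.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t with R @ source[i] + t ~ target[i].

    source, target: (n, 3) arrays of corresponding 3D points.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t
```

In a full ICP loop, this solve would be repeated after re-estimating correspondences until the alignment stops improving.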

#### Decentralized SGD with Over-the-Air Computation

Title Decentralized SGD with Over-the-Air Computation
Authors Emre Ozfatura, Stefano Rini, Deniz Gunduz
Abstract We study the performance of decentralized stochastic gradient descent (DSGD) in a wireless network, where the nodes collaboratively optimize an objective function using their local datasets. Unlike the conventional setting, where the nodes communicate over error-free orthogonal communication links, we assume that transmissions are prone to additive noise and interference. We first consider a point-to-point (P2P) transmission strategy, termed the OAC-P2P scheme, in which the node pairs are scheduled in an orthogonal fashion to minimize interference. Since in the DSGD framework each node requires a linear combination of the neighboring models at the consensus step, we then propose the OAC-MAC scheme, which utilizes the signal superposition property of the wireless medium to achieve over-the-air computation (OAC). For both schemes, we cast the scheduling problem as a graph coloring problem. We numerically evaluate the performance of these two schemes for the MNIST image classification task under various network conditions. We show that the OAC-MAC scheme attains better convergence performance with fewer communication rounds.
Published 2020-03-06
URL https://arxiv.org/abs/2003.04216v1
PDF https://arxiv.org/pdf/2003.04216v1.pdf
PWC https://paperswithcode.com/paper/decentralized-sgd-with-over-the-air
Repo
Framework
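
The consensus step described in the abstract above (each node forms a linear combination of its neighbours' models, received over a noisy channel) can be sketched in a few lines. This is a simplified illustration, not the paper's OAC-P2P or OAC-MAC scheme: the ring topology, the uniform 1/3 mixing weights, the Gaussian noise model, and the function names are all assumptions made for the example.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 nodes (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def dsgd_round(models, grads, W, lr, noise_std, rng):
    """One DSGD round: noisy consensus over neighbours, then a local gradient step.

    models, grads: (n_nodes, dim) arrays, one row per node.
    """
    # Additive channel noise on the received linear combination (assumed Gaussian).
    noise = noise_std * rng.standard_normal(models.shape)
    mixed = W @ models + noise
    return mixed - lr * grads
```

With `noise_std=0` and zero gradients, repeated rounds drive every node's model to the network average, which is the behaviour the consensus step is meant to provide.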

#### MaxUp: A Simple Way to Improve Generalization of Neural Network Training

Title MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Authors Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
Abstract We propose *MaxUp*, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, *MaxUp* is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test *MaxUp* on a range of tasks, including image classification, language modeling, and adversarial certification, on which *MaxUp* consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy of 85.5% without extra data to 85.8%. Code will be released soon.
Published 2020-02-20
URL https://arxiv.org/abs/2002.09024v1
PDF https://arxiv.org/pdf/2002.09024v1.pdf
PWC https://paperswithcode.com/paper/maxup-a-simple-way-to-improve-generalization
Repo
Framework
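
The objective described in the abstract above (perturb each example several times and minimize the worst-case loss over the copies) can be written down compactly. The sketch below uses a linear model with logistic loss and Gaussian input noise purely for illustration; the model, the loss, and the names `maxup_loss`, `m`, and `sigma` are assumptions, not the authors' released code.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Per-example logistic loss for labels y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (X @ w))

def maxup_loss(w, X, y, m, sigma, rng):
    """Mean over examples of the maximum loss across m Gaussian-perturbed copies."""
    n, d = X.shape
    # m perturbed copies of every input: shape (m, n, d).
    X_aug = X[None, :, :] + sigma * rng.standard_normal((m, n, d))
    # Loss of every copy: shape (m, n).
    losses = np.logaddexp(0.0, -y[None, :] * (X_aug @ w))
    # Worst case over the copies of each example, then average over examples.
    return float(losses.max(axis=0).mean())
```

Setting `sigma=0` makes all copies identical, so the value reduces to the ordinary average loss; larger `sigma` or `m` makes the worst case harsher.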

#### Neural Kernels Without Tangents

Title Neural Kernels Without Tangents
Authors Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Ludwig Schmidt, Jonathan Ragan-Kelley, Benjamin Recht
Abstract We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well-established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels” (NTKs). Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture using only 3x3 convolutions, 2x2 average pooling, and ReLU, optimized with SGD and MSE loss, that achieves 96% accuracy on CIFAR10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs and neural networks outperform both kernel methods.
Published 2020-03-04
URL https://arxiv.org/abs/2003.02237v2
PDF https://arxiv.org/pdf/2003.02237v2.pdf
PWC https://paperswithcode.com/paper/neural-kernels-without-tangents
Repo
Framework

#### On the Convergence of the Dynamic Inner PCA Algorithm

Title On the Convergence of the Dynamic Inner PCA Algorithm
Authors Sungho Shin, Alex D. Smith, S. Joe Qin, Victor M. Zavala
Abstract Dynamic inner principal component analysis (DiPCA) is a powerful method for the analysis of time-dependent multivariate data. DiPCA extracts dynamic latent variables that capture the most dominant temporal trends by solving a large-scale, dense, and nonconvex nonlinear program (NLP). A scalable decomposition algorithm has been recently proposed in the literature to solve these challenging NLPs. The decomposition algorithm performs well in practice but its convergence properties are not well understood. In this work, we show that this algorithm is a specialized variant of a coordinate maximization algorithm. This observation allows us to explain why the decomposition algorithm might work (or not) in practice and can guide improvements. We compare the performance of the decomposition strategies with that of the off-the-shelf solver Ipopt. The results show that decomposition is more scalable and, surprisingly, delivers higher quality solutions.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05928v1
PDF https://arxiv.org/pdf/2003.05928v1.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-the-dynamic-inner-pca
Repo
Framework

#### Review: Noise and artifact reduction for MRI using deep learning

Title Review: Noise and artifact reduction for MRI using deep learning
Abstract For several years, numerous attempts have been made to reduce noise and artifacts in MRI. Although there have been many successful methods to address these problems, practical implementation for clinical images is still challenging because of its complicated mechanism. Recently, deep learning has received considerable attention, emerging as a machine learning approach for delivering robust MR image processing. The purpose here is therefore to explore further and review noise and artifact reduction using deep learning for MRI.
Published 2020-02-28
URL https://arxiv.org/abs/2002.12889v1
PDF https://arxiv.org/pdf/2002.12889v1.pdf
PWC https://paperswithcode.com/paper/review-noise-and-artifact-reduction-for-mri
Repo
Framework

#### Single Image Depth Estimation Trained via Depth from Defocus Cues

Title Single Image Depth Estimation Trained via Depth from Defocus Cues
Authors Shir Gur, Lior Wolf
Abstract Estimating depth from a single RGB image is a fundamental task in computer vision, which is most directly solved using supervised deep learning. In the field of unsupervised learning of depth from a single RGB image, depth is not given explicitly. Existing work in the field receives either a stereo pair, a monocular video, or multiple views, and, using losses that are based on structure-from-motion, trains a depth estimation network. In this work, we rely, instead of different views, on depth from focus cues. Learning is based on a novel Point Spread Function convolutional layer, which applies location-specific kernels that arise from the Circle-Of-Confusion in each image location. We evaluate our method on data derived from five common datasets for depth estimation and lightfield images, and present results that are on par with supervised methods on the KITTI and Make3D datasets and outperform unsupervised learning approaches. Since the phenomenon of depth from defocus is not dataset specific, we hypothesize that learning based on it would overfit less to the specific content in each dataset. Our experiments show that this is indeed the case, and an estimator learned on one dataset using our method provides better results on other datasets than the directly supervised methods.
Published 2020-01-14
URL https://arxiv.org/abs/2001.05036v1
PDF https://arxiv.org/pdf/2001.05036v1.pdf
PWC https://paperswithcode.com/paper/single-image-depth-estimation-trained-via-1
Repo
Framework
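
The Circle-Of-Confusion that the abstract above refers to ties blur size to depth, which is what makes defocus usable as a training signal. The abstract gives no formula, so the sketch below uses the standard thin-lens expression as an assumption; the function name and parameter names are hypothetical, and all lengths are taken to be in the same unit (for example, metres).

```python
def circle_of_confusion(object_dist, focus_dist, focal_length, aperture_diam):
    """Blur-circle diameter on the sensor under the thin-lens model.

    object_dist:   distance from the lens to the scene point
    focus_dist:    distance at which the lens is focused (must exceed focal_length)
    focal_length:  lens focal length
    aperture_diam: diameter of the lens aperture
    """
    magnification = focal_length / (focus_dist - focal_length)
    return aperture_diam * abs(object_dist - focus_dist) / object_dist * magnification
```

A point exactly at the focus distance gives a diameter of zero, and the diameter grows as the point moves away from the focal plane, so each pixel's blur kernel encodes its depth.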

#### Convex Fairness Constrained Model Using Causal Effect Estimators

Title Convex Fairness Constrained Model Using Causal Effect Estimators
Authors Hikaru Ogura, Akiko Takeda
Abstract Recent years have seen much research on fairness in machine learning. Here, mean difference (MD), or demographic parity, is one of the most popular measures of fairness. However, MD quantifies not only discrimination but also explanatory bias, which is the difference in outcomes justified by explanatory features. In this paper, we devise novel models, called FairCEEs, which remove discrimination while keeping explanatory bias. The models are based on estimators of causal effect utilizing propensity score analysis. We prove that FairCEEs with the squared loss theoretically outperform a naive MD constraint model. We provide an efficient algorithm for solving FairCEEs in regression and binary classification tasks. In our experiments on synthetic and real-world data in these two tasks, FairCEEs outperformed an existing model that considers explanatory bias in specific cases.
Published 2020-02-16
URL https://arxiv.org/abs/2002.06501v1
PDF https://arxiv.org/pdf/2002.06501v1.pdf
PWC https://paperswithcode.com/paper/convex-fairness-constrained-model-using
Repo
Framework
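
The mean difference (MD) measure named in the abstract above is simple to compute: it is the gap between the average prediction for the two groups defined by the sensitive attribute. The sketch below shows only this baseline measure, not the FairCEE estimators; the function name and the boolean encoding of the sensitive attribute are assumptions.

```python
import numpy as np

def mean_difference(predictions, sensitive):
    """Average prediction in the sensitive group minus that in the other group."""
    predictions = np.asarray(predictions, dtype=float)
    sensitive = np.asarray(sensitive, dtype=bool)
    return float(predictions[sensitive].mean() - predictions[~sensitive].mean())
```

A value of zero corresponds to demographic parity. As the abstract points out, a nonzero value can mix discrimination with differences that explanatory features justify, which is the motivation for the causal estimators.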

#### Hold me tight! Influence of discriminative features on deep network boundaries

Title Hold me tight! Influence of discriminative features on deep network boundaries
Authors Guillermo Ortiz-Jimenez, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard
Abstract Important insights towards the explainability of neural networks and their properties reside in the formation of their decision boundaries. In this work, we borrow tools from the field of adversarial robustness and propose a new framework that makes it possible to relate the features of the dataset to the distance of data samples to the decision boundary along specific directions. We demonstrate that the inductive bias of deep learning has the tendency to generate classification functions that are invariant along non-discriminative directions of the dataset. More surprisingly, we further show that training on small perturbations of the data samples is sufficient to completely change the decision boundary. This is actually the characteristic exploited by so-called adversarial training to produce robust classifiers. Our general framework can be used to reveal the effect of specific dataset features on the macroscopic properties of deep models and to develop a better understanding of the successes and limitations of deep learning.
Published 2020-02-15
URL https://arxiv.org/abs/2002.06349v1
PDF https://arxiv.org/pdf/2002.06349v1.pdf
PWC https://paperswithcode.com/paper/hold-me-tight-influence-of-discriminative
Repo
Framework

#### Circumventing Outliers of AutoAugment with Knowledge Distillation

Title Circumventing Outliers of AutoAugment with Knowledge Distillation
Authors Longhui Wei, An Xiao, Lingxi Xie, Xin Chen, Xiaopeng Zhang, Qi Tian
Abstract AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks, yet it is sensitive to the operator space as well as hyper-parameters, and an improper setting may degrade network optimization. This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of the discriminative information from the training image, and so insisting on the ground-truth label is no longer the best option. To relieve the inaccuracy of supervision, we make use of knowledge distillation, which refers to the output of a teacher model to guide network training. Experiments are performed on standard image classification benchmarks, and demonstrate the effectiveness of our approach in suppressing the noise of data augmentation and stabilizing training. By combining knowledge distillation and AutoAugment, we claim the new state-of-the-art on ImageNet classification with a top-1 accuracy of 85.8%.
Published 2020-03-25
URL https://arxiv.org/abs/2003.11342v1
PDF https://arxiv.org/pdf/2003.11342v1.pdf
PWC https://paperswithcode.com/paper/circumventing-outliers-of-autoaugment-with
Repo
Framework

#### The Effect of Data Ordering in Image Classification

Title The Effect of Data Ordering in Image Classification
Authors Ethem F. Can, Aysu Ezen-Can
Abstract The success stories from deep learning models increase every day, spanning different tasks from image classification to natural language understanding. With the increasing popularity of these models, scientists spend more and more time finding the optimal parameters and best model architectures for their tasks. In this paper, we focus on the ingredient that feeds these machines: the data. We hypothesize that the data ordering affects how well a model performs. To that end, we conduct experiments on an image classification task using the ImageNet dataset and show that some data orderings are better than others in terms of obtaining higher classification accuracies. Experimental results show that, independent of model architecture, learning rate and batch size, the ordering of the data significantly affects the outcome. We show these findings using different metrics: NDCG, accuracy @ 1 and accuracy @ 5. Our goal here is to show that not only parameters and model architectures but also the data ordering has a say in obtaining better results.
Published 2020-01-08
URL https://arxiv.org/abs/2001.05857v1
PDF https://arxiv.org/pdf/2001.05857v1.pdf
PWC https://paperswithcode.com/paper/the-effect-of-data-ordering-in-image
Repo
Framework
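
The three metrics listed in the abstract above (NDCG, accuracy @ 1, accuracy @ 5) can all be computed from a matrix of class scores. The abstract does not say how NDCG is adapted to single-label classification, so the sketch below assumes binary relevance (1 for the true class, 0 otherwise), under which NDCG reduces to `1 / log2(rank + 1)` of the true class; the function names are hypothetical.

```python
import numpy as np

def accuracy_at_k(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

def ndcg_single_label(scores, labels):
    """Mean NDCG assuming one relevant class per example (binary relevance)."""
    order = np.argsort(-scores, axis=1)
    # 1-based position of the true class in the ranked list.
    ranks = np.argmax(order == labels[:, None], axis=1) + 1
    return float((1.0 / np.log2(ranks + 1)).mean())
```

Unlike accuracy @ k, this NDCG variant still gives partial credit when the true class falls just outside the top k, so it can separate orderings that the accuracy metrics score identically.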

#### Inverse Learning of Symmetry Transformations

Title Inverse Learning of Symmetry Transformations
Authors Mario Wieser, Sonali Parbhoo, Aleksander Wieczorek, Volker Roth
Abstract Symmetry transformations induce invariances and are a crucial building block of modern machine learning algorithms. Some transformations can be described analytically, e.g. geometric invariances. However, in many complex domains, such as the chemical space, invariances can be observed, yet the corresponding symmetry transformation cannot be formulated analytically. Thus, the goal of our work is to learn the symmetry transformation that induced this invariance. To address this task, we propose learning two latent subspaces, where the first subspace captures the property and the second subspace the remaining invariant information. Our approach is based on the deep information bottleneck principle in combination with a mutual information regulariser. Unlike previous methods, however, we focus on estimating mutual information in continuous rather than binary settings. This poses many challenges as mutual information cannot be meaningfully minimised in continuous domains. Therefore, we base the calculation of mutual information on correlation matrices in combination with a bijective variable transformation. Extensive experiments demonstrate that our model outperforms state-of-the-art methods on artificial and molecular datasets.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02782v1
PDF https://arxiv.org/pdf/2002.02782v1.pdf
PWC https://paperswithcode.com/paper/inverse-learning-of-symmetry-transformations
Repo
Framework
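
The abstract above bases its mutual information estimate on correlation matrices. One standard way to do that, shown here only as an assumption about the general idea and not as the paper's estimator, is the closed form for jointly Gaussian variables: the mutual information equals half the log-ratio of the block determinants to the joint determinant of the correlation matrix. The bijective variable transformation mentioned in the abstract is not modelled; the function name is hypothetical.

```python
import numpy as np

def gaussian_mutual_information(x, y):
    """Mutual information (in nats) of x and y under a joint Gaussian assumption.

    x: (n,) or (n, dx) samples; y: (n,) or (n, dy) samples.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    dx = x.shape[1]
    # Joint correlation matrix of all columns of x and y.
    R = np.corrcoef(np.hstack([x, y]), rowvar=False)
    _, logdet_joint = np.linalg.slogdet(R)
    _, logdet_x = np.linalg.slogdet(R[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(R[dx:, dx:])
    return float(0.5 * (logdet_x + logdet_y - logdet_joint))
```

For two scalar variables with correlation `rho` this reduces to `-0.5 * log(1 - rho**2)`, which is zero for uncorrelated data and grows without bound as the correlation approaches one.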

#### Type I Attack for Generative Models

Title Type I Attack for Generative Models
Authors Chengjin Sun, Sizhe Chen, Jia Cai, Xiaolin Huang
Abstract Generative models are popular tools with a wide range of applications. Nevertheless, they are as vulnerable to adversarial samples as classifiers. The existing attack methods mainly focus on generating adversarial examples by adding imperceptible perturbations to the input, which lead to wrong results. However, we focus on another aspect of attack, i.e., cheating models by significant changes. The former induces Type II error and the latter causes Type I error. In this paper, we propose a Type I attack on generative models such as VAE and GAN. One example given for VAE is that we can change an original image significantly to a meaningless one while their reconstruction results remain similar. To implement the Type I attack, we destroy the original image by increasing the distance in input space while keeping the output similar, because different inputs may correspond to similar features owing to the properties of deep neural networks. Experimental results show that our attack method is effective in generating Type I adversarial examples for generative models on large-scale image datasets.
Published 2020-03-04
URL https://arxiv.org/abs/2003.01872v1
PDF https://arxiv.org/pdf/2003.01872v1.pdf
PWC https://paperswithcode.com/paper/type-i-attack-for-generative-models
Repo
Framework

#### Is There Tradeoff between Spatial and Temporal in Video Super-Resolution?

Title Is There Tradeoff between Spatial and Temporal in Video Super-Resolution?
Authors Haochen Zhang, Dong Liu, Zhiwei Xiong
Abstract Recent advances in deep learning have led to the great success of image and video super-resolution (SR) methods that are based on convolutional neural networks (CNN). For video SR, advanced algorithms have been proposed to exploit the temporal correlation between low-resolution (LR) video frames, and/or to super-resolve a frame with multiple LR frames. These methods pursue higher quality of super-resolved frames, where the quality is usually measured frame by frame in, e.g., PSNR. However, frame-wise quality may not reveal the consistency between frames. If an algorithm is applied to each frame independently (which is the case for most previous methods), the algorithm may cause temporal inconsistency, which can be observed as flickering. It is a natural requirement to improve both frame-wise fidelity and between-frame consistency, which are termed spatial quality and temporal quality, respectively. Then we may ask, is a method optimized for spatial quality also optimized for temporal quality? Can we optimize the two quality metrics jointly?
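
The distinction drawn in the abstract above, between frame-wise quality and between-frame consistency, can be made concrete with two small functions. `psnr` is the standard definition; `temporal_inconsistency` is a hypothetical, deliberately simple flicker measure introduced only for this illustration (the abstract does not define the paper's temporal metric).

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def temporal_inconsistency(reference_frames, estimate_frames):
    """Mean absolute error between frame-to-frame changes (hypothetical measure).

    Both inputs have shape (num_frames, height, width).
    """
    ref_change = np.diff(reference_frames.astype(np.float64), axis=0)
    est_change = np.diff(estimate_frames.astype(np.float64), axis=0)
    return float(np.mean(np.abs(est_change - ref_change)))
```

For example, adding +5 to every frame and adding +5 to one frame but -5 to the next yield identical per-frame PSNR, yet only the second flickers, and only the temporal measure tells them apart.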