April 1, 2020

3440 words 17 mins read

Paper Group ANR 400

Contextual Policy Reuse using Deep Mixture Models. Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images. How Does BN Increase Collapsed Neural Network Filters?. Quantum noise protects quantum classifiers against adversaries. Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approac …

Contextual Policy Reuse using Deep Mixture Models

Title Contextual Policy Reuse using Deep Mixture Models
Authors Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Abstract Reinforcement learning methods that consider the context, or current state, when selecting source policies for transfer have been shown to outperform context-free approaches. However, existing work typically tailors the approach to a specific learning algorithm such as Q-learning, and it is often difficult to interpret and validate the knowledge transferred between tasks. In this paper, we assume knowledge of estimated source task dynamics and policies, and common goals between tasks. We introduce a novel deep mixture model formulation for learning a state-dependent prior over source task dynamics that matches the target dynamics using only state trajectories obtained while learning the target policy. The mixture model is easy to train and interpret, is compatible with most reinforcement learning algorithms, and complements existing work by leveraging knowledge of source dynamics rather than Q-values. We then show how the trained mixture model can be incorporated into standard policy reuse frameworks, and demonstrate its effectiveness on benchmarks from OpenAI-Gym.
Tasks Q-Learning
Published 2020-02-29
URL https://arxiv.org/abs/2003.00203v1
PDF https://arxiv.org/pdf/2003.00203v1.pdf
PWC https://paperswithcode.com/paper/contextual-policy-reuse-using-deep-mixture
Repo
Framework
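
The state-dependent prior over source dynamics can be pictured with a toy weight update. Below is a minimal numpy sketch, assuming Gaussian observation noise and two hand-written source dynamics models `f1`/`f2`; the paper instead learns a deep, state-dependent mixture from trajectories, so this only conveys the underlying Bayesian intuition, not the authors' model.

```python
import numpy as np

def gaussian_loglik(s_next, mean, sigma):
    """Log-likelihood of an observed next state under a Gaussian dynamics model."""
    d = s_next - mean
    return -0.5 * np.sum(d * d) / sigma**2 - s_next.size * np.log(sigma)

def mixture_posterior(transitions, source_models, prior, sigma=0.1):
    """Posterior over source tasks given target-task transitions.

    transitions   : list of (s, a, s_next) tuples observed on the target task
    source_models : list of callables f_k(s, a) -> predicted next state
    prior         : array of prior mixture weights, one per source task
    """
    log_w = np.log(prior)
    for s, a, s_next in transitions:
        for k, f in enumerate(source_models):
            log_w[k] += gaussian_loglik(s_next, f(s, a), sigma)
    log_w -= log_w.max()                      # numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: two 1-D source dynamics, the second matches the target task.
f1 = lambda s, a: s + 1.0 * a
f2 = lambda s, a: s + 0.5 * a
rng = np.random.default_rng(0)
data = [(s, 1.0, s + 0.5 + 0.05 * rng.standard_normal()) for s in rng.uniform(-1, 1, 20)]
data = [(np.array([s]), a, np.array([sn])) for s, a, sn in data]
print(mixture_posterior(data, [f1, f2], np.array([0.5, 0.5])))   # weight concentrates on f2
```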

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

Title Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images
Authors Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang
Abstract Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods. The current generative models heavily rely on datasets with multi-view images of the same person. Thus, their generated results are restricted by the scale and domain of the data source. To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. Our key insight is that rotating faces in the 3D space back and forth, and re-rendering them to the 2D plane can serve as a strong self-supervision. We leverage the recent advances in 3D face modeling and high-resolution GAN to constitute our building blocks. Since the 3D rotation-and-render on faces can be applied to arbitrary angles without losing details, our approach is extremely suitable for in-the-wild scenarios (i.e. no paired data are available), where existing methods fall short. Extensive experiments demonstrate that our approach has superior synthesis quality as well as identity preservation over the state-of-the-art methods, across a wide range of poses and domains. Furthermore, we validate that our rotate-and-render framework naturally can act as an effective data augmentation engine for boosting modern face recognition systems even on strong baseline models.
Tasks Data Augmentation, Face Recognition
Published 2020-03-18
URL https://arxiv.org/abs/2003.08124v1
PDF https://arxiv.org/pdf/2003.08124v1.pdf
PWC https://paperswithcode.com/paper/rotate-and-render-unsupervised-photorealistic
Repo
Framework

How Does BN Increase Collapsed Neural Network Filters?

Title How Does BN Increase Collapsed Neural Network Filters?
Authors Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang
Abstract Improving sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularizations such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. This phenomenon becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to the square of learning rate and inversely proportional to the square of the scale parameter of BN. To prevent the undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and increase the model performance in various tasks such as classification on CIFAR-10 and object detection on MS-COCO2017.
Tasks Object Detection
Published 2020-01-30
URL https://arxiv.org/abs/2001.11216v2
PDF https://arxiv.org/pdf/2001.11216v2.pdf
PWC https://paperswithcode.com/paper/how-does-bn-increase-collapsed-neural-network
Repo
Framework
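
A simple way to observe the phenomenon described above is to count filters whose BN scale has collapsed toward zero after training. The PyTorch sketch below uses a near-zero gamma threshold as a proxy for a collapsed filter; the threshold `eps` and the toy network are illustrative assumptions, and the paper's psBN remedy is not implemented here.

```python
import torch
import torch.nn as nn

def collapsed_filters(model, eps=1e-3):
    """Count filters whose BN scale has (nearly) vanished.

    A filter followed by BN + ReLU becomes effectively dead once its BN scale
    parameter gamma is driven to ~0, since its output is then a constant that
    ReLU can zero out entirely.
    """
    stats = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach().abs()
            stats[name] = (int((gamma < eps).sum().item()), gamma.numel())
    return stats

# Toy model: in practice one would inspect this after training with a large LR;
# on a freshly initialised network nothing has collapsed yet.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
)
print(collapsed_filters(net))
```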

Quantum noise protects quantum classifiers against adversaries

Title Quantum noise protects quantum classifiers against adversaries
Authors Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao, Nana Liu
Abstract Noise in quantum information processing is often viewed as a disruptive and difficult-to-avoid feature, especially in near-term quantum technologies. However, noise has often played beneficial roles, from enhancing weak signals in stochastic resonance to protecting the privacy of data in differential privacy. It is then natural to ask: can we harness quantum noise in ways that benefit quantum computing? An important current direction for quantum computing is its application to machine learning, such as classification problems. One outstanding problem in machine learning for classification is its sensitivity to adversarial examples. These are small, undetectable perturbations from the original data where the perturbed data is completely misclassified in otherwise extremely accurate classifiers. They can also be considered as ‘worst-case’ perturbations by unknown noise sources. We show that by taking advantage of depolarisation noise in quantum circuits for classification, a robustness bound against adversaries can be derived where the robustness improves with increasing noise. This robustness property is intimately connected with an important security concept called differential privacy, which can be extended to quantum differential privacy. For the protection of quantum data, this is the first quantum protocol that can be used against the most general adversaries. Furthermore, we show how the robustness in the classical case can be sensitive to the details of the classification model, whereas in the quantum case the details of the classification model are absent, thus also providing a potential quantum advantage for classical data that is independent of quantum speedups. This opens the opportunity to explore other ways in which quantum noise can be used in our favour, as well as identifying other ways quantum algorithms can be helpful that are independent of quantum speedups.
Tasks
Published 2020-03-20
URL https://arxiv.org/abs/2003.09416v1
PDF https://arxiv.org/pdf/2003.09416v1.pdf
PWC https://paperswithcode.com/paper/quantum-noise-protects-quantum-classifiers
Repo
Framework
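
The effect of depolarisation noise on a classifier's output can be mimicked classically by mixing the predicted distribution with the uniform one. The numpy sketch below shows how the gap between a clean and an adversarially perturbed prediction shrinks as the noise level grows; it is only a classical illustration, not the quantum circuit or the differential-privacy bound derived in the paper.

```python
import numpy as np

def depolarize(probs, lam):
    """Apply a depolarising-channel analogue to a classifier's output distribution.

    With probability lam the output is replaced by the maximally mixed state,
    so class probabilities are pulled toward the uniform distribution.
    """
    k = probs.shape[-1]
    return (1.0 - lam) * probs + lam / k

p_clean = np.array([0.70, 0.20, 0.10])   # confident prediction on a clean input
p_adv   = np.array([0.55, 0.35, 0.10])   # same input after a small adversarial perturbation
for lam in (0.0, 0.25, 0.5):
    print(lam, depolarize(p_clean, lam), depolarize(p_adv, lam))
```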

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Title Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach
Authors Pengchao Han, Shiqiang Wang, Kin K. Leung
Abstract Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication overhead and improve the overall efficiency of FL, gradient sparsification (GS) can be applied, where instead of the full gradient, only a small subset of important elements of the gradient is communicated. Existing work on GS uses a fixed degree of gradient sparsity for i.i.d.-distributed data within a datacenter. In this paper, we consider adaptive degree of sparsity and non-i.i.d. local datasets. We first present a fairness-aware GS method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off that is controlled by the degree of gradient sparsity. The online learning algorithm uses an estimated sign of the derivative of the objective function, which gives a regret bound that is asymptotically equal to the case where exact derivative is available. Experiments with real datasets confirm the benefits of our proposed approaches, showing up to 40% improvement in model accuracy for a finite training time.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.04756v3
PDF https://arxiv.org/pdf/2001.04756v3.pdf
PWC https://paperswithcode.com/paper/adaptive-gradient-sparsification-for
Repo
Framework
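
The communication-saving primitive underlying the abstract is top-k gradient sparsification. The numpy sketch below keeps only the k largest-magnitude gradient entries; choosing k adaptively and fairly across clients, which is the paper's actual contribution, is not reproduced here.

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep only the k largest-magnitude gradient entries (the rest are zeroed).

    Only the kept values and their indices would be communicated to the server.
    """
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape), idx

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 5))
sg, kept = sparsify_topk(g, k=5)
print("kept fraction:", kept.size / g.size)   # 0.25 of the entries are sent
```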

HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation

Title HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation
Authors Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu
Abstract Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state-Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade methods (e.g. 20% on Human3.6M). The proposed method naturally supports training with “in-the-wild” images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images. Leveraging the strength of the HEMlets pose estimation, we further design and append a shallow yet effective network module to regress the SMPL parameters of the body pose and shape. We term the entire HEMlets-based human pose and shape recovery pipeline HEMlets PoSh. Extensive quantitative and qualitative experiments on the existing human body recovery benchmarks justify the state-of-the-art results obtained with our HEMlets PoSh approach.
Tasks Pose Estimation
Published 2020-03-10
URL https://arxiv.org/abs/2003.04894v1
PDF https://arxiv.org/pdf/2003.04894v1.pdf
PWC https://paperswithcode.com/paper/hemlets-posh-learning-part-centric-heatmap
Repo
Framework
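
The "integral operation" used to extract joint locations from volumetric heatmaps is a soft-argmax. A minimal numpy sketch for a single joint is shown below; the HEMlets triplet prediction that precedes this step in the paper is omitted.

```python
import numpy as np

def integral_joint_location(heatmap):
    """Soft-argmax ('integral') over a volumetric heatmap.

    heatmap : array of shape (D, H, W) with unnormalised scores for one joint.
    Returns the expected (z, y, x) coordinate under the softmax distribution,
    which keeps the localisation step differentiable for end-to-end training.
    """
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    grids = np.meshgrid(*[np.arange(n) for n in heatmap.shape], indexing="ij")
    return np.array([(p * g).sum() for g in grids])

hm = np.zeros((8, 16, 16))
hm[3, 7, 9] = 10.0                      # a sharp peak at (z=3, y=7, x=9)
print(integral_joint_location(hm))      # close to [3, 7, 9]
```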

Modeling the Background for Incremental Learning in Semantic Segmentation

Title Modeling the Background for Incremental Learning in Semantic Segmentation
Authors Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Abstract Despite their effectiveness in a wide range of tasks, deep architectures suffer from some important limitations. In particular, they are vulnerable to catastrophic forgetting, i.e. they perform poorly when they are required to update their model as new classes are available but the original training set is not retained. This paper addresses this problem in the context of semantic segmentation. Current strategies fail on this task because they do not consider a peculiar aspect of semantic segmentation: since each training step provides annotation only for a subset of all possible classes, pixels of the background class (i.e. pixels that do not belong to any other classes) exhibit a semantic distribution shift. In this work we revisit classical incremental learning methods, proposing a new distillation-based framework which explicitly accounts for this shift. Furthermore, we introduce a novel strategy to initialize classifier’s parameters, thus preventing biased predictions toward the background class. We demonstrate the effectiveness of our approach with an extensive evaluation on the Pascal-VOC 2012 and ADE20K datasets, significantly outperforming state of the art incremental learning methods.
Tasks Semantic Segmentation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00718v2
PDF https://arxiv.org/pdf/2002.00718v2.pdf
PWC https://paperswithcode.com/paper/modeling-the-background-for-incremental
Repo
Framework
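
The background-shift issue can be illustrated with a distillation loss in which the new model's probability mass on "background or new classes" is matched against the old model's background. The PyTorch sketch below works on per-sample logits for brevity (real segmentation models produce per-pixel logits) and is a simplified reading of the idea rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def background_aware_distillation(new_logits, old_logits, n_old):
    """Distillation that accounts for the background shift (a hedged sketch).

    new_logits : (B, n_old + n_new) logits of the model being trained
    old_logits : (B, n_old) logits of the frozen old model (index 0 = background)
    The probability the new model assigns to background OR any new class is
    matched against the old model's background, so pixels of new classes are
    not penalised for no longer being predicted as background.
    """
    p_new = F.softmax(new_logits, dim=1)
    q_old = F.softmax(old_logits, dim=1)
    bg_or_new = p_new[:, 0:1].sum(dim=1) + p_new[:, n_old:].sum(dim=1)
    p_matched = torch.cat([bg_or_new.unsqueeze(1), p_new[:, 1:n_old]], dim=1)
    return -(q_old * torch.log(p_matched + 1e-8)).sum(dim=1).mean()

new_logits = torch.randn(4, 7)   # 5 old classes (incl. background) + 2 new ones
old_logits = torch.randn(4, 5)
print(background_aware_distillation(new_logits, old_logits, n_old=5))
```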

Single-Shot Pose Estimation of Surgical Robot Instruments’ Shafts from Monocular Endoscopic Images

Title Single-Shot Pose Estimation of Surgical Robot Instruments’ Shafts from Monocular Endoscopic Images
Authors Masakazu Yoshimura, Murilo M. Marinho, Kanako Harada, Mamoru Mitsuishi
Abstract Surgical robots are used to perform minimally invasive surgery and alleviate much of the burden imposed on surgeons. Our group has developed a surgical robot to aid in the removal of tumors at the base of the skull via access through the nostrils. To avoid injuring the patients, a collision-avoidance algorithm that depends on having an accurate model for the poses of the instruments’ shafts is used. Given that the model’s parameters can change over time owing to interactions between instruments and other disturbances, the online estimation of the poses of the instrument’s shaft is essential. In this work, we propose a new method to estimate the pose of the surgical instruments’ shafts using a monocular endoscope. Our method is based on the use of an automatically annotated training dataset and an improved pose-estimation deep-learning architecture. In preliminary experiments, we show that our method can surpass state of the art vision-based marker-less pose estimation techniques (providing an error decrease of 55% in position estimation, 64% in pitch, and 69% in yaw) by using artificial images.
Tasks Pose Estimation
Published 2020-03-03
URL https://arxiv.org/abs/2003.01267v1
PDF https://arxiv.org/pdf/2003.01267v1.pdf
PWC https://paperswithcode.com/paper/single-shot-pose-estimation-of-surgical-robot
Repo
Framework

Numerical Sequence Prediction using Bayesian Concept Learning

Title Numerical Sequence Prediction using Bayesian Concept Learning
Authors Mohith Damarapati, Inavamsi B. Enaganti, Alfred Ajay Aureate Rajakumar
Abstract When people learn mathematical patterns or sequences, they are able to identify the concepts (or rules) underlying those patterns. Having learned the underlying concepts, humans are also able to generalize those concepts to other numbers, even identifying previously unseen combinations of those rules. Current state-of-the-art RNN architectures like LSTMs perform well in predicting successive elements of sequential data, but require vast amounts of training examples. Even with extensive data, these models struggle to generalize concepts. From our behavioral study, we also found that humans are able to disregard noise and identify the underlying rules generating the corrupted sequences. We therefore propose a Bayesian model that captures these human-like learning capabilities to predict the next number in a given sequence better than traditional LSTMs.
Tasks
Published 2020-01-13
URL https://arxiv.org/abs/2001.04072v1
PDF https://arxiv.org/pdf/2001.04072v1.pdf
PWC https://paperswithcode.com/paper/numerical-sequence-prediction-using-bayesian
Repo
Framework
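
Bayesian concept learning over sequences boils down to scoring a hypothesis space of rules against the observed elements and predicting with the posterior. The sketch below uses a tiny, hand-picked hypothesis space and a crude noise likelihood; the paper's actual concept space and priors are richer.

```python
# An illustrative hypothesis space of rules mapping one element to the next.
HYPOTHESES = {
    "add 2":   lambda x: x + 2,
    "add 5":   lambda x: x + 5,
    "times 2": lambda x: x * 2,
    "times 3": lambda x: x * 3,
}

def posterior(seq, noise=1e-3):
    """Posterior over rules given an observed sequence (uniform prior).

    Each observed transition scores 1 if it matches the rule exactly and
    `noise` otherwise, so a few corrupted elements do not eliminate a rule.
    """
    scores = {}
    for name, rule in HYPOTHESES.items():
        lik = 1.0
        for a, b in zip(seq, seq[1:]):
            lik *= 1.0 if rule(a) == b else noise
        scores[name] = lik
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}

def predict_next(seq):
    post = posterior(seq)
    best = max(post, key=post.get)
    return HYPOTHESES[best](seq[-1]), best, post

print(predict_next([2, 4, 8, 16]))   # -> (32, 'times 2', ...)
```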

Provable Meta-Learning of Linear Representations

Title Provable Meta-Learning of Linear Representations
Authors Nilesh Tripuraneni, Chi Jin, Michael I. Jordan
Abstract Meta-learning, or learning-to-learn, seeks to design algorithms that can utilize previous experience to rapidly learn new skills or adapt to new environments. Representation learning—a key tool for performing meta-learning—learns a data representation that can transfer knowledge across multiple tasks, which is essential in regimes where data is scarce. Despite a recent surge of interest in the practice of meta-learning, the theoretical underpinnings of meta-learning algorithms are lacking, especially in the context of learning transferable representations. In this paper, we focus on the problem of multi-task linear regression—in which multiple linear regression models share a common, low-dimensional linear representation. Here, we provide provably fast, sample-efficient algorithms to address the dual challenges of (1) learning a common set of features from multiple, related tasks, and (2) transferring this knowledge to new, unseen tasks. Both are central to the general problem of meta-learning. Finally, we complement these results by providing information-theoretic lower bounds on the sample complexity of learning these linear features, showing that our algorithms are optimal up to logarithmic factors.
Tasks Meta-Learning, Representation Learning
Published 2020-02-26
URL https://arxiv.org/abs/2002.11684v1
PDF https://arxiv.org/pdf/2002.11684v1.pdf
PWC https://paperswithcode.com/paper/provable-meta-learning-of-linear
Repo
Framework
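
One simple way to picture learning a shared linear representation across tasks is to stack per-task least-squares estimates, take their top singular directions, and then solve each new task inside that low-dimensional subspace. The numpy sketch below follows this heuristic; it is not necessarily the estimator analysed in the paper, which comes with formal sample-complexity guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_tasks, n = 20, 3, 10, 100

# Ground truth: all tasks share a d x r column space B_true.
B_true = np.linalg.qr(rng.standard_normal((d, r)))[0]
tasks = []
for _ in range(n_tasks):
    w = rng.standard_normal(r)
    X = rng.standard_normal((n, d))
    y = X @ B_true @ w + 0.1 * rng.standard_normal(n)
    tasks.append((X, y))

# Step 1: per-task least-squares estimates, stacked column-wise.
W_hat = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in tasks])

# Step 2: shared subspace = top-r left singular vectors of the stacked estimates.
B_hat = np.linalg.svd(W_hat, full_matrices=False)[0][:, :r]

# Step 3: a new task is solved in the r-dimensional learned representation.
X_new = rng.standard_normal((15, d))           # only 15 samples, fewer than d
y_new = X_new @ B_true @ rng.standard_normal(r) + 0.1 * rng.standard_normal(15)
w_low = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]

# Subspace alignment near 1 means the representation was recovered;
# the relative residual on the new task should be small.
print(np.linalg.norm(B_true.T @ B_hat, 2))
print(np.linalg.norm(X_new @ B_hat @ w_low - y_new) / np.linalg.norm(y_new))
```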

Effectively Trainable Semi-Quantum Restricted Boltzmann Machine

Title Effectively Trainable Semi-Quantum Restricted Boltzmann Machine
Authors Ya. S. Lyakhova, E. A. Polyakov, A. N. Rubtsov
Abstract We propose a novel quantum model for the restricted Boltzmann machine (RBM), in which the visible units remain classical whereas the hidden units are quantized as noninteracting fermions. The free motion of the fermions is parametrically coupled to the classical signal of the visible units. This model possesses a quantum behaviour such as coherences between the hidden units. Numerical experiments show that this fact makes it more powerful than the classical RBM with the same number of hidden units. At the same time, a significant advantage of the proposed model over the other approaches to the Quantum Boltzmann Machine (QBM) is that it is exactly solvable and efficiently trainable on a classical computer: there is a closed expression for the log-likelihood gradient with respect to its parameters. This fact makes it interesting not only as a model of a hypothetical quantum simulator, but also as a quantum-inspired classical machine-learning algorithm.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.08997v3
PDF https://arxiv.org/pdf/2001.08997v3.pdf
PWC https://paperswithcode.com/paper/effectively-trainable-semi-quantum-restricted
Repo
Framework

Scene Text Recognition With Finer Grid Rectification

Title Scene Text Recognition With Finer Grid Rectification
Authors Gang Wang
Abstract Scene Text Recognition is a challenging problem because of irregular styles and various distortions. This paper proposes an end-to-end trainable model consisting of a finer rectification module and a bidirectional attentional recognition network (Firbarn). The rectification module adopts a finer grid to rectify the distorted input image, and the bidirectional decoder contains only one decoding layer instead of two separate ones. Firbarn can be trained in a weakly supervised way, requiring only the scene text images and the corresponding word labels. With the flexible rectification and the novel bidirectional decoder, the results of extensive evaluation on the standard benchmarks show that Firbarn outperforms previous works, especially on irregular datasets.
Tasks Scene Text Recognition
Published 2020-01-26
URL https://arxiv.org/abs/2001.09389v1
PDF https://arxiv.org/pdf/2001.09389v1.pdf
PWC https://paperswithcode.com/paper/scene-text-recognition-with-finer-grid
Repo
Framework
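
Grid-based rectification amounts to warping the input image with a dense sampling grid, i.e., an identity grid plus predicted offsets. The PyTorch sketch below applies such a warp with grid_sample, with the offsets passed in directly; in the paper they would be produced by the finer rectification module.

```python
import torch
import torch.nn.functional as F

def rectify_with_grid(image, offsets):
    """Warp an image with a dense sampling grid (identity grid + predicted offsets).

    image   : (B, C, H, W) distorted text image
    offsets : (B, H, W, 2) offsets in normalised [-1, 1] coordinates, which a
              rectification network would predict; here they are given directly.
    """
    B, C, H, W = image.shape
    ys = torch.linspace(-1, 1, H)
    xs = torch.linspace(-1, 1, W)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack([gx, gy], dim=-1).expand(B, H, W, 2)
    return F.grid_sample(image, identity + offsets, align_corners=True)

img = torch.rand(1, 3, 32, 100)
print(rectify_with_grid(img, torch.zeros(1, 32, 100, 2)).shape)  # identity warp
```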

Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

Title Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation
Authors Jibang Wu, Renqin Cai, Hongning Wang
Abstract Predicting users’ preferences based on their sequential behaviors in history is challenging and crucial for modern recommender systems. Most existing sequential recommendation algorithms focus on the transitional structure among sequential actions but largely ignore temporal and context information when modeling the influence of a historical event on the current prediction. In this paper, we argue that the influence of past events on a user’s current action should vary over the course of time and under different contexts. Thus, we propose a Contextualized Temporal Attention Mechanism that learns to weigh the influence of historical actions based not only on what the action was, but also on when and how it took place. More specifically, to dynamically calibrate the relative input dependence from the self-attention mechanism, we deploy multiple parameterized kernel functions to learn various temporal dynamics, and then use the context information to determine which of these reweighing kernels to follow for each input. In empirical evaluations on two large public recommendation datasets, our model consistently outperformed an extensive set of state-of-the-art sequential recommendation methods.
Tasks Recommendation Systems
Published 2020-01-29
URL https://arxiv.org/abs/2002.00741v1
PDF https://arxiv.org/pdf/2002.00741v1.pdf
PWC https://paperswithcode.com/paper/deja-vu-a-contextualized-temporal-attention
Repo
Framework
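
The contextualized temporal attention idea can be sketched as ordinary dot-product attention whose weights are modulated by a context-gated mixture of temporal decay kernels. In the numpy sketch below, the kernel forms, the multiplicative reweighing, and the gate parameters `W_gate` are illustrative assumptions rather than the paper's exact formulation, and causal masking is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contextual_temporal_attention(q, k, v, dt, context, W_gate, decays=(0.1, 1.0, 10.0)):
    """Self-attention reweighed by context-gated temporal kernels (a rough sketch).

    q, k, v : (T, d) query/key/value vectors for one user sequence
    dt      : (T, T) elapsed time between the query step and each historical step
    context : (T, c) context features used to pick among the temporal kernels
    W_gate  : (c, len(decays)) gating weights (hypothetical parameters)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])              # plain dot-product attention
    kernels = np.stack([np.exp(-dt / tau) for tau in decays], axis=-1)  # (T, T, K)
    gate = softmax(context @ W_gate, axis=-1)            # (T, K), one mix per query step
    reweigh = np.einsum("tsk,tk->ts", kernels, gate)     # (T, T) temporal reweighing
    attn = softmax(scores + np.log(reweigh + 1e-8), axis=-1)
    return attn @ v

T, d, c = 5, 8, 4
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, T))
out = contextual_temporal_attention(
    rng.standard_normal((T, d)), rng.standard_normal((T, d)), rng.standard_normal((T, d)),
    np.abs(t[:, None] - t[None, :]), rng.standard_normal((T, c)), rng.standard_normal((c, 3)))
print(out.shape)   # (5, 8)
```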

Joint Deep Learning of Facial Expression Synthesis and Recognition

Title Joint Deep Learning of Facial Expression Synthesis and Recognition
Authors Yan Yan, Ying Huang, Si Chen, Chunhua Shen, Hanzi Wang
Abstract Recently, deep learning based facial expression recognition (FER) methods have attracted considerable attention and they usually require large-scale labelled training data. Nonetheless, the publicly available facial expression databases typically contain a small amount of labelled data. In this paper, to overcome the above issue, we propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER. More specifically, the proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions. To increase the diversity of the training images, FESGAN is elaborately designed to generate images with new identities from a prior distribution. Secondly, an expression recognition network is jointly learned with the pre-trained FESGAN in a unified framework. In particular, the classification loss computed from the recognition network is used to simultaneously optimize the performance of both the recognition network and the generator of FESGAN. Moreover, in order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm to reduce the intra-class variations of images from the same class, which can significantly improve the final performance. Extensive experimental results on public facial expression databases demonstrate the superiority of the proposed method compared with several state-of-the-art FER methods.
Tasks Facial Expression Recognition
Published 2020-02-06
URL https://arxiv.org/abs/2002.02194v1
PDF https://arxiv.org/pdf/2002.02194v1.pdf
PWC https://paperswithcode.com/paper/joint-deep-learning-of-facial-expression
Repo
Framework

Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization

Title Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
Authors Geoffrey Négiar, Gideon Dresdner, Alicia Tsai, Laurent El Ghaoui, Francesco Locatello, Fabian Pedregosa
Abstract We propose a novel Stochastic Frank-Wolfe (a.k.a. Conditional Gradient) algorithm with a fixed batch size tailored to the constrained optimization of a finite sum of smooth objectives. The design of our method hinges on a primal-dual interpretation of the Frank-Wolfe algorithm. Recent work to design stochastic variants of the Frank-Wolfe algorithm falls into two categories: algorithms with increasing batch size, and algorithms with a given, constant, batch size. The former have faster convergence rates but are impractical; the latter are practical but slower. The proposed method combines the advantages of both: it converges for unit batch size, and has faster theoretical worst-case rates than previous unit batch size algorithms. Our experiments also show faster empirical convergence than previous unit batch size methods for several tasks. Finally, we construct a stochastic estimator of the Frank-Wolfe gap. It allows us to bound the true Frank-Wolfe gap, which bounds the primal-dual gap in the convex case and is a measure of stationarity in general. Our gap estimator can therefore be used as a practical stopping criterion in all cases.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.11860v2
PDF https://arxiv.org/pdf/2002.11860v2.pdf
PWC https://paperswithcode.com/paper/stochastic-frank-wolfe-for-constrained-finite
Repo
Framework
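
The fixed-batch-size idea can be sketched for an l1-constrained least-squares problem: keep one stored residual per sample, refresh a single residual per step, and feed the resulting averaged gradient to the linear minimisation oracle. The numpy sketch below follows that recipe loosely; the step sizes and the estimator details are simplifications of the paper's primal-dual construction.

```python
import numpy as np

def lmo_l1(grad, radius):
    """Linear minimisation oracle for the l1 ball: a signed vertex."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def stochastic_frank_wolfe(X, y, radius=5.0, steps=3000, seed=0):
    """Stochastic Frank-Wolfe for min_{||w||_1 <= radius} (1/n) * sum_i (x_i w - y_i)^2.

    Keeps one stored residual per sample and refreshes a single residual per
    step (unit batch), so the averaged gradient is cheap to maintain -- a
    hedged sketch loosely following the paper's primal-dual viewpoint.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    resid = -y.copy()                         # stored residuals x_i @ w - y_i at w = 0
    g = 2.0 * X.T @ resid / n                 # averaged gradient built from stored residuals
    for t in range(1, steps + 1):
        i = rng.integers(n)
        new_r = X[i] @ w - y[i]               # refresh one residual at the current iterate
        g += 2.0 * (new_r - resid[i]) * X[i] / n
        resid[i] = new_r
        s = lmo_l1(g, radius)
        gamma = 2.0 / (t + 2.0)
        w = (1.0 - gamma) * w + gamma * s
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(200)
print(np.round(stochastic_frank_wolfe(X, y), 2))   # approximately recovers w_true
```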