January 31, 2020

Paper Group AWR 441

Uncertainty-guided Continual Learning with Bayesian Neural Networks. Continual learning with hypernetworks. Uncertainty-based Continual Learning with Adaptive Regularization. ProbAct: A Probabilistic Activation Function for Deep Neural Networks. Hardware Aware Neural Network Architectures using FbNet. Hierarchically Structured Meta-learning. Improv …

Uncertainty-guided Continual Learning with Bayesian Neural Networks


Title	Uncertainty-guided Continual Learning with Bayesian Neural Networks
Authors	Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
Abstract	Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters’ importance. In contrast, we propose Uncertainty-guided Continual Bayesian Neural Networks (UCB), where the learning rate adapts according to the uncertainty defined in the probability distribution of the weights in networks. Uncertainty is a natural way to identify what to remember and what to change as we continually learn, and thus mitigate catastrophic forgetting. We also show a variant of our model, which uses uncertainty for weight pruning and retains task performance after pruning by saving binary masks per task. We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e. it does not presume knowledge of which task a sample belongs to.
Tasks	Continual Learning, Object Classification
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02425v2
PDF	https://arxiv.org/pdf/1906.02425v2.pdf
PWC	https://paperswithcode.com/paper/uncertainty-guided-continual-learning-with
Repo	https://github.com/SaynaEbrahimi/BayesianContinualLearning
Framework	pytorch
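
The UCB abstract above says the learning rate adapts to the uncertainty of each weight. A minimal sketch of that idea follows, assuming each weight has a Gaussian posterior with a mean and a standard deviation, and assuming the per-weight step is the base learning rate multiplied by that standard deviation. The function name, the scaling rule and the default values are illustrative assumptions, not the authors' implementation.

```python
def uncertainty_scaled_step(means, sigmas, grads, base_lr=0.1):
    """One gradient step on the weight means with a per-weight learning rate.

    Assumption of this sketch: the effective learning rate is base_lr * sigma,
    so weights with low uncertainty (small sigma) barely move, while weights
    with high uncertainty stay free to change.
    """
    return [m - base_lr * s * g for m, s, g in zip(means, sigmas, grads)]


if __name__ == "__main__":
    # Two weights with the same gradient but very different uncertainty.
    updated = uncertainty_scaled_step(
        means=[1.0, 1.0], sigmas=[0.01, 1.0], grads=[1.0, 1.0]
    )
    print(updated)
```

With identical gradients, the confident weight (sigma 0.01) moves a hundred times less than the uncertain one, which is the "what to remember" versus "what to change" split the abstract describes.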

Continual learning with hypernetworks


Title	Continual learning with hypernetworks
Authors	Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe
Abstract	Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving state-of-the-art performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display a very large capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable hypernetwork weights is comparable or smaller than target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
Tasks	Continual Learning, Transfer Learning
Published	2019-06-03
URL	https://arxiv.org/abs/1906.00695v3
PDF	https://arxiv.org/pdf/1906.00695v3.pdf
PWC	https://paperswithcode.com/paper/190600695
Repo	https://github.com/chrhenning/hypercl
Framework	pytorch
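
The hypernetwork abstract above mentions a simple regularizer that keeps the weights generated for earlier tasks in place. The sketch below illustrates one way such a penalty could look, assuming a toy linear hypernetwork and a squared-difference penalty between the outputs of a stored copy and the current hypernetwork. All names (`hypernet`, `output_regularizer`, `theta_old`) are hypothetical and chosen for illustration only.

```python
def hypernet(theta, embedding):
    """Toy linear hypernetwork: each row of theta produces one target weight."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in theta]


def output_regularizer(theta, theta_old, old_embeddings):
    """Mean squared change in generated weights over previous task embeddings.

    theta_old is a frozen copy of the hypernetwork parameters taken before
    training on the new task (an assumption of this sketch).
    """
    penalty = 0.0
    for emb in old_embeddings:
        new_weights = hypernet(theta, emb)
        old_weights = hypernet(theta_old, emb)
        penalty += sum((a - b) ** 2 for a, b in zip(new_weights, old_weights))
    return penalty / len(old_embeddings)


if __name__ == "__main__":
    theta_old = [[1.0, 0.0], [0.0, 1.0]]
    theta_new = [[2.0, 0.0], [0.0, 1.0]]
    print(output_regularizer(theta_new, theta_old, [[1.0, 0.0]]))
```

The penalty only needs the stored task embeddings and a copy of the old parameters, not any data from earlier tasks, which matches the abstract's point about rehearsing weight realizations rather than input-output pairs.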

Uncertainty-based Continual Learning with Adaptive Regularization


Title	Uncertainty-based Continual Learning with Adaptive Regularization
Authors	Hongjoon Ahn, Sungmin Cha, Donggyu Lee, Taesup Moon
Abstract	We introduce a new neural network-based continual learning algorithm, dubbed Uncertainty-regularized Continual Learning (UCL), which builds on the traditional Bayesian online learning framework with variational inference. We focus on two significant drawbacks of the recently proposed regularization-based methods: a) considerable additional memory cost for determining the per-weight regularization strengths and b) the absence of a graceful forgetting scheme, which can prevent performance degradation in learning new tasks. In this paper, we show UCL can solve these two problems by introducing a fresh interpretation of the Kullback-Leibler (KL) divergence term of the variational lower bound for Gaussian mean-field approximation. Based on the interpretation, we propose the notion of node-wise uncertainty, which drastically reduces the number of additional parameters for implementing per-weight regularization. Moreover, we devise two additional regularization terms that enforce stability by freezing important parameters for past tasks and allow plasticity by controlling the actively learning parameters for a new task. Through extensive experiments, we show UCL convincingly outperforms most recent state-of-the-art baselines not only on popular supervised learning benchmarks, but also on challenging lifelong reinforcement learning tasks. The source code of our algorithm is available at https://github.com/csm9493/UCL.
Tasks	Continual Learning
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11614v3
PDF	https://arxiv.org/pdf/1905.11614v3.pdf
PWC	https://paperswithcode.com/paper/uncertainty-based-continual-learning-with
Repo	https://github.com/csm9493/UCL
Framework	pytorch

ProbAct: A Probabilistic Activation Function for Deep Neural Networks


Title	ProbAct: A Probabilistic Activation Function for Deep Neural Networks
Authors	Joonho Lee, Kumar Shridhar, Hideaki Hayashi, Brian Kenji Iwana, Seokjun Kang, Seiichi Uchida
Abstract	Activation functions play an important role in the training of artificial neural networks and the Rectified Linear Unit (ReLU) has been the mainstream in recent years. Most of the activation functions currently used are deterministic in nature, whose input-output relationship is fixed. In this work, we propose a probabilistic activation function, called ProbAct. The output value of ProbAct is sampled from a normal distribution, with the mean value same as the output of ReLU and with a fixed or trainable variance for each element. In the trainable ProbAct, the variance of the activation distribution is trained through back-propagation. We also show that the stochastic perturbation through ProbAct is a viable generalization technique that can prevent overfitting. In our experiments, we demonstrate that when using ProbAct, it is possible to boost the image classification performance on CIFAR-10, CIFAR-100, and STL-10 datasets.
Tasks	Image Classification
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10761v1
PDF	https://arxiv.org/pdf/1905.10761v1.pdf
PWC	https://paperswithcode.com/paper/probact-a-probabilistic-activation-function
Repo	https://github.com/kumar-shridhar/ProbAct-Probabilistic-Activation-Function
Framework	pytorch
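
The ProbAct abstract above describes an activation whose output is drawn from a normal distribution centred on the ReLU output. A minimal sketch of the fixed-variance case follows, written with the standard library only. The function name `probact`, the default `sigma` and the list-based interface are assumptions of this sketch; the trainable-variance variant from the abstract is not shown.

```python
import random


def probact(xs, sigma=0.5, rng=None):
    """Element-wise stochastic activation: max(0, x) + sigma * eps, eps ~ N(0, 1).

    Setting sigma to 0 recovers the plain, deterministic ReLU.
    """
    rng = rng or random.Random()
    return [max(0.0, x) + sigma * rng.gauss(0.0, 1.0) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Repeated calls on the same input give different outputs.
    print(probact([-1.0, 0.5, 2.0], sigma=0.5, rng=rng))
    print(probact([-1.0, 0.5, 2.0], sigma=0.5, rng=rng))
```

Averaged over many draws the output matches ReLU, so the noise acts as a perturbation around the usual activation, in line with the regularization effect the abstract claims.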

Hardware Aware Neural Network Architectures using FbNet


Title	Hardware Aware Neural Network Architectures using FbNet
Authors	Sai Vineeth Kalluru Srinivas, Harideep Nair, Vinay Vidyasagar
Abstract	We implement a differentiable Neural Architecture Search (NAS) method inspired by FBNet for discovering neural networks that are heavily optimized for a particular target device. The FBNet NAS method discovers a neural network from a given search space by optimizing over a loss function which accounts for accuracy and target device latency. We extend this loss function by adding an energy term. This will potentially enhance the “hardware awareness” and help us find a neural network architecture that is optimal in terms of accuracy, latency and energy consumption, given a target device (Raspberry Pi in our case). We name our trained child architecture obtained at the end of search process as Hardware Aware Neural Network Architecture (HANNA). We prove the efficacy of our approach by benchmarking HANNA against two other state-of-the-art neural networks designed for mobile/embedded applications, namely MobileNetv2 and CondenseNet for CIFAR-10 dataset. Our results show that HANNA provides a speedup of about 2.5x and 1.7x, and reduces energy consumption by 3.8x and 2x compared to MobileNetv2 and CondenseNet respectively. HANNA is able to provide such significant speedup and energy efficiency benefits over the state-of-the-art baselines at the cost of a tolerable 4-5% drop in accuracy.
Tasks	Neural Architecture Search
Published	2019-06-17
URL	https://arxiv.org/abs/1906.07214v1
PDF	https://arxiv.org/pdf/1906.07214v1.pdf
PWC	https://paperswithcode.com/paper/hardware-aware-neural-network-architectures
Repo	https://github.com/hpnair/18663_Project_FBNet
Framework	pytorch
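
The HANNA abstract above describes a search loss that accounts for accuracy, latency and an added energy term, without giving its exact form. The sketch below shows one plausible shape for such a loss, assuming a softmax over candidate operations and an additive, weighted combination of expected latency and expected energy. The additive form, the weights and every name here are assumptions for illustration and should not be read as the formula used in the paper or in FBNet.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hardware_aware_loss(ce_loss, arch_logits, latencies_ms, energies_mj,
                        lat_weight=0.1, energy_weight=0.1):
    """Task loss plus expected latency and expected energy of the sampled op.

    arch_logits holds one architecture parameter per candidate operation;
    latencies_ms and energies_mj are per-operation costs that would be
    measured on the target device (hypothetical values in this sketch).
    """
    probs = softmax(arch_logits)
    expected_latency = sum(p * c for p, c in zip(probs, latencies_ms))
    expected_energy = sum(p * c for p, c in zip(probs, energies_mj))
    return ce_loss + lat_weight * expected_latency + energy_weight * expected_energy


if __name__ == "__main__":
    # Two candidate ops: a cheap one and an expensive one.
    print(hardware_aware_loss(1.0, [0.0, 0.0], [10.0, 20.0], [2.0, 4.0]))
```

Because the expected costs depend smoothly on the architecture parameters, shifting probability toward the cheaper operation lowers the loss, which is what makes the search differentiable with respect to hardware cost.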

Hierarchically Structured Meta-learning


Title	Hierarchically Structured Meta-learning
Authors	Huaxiu Yao, Ying Wei, Junzhou Huang, Zhenhui Li
Abstract	In order to learn quickly with few samples, meta-learning utilizes prior knowledge learned from previous tasks. However, a critical challenge in meta-learning is task uncertainty and heterogeneity, which can not be handled via globally sharing knowledge among tasks. In this paper, based on gradient-based meta-learning, we propose a hierarchically structured meta-learning (HSML) algorithm that explicitly tailors the transferable knowledge to different clusters of tasks. Inspired by the way human beings organize knowledge, we resort to a hierarchical task clustering structure to cluster tasks. As a result, the proposed approach not only addresses the challenge via the knowledge customization to different clusters of tasks, but also preserves knowledge generalization among a cluster of similar tasks. To tackle the changing of task relationship, in addition, we extend the hierarchical structure to a continual learning environment. The experimental results show that our approach can achieve state-of-the-art performance in both toy-regression and few-shot image classification problems.
Tasks	Continual Learning, Few-Shot Image Classification, Image Classification, Meta-Learning
Published	2019-05-13
URL	https://arxiv.org/abs/1905.05301v2
PDF	https://arxiv.org/pdf/1905.05301v2.pdf
PWC	https://paperswithcode.com/paper/hierarchically-structured-meta-learning
Repo	https://github.com/huaxiuyao/HSML
Framework	tf

Improving and Understanding Variational Continual Learning


Title	Improving and Understanding Variational Continual Learning
Authors	Siddharth Swaroop, Cuong V. Nguyen, Thang D. Bui, Richard E. Turner
Abstract	In the continual learning setting, tasks are encountered sequentially. The goal is to learn whilst i) avoiding catastrophic forgetting, ii) efficiently using model capacity, and iii) employing forward and backward transfer learning. In this paper, we explore how the Variational Continual Learning (VCL) framework achieves these desiderata on two benchmarks in continual learning: split MNIST and permuted MNIST. We first report significantly improved results on what was already a competitive approach. The improvements are achieved by establishing a new best practice approach to mean-field variational Bayesian neural networks. We then look at the solutions in detail. This allows us to obtain an understanding of why VCL performs as it does, and we compare the solution to what an ‘ideal’ continual learning solution might be.
Tasks	Continual Learning, Transfer Learning
Published	2019-05-06
URL	https://arxiv.org/abs/1905.02099v1
PDF	https://arxiv.org/pdf/1905.02099v1.pdf
PWC	https://paperswithcode.com/paper/improving-and-understanding-variational
Repo	https://github.com/nvcuong/variational-continual-learning
Framework	tf

Continual Learning for Sentence Representations Using Conceptors


Title	Continual Learning for Sentence Representations Using Conceptors
Authors	Tianlin Liu, Lyle Ungar, João Sedoc
Abstract	Distributed representations of sentences have become ubiquitous in natural language processing tasks. In this paper, we consider a continual learning scenario for sentence representations: Given a sequence of corpora, we aim to optimize the sentence encoder with respect to the new corpus while maintaining its accuracy on the old corpora. To address this problem, we propose to initialize sentence encoders with the help of corpus-independent features, and then sequentially update sentence encoders using Boolean operations of conceptor matrices to learn corpus-dependent features. We evaluate our approach on semantic textual similarity tasks and show that our proposed sentence encoder can continually learn features from new corpora while retaining its competence on previously encountered corpora.
Tasks	Continual Learning, Semantic Textual Similarity
Published	2019-04-18
URL	http://arxiv.org/abs/1904.09187v1
PDF	http://arxiv.org/pdf/1904.09187v1.pdf
PWC	https://paperswithcode.com/paper/continual-learning-for-sentence
Repo	https://github.com/liutianlin0121/contSentEmbed
Framework	none

Unsupervised Anomaly Localization using Variational Auto-Encoders


Title	Unsupervised Anomaly Localization using Variational Auto-Encoders
Authors	David Zimmerer, Fabian Isensee, Jens Petersen, Simon Kohl, Klaus Maier-Hein
Abstract	An assumption-free automatic check of medical images for potentially overseen anomalies would be a valuable assistance for a radiologist. Deep learning and especially Variational Auto-Encoders (VAEs) have shown great potential in the unsupervised learning of data distributions. In principle, this allows for such a check and even the localization of parts in the image that are most suspicious. Currently, however, the reconstruction-based localization by design requires adjusting the model architecture to the specific problem looked at during evaluation. This contradicts the principle of building assumption-free models. We propose complementing the localization part with a term derived from the Kullback-Leibler (KL)-divergence. For validation, we perform a series of experiments on FashionMNIST as well as on a medical task including >1000 healthy and >250 brain tumor patients. Results show that the proposed formalism outperforms the state of the art VAE-based localization of anomalies across many hyperparameter settings and also shows a competitive max performance.
Tasks
Published	2019-07-04
URL	https://arxiv.org/abs/1907.02796v2
PDF	https://arxiv.org/pdf/1907.02796v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-anomaly-localization-using
Repo	https://github.com/MIC-DKFZ/vae-anomaly-experiments
Framework	pytorch

Object Contour and Edge Detection with RefineContourNet


Title	Object Contour and Edge Detection with RefineContourNet
Authors	Andre Peter Kelm, Vijesh Soorya Rao, Udo Zoelzer
Abstract	A ResNet-based multi-path refinement CNN is used for object contour detection. For this task, we prioritise the effective utilization of the high-level abstraction capability of a ResNet, which leads to state-of-the-art results for edge detection. Keeping our focus in mind, we fuse the high, mid and low-level features in that specific order, which differs from many other approaches. It uses the tensor with the highest-levelled features as the starting point to combine it layer-by-layer with features of a lower abstraction level until it reaches the lowest level. We train this network on a modified PASCAL VOC 2012 dataset for object contour detection and evaluate on a refined PASCAL-val dataset reaching an excellent performance and an Optimal Dataset Scale (ODS) of 0.752. Furthermore, by fine-training on the BSDS500 dataset we reach state-of-the-art results for edge-detection with an ODS of 0.824.
Tasks	Contour Detection, Edge Detection
Published	2019-04-30
URL	https://arxiv.org/abs/1904.13353v2
PDF	https://arxiv.org/pdf/1904.13353v2.pdf
PWC	https://paperswithcode.com/paper/object-contour-and-edge-detection-with
Repo	https://github.com/AndreKelm/RefineContourNet
Framework	none

FERAtt: Facial Expression Recognition with Attention Net


Title	FERAtt: Facial Expression Recognition with Attention Net
Authors	Pedro D. Marrero Fernandez, Fidel A. Guerrero Peña, Tsang Ing Ren, Alexandre Cunha
Abstract	We present a new end-to-end network architecture for facial expression recognition with an attention model. It focuses attention in the human face and uses a Gaussian space representation for expression recognition. We devise this architecture based on two fundamental complementary components: (1) facial image correction and attention and (2) facial expression representation and classification. The first component uses an encoder-decoder style network and a convolutional feature extractor that are pixel-wise multiplied to obtain a feature attention map. The second component is responsible for obtaining an embedded representation and classification of the facial expression. We propose a loss function that creates a Gaussian structure on the representation space. To demonstrate the proposed method, we create two larger and more comprehensive synthetic datasets using the traditional BU3DFE and CK+ facial datasets. We compared results with the PreActResNet18 baseline. Our experiments on these datasets have shown the superiority of our approach in recognizing facial expressions.
Tasks	Facial Expression Recognition
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03284v1
PDF	http://arxiv.org/pdf/1902.03284v1.pdf
PWC	https://paperswithcode.com/paper/feratt-facial-expression-recognition-with
Repo	https://github.com/pedrodiamel/ferattention
Framework	pytorch

On the Delta Method for Uncertainty Approximation in Deep Learning


Title	On the Delta Method for Uncertainty Approximation in Deep Learning
Authors	Geir K. Nilsen, Antonella Z. Munthe-Kaas, Hans J. Skaug, Morten Brun
Abstract	The Delta method is a well known procedure used to quantify uncertainty in statistical models. The method has previously been applied in the context of neural networks, but has not reached much popularity in deep learning because of the sheer size of the Hessian matrix. In this paper, we propose a low cost variant of the method based on an approximate eigendecomposition of the positive curvature subspace of the Hessian matrix. The method has a computational complexity of $O(KPN)$ time and $O(KP)$ space, where $K$ is the number of utilized Hessian eigenpairs, $P$ is the number of model parameters and $N$ is the number of training examples. Given that the model is $L_2$-regularized with rate $\lambda/2$, we provide a bound on the uncertainty approximation error given $K$. We show that when the smallest Hessian eigenvalue in the positive $K/2$-tail of the full spectrum, and the largest Hessian eigenvalue in the negative $K/2$-tail of the full spectrum are both approximately equal to $\lambda$, the error will be close to zero even when $K\ll P$ . We demonstrate the method by a TensorFlow implementation, and show that meaningful rankings of images based on prediction uncertainty can be obtained for a convolutional neural network based MNIST classifier. We also observe that false positives have higher prediction uncertainty than true positives. This suggests that there is supplementing information in the uncertainty measure not captured by the probability alone.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00832v1
PDF	https://arxiv.org/pdf/1912.00832v1.pdf
PWC	https://paperswithcode.com/paper/on-the-delta-method-for-uncertainty
Repo	https://github.com/gknilsen/pyhessian
Framework	tf

Dense Haze: A benchmark for image dehazing with dense-haze and haze-free images


Title	Dense Haze: A benchmark for image dehazing with dense-haze and haze-free images
Authors	Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, Radu Timofte
Abstract	Single image dehazing is an ill-posed problem that has recently drawn important attention. Despite the significant increase in interest shown for dehazing over the past few years, the validation of the dehazing methods remains largely unsatisfactory, due to the lack of pairs of real hazy and corresponding haze-free reference images. To address this limitation, we introduce Dense-Haze - a novel dehazing dataset. Characterized by dense and homogeneous hazy scenes, Dense-Haze contains 33 pairs of real hazy and corresponding haze-free images of various outdoor scenes. The hazy scenes have been recorded by introducing real haze, generated by professional haze machines. The hazy and haze-free corresponding scenes contain the same visual content captured under the same illumination parameters. Dense-Haze dataset aims to push significantly the state-of-the-art in single-image dehazing by promoting robust methods for real and various hazy scenes. We also provide a comprehensive qualitative and quantitative evaluation of state-of-the-art single image dehazing techniques based on the Dense-Haze dataset. Not surprisingly, our study reveals that the existing dehazing techniques perform poorly for dense homogeneous hazy scenes and that there is still much room for improvement.
Tasks	Image Dehazing, Single Image Dehazing
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02904v1
PDF	http://arxiv.org/pdf/1904.02904v1.pdf
PWC	https://paperswithcode.com/paper/dense-haze-a-benchmark-for-image-dehazing
Repo	https://github.com/pmm09c/ntire-dehazing
Framework	pytorch

Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning


Title	Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning
Authors	Eric Arazo, Diego Ortego, Paul Albert, Noel E. O’Connor, Kevin McGuinness
Abstract	Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role on relaxing human supervision. In the context of image classification, recent advances to learn from unlabeled samples are mainly focused on consistency regularization methods that encourage invariant predictions for different perturbations of unlabeled samples. We, conversely, propose to learn from unlabeled data by generating soft pseudo-labels using the network predictions. We show that a naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias and demonstrate that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it. The proposed approach achieves state-of-the-art results in CIFAR-10/100, SVHN, and Mini-ImageNet despite being much simpler than other methods. These results demonstrate that pseudo-labeling alone can outperform consistency regularization methods, while the opposite was supposed in previous work. Source code is available at https://git.io/fjQsC.
Tasks	Image Classification
Published	2019-08-08
URL	https://arxiv.org/abs/1908.02983v4
PDF	https://arxiv.org/pdf/1908.02983v4.pdf
PWC	https://paperswithcode.com/paper/pseudo-labeling-and-confirmation-bias-in-deep
Repo	https://github.com/EricArazo/PseudoLabeling
Framework	pytorch
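
The pseudo-labeling abstract above combines soft pseudo-labels taken from the network's own predictions with mixup augmentation. A minimal sketch of those two steps follows, assuming inputs and labels are plain lists of floats and that the mixing coefficient is drawn from a symmetric Beta distribution, as is usual for mixup. The function names and the `alpha` default are assumptions of this sketch; the minimum number of labeled samples per mini-batch mentioned in the abstract is not shown.

```python
import math
import random


def soft_pseudo_label(logits):
    """Turn network logits for an unlabeled sample into a soft label (softmax)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_lambda(alpha=1.0, rng=None):
    """Draw the mixup coefficient from Beta(alpha, alpha)."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and of their (soft) labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    labeled_x, labeled_y = [1.0, 0.0], [1.0, 0.0]
    unlabeled_x = [0.0, 1.0]
    pseudo_y = soft_pseudo_label([2.0, 0.0])  # stand-in for network output
    lam = sample_lambda(rng=random.Random(0))
    print(mixup(labeled_x, labeled_y, unlabeled_x, pseudo_y, lam))
```

Mixing a labeled sample with a pseudo-labeled one means no training target rests entirely on the network's own guess, which is one way to read the abstract's claim that mixup reduces confirmation bias.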

Pyramid Real Image Denoising Network


Title	Pyramid Real Image Denoising Network
Authors	Yiyun Zhao, Zhuqing Jiang, Aidong Men, Guodong Ju
Abstract	While deep Convolutional Neural Networks (CNNs) have shown extraordinary capability of modelling specific noise and denoising, they still perform poorly on real-world noisy images. The main reason is that the real-world noise is more sophisticated and diverse. To tackle the issue of blind denoising, in this paper, we propose a novel pyramid real image denoising network (PRIDNet), which contains three stages. First, the noise estimation stage uses channel attention mechanism to recalibrate the channel importance of input noise. Second, at the multi-scale denoising stage, pyramid pooling is utilized to extract multi-scale features. Third, the stage of feature fusion adopts a kernel selecting operation to adaptively fuse multi-scale features. Experiments on two datasets of real noisy photographs demonstrate that our approach can achieve competitive performance in comparison with state-of-the-art denoisers in terms of both quantitative measure and visual perception quality. Code is available at https://github.com/491506870/PRIDNet.
Tasks	Denoising, Image Denoising
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00273v2
PDF	https://arxiv.org/pdf/1908.00273v2.pdf
PWC	https://paperswithcode.com/paper/pyramid-real-image-denoising-network
Repo	https://github.com/491506870/PRIDNet
Framework	tf