Paper Group AWR 441
Uncertainty-guided Continual Learning with Bayesian Neural Networks. Continual learning with hypernetworks. Uncertainty-based Continual Learning with Adaptive Regularization. ProbAct: A Probabilistic Activation Function for Deep Neural Networks. Hardware Aware Neural Network Architectures using FbNet. Hierarchically Structured Meta-learning. Improv …
Uncertainty-guided Continual Learning with Bayesian Neural Networks
Title | Uncertainty-guided Continual Learning with Bayesian Neural Networks |
Authors | Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach |
Abstract | Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters’ importance. In contrast, we propose Uncertainty-guided Continual Bayesian Neural Networks (UCB), where the learning rate adapts according to the uncertainty defined in the probability distribution of the weights in networks. Uncertainty is a natural way to identify what to remember and what to change as we continually learn, and thus mitigate catastrophic forgetting. We also show a variant of our model, which uses uncertainty for weight pruning and retains task performance after pruning by saving binary masks per task. We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e. it does not presume knowledge of which task a sample belongs to. |
Tasks | Continual Learning, Object Classification |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02425v2 |
PDF | https://arxiv.org/pdf/1906.02425v2.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-guided-continual-learning-with |
Repo | https://github.com/SaynaEbrahimi/BayesianContinualLearning |
Framework | pytorch |
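The UCB abstract describes a learning rate that adapts to the uncertainty of each weight. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a mean-field Gaussian posterior with one standard deviation per weight and scales each weight's step size by that standard deviation. The function name `uncertainty_scaled_step`, the base rate, and the exact scaling rule are illustrative assumptions.

```python
def uncertainty_scaled_step(means, stds, grads, base_lr=0.1):
    """Update posterior means with a per-weight learning rate.

    Assumption for illustration: each weight's learning rate is
    base_lr * sigma, so confident weights (small sigma) barely move
    while uncertain weights (large sigma) stay free to change.
    """
    return [m - base_lr * s * g for m, s, g in zip(means, stds, grads)]


means = [0.5, -1.0]
stds = [0.01, 1.0]   # first weight is certain, second is uncertain
grads = [2.0, 2.0]   # identical gradients for both weights
updated = uncertainty_scaled_step(means, stds, grads)
```

With identical gradients, the certain weight moves by 0.002 and the uncertain one by 0.2, which is the "what to remember" versus "what to change" distinction in miniature.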
Continual learning with hypernetworks
Title | Continual learning with hypernetworks |
Authors | Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe |
Abstract | Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving state-of-the-art performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display a very large capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable hypernetwork weights is comparable or smaller than target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets. |
Tasks | Continual Learning, Transfer Learning |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00695v3 |
PDF | https://arxiv.org/pdf/1906.00695v3.pdf |
PWC | https://paperswithcode.com/paper/190600695 |
Repo | https://github.com/chrhenning/hypercl |
Framework | pytorch |
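The hypernetwork abstract states that a simple regularizer can keep the task-specific weight realizations in memory. A rough sketch of such an output regularizer follows, assuming a purely linear hypernetwork and a squared-error penalty; the names `hypernet` and `output_regularizer`, the shapes, and all numbers are hypothetical and chosen only for illustration.

```python
def hypernet(theta, embedding):
    """Linear hypernetwork: maps a task embedding to target-model weights."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in theta]


def output_regularizer(theta, theta_old, old_embeddings):
    """Penalize changes in the weights generated for previously seen tasks."""
    penalty = 0.0
    for emb in old_embeddings:
        new_w = hypernet(theta, emb)
        old_w = hypernet(theta_old, emb)
        penalty += sum((n - o) ** 2 for n, o in zip(new_w, old_w))
    return penalty


theta_old = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # snapshot before the new task
theta_new = [[1.5, 0.0], [0.0, 1.0], [1.0, 1.0]]  # candidate after an update
task1_embedding = [2.0, 1.0]                      # embedding of an earlier task

unchanged = output_regularizer(theta_old, theta_old, [task1_embedding])
drift = output_regularizer(theta_new, theta_old, [task1_embedding])
```

Only the hypernetwork outputs for stored task embeddings are compared, so no data from earlier tasks has to be replayed in this sketch.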
Uncertainty-based Continual Learning with Adaptive Regularization
Title | Uncertainty-based Continual Learning with Adaptive Regularization |
Authors | Hongjoon Ahn, Sungmin Cha, Donggyu Lee, Taesup Moon |
Abstract | We introduce a new neural network-based continual learning algorithm, dubbed as Uncertainty-regularized Continual Learning (UCL), which builds on traditional Bayesian online learning framework with variational inference. We focus on two significant drawbacks of the recently proposed regularization-based methods: a) considerable additional memory cost for determining the per-weight regularization strengths and b) the absence of gracefully forgetting scheme, which can prevent performance degradation in learning new tasks. In this paper, we show UCL can solve these two problems by introducing a fresh interpretation on the Kullback-Leibler (KL) divergence term of the variational lower bound for Gaussian mean-field approximation. Based on the interpretation, we propose the notion of node-wise uncertainty, which drastically reduces the number of additional parameters for implementing per-weight regularization. Moreover, we devise two additional regularization terms that enforce stability by freezing important parameters for past tasks and allow plasticity by controlling the actively learning parameters for a new task. Through extensive experiments, we show UCL convincingly outperforms most of recent state-of-the-art baselines not only on popular supervised learning benchmarks, but also on challenging lifelong reinforcement learning tasks. The source code of our algorithm is available at https://github.com/csm9493/UCL. |
Tasks | Continual Learning |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11614v3 |
PDF | https://arxiv.org/pdf/1905.11614v3.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-based-continual-learning-with |
Repo | https://github.com/csm9493/UCL |
Framework | pytorch |
ProbAct: A Probabilistic Activation Function for Deep Neural Networks
Title | ProbAct: A Probabilistic Activation Function for Deep Neural Networks |
Authors | Joonho Lee, Kumar Shridhar, Hideaki Hayashi, Brian Kenji Iwana, Seokjun Kang, Seiichi Uchida |
Abstract | Activation functions play an important role in the training of artificial neural networks and the Rectified Linear Unit (ReLU) has been the mainstream in recent years. Most of the activation functions currently used are deterministic in nature, whose input-output relationship is fixed. In this work, we propose a probabilistic activation function, called ProbAct. The output value of ProbAct is sampled from a normal distribution, with the mean value same as the output of ReLU and with a fixed or trainable variance for each element. In the trainable ProbAct, the variance of the activation distribution is trained through back-propagation. We also show that the stochastic perturbation through ProbAct is a viable generalization technique that can prevent overfitting. In our experiments, we demonstrate that when using ProbAct, it is possible to boost the image classification performance on CIFAR-10, CIFAR-100, and STL-10 datasets. |
Tasks | Image Classification |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10761v1 |
PDF | https://arxiv.org/pdf/1905.10761v1.pdf |
PWC | https://paperswithcode.com/paper/probact-a-probabilistic-activation-function |
Repo | https://github.com/kumar-shridhar/ProbAct-Probabilistic-Activation-Function |
Framework | pytorch |
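The ProbAct abstract describes an activation whose output is sampled from a normal distribution centred on the ReLU output. A minimal scalar sketch of the fixed-variance case is given below; the function name `probact`, the default `sigma`, and the use of Python's `random` module are assumptions for illustration, and the trainable-variance variant is not shown.

```python
import random


def probact(x, sigma=0.1, rng=random):
    """Sample from N(ReLU(x), sigma^2) as ReLU(x) + sigma * eps, eps ~ N(0, 1)."""
    mean = max(0.0, x)
    return mean + sigma * rng.gauss(0.0, 1.0)


rng = random.Random(0)  # seeded generator so the run is repeatable
samples = [probact(1.5, sigma=0.1, rng=rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)

# With sigma = 0 the noise term vanishes and plain ReLU is recovered.
deterministic = probact(-2.0, sigma=0.0)
```

Averaged over many draws, the samples cluster around the ReLU output, so the perturbation acts as noise around a familiar deterministic activation.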
Hardware Aware Neural Network Architectures using FbNet
Title | Hardware Aware Neural Network Architectures using FbNet |
Authors | Sai Vineeth Kalluru Srinivas, Harideep Nair, Vinay Vidyasagar |
Abstract | We implement a differentiable Neural Architecture Search (NAS) method inspired by FBNet for discovering neural networks that are heavily optimized for a particular target device. The FBNet NAS method discovers a neural network from a given search space by optimizing over a loss function which accounts for accuracy and target device latency. We extend this loss function by adding an energy term. This will potentially enhance the “hardware awareness” and help us find a neural network architecture that is optimal in terms of accuracy, latency and energy consumption, given a target device (Raspberry Pi in our case). We name our trained child architecture obtained at the end of search process as Hardware Aware Neural Network Architecture (HANNA). We prove the efficacy of our approach by benchmarking HANNA against two other state-of-the-art neural networks designed for mobile/embedded applications, namely MobileNetv2 and CondenseNet for CIFAR-10 dataset. Our results show that HANNA provides a speedup of about 2.5x and 1.7x, and reduces energy consumption by 3.8x and 2x compared to MobileNetv2 and CondenseNet respectively. HANNA is able to provide such significant speedup and energy efficiency benefits over the state-of-the-art baselines at the cost of a tolerable 4-5% drop in accuracy. |
Tasks | Neural Architecture Search |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07214v1 |
PDF | https://arxiv.org/pdf/1906.07214v1.pdf |
PWC | https://paperswithcode.com/paper/hardware-aware-neural-network-architectures |
Repo | https://github.com/hpnair/18663_Project_FBNet |
Framework | pytorch |
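The HANNA abstract says the search loss accounts for accuracy and latency and is extended with an energy term, but it does not give the functional form. The sketch below therefore assumes a plain weighted sum; the function name, the weights `alpha` and `gamma`, and every candidate number are invented for illustration and are not results from the paper.

```python
def hardware_aware_loss(ce_loss, latency_ms, energy_mj, alpha=0.1, gamma=0.05):
    """Hypothetical weighted-sum loss: task loss plus latency and energy penalties."""
    return ce_loss + alpha * latency_ms + gamma * energy_mj


# Made-up candidate architectures with made-up measurements.
candidates = {
    "fast_small": {"ce_loss": 0.60, "latency_ms": 2.0, "energy_mj": 4.0},
    "slow_accurate": {"ce_loss": 0.40, "latency_ms": 5.0, "energy_mj": 15.0},
}
scores = {name: hardware_aware_loss(**c) for name, c in candidates.items()}
best = min(scores, key=scores.get)
```

Under these weights the less accurate but cheaper candidate wins, which mirrors the accuracy-for-efficiency trade-off the abstract reports; different weights would shift that balance.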
Hierarchically Structured Meta-learning
Title | Hierarchically Structured Meta-learning |
Authors | Huaxiu Yao, Ying Wei, Junzhou Huang, Zhenhui Li |
Abstract | In order to learn quickly with few samples, meta-learning utilizes prior knowledge learned from previous tasks. However, a critical challenge in meta-learning is task uncertainty and heterogeneity, which can not be handled via globally sharing knowledge among tasks. In this paper, based on gradient-based meta-learning, we propose a hierarchically structured meta-learning (HSML) algorithm that explicitly tailors the transferable knowledge to different clusters of tasks. Inspired by the way human beings organize knowledge, we resort to a hierarchical task clustering structure to cluster tasks. As a result, the proposed approach not only addresses the challenge via the knowledge customization to different clusters of tasks, but also preserves knowledge generalization among a cluster of similar tasks. To tackle the changing of task relationship, in addition, we extend the hierarchical structure to a continual learning environment. The experimental results show that our approach can achieve state-of-the-art performance in both toy-regression and few-shot image classification problems. |
Tasks | Continual Learning, Few-Shot Image Classification, Image Classification, Meta-Learning |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05301v2 |
PDF | https://arxiv.org/pdf/1905.05301v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchically-structured-meta-learning |
Repo | https://github.com/huaxiuyao/HSML |
Framework | tf |
Improving and Understanding Variational Continual Learning
Title | Improving and Understanding Variational Continual Learning |
Authors | Siddharth Swaroop, Cuong V. Nguyen, Thang D. Bui, Richard E. Turner |
Abstract | In the continual learning setting, tasks are encountered sequentially. The goal is to learn whilst i) avoiding catastrophic forgetting, ii) efficiently using model capacity, and iii) employing forward and backward transfer learning. In this paper, we explore how the Variational Continual Learning (VCL) framework achieves these desiderata on two benchmarks in continual learning: split MNIST and permuted MNIST. We first report significantly improved results on what was already a competitive approach. The improvements are achieved by establishing a new best practice approach to mean-field variational Bayesian neural networks. We then look at the solutions in detail. This allows us to obtain an understanding of why VCL performs as it does, and we compare the solution to what an ‘ideal’ continual learning solution might be. |
Tasks | Continual Learning, Transfer Learning |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02099v1 |
PDF | https://arxiv.org/pdf/1905.02099v1.pdf |
PWC | https://paperswithcode.com/paper/improving-and-understanding-variational |
Repo | https://github.com/nvcuong/variational-continual-learning |
Framework | tf |
Continual Learning for Sentence Representations Using Conceptors
Title | Continual Learning for Sentence Representations Using Conceptors |
Authors | Tianlin Liu, Lyle Ungar, João Sedoc |
Abstract | Distributed representations of sentences have become ubiquitous in natural language processing tasks. In this paper, we consider a continual learning scenario for sentence representations: Given a sequence of corpora, we aim to optimize the sentence encoder with respect to the new corpus while maintaining its accuracy on the old corpora. To address this problem, we propose to initialize sentence encoders with the help of corpus-independent features, and then sequentially update sentence encoders using Boolean operations of conceptor matrices to learn corpus-dependent features. We evaluate our approach on semantic textual similarity tasks and show that our proposed sentence encoder can continually learn features from new corpora while retaining its competence on previously encountered corpora. |
Tasks | Continual Learning, Semantic Textual Similarity |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.09187v1 |
PDF | http://arxiv.org/pdf/1904.09187v1.pdf |
PWC | https://paperswithcode.com/paper/continual-learning-for-sentence |
Repo | https://github.com/liutianlin0121/contSentEmbed |
Framework | none |
Unsupervised Anomaly Localization using Variational Auto-Encoders
Title | Unsupervised Anomaly Localization using Variational Auto-Encoders |
Authors | David Zimmerer, Fabian Isensee, Jens Petersen, Simon Kohl, Klaus Maier-Hein |
Abstract | An assumption-free automatic check of medical images for potentially overlooked anomalies would be a valuable assistance for a radiologist. Deep learning and especially Variational Auto-Encoders (VAEs) have shown great potential in the unsupervised learning of data distributions. In principle, this allows for such a check and even the localization of parts in the image that are most suspicious. Currently, however, the reconstruction-based localization by design requires adjusting the model architecture to the specific problem looked at during evaluation. This contradicts the principle of building assumption-free models. We propose complementing the localization part with a term derived from the Kullback-Leibler (KL)-divergence. For validation, we perform a series of experiments on FashionMNIST as well as on a medical task including >1000 healthy and >250 brain tumor patients. Results show that the proposed formalism outperforms the state of the art VAE-based localization of anomalies across many hyperparameter settings and also shows a competitive max performance. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02796v2 |
PDF | https://arxiv.org/pdf/1907.02796v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-anomaly-localization-using |
Repo | https://github.com/MIC-DKFZ/vae-anomaly-experiments |
Framework | pytorch |
Object Contour and Edge Detection with RefineContourNet
Title | Object Contour and Edge Detection with RefineContourNet |
Authors | Andre Peter Kelm, Vijesh Soorya Rao, Udo Zoelzer |
Abstract | A ResNet-based multi-path refinement CNN is used for object contour detection. For this task, we prioritise the effective utilization of the high-level abstraction capability of a ResNet, which leads to state-of-the-art results for edge detection. Keeping our focus in mind, we fuse the high, mid and low-level features in that specific order, which differs from many other approaches. It uses the tensor with the highest-levelled features as the starting point to combine it layer-by-layer with features of a lower abstraction level until it reaches the lowest level. We train this network on a modified PASCAL VOC 2012 dataset for object contour detection and evaluate on a refined PASCAL-val dataset reaching an excellent performance and an Optimal Dataset Scale (ODS) of 0.752. Furthermore, by fine-training on the BSDS500 dataset we reach state-of-the-art results for edge-detection with an ODS of 0.824. |
Tasks | Contour Detection, Edge Detection |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13353v2 |
PDF | https://arxiv.org/pdf/1904.13353v2.pdf |
PWC | https://paperswithcode.com/paper/object-contour-and-edge-detection-with |
Repo | https://github.com/AndreKelm/RefineContourNet |
Framework | none |
FERAtt: Facial Expression Recognition with Attention Net
Title | FERAtt: Facial Expression Recognition with Attention Net |
Authors | Pedro D. Marrero Fernandez, Fidel A. Guerrero Peña, Tsang Ing Ren, Alexandre Cunha |
Abstract | We present a new end-to-end network architecture for facial expression recognition with an attention model. It focuses attention in the human face and uses a Gaussian space representation for expression recognition. We devise this architecture based on two fundamental complementary components: (1) facial image correction and attention and (2) facial expression representation and classification. The first component uses an encoder-decoder style network and a convolutional feature extractor that are pixel-wise multiplied to obtain a feature attention map. The second component is responsible for obtaining an embedded representation and classification of the facial expression. We propose a loss function that creates a Gaussian structure on the representation space. To demonstrate the proposed method, we create two larger and more comprehensive synthetic datasets using the traditional BU3DFE and CK+ facial datasets. We compared results with the PreActResNet18 baseline. Our experiments on these datasets have shown the superiority of our approach in recognizing facial expressions. |
Tasks | Facial Expression Recognition |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03284v1 |
PDF | http://arxiv.org/pdf/1902.03284v1.pdf |
PWC | https://paperswithcode.com/paper/feratt-facial-expression-recognition-with |
Repo | https://github.com/pedrodiamel/ferattention |
Framework | pytorch |
On the Delta Method for Uncertainty Approximation in Deep Learning
Title | On the Delta Method for Uncertainty Approximation in Deep Learning |
Authors | Geir K. Nilsen, Antonella Z. Munthe-Kaas, Hans J. Skaug, Morten Brun |
Abstract | The Delta method is a well known procedure used to quantify uncertainty in statistical models. The method has previously been applied in the context of neural networks, but has not reached much popularity in deep learning because of the sheer size of the Hessian matrix. In this paper, we propose a low cost variant of the method based on an approximate eigendecomposition of the positive curvature subspace of the Hessian matrix. The method has a computational complexity of $O(KPN)$ time and $O(KP)$ space, where $K$ is the number of utilized Hessian eigenpairs, $P$ is the number of model parameters and $N$ is the number of training examples. Given that the model is $L_2$-regularized with rate $\lambda/2$, we provide a bound on the uncertainty approximation error given $K$. We show that when the smallest Hessian eigenvalue in the positive $K/2$-tail of the full spectrum, and the largest Hessian eigenvalue in the negative $K/2$-tail of the full spectrum are both approximately equal to $\lambda$, the error will be close to zero even when $K\ll P$ . We demonstrate the method by a TensorFlow implementation, and show that meaningful rankings of images based on prediction uncertainty can be obtained for a convolutional neural network based MNIST classifier. We also observe that false positives have higher prediction uncertainty than true positives. This suggests that there is supplementing information in the uncertainty measure not captured by the probability alone. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00832v1 |
PDF | https://arxiv.org/pdf/1912.00832v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-delta-method-for-uncertainty |
Repo | https://github.com/gknilsen/pyhessian |
Framework | tf |
Dense Haze: A benchmark for image dehazing with dense-haze and haze-free images
Title | Dense Haze: A benchmark for image dehazing with dense-haze and haze-free images |
Authors | Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, Radu Timofte |
Abstract | Single image dehazing is an ill-posed problem that has recently drawn important attention. Despite the significant increase in interest shown for dehazing over the past few years, the validation of the dehazing methods remains largely unsatisfactory, due to the lack of pairs of real hazy and corresponding haze-free reference images. To address this limitation, we introduce Dense-Haze - a novel dehazing dataset. Characterized by dense and homogeneous hazy scenes, Dense-Haze contains 33 pairs of real hazy and corresponding haze-free images of various outdoor scenes. The hazy scenes have been recorded by introducing real haze, generated by professional haze machines. The hazy and haze-free corresponding scenes contain the same visual content captured under the same illumination parameters. Dense-Haze dataset aims to push significantly the state-of-the-art in single-image dehazing by promoting robust methods for real and various hazy scenes. We also provide a comprehensive qualitative and quantitative evaluation of state-of-the-art single image dehazing techniques based on the Dense-Haze dataset. Not surprisingly, our study reveals that the existing dehazing techniques perform poorly for dense homogeneous hazy scenes and that there is still much room for improvement. |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02904v1 |
PDF | http://arxiv.org/pdf/1904.02904v1.pdf |
PWC | https://paperswithcode.com/paper/dense-haze-a-benchmark-for-image-dehazing |
Repo | https://github.com/pmm09c/ntire-dehazing |
Framework | pytorch |
Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning
Title | Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning |
Authors | Eric Arazo, Diego Ortego, Paul Albert, Noel E. O’Connor, Kevin McGuinness |
Abstract | Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role on relaxing human supervision. In the context of image classification, recent advances to learn from unlabeled samples are mainly focused on consistency regularization methods that encourage invariant predictions for different perturbations of unlabeled samples. We, conversely, propose to learn from unlabeled data by generating soft pseudo-labels using the network predictions. We show that a naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias and demonstrate that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it. The proposed approach achieves state-of-the-art results in CIFAR-10/100, SVHN, and Mini-ImageNet despite being much simpler than other methods. These results demonstrate that pseudo-labeling alone can outperform consistency regularization methods, while the opposite was supposed in previous work. Source code is available at https://git.io/fjQsC. |
Tasks | Image Classification |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02983v4 |
PDF | https://arxiv.org/pdf/1908.02983v4.pdf |
PWC | https://paperswithcode.com/paper/pseudo-labeling-and-confirmation-bias-in-deep |
Repo | https://github.com/EricArazo/PseudoLabeling |
Framework | pytorch |
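The pseudo-labeling abstract combines soft pseudo-labels taken from network predictions with mixup augmentation. The sketch below illustrates both steps on toy two-dimensional inputs, assuming hand-written logits in place of a real network and a fixed mixing coefficient `lam` (mixup normally draws it from a Beta distribution); all names and values are illustrative and not from the authors' code.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability vector (used as a soft pseudo-label)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and of their (soft) labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


labeled_x, labeled_y = [1.0, 0.0], [1.0, 0.0]  # one-hot ground-truth label
unlabeled_x = [0.0, 1.0]
soft_pseudo_label = softmax([0.2, 1.2])        # stand-in for a network prediction
mixed_x, mixed_y = mixup(labeled_x, labeled_y, unlabeled_x, soft_pseudo_label, lam=0.7)
```

Because the mixed target blends a trusted label with a pseudo-label, a single wrong prediction contributes only part of the training signal, which is one intuition for why mixup can dampen confirmation bias.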
Pyramid Real Image Denoising Network
Title | Pyramid Real Image Denoising Network |
Authors | Yiyun Zhao, Zhuqing Jiang, Aidong Men, Guodong Ju |
Abstract | While deep Convolutional Neural Networks (CNNs) have shown extraordinary capability of modelling specific noise and denoising, they still perform poorly on real-world noisy images. The main reason is that the real-world noise is more sophisticated and diverse. To tackle the issue of blind denoising, in this paper, we propose a novel pyramid real image denoising network (PRIDNet), which contains three stages. First, the noise estimation stage uses channel attention mechanism to recalibrate the channel importance of input noise. Second, at the multi-scale denoising stage, pyramid pooling is utilized to extract multi-scale features. Third, the stage of feature fusion adopts a kernel selecting operation to adaptively fuse multi-scale features. Experiments on two datasets of real noisy photographs demonstrate that our approach can achieve competitive performance in comparison with state-of-the-art denoisers in terms of both quantitative measure and visual perception quality. Code is available at https://github.com/491506870/PRIDNet. |
Tasks | Denoising, Image Denoising |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00273v2 |
PDF | https://arxiv.org/pdf/1908.00273v2.pdf |
PWC | https://paperswithcode.com/paper/pyramid-real-image-denoising-network |
Repo | https://github.com/491506870/PRIDNet |
Framework | tf |