Paper Group AWR 288
Light-Weight RefineNet for Real-Time Semantic Segmentation
Title | Light-Weight RefineNet for Real-Time Semantic Segmentation |
Authors | Vladimir Nekrasov, Chunhua Shen, Ian Reid |
Abstract | We consider the important task of effective and efficient semantic image segmentation. In particular, we adapt a powerful semantic segmentation architecture, called RefineNet, into a more compact one, suitable even for tasks requiring real-time performance on high-resolution inputs. To this end, we identify computationally expensive blocks in the original setup, and propose two modifications aimed at decreasing the number of parameters and floating-point operations. By doing that, we achieve a more than twofold model reduction, while keeping the performance levels almost intact. Our fastest model achieves a significant speed-up from 20 FPS to 55 FPS on a generic GPU card on 512x512 inputs, with a solid 81.1% mean IoU on the test set of PASCAL VOC, while our slowest model, at 32 FPS (up from the original 17 FPS), shows 82.7% mean IoU on the same dataset. Alternatively, we showcase that our approach is easily combined with light-weight classification networks: we attain 79.2% mean IoU on PASCAL VOC using a model that contains only 3.3M parameters and performs only 9.3B floating-point operations. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03272v1 |
http://arxiv.org/pdf/1810.03272v1.pdf | |
PWC | https://paperswithcode.com/paper/light-weight-refinenet-for-real-time-semantic |
Repo | https://github.com/DrSleep/light-weight-refinenet |
Framework | pytorch |
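For a sense of the kind of change the abstract describes, here is a minimal PyTorch sketch of a chained residual pooling (CRP) block built from cheap 1x1 convolutions instead of 3x3 ones. The channel count, pooling size and number of stages are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class LightCRP(nn.Module):
    """Chained residual pooling with 1x1 convolutions (illustrative sketch)."""
    def __init__(self, channels: int, n_stages: int = 4):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.stages = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1, bias=False)
             for _ in range(n_stages)]
        )

    def forward(self, x):
        out, path = x, x
        for conv in self.stages:
            path = conv(self.pool(path))  # pool, then a cheap 1x1 projection
            out = out + path              # accumulate the chained residuals
        return out

# quick shape check on a feature map
y = LightCRP(64)(torch.randn(1, 64, 128, 128))
```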
ResNet with one-neuron hidden layers is a Universal Approximator
Title | ResNet with one-neuron hidden layers is a Universal Approximator |
Authors | Hongzhou Lin, Stefanie Jegelka |
Abstract | We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in $d$ dimensions, i.e. $\ell_1(\mathbb{R}^d)$. Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and $d$. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension $d$ [Lu et al, 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10909v2 |
http://arxiv.org/pdf/1806.10909v2.pdf | |
PWC | https://paperswithcode.com/paper/resnet-with-one-neuron-hidden-layers-is-a |
Repo | https://github.com/sivakon/resnet-approximator |
Framework | none |
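The architecture in the abstract is easy to state concretely: each residual block maps R^d to a single hidden neuron and back, and is added to the identity path. A minimal PyTorch sketch (depth and d are illustrative):

```python
import torch
import torch.nn as nn

class OneNeuronResNet(nn.Module):
    """ResNet whose hidden layers each contain a single ReLU neuron.

    Blocks alternate between dimension 1 and d, as described in the abstract.
    """
    def __init__(self, d: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d, 1),   # collapse to one hidden neuron
                nn.ReLU(),
                nn.Linear(1, d),   # expand back to the input dimension
            )
            for _ in range(depth)
        ])

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # identity skip connection of the ResNet
        return x

out = OneNeuronResNet(d=3, depth=10)(torch.randn(8, 3))
```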
Constraining Effective Field Theories with Machine Learning
Title | Constraining Effective Field Theories with Machine Learning |
Authors | Johann Brehmer, Kyle Cranmer, Gilles Louppe, Juan Pavez |
Abstract | We present powerful new analysis techniques to constrain effective field theories at the LHC. By leveraging the structure of particle physics processes, we extract extra information from Monte-Carlo simulations, which can be used to train neural network models that estimate the likelihood ratio. These methods scale well to processes with many observables and theory parameters, do not require any approximations of the parton shower or detector response, and can be evaluated in microseconds. We show that they allow us to put significantly stronger bounds on dimension-six operators than existing methods, demonstrating their potential to improve the precision of the LHC legacy constraints. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00013v4 |
http://arxiv.org/pdf/1805.00013v4.pdf | |
PWC | https://paperswithcode.com/paper/constraining-effective-field-theories-with |
Repo | https://github.com/johannbrehmer/simulator-mining-example |
Framework | none |
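The core idea is to train neural networks on simulated events so that they estimate the likelihood ratio between theory parameter points. Below is a minimal sketch of the classical classifier-based likelihood-ratio trick that this family of methods builds on; the authors' estimators additionally exploit latent information from the simulator, which is not shown here, and all names are illustrative.

```python
import torch
import torch.nn as nn

def train_ratio_estimator(x0, x1, epochs=100, lr=1e-3):
    """Classifier-based likelihood-ratio estimation (sketch).

    x0, x1: tensors of simulated observables generated at parameter points
    theta0 and theta1. A classifier s(x) trained to separate the two samples
    yields r(x) = p(x|theta1)/p(x|theta0) ~ s(x) / (1 - s(x)).
    """
    net = nn.Sequential(nn.Linear(x0.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    x = torch.cat([x0, x1])
    y = torch.cat([torch.zeros(len(x0)), torch.ones(len(x1))]).unsqueeze(1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(net(x), y)
        loss.backward()
        opt.step()

    def ratio(x_new):
        s = torch.sigmoid(net(x_new))
        return s / (1.0 - s)

    return ratio
```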
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
Title | SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient |
Authors | Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan |
Abstract | Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix, despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a “diagonal plus low-rank” structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly fewer gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of uncertainty than mean-field methods, and performs comparably to state-of-the-art methods. |
Tasks | |
Published | 2018-11-11 |
URL | http://arxiv.org/abs/1811.04504v2 |
http://arxiv.org/pdf/1811.04504v2.pdf | |
PWC | https://paperswithcode.com/paper/slang-fast-structured-covariance |
Repo | https://github.com/lamantinushka/StructuredCovariance |
Framework | pytorch |
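The “diagonal plus low-rank” structure is what makes the approximation tractable: a Gaussian with covariance diag(d) + U Uᵀ can be sampled from without ever forming the full D×D matrix. The sketch below illustrates only that structural point with assumed variable names; it is not the SLANG natural-gradient update itself.

```python
import torch

def sample_diag_plus_low_rank(mu, d, U, n_samples=1):
    """Draw samples from N(mu, diag(d) + U @ U.T) in O(D * k) per sample.

    mu: (D,) mean; d: (D,) positive diagonal; U: (D, k) low-rank factor.
    Sketch of the covariance structure only, not the SLANG update.
    """
    D, k = U.shape
    eps_diag = torch.randn(n_samples, D)
    eps_rank = torch.randn(n_samples, k)
    # the diagonal part contributes sqrt(d) * eps, the low-rank part U @ eps
    return mu + eps_diag * d.sqrt() + eps_rank @ U.t()
```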
Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation
Title | Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation |
Authors | Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre |
Abstract | Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness. In this paper, we show that each embedding model captures more information than directly apparent. A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in those aspects, providing a new perspective on how embeddings encode divergent linguistic information. In addition, we explore the relation between intrinsic and extrinsic evaluation, as the effect of our transformations in downstream tasks is higher for unsupervised systems than for supervised ones. |
Tasks | Word Embeddings |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02094v1 |
http://arxiv.org/pdf/1809.02094v1.pdf | |
PWC | https://paperswithcode.com/paper/uncovering-divergent-linguistic-information |
Repo | https://github.com/lgazpio/DAM_STS |
Framework | pytorch |
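One concrete way to realize a "linear transformation that adjusts the similarity order" is to raise the similarity matrix X Xᵀ to a power α via the SVD of the embedding matrix. The sketch below follows that idea; the exact parameterization used in the paper may differ.

```python
import numpy as np

def adjust_similarity_order(X, alpha):
    """Return embeddings whose pairwise dot products equal (X X^T)^alpha.

    X: (vocab, dim) embedding matrix. With X = U S V^T, the rows of
    U S^alpha have Gram matrix U S^(2*alpha) U^T = (X X^T)^alpha, so alpha
    interpolates between similarity-like and relatedness-like spaces.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U * (s ** alpha)
```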
Effects of Degradations on Deep Neural Network Architectures
Title | Effects of Degradations on Deep Neural Network Architectures |
Authors | Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal |
Abstract | Recently, image classification methods based on capsules (groups of neurons) and a novel dynamic routing protocol have been proposed. These methods show more promising performance than state-of-the-art CNN-based models on some existing datasets. However, the behavior of capsule-based and CNN-based models in the presence of noise is largely unknown, so it is important to study the performance of these models under various noises. In this paper, we demonstrate the effect of image degradations on deep neural network architectures for the image classification task. We select six widely used CNN architectures and analyse their image classification performance on datasets with various distortions. Our work has three main contributions: 1) we observe the effects of degradations on different CNN models; 2) accordingly, we propose a network setup that can enhance the robustness of any CNN architecture for certain degradations; and 3) we propose a new capsule network that achieves high recognition accuracy. To the best of our knowledge, this is the first study of the performance of CapsuleNet (CapsNet) and other state-of-the-art CNN architectures under different types of image degradations. Our datasets and source code are publicly available to researchers. |
Tasks | Image Classification |
Published | 2018-07-26 |
URL | https://arxiv.org/abs/1807.10108v4 |
https://arxiv.org/pdf/1807.10108v4.pdf | |
PWC | https://paperswithcode.com/paper/effects-of-degradations-on-deep-neural |
Repo | https://github.com/prasunroy/cnn-on-degraded-images |
Framework | tf |
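The evaluation protocol described in the abstract boils down to sweeping a degradation level and re-measuring classification accuracy. A minimal sketch, assuming a pretrained classifier and a test loader are already available (both names are placeholders), using additive Gaussian noise as one example degradation:

```python
import torch

def accuracy_under_noise(model, loader, sigma):
    """Top-1 accuracy when Gaussian noise of std `sigma` is added to each image."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
            preds = model(noisy).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# sweep the degradation level, e.g.:
# for sigma in [0.0, 0.05, 0.1, 0.2]: print(sigma, accuracy_under_noise(model, loader, sigma))
```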
Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization
Title | Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization |
Authors | Terry Taewoong Um, Franz Michael Josef Pfister, Daniel Christian Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, Dana Kulić |
Abstract | Parkinson’s Disease (PD) is characterized by disorders in motor function such as freezing of gait, rest tremor, rigidity, and slowed and hyposcaled movements. Treatment with dopaminergic medication may alleviate those motor symptoms; however, side effects may include uncontrolled movements, known as dyskinesia. In this paper, an automatic PD motor-state assessment in free-living conditions is proposed using an accelerometer in a wrist-worn wearable sensor. In particular, an ensemble of convolutional neural networks (CNNs) is applied to capture the large variability of daily-living activities and overcome the dissimilarity between training and test patients due to inter-patient variability. In addition, class activation map (CAM), a visualization technique for CNNs, is applied to provide an interpretation of the results. |
Tasks | |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02870v1 |
http://arxiv.org/pdf/1808.02870v1.pdf | |
PWC | https://paperswithcode.com/paper/parkinsons-disease-assessment-from-a-wrist |
Repo | https://github.com/terryum/Deep_Ensemble_CNN_for_Imbalance_Labels |
Framework | none |
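The modeling step is an ensemble of CNNs over windows of wrist-accelerometer data whose predictions are averaged. The sketch below is illustrative only: window length, channel counts and the number of ensemble members are assumptions, and the class-activation-map step is omitted.

```python
import torch
import torch.nn as nn

class AccelCNN(nn.Module):
    """Small 1D CNN over (batch, 3, T) accelerometer windows."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

def ensemble_predict(models, x):
    """Average softmax probabilities over the ensemble members."""
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0)
```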
SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction
Title | SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction |
Authors | Lingyu Liang, Luojun Lin, Lianwen Jin, Duorui Xie, Mengru Li |
Abstract | Facial beauty prediction (FBP) is a significant visual recognition problem that aims to assess facial attractiveness in a way consistent with human perception. To tackle this problem, various data-driven models, especially state-of-the-art deep learning techniques, have been introduced, and benchmark datasets have become one of the essential elements for achieving FBP. Previous works have formulated the recognition of facial beauty as a specific supervised learning problem of classification, regression or ranking, which indicates that FBP is intrinsically a computation problem with multiple paradigms. However, most FBP benchmark datasets were built under specific computation constraints, which limits the performance and flexibility of the computational models trained on them. In this paper, we argue that FBP is a multi-paradigm computation problem, and propose a new diverse benchmark dataset, called SCUT-FBP5500, to enable multi-paradigm facial beauty prediction. The SCUT-FBP5500 dataset contains a total of 5500 frontal faces with diverse properties (male/female, Asian/Caucasian, ages) and diverse labels (face landmarks, beauty scores within [1, 5], beauty score distributions), which allows different computational models with different FBP paradigms, such as appearance-based/shape-based facial beauty classification/regression models for male/female Asian/Caucasian subjects. We evaluated the SCUT-FBP5500 dataset for FBP using different combinations of feature and predictor, as well as various deep learning methods. The results indicate improvements in FBP and suggest potential applications based on SCUT-FBP5500. |
Tasks | Facial Beauty Prediction |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06345v1 |
http://arxiv.org/pdf/1801.06345v1.pdf | |
PWC | https://paperswithcode.com/paper/scut-fbp5500-a-diverse-benchmark-dataset-for |
Repo | https://github.com/HCIILAB/SCUT-FBP5500-Database-Release |
Framework | pytorch |
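One of the paradigms the dataset supports is score regression with a deep model. A minimal PyTorch sketch, assuming images and mean beauty scores in [1, 5] are already loaded; the ResNet-18 backbone and hyperparameters are illustrative choices, not the benchmark's prescribed setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-18 backbone with a single-output regression head for beauty scores
net = models.resnet18(weights="IMAGENET1K_V1")  # or pretrained=True on older torchvision
net.fc = nn.Linear(net.fc.in_features, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, scores):
    """One regression step: images (B, 3, 224, 224), scores (B,) in [1, 5]."""
    optimizer.zero_grad()
    loss = criterion(net(images).squeeze(1), scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```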
Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the $L_0$ Norm
Title | Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the $L_0$ Norm |
Authors | Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, Marta Kwiatkowska |
Abstract | Deployment of deep neural networks (DNNs) in safety- or security-critical systems requires provable guarantees on their correct behaviour. A common requirement is robustness to adversarial perturbations in a neighbourhood around an input. In this paper we focus on the $L_0$ norm and aim to compute, for a trained DNN and an input, the maximal radius of a safe norm ball around the input within which there are no adversarial examples. Then we define global robustness as an expectation of the maximal safe radius over a test data set. We first show that the problem is NP-hard, and then propose an approximate approach to iteratively compute lower and upper bounds on the network’s robustness. The approach is \emph{anytime}, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; \emph{tensor-based}, i.e., the computation is conducted over a set of inputs simultaneously, instead of one by one, to enable efficient GPU computation; and has \emph{provable guarantees}, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of the proposed approach in practice to compute tight bounds by applying and adapting the anytime algorithm to a set of challenging problems, including global robustness evaluation, competitive $L_0$ attacks, test case generation for DNNs, and local robustness evaluation on large-scale ImageNet DNNs. We release the code of all case studies via GitHub. |
Tasks | |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05805v2 |
http://arxiv.org/pdf/1804.05805v2.pdf | |
PWC | https://paperswithcode.com/paper/global-robustness-evaluation-of-deep-neural |
Repo | https://github.com/Accountable-Machine-Intelligence-Lab/DeepTRE |
Framework | none |
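The quantity being bounded has a simple operational form: global robustness is the expectation, over a test set, of the maximal safe L0 radius around each input. The sketch below only illustrates that definition; `maximal_safe_radius` stands in for the paper's anytime lower/upper-bound computation and is a hypothetical placeholder.

```python
def global_robustness(model, test_inputs, maximal_safe_radius):
    """Empirical estimate of global robustness: the mean, over the test set,
    of the largest L0 radius within which no adversarial example exists.
    `maximal_safe_radius(model, x)` is a placeholder for the paper's
    anytime bound computation."""
    radii = [maximal_safe_radius(model, x) for x in test_inputs]
    return sum(radii) / len(radii)
```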
3D Conceptual Design Using Deep Learning
Title | 3D Conceptual Design Using Deep Learning |
Authors | Zhangsihao Yang, Haoliang Jiang, Zou Lan |
Abstract | This article proposes a data-driven methodology to provide fast design support for generating or developing novel designs covering multiple object categories. The methodology implements two state-of-the-art Variational Autoencoders operating on 3D model data, together with a self-defined loss function. The loss function, which incorporates the outputs of certain layers in the autoencoder, combines different latent features from different 3D model categories. Additionally, this article provides a detailed explanation of how to utilize the Princeton ModelNet40 database, a comprehensive, clean collection of 3D CAD models of objects. After converting the original 3D mesh files into voxel and point-cloud data types, we can feed our autoencoder with data of consistent dimensions. The novelty of this work is to leverage the power of deep learning methods as an efficient latent-feature extractor to explore unknown design areas. Through this project, we expect the output to show a clear and smooth interpretation of models from different categories, providing fast design support for generating novel shapes. This final report explores 1) the theoretical ideas, 2) the progress in implementing Variational Autoencoders to attain implicit features from input shapes, 3) the resulting output shapes during training in selected domains of both 3D voxel data and 3D point-cloud data, and 4) our conclusions and future work toward more ambitious goals. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01675v1 |
http://arxiv.org/pdf/1808.01675v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-conceptual-design-using-deep-learning |
Repo | https://github.com/vivienzou1/3D_conceptal_design_using_deep_learning_my_publication |
Framework | none |
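The design-support idea hinges on combining latent features from different categories and decoding the blend back into a shape. A minimal sketch, assuming a trained voxel VAE with hypothetical `encode` and `decode` methods:

```python
import torch

def blend_shapes(vae, voxels_a, voxels_b, lam=0.5):
    """Interpolate two shapes in latent space and decode the blend.

    `vae.encode` / `vae.decode` are hypothetical methods of a trained voxel
    VAE; `lam` controls how much of each category contributes to the result.
    """
    with torch.no_grad():
        z_a, _ = vae.encode(voxels_a)   # use the latent means
        z_b, _ = vae.encode(voxels_b)
        z = lam * z_a + (1.0 - lam) * z_b
        return vae.decode(z)
```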
Tensor2Tensor for Neural Machine Translation
Title | Tensor2Tensor for Neural Machine Translation |
Authors | Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit |
Abstract | Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model. |
Tasks | Machine Translation |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.07416v1 |
http://arxiv.org/pdf/1803.07416v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor2tensor-for-neural-machine-translation |
Repo | https://github.com/tensorflow/tensor2tensor |
Framework | tf |
Meta-Learning for Stochastic Gradient MCMC
Title | Meta-Learning for Stochastic Gradient MCMC |
Authors | Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato |
Abstract | Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model; even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design of the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler outperforms generic, hand-designed SG-MCMC algorithms and generalizes to different datasets and larger architectures. |
Tasks | Efficient Exploration, Meta-Learning |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04522v1 |
http://arxiv.org/pdf/1806.04522v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-stochastic-gradient-mcmc |
Repo | https://github.com/WenboGong/MetaSGMCMC |
Framework | pytorch |
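At its core, the idea is to parameterize the drift and diffusion of an SG-MCMC sampler with neural networks instead of fixing them by hand. The sketch below shows one Euler step of a Langevin-type update with a learned, state-dependent diffusion; it omits the correction term required for exactness when the diffusion depends on the state, so it illustrates the parameterization rather than the authors' full learned dynamics.

```python
import torch
import torch.nn.functional as F

def learned_langevin_step(theta, grad_log_post, diffusion_net, step_size):
    """One Euler step with a neural-network diffusion D(theta).

    theta: (P,) parameters; grad_log_post: (P,) stochastic gradient of the
    log posterior; diffusion_net: maps (P,) -> (P,) raw diffusion values.
    NB: when D depends on theta, a correction drift term is needed for the
    correct stationary distribution; it is omitted in this sketch.
    """
    D = F.softplus(diffusion_net(theta))        # positive, state-dependent diffusion
    drift = step_size * D * grad_log_post       # preconditioned gradient drift
    noise = torch.randn_like(theta)
    return theta + drift + torch.sqrt(2 * step_size * D) * noise
```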
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Title | Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis |
Authors | Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent |
Abstract | Repository containing PyTorch code for EKFAC and K-FAC preconditioners. |
Tasks | |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.03884v1 |
http://arxiv.org/pdf/1806.03884v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-approximate-natural-gradient-descent-in-1 |
Repo | https://github.com/Thrandis/EKFAC-pytorch |
Framework | pytorch |
Adversarial Attacks on Variational Autoencoders
Title | Adversarial Attacks on Variational Autoencoders |
Authors | George Gondim-Ribeiro, Pedro Tabacof, Eduardo Valle |
Abstract | Adversarial attacks are malicious inputs that derail machine-learning models. We propose a scheme to attack autoencoders, as well as a quantitative evaluation framework that correlates well with the qualitative assessment of the attacks. We assess — with statistically validated experiments — the resistance to attacks of three variational autoencoders (simple, convolutional, and DRAW) in three datasets (MNIST, SVHN, CelebA), showing that both DRAW’s recurrence and attention mechanism lead to better resistance. As autoencoders are proposed for compressing data — a scenario in which their safety is paramount — we expect more attention will be given to adversarial attacks on them. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04646v1 |
http://arxiv.org/pdf/1806.04646v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-attacks-on-variational |
Repo | https://github.com/gondimribeiro/adv-attacks-vae |
Framework | tf |
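The attacks evaluated here perturb an input so that the autoencoder's latent code, and hence its reconstruction, drifts toward a chosen target image while the perturbation stays small. A minimal sketch with a hypothetical `encode_mean` function; the exact objective and trade-off constant used in the paper may differ.

```python
import torch

def attack_autoencoder(encode_mean, x, x_target, c=1.0, steps=200, lr=0.05):
    """Find a small perturbation of x whose latent mean matches x_target's.

    encode_mean(x) -> latent mean of the encoder (hypothetical name).
    Minimizes ||z(x + delta) - z(x_target)||^2 + c * ||delta||^2.
    """
    z_target = encode_mean(x_target).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z_adv = encode_mean((x + delta).clamp(0, 1))
        loss = ((z_adv - z_target) ** 2).sum() + c * (delta ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```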
Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning
Title | Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning |
Authors | Tuyen Trung Truong, Tuan Hang Nguyen |
Abstract | While Standard gradient descent is a very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as deep neural networks. In this paper, we prove that its backtracking variant behaves very nicely; in particular, convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\lim_{n\rightarrow\infty}\|z_{n+1}-z_n\|=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\{z_n\}$ converges to a critical point of $f$. (3) More generally, assume that all connected components of the set of critical points of $f$ are compact. Then either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\{z_n\}$ is bounded; moreover, in the latter case the set of cluster points of $\{z_n\}$ is connected. Some generalised versions of this result, including an inexact version, are included. Another result in this paper concerns the problem of saddle points. We then present a heuristic argument to explain why the Standard gradient descent method works so well, along with modifications of the backtracking versions of GD, MMT and NAG. Experiments with the CIFAR10 and CIFAR100 datasets on various popular architectures verify the heuristic argument also for the mini-batch practice, and show that our new algorithms, while automatically fine-tuning learning rates, perform better than current state-of-the-art methods such as MMT, NAG, Adagrad, Adadelta, RMSProp, Adam and Adamax. |
Tasks | |
Published | 2018-08-15 |
URL | https://arxiv.org/abs/1808.05160v2 |
https://arxiv.org/pdf/1808.05160v2.pdf | |
PWC | https://paperswithcode.com/paper/backtracking-gradient-descent-method-for |
Repo | https://github.com/hank-nguyen/MBT-optimizer |
Framework | pytorch |
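For reference, the algorithm analysed in the theorem is ordinary gradient descent with Armijo backtracking line search. A minimal sketch follows; the constants delta0, alpha and beta are standard illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def backtracking_gd_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5):
    """One step of gradient descent with Armijo backtracking.

    Shrinks the step size delta until the sufficient-decrease condition
    f(x - delta * g) <= f(x) - alpha * delta * ||g||^2 holds.
    """
    g = grad_f(x)
    fx = f(x)
    delta = delta0
    while f(x - delta * g) > fx - alpha * delta * np.dot(g, g):
        delta *= beta
    return x - delta * g

# example: minimizing f(x) = ||x||^2 / 2, whose gradient is x
x = np.array([3.0, -4.0])
for _ in range(50):
    x = backtracking_gd_step(lambda v: 0.5 * v @ v, lambda v: v, x)
```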