Paper Group AWR 288
Light-Weight RefineNet for Real-Time Semantic Segmentation
Title | Light-Weight RefineNet for Real-Time Semantic Segmentation |
Authors | Vladimir Nekrasov, Chunhua Shen, Ian Reid |
Abstract | We consider the important task of effective and efficient semantic image segmentation. In particular, we adapt a powerful semantic segmentation architecture, called RefineNet, into a more compact one, suitable even for tasks requiring real-time performance on high-resolution inputs. To this end, we identify computationally expensive blocks in the original setup, and propose two modifications aimed at decreasing the number of parameters and floating-point operations. By doing that, we achieve a more than twofold model reduction, while keeping the performance levels almost intact. Our fastest model achieves a significant speed-up from 20 FPS to 55 FPS on a generic GPU card on 512x512 inputs, with a solid 81.1% mean IoU on the test set of PASCAL VOC, while our slowest model, at 32 FPS (up from the original 17 FPS), shows 82.7% mean IoU on the same dataset. Alternatively, we showcase that our approach is easily combined with light-weight classification networks: we attain 79.2% mean IoU on PASCAL VOC using a model that contains only 3.3M parameters and performs only 9.3B floating-point operations. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03272v1 |
http://arxiv.org/pdf/1810.03272v1.pdf | |
PWC | https://paperswithcode.com/paper/light-weight-refinenet-for-real-time-semantic |
Repo | https://github.com/DrSleep/light-weight-refinenet |
Framework | pytorch |
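For a sense of the kind of change the abstract describes, here is a minimal PyTorch sketch of a chained residual pooling (CRP) block built from cheap 1x1 convolutions instead of 3x3 ones. The channel count, pooling size and number of stages are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class LightCRP(nn.Module):
    """Chained residual pooling with 1x1 convolutions (illustrative sketch)."""
    def __init__(self, channels: int, n_stages: int = 4):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.stages = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1, bias=False)
             for _ in range(n_stages)]
        )

    def forward(self, x):
        out, path = x, x
        for conv in self.stages:
            path = conv(self.pool(path))  # pool, then a cheap 1x1 projection
            out = out + path              # accumulate the chained residuals
        return out

# quick shape check on a feature map
y = LightCRP(64)(torch.randn(1, 64, 128, 128))
```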
ResNet with one-neuron hidden layers is a Universal Approximator
Title | ResNet with one-neuron hidden layers is a Universal Approximator |
Authors | Hongzhou Lin, Stefanie Jegelka |
Abstract | We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in $d$ dimensions, i.e. $\ell_1(\mathbb{R}^d)$. Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and $d$. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension $d$ [Lu et al, 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10909v2 |
http://arxiv.org/pdf/1806.10909v2.pdf | |
PWC | https://paperswithcode.com/paper/resnet-with-one-neuron-hidden-layers-is-a |
Repo | https://github.com/sivakon/resnet-approximator |
Framework | none |
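The architecture in the abstract is easy to state concretely: each residual block maps R^d to a single hidden neuron and back, and is added to the identity path. A minimal PyTorch sketch (depth and d are illustrative):

```python
import torch
import torch.nn as nn

class OneNeuronResNet(nn.Module):
    """ResNet whose hidden layers each contain a single ReLU neuron.

    Blocks alternate between dimension 1 and d, as described in the abstract.
    """
    def __init__(self, d: int, depth: int):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d, 1),   # collapse to one hidden neuron
                nn.ReLU(),
                nn.Linear(1, d),   # expand back to the input dimension
            )
            for _ in range(depth)
        ])

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # identity skip connection of the ResNet
        return x

out = OneNeuronResNet(d=3, depth=10)(torch.randn(8, 3))
```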
Constraining Effective Field Theories with Machine Learning
Title | Constraining Effective Field Theories with Machine Learning |
Authors | Johann Brehmer, Kyle Cranmer, Gilles Louppe, Juan Pavez |
Abstract | We present powerful new analysis techniques to constrain effective field theories at the LHC. By leveraging the structure of particle physics processes, we extract extra information from Monte-Carlo simulations, which can be used to train neural network models that estimate the likelihood ratio. These methods scale well to processes with many observables and theory parameters, do not require any approximations of the parton shower or detector response, and can be evaluated in microseconds. We show that they allow us to put significantly stronger bounds on dimension-six operators than existing methods, demonstrating their potential to improve the precision of the LHC legacy constraints. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00013v4 |
http://arxiv.org/pdf/1805.00013v4.pdf | |
PWC | https://paperswithcode.com/paper/constraining-effective-field-theories-with |
Repo | https://github.com/johannbrehmer/simulator-mining-example |
Framework | none |
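The core idea is to train neural networks on simulated events so that they estimate the likelihood ratio between theory parameter points. Below is a minimal sketch of the classical classifier-based likelihood-ratio trick that this family of methods builds on; the authors' estimators additionally exploit latent information from the simulator, which is not shown here, and all names are illustrative.

```python
import torch
import torch.nn as nn

def train_ratio_estimator(x0, x1, epochs=100, lr=1e-3):
    """Classifier-based likelihood-ratio estimation (sketch).

    x0, x1: tensors of simulated observables generated at parameter points
    theta0 and theta1. A classifier s(x) trained to separate the two samples
    yields r(x) = p(x|theta1)/p(x|theta0) ~ s(x) / (1 - s(x)).
    """
    net = nn.Sequential(nn.Linear(x0.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    x = torch.cat([x0, x1])
    y = torch.cat([torch.zeros(len(x0)), torch.ones(len(x1))]).unsqueeze(1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(net(x), y)
        loss.backward()
        opt.step()

    def ratio(x_new):
        s = torch.sigmoid(net(x_new))
        return s / (1.0 - s)

    return ratio
```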
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
Title | SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient |
Authors | Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan |
Abstract | Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix, despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a “diagonal plus low-rank” structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly fewer gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of uncertainty than mean-field methods, and performs comparably to state-of-the-art methods. |
Tasks | |
Published | 2018-11-11 |
URL | http://arxiv.org/abs/1811.04504v2 |
http://arxiv.org/pdf/1811.04504v2.pdf | |
PWC | https://paperswithcode.com/paper/slang-fast-structured-covariance |
Repo | https://github.com/lamantinushka/StructuredCovariance |
Framework | pytorch |
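The “diagonal plus low-rank” structure is what makes the approximation tractable: a Gaussian with covariance diag(d) + U Uᵀ can be sampled from without ever forming the full D×D matrix. The sketch below illustrates only that structural point with assumed variable names; it is not the SLANG natural-gradient update itself.

```python
import torch

def sample_diag_plus_low_rank(mu, d, U, n_samples=1):
    """Draw samples from N(mu, diag(d) + U @ U.T) in O(D * k) per sample.

    mu: (D,) mean; d: (D,) positive diagonal; U: (D, k) low-rank factor.
    Sketch of the covariance structure only, not the SLANG update.
    """
    D, k = U.shape
    eps_diag = torch.randn(n_samples, D)
    eps_rank = torch.randn(n_samples, k)
    # the diagonal part contributes sqrt(d) * eps, the low-rank part U @ eps
    return mu + eps_diag * d.sqrt() + eps_rank @ U.t()
```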
Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation
Title | Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation |
Authors | Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre |
Abstract | Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness. In this paper, we show that each embedding model captures more information than directly apparent. A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in those aspects, providing a new perspective on how embeddings encode divergent linguistic information. In addition, we explore the relation between intrinsic and extrinsic evaluation, as the effect of our transformations in downstream tasks is higher for unsupervised systems than for supervised ones. |
Tasks | Word Embeddings |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02094v1 |
http://arxiv.org/pdf/1809.02094v1.pdf | |
PWC | https://paperswithcode.com/paper/uncovering-divergent-linguistic-information |
Repo | https://github.com/lgazpio/DAM_STS |
Framework | pytorch |
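One concrete way to realize a "linear transformation that adjusts the similarity order" is to raise the similarity matrix X Xᵀ to a power α via the SVD of the embedding matrix. The sketch below follows that idea; the exact parameterization used in the paper may differ.

```python
import numpy as np

def adjust_similarity_order(X, alpha):
    """Return embeddings whose pairwise dot products equal (X X^T)^alpha.

    X: (vocab, dim) embedding matrix. With X = U S V^T, the rows of
    U S^alpha have Gram matrix U S^(2*alpha) U^T = (X X^T)^alpha, so alpha
    interpolates between similarity-like and relatedness-like spaces.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U * (s ** alpha)
```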
Effects of Degradations on Deep Neural Network Architectures
Title | Effects of Degradations on Deep Neural Network Architectures |
Authors | Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal |
Abstract | Recently, image classification methods based on capsules (groups of neurons) and a novel dynamic routing protocol have been proposed. These methods show more promising performance than state-of-the-art CNN-based models on some existing datasets. However, the behavior of capsule-based and CNN-based models in the presence of noise is largely unknown, so it is important to study the performance of these models under various noises. In this paper, we demonstrate the effect of image degradations on deep neural network architectures for the image classification task. We select six widely used CNN architectures and analyse their image classification performance on datasets with various distortions. Our work has three main contributions: 1) we observe the effects of degradations on different CNN models; 2) accordingly, we propose a network setup that can enhance the robustness of any CNN architecture for certain degradations; and 3) we propose a new capsule network that achieves high recognition accuracy. To the best of our knowledge, this is the first study of the performance of CapsuleNet (CapsNet) and other state-of-the-art CNN architectures under different types of image degradations. Our datasets and source code are publicly available to researchers. |
Tasks | Image Classification |
Published | 2018-07-26 |
URL | https://arxiv.org/abs/1807.10108v4 |
https://arxiv.org/pdf/1807.10108v4.pdf | |
PWC | https://paperswithcode.com/paper/effects-of-degradations-on-deep-neural |
Repo | https://github.com/prasunroy/cnn-on-degraded-images |
Framework | tf |
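The evaluation protocol described in the abstract boils down to sweeping a degradation level and re-measuring classification accuracy. A minimal sketch, assuming a pretrained classifier and a test loader are already available (both names are placeholders), using additive Gaussian noise as one example degradation:

```python
import torch

def accuracy_under_noise(model, loader, sigma):
    """Top-1 accuracy when Gaussian noise of std `sigma` is added to each image."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
            preds = model(noisy).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# sweep the degradation level, e.g.:
# for sigma in [0.0, 0.05, 0.1, 0.2]: print(sigma, accuracy_under_noise(model, loader, sigma))
```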
Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization
Title | Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization |
Authors | Terry Taewoong Um, Franz Michael Josef Pfister, Daniel Christian Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, Dana Kulić |
Abstract | Parkinson’s Disease (PD) is characterized by disorders in motor function such as freezing of gait, rest tremor, rigidity, and slowed and hyposcaled movements. Treatment with dopaminergic medication may alleviate those motor symptoms; however, side effects may include uncontrolled movements, known as dyskinesia. In this paper, an automatic PD motor-state assessment in free-living conditions is proposed using an accelerometer in a wrist-worn wearable sensor. In particular, an ensemble of convolutional neural networks (CNNs) is applied to capture the large variability of daily-living activities and overcome the dissimilarity between training and test patients due to inter-patient variability. In addition, class activation map (CAM), a visualization technique for CNNs, is applied to provide an interpretation of the results. |
Tasks | |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02870v1 |
http://arxiv.org/pdf/1808.02870v1.pdf | |
PWC | https://paperswithcode.com/paper/parkinsons-disease-assessment-from-a-wrist |
Repo | https://github.com/terryum/Deep_Ensemble_CNN_for_Imbalance_Labels |
Framework | none |
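The modeling step is an ensemble of CNNs over windows of wrist-accelerometer data whose predictions are averaged. The sketch below is illustrative only: window length, channel counts and the number of ensemble members are assumptions, and the class-activation-map step is omitted.

```python
import torch
import torch.nn as nn

class AccelCNN(nn.Module):
    """Small 1D CNN over (batch, 3, T) accelerometer windows."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

def ensemble_predict(models, x):
    """Average softmax probabilities over the ensemble members."""
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0)
```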
SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction
Title | SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction |
Authors | Lingyu Liang, Luojun Lin, Lianwen Jin, Duorui Xie, Mengru Li |
Abstract | Facial beauty prediction (FBP) is a significant visual recognition problem that aims to assess facial attractiveness in a way consistent with human perception. To tackle this problem, various data-driven models, especially state-of-the-art deep learning techniques, have been introduced, and benchmark datasets have become one of the essential elements for achieving FBP. Previous works have formulated the recognition of facial beauty as a specific supervised learning problem of classification, regression or ranking, which indicates that FBP is intrinsically a computation problem with multiple paradigms. However, most FBP benchmark datasets were built under specific computation constraints, which limits the performance and flexibility of the computational models trained on them. In this paper, we argue that FBP is a multi-paradigm computation problem, and propose a new diverse benchmark dataset, called SCUT-FBP5500, to enable multi-paradigm facial beauty prediction. The SCUT-FBP5500 dataset contains a total of 5500 frontal faces with diverse properties (male/female, Asian/Caucasian, ages) and diverse labels (face landmarks, beauty scores within [1, 5], beauty score distributions), which allows different computational models with different FBP paradigms, such as appearance-based/shape-based facial beauty classification/regression models for male/female Asian/Caucasian subjects. We evaluated the SCUT-FBP5500 dataset for FBP using different combinations of feature and predictor, as well as various deep learning methods. The results indicate improvements in FBP and suggest potential applications based on SCUT-FBP5500. |
Tasks | Facial Beauty Prediction |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06345v1 |
http://arxiv.org/pdf/1801.06345v1.pdf | |
PWC | https://paperswithcode.com/paper/scut-fbp5500-a-diverse-benchmark-dataset-for |
Repo | https://github.com/HCIILAB/SCUT-FBP5500-Database-Release |
Framework | pytorch |
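One of the paradigms the dataset supports is score regression with a deep model. A minimal PyTorch sketch, assuming images and mean beauty scores in [1, 5] are already loaded; the ResNet-18 backbone and hyperparameters are illustrative choices, not the benchmark's prescribed setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-18 backbone with a single-output regression head for beauty scores
net = models.resnet18(weights="IMAGENET1K_V1")  # or pretrained=True on older torchvision
net.fc = nn.Linear(net.fc.in_features, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, scores):
    """One regression step: images (B, 3, 224, 224), scores (B,) in [1, 5]."""
    optimizer.zero_grad()
    loss = criterion(net(images).squeeze(1), scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```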
Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the $L_0$ Norm
Title | Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the $L_0$ Norm |
Authors | Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, Marta Kwiatkowska |
Abstract | Deployment of deep neural networks (DNNs) in safety- or security-critical systems requires provable guarantees on their correct behaviour. A common requirement is robustness to adversarial perturbations in a neighbourhood around an input. In this paper we focus on the $L_0$ norm and aim to compute, for a trained DNN and an input, the maximal radius of a safe norm ball around the input within which there are no adversarial examples. Then we define global robustness as an expectation of the maximal safe radius over a test data set. We first show that the problem is NP-hard, and then propose an approximate approach to iteratively compute lower and upper bounds on the network’s robustness. The approach is \emph{anytime}, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; \emph{tensor-based}, i.e., the computation is conducted over a set of inputs simultaneously, instead of one by one, to enable efficient GPU computation; and has \emph{provable guarantees}, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of the proposed approach in practice to compute tight bounds by applying and adapting the anytime algorithm to a set of challenging problems, including global robustness evaluation, competitive $L_0$ attacks, test case generation for DNNs, and local robustness evaluation on large-scale ImageNet DNNs. We release the code of all case studies via GitHub. |
Tasks | |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05805v2 |
http://arxiv.org/pdf/1804.05805v2.pdf | |
PWC | https://paperswithcode.com/paper/global-robustness-evaluation-of-deep-neural |
Repo | https://github.com/Accountable-Machine-Intelligence-Lab/DeepTRE |
Framework | none |
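The quantity being bounded has a simple operational form: global robustness is the expectation, over a test set, of the maximal safe L0 radius around each input. The sketch below only illustrates that definition; `maximal_safe_radius` stands in for the paper's anytime lower/upper-bound computation and is a hypothetical placeholder.

```python
def global_robustness(model, test_inputs, maximal_safe_radius):
    """Empirical estimate of global robustness: the mean, over the test set,
    of the largest L0 radius within which no adversarial example exists.
    `maximal_safe_radius(model, x)` is a placeholder for the paper's
    anytime bound computation."""
    radii = [maximal_safe_radius(model, x) for x in test_inputs]
    return sum(radii) / len(radii)
```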
3D Conceptual Design Using Deep Learning
Title | 3D Conceptual Design Using Deep Learning |
Authors | Zhangsihao Yang, Haoliang Jiang, Zou Lan |
Abstract | This article proposes a data-driven methodology to provide fast design support for generating or developing novel designs covering multiple object categories. The methodology implements two state-of-the-art Variational Autoencoders operating on 3D model data, together with a self-defined loss function. The loss function, which incorporates the outputs of certain layers in the autoencoder, combines different latent features from different 3D model categories. Additionally, this article provides a detailed explanation of how to utilize the Princeton ModelNet40 database, a comprehensive, clean collection of 3D CAD models of objects. After converting the original 3D mesh files into voxel and point-cloud data types, we can feed our autoencoder with data of consistent dimensions. The novelty of this work is to leverage the power of deep learning methods as an efficient latent-feature extractor to explore unknown design areas. Through this project, we expect the output to show a clear and smooth interpretation of models from different categories, providing fast design support for generating novel shapes. This final report explores 1) the theoretical ideas, 2) the progress in implementing Variational Autoencoders to attain implicit features from input shapes, 3) the resulting output shapes during training in selected domains of both 3D voxel data and 3D point-cloud data, and 4) our conclusions and future work toward more ambitious goals. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01675v1 |
http://arxiv.org/pdf/1808.01675v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-conceptual-design-using-deep-learning |
Repo | https://github.com/vivienzou1/3D_conceptal_design_using_deep_learning_my_publication |
Framework | none |
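The design-support idea hinges on combining latent features from different categories and decoding the blend back into a shape. A minimal sketch, assuming a trained voxel VAE with hypothetical `encode` and `decode` methods:

```python
import torch

def blend_shapes(vae, voxels_a, voxels_b, lam=0.5):
    """Interpolate two shapes in latent space and decode the blend.

    `vae.encode` / `vae.decode` are hypothetical methods of a trained voxel
    VAE; `lam` controls how much of each category contributes to the result.
    """
    with torch.no_grad():
        z_a, _ = vae.encode(voxels_a)   # use the latent means
        z_b, _ = vae.encode(voxels_b)
        z = lam * z_a + (1.0 - lam) * z_b
        return vae.decode(z)
```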
Tensor2Tensor for Neural Machine Translation
Title | Tensor2Tensor for Neural Machine Translation |
Authors | Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit |
Abstract | Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model. |
Tasks | Machine Translation |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.07416v1 |
http://arxiv.org/pdf/1803.07416v1.pdf | |
PWC | https://paperswithcode.com/paper/tensor2tensor-for-neural-machine-translation |
Repo | https://github.com/tensorflow/tensor2tensor |
Framework | tf |
Meta-Learning for Stochastic Gradient MCMC
Title | Meta-Learning for Stochastic Gradient MCMC |
Authors | Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato |
Abstract | Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model; even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design of the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler outperforms generic, hand-designed SG-MCMC algorithms and generalizes to different datasets and larger architectures. |
Tasks | Efficient Exploration, Meta-Learning |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04522v1 |
http://arxiv.org/pdf/1806.04522v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-stochastic-gradient-mcmc |
Repo | https://github.com/WenboGong/MetaSGMCMC |
Framework | pytorch |
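At its core, the idea is to parameterize the drift and diffusion of an SG-MCMC sampler with neural networks instead of fixing them by hand. The sketch below shows one Euler step of a Langevin-type update with a learned, state-dependent diffusion; it omits the correction term required for exactness when the diffusion depends on the state, so it illustrates the parameterization rather than the authors' full learned dynamics.

```python
import torch
import torch.nn.functional as F

def learned_langevin_step(theta, grad_log_post, diffusion_net, step_size):
    """One Euler step with a neural-network diffusion D(theta).

    theta: (P,) parameters; grad_log_post: (P,) stochastic gradient of the
    log posterior; diffusion_net: maps (P,) -> (P,) raw diffusion values.
    NB: when D depends on theta, a correction drift term is needed for the
    correct stationary distribution; it is omitted in this sketch.
    """
    D = F.softplus(diffusion_net(theta))        # positive, state-dependent diffusion
    drift = step_size * D * grad_log_post       # preconditioned gradient drift
    noise = torch.randn_like(theta)
    return theta + drift + torch.sqrt(2 * step_size * D) * noise
```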
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Title | Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis |
Authors | Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent |
Abstract | Repository containing PyTorch code for EKFAC and K-FAC preconditioners. |
Tasks | |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.03884v1 |
http://arxiv.org/pdf/1806.03884v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-approximate-natural-gradient-descent-in-1 |
Repo | https://github.com/Thrandis/EKFAC-pytorch |
Framework | pytorch |
Adversarial Attacks on Variational Autoencoders
Title | Adversarial Attacks on Variational Autoencoders |
Authors | George Gondim-Ribeiro, Pedro Tabacof, Eduardo Valle |
Abstract | Adversarial attacks are malicious inputs that derail machine-learning models. We propose a scheme to attack autoencoders, as well as a quantitative evaluation framework that correlates well with the qualitative assessment of the attacks. We assess — with statistically validated experiments — the resistance to attacks of three variational autoencoders (simple, convolutional, and DRAW) in three datasets (MNIST, SVHN, CelebA), showing that both DRAW’s recurrence and attention mechanism lead to better resistance. As autoencoders are proposed for compressing data — a scenario in which their safety is paramount — we expect more attention will be given to adversarial attacks on them. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04646v1 |
http://arxiv.org/pdf/1806.04646v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-attacks-on-variational |
Repo | https://github.com/gondimribeiro/adv-attacks-vae |
Framework | tf |
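The attacks evaluated here perturb an input so that the autoencoder's latent code, and hence its reconstruction, drifts toward a chosen target image while the perturbation stays small. A minimal sketch with a hypothetical `encode_mean` function; the exact objective and trade-off constant used in the paper may differ.

```python
import torch

def attack_autoencoder(encode_mean, x, x_target, c=1.0, steps=200, lr=0.05):
    """Find a small perturbation of x whose latent mean matches x_target's.

    encode_mean(x) -> latent mean of the encoder (hypothetical name).
    Minimizes ||z(x + delta) - z(x_target)||^2 + c * ||delta||^2.
    """
    z_target = encode_mean(x_target).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z_adv = encode_mean((x + delta).clamp(0, 1))
        loss = ((z_adv - z_target) ** 2).sum() + c * (delta ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```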
Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning
Title | Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning |
Authors | Tuyen Trung Truong, Tuan Hang Nguyen |
Abstract | While Standard gradient descent is a very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as deep neural networks. In this paper, we prove that its backtracking variant behaves very nicely; in particular, convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\lim_{n\rightarrow\infty}\|z_{n+1}-z_n\|=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\{z_n\}$ converges to a critical point of $f$. (3) More generally, assume that all connected components of the set of critical points of $f$ are compact. Then either $\lim_{n\rightarrow\infty}\|z_n\|=\infty$ or $\{z_n\}$ is bounded; moreover, in the latter case the set of cluster points of $\{z_n\}$ is connected. Some generalised versions of this result, including an inexact version, are included. Another result in this paper concerns the problem of saddle points. We then present a heuristic argument to explain why the Standard gradient descent method works so well, along with modifications of the backtracking versions of GD, MMT and NAG. Experiments with the CIFAR10 and CIFAR100 datasets on various popular architectures verify the heuristic argument also for the mini-batch practice, and show that our new algorithms, while automatically fine-tuning learning rates, perform better than current state-of-the-art methods such as MMT, NAG, Adagrad, Adadelta, RMSProp, Adam and Adamax. |
Tasks | |
Published | 2018-08-15 |
URL | https://arxiv.org/abs/1808.05160v2 |
https://arxiv.org/pdf/1808.05160v2.pdf | |
PWC | https://paperswithcode.com/paper/backtracking-gradient-descent-method-for |
Repo | https://github.com/hank-nguyen/MBT-optimizer |
Framework | pytorch |
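For reference, the algorithm analysed in the theorem is ordinary gradient descent with Armijo backtracking line search. A minimal sketch follows; the constants delta0, alpha and beta are standard illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def backtracking_gd_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5):
    """One step of gradient descent with Armijo backtracking.

    Shrinks the step size delta until the sufficient-decrease condition
    f(x - delta * g) <= f(x) - alpha * delta * ||g||^2 holds.
    """
    g = grad_f(x)
    fx = f(x)
    delta = delta0
    while f(x - delta * g) > fx - alpha * delta * np.dot(g, g):
        delta *= beta
    return x - delta * g

# example: minimizing f(x) = ||x||^2 / 2, whose gradient is x
x = np.array([3.0, -4.0])
for _ in range(50):
    x = backtracking_gd_step(lambda v: 0.5 * v @ v, lambda v: v, x)
```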