July 30, 2019

2862 words 14 mins read

Paper Group AWR 42

Automatic Classification of Bright Retinal Lesions via Deep Network Features. A Hybrid Convolutional Variational Autoencoder for Text Generation. A Bayesian Data Augmentation Approach for Learning Deep Models. Rank Pruning for Dominance Queries in CP-Nets. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. Fisher- …

Automatic Classification of Bright Retinal Lesions via Deep Network Features

Title Automatic Classification of Bright Retinal Lesions via Deep Network Features
Authors Ibrahim Sadek, Mohamed Elawady, Abd El Rahman Shabayek
Abstract Diabetic retinopathy is diagnosed in a timely manner from color eye fundus images by experienced ophthalmologists, in order to recognize potential retinal features and identify early-blindness cases. In this paper, we propose to extract deep features from the last fully-connected layer of four different pre-trained convolutional neural networks. These features are then fed into a non-linear classifier to discriminate three-class diabetic cases, i.e., normal, exudates, and drusen. Averaged across 1113 color retinal images collected from six publicly available annotated datasets, the deep-features approach performs better than the classical bag-of-words approach. The proposed approaches have an average accuracy between 91.23% and 92.00%, with more than 13% improvement over traditional state-of-the-art methods.
Tasks
Published 2017-07-07
URL http://arxiv.org/abs/1707.02022v3
PDF http://arxiv.org/pdf/1707.02022v3.pdf
PWC https://paperswithcode.com/paper/automatic-classification-of-bright-retinal
Repo https://github.com/mawady/DeepRetinalClassification
Framework none
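
As a rough illustration of the pipeline described above (my own sketch, not the linked repository's code), deep features from a pre-trained CNN's last fully-connected layer can be fed to a non-linear classifier; the network choice, preprocessing and classifier below are assumptions.

```python
# Sketch: extract last-FC-layer features from a pre-trained CNN, then train a
# non-linear classifier on them. VGG-16 and the RBF-SVM are illustrative choices.
import torch
import torchvision
from sklearn.svm import SVC

cnn = torchvision.models.vgg16(pretrained=True).eval()
# Drop the final 1000-way layer so the model outputs the 4096-d FC activations.
cnn.classifier = torch.nn.Sequential(*list(cnn.classifier.children())[:-1])

def deep_features(images):              # images: (N, 3, 224, 224) float tensor
    with torch.no_grad():
        return cnn(images).numpy()      # (N, 4096) feature matrix

# X_train / X_test: preprocessed fundus images; y_train: labels in
# {normal, exudates, drusen}. These names are placeholders.
clf = SVC(kernel="rbf")
# clf.fit(deep_features(X_train), y_train)
# predictions = clf.predict(deep_features(X_test))
```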

A Hybrid Convolutional Variational Autoencoder for Text Generation

Title A Hybrid Convolutional Variational Autoencoder for Text Generation
Authors Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth
Abstract In this paper we explore the effect of architectural choices on learning a Variational Autoencoder (VAE) for text generation. In contrast to the previously introduced VAE model for text where both the encoder and decoder are RNNs, we propose a novel hybrid architecture that blends fully feed-forward convolutional and deconvolutional components with a recurrent language model. Our architecture exhibits several attractive properties, such as faster run time and convergence and the ability to better handle long sequences; more importantly, it helps avoid some of the major difficulties posed by training VAE models on textual data.
Tasks Language Modelling, Text Generation
Published 2017-02-08
URL http://arxiv.org/abs/1702.02390v1
PDF http://arxiv.org/pdf/1702.02390v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-convolutional-variational
Repo https://github.com/ryokamoi/hybrid_textvae
Framework tf
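
The architecture can be pictured with a small sketch (my reading of the abstract, not the linked TensorFlow implementation; all layer sizes are arbitrary): a convolutional encoder maps the token sequence to the latent code, a deconvolutional decoder expands the code back into a sequence of conditioning vectors, and a recurrent language model consumes them together with the teacher-forced token embeddings.

```python
# Sketch: convolutional encoder + deconvolutional decoder + recurrent LM.
import torch
import torch.nn as nn

class HybridTextVAE(nn.Module):
    def __init__(self, vocab=10000, emb=128, hid=256, z_dim=32, seq_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.enc = nn.Sequential(                        # convolutional encoder
            nn.Conv1d(emb, hid, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(hid, hid, 3, stride=2, padding=1), nn.ReLU())
        self.to_mu = nn.Linear(hid * (seq_len // 4), z_dim)
        self.to_logvar = nn.Linear(hid * (seq_len // 4), z_dim)
        self.expand = nn.Linear(z_dim, hid * (seq_len // 4))
        self.dec = nn.Sequential(                        # deconvolutional decoder
            nn.ConvTranspose1d(hid, hid, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(hid, emb, 4, stride=2, padding=1))
        self.lm = nn.LSTM(emb * 2, hid, batch_first=True)  # recurrent LM on top
        self.out = nn.Linear(hid, vocab)
        self.seq_len, self.hid = seq_len, hid

    def forward(self, tokens):                           # tokens: (B, seq_len)
        e = self.embed(tokens).transpose(1, 2)           # (B, emb, T)
        h = self.enc(e).flatten(1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterise
        d = self.expand(z).view(-1, self.hid, self.seq_len // 4)
        cond = self.dec(d).transpose(1, 2)               # (B, T, emb) conditioning
        lm_in = torch.cat([self.embed(tokens), cond], dim=-1)
        logits = self.out(self.lm(lm_in)[0])
        return logits, mu, logvar                        # plus a KL term for the ELBO
```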

A Bayesian Data Augmentation Approach for Learning Deep Models

Title A Bayesian Data Augmentation Approach for Learning Deep Models
Authors Toan Tran, Trung Pham, Gustavo Carneiro, Lyle Palmer, Ian Reid
Abstract Data augmentation is an essential part of the training process applied to deep learning models. The motivation is that a robust training process for deep learning models depends on large annotated datasets, which are expensive to acquire, store and process. Therefore, a reasonable alternative is to automatically generate new annotated training samples using a process known as data augmentation. The dominant data augmentation approach in the field assumes that new training samples can be obtained via random geometric or appearance transformations applied to annotated training samples, but this is a strong assumption because it is unclear if this is a reliable generative model for producing new training samples. In this paper, we provide a novel Bayesian formulation of data augmentation, where new annotated training points are treated as missing variables and generated based on the distribution learned from the training set. For learning, we introduce a theoretically sound algorithm, generalised Monte Carlo expectation maximisation, and demonstrate one possible implementation via an extension of the Generative Adversarial Network (GAN). Classification results on MNIST, CIFAR-10 and CIFAR-100 show the better performance of our proposed method compared to the current dominant data augmentation approach mentioned above; the results also show that our approach produces better classification results than similar GAN models.
Tasks Data Augmentation
Published 2017-10-29
URL http://arxiv.org/abs/1710.10564v1
PDF http://arxiv.org/pdf/1710.10564v1.pdf
PWC https://paperswithcode.com/paper/a-bayesian-data-augmentation-approach-for
Repo https://github.com/toantm/keras-bda
Framework pytorch
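
The training loop can be summarised roughly as a Monte Carlo EM alternation (a schematic sketch under my own assumptions, not the keras-bda code; `generator.z_dim`, `generator.n_classes` and the GAN update are hypothetical interfaces):

```python
# Sketch of generalised Monte Carlo EM with a conditional GAN as the
# missing-data (augmentation) model.
import torch
import torch.nn.functional as F

def train_with_bayesian_augmentation(classifier, generator, real_loader,
                                      n_rounds=10, n_synth=256):
    opt = torch.optim.Adam(classifier.parameters())
    for _ in range(n_rounds):
        # "E-step": draw synthetic annotated samples from the current
        # conditional generator p(x | y, z).
        z = torch.randn(n_synth, generator.z_dim)
        y_synth = torch.randint(0, generator.n_classes, (n_synth,))
        x_synth = generator(z, y_synth).detach()

        # "M-step": refit the classifier on real plus generated data.
        for x_real, y_real in real_loader:
            x = torch.cat([x_real, x_synth])
            y = torch.cat([y_real, y_synth])
            loss = F.cross_entropy(classifier(x), y)
            opt.zero_grad(); loss.backward(); opt.step()

        # The generator/discriminator pair is then refreshed with a standard
        # conditional-GAN update on real_loader (omitted here).
```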

Rank Pruning for Dominance Queries in CP-Nets

Title Rank Pruning for Dominance Queries in CP-Nets
Authors Kathryn Laing, Peter Adam Thwaites, John Paul Gosling
Abstract Conditional preference networks (CP-nets) are a graphical representation of a person’s (conditional) preferences over a set of discrete variables. In this paper, we introduce a novel method of quantifying preference for any given outcome based on a CP-net representation of a user’s preferences. We demonstrate that these values are useful for reasoning about user preferences. In particular, they allow us to order (any subset of) the possible outcomes in accordance with the user’s preferences. Further, these values can be used to improve the efficiency of outcome dominance testing. That is, given a pair of outcomes, we can determine which the user prefers more efficiently. Through experimental results, we show that this method is more effective than existing techniques for improving dominance testing efficiency. We show that the above results also hold for CP-nets that express indifference between variable values.
Tasks
Published 2017-12-22
URL http://arxiv.org/abs/1712.08588v2
PDF http://arxiv.org/pdf/1712.08588v2.pdf
PWC https://paperswithcode.com/paper/rank-pruning-for-dominance-queries-in-cp-nets
Repo https://github.com/KathrynLaing/DQ-Pruning
Framework none
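
To make the pruning idea concrete, here is an illustrative and heavily simplified sketch of dominance testing as a search over improving flips, pruned with precomputed outcome ranks; it is my own simplification, not the algorithm evaluated in the paper, and it assumes the ranks are consistent with the preference order.

```python
# Sketch: breadth-first search for an improving flipping sequence from `worse`
# to `better`, discarding outcomes whose rank already exceeds the target's.
from collections import deque

def dominates(better, worse, improving_flips, rank):
    """`improving_flips(o)` yields outcomes one improving flip away from o;
    `rank[o]` is the outcome's preference value (higher = more preferred)."""
    frontier, seen = deque([worse]), {worse}
    while frontier:
        outcome = frontier.popleft()
        if outcome == better:
            return True
        for nxt in improving_flips(outcome):
            # Prune: on an improving path the rank never exceeds the target's.
            if nxt not in seen and rank[nxt] <= rank[better]:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```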

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

Title REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Authors George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein
Abstract Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, \emph{unbiased} gradient estimates. Then, we introduce a modification to the continuous relaxation and show that the tightness of the relaxation can be adapted online, removing it as a hyperparameter. We show state-of-the-art variance reduction on several benchmark generative modeling tasks, generally leading to faster convergence to a better final log-likelihood.
Tasks Latent Variable Models
Published 2017-03-21
URL http://arxiv.org/abs/1703.07370v4
PDF http://arxiv.org/pdf/1703.07370v4.pdf
PWC https://paperswithcode.com/paper/rebar-low-variance-unbiased-gradient
Repo https://github.com/tensorflow/models/tree/master/research/rebar
Framework tf
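
For orientation, the generic control-variate identity that REBAR instantiates can be written as follows (my summary of the standard identity, not a derivation from the paper): for any control variate $c(b)$ and scaling $\eta$,

$$
\nabla_\theta\, \mathbb{E}_{b \sim p(b\mid\theta)}\!\left[f(b)\right]
= \mathbb{E}\!\left[\big(f(b) - \eta\, c(b)\big)\,\nabla_\theta \log p(b\mid\theta)\right]
+ \eta\, \nabla_\theta\, \mathbb{E}\!\left[c(b)\right].
$$

REBAR takes $c$ to be $f$ evaluated on the Concrete (Gumbel-Softmax) relaxation, conditioned appropriately on the discrete sample, so the correction term can be estimated with the low-variance reparameterisation trick while the estimator as a whole remains unbiased; the relaxation temperature is then tuned online rather than fixed as a hyperparameter.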

Fisher-Rao Metric, Geometry, and Complexity of Neural Networks

Title Fisher-Rao Metric, Geometry, and Complexity of Neural Networks
Authors Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes
Abstract We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity — the Fisher-Rao norm — that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequalities and further show that the new measure serves as an umbrella for several existing norm-based complexity measures. We discuss upper bounds on the generalization error induced by the proposed measure. Extensive numerical experiments on CIFAR-10 support our theoretical findings. Our theoretical analysis rests on a key structural lemma about partial derivatives of multi-layer rectifier networks.
Tasks
Published 2017-11-05
URL http://arxiv.org/abs/1711.01530v2
PDF http://arxiv.org/pdf/1711.01530v2.pdf
PWC https://paperswithcode.com/paper/fisher-rao-metric-geometry-and-complexity-of
Repo https://github.com/ML-KA/PDG-Theory
Framework none
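
For reference, the Fisher-Rao norm can be written as follows (reconstructed from the standard definitions, not quoted from the paper):

$$
\|\theta\|_{\mathrm{fr}}^2 = \theta^{\top} I(\theta)\,\theta,
\qquad
I(\theta) = \mathbb{E}_{(x,y)}\!\left[\nabla_\theta \log p_\theta(y\mid x)\,\nabla_\theta \log p_\theta(y\mid x)^{\top}\right],
$$

so it equals the second moment of the scalar $\theta^{\top}\nabla_\theta \log p_\theta(y\mid x)$ and can be estimated from per-sample gradients; the paper's structural lemma for rectifier networks and its norm-comparison inequalities build on this quantity.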

Maximum Principle Based Algorithms for Deep Learning

Title Maximum Principle Based Algorithms for Deep Learning
Authors Qianxiao Li, Long Chen, Cheng Tai, Weinan E
Abstract The continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms. Training is recast as a control problem, and this allows us to formulate necessary optimality conditions in continuous time using Pontryagin’s maximum principle (PMP). A modification of the method of successive approximations is then used to solve the PMP, giving rise to an alternative training algorithm for deep learning. This approach has the advantage that rigorous error estimates and convergence results can be established. We also show that it may avoid some pitfalls of gradient-based methods, such as slow convergence on flat landscapes near saddle points. Furthermore, we demonstrate that it obtains a favorable initial per-iteration convergence rate, provided Hamiltonian maximization can be efficiently carried out, a step which is still in need of improvement. Overall, the approach opens up new avenues to attack problems associated with deep learning, such as trapping in slow manifolds and the inapplicability of gradient-based methods for discrete trainable variables.
Tasks
Published 2017-10-26
URL http://arxiv.org/abs/1710.09513v4
PDF http://arxiv.org/pdf/1710.09513v4.pdf
PWC https://paperswithcode.com/paper/maximum-principle-based-algorithms-for-deep
Repo https://github.com/foxycharm/E-MSA
Framework tf
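
The underlying vanilla method of successive approximations has a compact form (sketched here from standard PMP notation; the paper's algorithm adds error-control terms that are not reproduced): with layer dynamics $x_{t+1} = f_t(x_t, \theta_t)$, terminal loss $\Phi(x_T)$ and Hamiltonian $H_t(x, p, \theta) = p \cdot f_t(x, \theta)$, each iteration runs

$$
x_{t+1} = f_t(x_t, \theta_t), \qquad
p_T = -\nabla \Phi(x_T), \quad p_t = \nabla_x H_t(x_t, p_{t+1}, \theta_t), \qquad
\theta_t \leftarrow \arg\max_{\theta} H_t(x_t, p_{t+1}, \theta).
$$

The forward and backward sweeps mirror forward propagation and back-propagation; the distinguishing step is the per-layer Hamiltonian maximisation, which replaces a gradient step on the weights.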

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Title DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning
Authors Wenhan Xiong, Thien Hoang, William Yang Wang
Abstract We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.
Tasks Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs
Published 2017-07-20
URL http://arxiv.org/abs/1707.06690v3
PDF http://arxiv.org/pdf/1707.06690v3.pdf
PWC https://paperswithcode.com/paper/deeppath-a-reinforcement-learning-method-for
Repo https://github.com/xwhan/DeepPath
Framework none
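
A stripped-down sketch of the policy-gradient part (my own simplification; the knowledge-graph environment, state encoding and reward shaping are assumptions, not the released code):

```python
# Sketch: a policy network over relations, trained with REINFORCE on a
# terminal reward; `env` is a hypothetical KG environment with reset()/step().
import torch
import torch.nn as nn

class PathPolicy(nn.Module):
    def __init__(self, state_dim, n_relations):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 512), nn.ReLU(),
                                 nn.Linear(512, n_relations))

    def forward(self, state):                       # state: (state_dim,) tensor
        return torch.distributions.Categorical(logits=self.net(state))

def reinforce_episode(policy, env, optimizer):
    state, log_probs, done, reward = env.reset(), [], False, 0.0
    while not done:
        dist = policy(state)
        relation = dist.sample()                    # pick the next relation
        log_probs.append(dist.log_prob(relation))
        state, reward, done = env.step(relation.item())
    loss = -reward * torch.stack(log_probs).sum()   # REINFORCE, terminal reward
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```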

SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model

Title SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model
Authors Tomoaki Nakamura, Takayuki Nagai, Tadahiro Taniguchi
Abstract To realize human-like robot intelligence, a large-scale cognitive architecture is required for robots to understand their environment through the variety of sensors with which they are equipped. In this paper, we propose a novel framework named Serket that makes it easy to construct a large-scale generative model and to perform inference in it by connecting sub-modules, allowing robots to acquire various capabilities through interaction with their environments and with others. We consider that large-scale cognitive models can be constructed by connecting smaller fundamental models hierarchically while maintaining their programmatic independence. However, the connected modules depend on each other, and their parameters must be optimized as a whole. Conventionally, the equations for parameter estimation have to be derived and implemented for each specific model, which becomes harder as the model grows. To solve these problems, we propose a method for parameter estimation that communicates only minimal parameters between modules while maintaining their programmatic independence. Serket therefore makes it easy to construct large-scale models and estimate their parameters via the connection of modules. Experimental results demonstrate that models built by connecting modules can be optimized as a whole and perform comparably to the original models we previously proposed.
Tasks
Published 2017-12-04
URL http://arxiv.org/abs/1712.00929v3
PDF http://arxiv.org/pdf/1712.00929v3.pdf
PWC https://paperswithcode.com/paper/serket-an-architecture-for-connecting
Repo https://github.com/naka-lab/Serket
Framework none
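
The composition idea can be caricatured with a tiny interface sketch (purely illustrative; the actual Serket library's API will differ):

```python
# Sketch: modules keep their inference code private and exchange only minimal
# shared parameters (e.g. latent assignments) with their neighbours.
class Module:
    def __init__(self):
        self.parent, self.child = None, None

    def connect(self, child):
        self.child, child.parent = child, self

    def update(self):
        """Run this module's own parameter estimation using whatever messages
        its neighbours last shared, then publish its own minimal parameters."""
        raise NotImplementedError

def optimize_whole_model(root, n_sweeps=10):
    # Parameters of the composed model are refined by sweeping over modules,
    # so the whole chain is optimised jointly without merging their code.
    for _ in range(n_sweeps):
        module = root
        while module is not None:
            module.update()
            module = module.child
```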

Acronym Disambiguation: A Domain Independent Approach

Title Acronym Disambiguation: A Domain Independent Approach
Authors Aditya Thakker, Suhail Barot, Sudhir Bagul
Abstract Acronyms are omnipresent. They usually express information that is repetitive and well known. But acronyms can also be ambiguous, because there can be multiple expansions for the same acronym. In this paper, we propose a general system for acronym disambiguation that can work on any acronym given some context information. We present methods for retrieving all the possible expansions of an acronym from Wikipedia and AcronymsFinder.com. We propose to use these expansions to collect all possible contexts in which these acronyms are used and then score them using a paragraph embedding technique called Doc2Vec. Collectively, this method achieves an accuracy of 90.9% in selecting the correct expansion for a given acronym, on a dataset we scraped from Wikipedia with 707 distinct acronyms and 14,876 disambiguations.
Tasks
Published 2017-11-25
URL http://arxiv.org/abs/1711.09271v3
PDF http://arxiv.org/pdf/1711.09271v3.pdf
PWC https://paperswithcode.com/paper/acronym-disambiguation-a-domain-independent
Repo https://github.com/adityathakker/AcronymExpansion
Framework none
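
A compact sketch of the scoring step (an assumed pipeline built on gensim's Doc2Vec, not the authors' exact code):

```python
# Sketch: embed the query context and each expansion's scraped contexts with
# Doc2Vec, then choose the expansion with the most similar context vector.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(acronym_context, expansion_contexts):
    """expansion_contexts: dict mapping each candidate expansion to a list of
    context strings collected from Wikipedia / AcronymsFinder.com."""
    docs = [TaggedDocument(text.split(), [expansion])
            for expansion, texts in expansion_contexts.items() for text in texts]
    model = Doc2Vec(docs, vector_size=100, min_count=1, epochs=20)
    query = model.infer_vector(acronym_context.split())
    scores = {expansion: cosine(query, model.infer_vector(" ".join(texts).split()))
              for expansion, texts in expansion_contexts.items()}
    return max(scores, key=scores.get)
```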

Neural Models for Key Phrase Detection and Question Generation

Title Neural Models for Key Phrase Detection and Question Generation
Authors Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua Bengio, Adam Trischler
Abstract We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.
Tasks Question Answering, Question Generation, Reading Comprehension
Published 2017-06-14
URL http://arxiv.org/abs/1706.04560v3
PDF http://arxiv.org/pdf/1706.04560v3.pdf
PWC https://paperswithcode.com/paper/neural-models-for-key-phrase-detection-and
Repo https://github.com/partoftheorigin/100DaysOfMLCode
Framework tf
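
The two-stage structure is easy to summarise (hypothetical interfaces only; `keyphrase_model` and `question_model` stand in for the paper's trained extractor and seq2seq generator):

```python
# Sketch: key phrases act as target answers that condition question generation.
def generate_questions(document, keyphrase_model, question_model, top_k=5):
    # Stage 1: score candidate spans by the probability a human would pick
    # them as answers, and keep the top few.
    answers = keyphrase_model.extract(document, top_k=top_k)
    # Stage 2: condition the copy-mechanism seq2seq model on document + answer.
    return [(answer, question_model.generate(document, answer))
            for answer in answers]
```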

Soft Weight-Sharing for Neural Network Compression

Title Soft Weight-Sharing for Neural Network Compression
Authors Karen Ullrich, Edward Meeds, Max Welling
Abstract The success of deep learning in numerous application domains has created the desire to run and train deep models on mobile devices. This, however, conflicts with their computation-, memory- and energy-intensive nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
Tasks Neural Network Compression, Quantization
Published 2017-02-13
URL http://arxiv.org/abs/1702.04008v2
PDF http://arxiv.org/pdf/1702.04008v2.pdf
PWC https://paperswithcode.com/paper/soft-weight-sharing-for-neural-network
Repo https://github.com/akashrajkn/waffles-and-posteriors
Framework tf
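
The core of the retraining objective is a Gaussian-mixture prior over all weights (a condensed sketch under my own assumptions, not the authors' Keras code):

```python
# Sketch: add the negative log-likelihood of the weights under a learned 1-D
# Gaussian mixture to the task loss; weights collapse onto the mixture means
# (quantisation), and a component fixed at zero yields pruning.
import math
import torch

def mixture_nll(weights, means, log_stds, mix_logits):
    """weights: flattened network weights; means/log_stds/mix_logits: (K,)."""
    w = weights.reshape(-1, 1)                             # (n_weights, 1)
    log_pi = torch.log_softmax(mix_logits, dim=0)
    log_prob = (-0.5 * ((w - means) / log_stds.exp()) ** 2
                - log_stds - 0.5 * math.log(2 * math.pi))  # (n_weights, K)
    return -torch.logsumexp(log_pi + log_prob, dim=1).sum()

def retraining_loss(model, task_loss, means, log_stds, mix_logits, tau=1e-5):
    weights = torch.cat([p.reshape(-1) for p in model.parameters()])
    return task_loss + tau * mixture_nll(weights, means, log_stds, mix_logits)

# The mixture parameters are optimised jointly with the network; after
# retraining, each weight is replaced by the mean of its nearest component.
```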

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

Title ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
Authors Jian-Hao Luo, Jianxin Wu, Weiyao Lin
Abstract We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on filter-level pruning, i.e., a whole filter is discarded if it is less important. Our method does not change the original network structure, so it can be perfectly supported by any off-the-shelf deep learning library. We formally establish filter pruning as an optimization problem, and reveal that filters need to be pruned based on statistics computed from the next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on the ILSVRC-12 benchmark. ThiNet achieves 3.31$\times$ FLOPs reduction and 16.63$\times$ compression on VGG-16, with only a 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can reduce more than half of the parameters and FLOPs, at the cost of roughly a 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only a 5.05MB model size, preserving AlexNet-level accuracy but showing much stronger generalization ability.
Tasks Neural Network Compression
Published 2017-07-20
URL http://arxiv.org/abs/1707.06342v1
PDF http://arxiv.org/pdf/1707.06342v1.pdf
PWC https://paperswithcode.com/paper/thinet-a-filter-level-pruning-method-for-deep
Repo https://github.com/Roll920/ThiNet
Framework none
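
The next-layer criterion can be sketched as a greedy subset selection (a simplified reading of the method, not the released code):

```python
# Sketch: for one unit of layer i+1, X[:, c] holds channel c's contribution to
# that unit over a batch, so X.sum(axis=1) is the unit's output. Channels whose
# removal adds the least reconstruction error are pruned first.
import numpy as np

def channels_to_keep(X, keep_ratio=0.5):
    n_channels = X.shape[1]
    n_prune = int(n_channels * (1 - keep_ratio))
    pruned = []
    for _ in range(n_prune):
        best_c, best_err = None, None
        for c in range(n_channels):
            if c in pruned:
                continue
            removed = pruned + [c]
            err = np.sum(X[:, removed].sum(axis=1) ** 2)  # squared removed part
            if best_err is None or err < best_err:
                best_c, best_err = c, err
        pruned.append(best_c)
    return [c for c in range(n_channels) if c not in pruned]
```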

StrassenNets: Deep Learning with a Multiplication Budget

Title StrassenNets: Deep Learning with a Multiplication Budget
Authors Michael Tschannen, Aran Khanna, Anima Anandkumar
Abstract A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplication operations. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-a-kind reduction in number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen’s matrix multiplication algorithm, learning to multiply $2 \times 2$ matrices using only 7 multiplications instead of 8.
Tasks Image Classification, Language Modelling, Model Compression, Neural Network Compression
Published 2017-12-11
URL http://arxiv.org/abs/1712.03942v3
PDF http://arxiv.org/pdf/1712.03942v3.pdf
PWC https://paperswithcode.com/paper/strassennets-deep-learning-with-a
Repo https://github.com/mitscha/strassennets
Framework mxnet
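
The sum-product-network view is easy to verify numerically for the 2x2 case: with ternary matrices $W_a, W_b, W_c$, the product is $\mathrm{vec}(C) = W_c\big((W_a \mathrm{vec}(A)) \odot (W_b \mathrm{vec}(B))\big)$ using only 7 scalar multiplications. The matrices below encode Strassen's classical algorithm (the paper learns such matrices from data); this is my illustration, not the mxnet implementation.

```python
# Sketch: Strassen's 2x2 algorithm written as a 2-layer SPN with ternary weights.
import numpy as np

Wa = np.array([[ 1, 0, 0, 1],   # A11+A22
               [ 0, 0, 1, 1],   # A21+A22
               [ 1, 0, 0, 0],   # A11
               [ 0, 0, 0, 1],   # A22
               [ 1, 1, 0, 0],   # A11+A12
               [-1, 0, 1, 0],   # A21-A11
               [ 0, 1, 0,-1]])  # A12-A22
Wb = np.array([[ 1, 0, 0, 1],   # B11+B22
               [ 1, 0, 0, 0],   # B11
               [ 0, 1, 0,-1],   # B12-B22
               [-1, 0, 1, 0],   # B21-B11
               [ 0, 0, 0, 1],   # B22
               [ 1, 1, 0, 0],   # B11+B12
               [ 0, 0, 1, 1]])  # B21+B22
Wc = np.array([[1, 0, 0, 1,-1, 0, 1],   # C11 = M1+M4-M5+M7
               [0, 0, 1, 0, 1, 0, 0],   # C12 = M3+M5
               [0, 1, 0, 1, 0, 0, 0],   # C21 = M2+M4
               [1,-1, 1, 0, 0, 1, 0]])  # C22 = M1-M2+M3+M6

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
m = (Wa @ A.reshape(4)) * (Wb @ B.reshape(4))   # the only 7 multiplications
C = (Wc @ m).reshape(2, 2)
assert np.allclose(C, A @ B)
```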

Graphical posterior predictive classifier: Bayesian model averaging with particle Gibbs

Title Graphical posterior predictive classifier: Bayesian model averaging with particle Gibbs
Authors Tatjana Pavlenko, Felix Leopoldo Rios
Abstract In this study, we present a multi-class graphical Bayesian predictive classifier that incorporates the uncertainty in the model selection into the standard Bayesian formalism. For each class, the dependence structure underlying the observed features is represented by a set of decomposable Gaussian graphical models. Emphasis is then placed on Bayesian model averaging, which takes full account of the class-specific model uncertainty by averaging over the posterior graph model probabilities. An explicit evaluation of the model probabilities is well known to be infeasible. To address this issue, we consider the particle Gibbs strategy of Olsson et al. (2018b) for posterior sampling from decomposable graphical models, which utilizes the Christmas tree algorithm of Olsson et al. (2018a) as a proposal kernel. We also derive a strong hyper Markov law, which we call the hyper normal Wishart law, that allows the resulting Bayesian calculations to be performed locally. The proposed predictive graphical classifier reveals superior performance compared to the ordinary Bayesian predictive rule that does not account for the model uncertainty, as well as to a number of out-of-the-box classifiers.
Tasks Model Selection
Published 2017-07-21
URL http://arxiv.org/abs/1707.06792v4
PDF http://arxiv.org/pdf/1707.06792v4.pdf
PWC https://paperswithcode.com/paper/graphical-posterior-predictive-classifier
Repo https://github.com/felixleopoldo/trilearn
Framework none
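
The Bayesian-model-averaging rule at the heart of the classifier can be summarised as follows (my paraphrase of the standard BMA predictive rule, with the graph posterior approximated by particle Gibbs samples):

$$
p(y = c \mid x, \mathcal{D}) \;\propto\; p(y = c)\sum_{G} p(x \mid G, \mathcal{D}_c)\, p(G \mid \mathcal{D}_c),
$$

where $\mathcal{D}_c$ are the training observations of class $c$ and $G$ ranges over decomposable graphs; a strong hyper Markov prior such as the proposed hyper normal Wishart law lets $p(x \mid G, \mathcal{D}_c)$ factorise over the cliques and separators of $G$, which is what makes the per-class computations local.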