October 20, 2019

3300 words · 16 mins read

Paper Group AWR 226

Knowledge Distillation by On-the-Fly Native Ensemble. Computer vision-based framework for extracting geological lineaments from optical remote sensing data. Noise Contrastive Priors for Functional Uncertainty. Band selection with Higher Order Multivariate Cumulants for small target detection in hyperspectral images. Learning Multilingual Word Embed …

Knowledge Distillation by On-the-Fly Native Ensemble

Title Knowledge Distillation by On-the-Fly Native Ensemble
Authors Xu Lan, Xiatian Zhu, Shaogang Gong
Abstract Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-running requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst retaining computational efficiency advantages.
Tasks Image Classification
Published 2018-06-12
URL http://arxiv.org/abs/1806.04606v2
PDF http://arxiv.org/pdf/1806.04606v2.pdf
PWC https://paperswithcode.com/paper/knowledge-distillation-by-on-the-fly-native
Repo https://github.com/Lan1991Xu/ONE_NeurIPS2018
Framework pytorch
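As a quick illustration of the ONE recipe (one shared trunk, several branch classifiers, and a gated ensemble of branch logits acting as the on-the-fly teacher), here is a minimal PyTorch sketch. The module names, the gate design, and the loss weighting are assumptions for illustration, not the released implementation.

```python
# Hedged sketch of on-the-fly native ensemble distillation (ONE-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ONENet(nn.Module):
    def __init__(self, trunk, branches, feat_dim, temperature=3.0):
        super().__init__()
        self.trunk = trunk                               # shared low-level layers -> (B, feat_dim)
        self.branches = nn.ModuleList(branches)          # K high-level branch classifiers
        self.gate = nn.Linear(feat_dim, len(branches))   # learns per-sample ensemble weights
        self.T = temperature

    def forward(self, x):
        feat = self.trunk(x)
        logits = torch.stack([b(feat) for b in self.branches], dim=1)   # (B, K, C)
        weights = F.softmax(self.gate(feat), dim=1)                     # (B, K)
        teacher = (weights.unsqueeze(-1) * logits).sum(dim=1)           # gated ensemble logits
        return logits, teacher

def one_loss(logits, teacher, targets, T=3.0):
    # Cross-entropy for every branch and for the ensemble teacher,
    # plus distillation from the teacher back into each branch.
    ce = sum(F.cross_entropy(logits[:, k], targets) for k in range(logits.size(1)))
    ce = ce + F.cross_entropy(teacher, targets)
    kd = sum(
        F.kl_div(F.log_softmax(logits[:, k] / T, dim=1),
                 F.softmax(teacher.detach() / T, dim=1),
                 reduction="batchmean") * T * T
        for k in range(logits.size(1))
    )
    return ce + kd
```

At test time a single branch can be kept, so the deployed model has the cost of the small target network while having been trained against the ensemble teacher.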

Computer vision-based framework for extracting geological lineaments from optical remote sensing data

Title Computer vision-based framework for extracting geological lineaments from optical remote sensing data
Authors Ehsan Farahbakhsh, Rohitash Chandra, Hugo K. H. Olierook, Richard Scalzo, Chris Clark, Steven M. Reddy, R. Dietmar Muller
Abstract The extraction of geological lineaments from digital satellite data is a fundamental application in remote sensing. The locations of geological lineaments such as faults and dykes are of interest for a range of applications, particularly because of their association with hydrothermal mineralization. Although a wide range of applications have utilized computer vision techniques, a standard workflow for applying these techniques to mineral exploration is lacking. We present a framework for extracting geological lineaments from optical remote sensing data using computer vision techniques, combining edge detection and line extraction algorithms. It features ancillary computer vision techniques for reducing data dimensionality, removing noise and enhancing the expression of lineaments. We test the proposed framework on Landsat 8 data of a mineral-rich portion of the Gascoyne Province in Western Australia using different dimension reduction techniques and convolutional filters. To validate the results, the extracted lineaments are compared to our manual photointerpretation and to structures geologically mapped by the Geological Survey of Western Australia (GSWA). The results show that the best correlation between our extracted geological lineaments and the GSWA geological lineament map is achieved by applying a minimum noise fraction transformation and a Laplacian filter. Application of a directional filter instead shows a stronger correlation with the output of our manual photointerpretation and known sites of hydrothermal mineralization. Hence, our framework using either filter can be used for mineral prospectivity mapping in other regions where faults are exposed and observable in optical remote sensing data.
Tasks Dimensionality Reduction, Edge Detection
Published 2018-10-04
URL http://arxiv.org/abs/1810.02320v1
PDF http://arxiv.org/pdf/1810.02320v1.pdf
PWC https://paperswithcode.com/paper/computer-vision-based-framework-for
Repo https://github.com/intelligent-exploration/IP_MinEx
Framework none
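A hedged sketch of the kind of pipeline the abstract describes: reduce the multispectral bands to one component, enhance edges, then extract straight line segments. PCA stands in for the minimum-noise-fraction transform, and the thresholds, kernel sizes, and Hough parameters are illustrative only, not the paper's settings.

```python
# Illustrative lineament-extraction pipeline: dimensionality reduction ->
# edge enhancement -> line extraction (probabilistic Hough transform).
import cv2
import numpy as np

def extract_lineaments(bands):
    """bands: (H, W, N) float array of co-registered optical bands."""
    h, w, n = bands.shape
    flat = bands.reshape(-1, n).astype(np.float32)
    # PCA as a simple stand-in for the MNF transform used in the paper.
    flat -= flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    pc1 = (flat @ vt[0]).reshape(h, w)
    pc1 = cv2.normalize(pc1, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Edge enhancement (Laplacian) followed by Otsu binarisation.
    edges = cv2.Laplacian(pc1, cv2.CV_8U, ksize=3)
    _, binary = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Line extraction with a probabilistic Hough transform.
    lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180,
                            threshold=80, minLineLength=50, maxLineGap=10)
    return [] if lines is None else [tuple(l[0]) for l in lines]
```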

Noise Contrastive Priors for Functional Uncertainty

Title Noise Contrastive Priors for Functional Uncertainty
Authors Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, James Davidson
Abstract Obtaining reliable uncertainty estimates of neural network predictions is a long-standing challenge. Bayesian neural networks have been proposed as a solution, but it remains open how to specify their prior. In particular, the common practice of an independent normal prior in weight space imposes relatively weak constraints on the function posterior, allowing it to generalize in unforeseen ways on inputs outside of the training distribution. We propose noise contrastive priors (NCPs) to obtain reliable uncertainty estimates. The key idea is to train the model to output high uncertainty for data points outside of the training distribution. NCPs do so using an input prior, which adds noise to the inputs of the current mini-batch, and an output prior, which is a wide distribution given these inputs. NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training. Empirically, we show that NCPs prevent overfitting outside of the training distribution and result in uncertainty estimates that are useful for active learning. We demonstrate the scalability of our method on the flight delays data set, where we significantly improve upon previously published results.
Tasks Active Learning
Published 2018-07-24
URL https://arxiv.org/abs/1807.09289v3
PDF https://arxiv.org/pdf/1807.09289v3.pdf
PWC https://paperswithcode.com/paper/reliable-uncertainty-estimates-in-deep-neural
Repo https://github.com/brain-research/ncp
Framework tf
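A minimal sketch of the noise contrastive prior idea (the released code is TensorFlow; this re-expresses it in PyTorch). The model is assumed to return a predictive mean and a positive standard deviation; on noised copies of the inputs it is pulled towards a wide, data-anchored output prior so uncertainty grows away from the data. The loss weighting and noise scale are illustrative.

```python
# Hedged NCP-style training loss: NLL on clean inputs + KL to a wide
# output prior on noise-perturbed ("OOD-like") inputs.
import torch
from torch.distributions import Normal, kl_divergence

def ncp_loss(model, x, y, input_noise=0.1, prior_std=1.0, ncp_weight=1.0):
    # model(x) is assumed to return (mean, std) with std > 0 (e.g. via softplus).
    mean, std = model(x)
    nll = -Normal(mean, std).log_prob(y).mean()

    # Input prior: perturb the inputs; output prior: a wide Normal around the labels.
    x_ood = x + input_noise * torch.randn_like(x)
    mean_ood, std_ood = model(x_ood)
    output_prior = Normal(y, prior_std * torch.ones_like(y))
    ncp = kl_divergence(output_prior, Normal(mean_ood, std_ood)).mean()

    return nll + ncp_weight * ncp
```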

Band selection with Higher Order Multivariate Cumulants for small target detection in hyperspectral images

Title Band selection with Higher Order Multivariate Cumulants for small target detection in hyperspectral images
Authors Przemysław Głomb, Krzysztof Domino, Michał Romaszewski, Michał Cholewa
Abstract In the small target detection problem, a pattern to be located is an order of magnitude less numerous than other patterns present in the dataset. This applies both to supervised detection, where the known template is expected to match in just a few areas, and to unsupervised anomaly detection, as anomalies are rare by definition. The problem frequently arises in imaging applications, i.e. detection within a scene acquired by a camera. To maximize available data about the scene, hyperspectral cameras are used; at each pixel, they record spectral data in hundreds of narrow bands. A typical feature of hyperspectral imaging is that characteristic properties of target materials are visible in a small number of bands, where light of a certain wavelength interacts with characteristic molecules. A target-independent band selection method based on statistical principles is a versatile tool for solving this problem in different practical applications. The combination of a regular background and a rare, standing-out anomaly produces a distortion in the joint distribution of hyperspectral pixels. Higher-order cumulant tensors are a natural “window” into this distribution, allowing us to measure its properties and suggest candidate bands for removal. While there have been attempts at producing band selection algorithms based on the 3rd cumulant tensor, i.e. the joint skewness, the literature lacks a systematic analysis of how the order of the cumulant tensor affects the effectiveness of band selection in detection applications. In this paper we present an analysis of a general algorithm for band selection based on higher-order cumulants. We discuss its usability in relation to the observed breaking points in performance, depending both on the method’s order and on the desired number of bands. Finally, we perform experiments and evaluate these methods in a hyperspectral detection scenario.
Tasks Anomaly Detection, Unsupervised Anomaly Detection
Published 2018-08-10
URL http://arxiv.org/abs/1808.03513v1
PDF http://arxiv.org/pdf/1808.03513v1.pdf
PWC https://paperswithcode.com/paper/band-selection-with-higher-order-multivariate
Repo https://github.com/ZKSI/CumulantsFeatures.jl
Framework none
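A toy numpy sketch of higher-order-cumulant band ranking (the reference implementation is the Julia package CumulantsFeatures.jl). For illustration only the 3rd-order cumulant (co-skewness) tensor is used: each band is scored by the norm of its tensor slice and the top-k bands are kept. The scoring rule and greedy keep-top-k step are simplifying assumptions, and the dense tensor is only practical for modest band counts.

```python
# Toy 3rd-order cumulant band ranking for hyperspectral data.
import numpy as np

def select_bands_3rd_order(pixels, k):
    """pixels: (num_pixels, num_bands) array; returns indices of k selected bands."""
    x = pixels - pixels.mean(axis=0)
    n, d = x.shape
    # Third-order cumulant tensor C[i, j, l] = E[x_i x_j x_l] for centred data.
    coskew = np.einsum('ni,nj,nl->ijl', x, x, x) / n
    # Score each band by the norm of its slice of the tensor.
    scores = np.linalg.norm(coskew.reshape(d, -1), axis=1)
    return np.argsort(scores)[::-1][:k]
```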

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach

Title Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach
Authors Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra
Abstract We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples learning the transformation from the source language to the target language into (a) learning rotations for language-specific embeddings to align them to a common space, and (b) learning a similarity metric in the common space to model similarities between the embeddings. We model the bilingual mapping problem as an optimization problem on smooth Riemannian manifolds. We show that our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We also generalize our framework to represent multiple languages in a common latent space. In particular, the latent space representations for several languages are learned jointly, given bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in zero-shot word translation setting. Our implementation is available at https://github.com/anoopkunchukuttan/geomm .
Tasks Multilingual Word Embeddings, Word Embeddings
Published 2018-08-27
URL http://arxiv.org/abs/1808.08773v3
PDF http://arxiv.org/pdf/1808.08773v3.pdf
PWC https://paperswithcode.com/paper/learning-multilingual-word-embeddings-in
Repo https://github.com/anoopkunchukuttan/geomm
Framework tf
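The abstract decouples the mapping into (a) language-specific rotations into a common space and (b) a similarity metric in that space, optimised jointly on Riemannian manifolds. The numpy sketch below is only a simplified stand-in: orthogonal Procrustes on dictionary pairs for the rotation, and plain cosine similarity in place of the learned metric.

```python
# Simplified stand-in for rotation-based cross-lingual alignment.
import numpy as np

def procrustes_rotation(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F over dictionary pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def nearest_translation(query_vec, rotation, target_matrix):
    """Return the index of the most cosine-similar target-language word."""
    q = query_vec @ rotation
    scores = (target_matrix @ q) / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(q) + 1e-9)
    return int(np.argmax(scores))
```

In the actual method several languages are rotated into one latent space and the metric is shared across language pairs, which is what enables the zero-shot translation setting described above.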

Deep Structured Prediction with Nonlinear Output Transformations

Title Deep Structured Prediction with Nonlinear Output Transformations
Authors Colin Graber, Ofer Meshi, Alexander Schwing
Abstract Deep structured models are widely used for tasks like semantic segmentation, where explicit correlations between variables provide important prior information which generally helps to reduce the data needs of deep nets. However, current deep structured models are restricted by oftentimes very local neighborhood structure, which cannot be increased for computational complexity reasons, and by the fact that the output configuration, or a representation thereof, cannot be transformed further. Very recent approaches which address those issues include graphical model inference inside deep nets so as to permit subsequent non-linear output space transformations. However, optimization of those formulations is challenging and not well understood. Here, we develop a novel model which generalizes existing approaches, such as structured prediction energy networks, and discuss a formulation which maintains applicability of existing inference techniques.
Tasks Semantic Segmentation, Structured Prediction
Published 2018-11-01
URL http://arxiv.org/abs/1811.00539v1
PDF http://arxiv.org/pdf/1811.00539v1.pdf
PWC https://paperswithcode.com/paper/deep-structured-prediction-with-nonlinear
Repo https://github.com/cgraber/NLStruct
Framework pytorch
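To make the idea of "transforming the output configuration further" concrete, here is a hedged PyTorch sketch in the spirit of structured prediction energy networks: a score that combines per-variable unary terms from a deep net with a nonlinear transformation applied to a relaxed representation of the whole output. This is a generic illustration, not the paper's exact formulation.

```python
# Sketch of a score with unary terms plus a nonlinear global output term.
import torch
import torch.nn as nn

class NonlinearOutputScore(nn.Module):
    def __init__(self, feat_dim, num_vars, num_labels, hidden=64):
        super().__init__()
        self.unary = nn.Linear(feat_dim, num_vars * num_labels)
        self.output_net = nn.Sequential(          # nonlinear transform of the output
            nn.Linear(num_vars * num_labels, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.num_vars, self.num_labels = num_vars, num_labels

    def forward(self, features, y_soft):
        """features: (B, feat_dim); y_soft: (B, num_vars, num_labels) relaxed labels."""
        unary = self.unary(features).view(-1, self.num_vars, self.num_labels)
        unary_score = (unary * y_soft).sum(dim=(1, 2))
        global_score = self.output_net(y_soft.flatten(1)).squeeze(-1)
        return unary_score + global_score          # higher = better configuration
```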

Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning

Title Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning
Authors Noah Frazier-Logue, Stephen José Hanson
Abstract Multi-layer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these estimation and related problems (local minima, collinearity, feature discovery etc.) is called Dropout (Hinton, et al 2012, Baldi et al 2016). The Dropout algorithm removes hidden units according to a Bernoulli random variable with probability $p$ prior to each update, creating random “shocks” to the network that are averaged over updates. In this paper we show that Dropout is a special case of a more general model published originally in 1990 called the Stochastic Delta Rule, or SDR (Hanson, 1990). SDR redefines each weight in the network as a random variable with mean $\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. Each weight random variable is sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights. Both parameters are updated according to prediction error, thus resulting in weight noise injections that reflect a local history of prediction error and local model averaging. SDR therefore implements a more sensitive, local, gradient-dependent simulated annealing per weight, converging in the limit to a Bayes-optimal network. Tests on standard benchmarks (CIFAR) using a modified version of DenseNet show that SDR outperforms standard Dropout in test error by approx. 17% with DenseNet-BC 250 on CIFAR-100 and by approx. 12-14% in smaller networks. We also show that SDR reaches the same accuracy that Dropout attains in 100 epochs in as few as 35 epochs.
Tasks
Published 2018-08-10
URL http://arxiv.org/abs/1808.03578v2
PDF http://arxiv.org/pdf/1808.03578v2.pdf
PWC https://paperswithcode.com/paper/dropout-is-a-special-case-of-the-stochastic
Repo https://github.com/noahfl/densenet-sdr
Framework tf
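A hedged PyTorch sketch of the stochastic delta rule on a single linear layer: each weight is a Gaussian N(mu, sigma), a fresh sample is drawn on every forward pass, mu follows the usual gradient step, and sigma grows with the gradient magnitude before being annealed by a decay factor. The update constants and the exact form of the sigma update are illustrative, not the paper's values.

```python
# SDR-style linear layer: weights sampled per forward pass from N(mu, sigma).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDRLinear(nn.Module):
    def __init__(self, in_features, out_features, init_sigma=0.01):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.05)
        self.sigma = torch.full((out_features, in_features), init_sigma)  # not trained by autograd
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.mu + self.sigma * torch.randn_like(self.mu)   # sample weights
        return F.linear(x, weight, self.bias)

    @torch.no_grad()
    def sdr_update(self, lr=0.1, beta=0.02, zeta=0.99):
        self.mu -= lr * self.mu.grad                 # usual mean update
        self.sigma += beta * self.mu.grad.abs()      # noise tracks local error history
        self.sigma *= zeta                           # anneal towards a deterministic net
        self.mu.grad.zero_()
```

Setting sigma to zero recovers an ordinary deterministic layer, and a Bernoulli rather than Gaussian perturbation recovers Dropout-like behaviour, which is the special-case relationship the title refers to.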

Efficient semantic image segmentation with superpixel pooling

Title Efficient semantic image segmentation with superpixel pooling
Authors Mathijs Schuurmans, Maxim Berman, Matthew B. Blaschko
Abstract In this work, we evaluate the use of superpixel pooling layers in deep network architectures for semantic segmentation. Superpixel pooling is a flexible and efficient replacement for other pooling strategies that incorporates spatial prior information. We propose a simple and efficient GPU-implementation of the layer and explore several designs for the integration of the layer into existing network architectures. We provide experimental results on the IBSR and Cityscapes datasets, demonstrating that superpixel pooling can be leveraged to consistently increase network accuracy with minimal computational overhead. Source code is available at https://github.com/bermanmaxim/superpixPool
Tasks Semantic Segmentation
Published 2018-06-07
URL http://arxiv.org/abs/1806.02705v1
PDF http://arxiv.org/pdf/1806.02705v1.pdf
PWC https://paperswithcode.com/paper/efficient-semantic-image-segmentation-with
Repo https://github.com/bermanmaxim/superpixPool
Framework pytorch
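A minimal PyTorch sketch of the pooling operation itself: CNN features are averaged within each superpixel, giving one descriptor per superpixel that can then be classified and broadcast back to pixels. The released code provides an optimised GPU kernel; this scatter-based version only illustrates the semantics.

```python
# Superpixel average pooling via index_add_.
import torch

def superpixel_avg_pool(features, labels, num_superpixels):
    """features: (C, H, W) tensor; labels: (H, W) long tensor of superpixel ids."""
    c, h, w = features.shape
    flat_feat = features.reshape(c, h * w)                 # (C, P)
    flat_lab = labels.reshape(h * w)                       # (P,)
    sums = torch.zeros(c, num_superpixels, dtype=features.dtype)
    sums.index_add_(1, flat_lab, flat_feat)                # sum features per superpixel
    counts = torch.bincount(flat_lab, minlength=num_superpixels).clamp(min=1)
    return sums / counts.unsqueeze(0)                      # (C, num_superpixels)

# Broadcasting back: pooled[:, labels] yields a (C, H, W) map where every
# pixel carries its superpixel's pooled descriptor.
```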

DeepNeuro: an open-source deep learning toolbox for neuroimaging

Title DeepNeuro: an open-source deep learning toolbox for neuroimaging
Authors Andrew Beers, James Brown, Ken Chang, Katharina Hoebel, Elizabeth Gerstner, Bruce Rosen, Jayashree Kalpathy-Cramer
Abstract Translating neural networks from theory to clinical practice has unique challenges, specifically in the field of neuroimaging. In this paper, we present DeepNeuro, a deep learning framework that is best-suited to putting deep learning algorithms for neuroimaging into practical usage with a minimum of friction. We show how this framework can be used to both design and train neural network architectures, as well as modify state-of-the-art architectures in a flexible and intuitive way. We describe the pre- and postprocessing functions common in the medical imaging community that DeepNeuro offers, which ensure consistent performance of networks across variable users, institutions, and scanners. Finally, we show how pipelines created in DeepNeuro can be concisely packaged into shareable Docker containers and command-line interfaces using DeepNeuro’s pipeline resources.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04589v1
PDF http://arxiv.org/pdf/1808.04589v1.pdf
PWC https://paperswithcode.com/paper/deepneuro-an-open-source-deep-learning
Repo https://github.com/QTIM-Lab/DeepRad
Framework none

Variational Disparity Estimation Framework for Plenoptic Image

Title Variational Disparity Estimation Framework for Plenoptic Image
Authors Trung-Hieu Tran, Zhe Wang, Sven Simon
Abstract This paper presents a computational framework for accurately estimating the disparity map of plenoptic images. The proposed framework is based on the variational principle and provides intrinsic sub-pixel precision. The light-field motion tensor introduced in the framework allows us to combine advanced robust data terms as well as provides explicit treatments for different color channels. A warping strategy is embedded in our framework for tackling the large displacement problem. We also show that by applying a simple regularization term and guided median filtering, the accuracy of the displacement field in occluded areas can be greatly enhanced. We demonstrate the excellent performance of the proposed framework by intensive comparisons with the Lytro software and contemporary approaches on both synthetic and real-world datasets.
Tasks Disparity Estimation
Published 2018-04-18
URL http://arxiv.org/abs/1804.06633v1
PDF http://arxiv.org/pdf/1804.06633v1.pdf
PWC https://paperswithcode.com/paper/variational-disparity-estimation-framework
Repo https://github.com/hieuttcse/variational_plenoptic_disparity_estimation
Framework none

CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation

Title CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation
Authors Konstantinos Batsos, Changjiang Cai, Philippos Mordohai
Abstract Recently, there has been a paradigm shift in stereo matching with learning-based methods achieving the best results on all popular benchmarks. The success of these methods is due to the availability of training data with ground truth; training learning-based systems on these datasets has allowed them to surpass the accuracy of conventional approaches based on heuristics and assumptions. Many of these assumptions, however, had been validated extensively and hold for the majority of possible inputs. In this paper, we generate a matching volume leveraging both data with ground truth and conventional wisdom. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that the resulting matching volume estimation method achieves similar accuracy to purely data-driven alternatives on benchmarks and that it generalizes to unseen data much better. In fact, the results we submitted to the KITTI and ETH3D benchmarks were generated using a classifier trained on the Middlebury 2014 dataset.
Tasks Disparity Estimation, Stereo Matching, Stereo Matching Hand
Published 2018-04-05
URL http://arxiv.org/abs/1804.01967v1
PDF http://arxiv.org/pdf/1804.01967v1.pdf
PWC https://paperswithcode.com/paper/cbmv-a-coalesced-bidirectional-matching
Repo https://github.com/kbatsos/CBMV
Framework none
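A hedged scikit-learn sketch of the CBMV idea: for each (pixel, disparity) candidate, stack matching evidence computed in both matching directions into a feature vector, train a random forest on ground-truth correct/incorrect labels, and use its predicted probability as the coalesced matching-volume entry. The feature construction is schematic; the actual system combines several matchers and ratio/likelihood features.

```python
# Random-forest coalescing of bidirectional matching evidence (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_cbmv_classifier(features, labels, n_trees=50):
    """features: (num_candidates, num_matcher_features); labels: 1 = correct disparity."""
    clf = RandomForestClassifier(n_estimators=n_trees, max_depth=20, n_jobs=-1)
    clf.fit(features, labels)
    return clf

def matching_volume(clf, candidate_features, height, width, max_disp):
    """Predict an (H, W, D) volume of match probabilities from stacked features."""
    probs = clf.predict_proba(candidate_features)[:, 1]
    return probs.reshape(height, width, max_disp)
```

The resulting volume can then be fed to a conventional optimizer (e.g. semi-global matching) to produce the final disparity map, which is where the reported cross-dataset generalization comes from.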

Learning Multimodal Graph-to-Graph Translation for Molecular Optimization

Title Learning Multimodal Graph-to-Graph Translation for Molecular Optimization
Authors Wengong Jin, Kevin Yang, Regina Barzilay, Tommi Jaakkola
Abstract We view molecular optimization as a graph-to-graph translation problem. The goal is to learn to map from one molecular graph to another with better properties based on an available corpus of paired molecules. Since molecules can be optimized in different ways, there are multiple viable translations for each input graph. A key challenge is therefore to model diverse translation outputs. Our primary contributions include a junction tree encoder-decoder for learning diverse graph translations along with a novel adversarial training method for aligning distributions of molecules. Diverse output distributions in our model are explicitly realized by low-dimensional latent vectors that modulate the translation process. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines.
Tasks Graph-To-Graph Translation
Published 2018-12-03
URL http://arxiv.org/abs/1812.01070v3
PDF http://arxiv.org/pdf/1812.01070v3.pdf
PWC https://paperswithcode.com/paper/learning-multimodal-graph-to-graph-1
Repo https://github.com/kovanostra/graph-to-graph
Framework none

Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

Title Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
Authors Raul Puri, Robert Kirby, Nikolai Yakovenko, Bryan Catanzaro
Abstract Recent work has shown how to train Convolutional Neural Networks (CNNs) rapidly on large image datasets, then transfer the knowledge gained from these models to a variety of tasks. Following [Radford 2017], in this work, we demonstrate similar scalability and transfer for Recurrent Neural Networks (RNNs) for Natural Language tasks. By utilizing mixed precision arithmetic and a 32k batch size distributed across 128 NVIDIA Tesla V100 GPUs, we are able to train a character-level 4096-dimension multiplicative LSTM (mLSTM) for unsupervised text reconstruction over 3 epochs of the 40 GB Amazon Reviews dataset in four hours. This runtime compares favorably with previous work taking one month to train the same size and configuration for one epoch over the same dataset. Converging large batch RNN models can be challenging. Recent work has suggested scaling the learning rate as a function of batch size, but we find that doing so naively leads either to significantly worse convergence or to immediate divergence for this problem. We provide a learning rate schedule that allows our model to converge with a 32k batch size. Since our model converges over the Amazon Reviews dataset in hours, and our compute requirement of 128 Tesla V100 GPUs, while substantial, is commercially available, this work opens up large scale unsupervised NLP training to most commercial applications and deep learning researchers. A model can be trained over most public or private text datasets overnight.
Tasks Language Modelling
Published 2018-08-03
URL http://arxiv.org/abs/1808.01371v2
PDF http://arxiv.org/pdf/1808.01371v2.pdf
PWC https://paperswithcode.com/paper/large-scale-language-modeling-converging-on
Repo https://github.com/NVIDIA/sentiment-discovery
Framework pytorch
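An illustrative learning-rate schedule of the kind the abstract alludes to for large-batch convergence: linear warmup from a small value to the peak rate, followed by linear decay. The exact shape and constants used in the paper may differ; this only shows how such a schedule is typically wired into a PyTorch-style optimizer.

```python
# Warmup-then-decay learning-rate schedule (illustrative constants).
def lr_at_step(step, peak_lr=3e-4, warmup_steps=500, total_steps=10000, floor=1e-5):
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps            # linear warmup
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(floor, peak_lr * (1.0 - frac))                 # linear decay

def set_lr(optimizer, lr):
    # Works with any optimizer exposing param_groups (e.g. torch.optim.Adam).
    for group in optimizer.param_groups:
        group["lr"] = lr
```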

Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

Title Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards
Authors Daniel McDuff, Ashish Kapoor
Abstract As people learn to navigate the world, autonomic nervous system (e.g., “fight or flight”) responses provide intrinsic feedback about the potential consequence of action choices (e.g., becoming nervous when close to a cliff edge or driving fast around a bend). Physiological changes are correlated with these biological preparations to protect oneself from danger. We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses. Our hypothesis is that such reward functions can circumvent the challenges associated with sparse and skewed rewards in reinforcement learning settings and can help improve sample efficiency. We test this in a simulated driving environment and show that it can increase the speed of learning and reduce the number of collisions during the learning stage.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.09975v2
PDF http://arxiv.org/pdf/1805.09975v2.pdf
PWC https://paperswithcode.com/paper/visceral-machines-reinforcement-learning-with
Repo https://github.com/microsoft/affectbased
Framework tf
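A minimal sketch of the reward shaping the abstract describes: the environment reward is blended with an intrinsic term from a model that predicts a normalised peripheral-pulse/arousal response to the current observation, so states the physiological model deems risky are penalised. The function names and weighting are illustrative assumptions.

```python
# Intrinsic physiological reward shaping (sketch).
def shaped_reward(env_reward, observation, arousal_model, weight=0.25):
    arousal = float(arousal_model(observation))   # assumed in [0, 1]; higher = more "nervous"
    return env_reward - weight * arousal

# Inside a standard RL loop the shaped value simply replaces the raw reward:
#   obs, reward, done, info = env.step(action)
#   reward = shaped_reward(reward, obs, arousal_model)
```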

Applications of Deep Learning to Nuclear Fusion Research

Title Applications of Deep Learning to Nuclear Fusion Research
Authors Diogo R. Ferreira on behalf of JET Contributors
Abstract Nuclear fusion is the process that powers the sun, and it is one of the best hopes to achieve a virtually unlimited energy source for the future of humanity. However, reproducing sustainable nuclear fusion reactions here on Earth is a tremendous scientific and technical challenge. Special devices – called tokamaks – have been built around the world, with JET (Joint European Torus, in the UK) being the largest tokamak currently in operation. Such devices confine matter and heat it up to extremely high temperatures, creating a plasma where fusion reactions begin to occur. JET has over one hundred diagnostic systems to monitor what happens inside the plasma, and each 30-second experiment (or pulse) generates about 50 GB of data. In this work, we show how convolutional neural networks (CNNs) can be used to reconstruct the 2D plasma profile inside the device based on data coming from those diagnostics. We also discuss how recurrent neural networks (RNNs) can be used to predict plasma disruptions, which are one of the major problems affecting tokamaks today. Training of such networks is done on NVIDIA GPUs.
Tasks
Published 2018-11-01
URL http://arxiv.org/abs/1811.00333v1
PDF http://arxiv.org/pdf/1811.00333v1.pdf
PWC https://paperswithcode.com/paper/applications-of-deep-learning-to-nuclear
Repo https://github.com/Veloc1tyE/Drift
Framework pytorch
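As a rough illustration of the first task described above, here is a hedged PyTorch sketch of a small network mapping a vector of diagnostic measurements to a 2D plasma profile image via transposed convolutions. The layer sizes, input dimension, and output resolution are illustrative, not the architecture used at JET.

```python
# Toy diagnostics-to-profile decoder (illustrative sizes only).
import torch.nn as nn

class ProfileReconstructor(nn.Module):
    def __init__(self, num_diagnostics=64):
        super().__init__()
        self.fc = nn.Linear(num_diagnostics, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),    # 32x32 -> 64x64
        )

    def forward(self, x):
        h = self.fc(x).view(-1, 128, 8, 8)
        return self.deconv(h)                      # (B, 1, 64, 64) reconstructed profile
```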