July 29, 2019

3067 words 15 mins read

Paper Group AWR 197

What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?. Solving ill-posed inverse problems using iterative deep neural networks. The Reversible Residual Network: Backpropagation Without Storing Activations. Variational Deep Q Network. WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images. …

What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?

Title What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Authors Marc Tanti, Albert Gatt, Kenneth P. Camilleri
Abstract In neural image captioning systems, a recurrent neural network (RNN) is typically viewed as the primary 'generation' component. This view suggests that the image features should be 'injected' into the RNN. This is in fact the dominant view in the literature. Alternatively, the RNN can instead be viewed as only encoding the previously generated words. This view suggests that the RNN should only be used to encode linguistic features and that only the final representation should be 'merged' with the image features at a later stage. This paper compares these two architectures. We find that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders, rather than generators.
Tasks Image Captioning
Published 2017-08-07
URL http://arxiv.org/abs/1708.02043v2
PDF http://arxiv.org/pdf/1708.02043v2.pdf
PWC https://paperswithcode.com/paper/what-is-the-role-of-recurrent-neural-networks
Repo https://github.com/mrityunjayojha10/Image-Caption-Generation
Framework none
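
The inject/merge contrast is easy to make concrete. Below is a minimal sketch in PyTorch with hypothetical layer sizes (the paper's experiments use their own configurations); the only difference that matters is where the image features enter:

```python
import torch
import torch.nn as nn

class InjectCaptioner(nn.Module):
    """'Inject': the image conditions the RNN itself (here via its initial state)."""
    def __init__(self, vocab=1000, emb=256, img=2048, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.init_h = nn.Linear(img, hid)              # image enters the RNN state
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, words, img_feat):                # words: (B, T), img_feat: (B, img)
        h0 = torch.tanh(self.init_h(img_feat)).unsqueeze(0)
        h, _ = self.rnn(self.embed(words), h0)
        return self.out(h)                             # next-word logits

class MergeCaptioner(nn.Module):
    """'Merge': the RNN only encodes the word prefix; the image joins afterwards."""
    def __init__(self, vocab=1000, emb=256, img=2048, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)  # never sees the image
        self.img_proj = nn.Linear(img, hid)
        self.out = nn.Linear(2 * hid, vocab)

    def forward(self, words, img_feat):
        h, _ = self.rnn(self.embed(words))             # purely linguistic encoding
        img = self.img_proj(img_feat).unsqueeze(1).expand_as(h)
        return self.out(torch.cat([h, img], dim=-1))   # late merge before the softmax
```

In the merge design the GRU never sees the image at all, which is exactly what lets the paper interpret the RNN as an encoder rather than a generator.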

Solving ill-posed inverse problems using iterative deep neural networks

Title Solving ill-posed inverse problems using iterative deep neural networks
Authors Jonas Adler, Ozan Öktem
Abstract We propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the “gradient” component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom and a head CT. The outcome is compared against FBP and TV reconstruction; the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.
Tasks
Published 2017-04-13
URL http://arxiv.org/abs/1704.04058v2
PDF http://arxiv.org/pdf/1704.04058v2.pdf
PWC https://paperswithcode.com/paper/solving-ill-posed-inverse-problems-using
Repo https://github.com/adler-j/learned_gradient_tomography
Framework tf
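
A hedged sketch of the gradient-like iterative scheme, assuming a generic differentiable forward operator `A` rather than the paper's tomography operator (the two-channel CNN and all sizes are illustrative):

```python
import torch
import torch.nn as nn

class LearnedUpdate(nn.Module):
    """One unrolled iteration: a CNN maps (current iterate, data gradient) to an update."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x, grad):
        return x + self.net(torch.cat([x, grad], dim=1))   # gradient-like step

def data_gradient(x, y, A):
    """Gradient of the data discrepancy 0.5 * ||A(x) - y||^2 with respect to x."""
    x = x.detach().requires_grad_(True)
    return torch.autograd.grad(0.5 * ((A(x) - y) ** 2).sum(), x)[0]

def reconstruct(y, A, step_nets, x0):
    """x_{k+1} = x_k + Lambda_theta_k(x_k, grad), one learned network per iteration."""
    x = x0
    for net in step_nets:
        x = net(x, data_gradient(x, y, A))
    return x
```

Training fits the `step_nets` end-to-end against ground-truth images, so the forward operator's physics stays in the loop while only the update rule is learned.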

The Reversible Residual Network: Backpropagation Without Storing Activations

Title The Reversible Residual Network: Backpropagation Without Storing Activations
Authors Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse
Abstract Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck, as one needs to store the activations in order to calculate gradients using backpropagation. We present the Reversible Residual Network (RevNet), a variant of ResNets where each layer’s activations can be reconstructed exactly from the next layer’s. Therefore, the activations for most layers need not be stored in memory during backpropagation. We demonstrate the effectiveness of RevNets on CIFAR-10, CIFAR-100, and ImageNet, establishing nearly identical classification accuracy to equally-sized ResNets, even though the activation storage requirements are independent of depth.
Tasks Image Classification
Published 2017-07-14
URL http://arxiv.org/abs/1707.04585v1
PDF http://arxiv.org/pdf/1707.04585v1.pdf
PWC https://paperswithcode.com/paper/the-reversible-residual-network
Repo https://github.com/renmengye/revnet-public
Framework tf
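
The reversible block admits a compact sketch. Assuming the standard channel-split form (the input is split in half along channels; F and G are arbitrary residual functions), the forward pass and its exact inverse are:

```python
import torch
import torch.nn as nn

class RevBlock(nn.Module):
    """Reversible residual block: activations are recomputable, not stored."""
    def __init__(self, F: nn.Module, G: nn.Module):
        super().__init__()
        self.F, self.G = F, G

    def forward(self, x):                       # x must have an even channel count
        x1, x2 = torch.chunk(x, 2, dim=1)
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):                       # exact reconstruction of the input
        y1, y2 = torch.chunk(y, 2, dim=1)
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return torch.cat([x1, x2], dim=1)
```

Since `block.inverse(block(x))` recovers `x` up to floating-point error, backpropagation can recompute each layer's activations from the next layer's instead of keeping them in memory.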

Variational Deep Q Network

Title Variational Deep Q Network
Authors Yunhao Tang, Alp Kucukelbir
Abstract We propose a framework that directly tackles the probability distribution of the value function parameters in the Deep Q Network (DQN), using powerful variational inference subroutines to approximate the posterior of the parameters. We establish the equivalence between our proposed surrogate objective and the variational inference loss. Our new algorithm achieves efficient exploration and performs well on large-scale chain Markov Decision Processes (MDPs).
Tasks Efficient Exploration
Published 2017-11-30
URL http://arxiv.org/abs/1711.11225v1
PDF http://arxiv.org/pdf/1711.11225v1.pdf
PWC https://paperswithcode.com/paper/variational-deep-q-network
Repo https://github.com/HarriBellThomas/VDQN
Framework none
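
A minimal sketch of the kind of variational layer such a network uses, assuming a fully factorized Gaussian posterior with the reparameterization trick (an illustration, not the paper's exact subroutine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer whose weights are sampled from a learned Gaussian posterior."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))  # softplus -> std

    def forward(self, x):
        std = F.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)   # fresh posterior sample per call
        return x @ w.t()

    def kl(self, prior_std=1.0):
        """KL(q || N(0, prior_std^2)) for the fully factorized Gaussian posterior."""
        std = F.softplus(self.rho)
        return (torch.log(prior_std / std)
                + (std ** 2 + self.mu ** 2) / (2 * prior_std ** 2) - 0.5).sum()
```

Stacking such layers gives a Q-network whose forward passes are posterior samples; acting greedily with respect to a sampled Q-function yields randomized exploration, while the KL terms join the TD loss in the variational objective.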

WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images

Title WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images
Authors Jie Li, Katherine A. Skinner, Ryan M. Eustice, Matthew Johnson-Roberson
Abstract This paper reports on WaterGAN, a generative adversarial network (GAN) for generating realistic underwater images from in-air image and depth pairings in an unsupervised pipeline used for color correction of monocular underwater images. Cameras onboard autonomous and remotely operated vehicles can capture high-resolution images to map the seafloor; however, underwater image formation is subject to the complex process of light propagation through the water column. The raw images retrieved are characteristically different from images taken in air due to effects such as absorption and scattering, which cause attenuation of light at different rates for different wavelengths. While this physical process is well described theoretically, the model depends on many parameters intrinsic to the water column as well as the objects in the scene. These factors make recovery of these parameters difficult without simplifying assumptions or field calibration; hence, restoration of underwater images is a non-trivial problem. Deep learning has demonstrated great success in modeling complex nonlinear systems but requires a large amount of training data, which is difficult to compile in deep-sea environments. Using WaterGAN, we generate a large training dataset of paired imagery, both raw underwater and true-color in-air, as well as depth data. This data serves as input to a novel end-to-end network for color correction of monocular underwater images. Due to the depth-dependent water column effects inherent to underwater environments, we show that our end-to-end network implicitly learns a coarse depth estimate of the underwater scene from monocular underwater images. Our proposed pipeline is validated with testing on real data collected from both a pure water tank and from underwater surveys in field testing. Source code is made publicly available with sample datasets and pretrained models.
Tasks Calibration
Published 2017-02-23
URL http://arxiv.org/abs/1702.07392v3
PDF http://arxiv.org/pdf/1702.07392v3.pdf
PWC https://paperswithcode.com/paper/watergan-unsupervised-generative-network-to
Repo https://github.com/kskin/WaterGAN
Framework tf
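
The wavelength-dependent attenuation the abstract describes can be illustrated with a toy Beer-Lambert falloff per RGB channel (the coefficients below are invented; WaterGAN learns the full effect adversarially rather than from a fixed formula):

```python
import numpy as np

def attenuate(in_air_rgb, depth_m, beta=(0.40, 0.10, 0.05)):
    """in_air_rgb: HxWx3 float image in [0, 1]; depth_m: HxW range map in meters.
    Red attenuates fastest underwater, hence the larger red coefficient."""
    beta = np.asarray(beta).reshape(1, 1, 3)
    return in_air_rgb * np.exp(-beta * depth_m[..., None])
```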

Bayesian Approaches to Distribution Regression

Title Bayesian Approaches to Distribution Regression
Authors Ho Chung Leon Law, Dougal J. Sutherland, Dino Sejdinovic, Seth Flaxman
Abstract Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.
Tasks
Published 2017-05-11
URL http://arxiv.org/abs/1705.04293v3
PDF http://arxiv.org/pdf/1705.04293v3.pdf
PWC https://paperswithcode.com/paper/bayesian-approaches-to-distribution
Repo https://github.com/hcllaw/bdr
Framework tf
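
The core point, that small groups carry more sampling uncertainty than large ones, can be illustrated with a simple sketch (not the paper's model; `feature_fn` is a hypothetical feature map):

```python
import numpy as np

def group_summary(group_samples, feature_fn):
    """Empirical mean embedding of a group and its standard error."""
    feats = np.stack([feature_fn(x) for x in group_samples])   # (n_i, d)
    mean = feats.mean(axis=0)
    stderr = feats.std(axis=0, ddof=1) / np.sqrt(len(feats))   # larger groups -> tighter
    return mean, stderr
```

A Bayesian regressor can then weight each group by this standard error instead of treating every group mean as equally reliable.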

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Title Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Authors Bora Edizel, Amin Mantrach, Xiao Bai
Abstract Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This inevitably requires a lot of engineering effort to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at character level and the other working at word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specifically, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce a click-through rate prediction as output. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system.
Tasks Calibration, Click-Through Rate Prediction
Published 2017-07-07
URL http://arxiv.org/abs/1707.02158v1
PDF http://arxiv.org/pdf/1707.02158v1.pdf
PWC https://paperswithcode.com/paper/deep-character-level-click-through-rate
Repo https://github.com/yrbahn/deep_match_ctr_prediction
Framework tf
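
A rough sketch of a character-level CNN for this task, with an invented alphabet size and layer widths (the paper's architecture differs in detail):

```python
import torch
import torch.nn as nn

class CharCTR(nn.Module):
    def __init__(self, n_chars=70, emb=16, ch=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.convs = nn.Sequential(
            nn.Conv1d(emb, ch, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(ch, ch, kernel_size=5), nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.head = nn.Linear(2 * ch, 1)

    def encode(self, chars):                     # chars: (B, T) character indices
        return self.convs(self.embed(chars).transpose(1, 2)).squeeze(-1)

    def forward(self, query_chars, ad_chars):
        z = torch.cat([self.encode(query_chars), self.encode(ad_chars)], dim=-1)
        return torch.sigmoid(self.head(z)).squeeze(-1)   # predicted CTR in (0, 1)
```

The only inputs are raw character indices of the query and the ad text, which is the sense in which the representation is learned "from scratch".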

Meta Networks for Neural Style Transfer

Title Meta Networks for Neural Style Transfer
Authors Falong Shen, Shuicheng Yan, Gang Zeng
Abstract In this paper we propose a new method to obtain the specified network parameters through a single feed-forward pass of a meta network, and we explore its application to neural style transfer. Recent works on style transfer typically need to train an image transformation network for every new style, with the style encoded in the network parameters by enormous iterations of stochastic gradient descent. To tackle these issues, we build a meta network which takes in the style image and produces a corresponding image transformation network directly. Compared with optimization-based methods for every style, our meta networks can handle an arbitrary new style within 19ms on one modern GPU card. The fast image transformation network generated by our meta network is only 449KB, which is capable of real-time execution on a mobile device. We also investigate the manifold of the style transfer networks by operating on the hidden features of the meta networks. Experiments have well validated the effectiveness of our method. Code and trained models have been released at https://github.com/FalongShen/styletransfer.
Tasks Style Transfer
Published 2017-09-13
URL http://arxiv.org/abs/1709.04111v1
PDF http://arxiv.org/pdf/1709.04111v1.pdf
PWC https://paperswithcode.com/paper/meta-networks-for-neural-style-transfer
Repo https://github.com/FalongShen/styletransfer
Framework pytorch
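
The key mechanism, a meta network emitting another network's parameters in one forward pass, can be sketched as follows (all sizes hypothetical; the real transformation network is much larger):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaConv(nn.Module):
    """A conv layer whose kernel is predicted from a style code, not trained per style."""
    def __init__(self, style_dim=128, c_in=3, c_out=3, k=3):
        super().__init__()
        self.shape = (c_out, c_in, k, k)
        self.predict = nn.Linear(style_dim, c_out * c_in * k * k)

    def forward(self, content, style_code):
        # style_code: (style_dim,) embedding of one style image
        w = self.predict(style_code).view(self.shape)   # weights in a single shot
        return F.conv2d(content, w, padding=1)
```

Only `predict` is trained; at test time a new style yields new conv weights with no per-style optimization.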

Snapshot Ensembles: Train 1, get M for free

Title Snapshot Ensembles: Train 1, get M for free
Authors Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
Abstract Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively.
Tasks
Published 2017-04-01
URL http://arxiv.org/abs/1704.00109v1
PDF http://arxiv.org/pdf/1704.00109v1.pdf
PWC https://paperswithcode.com/paper/snapshot-ensembles-train-1-get-m-for-free
Repo https://github.com/gaohuang/SnapshotEnsemble
Framework torch
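
A sketch of the snapshot-ensembling loop, assuming a cosine-annealed cyclic learning rate and hypothetical model/loader/hyperparameters:

```python
import copy
import math
import torch

def train_snapshots(model, loader, loss_fn, n_cycles=5, epochs_per_cycle=40, lr0=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr0, momentum=0.9)
    snapshots = []
    for cycle in range(n_cycles):
        for epoch in range(epochs_per_cycle):
            # cosine annealing within the cycle: lr0 -> ~0, then restart
            lr = 0.5 * lr0 * (1 + math.cos(math.pi * epoch / epochs_per_cycle))
            for g in opt.param_groups:
                g["lr"] = lr
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        snapshots.append(copy.deepcopy(model).eval())  # model at a local minimum
    return snapshots
```

At test time, the ensemble prediction is the average of the snapshots' softmax outputs, so the M members cost no more training than a single model.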

Distilling a Neural Network Into a Soft Decision Tree

Title Distilling a Neural Network Into a Soft Decision Tree
Authors Nicholas Frosst, Geoffrey Hinton
Abstract Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.
Tasks
Published 2017-11-27
URL http://arxiv.org/abs/1711.09784v1
PDF http://arxiv.org/pdf/1711.09784v1.pdf
PWC https://paperswithcode.com/paper/distilling-a-neural-network-into-a-soft
Repo https://github.com/aclarkData/MachineLearningInterpretability
Framework none
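
A minimal soft decision tree sketch (invented sizes): each inner node is a sigmoid gate over the input, each leaf a learned class distribution, and the output is the path-probability-weighted mixture of leaves. Distillation then trains this on the neural net's soft predictions rather than the raw labels:

```python
import torch
import torch.nn as nn

class SoftTree(nn.Module):
    def __init__(self, n_in=784, n_classes=10, depth=4):
        super().__init__()
        self.n_inner = 2 ** depth - 1
        self.gates = nn.Linear(n_in, self.n_inner)           # one gate per inner node
        self.leaves = nn.Parameter(torch.zeros(2 ** depth, n_classes))

    def forward(self, x):
        p = torch.sigmoid(self.gates(x))                     # (B, n_inner) go-right probs
        path = x.new_ones(x.shape[0], 1)                     # start at the root
        idx = 0
        while idx < self.n_inner:
            n = path.shape[1]                                # nodes at this level
            pr = p[:, idx:idx + n]
            # interleave left/right child path probabilities
            path = torch.stack([path * (1 - pr), path * pr], 2).reshape(-1, 2 * n)
            idx += n
        return path @ torch.softmax(self.leaves, dim=-1)     # mixture of leaf dists
```

Because every decision is a linear gate on the input, the path to any leaf reads as an explicit sequence of interpretable tests.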

Noisy Natural Gradient as Variational Inference

Title Noisy Natural Gradient as Variational Inference
Authors Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse
Abstract Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g., fully factorized) and expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo’s predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient exploration in active learning, and intrinsic motivation for reinforcement learning.
Tasks Active Learning, Efficient Exploration
Published 2017-12-06
URL http://arxiv.org/abs/1712.02390v2
PDF http://arxiv.org/pdf/1712.02390v2.pdf
PWC https://paperswithcode.com/paper/noisy-natural-gradient-as-variational
Repo https://github.com/gd-zhang/noisy-K-FAC
Framework tf
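
A cartoon of the idea, loosely after the paper's noisy-Adam variant with many details simplified (treat this as illustration only): weights are sampled each step from a Gaussian whose precision tracks a running Fisher estimate, so the adaptive-gradient noise itself realizes the variational posterior.

```python
import torch

def noisy_step(mu, f, grad_fn, lr=1e-3, beta=0.999, prior_prec=1.0, n_data=50000):
    """mu: posterior mean tensor; f: running Fisher estimate (same shape);
    grad_fn(w): minibatch gradient of the loss evaluated at sampled weights w."""
    prec = n_data * f + prior_prec                   # posterior precision estimate
    w = mu + torch.randn_like(mu) / prec.sqrt()      # sample weights ~ q
    g = grad_fn(w)
    f.mul_(beta).add_((1 - beta) * g * g)            # exponential Fisher average
    mu.sub_(lr * (g + prior_prec * w / n_data) / (f + prior_prec / n_data))
    return mu, f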

Vector Embedding of Wikipedia Concepts and Entities

Title Vector Embedding of Wikipedia Concepts and Entities
Authors Ehsan Sherkat, Evangelos Milios
Abstract Using deep learning for machine learning tasks such as image classification and word embedding has recently gained much attention. Its appealing performance across specific Natural Language Processing (NLP) tasks, in comparison with other approaches, is the reason for its popularity. Word embedding is the task of mapping words or phrases to a low-dimensional numerical vector. In this paper, we use deep learning to embed Wikipedia concepts and entities. The English version of Wikipedia contains more than five million pages, which suggests its capacity to cover many English entities, phrases, and concepts. Each Wikipedia page is considered a concept. Some concepts correspond to entities, such as a person’s name, an organization, or a place. Contrary to word embedding, Wikipedia concept embedding is not ambiguous, so concepts with a similar surface form but different meanings receive different vectors. We propose several approaches and evaluate their performance on Concept Analogy and Concept Similarity tasks. The results show that the proposed approaches perform comparably to, and in some cases better than, state-of-the-art methods.
Tasks Word Embeddings
Published 2017-02-12
URL http://arxiv.org/abs/1702.03470v1
PDF http://arxiv.org/pdf/1702.03470v1.pdf
PWC https://paperswithcode.com/paper/vector-embedding-of-wikipedia-concepts-and
Repo https://github.com/ehsansherkat/ConVec
Framework none
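
The general recipe can be sketched with gensim (the corpus format below is hypothetical): replace linked mentions with page identifiers so each Wikipedia page becomes a single token, giving one unambiguous vector per concept:

```python
from gensim.models import Word2Vec

# Sentences in which anchor-linked mentions were replaced by page identifiers:
corpus = [
    ["CONCEPT/Deep_learning", "methods", "excel", "at", "CONCEPT/Image_classification"],
    ["CONCEPT/Word_embedding", "maps", "words", "to", "vectors"],
]
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
vec = model.wv["CONCEPT/Deep_learning"]   # one vector per concept, no ambiguity
```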

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

Title Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
Authors William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans
Abstract AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human “in the loop” and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.
Tasks Atari Games
Published 2017-07-17
URL http://arxiv.org/abs/1707.05173v1
PDF http://arxiv.org/pdf/1707.05173v1.pdf
PWC https://paperswithcode.com/paper/trial-without-error-towards-safe
Repo https://github.com/gsastry/human-rl
Framework none
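
The human-imitating "blocker" reduces to a simple wrapper around the environment step (all names below are hypothetical):

```python
def safe_step(env, agent, blocker, state, fallback_action):
    """Screen the agent's action with a classifier trained to imitate the
    human's (state, action) -> block decisions before it reaches the env."""
    action = agent.act(state)
    if blocker.predicts_catastrophe(state, action):
        action = fallback_action   # e.g., a known-safe action the human would allow
    return env.step(action)
```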

Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor

Title Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Authors Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel
Abstract Using supporting backchannel (BC) cues can make human-computer interaction more social. BCs provide feedback from the listener to the speaker, indicating that the speaker is still being listened to. BCs can be expressed in different ways depending on the modality of the interaction, for example as gestures or acoustic cues. In this work, we consider only acoustic cues. We propose an approach to detecting BC opportunities based on acoustic input features such as power and pitch. While other works in the field rely on hand-written rule sets or specialized features, we use artificial neural networks, which can derive higher-order features from the input features themselves. In our setup, we first used a fully connected feed-forward network to establish an updated baseline in comparison to our previously proposed setup. We then extended this setup with Long Short-Term Memory (LSTM) networks, which have been shown to outperform feed-forward setups on various tasks. Our best system achieved an F1-Score of 0.37 using power and pitch features; adding linguistic information via word2vec increased the score to 0.39.
Tasks
Published 2017-06-02
URL http://arxiv.org/abs/1706.01340v1
PDF http://arxiv.org/pdf/1706.01340v1.pdf
PWC https://paperswithcode.com/paper/yeah-right-uh-huh-a-deep-learning-backchannel
Repo https://github.com/phiresky/backchannel-prediction
Framework none
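
A rough sketch (hypothetical sizes) of the LSTM variant: frame-level acoustic features such as power and pitch go in, and the network emits the probability of a backchannel opportunity at the current frame:

```python
import torch
import torch.nn as nn

class BCDetector(nn.Module):
    def __init__(self, n_feat=2, hid=64):       # e.g., (power, pitch) per frame
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hid, batch_first=True)
        self.out = nn.Linear(hid, 1)

    def forward(self, frames):                  # frames: (B, T, n_feat)
        h, _ = self.lstm(frames)
        return torch.sigmoid(self.out(h[:, -1]))   # P(backchannel opportunity now)
```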

Learning Spread-out Local Feature Descriptors

Title Learning Spread-out Local Feature Descriptors
Authors Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang
Abstract We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors. The idea is that in order to fully utilize the expressive power of the descriptor space, good local feature descriptors should be sufficiently “spread-out” over the space. In this work, we propose a regularization term that maximizes the spread of the feature descriptors, inspired by the properties of the uniform distribution. We show that the proposed regularization with triplet loss outperforms existing Euclidean-distance-based descriptor learning techniques by a large margin. As an extension, the proposed regularization technique can also be used to improve image-level deep feature embedding.
Tasks
Published 2017-08-21
URL http://arxiv.org/abs/1708.06320v1
PDF http://arxiv.org/pdf/1708.06320v1.pdf
PWC https://paperswithcode.com/paper/learning-spread-out-local-feature-descriptors
Repo https://github.com/ColumbiaDVMM/Spread-out_Local_Feature_Descriptor
Framework tf
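
A sketch of a spread-out regularizer in the spirit of the paper: for non-matching descriptor pairs, push the mean cosine similarity toward 0 and its second moment toward 1/d, the values attained by vectors drawn uniformly from the unit sphere:

```python
import torch

def spread_out_reg(desc_a, desc_b):
    """desc_a, desc_b: (N, d) L2-normalized descriptors of NON-matching pairs."""
    d = desc_a.shape[1]
    sim = (desc_a * desc_b).sum(dim=1)              # cosine similarities
    m1 = sim.mean()
    m2 = (sim ** 2).mean()
    return m1 ** 2 + torch.clamp(m2 - 1.0 / d, min=0.0)
```

This term is simply added to the pairwise or triplet loss, penalizing descriptors that bunch together instead of spreading over the sphere.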