Paper Group AWR 197
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Title | What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? |
Authors | Marc Tanti, Albert Gatt, Kenneth P. Camilleri |
Abstract | In neural image captioning systems, a recurrent neural network (RNN) is typically viewed as the primary 'generation' component. This view suggests that the image features should be 'injected' into the RNN. This is in fact the dominant view in the literature. Alternatively, the RNN can instead be viewed as only encoding the previously generated words. This view suggests that the RNN should only be used to encode linguistic features and that only the final representation should be 'merged' with the image features at a later stage. This paper compares these two architectures. We find that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders, rather than generators. |
Tasks | Image Captioning |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02043v2 |
PDF | http://arxiv.org/pdf/1708.02043v2.pdf |
PWC | https://paperswithcode.com/paper/what-is-the-role-of-recurrent-neural-networks |
Repo | https://github.com/mrityunjayojha10/Image-Caption-Generation |
Framework | none |
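As a rough illustration of the two architectures being compared, here is a minimal PyTorch sketch (not the authors' code; the layer sizes and the GRU choice are arbitrary): the merge model keeps the image out of the RNN entirely, while the inject model seeds the RNN state with it.

```python
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    """Merge architecture: the RNN encodes only the word prefix; image
    features join its final state in a later feed-forward layer."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, img_feats, prefix_tokens):
        _, h = self.rnn(self.embed(prefix_tokens))      # encode words only
        merged = torch.cat([h[-1], self.img_proj(img_feats)], dim=-1)
        return self.out(merged)                         # next-word logits

class InjectCaptioner(nn.Module):
    """Init-inject architecture: image features set the RNN's initial state,
    so the RNN mixes visual and linguistic information while generating."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, prefix_tokens):
        h0 = self.img_proj(img_feats).unsqueeze(0)      # image as initial state
        _, h = self.rnn(self.embed(prefix_tokens), h0)
        return self.out(h[-1])
```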
Solving ill-posed inverse problems using iterative deep neural networks
Title | Solving ill-posed inverse problems using iterative deep neural networks |
Authors | Jonas Adler, Ozan Öktem |
Abstract | We propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the “gradient” component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom and a head CT. The outcome is compared against FBP and TV reconstruction; the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU. |
Tasks | |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04058v2 |
PDF | http://arxiv.org/pdf/1704.04058v2.pdf |
PWC | https://paperswithcode.com/paper/solving-ill-posed-inverse-problems-using |
Repo | https://github.com/adler-j/learned_gradient_tomography |
Framework | tf |
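The iterative scheme lends itself to a short sketch. Below is a hedged PyTorch rendering, not the authors' implementation: `forward_op`, `adjoint_op`, and `grad_reg` are assumed hooks for the operator T, its adjoint, and the gradient of the regularizing functional.

```python
import torch
import torch.nn as nn

class LearnedGradientStep(nn.Module):
    """One iteration: a small CNN maps (current iterate, gradient of the
    data discrepancy, gradient of the regularizer) to an additive update."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x, grad_data, grad_reg):
        return x + self.net(torch.cat([x, grad_data, grad_reg], dim=1))

def reconstruct(y, forward_op, adjoint_op, grad_reg, steps):
    """Unrolled partially learned gradient scheme (operator hooks assumed).
    steps is a list of LearnedGradientStep modules, one per iteration."""
    x = adjoint_op(y)                           # crude initial reconstruction
    for step in steps:
        g_data = adjoint_op(forward_op(x) - y)  # grad of 0.5 * ||T(x) - y||^2
        x = step(x, g_data, grad_reg(x))
    return x
```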
The Reversible Residual Network: Backpropagation Without Storing Activations
Title | The Reversible Residual Network: Backpropagation Without Storing Activations |
Authors | Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse |
Abstract | Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck, as one needs to store the activations in order to calculate gradients using backpropagation. We present the Reversible Residual Network (RevNet), a variant of ResNets where each layer’s activations can be reconstructed exactly from the next layer’s. Therefore, the activations for most layers need not be stored in memory during backpropagation. We demonstrate the effectiveness of RevNets on CIFAR-10, CIFAR-100, and ImageNet, establishing nearly identical classification accuracy to equally-sized ResNets, even though the activation storage requirements are independent of depth. |
Tasks | Image Classification |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04585v1 |
PDF | http://arxiv.org/pdf/1707.04585v1.pdf |
PWC | https://paperswithcode.com/paper/the-reversible-residual-network |
Repo | https://github.com/renmengye/revnet-public |
Framework | tf |
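The coupling at the heart of RevNet can be stated in a few lines. A minimal PyTorch sketch follows; the actual memory saving additionally requires recomputing activations inside a custom backward pass, which is omitted here.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """y1 = x1 + F(x2), y2 = x2 + G(y1). The inputs can be reconstructed
    exactly from the outputs, so activations need not be stored."""
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)     # invert the second coupling
        x1 = y1 - self.f(x2)     # then the first
        return x1, x2

# Sanity check: the inverse recovers the inputs up to float error.
f = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
g = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
block = ReversibleBlock(f, g)
x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
assert torch.allclose(r1, x1, atol=1e-5) and torch.allclose(r2, x2, atol=1e-5)
```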
Variational Deep Q Network
Title | Variational Deep Q Network |
Authors | Yunhao Tang, Alp Kucukelbir |
Abstract | We propose a framework that directly tackles the probability distribution of the value function parameters in Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters. We establish the equivalence between our proposed surrogate objective and the variational inference loss. Our new algorithm achieves efficient exploration and performs well on large-scale chain Markov Decision Processes (MDPs). |
Tasks | Efficient Exploration |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11225v1 |
PDF | http://arxiv.org/pdf/1711.11225v1.pdf |
PWC | https://paperswithcode.com/paper/variational-deep-q-network |
Repo | https://github.com/HarriBellThomas/VDQN |
Framework | none |
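A hedged sketch of the ingredients, assuming a mean-field Gaussian posterior with a standard-normal prior; the paper's exact objective and hyperparameters may differ, and `kl_scale` is an assumed knob.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with a mean-field Gaussian posterior over its weights,
    sampled by reparameterization on every forward pass."""
    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))  # softplus(rho) = std
        self.prior_std = prior_std

    def forward(self, x):
        std = F.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)   # one posterior sample
        return x @ w.t()

    def kl(self):
        """KL(q(w) || N(0, prior_std^2)), summed over weights."""
        std = F.softplus(self.rho)
        var_ratio = (std / self.prior_std) ** 2
        return 0.5 * (var_ratio + (self.mu / self.prior_std) ** 2
                      - 1 - var_ratio.log()).sum()

def vdqn_loss(q_net, target_net, s, a, r, s2, done, gamma=0.99, kl_scale=1e-3):
    """TD loss plus a KL penalty: a surrogate of the kind the paper relates
    to the variational inference objective."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s2).max(1).values
    kl = sum(m.kl() for m in q_net.modules() if isinstance(m, BayesLinear))
    return F.mse_loss(q, target) + kl_scale * kl

# A toy Q network over 4-dim states and 2 actions:
q_net = nn.Sequential(BayesLinear(4, 64), nn.ReLU(), BayesLinear(64, 2))
```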
WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images
Title | WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images |
Authors | Jie Li, Katherine A. Skinner, Ryan M. Eustice, Matthew Johnson-Roberson |
Abstract | This paper reports on WaterGAN, a generative adversarial network (GAN) for generating realistic underwater images from in-air image and depth pairings in an unsupervised pipeline used for color correction of monocular underwater images. Cameras onboard autonomous and remotely operated vehicles can capture high-resolution images to map the seafloor; however, underwater image formation is subject to the complex process of light propagation through the water column. The raw images retrieved are characteristically different from images taken in air due to effects such as absorption and scattering, which cause attenuation of light at different rates for different wavelengths. While this physical process is well described theoretically, the model depends on many parameters intrinsic to the water column as well as the objects in the scene. These factors make recovery of these parameters difficult without simplifying assumptions or field calibration; hence, restoration of underwater images is a non-trivial problem. Deep learning has demonstrated great success in modeling complex nonlinear systems but requires a large amount of training data, which is difficult to compile in deep sea environments. Using WaterGAN, we generate a large training dataset of paired imagery, both raw underwater and true color in-air, as well as depth data. This data serves as input to a novel end-to-end network for color correction of monocular underwater images. Due to the depth-dependent water column effects inherent to underwater environments, we show that our end-to-end network implicitly learns a coarse depth estimate of the underwater scene from monocular underwater images. Our proposed pipeline is validated by testing on real data collected both in a pure water tank and in underwater field surveys. Source code is made publicly available with sample datasets and pretrained models. |
Tasks | Calibration |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07392v3 |
PDF | http://arxiv.org/pdf/1702.07392v3.pdf |
PWC | https://paperswithcode.com/paper/watergan-unsupervised-generative-network-to |
Repo | https://github.com/kskin/WaterGAN |
Framework | tf |
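A very loose sketch of the generative side only, assuming a simple Beer-Lambert attenuation model plus a learned residual. The actual WaterGAN generator is a multi-stage network trained adversarially; the attenuation coefficients and depth scale here are invented.

```python
import torch
import torch.nn as nn

class DepthAttenuationGenerator(nn.Module):
    """Synthesize an underwater image from an in-air RGB image and its depth
    map: learned wavelength-dependent attenuation (red decays fastest) plus
    a small CNN adding residual water-column effects."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor([0.40, 0.08, 0.06]))  # per-channel
        self.residual = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, rgb, depth):
        # Direct signal attenuated along the water column: I * exp(-beta * d)
        direct = rgb * torch.exp(-self.beta.view(1, 3, 1, 1) * depth)
        res = self.residual(torch.cat([rgb, depth], dim=1))
        return (direct + 0.1 * res).clamp(0, 1)

rgb = torch.rand(2, 3, 64, 64)           # in-air image batch
depth = torch.rand(2, 1, 64, 64) * 5.0   # depth in meters (made-up scale)
fake_underwater = DepthAttenuationGenerator()(rgb, depth)
```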
Bayesian Approaches to Distribution Regression
Title | Bayesian Approaches to Distribution Regression |
Authors | Ho Chung Leon Law, Dougal J. Sutherland, Dino Sejdinovic, Seth Flaxman |
Abstract | Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images. |
Tasks | |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04293v3 |
PDF | http://arxiv.org/pdf/1705.04293v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-approaches-to-distribution |
Repo | https://github.com/hcllaw/bdr |
Framework | tf |
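A toy numpy sketch of the core idea under strong simplifications: bags are embedded by random-Fourier-feature mean embeddings, and a heteroscedastic noise term (larger for smaller bags) carries the sampling uncertainty into a MAP linear regression. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(x, w, b):
    """Random Fourier features approximating an RBF kernel embedding."""
    return np.sqrt(2.0 / w.shape[1]) * np.cos(x @ w + b)

# Toy bags: each group is a sample set carrying one group-level label.
d, D = 2, 100                            # input dim, number of random features
w = rng.normal(size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
bags = [rng.normal(size=(n, d)) for n in (5, 200, 50)]
y = np.array([0.3, 1.2, 0.7])            # made-up group labels

# Mean embedding per bag, plus a per-bag noise variance that grows for small
# bags: the uncertainty-propagation idea in its crudest form.
Phi = np.stack([rff(B, w, b).mean(axis=0) for B in bags])
noise = 0.1 + 1.0 / np.array([len(B) for B in bags])

# MAP weights of Bayesian linear regression with heteroscedastic noise:
# beta = (Phi^T S^-1 Phi + lam I)^-1 Phi^T S^-1 y
lam = 1e-2
A = Phi.T @ (Phi / noise[:, None]) + lam * np.eye(D)
beta = np.linalg.solve(A, Phi.T @ (y / noise))
print(Phi @ beta)                        # fitted group labels
```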
Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Title | Deep Character-Level Click-Through Rate Prediction for Sponsored Search |
Authors | Bora Edizel, Amin Mantrach, Xiao Bai |
Abstract | Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This inevitably requires a lot of engineering effort to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at character level and the other working at word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specifically, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce as output a click-through rate prediction. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system. |
Tasks | Calibration, Click-Through Rate Prediction |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02158v1 |
PDF | http://arxiv.org/pdf/1707.02158v1.pdf |
PWC | https://paperswithcode.com/paper/deep-character-level-click-through-rate |
Repo | https://github.com/yrbahn/deep_match_ctr_prediction |
Framework | tf |
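A minimal PyTorch sketch of a character-level architecture in this spirit, not the paper's exact model: it shares one encoder between query and ad and uses invented sizes.

```python
import torch
import torch.nn as nn

class CharCTRNet(nn.Module):
    """Query and ad text are one-hot encoded at character level, convolved
    and pooled, then combined into a click probability."""
    def __init__(self, n_chars=70, channels=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_chars, channels, kernel_size=7), nn.ReLU(),
            nn.MaxPool1d(3),
            nn.Conv1d(channels, channels, kernel_size=5), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.head = nn.Sequential(
            nn.Linear(2 * channels, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, query_onehot, ad_onehot):
        q = self.encoder(query_onehot).squeeze(-1)   # (batch, channels)
        a = self.encoder(ad_onehot).squeeze(-1)
        return torch.sigmoid(self.head(torch.cat([q, a], dim=1)))

net = CharCTRNet()
q = torch.zeros(4, 70, 128)    # dummy one-hot batches, 128 chars each
ad = torch.zeros(4, 70, 128)
print(net(q, ad).shape)        # torch.Size([4, 1]), predicted CTRs
```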
Meta Networks for Neural Style Transfer
Title | Meta Networks for Neural Style Transfer |
Authors | Falong Shen, Shuicheng Yan, Gang Zeng |
Abstract | In this paper we propose a new method to obtain the specified network parameters through a single feed-forward propagation of a meta network, and we explore its application to neural style transfer. Recent works on style transfer typically need to train an image transformation network for every new style, with the style encoded in the network parameters through enormous iterations of stochastic gradient descent. To tackle these issues, we build a meta network which takes in the style image and directly produces a corresponding image transformation network. Compared with optimization-based methods for every style, our meta networks can handle an arbitrary new style within 19 ms on one modern GPU card. The fast image transformation network generated by our meta network is only 449 KB, small enough to execute in real time on a mobile device. We also investigate the manifold of the style transfer networks by operating on the hidden features of the meta networks. Experiments have validated the effectiveness of our method. Code and trained models have been released at https://github.com/FalongShen/styletransfer. |
Tasks | Style Transfer |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04111v1 |
PDF | http://arxiv.org/pdf/1709.04111v1.pdf |
PWC | https://paperswithcode.com/paper/meta-networks-for-neural-style-transfer |
Repo | https://github.com/FalongShen/styletransfer |
Framework | pytorch |
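A hedged PyTorch sketch of the hypernetwork idea: a single linear layer maps a style code to all the weights of a two-layer transformation network, which is then applied functionally. Sizes are toy; the paper's networks are far larger.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaNet(nn.Module):
    """One feed-forward pass maps a style encoding to the filter weights of
    a tiny image transformation network."""
    def __init__(self, style_dim=256, channels=16):
        super().__init__()
        self.c = channels
        self.n_w1 = channels * 3 * 3 * 3      # first conv: 3 -> channels
        self.n_w2 = 3 * channels * 3 * 3      # second conv: channels -> 3
        self.fc = nn.Linear(style_dim, self.n_w1 + channels + self.n_w2 + 3)

    def forward(self, style_code, content):
        p = self.fc(style_code)               # all generated weights at once
        i = 0
        w1 = p[i:i + self.n_w1].view(self.c, 3, 3, 3); i += self.n_w1
        b1 = p[i:i + self.c];                          i += self.c
        w2 = p[i:i + self.n_w2].view(3, self.c, 3, 3); i += self.n_w2
        b2 = p[i:i + 3]
        h = F.relu(F.conv2d(content, w1, b1, padding=1))
        return torch.tanh(F.conv2d(h, w2, b2, padding=1))

meta = MetaNet()
style_code = torch.randn(256)      # e.g. pooled VGG features of a style image
content = torch.rand(1, 3, 64, 64)
stylized = meta(style_code, content)  # a new style, no per-style retraining
```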
Snapshot Ensembles: Train 1, get M for free
Title | Snapshot Ensembles: Train 1, get M for free |
Authors | Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger |
Abstract | Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively. |
Tasks | |
Published | 2017-04-01 |
URL | http://arxiv.org/abs/1704.00109v1 |
PDF | http://arxiv.org/pdf/1704.00109v1.pdf |
PWC | https://paperswithcode.com/paper/snapshot-ensembles-train-1-get-m-for-free |
Repo | https://github.com/gaohuang/SnapshotEnsemble |
Framework | torch |
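A sketch of the training loop, assuming `loader` is an iterator yielding `(x, y)` batches: the schedule is the cyclic cosine annealing the paper leverages, and a snapshot is saved at the end of each cycle.

```python
import math
import torch

def snapshot_lr(step, steps_per_cycle, lr_max=0.1):
    """Cosine-annealed cyclic rate: restarts at lr_max at the start of every
    cycle and decays toward 0 just before each snapshot."""
    t = (step % steps_per_cycle) / steps_per_cycle
    return lr_max / 2 * (math.cos(math.pi * t) + 1)

def train_snapshots(model, loader, loss_fn, n_cycles=5, steps_per_cycle=1000):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    snapshots, step = [], 0
    for _ in range(n_cycles):
        for _ in range(steps_per_cycle):
            x, y = next(loader)
            for g in opt.param_groups:
                g["lr"] = snapshot_lr(step, steps_per_cycle)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            step += 1
        # Converged into a local minimum: save a snapshot; the next cycle's
        # high restart rate then kicks the model out of it.
        snapshots.append({k: v.clone() for k, v in model.state_dict().items()})
    return snapshots

def ensemble_predict(model, snapshots, x):
    """Average the softmax outputs of the M saved snapshots."""
    probs = []
    for state in snapshots:
        model.load_state_dict(state)
        probs.append(torch.softmax(model(x), dim=1))
    return torch.stack(probs).mean(0)
```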
Distilling a Neural Network Into a Soft Decision Tree
Title | Distilling a Neural Network Into a Soft Decision Tree |
Authors | Nicholas Frosst, Geoffrey Hinton |
Abstract | Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data. |
Tasks | |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09784v1 |
PDF | http://arxiv.org/pdf/1711.09784v1.pdf |
PWC | https://paperswithcode.com/paper/distilling-a-neural-network-into-a-soft |
Repo | https://github.com/aclarkData/MachineLearningInterpretability |
Framework | none |
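A compact PyTorch sketch of a soft decision tree trained on a teacher's soft targets. Depth, sizes, and the dummy distillation data are illustrative, and the paper's leaf-balancing penalty is omitted.

```python
import torch
import torch.nn as nn

class SoftDecisionTree(nn.Module):
    """Each inner node routes the input left/right with a learned sigmoid
    gate; each leaf holds a learned class distribution. The prediction is
    the path-probability-weighted mixture over the leaves."""
    def __init__(self, n_features, n_classes, depth=4):
        super().__init__()
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.gates = nn.Linear(n_features, n_inner)  # one linear filter per node
        self.leaves = nn.Parameter(torch.zeros(n_leaves, n_classes))
        self.depth = depth

    def forward(self, x):
        p = torch.sigmoid(self.gates(x))             # (batch, n_inner)
        path = torch.ones(x.shape[0], 1, device=x.device)
        node = 0
        for d in range(self.depth):
            width = 2 ** d
            gate = p[:, node:node + width]           # this level's gates
            path = torch.cat([path * gate, path * (1 - gate)], dim=1)
            node += width
        return path @ torch.softmax(self.leaves, dim=1)

# Distillation: fit the tree to a trained net's soft predictions
# (x and teacher_probs are dummy stand-ins for real data and a real teacher).
tree = SoftDecisionTree(n_features=784, n_classes=10)
opt = torch.optim.Adam(tree.parameters(), lr=1e-3)
x = torch.rand(64, 784)
teacher_probs = torch.softmax(torch.randn(64, 10), dim=1)
loss = -(teacher_probs * (tree(x) + 1e-8).log()).sum(dim=1).mean()
loss.backward()
opt.step()
```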
Noisy Natural Gradient as Variational Inference
Title | Noisy Natural Gradient as Variational Inference |
Authors | Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse |
Abstract | Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g., fully factorized) and expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo’s predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient exploration in active learning and intrinsic motivation for reinforcement learning. |
Tasks | Active Learning, Efficient Exploration |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02390v2 |
PDF | http://arxiv.org/pdf/1712.02390v2.pdf |
PWC | https://paperswithcode.com/paper/noisy-natural-gradient-as-variational |
Repo | https://github.com/gd-zhang/noisy-K-FAC |
Framework | tf |
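A loose sketch of the adaptive-weight-noise flavor only; this is not the paper's exact noisy Adam or noisy K-FAC update. The idea shown: sample weights from a Gaussian whose variance tracks a running curvature estimate, then step the mean.

```python
import torch

def noisy_step(mu, f, grad_fn, n_data, lr=1e-3, beta2=0.99, prior_prec=1.0):
    """One update of the mean mu and second-moment estimate f of a factorized
    Gaussian over the weights; grad_fn(w) is an assumed hook returning the
    minibatch gradient evaluated at the sampled weights w."""
    var = 1.0 / (n_data * f + prior_prec)          # per-parameter variance
    w = mu + var.sqrt() * torch.randn_like(mu)     # reparameterized sample
    g = grad_fn(w)
    f.mul_(beta2).add_((1 - beta2) * g * g)        # refresh curvature estimate
    mu.sub_(lr * (g + (prior_prec / n_data) * w))  # step with weight-decay term
    return mu, f
```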
Vector Embedding of Wikipedia Concepts and Entities
Title | Vector Embedding of Wikipedia Concepts and Entities |
Authors | Ehsan Sherkat, Evangelos Milios |
Abstract | Using deep learning for machine learning tasks such as image classification and word embedding has recently gained much attention. Its appealing performance across specific Natural Language Processing (NLP) tasks, in comparison with other approaches, is the reason for its popularity. Word embedding is the task of mapping words or phrases to a low-dimensional numerical vector. In this paper, we use deep learning to embed Wikipedia Concepts and Entities. The English version of Wikipedia contains more than five million pages, which suggests its capacity to cover many English entities, phrases, and concepts. Each Wikipedia page is considered a concept. Some concepts correspond to entities, such as a person’s name, an organization or a place. Contrary to word embeddings, Wikipedia concept embeddings are not ambiguous, so there are different vectors for concepts with similar surface forms but different meanings. We propose several approaches and evaluate their performance on Concept Analogy and Concept Similarity tasks. The results show that the proposed approaches perform comparably to, and in some cases better than, state-of-the-art methods. |
Tasks | Word Embeddings |
Published | 2017-02-12 |
URL | http://arxiv.org/abs/1702.03470v1 |
PDF | http://arxiv.org/pdf/1702.03470v1.pdf |
PWC | https://paperswithcode.com/paper/vector-embedding-of-wikipedia-concepts-and |
Repo | https://github.com/ehsansherkat/ConVec |
Framework | none |
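The pipeline amounts to replacing anchor links with unambiguous concept tokens and running a standard skip-gram model over the result. A gensim 4.x sketch with an invented two-sentence corpus:

```python
from gensim.models import Word2Vec

# Sentences where Wikipedia anchor links have been replaced by unambiguous
# concept IDs (invented here for illustration): one token per concept, so a
# concept gets a single vector regardless of its surface form.
sentences = [
    ["CONCEPT/Apple_Inc.", "was", "founded", "by", "CONCEPT/Steve_Jobs"],
    ["CONCEPT/Apple", "is", "a", "fruit", "of", "the", "CONCEPT/Malus", "tree"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# "Apple Inc." and the fruit "Apple" now have distinct, unambiguous vectors.
v_company = model.wv["CONCEPT/Apple_Inc."]
v_fruit = model.wv["CONCEPT/Apple"]
```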
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
Title | Trial without Error: Towards Safe Reinforcement Learning via Human Intervention |
Authors | William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans |
Abstract | AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human “in the loop” and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe. |
Tasks | Atari Games |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.05173v1 |
PDF | http://arxiv.org/pdf/1707.05173v1.pdf |
PWC | https://paperswithcode.com/paper/trial-without-error-towards-safe |
Repo | https://github.com/gsastry/human-rl |
Framework | none |
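A minimal sketch of the blocker idea, with illustrative `env`/`agent` interfaces (gym-style `step`, integer actions); none of this is the authors' code.

```python
import torch
import torch.nn as nn

class Blocker(nn.Module):
    """Supervised imitator of the human overseer: given (observation,
    proposed action), predict whether the human would block the action
    as catastrophic."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, 1))

    def would_block(self, obs, action_onehot, threshold=0.5):
        logit = self.net(torch.cat([obs, action_onehot], dim=-1))
        return torch.sigmoid(logit).item() > threshold

def safe_step(env, agent, blocker, obs, safe_action, penalty=-1.0):
    """If the blocker flags the agent's chosen action, substitute a safe
    fallback and hand back a penalty, so the agent learns to avoid the
    catastrophe without ever experiencing it."""
    action = agent.act(obs)
    a_onehot = torch.eye(env.n_actions)[action]
    if blocker.would_block(obs.unsqueeze(0), a_onehot.unsqueeze(0)):
        next_obs, _, done, info = env.step(safe_action)
        return next_obs, penalty, done, info
    return env.step(action)
```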
Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Title | Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor |
Authors | Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel |
Abstract | Using supporting backchannel (BC) cues can make human-computer interaction more social. BCs provide feedback from the listener to the speaker, indicating that the speaker is still being listened to. BCs can be expressed in different ways, depending on the modality of the interaction, for example as gestures or acoustic cues. In this work, we consider only acoustic cues. We propose an approach to detecting BC opportunities based on acoustic input features like power and pitch. While other works in the field rely on hand-written rule sets or specialized features, we use artificial neural networks, which are capable of deriving higher-order features from the input features themselves. In our setup, we first used a fully connected feed-forward network to establish an updated baseline in comparison to our previously proposed setup. We then extended this setup with Long Short-Term Memory (LSTM) networks, which have been shown to outperform feed-forward setups on various tasks. Our best system achieved an F1-Score of 0.37 using power and pitch features. Adding linguistic information using word2vec, the score increased to 0.39. |
Tasks | |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.01340v1 |
PDF | http://arxiv.org/pdf/1706.01340v1.pdf |
PWC | https://paperswithcode.com/paper/yeah-right-uh-huh-a-deep-learning-backchannel |
Repo | https://github.com/phiresky/backchannel-prediction |
Framework | none |
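A minimal PyTorch sketch of the LSTM setup: frames of (power, pitch) in, a backchannel-opportunity probability out. The frame rate, window length, and feature values are invented.

```python
import torch
import torch.nn as nn

class BackchannelLSTM(nn.Module):
    """LSTM over frames of acoustic features (power, pitch) from the
    speaker's recent audio, predicting whether the next moment is a
    backchannel opportunity ("yeah", "uh-huh", ...)."""
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):                 # (batch, time, n_features)
        out, _ = self.lstm(frames)
        return torch.sigmoid(self.head(out[:, -1]))  # P(BC opportunity)

model = BackchannelLSTM()
# 1.5 s of context at 100 frames/s, each frame = (power, pitch):
frames = torch.randn(8, 150, 2)
print(model(frames).shape)                     # torch.Size([8, 1])
```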
Learning Spread-out Local Feature Descriptors
Title | Learning Spread-out Local Feature Descriptors |
Authors | Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang |
Abstract | We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors. The idea is that in order to fully utilize the expressive power of the descriptor space, good local feature descriptors should be sufficiently “spread-out” over the space. In this work, we propose a regularization term, inspired by the properties of the uniform distribution, that maximizes the spread of the feature descriptors. We show that the proposed regularization combined with the triplet loss outperforms existing Euclidean-distance-based descriptor learning techniques by a large margin. As an extension, the proposed regularization technique can also be used to improve image-level deep feature embedding. |
Tasks | |
Published | 2017-08-21 |
URL | http://arxiv.org/abs/1708.06320v1 |
PDF | http://arxiv.org/pdf/1708.06320v1.pdf |
PWC | https://paperswithcode.com/paper/learning-spread-out-local-feature-descriptors |
Repo | https://github.com/ColumbiaDVMM/Spread-out_Local_Feature_Descriptor |
Framework | tf |
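The regularizer itself is compact. A PyTorch sketch close to the paper's formulation: for unit-length descriptors of non-matching pairs, push the mean dot product toward 0 and cap its second moment at 1/d, the moments a uniform distribution on the sphere would give.

```python
import torch
import torch.nn.functional as F

def spread_out_loss(desc_a, desc_b):
    """Global spread-out regularization over descriptors of non-matching
    points: penalize a nonzero mean similarity and a second moment above
    1/d, where d is the descriptor dimension."""
    a = F.normalize(desc_a, dim=1)
    b = F.normalize(desc_b, dim=1)
    dots = (a * b).sum(dim=1)                  # pairwise cosine similarities
    d = a.shape[1]
    m1 = dots.mean() ** 2                      # push the mean toward 0
    m2 = F.relu(dots.pow(2).mean() - 1.0 / d)  # cap the second moment at 1/d
    return m1 + m2

# Used alongside a pairwise or triplet loss, on randomly paired non-matches:
anchor, negative = torch.randn(128, 256), torch.randn(128, 256)
reg = spread_out_loss(anchor, negative)
```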