Paper Group AWR 50
Expert Gate: Lifelong Learning with a Network of Experts. Adversarial examples in the physical world. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. COCO: A Platform for Comparing Continuous Optimizers in a Bla …
Expert Gate: Lifelong Learning with a Network of Experts
Title | Expert Gate: Lifelong Learning with a Network of Experts |
Authors | Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars |
Abstract | In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in this setting, not addressed in the literature so far, is deciding which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation for the task at hand and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency, as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert, with fine-tuning or learning without forgetting, can be selected. We evaluate our method on image classification and video prediction problems. |
Tasks | Image Classification, Video Prediction |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06194v2 |
http://arxiv.org/pdf/1611.06194v2.pdf | |
PWC | https://paperswithcode.com/paper/expert-gate-lifelong-learning-with-a-network |
Repo | https://github.com/wannabeOG/ExpertNet-Pytorch |
Framework | pytorch |
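
The gating mechanism can be pictured with a small sketch (not the authors' code): one undercomplete autoencoder per task is trained on that task's features, and a test sample is routed to the expert whose autoencoder reconstructs it best. Feature extraction, the experts themselves, and all dimensions below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GateAutoencoder(nn.Module):
    """One small autoencoder per task, trained only on that task's features."""
    def __init__(self, dim=4096, hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def select_expert(feature, gates):
    """Route a feature vector to the task whose gate reconstructs it best."""
    errors = [nn.functional.mse_loss(g(feature), feature) for g in gates]
    return int(torch.stack(errors).argmin())
```

At test time only the selected expert needs to be loaded, which is where the memory efficiency mentioned in the abstract comes from.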
Adversarial examples in the physical world
Title | Adversarial examples in the physical world |
Authors | Alexey Kurakin, Ian Goodfellow, Samy Bengio |
Abstract | Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice them, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to attack machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those using signals from cameras and other sensors as input. This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera. |
Tasks | |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02533v4 |
http://arxiv.org/pdf/1607.02533v4.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-in-the-physical-world |
Repo | https://github.com/1Konny/FGSM |
Framework | pytorch |
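
The linked repository implements the single-step fast gradient sign method; a minimal PyTorch sketch of that attack is below (the paper also studies iterative variants and their robustness to camera capture). The `[0, 1]` pixel range is an assumption about the input normalization.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Return x perturbed by eps * sign(grad_x loss), the fast gradient sign attack."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # assumes inputs live in [0, 1]
```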
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Title | InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets |
Authors | Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel |
Abstract | This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods. |
Tasks | Image Generation, Representation Learning, Unsupervised Image Classification, Unsupervised MNIST |
Published | 2016-06-12 |
URL | http://arxiv.org/abs/1606.03657v1 |
http://arxiv.org/pdf/1606.03657v1.pdf | |
PWC | https://paperswithcode.com/paper/infogan-interpretable-representation-learning |
Repo | https://github.com/sidneyp/bidirectional |
Framework | tf |
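
The part of InfoGAN that goes beyond a standard GAN is a variational lower bound on the mutual information between a latent code and the generated image; for a categorical code it reduces to a cross-entropy between the sampled code and an auxiliary network Q's prediction. A hedged sketch of just that term (G, Q, and the adversarial losses are placeholders):

```python
import torch.nn.functional as F

def info_loss(Q, fake_images, code_idx, lam=1.0):
    """-lambda * E[log Q(c | G(z, c))] for a categorical latent code c."""
    logits = Q(fake_images)                 # Q predicts the code from the generated image
    return lam * F.cross_entropy(logits, code_idx)
```

During training this term is added to both the generator's and Q's objectives, on top of the usual adversarial losses.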
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Title | Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning |
Authors | Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher |
Abstract | Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as “the” and “of”. Other words that may seem visual can often be predicted reliably just from the language model e.g., “sign” after “behind a red stop” or “phone” following “talking on a cell”. In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin. |
Tasks | Image Captioning, Language Modelling |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01887v2 |
http://arxiv.org/pdf/1612.01887v2.pdf | |
PWC | https://paperswithcode.com/paper/knowing-when-to-look-adaptive-attention-via-a |
Repo | https://github.com/miroblog/AdaptiveAttention |
Framework | pytorch |
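
A simplified sketch of the adaptive-attention step: the decoder attends over spatial image features plus a "visual sentinel" vector, and the attention weight that falls on the sentinel plays the role of the gate deciding how much to rely on the language model instead of the image. Dimensions and layer names below are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.proj_v = nn.Linear(d, d)   # spatial image features
        self.proj_s = nn.Linear(d, d)   # visual sentinel
        self.proj_h = nn.Linear(d, d)   # decoder hidden state
        self.score = nn.Linear(d, 1)

    def forward(self, V, sentinel, h):
        # V: (B, K, d) spatial features; sentinel, h: (B, d)
        cand = torch.cat([self.proj_v(V), self.proj_s(sentinel).unsqueeze(1)], dim=1)
        scores = self.score(torch.tanh(cand + self.proj_h(h).unsqueeze(1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)        # (B, K+1), last slot is the sentinel
        beta = alpha[:, -1:]                        # how much the model "looks away"
        mixed = torch.cat([V, sentinel.unsqueeze(1)], dim=1)
        context = (alpha.unsqueeze(-1) * mixed).sum(dim=1)
        return context, beta                        # context feeds the word predictor
```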
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting
Title | COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting |
Authors | Nikolaus Hansen, Anne Auger, Olaf Mersmann, Tea Tusar, Dimo Brockhoff |
Abstract | COCO is a platform for Comparing Continuous Optimizers in a black-box setting. It aims at automating, to the greatest possible extent, the tedious and repetitive task of benchmarking numerical optimization algorithms. We present the rationale behind the development of the platform as a general proposition for a guideline towards better benchmarking. We detail the underlying fundamental concepts of COCO, such as its definition of a problem, the idea of instances, the relevance of target values, and runtime as the central performance measure. Finally, we give a quick overview of the basic code structure and the available test suites. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08785v3 |
http://arxiv.org/pdf/1603.08785v3.pdf | |
PWC | https://paperswithcode.com/paper/coco-a-platform-for-comparing-continuous |
Repo | https://github.com/numbbo/coco |
Framework | none |
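
A short usage sketch in the style of the example experiment shipped with the numbbo/coco repository (the `cocoex` Python module); option strings and attribute names may differ between releases, so treat the details as assumptions.

```python
import cocoex
import scipy.optimize

suite = cocoex.Suite("bbob", "", "")                      # single-objective test suite
observer = cocoex.Observer("bbob", "result_folder: demo")

for problem in suite:                                     # each problem is callable: f(x)
    problem.observe_with(observer)                        # log runtimes to target values
    scipy.optimize.fmin(problem, problem.initial_solution, disp=False)
```

The logged results are then summarized with the platform's post-processing module, e.g. `python -m cocopp demo`.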
Conditional Image Generation with PixelCNN Decoders
Title | Conditional Image Generation with PixelCNN Decoders |
Authors | Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu |
Abstract | This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05328v2 |
http://arxiv.org/pdf/1606.05328v2.pdf | |
PWC | https://paperswithcode.com/paper/conditional-image-generation-with-pixelcnn |
Repo | https://github.com/openai/pixel-cnn |
Framework | tf |
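
The gated, conditional activation described in the abstract is y = tanh(W_f * x + V_f h) ⊙ σ(W_g * x + V_g h), where h is the conditioning vector (a class embedding, tags, or another network's output). The sketch below omits the causal masking of the convolutions, so it illustrates only the gating/conditioning idea, not a full PixelCNN layer.

```python
import torch
import torch.nn as nn

class GatedConditionalConv(nn.Module):
    def __init__(self, channels, cond_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1)
        self.cond = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, h):
        out = self.conv(x) + self.cond(h)[:, :, None, None]   # add conditioning as a bias
        a, b = out.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b)               # gated activation unit
```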
Deep Visual Foresight for Planning Robot Motion
Title | Deep Visual Foresight for Planning Robot Motion |
Authors | Chelsea Finn, Sergey Levine |
Abstract | A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach requires neither a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation (pushing objects) and can handle novel objects not seen during training. |
Tasks | Video Prediction |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00696v2 |
http://arxiv.org/pdf/1610.00696v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-visual-foresight-for-planning-robot |
Repo | https://github.com/m-serra/action-inference-for-video-prediction-benchmarking |
Framework | tf |
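
At a high level the method is model-predictive control on top of a learned action-conditioned video prediction model: sample candidate action sequences, predict where a user-designated object pixel would end up, and execute the first action of the best-scoring sequence. The sketch below uses random-shooting MPC with the learned predictor as a placeholder callable.

```python
import numpy as np

def plan_action(predict, image, pixel, goal_pixel, horizon=10, n_samples=200):
    """predict(image, pixel, actions) -> predicted pixel positions over the horizon."""
    best_cost, best_actions = np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, 2))   # e.g. planar pushes
        predicted_pixels = predict(image, pixel, actions)
        cost = np.linalg.norm(predicted_pixels[-1] - goal_pixel)    # distance to the goal
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0]   # execute one action, observe, then replan
```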
Invertible Conditional GANs for image editing
Title | Invertible Conditional GANs for image editing |
Authors | Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez |
Abstract | Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is the conditional GAN (cGAN), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces by conditioning on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call an Invertible cGAN (IcGAN), enables re-generating real images with deterministic, complex modifications. |
Tasks | Conditional Image Generation, Image-to-Image Translation |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06355v1 |
http://arxiv.org/pdf/1611.06355v1.pdf | |
PWC | https://paperswithcode.com/paper/invertible-conditional-gans-for-image-editing |
Repo | https://github.com/AZHARTHEGEEK/GAN_s |
Framework | none |
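
The editing pipeline can be summarized in a few lines: an encoder maps a real image to a latent vector z and an attribute vector y, the attributes are modified by hand, and the conditional generator re-synthesizes the image. All callables below are placeholders, not the paper's architectures.

```python
def edit_image(encoder_z, encoder_y, generator, image, new_attributes):
    z = encoder_z(image)                    # latent representation of the real image
    y = encoder_y(image)                    # inferred attributes, kept for reconstruction
    x_rec = generator(z, y)                 # reconstruction of the input
    x_edit = generator(z, new_attributes)   # same face, e.g. with glasses or blond hair
    return x_rec, x_edit
```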
Improving Sampling from Generative Autoencoders with Markov Chains
Title | Improving Sampling from Generative Autoencoders with Markov Chains |
Authors | Antonia Creswell, Kai Arulkumaran, Anil Anthony Bharath |
Abstract | We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples the learned latent distribution; it may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since the generative model learns to map from the learned latent distribution rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion. |
Tasks | |
Published | 2016-10-28 |
URL | http://arxiv.org/abs/1610.09296v3 |
http://arxiv.org/pdf/1610.09296v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-sampling-from-generative |
Repo | https://github.com/Kaixhin/Autoencoders |
Framework | torch |
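
The proposed sampler is simply an alternation of decoding and encoding, which moves latent samples from the prior toward the learned latent distribution. A sketch with placeholder encoder/decoder callables:

```python
import torch

def mcmc_refine(encode, decode, z_init, steps=5):
    """Iteratively decode and re-encode to sample from the learned latent distribution."""
    z = z_init                  # e.g. torch.randn(batch, latent_dim), a draw from the prior
    for _ in range(steps):
        x = decode(z)           # generate from the current latent
        z = encode(x)           # map back through the inference model (a sample, if stochastic)
    return decode(z)            # final, refined sample
```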
A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
Title | A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images |
Authors | David Vázquez, Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Antonio M. López, Adriana Romero, Michal Drozdzal, Aaron Courville |
Abstract | Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss-rate and the inability to perform a visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) that help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation, significantly outperforming prior results in endoluminal scene segmentation without any further post-processing. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00799v1 |
http://arxiv.org/pdf/1612.00799v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-for-endoluminal-scene |
Repo | https://github.com/guilhermesantos/Semantic-Image-Segmentation |
Framework | pytorch |
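
A minimal per-pixel cross-entropy training step of the kind used for FCN baselines on such a benchmark; the backbone, class count, and optimizer settings are assumptions rather than the authors' exact configuration.

```python
import torch
import torchvision

model = torchvision.models.segmentation.fcn_resnet50(num_classes=4)  # class count is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, masks):
    """images: (B, 3, H, W) floats; masks: (B, H, W) integer class labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]        # (B, C, H, W) per-pixel class scores
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```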
Chained Gaussian Processes
Title | Chained Gaussian Processes |
Authors | Alan D. Saul, James Hensman, Aki Vehtari, Neil D. Lawrence |
Abstract | Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (as in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive: link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models Chained Gaussian Processes: the transformation of the GPs to the likelihood parameters will not generally be invertible, which implies that linearisation would only be possible with multiple (localized) links, i.e. a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions. |
Tasks | Gaussian Processes |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05263v1 |
http://arxiv.org/pdf/1604.05263v1.pdf | |
PWC | https://paperswithcode.com/paper/chained-gaussian-processes |
Repo | https://github.com/SheffieldML/ChainedGP |
Framework | none |
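
The simplest instance of the idea is a heteroscedastic Gaussian likelihood whose mean and log-variance each get their own GP, combined non-linearly through the likelihood; the sketch below only draws from that prior (approximate inference is the paper's actual contribution).

```python
import numpy as np

def rbf(x, lengthscale=0.5, variance=1.0):
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 1.0, 100)
K = rbf(x) + 1e-8 * np.eye(len(x))                         # jitter for numerical stability
f = np.random.multivariate_normal(np.zeros(len(x)), K)     # GP over the mean
g = np.random.multivariate_normal(np.zeros(len(x)), K)     # GP over the log-variance
y = np.random.normal(loc=f, scale=np.exp(0.5 * g))         # y ~ N(f(x), exp(g(x)))
```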
Known Unknowns: Uncertainty Quality in Bayesian Neural Networks
Title | Known Unknowns: Uncertainty Quality in Bayesian Neural Networks |
Authors | Ramon Oliveira, Pedro Tabacof, Eduardo Valle |
Abstract | We evaluate the uncertainty quality in neural networks using anomaly detection. We extract uncertainty measures (e.g. entropy) from the predictions of candidate models, use those measures as features for an anomaly detector, and gauge how well the detector differentiates known from unknown classes. We assign higher uncertainty quality to candidate models that lead to better detectors. We also propose a novel method for sampling a variational approximation of a Bayesian neural network, called One-Sample Bayesian Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We compare the following candidate neural network models: Maximum Likelihood, Bayesian Dropout, OSBA, and — for MNIST — the standard variational approximation. We show that Bayesian Dropout and OSBA provide better uncertainty information than Maximum Likelihood, and are essentially equivalent to the standard variational approximation, but much faster. |
Tasks | Anomaly Detection |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01251v2 |
http://arxiv.org/pdf/1612.01251v2.pdf | |
PWC | https://paperswithcode.com/paper/known-unknowns-uncertainty-quality-in |
Repo | https://github.com/ramon-oliveira/deepstats |
Framework | none |
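
The evaluation protocol can be sketched as follows: draw several stochastic forward passes (here via dropout kept active at test time, as in Bayesian Dropout), average the softmax outputs, and use the predictive entropy as the score that an anomaly detector thresholds to separate known from unknown classes. The model is a placeholder.

```python
import torch

def predictive_entropy(model, x, n_samples=20):
    """Higher entropy = more uncertain; used as the anomaly-detection score."""
    model.train()                                   # keep dropout stochastic at test time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)                  # average over stochastic passes
    return -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
```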
Lost in Space: Geolocation in Event Data
Title | Lost in Space: Geolocation in Event Data |
Authors | Sophie J. Lee, Howard Liu, Michael D. Ward |
Abstract | Extracting the “correct” location information from text data, i.e., determining the place of an event, has long been a goal for automated text processing. To approximate a human-like coding schema, we introduce a supervised machine learning algorithm that classifies each location word as either correct or incorrect. We use news articles collected from around the world (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data Alliance [OEDA] data) to test our algorithm, which consists of two stages. In the feature selection stage, we extract contextual information from texts, namely the n-gram patterns for location words, the frequency of mention, and the context of the sentences containing location words. In the classification stage, we use three classifiers to estimate the model parameters on the training set and then predict whether a location word in the test-set news articles is the place of the event. The validation results show that our algorithm improves the accuracy of current dictionary-based geolocation methods by as much as 25%. |
Tasks | Feature Selection |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04837v1 |
http://arxiv.org/pdf/1611.04837v1.pdf | |
PWC | https://paperswithcode.com/paper/lost-in-space-geolocation-in-event-data |
Repo | https://github.com/haoliuhoward/LostinSpace-PSRM |
Framework | none |
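
A hedged two-stage sketch of the pipeline: contextual features for each candidate location word feed a binary classifier that decides whether the word is the place of the event. The concrete features and the single classifier below are simplified stand-ins for the paper's n-gram/frequency/context features and its three classifiers.

```python
from sklearn.ensemble import RandomForestClassifier

def featurize(candidate):
    """Toy contextual features for one location-word candidate (illustrative only)."""
    return [
        candidate["mention_count"],           # frequency of mention in the article
        candidate["sentence_index"],          # where the containing sentence occurs
        int(candidate["in_first_paragraph"]),
    ]

def train_geolocator(candidates, labels):
    X = [featurize(c) for c in candidates]
    clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
    return clf    # clf.predict on test-article candidates marks the event location word(s)
```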
Tensorial Mixture Models
Title | Tensorial Mixture Models |
Authors | Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua |
Abstract | Casting neural networks in generative frameworks is a highly sought-after endeavor these days. Contemporary methods, such as Generative Adversarial Networks, capture some of the generative capabilities, but not all. In particular, they lack the ability of tractable marginalization, and thus are not suitable for many tasks. Other methods, based on arithmetic circuits and sum-product networks, do allow tractable marginalization, but their performance is challenged by the need to learn the structure of a circuit. Building on the tractability of arithmetic circuits, we leverage concepts from tensor analysis, and derive a family of generative models we call Tensorial Mixture Models (TMMs). TMMs assume a simple convolutional network structure, and in addition, lend themselves to theoretical analyses that allow comprehensive understanding of the relation between their structure and their expressive properties. We thus obtain a generative model that is tractable on one hand, and on the other hand, allows effective representation of rich distributions in an easily controlled manner. These two capabilities are brought together in the task of classification under missing data, where TMMs deliver state of the art accuracies with seamless implementation and design. |
Tasks | |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04167v5 |
http://arxiv.org/pdf/1610.04167v5.pdf | |
PWC | https://paperswithcode.com/paper/tensorial-mixture-models |
Repo | https://github.com/HUJI-Deep/caffe-simnets |
Framework | none |
End-to-End Instance Segmentation with Recurrent Attention
Title | End-to-End Instance Segmentation with Recurrent Attention |
Authors | Mengye Ren, Richard S. Zemel |
Abstract | While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; however, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process, and produce detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP, KITTI, and Cityscapes datasets. |
Tasks | Autonomous Driving, Image Captioning, Instance Segmentation, Question Answering, Semantic Segmentation, Structured Prediction, Visual Question Answering |
Published | 2016-05-30 |
URL | http://arxiv.org/abs/1605.09410v5 |
http://arxiv.org/pdf/1605.09410v5.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-instance-segmentation-with |
Repo | https://github.com/renmengye/rec-attend-public |
Framework | tf |
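
A very coarse sketch of the recurrent decoding loop: at each step an attention module proposes a region of interest, a segmentation head produces one instance mask inside it, and a scoring head decides when to stop counting. Every sub-module below is a placeholder for the paper's networks, so this only illustrates the control flow.

```python
def segment_instances(features, attend, segment, score, state, max_steps=20):
    masks = []
    for _ in range(max_steps):
        box, state = attend(features, state)    # glimpse / region of interest
        mask = segment(features, box)           # one instance mask per step
        if score(state) < 0.5:                  # confidence that objects remain
            break
        masks.append(mask)
    return masks
```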