Paper Group AWR 50
Expert Gate: Lifelong Learning with a Network of Experts. Adversarial examples in the physical world. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. COCO: A Platform for Comparing Continuous Optimizers in a Bla …
Expert Gate: Lifelong Learning with a Network of Experts
Title | Expert Gate: Lifelong Learning with a Network of Experts |
Authors | Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars |
Abstract | In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in this setting, not addressed in the literature so far, is deciding which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation for the task at hand and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency, as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert, with fine-tuning or learning without forgetting, can be selected. We evaluate our method on image classification and video prediction problems. |
Tasks | Image Classification, Video Prediction |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06194v2 |
http://arxiv.org/pdf/1611.06194v2.pdf | |
PWC | https://paperswithcode.com/paper/expert-gate-lifelong-learning-with-a-network |
Repo | https://github.com/wannabeOG/ExpertNet-Pytorch |
Framework | pytorch |
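
The gating mechanism can be pictured with a small sketch (not the authors' code): one undercomplete autoencoder per task is trained on that task's features, and a test sample is routed to the expert whose autoencoder reconstructs it best. Feature extraction, the experts themselves, and all dimensions below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GateAutoencoder(nn.Module):
    """One small autoencoder per task, trained only on that task's features."""
    def __init__(self, dim=4096, hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def select_expert(feature, gates):
    """Route a feature vector to the task whose gate reconstructs it best."""
    errors = [nn.functional.mse_loss(g(feature), feature) for g in gates]
    return int(torch.stack(errors).argmin())
```

At test time only the selected expert needs to be loaded, which is where the memory efficiency mentioned in the abstract comes from.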
Adversarial examples in the physical world
Title | Adversarial examples in the physical world |
Authors | Alexey Kurakin, Ian Goodfellow, Samy Bengio |
Abstract | Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice them, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to attack machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those using signals from cameras and other sensors as input. This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera. |
Tasks | |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02533v4 |
http://arxiv.org/pdf/1607.02533v4.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-in-the-physical-world |
Repo | https://github.com/1Konny/FGSM |
Framework | pytorch |
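
The linked repository implements the single-step fast gradient sign method; a minimal PyTorch sketch of that attack is below (the paper also studies iterative variants and their robustness to camera capture). The `[0, 1]` pixel range is an assumption about the input normalization.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Return x perturbed by eps * sign(grad_x loss), the fast gradient sign attack."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # assumes inputs live in [0, 1]
```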
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Title | InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets |
Authors | Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel |
Abstract | This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods. |
Tasks | Image Generation, Representation Learning, Unsupervised Image Classification, Unsupervised MNIST |
Published | 2016-06-12 |
URL | http://arxiv.org/abs/1606.03657v1 |
http://arxiv.org/pdf/1606.03657v1.pdf | |
PWC | https://paperswithcode.com/paper/infogan-interpretable-representation-learning |
Repo | https://github.com/sidneyp/bidirectional |
Framework | tf |
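
The part of InfoGAN that goes beyond a standard GAN is a variational lower bound on the mutual information between a latent code and the generated image; for a categorical code it reduces to a cross-entropy between the sampled code and an auxiliary network Q's prediction. A hedged sketch of just that term (G, Q, and the adversarial losses are placeholders):

```python
import torch.nn.functional as F

def info_loss(Q, fake_images, code_idx, lam=1.0):
    """-lambda * E[log Q(c | G(z, c))] for a categorical latent code c."""
    logits = Q(fake_images)                 # Q predicts the code from the generated image
    return lam * F.cross_entropy(logits, code_idx)
```

During training this term is added to both the generator's and Q's objectives, on top of the usual adversarial losses.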
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Title | Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning |
Authors | Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher |
Abstract | Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as “the” and “of”. Other words that may seem visual can often be predicted reliably just from the language model e.g., “sign” after “behind a red stop” or “phone” following “talking on a cell”. In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin. |
Tasks | Image Captioning, Language Modelling |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01887v2 |
http://arxiv.org/pdf/1612.01887v2.pdf | |
PWC | https://paperswithcode.com/paper/knowing-when-to-look-adaptive-attention-via-a |
Repo | https://github.com/miroblog/AdaptiveAttention |
Framework | pytorch |
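
A simplified sketch of the adaptive-attention step: the decoder attends over spatial image features plus a "visual sentinel" vector, and the attention weight that falls on the sentinel plays the role of the gate deciding how much to rely on the language model instead of the image. Dimensions and layer names below are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.proj_v = nn.Linear(d, d)   # spatial image features
        self.proj_s = nn.Linear(d, d)   # visual sentinel
        self.proj_h = nn.Linear(d, d)   # decoder hidden state
        self.score = nn.Linear(d, 1)

    def forward(self, V, sentinel, h):
        # V: (B, K, d) spatial features; sentinel, h: (B, d)
        cand = torch.cat([self.proj_v(V), self.proj_s(sentinel).unsqueeze(1)], dim=1)
        scores = self.score(torch.tanh(cand + self.proj_h(h).unsqueeze(1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)        # (B, K+1), last slot is the sentinel
        beta = alpha[:, -1:]                        # how much the model "looks away"
        mixed = torch.cat([V, sentinel.unsqueeze(1)], dim=1)
        context = (alpha.unsqueeze(-1) * mixed).sum(dim=1)
        return context, beta                        # context feeds the word predictor
```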
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting
Title | COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting |
Authors | Nikolaus Hansen, Anne Auger, Olaf Mersmann, Tea Tusar, Dimo Brockhoff |
Abstract | COCO is a platform for Comparing Continuous Optimizers in a black-box setting. It aims at automating, to the greatest possible extent, the tedious and repetitive task of benchmarking numerical optimization algorithms. We present the rationale behind the development of the platform as a general proposition for a guideline towards better benchmarking. We detail the underlying fundamental concepts of COCO, such as its definition of a problem, the idea of instances, the relevance of target values, and runtime as the central performance measure. Finally, we give a quick overview of the basic code structure and the available test suites. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08785v3 |
http://arxiv.org/pdf/1603.08785v3.pdf | |
PWC | https://paperswithcode.com/paper/coco-a-platform-for-comparing-continuous |
Repo | https://github.com/numbbo/coco |
Framework | none |
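
A short usage sketch in the style of the example experiment shipped with the numbbo/coco repository (the `cocoex` Python module); option strings and attribute names may differ between releases, so treat the details as assumptions.

```python
import cocoex
import scipy.optimize

suite = cocoex.Suite("bbob", "", "")                      # single-objective test suite
observer = cocoex.Observer("bbob", "result_folder: demo")

for problem in suite:                                     # each problem is callable: f(x)
    problem.observe_with(observer)                        # log runtimes to target values
    scipy.optimize.fmin(problem, problem.initial_solution, disp=False)
```

The logged results are then summarized with the platform's post-processing module, e.g. `python -m cocopp demo`.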
Conditional Image Generation with PixelCNN Decoders
Title | Conditional Image Generation with PixelCNN Decoders |
Authors | Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu |
Abstract | This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05328v2 |
http://arxiv.org/pdf/1606.05328v2.pdf | |
PWC | https://paperswithcode.com/paper/conditional-image-generation-with-pixelcnn |
Repo | https://github.com/openai/pixel-cnn |
Framework | tf |
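
The gated, conditional activation described in the abstract is y = tanh(W_f * x + V_f h) ⊙ σ(W_g * x + V_g h), where h is the conditioning vector (a class embedding, tags, or another network's output). The sketch below omits the causal masking of the convolutions, so it illustrates only the gating/conditioning idea, not a full PixelCNN layer.

```python
import torch
import torch.nn as nn

class GatedConditionalConv(nn.Module):
    def __init__(self, channels, cond_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1)
        self.cond = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, h):
        out = self.conv(x) + self.cond(h)[:, :, None, None]   # add conditioning as a bias
        a, b = out.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b)               # gated activation unit
```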
Deep Visual Foresight for Planning Robot Motion
Title | Deep Visual Foresight for Planning Robot Motion |
Authors | Chelsea Finn, Sergey Levine |
Abstract | A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach requires neither a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation (pushing objects) and can handle novel objects not seen during training. |
Tasks | Video Prediction |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00696v2 |
http://arxiv.org/pdf/1610.00696v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-visual-foresight-for-planning-robot |
Repo | https://github.com/m-serra/action-inference-for-video-prediction-benchmarking |
Framework | tf |
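
At a high level the method is model-predictive control on top of a learned action-conditioned video prediction model: sample candidate action sequences, predict where a user-designated object pixel would end up, and execute the first action of the best-scoring sequence. The sketch below uses random-shooting MPC with the learned predictor as a placeholder callable.

```python
import numpy as np

def plan_action(predict, image, pixel, goal_pixel, horizon=10, n_samples=200):
    """predict(image, pixel, actions) -> predicted pixel positions over the horizon."""
    best_cost, best_actions = np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, 2))   # e.g. planar pushes
        predicted_pixels = predict(image, pixel, actions)
        cost = np.linalg.norm(predicted_pixels[-1] - goal_pixel)    # distance to the goal
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0]   # execute one action, observe, then replan
```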
Invertible Conditional GANs for image editing
Title | Invertible Conditional GANs for image editing |
Authors | Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez |
Abstract | Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is the conditional GAN (cGAN), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces by conditioning on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call an Invertible cGAN (IcGAN), enables re-generating real images with deterministic, complex modifications. |
Tasks | Conditional Image Generation, Image-to-Image Translation |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06355v1 |
http://arxiv.org/pdf/1611.06355v1.pdf | |
PWC | https://paperswithcode.com/paper/invertible-conditional-gans-for-image-editing |
Repo | https://github.com/AZHARTHEGEEK/GAN_s |
Framework | none |
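
The editing pipeline can be summarized in a few lines: an encoder maps a real image to a latent vector z and an attribute vector y, the attributes are modified by hand, and the conditional generator re-synthesizes the image. All callables below are placeholders, not the paper's architectures.

```python
def edit_image(encoder_z, encoder_y, generator, image, new_attributes):
    z = encoder_z(image)                    # latent representation of the real image
    y = encoder_y(image)                    # inferred attributes, kept for reconstruction
    x_rec = generator(z, y)                 # reconstruction of the input
    x_edit = generator(z, new_attributes)   # same face, e.g. with glasses or blond hair
    return x_rec, x_edit
```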
Improving Sampling from Generative Autoencoders with Markov Chains
Title | Improving Sampling from Generative Autoencoders with Markov Chains |
Authors | Antonia Creswell, Kai Arulkumaran, Anil Anthony Bharath |
Abstract | We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples the learned latent distribution; it may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since the generative model learns to map from the learned latent distribution rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion. |
Tasks | |
Published | 2016-10-28 |
URL | http://arxiv.org/abs/1610.09296v3 |
http://arxiv.org/pdf/1610.09296v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-sampling-from-generative |
Repo | https://github.com/Kaixhin/Autoencoders |
Framework | torch |
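
The proposed sampler is simply an alternation of decoding and encoding, which moves latent samples from the prior toward the learned latent distribution. A sketch with placeholder encoder/decoder callables:

```python
import torch

def mcmc_refine(encode, decode, z_init, steps=5):
    """Iteratively decode and re-encode to sample from the learned latent distribution."""
    z = z_init                  # e.g. torch.randn(batch, latent_dim), a draw from the prior
    for _ in range(steps):
        x = decode(z)           # generate from the current latent
        z = encode(x)           # map back through the inference model (a sample, if stochastic)
    return decode(z)            # final, refined sample
```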
A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
Title | A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images |
Authors | David Vázquez, Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Antonio M. López, Adriana Romero, Michal Drozdzal, Aaron Courville |
Abstract | Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss-rate and the inability to perform a visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) that help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation, significantly outperforming prior results in endoluminal scene segmentation without any further post-processing. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00799v1 |
http://arxiv.org/pdf/1612.00799v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-for-endoluminal-scene |
Repo | https://github.com/guilhermesantos/Semantic-Image-Segmentation |
Framework | pytorch |
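
A minimal per-pixel cross-entropy training step of the kind used for FCN baselines on such a benchmark; the backbone, class count, and optimizer settings are assumptions rather than the authors' exact configuration.

```python
import torch
import torchvision

model = torchvision.models.segmentation.fcn_resnet50(num_classes=4)  # class count is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, masks):
    """images: (B, 3, H, W) floats; masks: (B, H, W) integer class labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]        # (B, C, H, W) per-pixel class scores
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```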
Chained Gaussian Processes
Title | Chained Gaussian Processes |
Authors | Alan D. Saul, James Hensman, Aki Vehtari, Neil D. Lawrence |
Abstract | Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (as in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive: link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models Chained Gaussian Processes: the transformation of the GPs to the likelihood parameters will not generally be invertible, which implies that linearisation would only be possible with multiple (localized) links, i.e. a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions. |
Tasks | Gaussian Processes |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05263v1 |
http://arxiv.org/pdf/1604.05263v1.pdf | |
PWC | https://paperswithcode.com/paper/chained-gaussian-processes |
Repo | https://github.com/SheffieldML/ChainedGP |
Framework | none |
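
The simplest instance of the idea is a heteroscedastic Gaussian likelihood whose mean and log-variance each get their own GP, combined non-linearly through the likelihood; the sketch below only draws from that prior (approximate inference is the paper's actual contribution).

```python
import numpy as np

def rbf(x, lengthscale=0.5, variance=1.0):
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 1.0, 100)
K = rbf(x) + 1e-8 * np.eye(len(x))                         # jitter for numerical stability
f = np.random.multivariate_normal(np.zeros(len(x)), K)     # GP over the mean
g = np.random.multivariate_normal(np.zeros(len(x)), K)     # GP over the log-variance
y = np.random.normal(loc=f, scale=np.exp(0.5 * g))         # y ~ N(f(x), exp(g(x)))
```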
Known Unknowns: Uncertainty Quality in Bayesian Neural Networks
Title | Known Unknowns: Uncertainty Quality in Bayesian Neural Networks |
Authors | Ramon Oliveira, Pedro Tabacof, Eduardo Valle |
Abstract | We evaluate the uncertainty quality in neural networks using anomaly detection. We extract uncertainty measures (e.g. entropy) from the predictions of candidate models, use those measures as features for an anomaly detector, and gauge how well the detector differentiates known from unknown classes. We assign higher uncertainty quality to candidate models that lead to better detectors. We also propose a novel method for sampling a variational approximation of a Bayesian neural network, called One-Sample Bayesian Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We compare the following candidate neural network models: Maximum Likelihood, Bayesian Dropout, OSBA, and — for MNIST — the standard variational approximation. We show that Bayesian Dropout and OSBA provide better uncertainty information than Maximum Likelihood, and are essentially equivalent to the standard variational approximation, but much faster. |
Tasks | Anomaly Detection |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01251v2 |
http://arxiv.org/pdf/1612.01251v2.pdf | |
PWC | https://paperswithcode.com/paper/known-unknowns-uncertainty-quality-in |
Repo | https://github.com/ramon-oliveira/deepstats |
Framework | none |
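
The evaluation protocol can be sketched as follows: draw several stochastic forward passes (here via dropout kept active at test time, as in Bayesian Dropout), average the softmax outputs, and use the predictive entropy as the score that an anomaly detector thresholds to separate known from unknown classes. The model is a placeholder.

```python
import torch

def predictive_entropy(model, x, n_samples=20):
    """Higher entropy = more uncertain; used as the anomaly-detection score."""
    model.train()                                   # keep dropout stochastic at test time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)                  # average over stochastic passes
    return -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
```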
Lost in Space: Geolocation in Event Data
Title | Lost in Space: Geolocation in Event Data |
Authors | Sophie J. Lee, Howard Liu, Michael D. Ward |
Abstract | Extracting the “correct” location information from text data, i.e., determining the place of an event, has long been a goal for automated text processing. To approximate a human-like coding schema, we introduce a supervised machine learning algorithm that classifies each location word as either correct or incorrect. We use news articles collected from around the world (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data Alliance [OEDA] data) to test our algorithm, which consists of two stages. In the feature selection stage, we extract contextual information from texts, namely the n-gram patterns for location words, the frequency of mention, and the context of the sentences containing location words. In the classification stage, we use three classifiers to estimate the model parameters on the training set and then predict whether a location word in the test-set news articles is the place of the event. The validation results show that our algorithm improves the accuracy of current dictionary-based geolocation methods by as much as 25%. |
Tasks | Feature Selection |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04837v1 |
http://arxiv.org/pdf/1611.04837v1.pdf | |
PWC | https://paperswithcode.com/paper/lost-in-space-geolocation-in-event-data |
Repo | https://github.com/haoliuhoward/LostinSpace-PSRM |
Framework | none |
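
A hedged two-stage sketch of the pipeline: contextual features for each candidate location word feed a binary classifier that decides whether the word is the place of the event. The concrete features and the single classifier below are simplified stand-ins for the paper's n-gram/frequency/context features and its three classifiers.

```python
from sklearn.ensemble import RandomForestClassifier

def featurize(candidate):
    """Toy contextual features for one location-word candidate (illustrative only)."""
    return [
        candidate["mention_count"],           # frequency of mention in the article
        candidate["sentence_index"],          # where the containing sentence occurs
        int(candidate["in_first_paragraph"]),
    ]

def train_geolocator(candidates, labels):
    X = [featurize(c) for c in candidates]
    clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
    return clf    # clf.predict on test-article candidates marks the event location word(s)
```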
Tensorial Mixture Models
Title | Tensorial Mixture Models |
Authors | Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua |
Abstract | Casting neural networks in generative frameworks is a highly sought-after endeavor these days. Contemporary methods, such as Generative Adversarial Networks, capture some of the generative capabilities, but not all. In particular, they lack the ability of tractable marginalization, and thus are not suitable for many tasks. Other methods, based on arithmetic circuits and sum-product networks, do allow tractable marginalization, but their performance is challenged by the need to learn the structure of a circuit. Building on the tractability of arithmetic circuits, we leverage concepts from tensor analysis, and derive a family of generative models we call Tensorial Mixture Models (TMMs). TMMs assume a simple convolutional network structure, and in addition, lend themselves to theoretical analyses that allow comprehensive understanding of the relation between their structure and their expressive properties. We thus obtain a generative model that is tractable on one hand, and on the other hand, allows effective representation of rich distributions in an easily controlled manner. These two capabilities are brought together in the task of classification under missing data, where TMMs deliver state of the art accuracies with seamless implementation and design. |
Tasks | |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04167v5 |
http://arxiv.org/pdf/1610.04167v5.pdf | |
PWC | https://paperswithcode.com/paper/tensorial-mixture-models |
Repo | https://github.com/HUJI-Deep/caffe-simnets |
Framework | none |
End-to-End Instance Segmentation with Recurrent Attention
Title | End-to-End Instance Segmentation with Recurrent Attention |
Authors | Mengye Ren, Richard S. Zemel |
Abstract | While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; however, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process, and produce detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP, KITTI, and Cityscapes datasets. |
Tasks | Autonomous Driving, Image Captioning, Instance Segmentation, Question Answering, Semantic Segmentation, Structured Prediction, Visual Question Answering |
Published | 2016-05-30 |
URL | http://arxiv.org/abs/1605.09410v5 |
http://arxiv.org/pdf/1605.09410v5.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-instance-segmentation-with |
Repo | https://github.com/renmengye/rec-attend-public |
Framework | tf |
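
A very coarse sketch of the recurrent decoding loop: at each step an attention module proposes a region of interest, a segmentation head produces one instance mask inside it, and a scoring head decides when to stop counting. Every sub-module below is a placeholder for the paper's networks, so this only illustrates the control flow.

```python
def segment_instances(features, attend, segment, score, state, max_steps=20):
    masks = []
    for _ in range(max_steps):
        box, state = attend(features, state)    # glimpse / region of interest
        mask = segment(features, box)           # one instance mask per step
        if score(state) < 0.5:                  # confidence that objects remain
            break
        masks.append(mask)
    return masks
```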