Paper Group AWR 5
Early Visual Concept Learning with Unsupervised Deep Learning. Image-to-Image Translation with Conditional Adversarial Networks. Towards Evaluating the Robustness of Neural Networks. Learning to reinforcement learn. Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs. SoundNet: Learning Sound Representations from Unlabeled Video. emoji2vec: Learning Emoji Representations from their Description. Towards Conceptual Compression. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. Binarized Neural Networks on the ImageNet Classification Task. Supervised learning based on temporal coding in spiking neural networks. Asynchronous Multi-Task Learning. Finding Tiny Faces. Awesome Typography: Statistics-Based Text Effects Transfer. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection.
Early Visual Concept Learning with Unsupervised Deep Learning
Title | Early Visual Concept Learning with Unsupervised Deep Learning |
Authors | Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner |
Abstract | Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”. |
Tasks | |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05579v3 |
PDF | http://arxiv.org/pdf/1606.05579v3.pdf |
PWC | https://paperswithcode.com/paper/early-visual-concept-learning-with |
Repo | https://github.com/takuseno/beta-vae |
Framework | tf |
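The learning pressures the abstract names (redundancy reduction, statistical independence) can be made concrete as an extra weight on the KL term of the VAE objective. The sketch below is a minimal illustration in PyTorch; the β > 1 weighting and the Bernoulli reconstruction term follow the convention of the linked beta-vae repo rather than anything stated verbatim in the abstract.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon_x, x, mu, logvar, beta=4.0):
    # Pixel-wise reconstruction term (Bernoulli decoder assumed).
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # beta > 1 strengthens the independence / redundancy-reduction pressure.
    return recon + beta * kl
```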
Image-to-Image Translation with Conditional Adversarial Networks
Title | Image-to-Image Translation with Conditional Adversarial Networks |
Authors | Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros |
Abstract | We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either. |
Tasks | Cross-View Image-to-Image Translation, Image-to-Image Translation, Nuclear Segmentation |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.07004v3 |
PDF | http://arxiv.org/pdf/1611.07004v3.pdf |
PWC | https://paperswithcode.com/paper/image-to-image-translation-with-conditional |
Repo | https://github.com/leemathew1998/GradientWeight |
Framework | pytorch |
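The abstract's point that the network "learns a loss function" boils down to pairing a conditional adversarial term with a pixel-wise L1 term. Below is a minimal sketch of that generator objective, assuming a patch discriminator `discriminator(x, y_hat)` that returns logits and the commonly used λ = 100.

```python
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(generator, discriminator, x, y, lambda_l1=100.0):
    y_hat = generator(x)                       # translated image
    pred_fake = discriminator(x, y_hat)        # per-patch logits on (input, output)
    # Adversarial term: fool the conditional discriminator.
    adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    # L1 term: stay close to the paired ground-truth output.
    return adv + lambda_l1 * F.l1_loss(y_hat, y)
```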
Towards Evaluating the Robustness of Neural Networks
Title | Towards Evaluating the Robustness of Neural Networks |
Authors | Nicholas Carlini, David Wagner |
Abstract | Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks’ ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples. |
Tasks | Adversarial Attack |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04644v2 |
PDF | http://arxiv.org/pdf/1608.04644v2.pdf |
PWC | https://paperswithcode.com/paper/towards-evaluating-the-robustness-of-neural |
Repo | https://github.com/MadryLab/cifar10_challenge |
Framework | tf |
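The attacks the abstract refers to are formulated as an optimization over a perturbation, trading off a distance penalty against a margin-based misclassification term. Below is a rough, untested sketch of an L2-style attack in that spirit; the tanh change of variables follows the paper, but the binary search over the constant `c` and other details are omitted.

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=100, lr=0.01):
    # Optimize in tanh space so the adversarial image stays inside [0, 1].
    w = torch.atanh((x.detach() * 2 - 1).clamp(-0.999, 0.999)).clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        logits = model(x_adv)
        target_logit = logits.gather(1, target.view(-1, 1)).squeeze(1)
        other_logit = logits.scatter(1, target.view(-1, 1), float("-inf")).max(1).values
        # Margin loss: push the target class above every other class by kappa.
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * f
        opt.zero_grad()
        loss.sum().backward()
        opt.step()
    return ((torch.tanh(w) + 1) / 2).detach()
```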
Learning to reinforcement learn
Title | Learning to reinforcement learn |
Authors | Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick |
Abstract | In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience. |
Tasks | Meta-Learning |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05763v3 |
PDF | http://arxiv.org/pdf/1611.05763v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-reinforcement-learn |
Repo | https://github.com/mtrazzi/two-step-task |
Framework | tf |
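Architecturally, the key move is to feed the previous action and reward back into a recurrent core, so the LSTM's hidden-state dynamics can implement a learned, fast RL procedure. A minimal sketch of such an agent follows; layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=48):
        super().__init__()
        # Input = observation + one-hot previous action + scalar previous reward.
        self.core = nn.LSTMCell(obs_dim + n_actions + 1, hidden)
        self.policy = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs, prev_action_onehot, prev_reward, state):
        # prev_reward has shape (batch, 1); state is the (h, c) LSTM pair.
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        h, c = self.core(x, state)
        return self.policy(h), self.value(h), (h, c)
```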
Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs
Title | Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs |
Authors | Benjamin Bloem-Reddy, Peter Orbanz |
Abstract | We introduce a class of generative network models that insert edges by connecting the starting and terminal vertices of a random walk on the network graph. Within the taxonomy of statistical network models, this class is distinguished by permitting the location of a new edge to explicitly depend on the structure of the graph, but being nonetheless statistically and computationally tractable. In the limit of infinite walk length, the model converges to an extension of the preferential attachment model—in this sense, it can be motivated alternatively by asking what preferential attachment is an approximation to. Theoretical properties, including the limiting degree sequence, are studied analytically. If the entire history of the graph is observed, parameters can be estimated by maximum likelihood. If only the final graph is available, its history can be imputed using MCMC. We develop a class of sequential Monte Carlo algorithms that are more generally applicable to sequential network models, and may be of interest in their own right. The model parameters can be recovered from a single graph generated by the model. Applications to data clarify the role of the random walk length as a length scale of interactions within the graph. |
Tasks | |
Published | 2016-12-19 |
URL | http://arxiv.org/abs/1612.06404v2 |
PDF | http://arxiv.org/pdf/1612.06404v2.pdf |
PWC | https://paperswithcode.com/paper/random-walk-models-of-network-formation-and |
Repo | https://github.com/ben-br/random_walk_smc |
Framework | none |
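The generative process is easy to caricature in a few lines: place each new edge by following a short random walk over the existing graph, so that edge placement depends explicitly on graph structure. The sketch below is an illustrative simplification, not the paper's exact model or its sequential Monte Carlo machinery.

```python
import random
from collections import defaultdict

def random_walk_attachment(n_vertices, walk_len=3, seed=0):
    rng = random.Random(seed)
    adj = defaultdict(list)
    adj[0].append(1); adj[1].append(0)          # seed graph: a single edge
    for new in range(2, n_vertices):
        v = rng.randrange(new)                  # uniformly chosen start vertex
        for _ in range(walk_len):               # walk over the existing graph
            v = rng.choice(adj[v])
        adj[new].append(v); adj[v].append(new)  # attach the new vertex at the walk's endpoint
    return dict(adj)
```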
SoundNet: Learning Sound Representations from Unlabeled Video
Title | SoundNet: Learning Sound Representations from Unlabeled Video |
Authors | Yusuf Aytar, Carl Vondrick, Antonio Torralba |
Abstract | We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two-million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet contains useful signals about natural sound. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well established visual recognition models into the sound modality using unlabeled video as a bridge. Our sound representation yields significant performance improvements over the state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest some high-level semantics automatically emerge in the sound network, even though it is trained without ground truth labels. |
Tasks | Object Classification |
Published | 2016-10-27 |
URL | http://arxiv.org/abs/1610.09001v1 |
PDF | http://arxiv.org/pdf/1610.09001v1.pdf |
PWC | https://paperswithcode.com/paper/soundnet-learning-sound-representations-from |
Repo | https://github.com/eborboihuc/SoundNet-tensorflow |
Framework | tf |
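The student-teacher transfer amounts to matching the sound network's output distribution to the posterior a pretrained vision network assigns to the synchronized video frames. A minimal sketch of that distillation loss (KL divergence, no ground-truth labels):

```python
import torch.nn.functional as F

def soundnet_distillation_loss(sound_logits, frame_logits):
    teacher = F.softmax(frame_logits.detach(), dim=-1)  # fixed visual teacher posterior
    student = F.log_softmax(sound_logits, dim=-1)       # raw-waveform student network
    # Match the sound network's class distribution to the vision network's.
    return F.kl_div(student, teacher, reduction="batchmean")
```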
emoji2vec: Learning Emoji Representations from their Description
Title | emoji2vec: Learning Emoji Representations from their Description |
Authors | Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel |
Abstract | Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper, we release emoji2vec, pre-trained embeddings for all Unicode emoji which are learned from their description in the Unicode emoji standard. The resulting emoji embeddings can be readily used in downstream social natural language processing applications alongside word2vec. We demonstrate, for the downstream task of sentiment analysis, that emoji embeddings learned from short descriptions outperform a skip-gram model trained on a large collection of tweets, while avoiding the need for contexts in which emoji need to appear frequently in order to estimate a representation. |
Tasks | Representation Learning, Sentiment Analysis, Word Embeddings |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08359v2 |
PDF | http://arxiv.org/pdf/1609.08359v2.pdf |
PWC | https://paperswithcode.com/paper/emoji2vec-learning-emoji-representations-from |
Repo | https://github.com/hougrammer/emoji_project |
Framework | tf |
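The training signal is simple: score an emoji vector against the sum of the word2vec vectors of its description, and push the score up for true descriptions (and down for sampled negatives). A minimal sketch of the scoring function; the logistic-loss training loop and negative sampling are omitted.

```python
import numpy as np

def emoji2vec_score(emoji_vec, description_word_vecs):
    d = np.sum(description_word_vecs, axis=0)            # sum of the description's word2vec vectors
    return 1.0 / (1.0 + np.exp(-np.dot(emoji_vec, d)))   # sigmoid(v_emoji . v_description)
```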
Towards Conceptual Compression
Title | Towards Conceptual Compression |
Authors | Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra |
Abstract | We introduce a simple recurrent variational auto-encoder architecture that significantly improves image modeling. The system represents the state-of-the-art in latent variable models for both the ImageNet and Omniglot datasets. We show that it naturally separates global conceptual information from lower level details, thus addressing one of the fundamentally desired properties of unsupervised learning. Furthermore, the possibility of restricting ourselves to storing only global information about an image allows us to achieve high quality ‘conceptual compression’. |
Tasks | Image Generation, Latent Variable Models, Omniglot |
Published | 2016-04-29 |
URL | http://arxiv.org/abs/1604.08772v1 |
PDF | http://arxiv.org/pdf/1604.08772v1.pdf |
PWC | https://paperswithcode.com/paper/towards-conceptual-compression |
Repo | https://github.com/musyoku/convolutional-draw |
Framework | none |
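The "global information first" behaviour comes from a recurrent VAE that infers a sequence of latents, each refining a running canvas, so truncating to the first few latents yields a conceptual, lossy code. The sketch below is a very rough single-step caricature of that loop; the convolutional/LSTM machinery of the actual model is replaced by placeholder linear layers.

```python
import torch
import torch.nn as nn

class RecurrentVAEStep(nn.Module):
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # residual -> (mu, logvar)
        self.dec = nn.Linear(z_dim, x_dim)       # latent  -> canvas update

    def forward(self, x, canvas):
        # Infer a latent from what the canvas still gets wrong, then refine the canvas.
        mu, logvar = self.enc(x - canvas).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return canvas + self.dec(z), mu, logvar
```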
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Title | StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks |
Authors | Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas |
Abstract | Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-art methods on benchmark datasets demonstrate that the proposed method achieves significant improvements in generating photo-realistic images conditioned on text descriptions. |
Tasks | Image Generation, Text-to-Image Generation |
Published | 2016-12-10 |
URL | http://arxiv.org/abs/1612.03242v2 |
PDF | http://arxiv.org/pdf/1612.03242v2.pdf |
PWC | https://paperswithcode.com/paper/stackgan-text-to-photo-realistic-image |
Repo | https://github.com/hanzhanggit/StackGAN-Pytorch |
Framework | pytorch |
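Conditioning Augmentation is the one component that is easy to isolate: the text embedding is mapped to a Gaussian, the conditioning vector is resampled on every pass, and a KL term toward N(0, I) keeps the conditioning manifold smooth. A minimal sketch:

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    def __init__(self, embed_dim, cond_dim):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 2 * cond_dim)   # embedding -> (mu, logvar)

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=-1)
        c = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # resampled condition
        # KL toward N(0, I) keeps the conditioning manifold smooth.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl
```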
Binarized Neural Networks on the ImageNet Classification Task
Title | Binarized Neural Networks on the ImageNet Classification Task |
Authors | Xundong Wu, Yong Wu, Yong Zhao |
Abstract | We trained Binarized Neural Networks (BNNs) on the high-resolution ImageNet ILSVRC-2012 classification task and achieved good performance. With a moderate-size network of 13 layers, we obtained a top-5 classification accuracy of 84.1% on the validation set through network distillation, much better than the previously published results of 73.2% for the XNOR network and 69.1% for binarized GoogLeNet. We expect that networks with better performance can be obtained by following our current strategies. We provide a detailed discussion and preliminary analysis of the strategies used in network training. |
Tasks | |
Published | 2016-04-11 |
URL | http://arxiv.org/abs/1604.03058v5 |
PDF | http://arxiv.org/pdf/1604.03058v5.pdf |
PWC | https://paperswithcode.com/paper/binarized-neural-networks-on-the-imagenet |
Repo | https://github.com/chencongchong/BNN |
Framework | tf |
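Binarized training hinges on passing sign() forward while letting gradients flow through as if the function were a clipped identity, i.e. the straight-through estimator. A minimal sketch of that building block; in practice it would be applied as `BinarizeSTE.apply(weight)` inside a layer.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)              # forward pass: hard binarization

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient where |x| <= 1, block it elsewhere.
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)
```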
Supervised learning based on temporal coding in spiking neural networks
Title | Supervised learning based on temporal coding in spiking neural networks |
Authors | Hesham Mostafa |
Abstract | Gradient descent training techniques are remarkably successful in training analog-valued artificial neural networks (ANNs). Such training techniques, however, do not transfer easily to spiking networks due to the hard non-linearity of spike generation and the discrete nature of spike communication. We show that in a feedforward spiking network that uses a temporal coding scheme where information is encoded in spike times instead of spike rates, the network input-output relation is differentiable almost everywhere. Moreover, this relation is piecewise linear after a transformation of variables. Methods for training ANNs thus carry directly to the training of such spiking networks, as we show when training on the permutation-invariant MNIST task. In contrast to rate-based spiking networks that are often used to approximate the behavior of ANNs, the networks we present spike much more sparsely and their behavior cannot be directly approximated by conventional ANNs. Our results highlight a new approach for controlling the behavior of spiking networks with realistic temporal dynamics, opening up the potential for using these networks to process spike patterns with complex temporal information. |
Tasks | |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08165v2 |
PDF | http://arxiv.org/pdf/1606.08165v2.pdf |
PWC | https://paperswithcode.com/paper/supervised-learning-based-on-temporal-coding |
Repo | https://github.com/TianjianCai/SNN |
Framework | tf |
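With a non-leaky integrate-and-fire neuron and exponentially decaying synaptic kernels, the first output spike time has a closed form in z = exp(t) space, which is what makes the input-output relation piecewise linear and differentiable. The sketch below is a rough rendering of that forward pass under those assumptions; it is not the paper's reference implementation.

```python
import numpy as np

def first_spike_time(in_times, weights, threshold=1.0):
    order = np.argsort(in_times)
    z_in = np.exp(np.asarray(in_times, dtype=float)[order])
    w = np.asarray(weights, dtype=float)[order]
    for k in range(1, len(z_in) + 1):
        w_sum = w[:k].sum()
        if w_sum <= threshold:
            continue                       # the first k inputs cannot drive the membrane to threshold
        z_out = (w[:k] * z_in[:k]).sum() / (w_sum - threshold)
        # Accept only if the output spike lands before the next input spike arrives.
        if k == len(z_in) or z_out <= z_in[k]:
            return float(np.log(z_out))
    return float("inf")                    # the neuron never spikes
```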
Asynchronous Multi-Task Learning
Title | Asynchronous Multi-Task Learning |
Authors | Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou |
Abstract | Many real-world machine learning applications involve several learning tasks which are inter-related. For example, in the healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each hospital may be different because of the inherent differences in the distributions of the patient populations. However, the models are also closely related because of the nature of the learning tasks modeling the same disease. By simultaneously learning all the tasks, the multi-task learning (MTL) paradigm performs inductive knowledge transfer among tasks to improve the generalization performance. When datasets for the learning tasks are stored at different locations, it may not always be feasible to transfer the data to provide a data-centralized computing environment due to various practical issues such as high data volume and privacy. In this paper, we propose a principled MTL framework for distributed and asynchronous optimization to address the aforementioned challenges. In our framework, a gradient update does not wait for the gradient information from all the tasks to be collected. Therefore, the proposed method is very efficient when the communication delay is too high for some task nodes. We show that many regularized MTL formulations can benefit from this framework, including the low-rank MTL for shared subspace learning. Empirical studies on both synthetic and real-world datasets demonstrate the efficiency and effectiveness of the proposed framework. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2016-09-30 |
URL | http://arxiv.org/abs/1609.09563v1 |
PDF | http://arxiv.org/pdf/1609.09563v1.pdf |
PWC | https://paperswithcode.com/paper/asynchronous-multi-task-learning |
Repo | https://github.com/illidanlab/AMTL |
Framework | none |
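The asynchronous part is the serving pattern: the central node applies whichever task gradient arrives first and then takes a proximal step on the shared regularizer, with no barrier across task nodes. The sketch below illustrates that pattern only; the soft-thresholding stand-in is not the paper's actual low-rank proximal operator.

```python
import numpy as np

def async_mtl_server_update(W, task_id, task_grad, lr=0.1, reg=0.01):
    W = W.copy()
    W[:, task_id] -= lr * task_grad        # apply whichever task's gradient arrived first
    # Proximal step on the shared model (soft-thresholding as a simple surrogate
    # for the regularizer; illustrative only).
    return np.sign(W) * np.maximum(np.abs(W) - lr * reg, 0.0)
```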
Finding Tiny Faces
Title | Finding Tiny Faces |
Authors | Peiyun Hu, Deva Ramanan |
Abstract | Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally different from those for recognizing a 300px tall face. We take a different approach and train separate detectors for different scales. To maintain efficiency, detectors are trained in a multi-task fashion: they make use of features extracted from multiple layers of a single (deep) feature hierarchy. While training detectors for large objects is straightforward, the crucial challenge remains training detectors for small objects. We show that context is crucial, and define templates that make use of massively-large receptive fields (where 99% of the template extends beyond the object of interest). Finally, we explore the role of scale in pre-trained deep networks, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges. We demonstrate state-of-the-art results on massively-benchmarked face datasets (FDDB and WIDER FACE). In particular, when compared to prior art on WIDER FACE, our results reduce error by a factor of 2 (our models produce an AP of 82% while prior art ranges from 29-64%). |
Tasks | Face Detection, Object Recognition |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04402v2 |
PDF | http://arxiv.org/pdf/1612.04402v2.pdf |
PWC | https://paperswithcode.com/paper/finding-tiny-faces |
Repo | https://github.com/cydonia999/Tiny_Faces_in_Tensorflow |
Framework | tf |
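Scale-specific detection reduces, at inference time, to running per-scale detectors on a small image pyramid and mapping boxes back to the original resolution. A minimal sketch, with `detectors` and `resize_fn` as caller-supplied stand-ins:

```python
def detect_multiscale(detectors, image, resize_fn, scales=(0.5, 1.0, 2.0)):
    boxes = []
    for s, detector in zip(scales, detectors):
        # Each detector is tuned to one scale of the image pyramid.
        for (x0, y0, x1, y1, score) in detector(resize_fn(image, s)):
            boxes.append((x0 / s, y0 / s, x1 / s, y1 / s, score))  # undo the rescale
    return boxes  # typically followed by non-maximum suppression across scales
```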
Awesome Typography: Statistics-Based Text Effects Transfer
Title | Awesome Typography: Statistics-Based Text Effects Transfer |
Authors | Shuai Yang, Jiaying Liu, Zhouhui Lian, Zongming Guo |
Abstract | In this work, we explore the problem of generating fantastic special effects for typography. It is quite challenging due to the diversity of models needed to illustrate varied text effects for different characters. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the synthesis process. Specifically, we characterize the stylized patches by their normalized positions and the optimal scales to depict their style elements. Our method first estimates these two features and derives their correlation statistically. They are then converted into soft constraints for texture transfer to accomplish adaptive multi-scale texture synthesis and to make the style element distribution uniform. This allows our algorithm to produce artistic typography that fits both the local texture patterns and the global spatial distribution in the example. Experimental results demonstrate the superiority of our method for various text effects over conventional style transfer methods. In addition, we validate the effectiveness of our algorithm with extensive artistic typography library generation. |
Tasks | Style Transfer, Text Effects Transfer, Texture Synthesis |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09026v2 |
PDF | http://arxiv.org/pdf/1611.09026v2.pdf |
PWC | https://paperswithcode.com/paper/awesome-typography-statistics-based-text |
Repo | https://github.com/ycjing/Character-Stylization |
Framework | none |
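The statistics enter as a position prior on patch matching: a candidate style patch is penalized not only for appearance mismatch but also for sitting at an unlikely normalized position relative to the glyph. The cost below is an illustrative stand-in, not the paper's exact energy; `lam` and the squared-distance forms are assumed.

```python
import numpy as np

def patch_match_cost(style_patch, target_patch, style_pos, target_pos, lam=0.5):
    appearance = np.sum((style_patch - target_patch) ** 2)   # texture mismatch
    position = (style_pos - target_pos) ** 2                 # normalized-position mismatch
    return appearance + lam * position
```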
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Title | A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection |
Authors | Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos |
Abstract | A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects. |
Tasks | Face Detection, Object Detection, Pedestrian Detection, Real-Time Object Detection |
Published | 2016-07-25 |
URL | http://arxiv.org/abs/1607.07155v1 |
PDF | http://arxiv.org/pdf/1607.07155v1.pdf |
PWC | https://paperswithcode.com/paper/a-unified-multi-scale-deep-convolutional |
Repo | https://github.com/zhaoweicai/mscnn |
Framework | none |
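The proposal sub-network's trick is to attach small detection heads at several depths of one backbone, so each head's receptive field matches a different object scale. A minimal sketch of that layout; channel sizes and anchor counts are illustrative, not the paper's.

```python
import torch.nn as nn

class MultiScaleProposalHeads(nn.Module):
    def __init__(self, channels=(128, 256, 512), n_anchors=3):
        super().__init__()
        # One small head per backbone depth; each predicts (4 box coords + 1 score) per anchor.
        self.heads = nn.ModuleList(
            [nn.Conv2d(c, n_anchors * (4 + 1), kernel_size=3, padding=1) for c in channels]
        )

    def forward(self, feature_maps):
        # Shallow maps handle small objects, deeper maps handle large ones.
        return [head(f) for head, f in zip(self.heads, feature_maps)]
```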