May 7, 2019


Paper Group AWR 5

Early Visual Concept Learning with Unsupervised Deep Learning. Image-to-Image Translation with Conditional Adversarial Networks. Towards Evaluating the Robustness of Neural Networks. Learning to reinforcement learn. Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs. SoundNet: Learning Sound Representations from Unlabeled Video …

Early Visual Concept Learning with Unsupervised Deep Learning

Title Early Visual Concept Learning with Unsupervised Deep Learning
Authors Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner
Abstract Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”.
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05579v3
PDF http://arxiv.org/pdf/1606.05579v3.pdf
PWC https://paperswithcode.com/paper/early-visual-concept-learning-with
Repo https://github.com/takuseno/beta-vae
Framework tf
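
The disentangling pressure described in the abstract amounts to weighting the KL term of the VAE objective more heavily than in a plain VAE (the linked repo implements this as a β-VAE). Below is a minimal NumPy sketch of such an objective; the variable names and the value beta=4 are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative ELBO with a scaled KL term; beta > 1 strengthens the
    independence / redundancy-reduction pressure on the latents."""
    eps = 1e-8
    # Bernoulli reconstruction log-likelihood (pixel values assumed in [0, 1]).
    recon = -np.sum(x * np.log(x_recon + eps) + (1 - x) * np.log(1 - x_recon + eps))
    # KL divergence between q(z|x) = N(mu, exp(log_var)) and the unit Gaussian prior.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + beta * kl

# Toy usage with random arrays standing in for a batch of images and latents.
x = np.random.rand(64, 784)
x_recon = np.clip(np.random.rand(64, 784), 1e-3, 1 - 1e-3)
mu, log_var = np.random.randn(64, 10), np.random.randn(64, 10)
print(vae_loss(x, x_recon, mu, log_var))
```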

Image-to-Image Translation with Conditional Adversarial Networks

Title Image-to-Image Translation with Conditional Adversarial Networks
Authors Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
Abstract We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
Tasks Cross-View Image-to-Image Translation, Image-to-Image Translation, Nuclear Segmentation
Published 2016-11-21
URL http://arxiv.org/abs/1611.07004v3
PDF http://arxiv.org/pdf/1611.07004v3.pdf
PWC https://paperswithcode.com/paper/image-to-image-translation-with-conditional
Repo https://github.com/leemathew1998/GradientWeight
Framework pytorch
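
The "learned loss" in the abstract is the conditional-GAN discriminator; in the pix2pix setup it is typically combined with an L1 term that ties the output to the ground-truth image. A rough NumPy sketch of the generator-side objective follows; treat the weighting lam=100 and the non-saturating formulation as assumptions rather than the paper's precise recipe.

```python
import numpy as np

def pix2pix_generator_loss(d_fake_logits, fake_img, target_img, lam=100.0):
    """Conditional-GAN term (written as -log sigmoid(D(x, G(x))) via softplus)
    plus a weighted L1 term pulling the output toward the ground truth."""
    gan = np.mean(np.log1p(np.exp(-d_fake_logits)))  # softplus(-logits)
    l1 = np.mean(np.abs(target_img - fake_img))
    return gan + lam * l1
```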

Towards Evaluating the Robustness of Neural Networks

Title Towards Evaluating the Robustness of Neural Networks
Authors Nicholas Carlini, David Wagner
Abstract Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network and increase its robustness, reducing the success rate of current attacks at finding adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test that we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
Tasks Adversarial Attack
Published 2016-08-16
URL http://arxiv.org/abs/1608.04644v2
PDF http://arxiv.org/pdf/1608.04644v2.pdf
PWC https://paperswithcode.com/paper/towards-evaluating-the-robustness-of-neural
Repo https://github.com/MadryLab/cifar10_challenge
Framework tf
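
The attacks in this paper minimize a distance term plus a scaled margin-style function of the logits. A small sketch of the L2 variant's objective is below; the constant c and confidence kappa are normally chosen by search, and the change-of-variables trick used for the box constraint is omitted.

```python
import numpy as np

def cw_l2_objective(x, x_adv, logits, target, c=1.0, kappa=0.0):
    """||x_adv - x||_2^2 + c * f(x_adv), where f is the margin between the
    largest non-target logit and the target logit, clipped at -kappa."""
    other = np.max(np.delete(logits, target))
    f = max(other - logits[target], -kappa)
    return np.sum((x_adv - x) ** 2) + c * f
```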

Learning to reinforcement learn

Title Learning to reinforcement learn
Authors Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick
Abstract In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
Tasks Meta-Learning
Published 2016-11-17
URL http://arxiv.org/abs/1611.05763v3
PDF http://arxiv.org/pdf/1611.05763v3.pdf
PWC https://paperswithcode.com/paper/learning-to-reinforcement-learn
Repo https://github.com/mtrazzi/two-step-task
Framework tf
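
The key architectural point is that the recurrent policy receives the previous action and reward along with the current observation, so its hidden state can carry the statistics a learned RL procedure needs. A minimal sketch of that input construction (names are illustrative):

```python
import numpy as np

def meta_rl_step_input(obs, prev_action, prev_reward, n_actions):
    """Concatenate the current observation, a one-hot encoding of the previous
    action, and the previous reward; the recurrent policy consumes this at
    every step, letting its hidden state implement a learned RL procedure."""
    one_hot = np.zeros(n_actions)
    one_hot[prev_action] = 1.0
    return np.concatenate([obs, one_hot, [prev_reward]])
```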

Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs

Title Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs
Authors Benjamin Bloem-Reddy, Peter Orbanz
Abstract We introduce a class of generative network models that insert edges by connecting the starting and terminal vertices of a random walk on the network graph. Within the taxonomy of statistical network models, this class is distinguished by permitting the location of a new edge to explicitly depend on the structure of the graph, but being nonetheless statistically and computationally tractable. In the limit of infinite walk length, the model converges to an extension of the preferential attachment model—in this sense, it can be motivated alternatively by asking what preferential attachment is an approximation to. Theoretical properties, including the limiting degree sequence, are studied analytically. If the entire history of the graph is observed, parameters can be estimated by maximum likelihood. If only the final graph is available, its history can be imputed using MCMC. We develop a class of sequential Monte Carlo algorithms that are more generally applicable to sequential network models, and may be of interest in their own right. The model parameters can be recovered from a single graph generated by the model. Applications to data clarify the role of the random walk length as a length scale of interactions within the graph.
Tasks
Published 2016-12-19
URL http://arxiv.org/abs/1612.06404v2
PDF http://arxiv.org/pdf/1612.06404v2.pdf
PWC https://paperswithcode.com/paper/random-walk-models-of-network-formation-and
Repo https://github.com/ben-br/random_walk_smc
Framework none
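
A rough sketch of the edge-insertion mechanism described above, using networkx: each new edge connects a start vertex to the terminus of a short random walk. Vertex-arrival and parameter details from the paper are omitted, so this only illustrates the idea.

```python
import random
import networkx as nx

def add_edge_by_random_walk(G, walk_length=3):
    """Connect a uniformly chosen start vertex to the vertex reached by a
    random walk of length `walk_length` starting from it."""
    u = random.choice(list(G.nodes))
    v = u
    for _ in range(walk_length):
        neighbors = list(G.neighbors(v))
        if not neighbors:          # isolated vertex: the walk cannot proceed
            break
        v = random.choice(neighbors)
    G.add_edge(u, v)

G = nx.path_graph(10)              # toy seed graph
for _ in range(20):
    add_edge_by_random_walk(G)
```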

SoundNet: Learning Sound Representations from Unlabeled Video

Title SoundNet: Learning Sound Representations from Unlabeled Video
Authors Yusuf Aytar, Carl Vondrick, Antonio Torralba
Abstract We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two-million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet contains useful signals about natural sound. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well established visual recognition models into the sound modality using unlabeled video as a bridge. Our sound representation yields significant performance improvements over the state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest some high-level semantics automatically emerge in the sound network, even though it is trained without ground truth labels.
Tasks Object Classification
Published 2016-10-27
URL http://arxiv.org/abs/1610.09001v1
PDF http://arxiv.org/pdf/1610.09001v1.pdf
PWC https://paperswithcode.com/paper/soundnet-learning-sound-representations-from
Repo https://github.com/eborboihuc/SoundNet-tensorflow
Framework tf
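
The student-teacher transfer can be pictured as minimizing the KL divergence between the vision network's class posterior on video frames (the teacher) and the sound network's output on the paired audio (the student). A hedged NumPy sketch:

```python
import numpy as np

def soundnet_distillation_loss(sound_logits, vision_probs):
    """Mean KL(teacher || student) between the vision network's class
    distribution and the sound network's softmax output for paired clips."""
    z = sound_logits - sound_logits.max(axis=-1, keepdims=True)
    q = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    p = vision_probs
    return np.mean(np.sum(p * (np.log(p + 1e-8) - np.log(q + 1e-8)), axis=-1))
```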

emoji2vec: Learning Emoji Representations from their Description

Title emoji2vec: Learning Emoji Representations from their Description
Authors Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel
Abstract Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly available, pre-trained sets of word embeddings, but they contain few or no emoji representations, even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all Unicode emoji which are learned from their description in the Unicode emoji standard. The resulting emoji embeddings can be readily used in downstream social natural language processing applications alongside word2vec. We demonstrate, for the downstream task of sentiment analysis, that emoji embeddings learned from short descriptions outperform a skip-gram model trained on a large collection of tweets, while avoiding the need for contexts in which emoji need to appear frequently in order to estimate a representation.
Tasks Representation Learning, Sentiment Analysis, Word Embeddings
Published 2016-09-27
URL http://arxiv.org/abs/1609.08359v2
PDF http://arxiv.org/pdf/1609.08359v2.pdf
PWC https://paperswithcode.com/paper/emoji2vec-learning-emoji-representations-from
Repo https://github.com/hougrammer/emoji_project
Framework tf
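
The model scores an emoji against a description by summing the word2vec vectors of the description's words and taking a sigmoid of the dot product with the emoji embedding; training then uses a logistic loss over positive and sampled negative pairs. A small sketch of the scoring step (training loop omitted):

```python
import numpy as np

def emoji_description_score(emoji_vec, description_word_vecs):
    """Sigmoid of the dot product between the emoji embedding and the sum of
    the description's word2vec vectors: the probability the pair matches."""
    desc_vec = np.sum(description_word_vecs, axis=0)
    return 1.0 / (1.0 + np.exp(-np.dot(emoji_vec, desc_vec)))
```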

Towards Conceptual Compression

Title Towards Conceptual Compression
Authors Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, Daan Wierstra
Abstract We introduce a simple recurrent variational auto-encoder architecture that significantly improves image modeling. The system represents the state-of-the-art in latent variable models for both the ImageNet and Omniglot datasets. We show that it naturally separates global conceptual information from lower level details, thus addressing one of the fundamentally desired properties of unsupervised learning. Furthermore, the possibility of restricting ourselves to storing only global information about an image allows us to achieve high quality ‘conceptual compression’.
Tasks Image Generation, Latent Variable Models, Omniglot
Published 2016-04-29
URL http://arxiv.org/abs/1604.08772v1
PDF http://arxiv.org/pdf/1604.08772v1.pdf
PWC https://paperswithcode.com/paper/towards-conceptual-compression
Repo https://github.com/musyoku/convolutional-draw
Framework none
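
The "conceptual compression" idea is that early latent groups of the recurrent VAE capture global structure, so a lossy code can keep only the first few groups and resample the rest from the prior at decode time. The sketch below illustrates that storage choice only; it is not the paper's architecture.

```python
def conceptual_compress(latent_groups, k, prior_sampler):
    """Keep only the first k (most global) latent groups; the remaining
    groups are replaced by samples from the prior at decode time."""
    kept = list(latent_groups[:k])
    resampled = [prior_sampler() for _ in range(len(latent_groups) - k)]
    return kept + resampled
```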

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Title StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Authors Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
Abstract Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.
Tasks Image Generation, Text-to-Image Generation
Published 2016-12-10
URL http://arxiv.org/abs/1612.03242v2
PDF http://arxiv.org/pdf/1612.03242v2.pdf
PWC https://paperswithcode.com/paper/stackgan-text-to-photo-realistic-image
Repo https://github.com/hanzhanggit/StackGAN-Pytorch
Framework pytorch
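
The Conditioning Augmentation technique samples the latent conditioning vector from a Gaussian whose mean and variance are functions of the text embedding, with a KL penalty toward the unit Gaussian. A NumPy sketch, with W_mu and W_logvar standing in for whatever maps the model actually learns:

```python
import numpy as np

def conditioning_augmentation(text_embedding, W_mu, W_logvar):
    """Sample c ~ N(mu(e), sigma(e)^2) from the text embedding e and return
    the KL penalty toward N(0, I) that keeps the conditioning manifold smooth."""
    mu = W_mu @ text_embedding
    log_var = W_logvar @ text_embedding
    c = mu + np.exp(0.5 * log_var) * np.random.randn(*mu.shape)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return c, kl
```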

Binarized Neural Networks on the ImageNet Classification Task

Title Binarized Neural Networks on the ImageNet Classification Task
Authors Xundong Wu, Yong Wu, Yong Zhao
Abstract We trained Binarized Neural Networks (BNNs) on the high-resolution ImageNet ILSVRC-2012 classification task and achieved good performance. With a moderately sized network of 13 layers, we obtained a top-5 classification accuracy of 84.1% on the validation set through network distillation, much better than the previously published results of 73.2% for the XNOR network and 69.1% for binarized GoogLeNet. We expect that networks with better performance can be obtained by following our current strategies. We provide a detailed discussion and preliminary analysis of the strategies used in network training.
Tasks
Published 2016-04-11
URL http://arxiv.org/abs/1604.03058v5
PDF http://arxiv.org/pdf/1604.03058v5.pdf
PWC https://paperswithcode.com/paper/binarized-neural-networks-on-the-imagenet
Repo https://github.com/chencongchong/BNN
Framework tf
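
BNN training generally binarizes weights with a sign function in the forward pass and propagates gradients with a straight-through estimator. The sketch below shows that standard mechanism; the paper's specific distillation strategy is not reproduced here.

```python
import numpy as np

def binarize(w):
    """Forward pass: deterministic sign binarization of real-valued weights."""
    return np.where(w >= 0, 1.0, -1.0)

def straight_through_grad(w, grad_wrt_binary):
    """Backward pass: pass the gradient straight through the sign function,
    zeroing it where the real-valued weight has saturated (|w| > 1)."""
    return grad_wrt_binary * (np.abs(w) <= 1.0)
```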

Supervised learning based on temporal coding in spiking neural networks

Title Supervised learning based on temporal coding in spiking neural networks
Authors Hesham Mostafa
Abstract Gradient descent training techniques are remarkably successful in training analog-valued artificial neural networks (ANNs). Such training techniques, however, do not transfer easily to spiking networks due to the hard non-linearity of spike generation and the discrete nature of spike communication. We show that in a feedforward spiking network that uses a temporal coding scheme, where information is encoded in spike times instead of spike rates, the network input-output relation is differentiable almost everywhere. Moreover, this relation is piecewise linear after a transformation of variables. Methods for training ANNs thus carry over directly to the training of such spiking networks, as we show when training on the permutation-invariant MNIST task. In contrast to rate-based spiking networks, which are often used to approximate the behavior of ANNs, the networks we present spike much more sparsely and their behavior cannot be directly approximated by conventional ANNs. Our results highlight a new approach for controlling the behavior of spiking networks with realistic temporal dynamics, opening up the potential for using these networks to process spike patterns with complex temporal information.
Tasks
Published 2016-06-27
URL http://arxiv.org/abs/1606.08165v2
PDF http://arxiv.org/pdf/1606.08165v2.pdf
PWC https://paperswithcode.com/paper/supervised-learning-based-on-temporal-coding
Repo https://github.com/TianjianCai/SNN
Framework tf
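
With exponentially decaying synaptic kernels and the transformation z = exp(t), the paper's input-output relation for a spike time becomes a ratio that is piecewise linear in the transformed inputs. The sketch below states that relation for a fixed causal set of inputs; selecting which inputs actually precede the output spike is omitted.

```python
import numpy as np

def output_spike_z(input_spike_times, weights):
    """z-domain relation with z = exp(t): the output spike time, in
    transformed coordinates, is sum(w_i * z_i) / (sum(w_i) - 1) over the
    causal input set, which is piecewise linear in the z_i."""
    weights = np.asarray(weights)
    z_in = np.exp(np.asarray(input_spike_times))
    return np.sum(weights * z_in) / (np.sum(weights) - 1.0)
```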

Asynchronous Multi-Task Learning

Title Asynchronous Multi-Task Learning
Authors Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou
Abstract Many real-world machine learning applications involve several learning tasks that are inter-related. For example, in the healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each hospital may be different because of the inherent differences in the distributions of the patient populations. However, the models are also closely related because the learning tasks model the same disease. By learning all the tasks simultaneously, the multi-task learning (MTL) paradigm performs inductive knowledge transfer among tasks to improve generalization performance. When the datasets for the learning tasks are stored at different locations, it may not always be feasible to transfer the data into a centralized computing environment, due to practical issues such as high data volume and privacy. In this paper, we propose a principled MTL framework for distributed and asynchronous optimization that addresses these challenges. In our framework, a gradient update does not wait for the gradient information from all the tasks to be collected. Therefore, the proposed method is very efficient when the communication delay is too high for some task nodes. We show that many regularized MTL formulations can benefit from this framework, including low-rank MTL for shared subspace learning. Empirical studies on both synthetic and real-world datasets demonstrate the efficiency and effectiveness of the proposed framework.
Tasks Multi-Task Learning, Transfer Learning
Published 2016-09-30
URL http://arxiv.org/abs/1609.09563v1
PDF http://arxiv.org/pdf/1609.09563v1.pdf
PWC https://paperswithcode.com/paper/asynchronous-multi-task-learning
Repo https://github.com/illidanlab/AMTL
Framework none
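
One of the regularized MTL formulations mentioned, low-rank MTL for shared subspace learning, relies on a nuclear-norm proximal step applied to the stacked task weight matrix. A sketch of that step is below; how the asynchronous server schedules it around per-task gradients is an assumption here, not the paper's exact protocol.

```python
import numpy as np

def nuclear_norm_prox(W, tau):
    """Proximal operator of tau * ||W||_* via singular value soft-thresholding,
    the shared-subspace update used in low-rank regularized MTL."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```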

Finding Tiny Faces

Title Finding Tiny Faces
Authors Peiyun Hu, Deva Ramanan
Abstract Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally different from those for recognizing a 300px tall face. We take a different approach and train separate detectors for different scales. To maintain efficiency, detectors are trained in a multi-task fashion: they make use of features extracted from multiple layers of a single (deep) feature hierarchy. While training detectors for large objects is straightforward, the crucial challenge remains training detectors for small objects. We show that context is crucial, and define templates that make use of massively large receptive fields (where 99% of the template extends beyond the object of interest). Finally, we explore the role of scale in pre-trained deep networks, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges. We demonstrate state-of-the-art results on massively benchmarked face datasets (FDDB and WIDER FACE). In particular, when compared to prior art on WIDER FACE, our results reduce error by a factor of 2 (our models produce an AP of 82%, while prior art ranges from 29% to 64%).
Tasks Face Detection, Object Recognition
Published 2016-12-13
URL http://arxiv.org/abs/1612.04402v2
PDF http://arxiv.org/pdf/1612.04402v2.pdf
PWC https://paperswithcode.com/paper/finding-tiny-faces
Repo https://github.com/cydonia999/Tiny_Faces_in_Tensorflow
Framework tf
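
Training separate detectors per scale implies, at test time, running each detector on an appropriately resized image and mapping detections back to the original resolution. The sketch below only illustrates that per-scale dispatch; detectors_by_scale and resize are assumed callables, not the paper's code.

```python
import numpy as np

def multiscale_detect(image, detectors_by_scale, resize):
    """Run each scale-specific detector on a correspondingly resized copy of
    the image and map the returned boxes back to the original resolution."""
    all_boxes, all_scores = [], []
    for scale, detect in detectors_by_scale.items():
        boxes, scores = detect(resize(image, scale))
        all_boxes.append(np.asarray(boxes) / scale)
        all_scores.append(np.asarray(scores))
    return np.concatenate(all_boxes), np.concatenate(all_scores)
```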

Awesome Typography: Statistics-Based Text Effects Transfer

Title Awesome Typography: Statistics-Based Text Effects Transfer
Authors Shuai Yang, Jiaying Liu, Zhouhui Lian, Zongming Guo
Abstract In this work, we explore the problem of generating special effects for typography. It is quite challenging because of the diversity of models needed to illustrate varied text effects for different characters. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the synthesis process. Specifically, we characterize the stylized patches by their normalized positions and the optimal scales that depict their style elements. Our method first estimates these two features and derives their correlation statistically. They are then converted into soft constraints for texture transfer to accomplish adaptive multi-scale texture synthesis and to make the style element distribution uniform. This allows our algorithm to produce artistic typography that fits both the local texture patterns and the global spatial distribution of the example. Experimental results demonstrate the superiority of our method over conventional style transfer methods for various text effects. In addition, we validate the effectiveness of our algorithm by generating an extensive artistic typography library.
Tasks Style Transfer, Text Effects Transfer, Texture Synthesis
Published 2016-11-28
URL http://arxiv.org/abs/1611.09026v2
PDF http://arxiv.org/pdf/1611.09026v2.pdf
PWC https://paperswithcode.com/paper/awesome-typography-statistics-based-text
Repo https://github.com/ycjing/Character-Stylization
Framework none
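
The normalized-position feature can be thought of as how far a patch sits from the glyph contour, normalized per image. The sketch below is a rough stand-in for that feature using a distance transform; the paper's actual definition (and the optimal-scale feature) may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def normalized_positions(text_mask):
    """Distance of every pixel to the glyph contour (measured from both the
    inside and the outside), normalized to [0, 1] per image."""
    mask = text_mask.astype(bool)
    dist = distance_transform_edt(mask) + distance_transform_edt(~mask)
    return dist / (dist.max() + 1e-8)
```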

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

Title A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
Authors Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos
Abstract A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
Tasks Face Detection, Object Detection, Pedestrian Detection, Real-Time Object Detection
Published 2016-07-25
URL http://arxiv.org/abs/1607.07155v1
PDF http://arxiv.org/pdf/1607.07155v1.pdf
PWC https://paperswithcode.com/paper/a-unified-multi-scale-deep-convolutional
Repo https://github.com/zhaoweicai/mscnn
Framework none
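
The end-to-end training mentioned above optimizes a multi-task loss that sums the detection loss of every scale-specific output layer. A deliberately minimal sketch of that combination (the weighting scheme is an assumption, not the paper's exact choice):

```python
def ms_cnn_objective(per_layer_losses, layer_weights):
    """Weighted sum of each output layer's detection loss (classification plus
    bounding-box regression inside each term), trained end-to-end."""
    return sum(w * loss for w, loss in zip(layer_weights, per_layer_losses))
```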