Paper Group AWR 194
On GANs and GMMs. BourGAN: Generative Networks with Metric Embeddings. Semantic Parsing for Task Oriented Dialog using Hierarchical Representations. A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System. Connecting Image Denoising and High-Level Vision Tasks via Deep Learning. Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks. Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation. Large-Scale Study of Curiosity-Driven Learning. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents. Measuring abstract reasoning in neural networks. Data-driven Design: A Case for Maximalist Game Design. Inference for $L_2$-Boosting. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings. Fast Matrix Factorization with Non-Uniform Weights on Missing Data. Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs.
On GANs and GMMs
Title | On GANs and GMMs |
Authors | Eitan Richardson, Yair Weiss |
Abstract | A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high-dimensional signals. In recent years, GANs have gained much attention as a possible solution to the problem, and in particular have shown the ability to generate remarkably realistic high-resolution sampled images. At the same time, many authors have pointed out that GANs may fail to model the full distribution (“mode collapse”) and that using the learned models for anything other than generating samples may be very difficult. In this paper, we examine the utility of GANs in learning statistical models of images by comparing them to perhaps the simplest statistical model, the Gaussian Mixture Model. First, we present a simple method to evaluate generative models based on relative proportions of samples that fall into predetermined bins. Unlike previous automatic methods for evaluating models, our method does not rely on an additional neural network nor does it require approximating intractable computations. Second, we compare the performance of GANs to GMMs trained on the same datasets. While GMMs have previously been shown to be successful in modeling small patches of images, we show how to train them on full-sized images despite the high dimensionality. Our results show that GMMs can generate realistic samples (although less sharp than those of GANs) but also capture the full distribution, which GANs fail to do. Furthermore, GMMs allow efficient inference and explicit representation of the underlying statistical structure. Finally, we discuss how GMMs can be used to generate sharp images. |
Tasks | Image Generation |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12462v2 |
http://arxiv.org/pdf/1805.12462v2.pdf | |
PWC | https://paperswithcode.com/paper/on-gans-and-gmms |
Repo | https://github.com/eitanrich/torch-mfa |
Framework | pytorch |
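A minimal sketch of the bin-proportion evaluation idea from the abstract above, assuming bins defined as Voronoi cells of k-means centroids fitted on training samples (the paper's NDB test additionally applies a per-bin two-sample proportion test); the function name and toy data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def bin_proportion_gap(train_x, model_x, n_bins=100, seed=0):
    """Compare a generative model to the data by relative bin proportions.

    Bins are Voronoi cells of k-means centroids fitted on training samples;
    a well-fitted model should populate each bin in roughly the same
    proportion as the training data. Here we simply report absolute gaps.
    """
    km = KMeans(n_clusters=n_bins, random_state=seed, n_init=10).fit(train_x)
    p_train = np.bincount(km.predict(train_x), minlength=n_bins) / len(train_x)
    p_model = np.bincount(km.predict(model_x), minlength=n_bins) / len(model_x)
    return np.abs(p_train - p_model)  # large gaps flag missed/over-used modes

# toy usage: a "model" that ignores the mixture structure shows large gaps
rng = np.random.default_rng(0)
train = rng.normal(size=(5000, 16)) + rng.integers(0, 5, size=(5000, 1))
samples = rng.normal(size=(5000, 16))
print(bin_proportion_gap(train, samples, n_bins=20).max())
```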
BourGAN: Generative Networks with Metric Embeddings
Title | BourGAN: Generative Networks with Metric Embeddings |
Authors | Chang Xiao, Peilin Zhong, Changxi Zheng |
Abstract | This paper addresses mode collapse in generative adversarial networks (GANs). We view modes as a geometric structure of the data distribution in a metric space. Under this geometric lens, we embed subsamples of the dataset from an arbitrary metric space into the l2 space, while preserving their pairwise distance distribution. Not only does this metric embedding determine the dimensionality of the latent space automatically, it also enables us to construct a mixture of Gaussians to draw latent space random vectors. We use the Gaussian mixture model in tandem with a simple augmentation of the objective function to train GANs. Every major step of our method is supported by theoretical analysis, and our experiments on real and synthetic data confirm that the generator is able to produce samples spreading over most of the modes while avoiding unwanted samples, outperforming several recent GAN variants on a number of metrics and offering new features. |
Tasks | |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07674v3 |
http://arxiv.org/pdf/1805.07674v3.pdf | |
PWC | https://paperswithcode.com/paper/bourgan-generative-networks-with-metric |
Repo | https://github.com/a554b554/BourGAN |
Framework | pytorch |
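A rough sketch of the latent-space construction described above, using scikit-learn's metric MDS as a stand-in for the paper's distance-preserving (Bourgain) embedding; all dimensions, component counts, and the toy data are illustrative:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

# Embed a subsample into l2 space, roughly preserving pairwise distances
# (the paper uses a Bourgain embedding; metric MDS is a stand-in here),
# then fit a Gaussian mixture to the embedded points. GAN latent vectors
# are drawn from this mixture instead of a single isotropic Gaussian.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(m, 0.1, size=(200, 8)) for m in (0.0, 1.0, 2.0)])

embedded = MDS(n_components=4, random_state=0).fit_transform(data)
gmm = GaussianMixture(n_components=3, random_state=0).fit(embedded)

z, _ = gmm.sample(64)  # a latent batch for the generator
print(z.shape)         # (64, 4)
```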
Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
Title | Semantic Parsing for Task Oriented Dialog using Hierarchical Representations |
Authors | Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, Mike Lewis |
Abstract | Task-oriented dialog systems typically first parse user utterances to semantic frames comprised of intents and slots. Previous work on task-oriented intent detection and slot filling has been restricted to one intent per query and one slot label per token, and thus cannot model complex compositional requests. Alternative semantic parsing systems have represented queries as logical forms, but these are challenging to annotate and parse. We propose a hierarchical annotation scheme for semantic parsing that allows the representation of compositional queries, and can be efficiently and accurately parsed by standard constituency parsing models. We release a dataset of 44k annotated queries (fb.me/semanticparsingdialog), and show that parsing models outperform sequence-to-sequence approaches on this dataset. |
Tasks | Constituency Parsing, Semantic Parsing, Slot Filling |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.07942v1 |
http://arxiv.org/pdf/1810.07942v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-parsing-for-task-oriented-dialog |
Repo | https://github.com/sz128/NLU_datasets_for_task_oriented_dialogue |
Framework | pytorch |
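An illustrative example of the kind of hierarchical (bracketed-tree) representation the abstract describes, where a slot may contain a nested intent; the label names here are illustrative rather than quoted from the released dataset:

```python
# A compositional query in a bracketed-tree annotation style: the
# destination slot contains a nested GET_EVENT intent, something a
# flat one-intent/one-slot-per-token scheme cannot express.
query = ("[IN:GET_DIRECTIONS Driving directions to "
         "[SL:DESTINATION [IN:GET_EVENT the [SL:NAME_EVENT Eagles] "
         "[SL:CAT_EVENT game]]]]")

def max_depth(tree: str) -> int:
    """Nesting depth of a bracketed parse; depth > 1 means compositional."""
    depth = best = 0
    for ch in tree:
        if ch == "[":
            depth += 1
            best = max(best, depth)
        elif ch == "]":
            depth -= 1
    return best

print(max_depth(query))  # 4: an intent nested inside a slot inside an intent
```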
A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System
Title | A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System |
Authors | Mohammad Hasanzadeh Mofrad, S. K. Chang |
Abstract | Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. The Slow Intelligence System (SIS) is a learning framework that slowly learns the solution to a problem by performing a series of operations. Learning Automata (LA) are simple but effective decision-making entities that are well suited to act as controller components. In this paper, we combine two isolated PSO populations to forge the Adaptive Intelligence Optimizer (AIO), which harnesses the advantages of a bi-population PSO to escape local minima and avoid premature convergence. Furthermore, drawing on the rich framework of SIS and the control-theoretic roots of LA, we exploit a natural match between the two: acting slowly is central to both. Both SIS and LA need time to converge to the optimal decision, and this enables AIO to outperform standard PSO on evolutionary optimization benchmark functions. |
Tasks | Decision Making |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00768v1 |
http://arxiv.org/pdf/1804.00768v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bi-population-particle-swarm-optimizer-for |
Repo | https://github.com/hmofrad/pso |
Framework | none |
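A minimal sketch of the bi-population idea: two standard PSO swarms evolve in isolation, which preserves diversity, and their best solutions are compared at the end (the paper's AIO additionally couples the populations through an LA-based controller, omitted here); hyperparameters and the benchmark are illustrative:

```python
import numpy as np

def bipop_pso(f, dim=10, swarm=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Two isolated PSO populations; return the better of the two results."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(2):  # two isolated populations
        x = rng.uniform(-5, 5, (swarm, dim))
        v = np.zeros_like(x)
        pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
        g = pbest[pval.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, swarm, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = x + v
            fx = np.apply_along_axis(f, 1, x)
            improved = fx < pval
            pbest[improved], pval[improved] = x[improved], fx[improved]
            g = pbest[pval.argmin()].copy()
        results.append((f(g), g))
    return min(results, key=lambda t: t[0])

best_val, best_x = bipop_pso(lambda x: float(np.sum(x**2)))
print(best_val)  # near 0 on the sphere benchmark
```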
Connecting Image Denoising and High-Level Vision Tasks via Deep Learning
Title | Connecting Image Denoising and High-Level Vision Tasks via Deep Learning |
Authors | Ding Liu, Bihan Wen, Jianbo Jiao, Xianming Liu, Zhangyang Wang, Thomas S. Huang |
Abstract | Image denoising and high-level vision tasks are usually handled independently in the conventional practice of computer vision, and their connection is fragile. In this paper, we cope with the two jointly and explore the mutual influence between them, focusing on two questions: (1) how image denoising can help improve high-level vision tasks, and (2) how the semantic information from high-level vision tasks can be used to guide image denoising. First, for image denoising, we propose a convolutional neural network in which convolutions are conducted at various spatial resolutions via downsampling and upsampling operations, in order to fuse and exploit contextual information on different scales. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses the joint loss for updating only the denoising network via back-propagation. We experimentally show that, on the one hand, the proposed denoiser is general enough to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising |
Tasks | Denoising, Image Denoising |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01826v1 |
http://arxiv.org/pdf/1809.01826v1.pdf | |
PWC | https://paperswithcode.com/paper/connecting-image-denoising-and-high-level |
Repo | https://github.com/Ding-Liu/DeepDenoising |
Framework | none |
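A PyTorch sketch of the cascaded training scheme described above, with placeholder networks (not the paper's architectures): the joint reconstruction-plus-semantic loss back-propagates through a frozen high-level module and updates only the denoiser:

```python
import torch
import torch.nn as nn

# Placeholder denoiser and high-level (classification) module.
denoiser = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))
classifier = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
for p in classifier.parameters():   # the high-level module stays fixed
    p.requires_grad_(False)

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
noisy, clean = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

denoised = denoiser(noisy)
loss = nn.functional.mse_loss(denoised, clean) \
     + 0.1 * nn.functional.cross_entropy(classifier(denoised), labels)
opt.zero_grad(); loss.backward(); opt.step()  # gradients reach only the denoiser
```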
Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks
Title | Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks |
Authors | Cheng-Wei Wu, Jen-Yu Liu, Yi-Hsuan Yang, Jyh-Shing R. Jang |
Abstract | Can we make a famous rap singer like Eminem sing any song we like? Singing style transfer attempts to make this possible, by replacing the vocals of the source singer in a song with those of the target singer. This paper presents a method that learns from unpaired data for singing style transfer using generative adversarial networks. |
Tasks | Style Transfer |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02254v1 |
http://arxiv.org/pdf/1807.02254v1.pdf | |
PWC | https://paperswithcode.com/paper/singing-style-transfer-using-cycle-consistent |
Repo | https://github.com/eliceio/vocal-style-transfer |
Framework | tf |
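A minimal sketch of the cycle-consistency term that makes learning from unpaired data possible, with placeholder encoders and feature sizes; the paper's BEGAN-style adversarial losses are omitted:

```python
import torch
import torch.nn as nn

# G maps source-singer features to target style, F maps back. Reconstructing
# the input after a round trip is what removes the need for paired data.
feat = 64
G = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(), nn.Linear(128, feat))
F = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(), nn.Linear(128, feat))

a = torch.randn(8, feat)  # stand-in for source-singer spectral frames
b = torch.randn(8, feat)  # stand-in for target-singer spectral frames
cycle_loss = nn.functional.l1_loss(F(G(a)), a) + nn.functional.l1_loss(G(F(b)), b)
cycle_loss.backward()
```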
Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation
Title | Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation |
Authors | Hyeoncheol Cho, Insung S. Choi |
Abstract | We present a three-dimensional graph convolutional network (3DGCN), which predicts molecular properties and biochemical activities based on a 3D molecular graph. In the 3DGCN, graph convolution is unified with learning operations on vectors to handle spatial information from molecular topology. The 3DGCN model exhibits significantly higher performance on various tasks than other deep-learning models, and can generalize from a given conformer to targeted features regardless of its rotations in 3D space. More significantly, when trained with orientation-dependent datasets, our model can also distinguish the 3D rotations of a molecule and predict the target value depending on the rotation degree, as in the protein-ligand docking problem. The rotation distinguishability of the 3DGCN, along with its rotation equivariance, is a key milestone in bringing three-dimensionality to deep-learning chemistry for solving challenging biochemical problems. |
Tasks | Molecule Interpretation |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09794v4 |
http://arxiv.org/pdf/1811.09794v4.pdf | |
PWC | https://paperswithcode.com/paper/three-dimensionally-embedded-graph |
Repo | https://github.com/blackmints/3DGCN |
Framework | tf |
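A generic sketch of a graph convolution that consumes 3D structure by mixing neighbor features with relative-position vectors; this is a simplified illustration of the idea, not the paper's 3DGCN operator, and all module names and sizes are placeholders:

```python
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """Update atom features from neighbors plus relative xyz offsets."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim + 3, out_dim)  # feature + relative xyz

    def forward(self, h, pos, adj):
        # h: (N, F) atom features; pos: (N, 3) coordinates; adj: (N, N) 0/1
        n = h.size(0)
        rel = pos.unsqueeze(0) - pos.unsqueeze(1)          # (N, N, 3) offsets
        pair = torch.cat([h.unsqueeze(0).expand(n, n, -1), rel], dim=-1)
        msgs = torch.relu(self.msg(pair)) * adj.unsqueeze(-1)
        return msgs.sum(dim=1)                             # aggregate neighbors

conv = SpatialGraphConv(8, 16)
h, pos = torch.randn(5, 8), torch.randn(5, 3)
adj = (torch.rand(5, 5) > 0.5).float()
print(conv(h, pos, adj).shape)  # torch.Size([5, 16])
```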
Large-Scale Study of Curiosity-Driven Learning
Title | Large-Scale Study of Curiosity-Driven Learning |
Authors | Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros |
Abstract | Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as the reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/ |
Tasks | Atari Games, SNES Games |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04355v1 |
http://arxiv.org/pdf/1808.04355v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-study-of-curiosity-driven |
Repo | https://github.com/SPark9625/Large-Scale-Study-of-Curiosity-Driven-Learning |
Framework | pytorch |
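A sketch of the curiosity signal described in the abstract, assuming a fixed random feature network (finding (b) above) and a forward model whose prediction error serves as the intrinsic reward; all architectures and shapes are placeholders:

```python
import torch
import torch.nn as nn

n_actions, fdim = 4, 32

# Random features: a small conv embedding that is never trained.
phi = nn.Sequential(nn.Conv2d(1, 8, 4, stride=2), nn.ReLU(),
                    nn.Flatten(), nn.Linear(128, fdim))
for p in phi.parameters():
    p.requires_grad_(False)

# Forward model: predicts next-state features from current features + action.
forward_model = nn.Sequential(nn.Linear(fdim + n_actions, 64), nn.ReLU(),
                              nn.Linear(64, fdim))

s, s_next = torch.randn(2, 16, 1, 10, 10)  # batch of transitions
a = nn.functional.one_hot(torch.randint(0, n_actions, (16,)), n_actions).float()

pred = forward_model(torch.cat([phi(s), a], dim=1))
r_int = (pred - phi(s_next)).pow(2).mean(dim=1)  # per-transition intrinsic reward
forward_model_loss = r_int.mean()                # only the forward model trains
```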
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Title | An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents |
Authors | Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman |
Abstract | Much human and computational effort has aimed to improve how deep reinforcement learning algorithms perform on benchmarks such as the Arcade Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and on investigating and comparing the representations learned by different families of reinforcement learning (RL) algorithms. Sources of friction include the onerous computational requirements and the general logistical and architectural complications of running deep RL algorithms at scale. We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several deep RL algorithms, highlighting interesting and previously unknown distinctions between them. |
Tasks | Atari Games |
Published | 2018-12-17 |
URL | https://arxiv.org/abs/1812.07069v2 |
https://arxiv.org/pdf/1812.07069v2.pdf | |
PWC | https://paperswithcode.com/paper/an-atari-model-zoo-for-analyzing-visualizing |
Repo | https://github.com/uber-research/atari-model-zoo |
Framework | tf |
Measuring abstract reasoning in neural networks
Title | Measuring abstract reasoning in neural networks |
Authors | David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap |
Abstract | Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04225v1 |
http://arxiv.org/pdf/1807.04225v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-abstract-reasoning-in-neural |
Repo | https://github.com/shinelink/abstract-reasoning |
Framework | tf |
Data-driven Design: A Case for Maximalist Game Design
Title | Data-driven Design: A Case for Maximalist Game Design |
Authors | Gabriella A. B. Barros, Michael Cerny Green, Antonios Liapis, Julian Togelius |
Abstract | Maximalism in art refers to drawing on and combining multiple different sources for art creation, embracing the resulting collisions and heterogeneity. This paper discusses the use of maximalism in game design, and particularly in data games, which are games that are generated partly based on open data. Using Data Adventures, a series of generators that create adventure games from data sources such as Wikipedia and OpenStreetMap, as a lens, we explore several tradeoffs and issues in maximalist game design. These include the tension between transformation and fidelity, the tension between decorative and functional content, and the legal and ethical issues resulting from this type of generativity. This paper sketches out the design space of maximalist data-driven games, a design space that is mostly unexplored. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12475v1 |
http://arxiv.org/pdf/1805.12475v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-design-a-case-for-maximalist-game |
Repo | https://github.com/michaelbrave/Procedural-Generation-And-Generative-Systems-Resources |
Framework | none |
Inference for $L_2$-Boosting
Title | Inference for $L_2$-Boosting |
Authors | David Rügamer, Sonja Greven |
Abstract | We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as $L_2$-Boosting, under a normality assumption for model errors. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models, and incorporates built-in variable selection. Due to the variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by $L_2$-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques which restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies. |
Tasks | |
Published | 2018-05-04 |
URL | https://arxiv.org/abs/1805.01852v4 |
https://arxiv.org/pdf/1805.01852v4.pdf | |
PWC | https://paperswithcode.com/paper/selective-inference-for-l_2-boosting |
Repo | https://github.com/davidruegamer/inference_boosting |
Framework | none |
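A compact sketch of component-wise $L_2$-Boosting with univariate linear base learners, the selection mechanism whose repeated picks complicate post-selection inference; names and the toy data are illustrative:

```python
import numpy as np

def l2_boost(X, y, steps=100, nu=0.1):
    """At each step, fit every single coordinate to the current residual
    and update only the best-fitting one by a small step nu. The same
    component can be selected repeatedly across iterations."""
    n, p = X.shape
    beta, resid = np.zeros(p), y - y.mean()
    selected = []
    for _ in range(steps):
        coefs = X.T @ resid / (X**2).sum(axis=0)        # per-component LS fits
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(sse.argmin())                           # best-fitting component
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
        selected.append(j)
    return beta, selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
beta, sel = l2_boost(X, y)
print(np.round(beta, 2), set(sel))  # mass concentrates on components 0 and 3
```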
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings
Title | Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings |
Authors | Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord |
Abstract | Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach relative to previous state-of-the-art models and present qualitative results over computational cooking use cases. |
Tasks | Cross-Modal Retrieval |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1804.11146v1 |
http://arxiv.org/pdf/1804.11146v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-modal-retrieval-in-the-cooking-context |
Repo | https://github.com/Cadene/recipe1m.bootstrap.pytorch |
Framework | pytorch |
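A sketch of the shared-space training described above, assuming precomputed image and text features and an in-batch triplet ranking loss (one retrieval direction shown); the encoders and dimensions are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Two projection heads map image features and recipe-text features into one
# space; training pushes each picture closer to its own recipe than to any
# other recipe in the batch.
dim = 128
img_enc = nn.Linear(2048, dim)   # e.g. on top of precomputed CNN features
txt_enc = nn.Linear(300, dim)    # e.g. on top of recipe text features

imgs, txts = torch.randn(32, 2048), torch.randn(32, 300)
u = nn.functional.normalize(img_enc(imgs), dim=1)
v = nn.functional.normalize(txt_enc(txts), dim=1)

sim = u @ v.t()                  # cosine similarities; true pairs on diagonal
pos = sim.diag().unsqueeze(1)
margin = 0.3
loss = (margin + sim - pos).clamp(min=0)         # rank wrong recipes below own
loss = loss.masked_fill(torch.eye(32, dtype=torch.bool), 0).mean()
loss.backward()
```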
Fast Matrix Factorization with Non-Uniform Weights on Missing Data
Title | Fast Matrix Factorization with Non-Uniform Weights on Missing Data |
Authors | Xiangnan He, Jinhui Tang, Xiaoyu Du, Richang Hong, Tongwei Ren, Tat-Seng Chua |
Abstract | Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of a data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency, existing work typically applies a uniform weight to missing entries to allow a fast learning algorithm. However, this simplification decreases modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method whose time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is two-fold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memoize key intermediate variables to avoid unnecessary repeated computations. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method. |
Tasks | |
Published | 2018-11-11 |
URL | http://arxiv.org/abs/1811.04411v2 |
http://arxiv.org/pdf/1811.04411v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-matrix-factorization-with-non-uniform |
Repo | https://github.com/duxy-me/ext-als |
Framework | none |
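Under notation assumed here rather than taken from the paper ($R$ the data matrix with observed set $O$, $P$ and $Q$ the latent factors, and $w_{ui}$ an arbitrary weight on each missing entry), the non-uniformly weighted objective the abstract describes can be sketched as:

```latex
\min_{P,Q}\;
\sum_{(u,i)\in O}\bigl(r_{ui}-\mathbf{p}_u^\top\mathbf{q}_i\bigr)^2
\;+\;\sum_{(u,i)\notin O} w_{ui}\,\bigl(\mathbf{p}_u^\top\mathbf{q}_i\bigr)^2
\;+\;\lambda\bigl(\lVert P\rVert_F^2+\lVert Q\rVert_F^2\bigr)
```

The second sum is dense, so evaluating it naively costs time proportional to the full matrix size; replacing the weight matrix on missing entries with a truncated-SVD (low-rank) approximation is what lets eALS cache the dense part and keep the per-iteration cost proportional to the number of observed entries.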
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
Title | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs |
Authors | Sachin Kumar, Yulia Tsvetkov |
Abstract | The softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute, which limits the vocabulary size to a subset of the most frequent types, and it has a large memory footprint. We propose a general technique for replacing the softmax layer with a continuous embedding layer. Our primary innovations are a novel probabilistic loss, and a training and inference procedure in which we generate a probability distribution over pre-trained word embeddings, instead of a multinomial distribution over the vocabulary obtained via softmax. We evaluate this new class of sequence-to-sequence models with continuous outputs on the task of neural machine translation. We show that our models obtain up to a 2.5x speed-up in training time while performing on par with the state-of-the-art models in terms of translation quality. These models are capable of handling very large vocabularies without compromising on translation quality. They also produce more meaningful errors than softmax-based models, as these errors typically lie in a subspace of the vector space of the reference translations. |
Tasks | Machine Translation, Text Generation, Word Embeddings |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.04616v3 |
http://arxiv.org/pdf/1812.04616v3.pdf | |
PWC | https://paperswithcode.com/paper/von-mises-fisher-loss-for-training-sequence |
Repo | https://github.com/Sachin19/seq2seq-con |
Framework | pytorch |
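A simplified sketch of the continuous-output layer: the decoder emits a vector trained toward the pre-trained embedding of the target word, with plain negative cosine similarity standing in for the paper's von Mises-Fisher likelihood loss, and nearest-neighbor decoding at inference; all sizes are placeholders:

```python
import torch
import torch.nn as nn

vocab, edim, hdim = 1000, 300, 512
# Fixed pre-trained word embeddings (random stand-ins here), unit-normalized.
emb_table = nn.functional.normalize(torch.randn(vocab, edim), dim=1)
out_layer = nn.Linear(hdim, edim)     # replaces the softmax projection

hidden = torch.randn(16, hdim)        # decoder states for 16 target tokens
targets = torch.randint(0, vocab, (16,))

pred = nn.functional.normalize(out_layer(hidden), dim=1)
loss = -(pred * emb_table[targets]).sum(dim=1).mean()  # maximize cosine sim.
loss.backward()

word_ids = (pred @ emb_table.t()).argmax(dim=1)  # nearest-neighbor decoding
```

Note the cost structure this buys: the training loss touches only the target embedding, not all `vocab` rows, which is where the reported speed-up over softmax comes from; the full table is needed only at decoding time.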