Paper Group AWR 194
On GANs and GMMs. BourGAN: Generative Networks with Metric Embeddings. Semantic Parsing for Task Oriented Dialog using Hierarchical Representations. A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System. Connecting Image Denoising and High-Level Vision Tasks via Deep Learning. Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks. Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation. Large-Scale Study of Curiosity-Driven Learning. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents. Measuring abstract reasoning in neural networks. Data-driven Design: A Case for Maximalist Game Design. Inference for $L_2$-Boosting. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings. Fast Matrix Factorization with Non-Uniform Weights on Missing Data. Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs.
On GANs and GMMs
Title | On GANs and GMMs |
Authors | Eitan Richardson, Yair Weiss |
Abstract | A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high-dimensional signals. In recent years, GANs have gained much attention as a possible solution to the problem, and in particular have shown the ability to generate remarkably realistic high-resolution sampled images. At the same time, many authors have pointed out that GANs may fail to model the full distribution (“mode collapse”) and that using the learned models for anything other than generating samples may be very difficult. In this paper, we examine the utility of GANs in learning statistical models of images by comparing them to perhaps the simplest statistical model, the Gaussian Mixture Model. First, we present a simple method to evaluate generative models based on relative proportions of samples that fall into predetermined bins. Unlike previous automatic methods for evaluating models, our method does not rely on an additional neural network nor does it require approximating intractable computations. Second, we compare the performance of GANs to GMMs trained on the same datasets. While GMMs have previously been shown to be successful in modeling small patches of images, we show how to train them on full-sized images despite the high dimensionality. Our results show that GMMs can generate realistic samples (although less sharp than those of GANs) but also capture the full distribution, which GANs fail to do. Furthermore, GMMs allow efficient inference and explicit representation of the underlying statistical structure. Finally, we discuss how GMMs can be used to generate sharp images. |
Tasks | Image Generation |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12462v2 |
http://arxiv.org/pdf/1805.12462v2.pdf | |
PWC | https://paperswithcode.com/paper/on-gans-and-gmms |
Repo | https://github.com/eitanrich/torch-mfa |
Framework | pytorch |
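A minimal sketch of the bin-proportion evaluation idea from the abstract above, assuming bins defined as Voronoi cells of k-means centroids fitted on training samples (the paper's NDB test additionally applies a per-bin two-sample proportion test); the function name and toy data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def bin_proportion_gap(train_x, model_x, n_bins=100, seed=0):
    """Compare a generative model to the data by relative bin proportions.

    Bins are Voronoi cells of k-means centroids fitted on training samples;
    a well-fitted model should populate each bin in roughly the same
    proportion as the training data. Here we simply report absolute gaps.
    """
    km = KMeans(n_clusters=n_bins, random_state=seed, n_init=10).fit(train_x)
    p_train = np.bincount(km.predict(train_x), minlength=n_bins) / len(train_x)
    p_model = np.bincount(km.predict(model_x), minlength=n_bins) / len(model_x)
    return np.abs(p_train - p_model)  # large gaps flag missed/over-used modes

# toy usage: a "model" that ignores the mixture structure shows large gaps
rng = np.random.default_rng(0)
train = rng.normal(size=(5000, 16)) + rng.integers(0, 5, size=(5000, 1))
samples = rng.normal(size=(5000, 16))
print(bin_proportion_gap(train, samples, n_bins=20).max())
```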
BourGAN: Generative Networks with Metric Embeddings
Title | BourGAN: Generative Networks with Metric Embeddings |
Authors | Chang Xiao, Peilin Zhong, Changxi Zheng |
Abstract | This paper addresses mode collapse in generative adversarial networks (GANs). We view modes as a geometric structure of the data distribution in a metric space. Under this geometric lens, we embed subsamples of the dataset from an arbitrary metric space into the l2 space, while preserving their pairwise distance distribution. Not only does this metric embedding determine the dimensionality of the latent space automatically, it also enables us to construct a mixture of Gaussians to draw latent space random vectors. We use the Gaussian mixture model in tandem with a simple augmentation of the objective function to train GANs. Every major step of our method is supported by theoretical analysis, and our experiments on real and synthetic data confirm that the generator is able to produce samples spreading over most of the modes while avoiding unwanted samples, outperforming several recent GAN variants on a number of metrics and offering new features. |
Tasks | |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07674v3 |
http://arxiv.org/pdf/1805.07674v3.pdf | |
PWC | https://paperswithcode.com/paper/bourgan-generative-networks-with-metric |
Repo | https://github.com/a554b554/BourGAN |
Framework | pytorch |
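A rough sketch of the latent-space construction described above, using scikit-learn's metric MDS as a stand-in for the paper's distance-preserving (Bourgain) embedding; all dimensions, component counts, and the toy data are illustrative:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

# Embed a subsample into l2 space, roughly preserving pairwise distances
# (the paper uses a Bourgain embedding; metric MDS is a stand-in here),
# then fit a Gaussian mixture to the embedded points. GAN latent vectors
# are drawn from this mixture instead of a single isotropic Gaussian.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(m, 0.1, size=(200, 8)) for m in (0.0, 1.0, 2.0)])

embedded = MDS(n_components=4, random_state=0).fit_transform(data)
gmm = GaussianMixture(n_components=3, random_state=0).fit(embedded)

z, _ = gmm.sample(64)  # a latent batch for the generator
print(z.shape)         # (64, 4)
```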
Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
Title | Semantic Parsing for Task Oriented Dialog using Hierarchical Representations |
Authors | Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, Mike Lewis |
Abstract | Task-oriented dialog systems typically first parse user utterances to semantic frames comprised of intents and slots. Previous work on task-oriented intent detection and slot filling has been restricted to one intent per query and one slot label per token, and thus cannot model complex compositional requests. Alternative semantic parsing systems have represented queries as logical forms, but these are challenging to annotate and parse. We propose a hierarchical annotation scheme for semantic parsing that allows the representation of compositional queries, and can be efficiently and accurately parsed by standard constituency parsing models. We release a dataset of 44k annotated queries (fb.me/semanticparsingdialog), and show that parsing models outperform sequence-to-sequence approaches on this dataset. |
Tasks | Constituency Parsing, Semantic Parsing, Slot Filling |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.07942v1 |
http://arxiv.org/pdf/1810.07942v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-parsing-for-task-oriented-dialog |
Repo | https://github.com/sz128/NLU_datasets_for_task_oriented_dialogue |
Framework | pytorch |
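An illustrative example of the kind of hierarchical (bracketed-tree) representation the abstract describes, where a slot may contain a nested intent; the label names here are illustrative rather than quoted from the released dataset:

```python
# A compositional query in a bracketed-tree annotation style: the
# destination slot contains a nested GET_EVENT intent, something a
# flat one-intent/one-slot-per-token scheme cannot express.
query = ("[IN:GET_DIRECTIONS Driving directions to "
         "[SL:DESTINATION [IN:GET_EVENT the [SL:NAME_EVENT Eagles] "
         "[SL:CAT_EVENT game]]]]")

def max_depth(tree: str) -> int:
    """Nesting depth of a bracketed parse; depth > 1 means compositional."""
    depth = best = 0
    for ch in tree:
        if ch == "[":
            depth += 1
            best = max(best, depth)
        elif ch == "]":
            depth -= 1
    return best

print(max_depth(query))  # 4: an intent nested inside a slot inside an intent
```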
A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System
Title | A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System |
Authors | Mohammad Hasanzadeh Mofrad, S. K. Chang |
Abstract | Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. The Slow Intelligence System (SIS) is a learning framework that slowly learns the solution to a problem by performing a series of operations. Learning Automata (LA) are simple but effective decision-making entities that are well suited to act as controller components. In this paper, we combine two isolated PSO populations to forge the Adaptive Intelligence Optimizer (AIO), which harnesses the advantages of a bi-population PSO to escape local minima and avoid premature convergence. Furthermore, drawing on the rich framework of SIS and the control-theoretic roots of LA, we exploit a natural match between the two: acting slowly is central to both. Both SIS and LA need time to converge to the optimal decision, and this enables AIO to outperform standard PSO on evolutionary optimization benchmark functions. |
Tasks | Decision Making |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00768v1 |
http://arxiv.org/pdf/1804.00768v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bi-population-particle-swarm-optimizer-for |
Repo | https://github.com/hmofrad/pso |
Framework | none |
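A minimal sketch of the bi-population idea: two standard PSO swarms evolve in isolation, which preserves diversity, and their best solutions are compared at the end (the paper's AIO additionally couples the populations through an LA-based controller, omitted here); hyperparameters and the benchmark are illustrative:

```python
import numpy as np

def bipop_pso(f, dim=10, swarm=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Two isolated PSO populations; return the better of the two results."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(2):  # two isolated populations
        x = rng.uniform(-5, 5, (swarm, dim))
        v = np.zeros_like(x)
        pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
        g = pbest[pval.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, swarm, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = x + v
            fx = np.apply_along_axis(f, 1, x)
            improved = fx < pval
            pbest[improved], pval[improved] = x[improved], fx[improved]
            g = pbest[pval.argmin()].copy()
        results.append((f(g), g))
    return min(results, key=lambda t: t[0])

best_val, best_x = bipop_pso(lambda x: float(np.sum(x**2)))
print(best_val)  # near 0 on the sphere benchmark
```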
Connecting Image Denoising and High-Level Vision Tasks via Deep Learning
Title | Connecting Image Denoising and High-Level Vision Tasks via Deep Learning |
Authors | Ding Liu, Bihan Wen, Jianbo Jiao, Xianming Liu, Zhangyang Wang, Thomas S. Huang |
Abstract | Image denoising and high-level vision tasks are usually handled independently in the conventional practice of computer vision, and their connection is fragile. In this paper, we cope with the two jointly and explore the mutual influence between them, focusing on two questions: (1) how image denoising can help improve high-level vision tasks, and (2) how the semantic information from high-level vision tasks can be used to guide image denoising. First, for image denoising, we propose a convolutional neural network in which convolutions are conducted at various spatial resolutions via downsampling and upsampling operations, in order to fuse and exploit contextual information on different scales. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses the joint loss for updating only the denoising network via back-propagation. We experimentally show that, on the one hand, the proposed denoiser is general enough to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising |
Tasks | Denoising, Image Denoising |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01826v1 |
http://arxiv.org/pdf/1809.01826v1.pdf | |
PWC | https://paperswithcode.com/paper/connecting-image-denoising-and-high-level |
Repo | https://github.com/Ding-Liu/DeepDenoising |
Framework | none |
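A PyTorch sketch of the cascaded training scheme described above, with placeholder networks (not the paper's architectures): the joint reconstruction-plus-semantic loss back-propagates through a frozen high-level module and updates only the denoiser:

```python
import torch
import torch.nn as nn

# Placeholder denoiser and high-level (classification) module.
denoiser = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))
classifier = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
for p in classifier.parameters():   # the high-level module stays fixed
    p.requires_grad_(False)

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
noisy, clean = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

denoised = denoiser(noisy)
loss = nn.functional.mse_loss(denoised, clean) \
     + 0.1 * nn.functional.cross_entropy(classifier(denoised), labels)
opt.zero_grad(); loss.backward(); opt.step()  # gradients reach only the denoiser
```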
Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks
Title | Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks |
Authors | Cheng-Wei Wu, Jen-Yu Liu, Yi-Hsuan Yang, Jyh-Shing R. Jang |
Abstract | Can we make a famous rap singer like Eminem sing any song we like? Singing style transfer attempts to make this possible, by replacing the vocals of the source singer in a song with those of the target singer. This paper presents a method that learns from unpaired data for singing style transfer using generative adversarial networks. |
Tasks | Style Transfer |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02254v1 |
http://arxiv.org/pdf/1807.02254v1.pdf | |
PWC | https://paperswithcode.com/paper/singing-style-transfer-using-cycle-consistent |
Repo | https://github.com/eliceio/vocal-style-transfer |
Framework | tf |
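A minimal sketch of the cycle-consistency term that makes learning from unpaired data possible, with placeholder encoders and feature sizes; the paper's BEGAN-style adversarial losses are omitted:

```python
import torch
import torch.nn as nn

# G maps source-singer features to target style, F maps back. Reconstructing
# the input after a round trip is what removes the need for paired data.
feat = 64
G = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(), nn.Linear(128, feat))
F = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(), nn.Linear(128, feat))

a = torch.randn(8, feat)  # stand-in for source-singer spectral frames
b = torch.randn(8, feat)  # stand-in for target-singer spectral frames
cycle_loss = nn.functional.l1_loss(F(G(a)), a) + nn.functional.l1_loss(G(F(b)), b)
cycle_loss.backward()
```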
Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation
Title | Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation |
Authors | Hyeoncheol Cho, Insung S. Choi |
Abstract | We present a three-dimensional graph convolutional network (3DGCN), which predicts molecular properties and biochemical activities based on a 3D molecular graph. In the 3DGCN, graph convolution is unified with learning operations on vectors to handle spatial information from molecular topology. The 3DGCN model exhibits significantly higher performance on various tasks than other deep-learning models, and can generalize from a given conformer to targeted features regardless of its rotations in 3D space. More significantly, when trained with orientation-dependent datasets, our model can also distinguish the 3D rotations of a molecule and predict the target value depending on the rotation degree, as in the protein-ligand docking problem. The rotation distinguishability of the 3DGCN, along with its rotation equivariance, is a key milestone in bringing three-dimensionality to deep-learning chemistry for solving challenging biochemical problems. |
Tasks | Molecule Interpretation |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09794v4 |
http://arxiv.org/pdf/1811.09794v4.pdf | |
PWC | https://paperswithcode.com/paper/three-dimensionally-embedded-graph |
Repo | https://github.com/blackmints/3DGCN |
Framework | tf |
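A generic sketch of a graph convolution that consumes 3D structure by mixing neighbor features with relative-position vectors; this is a simplified illustration of the idea, not the paper's 3DGCN operator, and all module names and sizes are placeholders:

```python
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """Update atom features from neighbors plus relative xyz offsets."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim + 3, out_dim)  # feature + relative xyz

    def forward(self, h, pos, adj):
        # h: (N, F) atom features; pos: (N, 3) coordinates; adj: (N, N) 0/1
        n = h.size(0)
        rel = pos.unsqueeze(0) - pos.unsqueeze(1)          # (N, N, 3) offsets
        pair = torch.cat([h.unsqueeze(0).expand(n, n, -1), rel], dim=-1)
        msgs = torch.relu(self.msg(pair)) * adj.unsqueeze(-1)
        return msgs.sum(dim=1)                             # aggregate neighbors

conv = SpatialGraphConv(8, 16)
h, pos = torch.randn(5, 8), torch.randn(5, 3)
adj = (torch.rand(5, 5) > 0.5).float()
print(conv(h, pos, adj).shape)  # torch.Size([5, 16])
```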
Large-Scale Study of Curiosity-Driven Learning
Title | Large-Scale Study of Curiosity-Driven Learning |
Authors | Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros |
Abstract | Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as the reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/ |
Tasks | Atari Games, SNES Games |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04355v1 |
http://arxiv.org/pdf/1808.04355v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-study-of-curiosity-driven |
Repo | https://github.com/SPark9625/Large-Scale-Study-of-Curiosity-Driven-Learning |
Framework | pytorch |
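A sketch of the curiosity signal described in the abstract, assuming a fixed random feature network (finding (b) above) and a forward model whose prediction error serves as the intrinsic reward; all architectures and shapes are placeholders:

```python
import torch
import torch.nn as nn

n_actions, fdim = 4, 32

# Random features: a small conv embedding that is never trained.
phi = nn.Sequential(nn.Conv2d(1, 8, 4, stride=2), nn.ReLU(),
                    nn.Flatten(), nn.Linear(128, fdim))
for p in phi.parameters():
    p.requires_grad_(False)

# Forward model: predicts next-state features from current features + action.
forward_model = nn.Sequential(nn.Linear(fdim + n_actions, 64), nn.ReLU(),
                              nn.Linear(64, fdim))

s, s_next = torch.randn(2, 16, 1, 10, 10)  # batch of transitions
a = nn.functional.one_hot(torch.randint(0, n_actions, (16,)), n_actions).float()

pred = forward_model(torch.cat([phi(s), a], dim=1))
r_int = (pred - phi(s_next)).pow(2).mean(dim=1)  # per-transition intrinsic reward
forward_model_loss = r_int.mean()                # only the forward model trains
```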
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Title | An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents |
Authors | Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman |
Abstract | Much human and computational effort has aimed to improve how deep reinforcement learning algorithms perform on benchmarks such as the Arcade Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and on investigating and comparing the representations learned by different families of reinforcement learning (RL) algorithms. Sources of friction include the onerous computational requirements and the general logistical and architectural complications of running deep RL algorithms at scale. We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous deep RL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several deep RL algorithms, highlighting interesting and previously unknown distinctions between them. |
Tasks | Atari Games |
Published | 2018-12-17 |
URL | https://arxiv.org/abs/1812.07069v2 |
https://arxiv.org/pdf/1812.07069v2.pdf | |
PWC | https://paperswithcode.com/paper/an-atari-model-zoo-for-analyzing-visualizing |
Repo | https://github.com/uber-research/atari-model-zoo |
Framework | tf |
Measuring abstract reasoning in neural networks
Title | Measuring abstract reasoning in neural networks |
Authors | David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap |
Abstract | Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04225v1 |
http://arxiv.org/pdf/1807.04225v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-abstract-reasoning-in-neural |
Repo | https://github.com/shinelink/abstract-reasoning |
Framework | tf |
Data-driven Design: A Case for Maximalist Game Design
Title | Data-driven Design: A Case for Maximalist Game Design |
Authors | Gabriella A. B. Barros, Michael Cerny Green, Antonios Liapis, Julian Togelius |
Abstract | Maximalism in art refers to drawing on and combining multiple different sources for art creation, embracing the resulting collisions and heterogeneity. This paper discusses the use of maximalism in game design, and particularly in data games, which are games that are generated partly based on open data. Using Data Adventures, a series of generators that create adventure games from data sources such as Wikipedia and OpenStreetMap, as a lens, we explore several tradeoffs and issues in maximalist game design. These include the tension between transformation and fidelity, the tension between decorative and functional content, and the legal and ethical issues resulting from this type of generativity. This paper sketches out the design space of maximalist data-driven games, a design space that is mostly unexplored. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12475v1 |
http://arxiv.org/pdf/1805.12475v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-design-a-case-for-maximalist-game |
Repo | https://github.com/michaelbrave/Procedural-Generation-And-Generative-Systems-Resources |
Framework | none |
Inference for $L_2$-Boosting
Title | Inference for $L_2$-Boosting |
Authors | David Rügamer, Sonja Greven |
Abstract | We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as $L_2$-Boosting, under a normality assumption for model errors. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models, and incorporates built-in variable selection. Due to the variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by $L_2$-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques which restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies. |
Tasks | |
Published | 2018-05-04 |
URL | https://arxiv.org/abs/1805.01852v4 |
https://arxiv.org/pdf/1805.01852v4.pdf | |
PWC | https://paperswithcode.com/paper/selective-inference-for-l_2-boosting |
Repo | https://github.com/davidruegamer/inference_boosting |
Framework | none |
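A compact sketch of component-wise $L_2$-Boosting with univariate linear base learners, the selection mechanism whose repeated picks complicate post-selection inference; names and the toy data are illustrative:

```python
import numpy as np

def l2_boost(X, y, steps=100, nu=0.1):
    """At each step, fit every single coordinate to the current residual
    and update only the best-fitting one by a small step nu. The same
    component can be selected repeatedly across iterations."""
    n, p = X.shape
    beta, resid = np.zeros(p), y - y.mean()
    selected = []
    for _ in range(steps):
        coefs = X.T @ resid / (X**2).sum(axis=0)        # per-component LS fits
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(sse.argmin())                           # best-fitting component
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
        selected.append(j)
    return beta, selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
beta, sel = l2_boost(X, y)
print(np.round(beta, 2), set(sel))  # mass concentrates on components 0 and 3
```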
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings
Title | Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings |
Authors | Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord |
Abstract | Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach relative to previous state-of-the-art models and present qualitative results over computational cooking use cases. |
Tasks | Cross-Modal Retrieval |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1804.11146v1 |
http://arxiv.org/pdf/1804.11146v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-modal-retrieval-in-the-cooking-context |
Repo | https://github.com/Cadene/recipe1m.bootstrap.pytorch |
Framework | pytorch |
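A sketch of the shared-space training described above, assuming precomputed image and text features and an in-batch triplet ranking loss (one retrieval direction shown); the encoders and dimensions are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Two projection heads map image features and recipe-text features into one
# space; training pushes each picture closer to its own recipe than to any
# other recipe in the batch.
dim = 128
img_enc = nn.Linear(2048, dim)   # e.g. on top of precomputed CNN features
txt_enc = nn.Linear(300, dim)    # e.g. on top of recipe text features

imgs, txts = torch.randn(32, 2048), torch.randn(32, 300)
u = nn.functional.normalize(img_enc(imgs), dim=1)
v = nn.functional.normalize(txt_enc(txts), dim=1)

sim = u @ v.t()                  # cosine similarities; true pairs on diagonal
pos = sim.diag().unsqueeze(1)
margin = 0.3
loss = (margin + sim - pos).clamp(min=0)         # rank wrong recipes below own
loss = loss.masked_fill(torch.eye(32, dtype=torch.bool), 0).mean()
loss.backward()
```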
Fast Matrix Factorization with Non-Uniform Weights on Missing Data
Title | Fast Matrix Factorization with Non-Uniform Weights on Missing Data |
Authors | Xiangnan He, Jinhui Tang, Xiaoyu Du, Richang Hong, Tongwei Ren, Tat-Seng Chua |
Abstract | Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of a data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency, existing work typically applies a uniform weight to missing entries to allow a fast learning algorithm. However, this simplification decreases modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method whose time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is two-fold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memoize key intermediate variables to avoid unnecessary repeated computations. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method. |
Tasks | |
Published | 2018-11-11 |
URL | http://arxiv.org/abs/1811.04411v2 |
http://arxiv.org/pdf/1811.04411v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-matrix-factorization-with-non-uniform |
Repo | https://github.com/duxy-me/ext-als |
Framework | none |
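Under notation assumed here rather than taken from the paper ($R$ the data matrix with observed set $O$, $P$ and $Q$ the latent factors, and $w_{ui}$ an arbitrary weight on each missing entry), the non-uniformly weighted objective the abstract describes can be sketched as:

```latex
\min_{P,Q}\;
\sum_{(u,i)\in O}\bigl(r_{ui}-\mathbf{p}_u^\top\mathbf{q}_i\bigr)^2
\;+\;\sum_{(u,i)\notin O} w_{ui}\,\bigl(\mathbf{p}_u^\top\mathbf{q}_i\bigr)^2
\;+\;\lambda\bigl(\lVert P\rVert_F^2+\lVert Q\rVert_F^2\bigr)
```

The second sum is dense, so evaluating it naively costs time proportional to the full matrix size; replacing the weight matrix on missing entries with a truncated-SVD (low-rank) approximation is what lets eALS cache the dense part and keep the per-iteration cost proportional to the number of observed entries.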
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
Title | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs |
Authors | Sachin Kumar, Yulia Tsvetkov |
Abstract | The softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute, which limits the vocabulary size to a subset of the most frequent types, and it has a large memory footprint. We propose a general technique for replacing the softmax layer with a continuous embedding layer. Our primary innovations are a novel probabilistic loss, and a training and inference procedure in which we generate a probability distribution over pre-trained word embeddings, instead of a multinomial distribution over the vocabulary obtained via softmax. We evaluate this new class of sequence-to-sequence models with continuous outputs on the task of neural machine translation. We show that our models obtain up to a 2.5x speed-up in training time while performing on par with the state-of-the-art models in terms of translation quality. These models are capable of handling very large vocabularies without compromising on translation quality. They also produce more meaningful errors than softmax-based models, as these errors typically lie in a subspace of the vector space of the reference translations. |
Tasks | Machine Translation, Text Generation, Word Embeddings |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.04616v3 |
http://arxiv.org/pdf/1812.04616v3.pdf | |
PWC | https://paperswithcode.com/paper/von-mises-fisher-loss-for-training-sequence |
Repo | https://github.com/Sachin19/seq2seq-con |
Framework | pytorch |
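A simplified sketch of the continuous-output layer: the decoder emits a vector trained toward the pre-trained embedding of the target word, with plain negative cosine similarity standing in for the paper's von Mises-Fisher likelihood loss, and nearest-neighbor decoding at inference; all sizes are placeholders:

```python
import torch
import torch.nn as nn

vocab, edim, hdim = 1000, 300, 512
# Fixed pre-trained word embeddings (random stand-ins here), unit-normalized.
emb_table = nn.functional.normalize(torch.randn(vocab, edim), dim=1)
out_layer = nn.Linear(hdim, edim)     # replaces the softmax projection

hidden = torch.randn(16, hdim)        # decoder states for 16 target tokens
targets = torch.randint(0, vocab, (16,))

pred = nn.functional.normalize(out_layer(hidden), dim=1)
loss = -(pred * emb_table[targets]).sum(dim=1).mean()  # maximize cosine sim.
loss.backward()

word_ids = (pred @ emb_table.t()).argmax(dim=1)  # nearest-neighbor decoding
```

Note the cost structure this buys: the training loss touches only the target embedding, not all `vocab` rows, which is where the reported speed-up over softmax comes from; the full table is needed only at decoding time.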