February 1, 2020

Paper Group AWR 79

Deep Neuroevolution of Recurrent and Discrete World Models. Data Programming using Continuous and Quality-Guided Labeling Functions. Random Search and Reproducibility for Neural Architecture Search. Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers. The spiked matrix model with generative priors. Two-Step Sound …

Deep Neuroevolution of Recurrent and Discrete World Models

Title Deep Neuroevolution of Recurrent and Discrete World Models
Authors Sebastian Risi, Kenneth O. Stanley
Abstract Neural architectures inspired by our own human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of different domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple different neural components that are responsible for visual information processing, memory, and decision-making. However, so far the components of these models have to be trained separately and through a variety of specialized training methods. This paper demonstrates the surprising finding that models with the same precise parts can be instead efficiently trained end-to-end through a genetic algorithm (GA), reaching a comparable performance to the original world model by solving a challenging car racing task. An analysis of the evolved visual and memory system indicates that they include a similar effective representation to the system trained through gradient descent. Additionally, in contrast to gradient descent methods that struggle with discrete variables, GAs also work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds additional evidence on the effectiveness of deep neuroevolution for tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures.
Tasks Car Racing, Decision Making
Published 2019-04-28
URL https://arxiv.org/abs/1906.08857v1
PDF https://arxiv.org/pdf/1906.08857v1.pdf
PWC https://paperswithcode.com/paper/deep-neuroevolution-of-recurrent-and-discrete
Repo https://github.com/sebastianrisi/ga-world-models
Framework pytorch
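
The abstract describes training all model parameters with a genetic algorithm rather than gradient descent. A minimal sketch of that general idea follows, assuming a toy fitness function in place of the car racing return and a simple truncation-selection scheme with Gaussian mutation. The names `toy_fitness` and `evolve` and all hyperparameter values are hypothetical and are not taken from the paper or its repository.

```python
import random


def toy_fitness(params):
    # Hypothetical stand-in for an episode return: higher is better,
    # with the maximum (0.0) reached when every parameter equals 1.0.
    return -sum((p - 1.0) ** 2 for p in params)


def evolve(fitness, dim=5, pop_size=20, elite=5, sigma=0.1,
           generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the `elite` fittest individuals as parents.
        parents = sorted(population, key=fitness, reverse=True)[:elite]
        # Elitism: the best individual survives unchanged; the rest of the
        # population consists of Gaussian-mutated copies of random parents.
        population = [parents[0]] + [
            [p + rng.gauss(0.0, sigma) for p in rng.choice(parents)]
            for _ in range(pop_size - 1)
        ]
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(toy_fitness(best))
```

No gradients are computed anywhere in the loop, which is why the same procedure applies unchanged when some parameters or activations are discrete.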

Data Programming using Continuous and Quality-Guided Labeling Functions

Title Data Programming using Continuous and Quality-Guided Labeling Functions
Authors Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi
Abstract Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set of discrete labeling functions (LF) that output possibly noisy labels to input instances and a generative model for consolidating the weak labels. We enhance and generalize this paradigm by supporting functions that output a continuous score (instead of a hard label) that noisily correlates with labels. We show across five applications that continuous LFs are more natural to program and lead to improved recall. We also show that accuracy of existing generative models is unstable with respect to initialization, training epochs, and learning rates. We give control to the data programmer to guide the training process by providing intuitive quality guides with each LF. We propose an elegant method of incorporating these guides into the generative model. Our overall method, called CAGE, makes the data programming paradigm more reliable than other tricks based on initialization, sign-penalties, or soft-accuracy constraints.
Tasks
Published 2019-11-22
URL https://arxiv.org/abs/1911.09860v1
PDF https://arxiv.org/pdf/1911.09860v1.pdf
PWC https://paperswithcode.com/paper/data-programming-using-continuous-and-quality
Repo https://github.com/oishik75/CAGE
Framework pytorch

Random Search and Reproducibility for Neural Architecture Search

Title Random Search and Reproducibility for Neural Architecture Search
Authors Liam Li, Ameet Talwalkar
Abstract Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In this work, in order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results. We note the lack of source material needed to exactly reproduce these results, and further discuss the robustness of published results given the various sources of variability in NAS experimental setups. Relatedly, we provide all information (code, random seeds, documentation) needed to exactly reproduce our results, and report our random search with weight-sharing results for each benchmark on multiple runs.
Tasks Hyperparameter Optimization, Neural Architecture Search
Published 2019-02-20
URL https://arxiv.org/abs/1902.07638v3
PDF https://arxiv.org/pdf/1902.07638v3.pdf
PWC https://paperswithcode.com/paper/random-search-and-reproducibility-for-neural
Repo https://github.com/liamcli/randomNAS_release
Framework pytorch

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers

Title Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
Authors Yukuan Yang, Shuang Wu, Lei Deng, Tianyi Yan, Yuan Xie, Guoqi Li
Abstract Deep neural network (DNN) quantization converting floating-point (FP) data in the network to integers (INT) is an effective way to shrink the model size for memory saving and simplify the operations for compute acceleration. Recently, research on DNN quantization has progressed from inference to training, laying a foundation for online training on accelerators. However, existing schemes leaving batch normalization (BN) untouched during training are mostly incomplete quantization that still adopts high precision FP in some parts of the data paths. Currently, there is no solution that can use only low bit-width INT data during the whole training process of large-scale DNNs with acceptable accuracy. In this work, through decomposing all the computation steps in DNNs and fusing three special quantization functions to satisfy the different precision requirements, we propose a unified complete quantization framework termed “WAGEUBN” to quantize DNNs involving all data paths including W (Weights), A (Activation), G (Gradient), E (Error), U (Update), and BN. Moreover, the Momentum optimizer is also quantized to realize a completely quantized framework. Experiments on ResNet18/34/50 models demonstrate that WAGEUBN can achieve competitive accuracy on the ImageNet dataset. For the first time, the study of quantization in large-scale DNNs is advanced to the full 8-bit INT level. In this way, all the operations in the training and inference can be bit-wise operations, pushing towards faster processing speed, decreased memory cost, and higher energy efficiency. Our thorough quantization framework has great potential for future efficient portable devices with online learning ability.
Tasks Quantization
Published 2019-09-05
URL https://arxiv.org/abs/1909.02384v2
PDF https://arxiv.org/pdf/1909.02384v2.pdf
PWC https://paperswithcode.com/paper/training-high-performance-and-large-scale
Repo https://github.com/yang-yk/wageubn
Framework tf
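
To make the FP-to-INT conversion mentioned in the abstract concrete, here is a minimal sketch of a generic symmetric, per-tensor 8-bit quantizer. It is not one of the three quantization functions that WAGEUBN defines; the function names and the choice of scale are assumptions made for illustration only.

```python
def quantize_int8(values):
    # Symmetric per-tensor scale: the largest magnitude maps to 127.
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 8-bit range.
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    # Map the integers back to approximate floating-point values.
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
print(q, scale, recovered)
```

With round-to-nearest, the reconstruction error of each value is at most half the scale, which is the precision trade-off a full-INT8 training scheme has to manage for weights, activations, gradients, and updates alike.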

The spiked matrix model with generative priors

Title The spiked matrix model with generative priors
Authors Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová
Abstract Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly performant and are gaining on applicability. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. This problem with sparse structure of the spikes has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences on statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to the classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets.
Tasks Dimensionality Reduction
Published 2019-05-29
URL https://arxiv.org/abs/1905.12385v2
PDF https://arxiv.org/pdf/1905.12385v2.pdf
PWC https://paperswithcode.com/paper/the-spiked-matrix-model-with-generative
Repo https://github.com/benjaminaubin/StructuredPrior_demo
Framework none
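
The setting in the abstract, a low-rank matrix observed through a noisy channel and recovered by a spectral method, can be illustrated with a small simulation. The sketch below assumes a rank-one symmetric model with Gaussian noise and a ±1 spike, and it uses plain power iteration as the classical PCA baseline. The ±1 prior, the scaling convention, and every name and parameter here are assumptions for illustration; the generative priors and enhanced spectral algorithms studied in the paper are not reproduced.

```python
import math
import random


def spiked_wigner(n, lam, rng):
    # Hypothetical prior: i.i.d. +/-1 spike entries (the paper instead
    # studies spikes drawn from generative models).
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Symmetric Gaussian noise with variance 1/n per entry.
            noise = rng.gauss(0.0, 1.0) / math.sqrt(n)
            y[i][j] = y[j][i] = lam * x[i] * x[j] / n + noise
    return x, y


def top_eigenvector(y, rng, iters=100):
    # Power iteration: repeatedly apply the matrix and renormalize.
    n = len(y)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in y]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def overlap(x, v):
    # Absolute cosine similarity between the true spike and the estimate.
    dot = sum(a * b for a, b in zip(x, v))
    nx = math.sqrt(sum(a * a for a in x))
    nv = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (nx * nv)


rng = random.Random(0)
x, y = spiked_wigner(120, 3.0, rng)
print(overlap(x, top_eigenvector(y, rng)))
```

The absolute value in `overlap` reflects that a rank-one spike can only be recovered up to a global sign.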

Two-Step Sound Source Separation: Training on Learned Latent Targets

Title Two-Step Sound Source Separation: Training on Learned Latent Targets
Authors Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis
Abstract In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal. For the second step, we train a separation module that operates on the previously learned space. In order to do so, we also make use of a scale-invariant signal to distortion ratio (SI-SDR) loss function that works in the latent space, and we prove that it lower-bounds the SI-SDR in the time domain. We run various sound separation experiments that show how this approach can obtain better performance as compared to systems that learn the transform and the separation module jointly. The proposed methodology is general enough to be applicable to a large class of neural network end-to-end separation systems.
Tasks Speech Separation
Published 2019-10-22
URL https://arxiv.org/abs/1910.09804v2
PDF https://arxiv.org/pdf/1910.09804v2.pdf
PWC https://paperswithcode.com/paper/two-step-sound-source-separation-training-on
Repo https://github.com/etzinis/two_step_mask_learning
Framework pytorch
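
For readers unfamiliar with the SI-SDR metric named in the abstract, the following is a minimal sketch of its usual time-domain definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The latent-space variant proposed in the paper is not shown; the function name and the `eps` stabilizer are assumptions.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)
    # Split the estimate into a scaled-target part and a residual.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    # Ratio of energies in decibels; higher means better separation.
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


reference = [1.0, 0.0, -1.0, 0.5]
estimate = [0.9, 0.1, -1.1, 0.4]
print(si_sdr(estimate, reference))
print(si_sdr([3.0 * e for e in estimate], reference))
```

Rescaling the estimate leaves the value essentially unchanged, which is the scale invariance the metric is named for.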

FreeAnchor: Learning to Match Anchors for Visual Object Detection

Title FreeAnchor: Learning to Match Anchors for Visual Object Detection
Authors Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye
Abstract Modern CNN-based object detectors assign anchors for ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach to break IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to “free” anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor targets learning features which best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing detection customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner. Experiments on COCO demonstrate that FreeAnchor consistently outperforms its counterparts with significant margins.
Tasks Object Detection
Published 2019-09-05
URL https://arxiv.org/abs/1909.02466v2
PDF https://arxiv.org/pdf/1909.02466v2.pdf
PWC https://paperswithcode.com/paper/freeanchor-learning-to-match-anchors-for
Repo https://github.com/zhangxiaosong18/FreeAnchor
Framework pytorch
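
The IoU restriction that the abstract refers to rests on a simple overlap measure between an anchor box and a ground-truth box. A minimal sketch of that measure is given below, assuming axis-aligned boxes in `(x1, y1, x2, y2)` form; the function name is hypothetical, and FreeAnchor's likelihood-based matching itself is not shown.

```python
def box_iou(a, b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Intersection area is zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Hand-crafted assignment typically labels an anchor as positive for an object when this value exceeds a fixed threshold, which is the rule a learned matching scheme relaxes.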

Chasing Ghosts: Instruction Following as Bayesian State Tracking

Title Chasing Ghosts: Instruction Following as Bayesian State Tracking
Authors Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee
Abstract A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) within the framework of Bayesian state tracking - learning observation and motion models conditioned on these expectable events. Together with a mapper that constructs a semantic spatial map on-the-fly during navigation, we formulate an end-to-end differentiable Bayes filter and train it to identify the goal by predicting the most likely trajectory through the map according to the instructions. The resulting navigation policy constitutes a new approach to instruction following that explicitly models a probability distribution over states, encoding strong geometric and algorithmic priors while enabling greater explainability. Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map. On the full VLN task, i.e. navigating to the goal location, our approach achieves promising results with less reliance on navigation constraints.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.02022v2
PDF https://arxiv.org/pdf/1907.02022v2.pdf
PWC https://paperswithcode.com/paper/chasing-ghosts-instruction-following-as
Repo https://github.com/batra-mlp-lab/vln-chasing-ghosts
Framework none
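
The Bayesian state tracking described in the abstract alternates a motion-model prediction with an observation-model update over a belief on states. Below is a minimal tabular sketch of that loop on a one-dimensional cyclic grid. The paper learns both models with neural networks over a semantic spatial map, so the hand-written `motion` and `likelihood` tables and the function names here are purely illustrative assumptions.

```python
def predict(belief, motion):
    # `motion` maps a displacement in cells to its probability.
    n = len(belief)
    new_belief = [0.0] * n
    for i, b in enumerate(belief):
        for step, p in motion.items():
            new_belief[(i + step) % n] += b * p
    return new_belief


def update(belief, likelihood):
    # Bayes rule: multiply the prior by the observation likelihood,
    # then renormalize so the posterior sums to one.
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


belief = [0.2] * 5                        # uniform prior over five cells
motion = {1: 0.8, 0: 0.2}                 # mostly move one cell forward
likelihood = [0.1, 0.1, 0.9, 0.1, 0.1]    # observation favours cell 2
belief = update(predict(belief, motion), likelihood)
print(belief)
```

Because both steps are made of sums and products, a filter of this form stays differentiable when the tables are replaced by learned functions.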

Zero-Shot Knowledge Distillation in Deep Networks

Title Zero-Shot Knowledge Distillation in Deep Networks
Authors Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty
Abstract Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method “Zero-Shot Knowledge Distillation” and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.
Tasks
Published 2019-05-20
URL https://arxiv.org/abs/1905.08114v1
PDF https://arxiv.org/pdf/1905.08114v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-knowledge-distillation-in-deep
Repo https://github.com/vcl-iisc/ZSKD
Framework tf
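
The distillation step that the abstract builds on is commonly implemented by matching temperature-softened Teacher and Student output distributions. A minimal sketch of that standard soft-target loss follows, written as a KL divergence over one example. The temperature value and function names are assumptions, and the paper's synthesis of Data Impressions from the Teacher is not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    # Dividing by the temperature flattens the distribution when T > 1.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    p = softmax(teacher_logits, temperature)   # soft targets from the Teacher
    q = softmax(student_logits, temperature)   # Student predictions
    # KL(p || q): zero when the Student matches the Teacher exactly.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [4.0, 1.0, 0.5]
print(distillation_loss([3.5, 1.2, 0.1], teacher))
print(distillation_loss(teacher, teacher))
```

In a data-free setting the inputs fed to both networks would be synthesized samples rather than original training data, while the loss itself stays the same.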

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

Title Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network
Authors Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu
Abstract Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in KG. From this view, the KG-alignment task can be formulated as a graph matching problem; and we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly models the local matching information to derive a graph-level matching vector. Experiments show that our model outperforms previous state-of-the-art methods by a large margin.
Tasks Entity Embeddings, Graph Matching
Published 2019-05-28
URL https://arxiv.org/abs/1905.11605v3
PDF https://arxiv.org/pdf/1905.11605v3.pdf
PWC https://paperswithcode.com/paper/cross-lingual-knowledge-graph-alignment-via-1
Repo https://github.com/nju-websoft/JAPE
Framework tf

Agglomerative Attention

Title Agglomerative Attention
Authors Matthew Spellings
Abstract Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows contextual information to be exchanged among sequence elements. While many of the prevalent network structures thus far have utilized full attention – which operates on all pairs of sequence elements – the quadratic scaling of this attention mechanism significantly constrains the size of models that can be trained. In this work, we present an attention model that has only linear requirements in memory and computation time. We show that, despite the simpler attention model, networks using this attention mechanism can attain comparable performance to full attention networks on language modeling tasks.
Tasks Language Modelling
Published 2019-07-15
URL https://arxiv.org/abs/1907.06607v1
PDF https://arxiv.org/pdf/1907.06607v1.pdf
PWC https://paperswithcode.com/paper/agglomerative-attention
Repo https://github.com/adriangrepo/agglomerative_attention_scripts
Framework tf

TraVeLGAN: Image-to-image Translation by Transformation Vector Learning

Title TraVeLGAN: Image-to-image Translation by Transformation Vector Learning
Authors Matthew Amodio, Smita Krishnaswamy
Abstract Interest in image-to-image translation has grown substantially in recent years with the success of unsupervised models based on the cycle-consistency assumption. The achievements of these models have been limited to a particular subset of domains where this assumption yields good results, namely homogeneous domains that are characterized by style or texture differences. We tackle the challenging problem of image-to-image translation where the domains are defined by high-level shapes and contexts, as well as including significant clutter and heterogeneity. For this purpose, we introduce a novel GAN based on preserving intra-domain vector transformations in a latent space learned by a siamese network. The traditional GAN system introduced a discriminator network to guide the generator into generating images in the target domain. To this two-network system we add a third: a siamese network that guides the generator so that each original image shares semantics with its generated version. With this new three-network system, we no longer need to constrain the generators with the ubiquitous cycle-consistency restraint. As a result, the generators can learn mappings between more complex domains that differ from each other by large differences - not just style or texture.
Tasks Image-to-Image Translation
Published 2019-02-25
URL http://arxiv.org/abs/1902.09631v1
PDF http://arxiv.org/pdf/1902.09631v1.pdf
PWC https://paperswithcode.com/paper/travelgan-image-to-image-translation-by
Repo https://github.com/KrishnaswamyLab/travelgan
Framework tf

Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning

Title Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning
Authors Loic Landrieu, Mohamed Boussaha
Abstract We propose a new supervised learning framework for oversegmenting 3D point clouds into superpoints. We cast this problem as learning deep embeddings of the local geometry and radiometry of 3D points, such that the border of objects presents high contrasts. The embeddings are computed using a lightweight neural network operating on the points’ local neighborhood. Finally, we formulate point cloud oversegmentation as a graph partition problem with respect to the learned embeddings. This new approach allows us to set a new state-of-the-art in point cloud oversegmentation by a significant margin, on a dense indoor dataset (S3DIS) and a sparse outdoor one (vKITTI). Our best solution requires over five times fewer superpoints than previously published methods to reach similar performance on S3DIS. Furthermore, we show that our framework can be used to improve superpoint-based semantic segmentation algorithms, setting a new state-of-the-art for this task as well.
Tasks Metric Learning, Semantic Segmentation
Published 2019-04-03
URL http://arxiv.org/abs/1904.02113v1
PDF http://arxiv.org/pdf/1904.02113v1.pdf
PWC https://paperswithcode.com/paper/point-cloud-oversegmentation-with-graph
Repo https://github.com/loicland/superpoint_graph
Framework pytorch

Predicting In-game Actions From the Language of NBA Players

Title Predicting In-game Actions From the Language of NBA Players
Authors Nadav Oved, Amir Feder, Roi Reichart
Abstract Sports competitions are widely researched in computer and social science, with the goal of understanding how players act under uncertainty. While there is an abundance of computational work on player metrics prediction based on past performance, very few attempts to incorporate out-of-game signals have been made. Specifically, it was previously unclear whether linguistic signals gathered from players’ interviews can add information which does not appear in performance metrics. To bridge that gap, we define text classification tasks of predicting deviations from mean in NBA players’ in-game actions, which are associated with strategic choices, player behavior and risk, using their choice of language prior to the game. We collected a dataset of transcripts from key NBA players’ pre-game interviews and their in-game performance metrics, totaling 5,226 interview-metric pairs. We design neural models for players’ action prediction based on increasingly complex aspects of the language signals in their open-ended interviews. Our models can make their predictions based on the textual signal alone, or on a combination with signals from past-performance metrics. Our text-based models outperform strong baselines trained on performance metrics only, demonstrating the importance of language usage for action prediction. Moreover, the models that employ both textual input and past-performance metrics produced the best results. Finally, as neural networks are notoriously difficult to interpret, we propose a method for gaining further insight into what our models have learned. Particularly, we present an LDA-based analysis, where we interpret model predictions in terms of correlated topics. We find that our best performing textual model is most associated with topics that are intuitively related to each prediction task and that better models yield higher correlation with more informative topics.
Tasks Text Classification
Published 2019-10-24
URL https://arxiv.org/abs/1910.11292v2
PDF https://arxiv.org/pdf/1910.11292v2.pdf
PWC https://paperswithcode.com/paper/predicting-in-game-actions-from-the-language
Repo https://github.com/nadavo/mood
Framework none

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Title AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Authors Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
Abstract Non-parallel many-to-many voice conversion, as well as zero-shot voice conversion, remain under-explored areas. Deep style transfer algorithms, such as generative adversarial networks (GAN) and conditional variational autoencoder (CVAE), are being applied as new solutions in this field. However, GAN training is sophisticated and difficult, and there is no strong evidence that its generated speech is of good perceptual quality. On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN. In this paper, we propose a new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck. We formally show that this scheme can achieve distribution-matching style transfer by training only on a self-reconstruction loss. Based on this scheme, we propose AUTOVC, which achieves state-of-the-art results in many-to-many voice conversion with non-parallel data, and which is the first to perform zero-shot voice conversion.
Tasks Style Transfer, Voice Conversion
Published 2019-05-14
URL https://arxiv.org/abs/1905.05879v2
PDF https://arxiv.org/pdf/1905.05879v2.pdf
PWC https://paperswithcode.com/paper/zero-shot-voice-style-transfer-with-only
Repo https://github.com/liusongxiang/StarGAN-Voice-Conversion
Framework pytorch