Paper Group AWR 79
Deep Neuroevolution of Recurrent and Discrete World Models. Data Programming using Continuous and Quality-Guided Labeling Functions. Random Search and Reproducibility for Neural Architecture Search. Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers. The spiked matrix model with generative priors. Two-Step Sound Source Separation: Training on Learned Latent Targets. FreeAnchor: Learning to Match Anchors for Visual Object Detection. Chasing Ghosts: Instruction Following as Bayesian State Tracking. Zero-Shot Knowledge Distillation in Deep Networks. Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network. Agglomerative Attention. TraVeLGAN: Image-to-image Translation by Transformation Vector Learning. Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning. Predicting In-game Actions From the Language of NBA Players. AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
Deep Neuroevolution of Recurrent and Discrete World Models
Title | Deep Neuroevolution of Recurrent and Discrete World Models |
Authors | Sebastian Risi, Kenneth O. Stanley |
Abstract | Neural architectures inspired by our own human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of different domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple different neural components that are responsible for visual information processing, memory, and decision-making. However, so far the components of these models have had to be trained separately and through a variety of specialized training methods. This paper demonstrates the surprising finding that models with the same precise parts can instead be efficiently trained end-to-end through a genetic algorithm (GA), reaching performance comparable to the original world model on a challenging car racing task. An analysis of the evolved visual and memory systems indicates that they learn representations similarly effective to those of the system trained through gradient descent. Additionally, in contrast to gradient descent methods that struggle with discrete variables, GAs also work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds further evidence of the effectiveness of deep neuroevolution for tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures. |
Tasks | Car Racing, Decision Making |
Published | 2019-04-28 |
URL | https://arxiv.org/abs/1906.08857v1 |
https://arxiv.org/pdf/1906.08857v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neuroevolution-of-recurrent-and-discrete |
Repo | https://github.com/sebastianrisi/ga-world-models |
Framework | pytorch |
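To make the training recipe concrete, here is a minimal sketch of the kind of GA the paper describes: truncation selection plus Gaussian mutation over a flat parameter vector. The population size, mutation scale, and `fitness_fn` stand-in are illustrative assumptions, not the paper's settings; in the actual setup the vector would be unpacked into the VAE, MDN-RNN, and controller weights and fitness would be an episode reward (see the repo above for the authors' implementation).

```python
import numpy as np

def evolve(fitness_fn, n_params, pop_size=64, n_elite=8,
           sigma=0.01, generations=100):
    """Truncation selection + Gaussian mutation over a flat parameter
    vector. `fitness_fn(theta) -> float` stands in for unpacking theta
    into the world-model components and returning episode reward."""
    pop = [np.random.randn(n_params) * 0.1 for _ in range(pop_size)]
    best = pop[0]
    for _ in range(generations):
        scores = np.array([fitness_fn(theta) for theta in pop])
        order = np.argsort(scores)[::-1]               # best first
        elites = [pop[i] for i in order[:n_elite]]
        best = elites[0]
        # elitism: carry the champion over unchanged, mutate the rest
        pop = [best] + [
            elites[np.random.randint(n_elite)] + sigma * np.random.randn(n_params)
            for _ in range(pop_size - 1)
        ]
    return best
```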
Data Programming using Continuous and Quality-Guided Labeling Functions
Title | Data Programming using Continuous and Quality-Guided Labeling Functions |
Authors | Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi |
Abstract | Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set of discrete labeling functions (LFs) that assign possibly noisy labels to input instances, along with a generative model for consolidating the weak labels. We enhance and generalize this paradigm by supporting functions that output a continuous score (instead of a hard label) that noisily correlates with labels. We show across five applications that continuous LFs are more natural to program and lead to improved recall. We also show that the accuracy of existing generative models is unstable with respect to initialization, training epochs, and learning rates. We give control to the data programmer to guide the training process by providing intuitive quality guides with each LF. We propose an elegant method of incorporating these guides into the generative model. Our overall method, called CAGE, makes the data programming paradigm more reliable than alternatives based on initialization tricks, sign penalties, or soft-accuracy constraints. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09860v1 |
https://arxiv.org/pdf/1911.09860v1.pdf | |
PWC | https://paperswithcode.com/paper/data-programming-using-continuous-and-quality |
Repo | https://github.com/oishik75/CAGE |
Framework | pytorch |
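A toy illustration of the continuous-LF interface, with made-up spam-detection LFs and quality guides. The consolidation here is a simple quality-weighted average, a deliberate simplification of CAGE's generative model, which instead fits per-LF parameters by maximum likelihood with the guides acting as regularizers.

```python
# Two illustrative continuous labeling functions (LFs): each returns a
# score in [0, 1] that noisily correlates with the spam label, or None
# to abstain. The paired value q is the programmer's quality guide.
def lf_keyword(text):
    hits = sum(w in text.lower() for w in ("free", "winner", "click here"))
    return min(1.0, hits / 2.0) if hits else None

def lf_shouting(text):
    return sum(c.isupper() for c in text) / max(len(text), 1)

LFS = [(lf_keyword, 0.9), (lf_shouting, 0.6)]   # (LF, quality guide)

def consolidate(text):
    """Quality-weighted average of continuous LF scores: a simplified
    stand-in for CAGE's learned generative model."""
    num = den = 0.0
    for lf, q in LFS:
        s = lf(text)
        if s is None:
            continue
        num += q * s
        den += q
    return num / den if den else 0.5   # rough P(spam)

print(consolidate("FREE!!! Click here, WINNER"))
```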
Random Search and Reproducibility for Neural Architecture Search
Title | Random Search and Reproducibility for Neural Architecture Search |
Authors | Liam Li, Ameet Talwalkar |
Abstract | Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In this work, in order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results. We note the lack of source material needed to exactly reproduce these results, and further discuss the robustness of published results given the various sources of variability in NAS experimental setups. Relatedly, we provide all information (code, random seeds, documentation) needed to exactly reproduce our results, and report our random search with weight-sharing results for each benchmark on multiple runs. |
Tasks | Hyperparameter Optimization, Neural Architecture Search |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07638v3 |
https://arxiv.org/pdf/1902.07638v3.pdf | |
PWC | https://paperswithcode.com/paper/random-search-and-reproducibility-for-neural |
Repo | https://github.com/liamcli/randomNAS_release |
Framework | pytorch |
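A minimal sketch of random search with early stopping in the successive-halving style the paper builds on; `sample_config` and `train` are hypothetical stand-ins for a real architecture search space and benchmark.

```python
import random

def random_search_early_stop(sample_config, train, n_configs=64,
                             rungs=(1, 4, 16), keep=0.25):
    """Sample configurations uniformly at random, train all of them for a
    small epoch budget, and promote only the top `keep` fraction to each
    successively larger budget. `train(config, epochs) -> val_score`."""
    survivors = [sample_config() for _ in range(n_configs)]
    for budget in rungs:
        survivors.sort(key=lambda c: train(c, budget), reverse=True)
        survivors = survivors[:max(1, int(len(survivors) * keep))]
    return survivors[0]

# Toy usage: a one-dimensional hyperparameter space standing in for an
# architecture space, with a synthetic validation score.
best = random_search_early_stop(
    sample_config=lambda: {"lr": 10 ** random.uniform(-4, -1)},
    train=lambda cfg, epochs: -abs(cfg["lr"] - 0.01) / epochs,
)
print(best)
```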
Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
Title | Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers |
Authors | Yukuan Yang, Shuang Wu, Lei Deng, Tianyi Yan, Yuan Xie, Guoqi Li |
Abstract | Deep neural network (DNN) quantization, converting floating-point (FP) data in the network to integers (INT), is an effective way to shrink the model size for memory saving and to simplify the operations for compute acceleration. Recently, research on DNN quantization has developed from inference to training, laying a foundation for online training on accelerators. However, existing schemes that leave batch normalization (BN) untouched during training are mostly incomplete quantizations that still adopt high-precision FP in parts of the data paths. Currently, there is no solution that uses only low-bit-width INT data during the whole training process of large-scale DNNs with acceptable accuracy. In this work, by decomposing all the computation steps in DNNs and fusing three special quantization functions to satisfy the different precision requirements, we propose a unified complete quantization framework termed "WAGEUBN" to quantize DNNs along all data paths, including W (Weights), A (Activation), G (Gradient), E (Error), U (Update), and BN. Moreover, the Momentum optimizer is also quantized to realize a completely quantized framework. Experiments on ResNet18/34/50 models demonstrate that WAGEUBN can achieve competitive accuracy on the ImageNet dataset. For the first time, the study of quantization in large-scale DNNs is advanced to the full 8-bit INT level. In this way, all the operations in training and inference can be bit-wise operations, pushing towards faster processing speed, decreased memory cost, and higher energy efficiency. Our complete quantization framework has great potential for future efficient portable devices with online learning ability. |
Tasks | Quantization |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02384v2 |
https://arxiv.org/pdf/1909.02384v2.pdf | |
PWC | https://paperswithcode.com/paper/training-high-performance-and-large-scale |
Repo | https://github.com/yang-yk/wageubn |
Framework | tf |
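For intuition, a sketch of the basic uniform symmetric INT8 quantizer that frameworks like WAGEUBN build on. The paper's actual scheme uses three specialized quantization functions for the different data paths (W, A, G, E, U, BN), which this sketch does not reproduce.

```python
import numpy as np

def quantize_int8(x, scale=None):
    """Uniform symmetric quantization of a float array to signed INT8.
    The scale maps the largest magnitude onto the INT8 range."""
    if scale is None:
        scale = np.max(np.abs(x)) / 127.0 or 1.0   # avoid zero scale
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())   # error bounded by scale / 2
```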
The spiked matrix model with generative priors
Title | The spiked matrix model with generative priors |
Authors | Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová |
Abstract | Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly performant and are gaining in applicability. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. The variant of this problem with a sparse structure of the spikes has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences for statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets. |
Tasks | Dimensionality Reduction |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12385v2 |
https://arxiv.org/pdf/1905.12385v2.pdf | |
PWC | https://paperswithcode.com/paper/the-spiked-matrix-model-with-generative |
Repo | https://github.com/benjaminaubin/StructuredPrior_demo |
Framework | none |
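A small numerical sketch of the setting, assuming a single-layer ReLU generator for the spike (one of several priors one could study) and the standard spiked-Wigner observation, with PCA as the classical baseline that the paper's enhanced spectral methods improve on. All dimensions are illustrative.

```python
import numpy as np

# Spiked Wigner observation Y = sqrt(lambda/n) * x x^T + xi, where the
# spike x comes from a generative prior x = relu(W z) instead of sparsity.
n, k, lam = 500, 50, 2.0
rng = np.random.default_rng(0)
z = rng.standard_normal(k)                      # latent code
W = rng.standard_normal((n, k)) / np.sqrt(k)    # generator weights
x = np.maximum(W @ z, 0.0)                      # generative spike
noise = rng.standard_normal((n, n))
xi = (noise + noise.T) / np.sqrt(2)             # symmetric Gaussian noise
Y = np.sqrt(lam / n) * np.outer(x, x) + xi

# Classical PCA baseline: top eigenvector of Y as an estimate of x/||x||.
vals, vecs = np.linalg.eigh(Y)
x_hat = vecs[:, -1]
overlap = abs(x_hat @ x) / np.linalg.norm(x)
print(f"PCA overlap with the spike: {overlap:.3f}")
```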
Two-Step Sound Source Separation: Training on Learned Latent Targets
Title | Two-Step Sound Source Separation: Training on Learned Latent Targets |
Authors | Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis |
Abstract | In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal. In the second step, we train a separation module that operates on the previously learned space. To do so, we also make use of a scale-invariant signal-to-distortion ratio (SI-SDR) loss function that works in the latent space, and we prove that it lower-bounds the SI-SDR in the time domain. We run various sound separation experiments showing that this approach obtains better performance than systems that learn the transform and the separation module jointly. The proposed methodology is general enough to be applicable to a large class of neural network end-to-end separation systems. |
Tasks | Speech Separation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09804v2 |
https://arxiv.org/pdf/1910.09804v2.pdf | |
PWC | https://paperswithcode.com/paper/two-step-sound-source-separation-training-on |
Repo | https://github.com/etzinis/two_step_mask_learning |
Framework | pytorch |
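As one concrete ingredient, here is a standard time-domain SI-SDR loss in PyTorch. The paper's contribution is applying this form to the learned latent space (with a proof that it lower-bounds the time-domain SI-SDR), which this sketch does not cover.

```python
import torch

def si_sdr_loss(est, ref, eps=1e-8):
    """Negative scale-invariant SDR, usable as a training loss.
    est, ref: (batch, samples) waveforms."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # project the estimate onto the reference (optimal scaling)
    alpha = (est * ref).sum(-1, keepdim=True) / (ref.pow(2).sum(-1, keepdim=True) + eps)
    target = alpha * ref
    noise = est - target
    si_sdr = 10 * torch.log10(target.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)
    return -si_sdr.mean()
```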
FreeAnchor: Learning to Match Anchors for Visual Object Detection
Title | FreeAnchor: Learning to Match Anchors for Visual Object Detection |
Authors | Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye |
Abstract | Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach to break the IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to “free” anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor aims to learn features that best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing a detection-customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner. Experiments on COCO demonstrate that FreeAnchor consistently outperforms its counterparts by significant margins. |
Tasks | Object Detection |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02466v2 |
https://arxiv.org/pdf/1909.02466v2.pdf | |
PWC | https://paperswithcode.com/paper/freeanchor-learning-to-match-anchors-for |
Repo | https://github.com/zhangxiaosong18/FreeAnchor |
Framework | pytorch |
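A hedged sketch of the positive-bag likelihood for one ground-truth object, using the mean-max weighting the paper describes; tensor shapes and the top-k bag construction are simplified, and the negative-bag term and detector plumbing are omitted (see the repo above for the full implementation).

```python
import torch

def mean_max(p):
    """FreeAnchor's mean-max function: close to the mean of the bag early
    in training (when all probabilities are low) and to the max once one
    anchor dominates, easing anchor selection in gradually."""
    w = 1.0 / (1.0 - p).clamp(min=1e-6)
    return (w * p).sum() / w.sum()

def positive_bag_loss(cls_prob, loc_prob, ious, topk=50):
    """For one ground-truth object: build a bag from the top-k anchors by
    IoU, combine per-anchor classification and localization probabilities
    into a joint likelihood, and apply mean-max.
    cls_prob, loc_prob, ious: (num_anchors,) tensors."""
    bag = torch.topk(ious, topk).indices
    p = cls_prob[bag] * loc_prob[bag]      # joint likelihood per anchor
    return -torch.log(mean_max(p).clamp(min=1e-6))
```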
Chasing Ghosts: Instruction Following as Bayesian State Tracking
Title | Chasing Ghosts: Instruction Following as Bayesian State Tracking |
Authors | Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee |
Abstract | A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) within the framework of Bayesian state tracking: learning observation and motion models conditioned on these expected events. Together with a mapper that constructs a semantic spatial map on-the-fly during navigation, we formulate an end-to-end differentiable Bayes filter and train it to identify the goal by predicting the most likely trajectory through the map according to the instructions. The resulting navigation policy constitutes a new approach to instruction following that explicitly models a probability distribution over states, encoding strong geometric and algorithmic priors while enabling greater explainability. Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map. On the full VLN task, i.e., navigating to the goal location, our approach achieves promising results with less reliance on navigation constraints. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02022v2 |
https://arxiv.org/pdf/1907.02022v2.pdf | |
PWC | https://paperswithcode.com/paper/chasing-ghosts-instruction-following-as |
Repo | https://github.com/batra-mlp-lab/vln-chasing-ghosts |
Framework | none |
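For intuition, here is one predict/update step of an ordinary discrete Bayes filter over a 2D grid. The paper's filter has the same structure but learns the observation and motion models end-to-end, whereas here they are fixed arrays.

```python
import numpy as np
from scipy.signal import convolve2d

def bayes_filter_step(belief, motion_kernel, obs_likelihood):
    """Convolve the belief with a motion model (predict), reweight by the
    observation likelihood (update), and renormalize."""
    predicted = convolve2d(belief, motion_kernel, mode="same")  # predict
    posterior = predicted * obs_likelihood                      # update
    return posterior / posterior.sum()                          # normalize

belief = np.full((8, 8), 1 / 64)                   # uniform prior over cells
motion = np.array([[0., .1, 0.], [.1, .6, .1], [0., .1, 0.]])
obs = np.random.rand(8, 8)                         # stand-in likelihood map
belief = bayes_filter_step(belief, motion, obs)
```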
Zero-Shot Knowledge Distillation in Deep Networks
Title | Zero-Shot Knowledge Distillation in Deep Networks |
Authors | Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty |
Abstract | Knowledge distillation deals with the problem of training a smaller model (Student) from a high-capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or poses privacy or safety concerns (e.g., biometric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without using any meta-data, we synthesize Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to the Student via knowledge distillation. We therefore dub our method “Zero-Shot Knowledge Distillation” and demonstrate that, on multiple benchmark datasets, our framework achieves generalization performance competitive with distillation using the actual training data samples. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08114v1 |
https://arxiv.org/pdf/1905.08114v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-knowledge-distillation-in-deep |
Repo | https://github.com/vcl-iisc/ZSKD |
Framework | tf |
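A minimal sketch of synthesizing one Data Impression by input optimization, written in PyTorch for brevity (the repo above is TensorFlow). Here `target_probs` is given directly, whereas the paper samples such targets from Dirichlet distributions fit to the Teacher's class-similarity matrix.

```python
import torch
import torch.nn.functional as F

def data_impression(teacher, target_probs, shape=(1, 3, 32, 32),
                    steps=200, lr=0.05):
    """Optimize a random input until the frozen teacher's softmax matches
    a target class-probability vector (length-C, summing to 1). The
    result serves as a surrogate training sample for distillation."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        log_p = F.log_softmax(teacher(x), dim=1)
        loss = F.kl_div(log_p, target_probs.unsqueeze(0), reduction="batchmean")
        loss.backward()
        opt.step()
    return x.detach()
```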
Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network
Title | Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network |
Authors | Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu |
Abstract | Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in the two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in the KG. From this view, the KG-alignment task can be formulated as a graph matching problem; we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly models the local matching information to derive a graph-level matching vector. Experiments show that our model outperforms previous state-of-the-art methods by a large margin. |
Tasks | Entity Embeddings, Graph Matching |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11605v3 |
https://arxiv.org/pdf/1905.11605v3.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-knowledge-graph-alignment-via-1 |
Repo | https://github.com/nju-websoft/JAPE |
Framework | tf |
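A small sketch of the cross-graph attentive-matching step such models rest on; the dot-product attention and the difference feature are illustrative simplifications of the paper's graph-attention formulation, and the graph encoders and graph-level pooling are omitted.

```python
import torch

def cross_graph_match(h1, h2):
    """Every entity embedding in topic graph 1 attends to all embeddings
    in graph 2; the gap between an embedding and its soft match becomes a
    per-node matching feature, later pooled into a graph-level vector.
    h1: (n1, d), h2: (n2, d)."""
    att = torch.softmax(h1 @ h2.t(), dim=-1)   # (n1, n2) attention weights
    matched = att @ h2                         # soft alignment of h1 in graph 2
    return h1 - matched                        # per-node matching features
```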
Agglomerative Attention
Title | Agglomerative Attention |
Authors | Matthew Spellings |
Abstract | Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows contextual information to be exchanged among sequence elements. While many of the prevalent network structures thus far have utilized full attention – which operates on all pairs of sequence elements – the quadratic scaling of this attention mechanism significantly constrains the size of models that can be trained. In this work, we present an attention model that has only linear requirements in memory and computation time. We show that, despite the simpler attention model, networks using this attention mechanism can attain comparable performance to full attention networks on language modeling tasks. |
Tasks | Language Modelling |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06607v1 |
https://arxiv.org/pdf/1907.06607v1.pdf | |
PWC | https://paperswithcode.com/paper/agglomerative-attention |
Repo | https://github.com/adriangrepo/agglomerative_attention_scripts |
Framework | tf |
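A hedged sketch of the linear-time idea: tokens are grouped into a small number of clusters, keys and values are summarized per cluster, and queries attend over the summaries instead of all positions. The external `assign` vector stands in for the clustering rule, which is the paper's modeling choice and is not reproduced here.

```python
import torch

def agglomerative_attention(q, k, v, assign):
    """Linear-memory attention over cluster summaries.
    q, k, v: (seq, d); assign: (seq,) long tensor of cluster ids."""
    n_clusters = int(assign.max()) + 1
    d = q.shape[-1]
    # mean-pool keys and values within each cluster
    k_sum = torch.zeros(n_clusters, d).index_add_(0, assign, k)
    v_sum = torch.zeros(n_clusters, d).index_add_(0, assign, v)
    counts = torch.bincount(assign, minlength=n_clusters).clamp(min=1).unsqueeze(-1)
    k_c, v_c = k_sum / counts, v_sum / counts
    # each query attends over n_clusters summaries, not seq positions
    att = torch.softmax(q @ k_c.t() / d ** 0.5, dim=-1)
    return att @ v_c
```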
TraVeLGAN: Image-to-image Translation by Transformation Vector Learning
Title | TraVeLGAN: Image-to-image Translation by Transformation Vector Learning |
Authors | Matthew Amodio, Smita Krishnaswamy |
Abstract | Interest in image-to-image translation has grown substantially in recent years with the success of unsupervised models based on the cycle-consistency assumption. The achievements of these models have been limited to a particular subset of domains where this assumption yields good results, namely homogeneous domains that are characterized by style or texture differences. We tackle the challenging problem of image-to-image translation where the domains are defined by high-level shapes and contexts, as well as including significant clutter and heterogeneity. For this purpose, we introduce a novel GAN based on preserving intra-domain vector transformations in a latent space learned by a siamese network. The traditional GAN system introduced a discriminator network to guide the generator into generating images in the target domain. To this two-network system we add a third: a siamese network that guides the generator so that each original image shares semantics with its generated version. With this new three-network system, we no longer need to constrain the generators with the ubiquitous cycle-consistency constraint. As a result, the generators can learn mappings between more complex domains that differ from each other in more than just style or texture. |
Tasks | Image-to-Image Translation |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09631v1 |
http://arxiv.org/pdf/1902.09631v1.pdf | |
PWC | https://paperswithcode.com/paper/travelgan-image-to-image-translation-by |
Repo | https://github.com/KrishnaswamyLab/travelgan |
Framework | tf |
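A sketch of the transformation-vector (TraVeL) loss at the heart of the method: pairwise embedding differences under the siamese network S should be preserved by the generator G. Penalizing both angle and magnitude is one plausible instantiation; consult the repo above for the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def travel_loss(s_x, s_gx):
    """For every pair (i, j) in a batch, S(x_i) - S(x_j) should match
    S(G(x_i)) - S(G(x_j)). s_x, s_gx: (batch, d) siamese embeddings of
    the inputs and their translations."""
    vx = s_x.unsqueeze(1) - s_x.unsqueeze(0)           # (b, b, d) pairwise vectors
    vg = s_gx.unsqueeze(1) - s_gx.unsqueeze(0)
    mask = ~torch.eye(s_x.shape[0], dtype=torch.bool)  # drop i == j pairs
    cos = F.cosine_similarity(vx[mask], vg[mask], dim=-1)  # angle preservation
    mag = (vx[mask] - vg[mask]).pow(2).sum(-1)             # magnitude preservation
    return (1 - cos).mean() + mag.mean()
```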
Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning
Title | Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning |
Authors | Loic Landrieu, Mohamed Boussaha |
Abstract | We propose a new supervised learning framework for oversegmenting 3D point clouds into superpoints. We cast this problem as learning deep embeddings of the local geometry and radiometry of 3D points, such that the borders of objects present high contrast. The embeddings are computed using a lightweight neural network operating on the points’ local neighborhood. Finally, we formulate point cloud oversegmentation as a graph partition problem with respect to the learned embeddings. This new approach allows us to set a new state of the art in point cloud oversegmentation by a significant margin, on a dense indoor dataset (S3DIS) and a sparse outdoor one (vKITTI). Our best solution requires over five times fewer superpoints than previously published methods to reach similar performance on S3DIS. Furthermore, we show that our framework can be used to improve superpoint-based semantic segmentation algorithms, setting a new state of the art for this task as well. |
Tasks | Metric Learning, Semantic Segmentation |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02113v1 |
http://arxiv.org/pdf/1904.02113v1.pdf | |
PWC | https://paperswithcode.com/paper/point-cloud-oversegmentation-with-graph |
Repo | https://github.com/loicland/superpoint_graph |
Framework | pytorch |
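A minimal sketch of the graph-structured metric-learning objective: contrastive treatment of adjacency-graph edges depending on whether they cross an object border. This pull/push form is an illustrative simplification, not the paper's exact loss, and it assumes both edge types occur in the batch.

```python
import torch

def border_contrast_loss(emb, edges, is_border, margin=1.0):
    """Pull embeddings together along same-object edges; push them at
    least `margin` apart along edges that cross an object border.
    emb: (n, d); edges: (m, 2) long tensor; is_border: (m,) bool."""
    d = (emb[edges[:, 0]] - emb[edges[:, 1]]).norm(dim=-1)
    pull = d[~is_border].pow(2).mean()                         # same object
    push = (margin - d[is_border]).clamp(min=0).pow(2).mean()  # border edge
    return pull + push
```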
Predicting In-game Actions From the Language of NBA Players
Title | Predicting In-game Actions From the Language of NBA Players |
Authors | Nadav Oved, Amir Feder, Roi Reichart |
Abstract | Sports competitions are widely researched in computer and social science, with the goal of understanding how players act under uncertainty. While there is an abundance of computational work on player metrics prediction based on past performance, very few attempts have been made to incorporate out-of-game signals. Specifically, it was previously unclear whether linguistic signals gathered from players’ interviews can add information which does not appear in performance metrics. To bridge that gap, we define text classification tasks of predicting deviations from the mean in NBA players’ in-game actions, which are associated with strategic choices, player behavior, and risk, using their choice of language prior to the game. We collected a dataset of transcripts from key NBA players’ pre-game interviews and their in-game performance metrics, totaling 5,226 interview-metric pairs. We design neural models for players’ action prediction based on increasingly complex aspects of the language signals in their open-ended interviews. Our models can make their predictions based on the textual signal alone, or in combination with signals from past-performance metrics. Our text-based models outperform strong baselines trained on performance metrics only, demonstrating the importance of language usage for action prediction. Moreover, the models that employ both textual input and past-performance metrics produce the best results. Finally, as neural networks are notoriously difficult to interpret, we propose a method for gaining further insight into what our models have learned. In particular, we present an LDA-based analysis, where we interpret model predictions in terms of correlated topics. We find that our best-performing textual model is most associated with topics that are intuitively related to each prediction task, and that better models yield higher correlation with more informative topics. |
Tasks | Text Classification |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11292v2 |
https://arxiv.org/pdf/1910.11292v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-in-game-actions-from-the-language |
Repo | https://github.com/nadavo/mood |
Framework | none |
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Title | AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss |
Authors | Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson |
Abstract | Non-parallel many-to-many voice conversion, as well as zero-shot voice conversion, remain under-explored areas. Deep style transfer algorithms, such as generative adversarial networks (GANs) and conditional variational autoencoders (CVAEs), are being applied as new solutions in this field. However, GAN training is sophisticated and difficult, and there is no strong evidence that its generated speech is of good perceptual quality. On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN. In this paper, we propose a new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck. We formally show that this scheme can achieve distribution-matching style transfer by training only on a self-reconstruction loss. Based on this scheme, we propose AUTOVC, which achieves state-of-the-art results in many-to-many voice conversion with non-parallel data, and which is the first to perform zero-shot voice conversion. |
Tasks | Style Transfer, Voice Conversion |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05879v2 |
https://arxiv.org/pdf/1905.05879v2.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-voice-style-transfer-with-only |
Repo | https://github.com/liusongxiang/StarGAN-Voice-Conversion |
Framework | pytorch |
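A minimal sketch of the bottleneck idea, assuming illustrative dimensions and simple GRU encoder/decoder stand-ins: temporal downsampling of the content code is what squeezes speaker information out, and the decoder recovers speaker identity from a separate embedding. The real model uses carefully tuned bottleneck sizes; this is not the authors' architecture.

```python
import torch
import torch.nn as nn

class BottleneckVC(nn.Module):
    """Content encoder with a narrow, downsampled bottleneck plus a
    speaker embedding, trained purely on self-reconstruction of the
    input mel-spectrogram."""
    def __init__(self, n_mels=80, d_content=8, d_speaker=256):
        super().__init__()
        self.enc = nn.GRU(n_mels, d_content, batch_first=True, bidirectional=True)
        self.dec = nn.GRU(2 * d_content + d_speaker, 512, batch_first=True)
        self.out = nn.Linear(512, n_mels)

    def forward(self, mel, spk_emb, down=32):
        code, _ = self.enc(mel)                     # (b, t, 2 * d_content)
        code = code[:, ::down]                      # temporal downsampling
        code = code.repeat_interleave(down, dim=1)  # upsample back
        code = code[:, :mel.shape[1]]
        spk = spk_emb.unsqueeze(1).expand(-1, code.shape[1], -1)
        y, _ = self.dec(torch.cat([code, spk], dim=-1))
        return self.out(y)

# Training uses only self-reconstruction: decode with the source speaker's
# own embedding and minimize, e.g., an L1/L2 loss against the input mel.
```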