April 1, 2020

3087 words 15 mins read

Paper Group NANR 83

Deep Relational Factorization Machines. Antifragile and Robust Heteroscedastic Bayesian Optimisation. Meta-Learning without Memorization. Truth or backpropaganda? An empirical investigation of deep learning theory. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks. Inductive Matri …

Deep Relational Factorization Machines

Title Deep Relational Factorization Machines
Authors Anonymous
Abstract Factorization Machines (FMs) are an important supervised learning approach due to their unique ability to capture feature interactions when dealing with high-dimensional sparse data. However, FMs assume each sample is independently observed and are hence incapable of exploiting interactions among samples. Graph Neural Networks (GNNs), by contrast, have become increasingly popular due to their strength at capturing dependencies among samples, but they cannot efficiently handle high-dimensional sparse data, which is common in modern machine learning tasks. In this work, to leverage their complementary advantages while overcoming their respective issues, we propose a novel approach, Deep Relational Factorization Machines, which captures both feature interactions and sample interactions. In particular, we disclose the relationship between feature interactions and the graph, which opens a new avenue for dealing with high-dimensional features. Finally, we demonstrate the effectiveness of the proposed approach with experiments on several real-world datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJgySxSKvB
PDF https://openreview.net/pdf?id=HJgySxSKvB
PWC https://paperswithcode.com/paper/deep-relational-factorization-machines
Repo
Framework
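
As a concrete illustration of the feature-interaction half of the story, here is a minimal NumPy sketch of a plain second-order Factorization Machine using the standard O(kd) reformulation of the pairwise term; the relational (GNN) component the paper adds on top is not shown.

```python
import numpy as np

def fm_forward(x, w0, w, V):
    """Second-order FM output for one sample.

    x: (d,) feature vector; w0: bias; w: (d,) linear weights;
    V: (d, k) latent factors, with <V[i], V[j]> modelling the i-j interaction.
    Uses the O(k*d) identity:
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    """
    s = V.T @ x                       # (k,)
    s2 = (V ** 2).T @ (x ** 2)        # (k,)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
d, k = 10, 4
print(fm_forward(rng.random(d), 0.1, rng.normal(size=d), rng.normal(size=(d, k))))
```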

Antifragile and Robust Heteroscedastic Bayesian Optimisation

Title Antifragile and Robust Heteroscedastic Bayesian Optimisation
Authors Anonymous
Abstract Bayesian Optimisation is an important decision-making tool for high-stakes applications in drug discovery and materials design. An oft-overlooked modelling consideration, however, is the representation of input-dependent or heteroscedastic aleatoric uncertainty. The cost of misrepresenting this uncertainty as homoscedastic can be high in drug discovery applications, where neglecting heteroscedasticity in high-throughput virtual screening could lead to a failed drug discovery program. In this paper, we propose a heteroscedastic Bayesian Optimisation scheme that both represents and optimises aleatoric noise in its suggestions. We consider cases such as drug discovery, where we would like to minimise or be robust to aleatoric uncertainty, but also applications such as materials discovery, where it may be beneficial to maximise or be antifragile to aleatoric uncertainty. Our scheme features a heteroscedastic Gaussian Process (GP) as the surrogate model in conjunction with two acquisition heuristics. First, we extend the augmented expected improvement (AEI) heuristic to the heteroscedastic setting; second, we introduce a new acquisition function, aleatoric noise-penalised expected improvement (ANPEI), based on a simple scalarisation of the performance and noise objectives. Both methods are capable of penalising or promoting aleatoric noise in the suggestions and yield improved performance relative to a naive implementation of homoscedastic Bayesian Optimisation on toy problems as well as a real-world optimisation problem.
Tasks Bayesian Optimisation, Decision Making, Drug Discovery
Published 2020-01-01
URL https://openreview.net/forum?id=B1lTqgSFDH
PDF https://openreview.net/pdf?id=B1lTqgSFDH
PWC https://paperswithcode.com/paper/antifragile-and-robust-heteroscedastic
Repo
Framework
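
The abstract only describes ANPEI as a simple scalarisation of the performance and noise objectives, so the following sketch assumes a convex combination of standard expected improvement and a noise penalty; the weighting and sign conventions are illustrative, not the paper's exact formula.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Standard EI for minimisation, given posterior mean/std at query points."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def anpei(mu, sigma, noise_std, best_f, alpha=0.5):
    """Assumed form of aleatoric-noise-penalised EI: a convex combination of
    EI and the heteroscedastic GP's predicted aleatoric noise at each point.
    Flip the sign of the penalty to *seek* noise in the antifragile setting."""
    return alpha * expected_improvement(mu, sigma, best_f) - (1 - alpha) * noise_std
```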

Meta-Learning without Memorization

Title Meta-Learning without Memorization
Authors Anonymous
Abstract The ability to learn new concepts with small amounts of data is a critical aspect of intelligence that has proven challenging for deep learning methods. Meta-learning has emerged as a promising technique for leveraging data from previous tasks to enable efficient learning of new tasks. However, most meta-learning algorithms implicitly require that the meta-training tasks be “mutually exclusive”, such that no single model can solve all of the tasks at once. For example, when creating tasks for a meta-learned N-way image classifier, we typically randomize the assignment of image classes to N-way classification labels for each task. If this is not done, the meta-learner can ignore the task training data and learn a single model that performs all of the meta-training tasks zero-shot, but does not adapt effectively to new image classes. This requirement means that the user must take great care in designing the tasks, for example by shuffling labels or removing task-identifying information from the inputs. In some domains, this makes meta-learning entirely inapplicable. In this paper, we address this challenge by designing an information-theoretic meta-regularization objective that places precedence on data-driven adaptation. This causes the meta-learner to decide what should be learned from data and what must be inferred from the input. By doing so, our algorithm can successfully use data from “non-mutually-exclusive” tasks to efficiently adapt to novel tasks. We demonstrate its applicability to both contextual and gradient-based meta-learning algorithms, and apply it in practical settings where standard meta-learning has been difficult to apply. Our approach substantially outperforms standard meta-learning algorithms in these settings.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BklEFpEYwS
PDF https://openreview.net/pdf?id=BklEFpEYwS
PWC https://paperswithcode.com/paper/meta-learning-without-memorization
Repo
Framework
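
The abstract does not spell out the regularizer, so the sketch below only illustrates the general shape of an information-theoretic meta-regularization term: the task loss plus a beta-weighted KL divergence that limits how much information the meta-learned parameters can carry. All names here are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def meta_regularised_loss(task_nll, theta_mu, theta_log_var, beta=1e-3):
    """Schematic objective: fit the task's query set (task_nll, computed by
    the adapted model elsewhere) while paying an information cost for the
    stochastic meta-parameters q(theta) = N(theta_mu, exp(theta_log_var))."""
    return task_nll + beta * gaussian_kl(theta_mu, theta_log_var)
```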

Truth or backpropaganda? An empirical investigation of deep learning theory

Title Truth or backpropaganda? An empirical investigation of deep learning theory
Authors Anonymous
Abstract We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. We study the prevalence of local minima in loss landscapes, whether small-norm parameter vectors generalize better (and whether this explains the advantages of weight decay), whether wide-network theories (like the neural tangent kernel) describe the behaviors of classifiers, and whether the rank of weight matrices can be linked to generalization and robustness in real-world networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyxyIgHFvr
PDF https://openreview.net/pdf?id=HyxyIgHFvr
PWC https://paperswithcode.com/paper/truth-or-backpropaganda-an-empirical
Repo
Framework
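
One of the questions the paper probes is whether the rank and norm of weight matrices track generalization; a minimal sketch of such diagnostics (not necessarily the paper's exact measures) looks like this:

```python
import numpy as np

def numerical_rank(W, tol=None):
    """Rank of a weight matrix estimated from its singular-value spectrum."""
    s = np.linalg.svd(W, compute_uv=False)
    if tol is None:
        tol = s.max() * max(W.shape) * np.finfo(W.dtype).eps
    return int(np.sum(s > tol))

def global_l2_norm(params):
    """Overall l2 norm of a parameter list (the 'small norm' hypothesis)."""
    return float(np.sqrt(sum(np.sum(p ** 2) for p in params)))
```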

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Title Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Authors Anonymous
Abstract Recent work has revealed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) training error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $O(1/\epsilon)$ iterations of gradient descent on two-layer networks of any width exceeding $\mathrm{polylog}(n, 1/\epsilon, 1/\delta)$, given $\Omega(1/\epsilon^2)$ training examples, suffice to achieve a test error of $\epsilon$. The analysis further relies upon a margin property of the limiting kernel, which is guaranteed positive, and can distinguish between true labels and random labels.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygegyrYwH
PDF https://openreview.net/pdf?id=HygegyrYwH
PWC https://paperswithcode.com/paper/polylogarithmic-width-suffices-for-gradient-1
Repo
Framework
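
A toy rendition of the setting: full-batch gradient descent on a two-layer ReLU network with the output layer fixed to random signs (a common simplification in this line of analysis), trained on the logistic loss. The width and step size here are arbitrary toy values, not the paper's bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lr, steps = 200, 5, 64, 0.5, 500        # toy sizes, not the bounds

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm inputs
y = np.sign(X[:, 0])                              # labels with margin structure

W = rng.normal(size=(m, d)) / np.sqrt(d)          # trained hidden layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed random output signs

for _ in range(steps):
    mask = X @ W.T > 0                            # ReLU active set, (n, m)
    f = (X @ W.T * mask) @ a                      # network outputs, (n,)
    g = -y / (1.0 + np.exp(y * f))                # logistic-loss gradient wrt f
    W -= lr * a[:, None] * ((g[:, None] * mask).T @ X) / n

pred = np.sign(np.maximum(X @ W.T, 0) @ a)
print("train error:", np.mean(pred != y))
```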

Inductive Matrix Completion Based on Graph Neural Networks

Title Inductive Matrix Completion Based on Graph Neural Networks
Authors Anonymous
Abstract We propose an inductive matrix completion model that uses no side information. Because they factorize the (rating) matrix into the product of low-dimensional latent embeddings of rows (users) and columns (items), a majority of existing matrix completion methods are transductive: the learned embeddings cannot generalize to unseen rows/columns or to new matrices. Previously, making matrix completion inductive required content (side information), such as a user’s age or a movie’s genre. However, high-quality content is not always available and can be hard to extract. Under the extreme setting where no side information is available other than the matrix to complete, can we still learn an inductive matrix completion model? In this paper, we investigate this seemingly impossible problem and propose an Inductive Graph-based Matrix Completion (IGMC) model that uses no side information. It trains a graph neural network (GNN) purely on local subgraphs around (user, item) pairs generated from the rating matrix and maps these subgraphs to their corresponding ratings. Our model achieves highly competitive performance with state-of-the-art transductive baselines. In addition, since our model is inductive, it can generalize to users/items unseen during training (given that their ratings exist) and can even transfer to new tasks. Our transfer learning experiments show that a model trained on the MovieLens dataset can be directly used to predict Douban movie ratings and works surprisingly well. Our work demonstrates that: 1) it is possible to train inductive matrix completion models without any side information while achieving state-of-the-art performance; 2) local graph patterns around a (user, item) pair are effective predictors of the rating this user gives to the item; and 3) we can transfer models trained on existing recommendation tasks to new tasks without any retraining.
Tasks Matrix Completion, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ByxxgCEYDS
PDF https://openreview.net/pdf?id=ByxxgCEYDS
PWC https://paperswithcode.com/paper/inductive-matrix-completion-based-on-graph
Repo
Framework
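
The core data structure is the local subgraph around a (user, item) pair. A minimal sketch of extracting the 1-hop enclosing subgraph from a rating matrix (IGMC additionally labels nodes by their role and distance before feeding the subgraph to a GNN):

```python
import numpy as np
from scipy import sparse

def enclosing_subgraph(R, u, i):
    """1-hop enclosing subgraph around (user u, item i): the users who rated
    item i, the items rated by user u, and all ratings among them."""
    R = sparse.csr_matrix(R)
    items = np.union1d(R[u].indices, [i])           # items rated by u (plus i)
    users = np.union1d(R[:, i].tocoo().row, [u])    # users who rated i (plus u)
    return users, items, R[users][:, items].toarray()

R = np.array([[5, 0, 3], [4, 2, 0], [0, 1, 4]])
print(enclosing_subgraph(R, u=0, i=2))
```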

RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis

Title RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis
Authors Anonymous
Abstract Understanding three-dimensional (3D) geometries from two-dimensional (2D) images without any labeled information is promising for understanding the real world without incurring annotation cost. We herein propose a novel generative model, RGBD-GAN, which achieves unsupervised 3D representation learning from 2D images. The proposed method enables camera-parameter-conditional image generation and depth image generation without any 3D annotations such as camera poses or depth. We use an explicit 3D consistency loss for two RGBD images generated from different camera parameters, in addition to the ordinary GAN objective. The loss is simple yet effective and allows any type of image generator, such as the DCGAN and StyleGAN, to be conditioned on camera parameters. We conducted experiments and demonstrated that the proposed method could learn 3D representations from 2D images with various generator architectures.
Tasks Conditional Image Generation, Image Generation, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HyxjNyrtPr
PDF https://openreview.net/pdf?id=HyxjNyrtPr
PWC https://paperswithcode.com/paper/rgbd-gan-unsupervised-3d-representation
Repo
Framework
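
The 3D consistency loss rests on standard depth reprojection: lift one generated RGBD image to a point cloud, transform it into the second camera, and penalise disagreement with the second generated depth map. A schematic NumPy version (nearest-pixel lookup, no occlusion handling) under assumed pinhole intrinsics:

```python
import numpy as np

def warp_depth(depth1, K, T_rel):
    """Lift camera-1 pixels to 3D using depth1, move them into camera 2,
    and return their projected pixel coordinates and depths there."""
    H, W = depth1.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    pts = np.linalg.inv(K) @ pix * depth1.reshape(-1)            # camera-1 frame
    pts2 = (T_rel @ np.vstack([pts, np.ones(pts.shape[1])]))[:3]  # camera-2 frame
    proj = K @ pts2
    return proj[:2] / proj[2], proj[2]

def consistency_loss(depth1, depth2, K, T_rel):
    """L1 disagreement between depth2 and depth1 warped into camera 2."""
    (us, vs), z = warp_depth(depth1, K, T_rel)
    us, vs = np.round(us).astype(int), np.round(vs).astype(int)
    H, W = depth2.shape
    ok = (us >= 0) & (us < W) & (vs >= 0) & (vs < H) & (z > 0)
    return np.mean(np.abs(depth2[vs[ok], us[ok]] - z[ok]))
```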

Exploring Model-based Planning with Policy Networks

Title Exploring Model-based Planning with Policy Networks
Authors Anonymous
Abstract Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance. Despite these successes, existing planning methods search over candidate action sequences randomly generated in the action space, which is inefficient in complex high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time step as an optimization problem using neural networks. We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and online optimization directly w.r.t. the parameters of the policy network. We show that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample-efficient than state-of-the-art algorithms such as PETS, TD3 and SAC. To explain the effectiveness of our algorithm, we show that the optimization surface in parameter space is smoother than in action space. Furthermore, we find that the distilled policy network can be effectively applied without the expensive model-predictive control during test time for some environments such as Cheetah. Code is released.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1exf64KwH
PDF https://openreview.net/pdf?id=H1exf64KwH
PWC https://paperswithcode.com/paper/exploring-model-based-planning-with-policy-1
Repo
Framework
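
A sketch of the parameter-space variant of the idea: instead of CEM over random action sequences, run CEM over perturbations of the policy network's parameters and roll each candidate out through the learned dynamics model. The `policy`, `dynamics`, and `reward` callables are assumed stand-ins for the learned components; details differ from the paper's exact POPLIN variants.

```python
import numpy as np

def poplin_plan(theta, policy, dynamics, reward, state,
                horizon=10, pop=64, n_elites=8, iters=5, sigma=0.1):
    """CEM over perturbations of flat policy parameters theta."""
    mu = np.zeros_like(theta)
    for _ in range(iters):
        cand = mu + sigma * np.random.randn(pop, theta.size)  # candidate deltas
        returns = np.empty(pop)
        for k in range(pop):
            th, s, ret = theta + cand[k], state, 0.0
            for _ in range(horizon):                # rollout in the learned model
                a = policy(th, s)
                ret += reward(s, a)
                s = dynamics(s, a)
            returns[k] = ret
        elites = cand[np.argsort(returns)[-n_elites:]]
        mu = elites.mean(axis=0)                    # refit sampling distribution
        sigma = elites.std(axis=0).mean() + 1e-3
    return theta + mu                               # perturbed parameters to act with
```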

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

Title Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations
Authors Anonymous
Abstract We propose precision gating (PG), an end-to-end trainable dual-precision quantization technique for deep neural networks. PG computes most features in low precision and only a small proportion of important features in higher precision. Precision gating is very lightweight and widely applicable to many neural network architectures. Experimental results show that precision gating can greatly reduce the average bitwidth of computations in both CNNs and LSTMs with negligible accuracy loss. Compared to state-of-the-art counterparts, PG achieves the same or better accuracy with 2.4× less compute on ImageNet. Compared to 8-bit uniform quantization, PG obtains a 1.2% improvement in perplexity per word with 2.8× computational cost reduction on an LSTM on the Penn Treebank dataset. Precision gating has the potential to greatly reduce the execution costs of DNNs on both commodity and dedicated hardware accelerators. We implement PG’s sampled dense-dense matrix multiplication kernel on CPU, achieving up to 8.3× wall-clock speedup over the dense baseline.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=SJgVU0EKwS
PDF https://openreview.net/pdf?id=SJgVU0EKwS
PWC https://paperswithcode.com/paper/precision-gating-improving-neural-network
Repo
Framework
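
A schematic of the dual-precision idea: compute every output cheaply at low precision, then recompute only the outputs that cross a threshold at higher precision. The quantizer and gating rule below are simplified stand-ins for the paper's trainable versions.

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantization of non-negative activations to `bits` bits."""
    scale = (2 ** bits - 1) / (x.max() + 1e-12)
    return np.round(x * scale) / scale

def precision_gated_layer(x, W, low_bits=4, high_bits=8, tau=0.5):
    """Compute all outputs from low-precision activations; recompute only the
    gated ('important') outputs from high-precision activations."""
    act = np.maximum(x, 0.0)                        # ReLU activations
    y = quantize(act, low_bits) @ W                 # cheap pass for everything
    gate = y > tau                                  # features worth refining
    y[gate] = (quantize(act, high_bits) @ W)[gate]  # expensive pass, few outputs
    return y, gate
```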

Variance Reduced Local SGD with Lower Communication Complexity

Title Variance Reduced Local SGD with Lower Communication Complexity
Authors Anonymous
Abstract To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, which apply multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution on workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its \emph{linear iteration speedup} property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. Benefiting from eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a \emph{linear iteration speedup} with a lower communication complexity $O(T^{\frac{1}{2}} N^{\frac{3}{2}})$ even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the experimental results demonstrate that VRL-SGD performs impressively better than Local SGD when the data among workers are quite diverse.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1lXnhVKPr
PDF https://openreview.net/pdf?id=S1lXnhVKPr
PWC https://paperswithcode.com/paper/variance-reduced-local-sgd-with-lower
Repo
Framework
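
A schematic single-machine simulation of the variance-reduction idea: each worker corrects its local gradients with a control variate that absorbs its systematic deviation from the average, refreshed once per communication round from the drift accumulated between syncs. The exact update rules in the paper may differ.

```python
import numpy as np

def vrl_sgd(worker_grads, x0, lr=0.05, local_steps=10, rounds=50):
    """worker_grads[i](x) returns worker i's stochastic gradient (assumed)."""
    N = len(worker_grads)
    x = [x0.copy() for _ in range(N)]
    c = [np.zeros_like(x0) for _ in range(N)]       # control variates
    for _ in range(rounds):
        for i in range(N):
            for _ in range(local_steps):            # local corrected SGD steps
                x[i] -= lr * (worker_grads[i](x[i]) - c[i])
        avg = sum(x) / N                            # one communication round
        for i in range(N):
            # a worker that drifted away from the average had biased gradients;
            # fold that bias estimate into its control variate, then resync
            c[i] += (avg - x[i]) / (lr * local_steps)
            x[i] = avg.copy()
    return avg
```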

Selection via Proxy: Efficient Data Selection for Deep Learning

Title Selection via Proxy: Efficient Data Selection for Deep Learning
Authors Anonymous
Abstract Data selection methods, such as active learning and core-set selection, are useful tools for machine learning on large datasets, but they can be prohibitively expensive to apply in deep learning. Unlike in other areas of machine learning, the feature representations that these techniques depend on are learned in deep learning rather than given, requiring substantial training times. In this work, we show that we can greatly improve the computational efficiency of data selection in deep learning by using a small proxy model to perform data selection (e.g., selecting data points to label for active learning). By removing hidden layers from the target model or training for fewer epochs, we create proxies that are an order of magnitude faster to train. Although these small proxy models have higher error rates, we find that they empirically provide useful signal for data selection. We evaluate this “selection via proxy” (SVP) approach on several data selection tasks across five datasets: CIFAR10, CIFAR100, ImageNet, Amazon Review Polarity, and Amazon Review Full. For active learning, applying SVP can give an order of magnitude improvement in data selection runtime (i.e., the time it takes to repeatedly train and select points) without significantly increasing the final error. For core-set selection, proxies that are over 10x faster to train than their larger, more accurate target models can remove up to 50% of the data without harming the final accuracy of the target, making end-to-end training time savings possible.
Tasks Active Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HJg2b0VYDr
PDF https://openreview.net/pdf?id=HJg2b0VYDr
PWC https://paperswithcode.com/paper/selection-via-proxy-efficient-data-selection-1
Repo
Framework
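
The mechanics are easy to sketch: train a cheap proxy, then rank the unlabeled pool by the proxy's uncertainty. Using max-softmax confidence as the criterion (one of several selection rules the paper evaluates):

```python
import numpy as np

def select_via_proxy(proxy_probs, k):
    """Return indices of the k pool points the proxy is least confident on.

    proxy_probs: (n, num_classes) softmax outputs of the small proxy model
    over the unlabeled pool; confidence = max class probability.
    """
    confidence = proxy_probs.max(axis=1)
    return np.argsort(confidence)[:k]    # least-confident points to label
```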

Variational Template Machine for Data-to-Text Generation

Title Variational Template Machine for Data-to-Text Generation
Authors Anonymous
Abstract How can we generate descriptions from structured data organized in tables? Existing approaches using neural encoder-decoder models often lack diversity. We claim that an open set of templates is crucial for enriching phrase constructions and realizing varied generations. Learning such templates is prohibitive, since it usually requires a large corpus of paired <table, description> examples, which is seldom available. This paper explores the problem of automatically learning reusable “templates” from paired and non-paired data. We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables. Our contributions include: a) we carefully devise a specific model architecture and losses to explicitly disentangle text template and semantic content information in the latent spaces, and b) we utilize both a small amount of parallel data and large amounts of raw text without aligned tables to enrich template learning. Experiments on datasets from a variety of domains show that VTM generates more diverse output while maintaining good fluency and quality.
Tasks Data-to-Text Generation, Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HkejNgBtPB
PDF https://openreview.net/pdf?id=HkejNgBtPB
PWC https://paperswithcode.com/paper/variational-template-machine-for-data-to-text
Repo
Framework
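
A loss-level sketch of the disentangling objective, assuming a Gaussian template latent z and a content representation tied to the input table; the names and exact decomposition are illustrative, and the paper adds further terms for learning from raw, table-free text.

```python
import torch
import torch.nn.functional as F

def vtm_loss(recon_logits, target_ids, z_mu, z_logvar, c_pred, c_table):
    """Schematic VTM-style objective (names are illustrative).

    recon_logits: (batch, seq, vocab) decoder logits for the description;
    z_mu/z_logvar: posterior of the template latent z (standard-normal prior);
    c_pred/c_table: predicted vs. table-derived content representation.
    """
    rec = F.cross_entropy(recon_logits.transpose(1, 2), target_ids)
    kl = -0.5 * torch.mean(1 + z_logvar - z_mu.pow(2) - z_logvar.exp())
    content = F.mse_loss(c_pred, c_table)   # tie the content latent to the table
    return rec + kl + content
```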

Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization

Title Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization
Authors Anonymous
Abstract Lead bias is a common phenomenon in news summarization, where the early parts of an article often contain the most salient information. While many algorithms exploit this fact in summary generation, it has a detrimental effect on teaching a model to discriminate and extract important information. We propose that lead bias can instead be leveraged in a simple and effective way to pretrain abstractive news summarization models on a large-scale unlabelled corpus: predicting the leading sentences using the rest of an article. Via careful data cleaning and filtering, our transformer-based pretrained model achieves remarkable results on various news summarization tasks without any finetuning. With further finetuning, our model outperforms many competitive baseline models. For example, the pretrained model without finetuning outperforms the pointer-generator network on the CNN/DailyMail dataset. The finetuned model obtains 2.8% higher ROUGE-1, 1.3% higher ROUGE-2, and 1.7% higher ROUGE-L scores than the state-of-the-art model on the XSum dataset.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJgWG0iUvr
PDF https://openreview.net/pdf?id=BJgWG0iUvr
PWC https://paperswithcode.com/paper/make-lead-bias-in-your-favor-a-simple-and
Repo
Framework
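
The pretraining data construction is simple enough to sketch directly: split each article into sentences, treat the first few as the target summary and the remainder as the source. (Naive splitting shown; the paper emphasises careful cleaning and filtering.)

```python
import re

def lead_bias_pair(article, lead_k=3):
    """Turn one raw news article into a (source, target) pretraining pair:
    target = the first lead_k sentences, source = the rest of the article."""
    sents = re.split(r"(?<=[.!?])\s+", article.strip())
    if len(sents) <= lead_k:
        return None                                  # too short to use
    return " ".join(sents[lead_k:]), " ".join(sents[:lead_k])
```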

Transition Based Dependency Parser for Amharic Language Using Deep Learning

Title Transition Based Dependency Parser for Amharic Language Using Deep Learning
Authors Mizanu Zelalem, Million Meshesha (PhD)
Abstract Research shows that attempts to apply existing dependency parsers to morphologically rich languages, including Amharic, yield poor performance. In this study, a dependency parser for the Amharic language is implemented using an arc-eager transition system and an LSTM network. The study introduces another way of building labeled dependency structures by using a separate network model to predict the dependency relation. This reduces the number of classes from 2n+2 to n, where n is the number of relation types in the language, and increases the number of examples per class in the dataset. Evaluation of the parser model yields 91.54 and 81.4 unlabeled and labeled attachment scores, respectively. The major challenge in this study was the lower accuracy of the labeled attachment score, which is mainly due to the size and quality of the treebank available for Amharic. Improving the treebank by increasing its size and adding morphological information could further improve the parser’s performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1lOraEFPB
PDF https://openreview.net/pdf?id=B1lOraEFPB
PWC https://paperswithcode.com/paper/transition-based-dependency-parser-for
Repo
Framework
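
The class-count argument is easy to make concrete with hypothetical labels: a combined arc-eager classifier must choose among 2n+2 classes, while factoring out relation prediction leaves a 4-way transition classifier plus an n-way relation classifier.

```python
# Combined arc-eager action space: 2n + 2 classes.
RELATIONS = ["nsubj", "obj", "amod", "advmod"]   # toy stand-ins, n = 4 here
COMBINED = ["SHIFT", "REDUCE"] + [f"{arc}:{rel}"
            for arc in ("LEFT-ARC", "RIGHT-ARC") for rel in RELATIONS]
assert len(COMBINED) == 2 * len(RELATIONS) + 2

# Factored prediction (the paper's approach): one network picks the
# transition (always 4 classes), and a separate network picks the relation
# (n classes) only when the chosen transition is an arc action.
TRANSITIONS = ["SHIFT", "REDUCE", "LEFT-ARC", "RIGHT-ARC"]
print(len(TRANSITIONS), "transition classes +", len(RELATIONS), "relation classes")
```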

FSPool: Learning Set Representations with Featurewise Sort Pooling

Title FSPool: Learning Set Representations with Featurewise Sort Pooling
Authors Anonymous
Abstract Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJgBA2VYwH
PDF https://openreview.net/pdf?id=HJgBA2VYwH
PWC https://paperswithcode.com/paper/fspool-learning-set-representations-with-1
Repo
Framework
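
FSPool itself is a one-liner for fixed-size sets: sort each feature independently across the set's elements, then take a learned position-wise weighted sum; sorting is what makes the result permutation-invariant. (The paper's continuous relaxation for variable set sizes is omitted here.)

```python
import numpy as np

def fspool(X, W):
    """Featurewise sort pooling for a fixed-size set.

    X: (n, d) set of n element feature vectors (order arbitrary).
    W: (n, d) learned weights over sorted positions.
    """
    Xs = np.sort(X, axis=0)[::-1]    # sort each feature across elements
    return np.sum(Xs * W, axis=0)    # (d,) permutation-invariant pooled vector

rng = np.random.default_rng(0)
X, W = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
perm = rng.permutation(5)
assert np.allclose(fspool(X, W), fspool(X[perm], W))  # invariant to element order
```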