April 1, 2020

2853 words 14 mins read

Paper Group NAWR 3

Why ADAM Beats SGD for Attention Models. Hindsight Trust Region Policy Optimization. Decentralized Distributed PPO: Mastering PointGoal Navigation. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. Bootstrapping the Expressivity with Model-based Planning. Automated Relational Meta-learning. Empirical Bayes Transductive Meta-Learn …

Why ADAM Beats SGD for Attention Models

Title Why ADAM Beats SGD for Attention Models
Authors Anonymous
Abstract While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models. The settings under which SGD performs poorly in comparison to Adam are not well understood yet. In this paper, we provide empirical and theoretical evidence that a heavy-tailed distribution of the noise in stochastic gradients is a root cause of SGD’s poor performance. Based on this observation, we study clipped variants of SGD that circumvent this issue; we then analyze their convergence under heavy-tailed noise. Furthermore, we develop a new adaptive coordinate-wise clipping algorithm (ACClip) tailored to such settings. Subsequently, we show how adaptive methods like Adam can be viewed through the lens of clipping, which helps us explain Adam’s strong performance under heavy-tailed noise settings. Finally, we show that the proposed ACClip outperforms Adam for both BERT pretraining and finetuning tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJx37TEtDH
PDF https://openreview.net/pdf?id=SJx37TEtDH
PWC https://paperswithcode.com/paper/why-adam-beats-sgd-for-attention-models
Repo https://github.com/rivercold/ACClip-Pytorch
Framework pytorch
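
The clipping mechanism is easy to prototype. Below is a minimal PyTorch sketch of coordinate-wise adaptive clipping, assuming an exponential moving average of per-coordinate gradient magnitudes as the clipping threshold; the paper's exact ACClip update (its moment estimates and momentum handling) differs, so treat this as an illustration of the idea rather than the authors' algorithm.

```python
import torch

class ACClipSketch(torch.optim.Optimizer):
    """Sketch: clip each gradient coordinate to a running estimate of its
    magnitude before a plain SGD step. Not the paper's exact update."""

    def __init__(self, params, lr=1e-3, beta=0.99, eps=1e-8):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta, eps = group["lr"], group["beta"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "tau" not in state:
                    state["tau"] = torch.zeros_like(p)
                tau = state["tau"]
                # Per-coordinate EMA of gradient magnitude = clipping threshold.
                tau.mul_(beta).add_(p.grad.abs(), alpha=1 - beta)
                # Clip each coordinate, keeping its sign.
                clipped = torch.sign(p.grad) * torch.minimum(p.grad.abs(), tau + eps)
                p.add_(clipped, alpha=-lr)
```

Used as a drop-in optimizer (`opt = ACClipSketch(model.parameters())`), this caps the influence of heavy-tailed gradient spikes coordinate by coordinate, which is the behavior the paper argues Adam implicitly provides.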

Hindsight Trust Region Policy Optimization

Title Hindsight Trust Region Policy Optimization
Authors Anonymous
Abstract As reinforcement learning continues to drive machine intelligence beyond its conventional boundaries, poor performance in sparse-reward environments severely limits its application to a broader range of advanced fields. Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse-reward environments, this paper presents Hindsight Trust Region Policy Optimization (HTRPO), a method that efficiently uses interactions collected under sparse rewards to optimize policies within a trust region while maintaining learning stability. First, we theoretically adapt the TRPO objective function, expressed as the expected return of the policy, to the distribution of hindsight data generated from alternative goals. Then, we apply Monte Carlo estimation with importance sampling to estimate the KL divergence between two policies, taking the hindsight data as input. Under the condition that the distributions are sufficiently close, the KL divergence is approximated by another f-divergence. This approximation reduces variance and alleviates instability during policy updates. Experimental results on both discrete and continuous benchmark tasks demonstrate that HTRPO converges significantly faster than previous policy gradient methods, achieving strong performance and high data efficiency when training policies in sparse-reward environments.
Tasks Policy Gradient Methods
Published 2020-01-01
URL https://openreview.net/forum?id=rylCP6NFDB
PDF https://openreview.net/pdf?id=rylCP6NFDB
PWC https://paperswithcode.com/paper/hindsight-trust-region-policy-optimization-1
Repo https://github.com/HTRPOCODES/HTRPO-v2
Framework pytorch
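
The importance-sampled KL estimate at the core of HTRPO can be sketched in a few lines. This is only the basic estimator the method builds on, not the paper's final lower-variance f-divergence substitute; input shapes are assumptions.

```python
import torch

def kl_is_estimate(logp_old, logp_new, logp_hindsight):
    """Monte Carlo estimate of KL(pi_old || pi_new) from actions sampled
    under a hindsight (behavior) distribution. Each term is reweighted by
    the importance ratio pi_old / behavior. All inputs are 1-D tensors of
    log-probabilities of the sampled actions."""
    w = torch.exp(logp_old - logp_hindsight)   # importance weights
    return (w * (logp_old - logp_new)).mean()
```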

Decentralized Distributed PPO: Mastering PointGoal Navigation

Title Decentralized Distributed PPO: Mastering PointGoal Navigation
Authors Anonymous
Abstract We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever “stale”), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling – achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) – over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially “solves” the task – near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks – the analog of “ImageNet pre-training + task-specific fine-tuning” for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models + code will be publicly available).
Tasks Autonomous Navigation, PointGoal Navigation
Published 2020-01-01
URL https://openreview.net/forum?id=H1gX8C4YPr
PDF https://openreview.net/pdf?id=H1gX8C4YPr
PWC https://paperswithcode.com/paper/decentralized-distributed-ppo-mastering
Repo https://github.com/facebookresearch/habitat-api/tree/master/habitat_baselines/rl/ddppo
Framework pytorch
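
The decentralized, synchronous pattern reduces to a gradient all-reduce with no parameter server. A conceptual sketch of one update step with `torch.distributed` (process-group setup, rollout collection, and DD-PPO's straggler handling are omitted):

```python
import torch
import torch.distributed as dist

def synchronous_update(model, loss, optimizer, world_size):
    """One decentralized, synchronous step: every worker computes its own
    gradients, then all workers average them via all-reduce. No worker ever
    applies a stale gradient, and there is no central server."""
    optimizer.zero_grad()
    loss.backward()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
```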

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Title ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Authors Anonymous
Abstract Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into an EASY set, with the remaining data forming a HARD set. Empirical results show that state-of-the-art models achieve high accuracy on the EASY set by capturing the biases contained in the dataset. However, they struggle on the HARD set, performing near the level of random guessing, indicating that more research is needed to genuinely enhance the logical reasoning ability of current models.
Tasks Reading Comprehension
Published 2020-01-01
URL https://openreview.net/forum?id=HJgJtT4tvB
PDF https://openreview.net/pdf?id=HJgJtT4tvB
PWC https://paperswithcode.com/paper/reclor-a-reading-comprehension-dataset
Repo https://github.com/yuweihao/reclor
Framework pytorch
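
The EASY/HARD evaluation protocol is simple to reproduce once biased points are flagged. A minimal sketch, assuming bias is detected by whether a context-blind baseline (one that never sees the passage) answers correctly; this criterion and the input format are assumptions, and the paper's exact bias-detection procedure may differ.

```python
def split_easy_hard(examples, context_blind_correct):
    """Partition a test set into EASY (answerable without the passage,
    i.e. likely biased) and HARD (the rest). `context_blind_correct` is a
    list of booleans, one per example, from an assumed baseline run."""
    easy = [ex for ex, ok in zip(examples, context_blind_correct) if ok]
    hard = [ex for ex, ok in zip(examples, context_blind_correct) if not ok]
    return easy, hard
```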

Bootstrapping the Expressivity with Model-based Planning

Title Bootstrapping the Expressivity with Model-based Planning
Authors Anonymous
Abstract We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state spaces, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs also have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on the policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on MuJoCo benchmark tasks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hye4WaVYwr
PDF https://openreview.net/pdf?id=Hye4WaVYwr
PWC https://paperswithcode.com/paper/bootstrapping-the-expressivity-with-model-1
Repo https://github.com/roosephu/boots
Framework tf
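
A multi-step bootstrapping planner of this kind can be sketched with random shooting: roll a learned dynamics model forward a few steps, score each candidate action sequence by accumulated model reward plus a terminal Q bootstrap, and return the best first action. All callables below are placeholders for learned components, and random shooting is one plausible search choice, not necessarily the paper's.

```python
import torch

def boots_plan(dynamics, reward, q_fn, policy, state,
               horizon=4, n_samples=64, sigma=0.1):
    """Sketch of a multi-step planner that bootstraps a weak Q-function
    into a stronger policy at test time."""
    best_ret, best_a0 = -float("inf"), None
    for _ in range(n_samples):
        s, ret, a0 = state, 0.0, None
        for t in range(horizon):
            a = policy(s) + sigma * torch.randn_like(policy(s))  # perturbed proposal
            if t == 0:
                a0 = a
            ret += float(reward(s, a))   # model reward along the rollout
            s = dynamics(s, a)           # learned dynamics step
        ret += float(q_fn(s, policy(s))) # bootstrap with the weak Q-function
        if ret > best_ret:
            best_ret, best_a0 = ret, a0
    return best_a0
```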

Automated Relational Meta-learning

Title Automated Relational Meta-learning
Authors Anonymous
Abstract In order to learn efficiently from small amounts of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is task heterogeneity, which cannot be handled well by traditional globally shared meta-learning methods. In addition, current task-specific meta-learning methods either rely on hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way knowledge is organized in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts cross-task relations and constructs a meta-knowledge graph. When a new task arrives, it can quickly find the most relevant structure and tailor the learned structural knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity through a learned meta-knowledge graph, but also improves model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification, and the results demonstrate the superiority of ARML over state-of-the-art baselines.
Tasks Few-Shot Image Classification, Image Classification, Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rklp93EtwH
PDF https://openreview.net/pdf?id=rklp93EtwH
PWC https://paperswithcode.com/paper/automated-relational-meta-learning
Repo https://github.com/huaxiuyao/ARML
Framework none
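
The core query step can be caricatured as attention from a task embedding over learned graph-node embeddings, with the aggregated knowledge shifting a shared initialization. ARML's actual propagation over the meta-knowledge graph is richer than this; shapes and names below are placeholders.

```python
import torch
import torch.nn.functional as F

def tailor_initialization(task_embedding, meta_nodes, base_init):
    """Sketch: attend over meta-knowledge-graph nodes to adapt a shared
    initialization to a new task.
    task_embedding: (d,), meta_nodes: (n, d), base_init: (d,)."""
    scores = F.softmax(meta_nodes @ task_embedding, dim=0)  # node relevance
    knowledge = scores @ meta_nodes                         # aggregated node features
    return base_init + knowledge
```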

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Title Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
Authors Anonymous
Abstract We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. To develop our framework we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive a novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. The combination of local KL divergences and the synthetic gradient network allows for backpropagating information from unlabeled data, thereby enabling transduction. Our method significantly outperforms previous state-of-the-art methods on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification.
Tasks Few-Shot Image Classification, Meta-Learning, Multi-Task Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Hkg-xgrYvH
PDF https://openreview.net/pdf?id=Hkg-xgrYvH
PWC https://paperswithcode.com/paper/empirical-bayes-transductive-meta-learning
Repo https://github.com/hushell/sib_meta_learn
Framework pytorch
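
The transductive mechanism can be sketched as an inner loop in which a synthetic-gradient network maps unlabeled query-set features to a surrogate gradient for the task-specific parameters, so unlabeled data steers adaptation without labels. The pooling choice, shapes, and number of steps below are assumptions, not the paper's architecture.

```python
import torch

def transductive_adapt(theta, synth_grad_net, query_feats, steps=3, lr=0.1):
    """Sketch: update task parameters with gradients *predicted* from
    unlabeled query features rather than computed from labels.
    theta: (p,), query_feats: (m, d); synth_grad_net maps (p + d,) -> (p,)."""
    for _ in range(steps):
        g = synth_grad_net(torch.cat([theta, query_feats.mean(0)]))
        theta = theta - lr * g
    return theta
```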

Spectral Embedding of Regularized Block Models

Title Spectral Embedding of Regularized Block Models
Authors Anonymous
Abstract Spectral embedding is a popular technique for the representation of graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks like clustering. In this paper, we explain on a simple block model the impact of complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1l_0JBYwS
PDF https://openreview.net/pdf?id=H1l_0JBYwS
PWC https://paperswithcode.com/paper/spectral-embedding-of-regularized-block
Repo https://github.com/research-submissions/iclr20
Framework none
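
Complete-graph regularization is one line of linear algebra. A minimal dense-matrix sketch (assuming a symmetric adjacency matrix; the choice of normalization and the eigensolver are mine):

```python
import numpy as np

def regularized_spectral_embedding(A, alpha=1.0, dim=2):
    """Add the constant alpha/n to every entry of the adjacency matrix,
    then embed with the leading eigenvectors of the normalized adjacency."""
    n = A.shape[0]
    A_reg = A + alpha / n                    # complete-graph regularization
    d = A_reg.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A_reg @ D_inv_sqrt      # normalized adjacency
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return vecs[:, -dim:]                    # top eigenvectors as coordinates
```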

Mixed-curvature Variational Autoencoders

Title Mixed-curvature Variational Autoencoders
Authors Anonymous
Abstract It has been shown that using geometric spaces with non-zero curvature instead of plain Euclidean spaces with zero curvature improves performance on a range of Machine Learning tasks for learning representations. Recent work has leveraged these geometries to learn latent variable models like Variational Autoencoders (VAEs) in spherical and hyperbolic spaces with constant curvature. While these approaches work well on particular kinds of data that they were designed for, e.g. tree-like data for a hyperbolic VAE, there exists no generic approach unifying all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature can be learned. This generalizes the Euclidean VAE to curved latent spaces, as the model essentially reduces to the Euclidean VAE if curvatures of all latent space components go to 0.
Tasks Latent Variable Models
Published 2020-01-01
URL https://openreview.net/forum?id=S1g6xeSKDS
PDF https://openreview.net/pdf?id=S1g6xeSKDS
PWC https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders
Repo https://github.com/oskopek/mvae
Framework pytorch
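
The "reduces to the Euclidean VAE as curvature goes to 0" claim is visible directly in the exponential map of each latent component. A sketch for the hyperbolic (Poincaré-ball) case; the spherical case replaces tanh with tan, and the numerical-stability constants are mine.

```python
import torch

def exp_map_origin(v, c):
    """Exponential map at the origin of a hyperbolic latent component with
    curvature -c. As c -> 0, tanh(sqrt(c)*r)/(sqrt(c)*r) -> 1, so the map
    tends to the identity and the component becomes Euclidean."""
    if c < 1e-6:
        return v                                   # Euclidean limit
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    sc = c ** 0.5
    return torch.tanh(sc * norm) * v / (sc * norm)
```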

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

Title FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
Authors Anonymous
Abstract While many sentiment classification solutions report high accuracy scores on product or movie review datasets, the performance of these methods in niche domains such as finance still largely falls behind. The reasons for this gap are the domain-specific language, which decreases the applicability of existing models, and the lack of quality labeled data for learning the new context of positive and negative in the specific domain. Transfer learning has been shown to be successful in adapting to new domains without large training data sets. In this paper, we explore the effectiveness of NLP transfer learning in financial sentiment classification. We introduce FinBERT, a language model based on BERT, which improves the state-of-the-art performance by 14 percentage points on a financial sentiment classification task on the FinancialPhrasebank dataset.
Tasks Language Modelling, Sentiment Analysis, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HylznxrYDr
PDF https://openreview.net/pdf?id=HylznxrYDr
PWC https://paperswithcode.com/paper/finbert-financial-sentiment-analysis-with-pre-1
Repo https://github.com/ProsusAI/finBERT
Framework none
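
The transfer-learning recipe amounts to fine-tuning a BERT checkpoint with a three-way (positive/neutral/negative) classification head on financial text. A minimal sketch with the Hugging Face `transformers` API; the example sentence, label mapping, and starting checkpoint are illustrative assumptions, and dataset wiring plus the optimizer loop are omitted.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # positive / neutral / negative

# One illustrative fine-tuning step on a financial-news-style sentence.
batch = tokenizer(["Operating profit rose compared with the prior year."],
                  return_tensors="pt", padding=True)
labels = torch.tensor([0])              # assumed mapping: 0 = positive

loss = model(**batch, labels=labels).loss
loss.backward()                         # an optimizer.step() would follow
```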

Smartphone Modulated Colorimetric Reader with Color Subtraction

Title Smartphone Modulated Colorimetric Reader with Color Subtraction
Authors Y. Zhao, S.Y. Choi, J. Lou-Franco, J.L.D. Nelis, H. Zhou, C. Cao, K. Campbell, C. Elliott, K. Rafferty
Abstract Color analysis has been essential for the interpretation of optical readouts, e.g. colorimetry, fluorescence, spectroscopy, and scanometry. However, existing colorimetric readers can hardly eliminate the color interference of colored solutions, e.g., interpreting pH test strips to assess the pH value of red wine. This paper introduces a smartphone modulated colorimetric reader that is compatible with most smartphone models and a novel color subtraction algorithm that eliminates color interferences due to colored solutions. Experiments were conducted to validate the effectiveness of the developed reader and algorithm in evaluating pH test strips produced from transparent and colored solutions using multiple smartphone models. Applicability of the developed reader was demonstrated through its interpretation of pH test strips measuring pH values of colored and non-transparent food samples including red wine and milk.
Tasks
Published 2020-01-26
URL https://ieeexplore.ieee.org/document/8956565
PDF https://ieeexplore.ieee.org/document/8956565
PWC https://paperswithcode.com/paper/smartphone-modulated-colorimetric-reader-with
Repo https://github.com/zyfccc/Smartphone-Modulated-Colorimetric-Reader-with-Color-Subtraction-IEEE-Sensors-2019
Framework tf
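
The color-subtraction idea can be sketched as removing the solution's deviation from a transparent reference, channel by channel. The paper's actual algorithm and calibration are more involved; this is only the simplest plausible form, with inputs assumed to be 8-bit RGB patches.

```python
import numpy as np

def color_subtract(strip_rgb, solution_rgb, blank_rgb):
    """Sketch: estimate the colored solution's interference as its RGB
    deviation from a blank (transparent) reference, then subtract that
    interference from the strip readout."""
    interference = solution_rgb.astype(float) - blank_rgb.astype(float)
    corrected = strip_rgb.astype(float) - interference
    return np.clip(corrected, 0, 255).astype(np.uint8)
```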

The Shape of Data: Intrinsic Distance for Data Distributions

Title The Shape of Data: Intrinsic Distance for Data Distributions
Authors Anonymous
Abstract The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic and multi-scale method for characterizing and comparing data manifolds, using a lower-bound of the spectral variant of the Gromov-Wasserstein inter-manifold distance, which compares all data moments. In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionalities; moreover, we showcase its efficacy in evaluating the quality of generative models.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyebplHYwB
PDF https://openreview.net/pdf?id=HyebplHYwB
PWC https://paperswithcode.com/paper/the-shape-of-data-intrinsic-distance-for-data
Repo https://github.com/imd-iclr/imd
Framework none
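
An intrinsic, multi-scale comparison of this flavor can be sketched with heat-kernel traces of a data graph's Laplacian: two datasets are then compared by the distance between their descriptors. This is a stand-in for the paper's Gromov-Wasserstein lower bound, with the kNN construction and time scales as assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_trace_descriptor(X, k=5, ts=(0.1, 1.0, 10.0)):
    """Sketch of an intrinsic descriptor: build a symmetric kNN graph on
    the data, take the normalized Laplacian spectrum, and record the
    heat-kernel trace tr(exp(-t L)) at several scales t."""
    D = cdist(X, X)
    n = len(X)
    A = np.zeros((n, n))
    idx = np.argsort(D, axis=1)[:, 1:k + 1]     # k nearest neighbors
    for i in range(n):
        A[i, idx[i]] = 1.0
        A[idx[i], i] = 1.0                      # symmetrize
    deg = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(deg, deg))  # normalized Laplacian
    lam = np.linalg.eigvalsh(L)
    return np.array([np.exp(-t * lam).sum() for t in ts])
```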

Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP

Title Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP
Authors Zhang-Hua Fu, Kai-Bin Qiu, Meng Qiu, Hongyuan Zha
Abstract The travelling salesman problem (TSP) is a well-known combinatorial optimization problem with a variety of real-life applications. We tackle TSP by incorporating machine learning methodology and leveraging the variable neighborhood search strategy. More precisely, the search process is considered as a Markov decision process (MDP), where a 2-opt local search is used to search within a small neighborhood, while a Monte Carlo tree search (MCTS) method (which iterates through simulation, selection and back-propagation steps) is used to sample a number of targeted actions within an enlarged neighborhood. This new paradigm clearly distinguishes itself from the existing machine learning (ML) based paradigms for solving the TSP, which either use an end-to-end ML model or simply apply traditional techniques after ML for post-optimization. Experiments based on two public data sets show that our approach clearly dominates all the existing learning-based TSP algorithms in terms of performance, demonstrating its high potential on the TSP. More importantly, as a general framework without complicated hand-crafted rules, it can be readily extended to many other combinatorial optimization problems.
Tasks Combinatorial Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=ByxtHCVKwB
PDF https://openreview.net/pdf?id=ByxtHCVKwB
PWC https://paperswithcode.com/paper/targeted-sampling-of-enlarged-neighborhood
Repo https://github.com/Spider-scnu/Monte-Carlo-tree-search-for-TSP
Framework none
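
The small-neighborhood move inside the MDP is standard 2-opt: reverse a tour segment whenever doing so shortens the tour. A minimal sketch of one improving pass (the MCTS-driven enlarged-neighborhood sampling is the paper's contribution and is not shown here):

```python
def two_opt_step(tour, dist):
    """One pass of 2-opt local search. `tour` is a list of city indices,
    `dist` a symmetric distance matrix. Returns the (possibly improved)
    tour and whether an improving move was found."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n - (i == 0)):   # skip moves sharing an endpoint
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            # Replacing edges (a,b),(c,d) with (a,c),(b,d) = segment reversal.
            if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                return tour, True
    return tour, False
```

Iterating `two_opt_step` until it returns `False` yields a 2-opt local optimum, the inner loop on top of which the MCTS samples larger moves.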

Generative Models for Effective ML on Private, Decentralized Datasets

Title Generative Models for Effective ML on Private, Decentralized Datasets
Authors Anonymous
Abstract To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data—of representative samples, of outliers, of misclassifications—is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is risky for privacy-sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models—trained using federated methods and with formal differential privacy guarantees—can be used effectively to debug data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJgaRA4FPH
PDF https://openreview.net/pdf?id=SJgaRA4FPH
PWC https://paperswithcode.com/paper/generative-models-for-effective-ml-on-private
Repo https://github.com/tensorflow/gan
Framework tf
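
The privacy mechanism behind such federated training can be sketched as clip-average-noise on client model updates. Hyperparameters below are placeholders rather than the paper's values, and the accounting that turns the noise into a formal (epsilon, delta) guarantee is omitted.

```python
import torch

def dp_federated_average(client_updates, clip_norm=1.0, noise_mult=1.1):
    """Sketch of a differentially private federated averaging round:
    clip each client's (flattened) model update to a norm bound, average,
    then add calibrated Gaussian noise."""
    clipped = [u * (clip_norm / torch.clamp(u.norm(), min=clip_norm))
               for u in client_updates]        # each norm <= clip_norm
    avg = torch.stack(clipped).mean(0)
    sigma = noise_mult * clip_norm / len(client_updates)
    return avg + sigma * torch.randn_like(avg)
```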

Scale-Equivariant Steerable Networks

Title Scale-Equivariant Steerable Networks
Authors Anonymous
Abstract The effectiveness of Convolutional Neural Networks (CNNs) has been largely attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to the previously developed methods for scale equivariance and local scale invariance. We demonstrate state-of-the-art results on the MNIST-scale dataset. Finally, we demonstrate that the proposed scale-equivariant convolutions show remarkable gains on STL-10 when used as drop-in replacements for non-equivariant convolutional layers.
Tasks Image Classification
Published 2020-01-01
URL https://openreview.net/forum?id=HJgpugrKPS
PDF https://openreview.net/pdf?id=HJgpugrKPS
PWC https://paperswithcode.com/paper/scale-equivariant-steerable-networks
Repo https://github.com/ISosnovik/sesn
Framework pytorch
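
The essence of scale-convolution is applying the same filter at several scales and stacking the responses along a new scale axis. A crude PyTorch sketch using dilation as a stand-in for the paper's rescaled steerable filter basis (the actual method builds filters from a continuous basis, which this does not reproduce; an odd kernel size is assumed):

```python
import torch
import torch.nn.functional as F

def scale_conv(x, weight, dilations=(1, 2, 3)):
    """Sketch: apply one filter at several scales and stack the responses.
    x: (batch, in_ch, H, W), weight: (out_ch, in_ch, k, k) with k odd.
    Returns (batch, out_ch, scale, H, W)."""
    outs = [F.conv2d(x, weight,
                     padding=d * (weight.shape[-1] // 2),  # keep spatial size
                     dilation=d)
            for d in dilations]
    return torch.stack(outs, dim=2)   # new scale dimension
```

Because every scale channel sees the same filter, rescaling the input approximately permutes the scale axis, which is the equivariance property the paper makes exact with steerable filters.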