April 1, 2020


Paper Group NAWR 3

Why ADAM Beats SGD for Attention Models. Hindsight Trust Region Policy Optimization. Decentralized Distributed PPO: Mastering PointGoal Navigation. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. Bootstrapping the Expressivity with Model-based Planning. Automated Relational Meta-learning. Empirical Bayes Transductive Meta-Learn …

Why ADAM Beats SGD for Attention Models


Title	Why ADAM Beats SGD for Attention Models
Authors	Anonymous
Abstract	While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models. The settings under which SGD performs poorly in comparison to Adam are not well understood yet. In this paper, we provide empirical and theoretical evidence that a heavy-tailed distribution of the noise in stochastic gradients is a root cause of SGD’s poor performance. Based on this observation, we study clipped variants of SGD that circumvent this issue; we then analyze their convergence under heavy-tailed noise. Furthermore, we develop a new adaptive coordinate-wise clipping algorithm (ACClip) tailored to such settings. Subsequently, we show how adaptive methods like Adam can be viewed through the lens of clipping, which helps us explain Adam’s strong performance under heavy-tail noise settings. Finally, we show that the proposed ACClip outperforms Adam for both BERT pretraining and finetuning tasks.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SJx37TEtDH
PDF	https://openreview.net/pdf?id=SJx37TEtDH
PWC	https://paperswithcode.com/paper/why-adam-beats-sgd-for-attention-models
Repo	https://github.com/rivercold/ACClip-Pytorch
Framework	pytorch
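
The clipping idea in the abstract above can be illustrated with a minimal sketch. This is plain global-norm clipping applied to one SGD step, not the paper's ACClip (which the abstract describes as adaptive and coordinate-wise); the function names, learning rate, and threshold are assumptions chosen for illustration.

```python
import math

def clip_by_global_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def clipped_sgd_step(params, grad, lr, max_norm):
    """One SGD update using the clipped gradient (hypothetical helper)."""
    clipped = clip_by_global_norm(grad, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]

# A single outlier gradient, of the kind heavy-tailed noise occasionally produces.
params = [0.0, 0.0]
heavy_tailed_grad = [300.0, 400.0]  # Euclidean norm 500
new_params = clipped_sgd_step(params, heavy_tailed_grad, lr=0.1, max_norm=1.0)
print(new_params)  # the step length is bounded by lr * max_norm
```

Without clipping, the same gradient would move the parameters by a distance of 50 in one step; with clipping the step length is capped at `lr * max_norm`, which is why a few extreme gradients cannot dominate training.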

Hindsight Trust Region Policy Optimization


Title	Hindsight Trust Region Policy Optimization
Authors	Anonymous
Abstract	As reinforcement learning continues to drive machine intelligence beyond its conventional boundary, unsubstantial practices in sparse reward environment severely limit further applications in a broader range of advanced fields. Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse reward environment, this paper presents Hindsight Trust Region Policy Optimization (HTRPO), a method that efficiently utilizes interactions in sparse reward conditions to optimize policies within trust region and, in the meantime, maintains learning stability. Firstly, we theoretically adapt the TRPO objective function, in the form of the expected return of the policy, to the distribution of hindsight data generated from the alternative goals. Then, we apply Monte Carlo with importance sampling to estimate KL-divergence between two policies, taking the hindsight data as input. Under the condition that the distributions are sufficiently close, the KL-divergence is approximated by another f-divergence. Such approximation results in the decrease of variance and alleviates the instability during policy update. Experimental results on both discrete and continuous benchmark tasks demonstrate that HTRPO converges significantly faster than previous policy gradient methods. It achieves effective performances and high data-efficiency for training policies in sparse reward environments.
Tasks	Policy Gradient Methods
Published	2020-01-01
URL	https://openreview.net/forum?id=rylCP6NFDB
PDF	https://openreview.net/pdf?id=rylCP6NFDB
PWC	https://paperswithcode.com/paper/hindsight-trust-region-policy-optimization-1
Repo	https://github.com/HTRPOCODES/HTRPO-v2
Framework	pytorch

Decentralized Distributed PPO: Mastering PointGoal Navigation


Title	Decentralized Distributed PPO: Mastering PointGoal Navigation
Authors	Anonymous
Abstract	We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever “stale”), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling – achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) – over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on Habitat Autonomous Navigation Challenge 2019, but essentially “solves” the task – near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks – the analog of “ImageNet pre-training + task-specific fine-tuning” for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models + code will be publicly available).
Tasks	Autonomous Navigation, PointGoal Navigation
Published	2020-01-01
URL	https://openreview.net/forum?id=H1gX8C4YPr
PDF	https://openreview.net/pdf?id=H1gX8C4YPr
PWC	https://paperswithcode.com/paper/decentralized-distributed-ppo-mastering
Repo	https://github.com/facebookresearch/habitat-api/tree/master/habitat_baselines/rl/ddppo
Framework	pytorch

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning


Title	ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Authors	Anonymous
Abstract	Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into an EASY set, with the rest forming a HARD set. Empirical results show that the state-of-the-art models have an outstanding ability to capture biases contained in the dataset, with high accuracy on the EASY set. However, they struggle on the HARD set, with poor performance near that of random guessing, indicating that more research is needed to substantially enhance the logical reasoning ability of current models.
Tasks	Reading Comprehension
Published	2020-01-01
URL	https://openreview.net/forum?id=HJgJtT4tvB
PDF	https://openreview.net/pdf?id=HJgJtT4tvB
PWC	https://paperswithcode.com/paper/reclor-a-reading-comprehension-dataset
Repo	https://github.com/yuweihao/reclor
Framework	pytorch

Bootstrapping the Expressivity with Model-based Planning


Title	Bootstrapping the Expressivity with Model-based Planning
Authors	Anonymous
Abstract	We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize many real-world MDPs also have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on MuJoCo benchmark tasks.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Hye4WaVYwr
PDF	https://openreview.net/pdf?id=Hye4WaVYwr
PWC	https://paperswithcode.com/paper/bootstrapping-the-expressivity-with-model-1
Repo	https://github.com/roosephu/boots
Framework	tf

Automated Relational Meta-learning


Title	Automated Relational Meta-learning
Authors	Anonymous
Abstract	In order to efficiently learn with a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is task heterogeneity, which cannot be well handled by traditional globally shared meta-learning methods. In addition, current task-specific meta-learning methods may either suffer from hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way knowledge is organized in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph. When a new task arrives, it can quickly find the most relevant structure and tailor the learned structure knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity by a learned meta-knowledge graph, but also increases the model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification and the results demonstrate the superiority of ARML over state-of-the-art baselines.
Tasks	Few-Shot Image Classification, Image Classification, Meta-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rklp93EtwH
PDF	https://openreview.net/pdf?id=rklp93EtwH
PWC	https://paperswithcode.com/paper/automated-relational-meta-learning
Repo	https://github.com/huaxiuyao/ARML
Framework	none

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients


Title	Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
Authors	Anonymous
Abstract	We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. To develop our framework we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive a novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. The combination of local KL divergences and synthetic gradient network allows for backpropagating information from unlabeled data, thereby enabling transduction. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification significantly outperform previous state-of-the-art methods.
Tasks	Few-Shot Image Classification, Meta-Learning, Multi-Task Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=Hkg-xgrYvH
PDF	https://openreview.net/pdf?id=Hkg-xgrYvH
PWC	https://paperswithcode.com/paper/empirical-bayes-transductive-meta-learning
Repo	https://github.com/hushell/sib_meta_learn
Framework	pytorch

Spectral Embedding of Regularized Block Models


Title	Spectral Embedding of Regularized Block Models
Authors	Anonymous
Abstract	Spectral embedding is a popular technique for the representation of graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks like clustering. In this paper, we explain on a simple block model the impact of the complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1l_0JBYwS
PDF	https://openreview.net/pdf?id=H1l_0JBYwS
PWC	https://paperswithcode.com/paper/spectral-embedding-of-regularized-block
Repo	https://github.com/research-submissions/iclr20
Framework	none
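
The regularization step named in the abstract above ("a constant is added to all entries of the adjacency matrix") can be sketched in a few lines. The degree normalization that follows it is an assumption about how an embedding would typically be prepared, not something the abstract specifies, and the eigendecomposition itself is omitted; `regularize`, `normalized_adjacency`, and the value of `alpha` are illustrative names and choices.

```python
import math

def regularize(adjacency, alpha):
    """Complete graph regularization: add the constant alpha to every entry."""
    return [[a + alpha for a in row] for row in adjacency]

def normalized_adjacency(adjacency):
    """Return D^{-1/2} A D^{-1/2}; every node must have positive degree."""
    degrees = [sum(row) for row in adjacency]
    if any(d <= 0 for d in degrees):
        raise ValueError("graph has a node of zero degree")
    inv_sqrt = [1.0 / math.sqrt(d) for d in degrees]
    n = len(adjacency)
    return [
        [inv_sqrt[i] * adjacency[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]

# A triangle (nodes 0-2) plus an isolated outlier (node 3).
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
regularized = regularize(adjacency, alpha=0.1)
normalized = normalized_adjacency(regularized)  # well defined after regularization
```

On the raw matrix, `normalized_adjacency` raises because the isolated node has degree zero; after regularization every node is weakly connected to every other, so the normalization is defined and the outlier no longer forms its own disconnected component.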

Mixed-curvature Variational Autoencoders


Title	Mixed-curvature Variational Autoencoders
Authors	Anonymous
Abstract	It has been shown that using geometric spaces with non-zero curvature instead of plain Euclidean spaces with zero curvature improves performance on a range of Machine Learning tasks for learning representations. Recent work has leveraged these geometries to learn latent variable models like Variational Autoencoders (VAEs) in spherical and hyperbolic spaces with constant curvature. While these approaches work well on the particular kinds of data they were designed for, e.g. tree-like data for a hyperbolic VAE, there exists no generic approach unifying all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature can be learned. This generalizes the Euclidean VAE to curved latent spaces, as the model essentially reduces to the Euclidean VAE if curvatures of all latent space components go to 0.
Tasks	Latent Variable Models
Published	2020-01-01
URL	https://openreview.net/forum?id=S1g6xeSKDS
PDF	https://openreview.net/pdf?id=S1g6xeSKDS
PWC	https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders
Repo	https://github.com/oskopek/mvae
Framework	pytorch

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models


Title	FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
Authors	Anonymous
Abstract	While many sentiment classification solutions report high accuracy scores in product or movie review datasets, the performance of the methods in niche domains such as finance still largely falls behind. The reason for this gap is the domain-specific language, which decreases the applicability of existing models, and the lack of quality labeled data to learn the new context of positive and negative in the specific domain. Transfer learning has been shown to be successful in adapting to new domains without large training data sets. In this paper, we explore the effectiveness of NLP transfer learning in financial sentiment classification. We introduce FinBERT, a language model based on BERT, which improved the state-of-the-art performance by 14 percentage points for a financial sentiment classification task on the FinancialPhrasebank dataset.
Tasks	Language Modelling, Sentiment Analysis, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HylznxrYDr
PDF	https://openreview.net/pdf?id=HylznxrYDr
PWC	https://paperswithcode.com/paper/finbert-financial-sentiment-analysis-with-pre-1
Repo	https://github.com/ProsusAI/finBERT
Framework	none

Smartphone Modulated Colorimetric Reader with Color Subtraction


Title	Smartphone Modulated Colorimetric Reader with Color Subtraction
Authors	Y. Zhao, S.Y. Choi, J. Lou-Franco, J.L.D. Nelis, H. Zhou, C. Cao, K. Campbell, C. Elliott, K. Rafferty
Abstract	Color analysis has been essential for the interpretation of optical readouts, e.g. colorimetry, fluorescence, spectroscopy, and scanometry. However, existing colorimetric readers can hardly eliminate the color interference of colored solutions, e.g., interpreting pH test strips to assess the pH value of red wine. This paper introduces a smartphone modulated colorimetric reader that is compatible with most smartphone models and a novel color subtraction algorithm that eliminates color interferences due to colored solutions. Experiments were conducted to validate the effectiveness of the developed reader and algorithm on evaluating pH test strips produced from transparent and colored solutions using multiple smartphone models. Applicability of the developed reader was demonstrated through its interpretation of pH test strips measuring pH values of colored and non-transparent food samples including red wine and milk.
Tasks
Published	2020-01-26
URL	https://ieeexplore.ieee.org/document/8956565
PDF	https://ieeexplore.ieee.org/document/8956565
PWC	https://paperswithcode.com/paper/smartphone-modulated-colorimetric-reader-with
Repo	https://github.com/zyfccc/Smartphone-Modulated-Colorimetric-Reader-with-Color-Subtraction-IEEE-Sensors-2019
Framework	tf

The Shape of Data: Intrinsic Distance for Data Distributions


Title	The Shape of Data: Intrinsic Distance for Data Distributions
Authors	Anonymous
Abstract	The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic and multi-scale method for characterizing and comparing data manifolds, using a lower-bound of the spectral variant of the Gromov-Wasserstein inter-manifold distance, which compares all data moments. In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionalities; moreover, we showcase its efficacy in evaluating the quality of generative models.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HyebplHYwB
PDF	https://openreview.net/pdf?id=HyebplHYwB
PWC	https://paperswithcode.com/paper/the-shape-of-data-intrinsic-distance-for-data
Repo	https://github.com/imd-iclr/imd
Framework	none

Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP


Title	Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP
Authors	Zhang-Hua Fu, Kai-Bin Qiu, Meng Qiu, Hongyuan Zha
Abstract	The travelling salesman problem (TSP) is a well-known combinatorial optimization problem with a variety of real-life applications. We tackle TSP by incorporating machine learning methodology and leveraging the variable neighborhood search strategy. More precisely, the search process is considered as a Markov decision process (MDP), where a 2-opt local search is used to search within a small neighborhood, while a Monte Carlo tree search (MCTS) method (which iterates through simulation, selection and back-propagation steps) is used to sample a number of targeted actions within an enlarged neighborhood. This new paradigm clearly distinguishes itself from the existing machine learning (ML) based paradigms for solving the TSP, which either use an end-to-end ML model or simply apply traditional techniques after ML for post-optimization. Experiments based on two public data sets show that our approach clearly dominates all the existing learning-based TSP algorithms in terms of performance, demonstrating its high potential on the TSP. More importantly, as a general framework without complicated hand-crafted rules, it can be readily extended to many other combinatorial optimization problems.
Tasks	Combinatorial Optimization
Published	2020-01-01
URL	https://openreview.net/forum?id=ByxtHCVKwB
PDF	https://openreview.net/pdf?id=ByxtHCVKwB
PWC	https://paperswithcode.com/paper/targeted-sampling-of-enlarged-neighborhood
Repo	https://github.com/Spider-scnu/Monte-Carlo-tree-search-for-TSP
Framework	none
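
The 2-opt local search mentioned in the abstract above is a standard technique and can be sketched briefly. This covers only the small-neighborhood component; the MCTS sampling of the enlarged neighborhood is not shown, and the function names and the four-point instance are assumptions made for illustration rather than the authors' code.

```python
import math

def tour_length(tour, points):
    """Total length of the closed tour that visits points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )

def two_opt(tour, points):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                # A 2-opt move: reverse the segment tour[i..j], keep the rest.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate, points) < tour_length(tour, points) - 1e-12:
                    tour = candidate
                    improved = True
    return tour

# Corners of a unit square, visited in an order whose edges cross.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
start = [0, 1, 2, 3]
best = two_opt(start, points)
print(tour_length(start, points), tour_length(best, points))
```

Recomputing the full tour length for each candidate keeps the sketch short; a practical implementation would compare only the two edges removed and the two edges added by each move.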

Generative Models for Effective ML on Private, Decentralized Datasets


Title	Generative Models for Effective ML on Private, Decentralized Datasets
Authors	Anonymous
Abstract	To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data—of representative samples, of outliers, of misclassifications—is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is risky for privacy-sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models—trained using federated methods and with formal differential privacy guarantees—can be used effectively to debug data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SJgaRA4FPH
PDF	https://openreview.net/pdf?id=SJgaRA4FPH
PWC	https://paperswithcode.com/paper/generative-models-for-effective-ml-on-private
Repo	https://github.com/tensorflow/gan
Framework	tf

Scale-Equivariant Steerable Networks


Title	Scale-Equivariant Steerable Networks
Authors	Anonymous
Abstract	The effectiveness of Convolutional Neural Networks (CNNs) has been substantially attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to the previously developed methods for scale equivariance and local scale invariance. We demonstrate state-of-the-art results on MNIST-scale dataset. Finally, we demonstrate that the proposed scale-equivariant convolutions show remarkable gains on STL-10 when used as drop-in replacements for non-equivariant convolutional layers.
Tasks	Image Classification
Published	2020-01-01
URL	https://openreview.net/forum?id=HJgpugrKPS
PDF	https://openreview.net/pdf?id=HJgpugrKPS
PWC	https://paperswithcode.com/paper/scale-equivariant-steerable-networks
Repo	https://github.com/ISosnovik/sesn
Framework	pytorch