Paper Group NAWR 3
Why ADAM Beats SGD for Attention Models. Hindsight Trust Region Policy Optimization. Decentralized Distributed PPO: Mastering PointGoal Navigation. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. Bootstrapping the Expressivity with Model-based Planning. Automated Relational Meta-learning. Empirical Bayes Transductive Meta-Learn …
Why ADAM Beats SGD for Attention Models
Title | Why ADAM Beats SGD for Attention Models |
Authors | Anonymous |
Abstract | While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD on important tasks, such as training attention models. The settings under which SGD performs poorly in comparison to Adam are not yet well understood. In this paper, we provide empirical and theoretical evidence that a heavy-tailed distribution of the noise in stochastic gradients is a root cause of SGD’s poor performance. Based on this observation, we study clipped variants of SGD that circumvent this issue; we then analyze their convergence under heavy-tailed noise. Furthermore, we develop a new adaptive coordinate-wise clipping algorithm (ACClip) tailored to such settings. Subsequently, we show how adaptive methods like Adam can be viewed through the lens of clipping, which helps explain Adam’s strong performance under heavy-tailed noise. Finally, we show that the proposed ACClip outperforms Adam for both BERT pretraining and finetuning tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJx37TEtDH |
PDF | https://openreview.net/pdf?id=SJx37TEtDH |
PWC | https://paperswithcode.com/paper/why-adam-beats-sgd-for-attention-models |
Repo | https://github.com/rivercold/ACClip-Pytorch |
Framework | pytorch |
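The clipped-SGD idea in the abstract above can be illustrated with a minimal sketch. It shows plain fixed-threshold clipping (global-norm and coordinate-wise), not ACClip itself: ACClip adapts its per-coordinate thresholds during training, which this sketch does not reproduce. The function names and the threshold `tau` are illustrative assumptions.

```python
import math


def clip_by_global_norm(grad, tau):
    """Rescale grad so its L2 norm is at most tau (standard norm clipping)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= tau:
        return list(grad)
    return [g * tau / norm for g in grad]


def clip_coordinatewise(grad, taus):
    """Clamp each coordinate g_i into [-tau_i, tau_i] independently."""
    return [max(-t, min(t, g)) for g, t in zip(grad, taus)]


def clipped_sgd_step(params, grad, lr, tau, coordinatewise=False):
    """One SGD step on a clipped gradient.

    With coordinatewise=True every coordinate shares the same fixed threshold tau;
    an adaptive method would instead maintain a separate threshold per coordinate.
    """
    if coordinatewise:
        g = clip_coordinatewise(grad, [tau] * len(grad))
    else:
        g = clip_by_global_norm(grad, tau)
    return [p - lr * gi for p, gi in zip(params, g)]


# A single heavy-tailed outlier gradient moves the parameters by at most lr * tau.
print(clipped_sgd_step([0.0, 0.0], [100.0, 0.0], lr=0.1, tau=1.0))  # [-0.1, 0.0]
```

Without clipping, the same outlier gradient would move the first parameter by 10.0 instead of 0.1, which is the failure mode that heavy-tailed noise makes frequent.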
Hindsight Trust Region Policy Optimization
Title | Hindsight Trust Region Policy Optimization |
Authors | Anonymous |
Abstract | As reinforcement learning continues to drive machine intelligence beyond its conventional boundaries, weak performance in sparse-reward environments severely limits its further application in a broader range of advanced fields. Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse-reward environments, this paper presents Hindsight Trust Region Policy Optimization (HTRPO), a method that efficiently utilizes interactions in sparse-reward conditions to optimize policies within a trust region while maintaining learning stability. First, we theoretically adapt the TRPO objective function, in the form of the expected return of the policy, to the distribution of hindsight data generated from alternative goals. Then, we apply Monte Carlo estimation with importance sampling to estimate the KL divergence between two policies, taking the hindsight data as input. Under the condition that the distributions are sufficiently close, the KL divergence is approximated by another f-divergence. This approximation reduces variance and alleviates instability during policy updates. Experimental results on both discrete and continuous benchmark tasks demonstrate that HTRPO converges significantly faster than previous policy gradient methods. It achieves effective performance and high data efficiency when training policies in sparse-reward environments. |
Tasks | Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylCP6NFDB |
PDF | https://openreview.net/pdf?id=rylCP6NFDB |
PWC | https://paperswithcode.com/paper/hindsight-trust-region-policy-optimization-1 |
Repo | https://github.com/HTRPOCODES/HTRPO-v2 |
Framework | pytorch |
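The KL estimation step described in the abstract above can be sketched for two categorical policies at a single state. The sketch assumes the surrogate is E_p[0.5 * (log p - log q)^2], which agrees with KL(p || q) to second order when p and q are close; whether this is exactly the divergence HTRPO uses is an assumption. Samples are drawn from a separate behaviour distribution and reweighted by importance weights, standing in for hindsight data. All names are hypothetical.

```python
import math
import random


def kl_exact(p, q):
    """KL(p || q) for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def half_sq_log_ratio(p, q):
    """E_p[0.5 * (log p - log q)^2], a second-order approximation of KL(p || q)."""
    return sum(pi * 0.5 * math.log(pi / qi) ** 2 for pi, qi in zip(p, q))


def importance_sampled_estimates(p, q, behaviour, n, seed=0):
    """Monte Carlo estimates of KL(p || q) and of the squared-log-ratio surrogate.

    Actions are drawn from `behaviour` and reweighted by p(a) / behaviour(a).
    """
    rng = random.Random(seed)
    actions = rng.choices(range(len(behaviour)), weights=behaviour, k=n)
    kl_sum = sq_sum = 0.0
    for a in actions:
        w = p[a] / behaviour[a]      # importance weight
        r = math.log(p[a] / q[a])    # log-ratio for this sample
        kl_sum += w * r
        sq_sum += w * 0.5 * r * r
    return kl_sum / n, sq_sum / n


p, q = [0.5, 0.3, 0.2], [0.45, 0.33, 0.22]
print(kl_exact(p, q), half_sq_log_ratio(p, q))
```

Per sample, the plain KL estimator `w * r` can be negative and fluctuates around a small mean, while the squared term is always non-negative; this is the intuition behind the variance reduction the abstract mentions.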
Decentralized Distributed PPO: Mastering PointGoal Navigation
Title | Decentralized Distributed PPO: Mastering PointGoal Navigation |
Authors | Anonymous |
Abstract | We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever “stale”), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling, achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience): over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially “solves” the task, achieving near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the learned scene understanding and navigation policies can be transferred to other navigation tasks, the analog of “ImageNet pre-training + task-specific fine-tuning” for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code will be publicly available). |
Tasks | Autonomous Navigation, PointGoal Navigation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gX8C4YPr |
PDF | https://openreview.net/pdf?id=H1gX8C4YPr |
PWC | https://paperswithcode.com/paper/decentralized-distributed-ppo-mastering |
Repo | https://github.com/facebookresearch/habitat-api/tree/master/habitat_baselines/rl/ddppo |
Framework | pytorch |
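The "decentralized and synchronous" design in the abstract above can be illustrated with a single-process simulation. Every worker computes a local gradient, an all-reduce gives every worker the same average without a parameter server, and all replicas apply an identical update. A real multi-GPU version would use a collective such as PyTorch's `torch.distributed.all_reduce`; that is not shown here, and the helper names are hypothetical.

```python
def all_reduce_mean(local_grads):
    """Simulated all-reduce: every worker receives the element-wise mean of all gradients."""
    n = len(local_grads)
    mean = [sum(col) / n for col in zip(*local_grads)]
    return [list(mean) for _ in range(n)]


def synchronous_step(params_per_worker, local_grads, lr):
    """Every worker applies the same averaged gradient, so replicas never go stale."""
    averaged = all_reduce_mean(local_grads)
    return [
        [p - lr * g for p, g in zip(params, avg)]
        for params, avg in zip(params_per_worker, averaged)
    ]


# Scaling efficiency implied by the reported 107x speedup on 128 GPUs.
efficiency = 107 / 128
print(f"{efficiency:.1%}")  # 83.6%
```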
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Title | ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning |
Authors | Anonymous |
Abstract | Recent powerful pre-trained language models have achieved remarkable performance on most of the popular reading comprehension datasets. It is time to introduce more challenging datasets that push the field toward more comprehensive reasoning over text. In this paper, we introduce ReClor, a new Reading Comprehension dataset requiring logical reasoning, extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which models often exploit to achieve high accuracy without truly understanding the text. To comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into an EASY set, with the remaining data forming a HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture the biases contained in the dataset, achieving high accuracy on the EASY set. However, they struggle on the HARD set, performing close to random guessing, which indicates that more research is needed to fundamentally enhance the logical reasoning ability of current models. |
Tasks | Reading Comprehension |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgJtT4tvB |
PDF | https://openreview.net/pdf?id=HJgJtT4tvB |
PWC | https://paperswithcode.com/paper/reclor-a-reading-comprehension-dataset |
Repo | https://github.com/yuweihao/reclor |
Framework | pytorch |
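The EASY/HARD separation in the abstract above can be sketched as a simple filter. The abstract does not state the exact bias criterion, so this sketch assumes, hypothetically, that an item counts as biased when every run of a bias-probing baseline (for example, a model shown only the answer options) predicts it correctly.

```python
def split_easy_hard(probe_predictions, labels):
    """Split item indices into EASY (biased) and HARD sets.

    probe_predictions: list of runs, each a list of predicted option indices per item.
    labels: gold option index per item.
    An item is EASY only if every probing run gets it right (assumed criterion).
    """
    easy, hard = [], []
    for i, gold in enumerate(labels):
        if all(run[i] == gold for run in probe_predictions):
            easy.append(i)
        else:
            hard.append(i)
    return easy, hard


runs = [[0, 1, 2, 3], [0, 1, 3, 3]]
print(split_easy_hard(runs, [0, 2, 2, 3]))  # ([0, 3], [1, 2])
```

Requiring agreement across all runs makes the EASY set conservative: an item is only treated as biased if the shortcut works consistently, not by chance in one run.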
Bootstrapping the Expressivity with Model-based Planning
Title | Bootstrapping the Expressivity with Model-based Planning |
Authors | Anonymous |
Abstract | We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs also have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, while both model-free and model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS at test time on top of model-based or model-free policy optimization algorithms improves performance on MuJoCo benchmark tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hye4WaVYwr |
PDF | https://openreview.net/pdf?id=Hye4WaVYwr |
PWC | https://paperswithcode.com/paper/bootstrapping-the-expressivity-with-model-1 |
Repo | https://github.com/roosephu/boots |
Framework | tf |
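The multi-step bootstrapping planner in the abstract above can be sketched as follows: score each short action sequence by its discounted model-predicted reward plus a discounted bootstrap value from the (possibly weak) Q-function at the final state, then act with the first action of the best sequence. This sketch enumerates a small discrete action set; a continuous-control implementation would need a sampling-based optimizer instead. The toy dynamics, reward, and Q-function are hypothetical stand-ins for learned models.

```python
import itertools
import math


def boots_plan(state, actions, dynamics, reward, q, horizon=2, gamma=0.99):
    """Return the first action of the best action sequence under a learned model."""
    best_seq, best_val = None, -math.inf
    for seq in itertools.product(actions, repeat=horizon):
        s, ret = state, 0.0
        for t, a in enumerate(seq):
            ret += gamma ** t * reward(s, a)   # model-predicted reward
            s = dynamics(s, a)                 # model-predicted next state
        # Bootstrap with the Q-function at the end of the planning horizon.
        ret += gamma ** horizon * max(q(s, a) for a in actions)
        if ret > best_val:
            best_seq, best_val = seq, ret
    return best_seq[0]


# Hypothetical 1-D toy problem: the goal is to move the state toward 0.
ACTIONS = (-1, 0, 1)


def toy_dynamics(s, a):
    return s + a


def toy_reward(s, a):
    return -abs(s + a)


def toy_q(s, a):
    return -abs(s + a)


print(boots_plan(2.0, ACTIONS, toy_dynamics, toy_reward, toy_q))  # -1
```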
Automated Relational Meta-learning
Title | Automated Relational Meta-learning |
Authors | Anonymous |
Abstract | To learn efficiently from a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is task heterogeneity, which traditional globally shared meta-learning methods cannot handle well. In addition, current task-specific meta-learning methods may either suffer from hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way knowledge is organized in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts cross-task relations and constructs a meta-knowledge graph. When a new task arrives, the framework can quickly find the most relevant structure and tailor the learned structural knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity through a learned meta-knowledge graph, but also increases model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification, and the results demonstrate the superiority of ARML over state-of-the-art baselines. |
Tasks | Few-Shot Image Classification, Image Classification, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklp93EtwH |
PDF | https://openreview.net/pdf?id=rklp93EtwH |
PWC | https://paperswithcode.com/paper/automated-relational-meta-learning |
Repo | https://github.com/huaxiuyao/ARML |
Framework | none |
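The abstract above says that a new task "finds the most relevant structure" in the meta-knowledge graph. One generic way to do such a lookup is a soft assignment of the task's prototype to the graph vertices using Gaussian-kernel similarities, followed by a weighted combination of vertex embeddings. This is a hypothetical sketch of that idea; the exact ARML formulation may differ, and `bandwidth` is an assumed parameter.

```python
import math


def soft_assign(prototype, vertices, bandwidth=1.0):
    """Softmax over Gaussian-kernel similarities between a task prototype and graph vertices."""
    logits = [
        -0.5 * sum((p - v) ** 2 for p, v in zip(prototype, vert)) / bandwidth ** 2
        for vert in vertices
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def query_meta_knowledge(prototype, vertices, bandwidth=1.0):
    """Weighted combination of vertex embeddings, used as task-specific knowledge."""
    weights = soft_assign(prototype, vertices, bandwidth)
    dim = len(vertices[0])
    return [sum(w * vert[j] for w, vert in zip(weights, vertices)) for j in range(dim)]


vertices = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
print(soft_assign([4.5, 4.5], vertices))  # largest weight on vertex 1
```

Because the weights are soft, a task lying between two vertices draws knowledge from both, which is one way a single graph can serve heterogeneous tasks.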
Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
Title | Empirical Bayes Transductive Meta-Learning with Synthetic Gradients |
Authors | Anonymous |
Abstract | We propose a meta-learning approach that learns from multiple tasks in a transductive setting, leveraging unlabeled information in the query set to learn a more powerful meta-model. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive a novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. The combination of local KL divergences and the synthetic gradient network allows information from unlabeled data to be backpropagated, thereby enabling transduction. On the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification, our method significantly outperforms previous state-of-the-art methods. |
Tasks | Few-Shot Image Classification, Meta-Learning, Multi-Task Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkg-xgrYvH |
PDF | https://openreview.net/pdf?id=Hkg-xgrYvH |
PWC | https://paperswithcode.com/paper/empirical-bayes-transductive-meta-learning |
Repo | https://github.com/hushell/sib_meta_learn |
Framework | pytorch |
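The transductive refinement described in the abstract above can be sketched as a loop: start from the initialization network's output, then take a few gradient steps in which the gradient is predicted from the unlabeled query features rather than computed from labels. In the paper the synthetic gradient is a learned network; here it is replaced by a hypothetical hand-written stand-in that simply pulls the weights toward the mean query feature, so only the shape of the loop is illustrated.

```python
def transductive_refine(w0, query_features, synthetic_grad, lr=0.5, steps=3):
    """Refine classifier weights using gradients predicted from unlabeled query data."""
    w = list(w0)
    for _ in range(steps):
        g = synthetic_grad(w, query_features)  # no labels needed
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def toy_synthetic_grad(w, query_features):
    """Hypothetical stand-in for the learned network: pulls w toward the query-feature mean."""
    n = len(query_features)
    mean = [sum(col) / n for col in zip(*query_features)]
    return [wi - mi for wi, mi in zip(w, mean)]


print(transductive_refine([0.0, 0.0], [[0.0, 1.0], [2.0, 3.0]], toy_synthetic_grad))
# [0.875, 1.75]
```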
Spectral Embedding of Regularized Block Models
Title | Spectral Embedding of Regularized Block Models |
Authors | Anonymous |
Abstract | Spectral embedding is a popular technique for representing graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks such as clustering. In this paper, we use a simple block model to explain the impact of complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1l_0JBYwS |
PDF | https://openreview.net/pdf?id=H1l_0JBYwS |
PWC | https://paperswithcode.com/paper/spectral-embedding-of-regularized-block |
Repo | https://github.com/research-submissions/iclr20 |
Framework | none |
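Complete graph regularization, as described in the abstract above, amounts to adding a constant `alpha` to every adjacency entry before computing the embedding. The NumPy sketch below assumes one common embedding choice (leading eigenvectors of the normalized adjacency D^{-1/2} A D^{-1/2}, rescaled by D^{-1/2}); the paper's exact normalization and the value of `alpha` may differ.

```python
import numpy as np


def regularized_spectral_embedding(A, alpha, k):
    """Spectral embedding of a graph after complete-graph regularization.

    A: symmetric (n, n) adjacency matrix; alpha: constant added to every entry;
    k: embedding dimension.
    """
    n = A.shape[0]
    A_reg = A + alpha * np.ones((n, n))   # add alpha to all entries
    deg = A_reg.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized adjacency D^{-1/2} A_reg D^{-1/2}
    M = d_inv_sqrt[:, None] * A_reg * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]         # k leading eigenvectors
    return d_inv_sqrt[:, None] * top      # rescale rows by D^{-1/2}
```

A side effect worth noting: with `alpha > 0` every node has positive degree, so isolated nodes (a simple kind of outlier) no longer cause the division by zero that `alpha = 0` would produce in the normalization step.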
Mixed-curvature Variational Autoencoders
Title | Mixed-curvature Variational Autoencoders |
Authors | Anonymous |
Abstract | It has been shown that using geometric spaces with non-zero curvature instead of plain Euclidean spaces with zero curvature improves performance on a range of machine learning tasks for learning representations. Recent work has leveraged these geometries to learn latent variable models like Variational Autoencoders (VAEs) in spherical and hyperbolic spaces with constant curvature. While these approaches work well on the particular kinds of data they were designed for (e.g., tree-like data for a hyperbolic VAE), there exists no generic approach unifying all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant-curvature Riemannian manifolds, where the per-component curvature can be learned. This generalizes the Euclidean VAE to curved latent spaces, as the model essentially reduces to the Euclidean VAE if the curvatures of all latent space components go to 0. |
Tasks | Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1g6xeSKDS |
PDF | https://openreview.net/pdf?id=S1g6xeSKDS |
PWC | https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders |
Repo | https://github.com/oskopek/mvae |
Framework | pytorch |
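The product-of-manifolds latent space and the "reduces to Euclidean as curvature goes to 0" property in the abstract above can be illustrated with an exponential map at the origin that depends continuously on curvature `k`: tan for positive curvature (sphere-like), tanh for negative curvature (Poincaré-ball-like), and the identity at zero. This stereographic-style parameterization is an assumption for illustration; curvatures are fixed numbers here, whereas the model learns them.

```python
import math


def tan_k(x, k):
    """Curvature-dependent tangent: tan for k > 0, tanh for k < 0, identity for k == 0."""
    if k > 0:
        s = math.sqrt(k)
        return math.tan(s * x) / s
    if k < 0:
        s = math.sqrt(-k)
        return math.tanh(s * x) / s
    return x


def exp_map_origin(v, k):
    """Map a tangent vector at the origin onto a constant-curvature model space."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = tan_k(norm, k) / norm
    return [scale * x for x in v]


def product_exp_map(v_parts, curvatures):
    """Map each latent component with its own curvature and concatenate the results."""
    out = []
    for v, k in zip(v_parts, curvatures):
        out.extend(exp_map_origin(v, k))
    return out


print(product_exp_map([[0.3, 0.4], [3.0, 4.0]], [0.0, -1.0]))
```

For `k < 0` the output always has norm below 1/sqrt(-k) (inside the ball), and for `k > 0` the map is only valid for norms below pi / (2 * sqrt(k)), where tan diverges.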
FINBERT: FINANCIAL SENTIMENT ANALYSIS WITH PRE-TRAINED LANGUAGE MODELS
Title | FINBERT: FINANCIAL SENTIMENT ANALYSIS WITH PRE-TRAINED LANGUAGE MODELS |
Authors | Anonymous |
Abstract | While many sentiment classification solutions report high accuracy scores on product or movie review datasets, the performance of these methods in niche domains such as finance still lags far behind. The reasons for this gap are the domain-specific language, which decreases the applicability of existing models, and the lack of high-quality labeled data for learning the new context of positive and negative in the specific domain. Transfer learning has been shown to be successful in adapting to new domains without large training data sets. In this paper, we explore the effectiveness of NLP transfer learning in financial sentiment classification. We introduce FinBERT, a language model based on BERT, which improved the state-of-the-art performance by 14 percentage points on a financial sentiment classification task using the FinancialPhrasebank dataset. |
Tasks | Language Modelling, Sentiment Analysis, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylznxrYDr |
PDF | https://openreview.net/pdf?id=HylznxrYDr |
PWC | https://paperswithcode.com/paper/finbert-financial-sentiment-analysis-with-pre-1 |
Repo | https://github.com/ProsusAI/finBERT |
Framework | none |
Smartphone Modulated Colorimetric Reader with Color Subtraction
Title | Smartphone Modulated Colorimetric Reader with Color Subtraction |
Authors | Y. Zhao, S.Y. Choi, J. Lou-Franco, J.L.D. Nelis, H. Zhou, C. Cao, K. Campbell, C. Elliott, K. Rafferty |
Abstract | Color analysis has been essential for the interpretation of optical readouts, e.g., colorimetry, fluorescence, spectroscopy, and scanometry. However, existing colorimetric readers struggle to eliminate the color interference of colored solutions, e.g., when interpreting pH test strips to assess the pH value of red wine. This paper introduces a smartphone modulated colorimetric reader that is compatible with most smartphone models, together with a novel color subtraction algorithm that eliminates color interference caused by colored solutions. Experiments were conducted to validate the effectiveness of the developed reader and algorithm in evaluating pH test strips produced from transparent and colored solutions using multiple smartphone models. The applicability of the developed reader was demonstrated by interpreting pH test strips measuring the pH values of colored and non-transparent food samples, including red wine and milk. |
Tasks | |
Published | 2020-01-26 |
URL | https://ieeexplore.ieee.org/document/8956565 |
PWC | https://paperswithcode.com/paper/smartphone-modulated-colorimetric-reader-with |
Repo | https://github.com/zyfccc/Smartphone-Modulated-Colorimetric-Reader-with-Color-Subtraction-IEEE-Sensors-2019 |
Framework | tf |
The Shape of Data: Intrinsic Distance for Data Distributions
Title | The Shape of Data: Intrinsic Distance for Data Distributions |
Authors | Anonymous |
Abstract | The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights into neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic and multi-scale method for characterizing and comparing data manifolds, using a lower bound of the spectral variant of the Gromov-Wasserstein inter-manifold distance, which compares all data moments. In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionalities; moreover, we showcase its efficacy in evaluating the quality of generative models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyebplHYwB |
PDF | https://openreview.net/pdf?id=HyebplHYwB |
PWC | https://paperswithcode.com/paper/the-shape-of-data-intrinsic-distance-for-data |
Repo | https://github.com/imd-iclr/imd |
Framework | none |
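The intrinsic, multi-scale comparison in the abstract above can be sketched with heat kernel traces: build a kNN graph on each dataset, compute hkt(t) = sum_i exp(-t * lambda_i) over the eigenvalues of its normalized Laplacian, and take a weighted maximum of |hkt_X(t) - hkt_Y(t)| over scales t. The weighting exp(-2(t + 1/t)), the choice k = 5, the t-grid, and the exact eigendecomposition are all assumptions for illustration; a scalable implementation would likely need an approximation of the trace. Because only graph spectra are compared, X and Y may have different sizes and dimensionalities and need no alignment.

```python
import numpy as np


def knn_laplacian(X, k):
    """Symmetric normalized Laplacian of a symmetrized kNN graph on the rows of X."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1 : k + 1]  # skip each point itself
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)                     # symmetrize
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def heat_kernel_trace(L, ts):
    """hkt(t) = sum_i exp(-t * lambda_i) for each t in ts."""
    lam = np.linalg.eigvalsh(L)
    return np.exp(-np.outer(ts, lam)).sum(axis=1)


def spectral_gw_lower_bound(X, Y, k=5, ts=None):
    """Weighted maximum over scales t of the heat kernel trace difference."""
    if ts is None:
        ts = np.logspace(-1, 1, 50)
    hx = heat_kernel_trace(knn_laplacian(X, k), ts)
    hy = heat_kernel_trace(knn_laplacian(Y, k), ts)
    return float(np.max(np.exp(-2.0 * (ts + 1.0 / ts)) * np.abs(hx - hy)))
```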
Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP
Title | Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP |
Authors | Zhang-Hua Fu, Kai-Bin Qiu, Meng Qiu, Hongyuan Zha |
Abstract | The travelling salesman problem (TSP) is a well-known combinatorial optimization problem with a variety of real-life applications. We tackle TSP by incorporating machine learning methodology and leveraging the variable neighborhood search strategy. More precisely, the search process is considered as a Markov decision process (MDP), where a 2-opt local search is used to search within a small neighborhood, while a Monte Carlo tree search (MCTS) method, which iterates through simulation, selection and back-propagation steps, is used to sample a number of targeted actions within an enlarged neighborhood. This new paradigm clearly distinguishes itself from the existing machine learning (ML) based paradigms for solving the TSP, which either use an end-to-end ML model or simply apply traditional techniques after ML for post-optimization. Experiments based on two public data sets show that our approach clearly dominates all the existing learning-based TSP algorithms in terms of performance, demonstrating its high potential on the TSP. More importantly, as a general framework without complicated hand-crafted rules, it can be readily extended to many other combinatorial optimization problems. |
Tasks | Combinatorial Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxtHCVKwB |
PDF | https://openreview.net/pdf?id=ByxtHCVKwB |
PWC | https://paperswithcode.com/paper/targeted-sampling-of-enlarged-neighborhood |
Repo | https://github.com/Spider-scnu/Monte-Carlo-tree-search-for-TSP |
Framework | none |
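The small-neighborhood component named in the abstract above, 2-opt local search, can be sketched as follows. A 2-opt move removes two edges of the tour and reconnects it by reversing the segment between them, and it is kept only if the tour gets shorter. Only this component is sketched here; the MCTS sampling over an enlarged neighborhood is not.

```python
import math


def tour_length(tour, pts):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour, pts):
    """Apply improving 2-opt moves until none remain (first-improvement strategy)."""
    tour = list(tour)
    n = len(tour)

    def d(i, j):
        return math.dist(pts[i], pts[j])

    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                # Replacing edges (a,b),(c,e) with (a,c),(b,e) changes the length by delta.
                delta = d(a, c) + d(b, e) - d(a, b) - d(c, e)
                if delta < -1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(two_opt([0, 2, 1, 3], square))  # [0, 1, 2, 3], the crossing is removed
```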
Generative Models for Effective ML on Private, Decentralized Datasets
Title | Generative Models for Effective ML on Private, Decentralized Datasets |
Authors | Anonymous |
Abstract | To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data—of representative samples, of outliers, of misclassifications—is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is risky for privacy-sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models—trained using federated methods and with formal differential privacy guarantees—can be used effectively to debug data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgaRA4FPH |
PDF | https://openreview.net/pdf?id=SJgaRA4FPH |
PWC | https://paperswithcode.com/paper/generative-models-for-effective-ml-on-private |
Repo | https://github.com/tensorflow/gan |
Framework | tf |
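The abstract above does not spell out its training procedure, but the "federated methods with formal differential privacy guarantees" it mentions typically rely on a clip-then-add-noise aggregation step. The sketch below shows that common step: clip each client's update to a maximum L2 norm, average, and add Gaussian noise scaled to the clip norm. This is the generic DP federated averaging recipe, assumed for illustration; the paper's DP federated GAN algorithm may differ, and no privacy accounting is done here.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale a client update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= clip_norm:
        return list(update)
    return [u * clip_norm / norm for u in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, seed=0):
    """Average clipped client updates and add Gaussian noise to each coordinate.

    Clipping bounds any single client's influence on the average by clip_norm / m,
    so the noise standard deviation is noise_multiplier * clip_norm / m.
    """
    rng = random.Random(seed)
    m = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    mean = [sum(col) / m for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm / m
    return [x + rng.gauss(0.0, sigma) for x in mean]


# With noise_multiplier=0 this is just the mean of the clipped updates.
print(dp_federated_average([[3.0, 4.0], [0.0, 0.5]], clip_norm=1.0, noise_multiplier=0.0))
```

Only the noisy aggregate leaves the server, which matches the abstract's setting where the modeler can access aggregated outputs but not raw examples.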
Scale-Equivariant Steerable Networks
Title | Scale-Equivariant Steerable Networks |
Authors | Anonymous |
Abstract | The effectiveness of Convolutional Neural Networks (CNNs) has been substantially attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we focus on scale changes, which regularly appear in various tasks due to the changing distances between objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to previously developed methods for scale equivariance and local scale invariance, and demonstrate state-of-the-art results on the MNIST-scale dataset. Finally, we demonstrate that the proposed scale-equivariant convolutions yield remarkable gains on STL-10 when used as drop-in replacements for non-equivariant convolutional layers. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgpugrKPS |
PDF | https://openreview.net/pdf?id=HJgpugrKPS |
PWC | https://paperswithcode.com/paper/scale-equivariant-steerable-networks |
Repo | https://github.com/ISosnovik/sesn |
Framework | pytorch |