February 1, 2020

3227 words 16 mins read

Paper Group AWR 302

Catalyst.RL: A Distributed Framework for Reproducible RL Research. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. TF-Replicator: Distributed Machine Learning for Researchers. Class-Balanced Loss Based on Effective Number of Samples. PR Product: A Substitute for Inner Product in Neural Networks. Factored Latent-Dynamic Condition …

Catalyst.RL: A Distributed Framework for Reproducible RL Research

Title Catalyst.RL: A Distributed Framework for Reproducible RL Research
Authors Sergey Kolesnikov, Oleksii Hrinchuk
Abstract Despite the recent progress in the field of deep reinforcement learning (RL), and, arguably, because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open-source framework for RL research with a focus on reproducibility and flexibility. The main features of our library include large-scale asynchronous distributed training, easy-to-use configuration files with the complete list of hyperparameters for each experiment, and efficient implementations of various RL algorithms and auxiliary tricks such as frame stacking, n-step returns, and value distributions. To demonstrate the usefulness of our framework, we evaluate it on a range of continuous control benchmarks, as well as on the task of developing a controller that enables a physiologically based human model with a prosthetic leg to walk and run. The latter task was introduced in the NeurIPS 2018 AI for Prosthetics Challenge, where our team took 3rd place, capitalizing on the ability of catalyst.RL to train high-quality and sample-efficient RL agents.
Tasks Continuous Control
Published 2019-02-28
URL http://arxiv.org/abs/1903.00027v1
PDF http://arxiv.org/pdf/1903.00027v1.pdf
PWC https://paperswithcode.com/paper/catalystrl-a-distributed-framework-for
Repo https://github.com/catalyst-team/catalyst-rl-framework
Framework none
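
The abstract lists n-step returns among the framework's auxiliary tricks. As a rough illustration of that idea only (a minimal numpy sketch, not Catalyst.RL's actual implementation), here is one way to compute n-step returns with episode-boundary truncation:

```python
import numpy as np

def n_step_returns(rewards, values, dones, gamma=0.99, n=5):
    """G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n*V(s_{t+n}),
    truncated at episode ends. `values` has length T+1 (bootstrap value included)."""
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        g, discount, terminated, k = 0.0, 1.0, False, t
        while k < T and k < t + n:
            g += discount * rewards[k]
            discount *= gamma
            if dones[k]:
                terminated = True
                break
            k += 1
        if not terminated:
            g += discount * values[k]  # bootstrap with the critic's estimate
        returns[t] = g
    return returns

r = np.array([1.0, 0.0, 1.0])
d = np.array([False, False, True])
v = np.zeros(4)  # critic estimates; zeros here for brevity
print(n_step_returns(r, v, d, gamma=0.9, n=2))  # -> [1.0, 0.9, 1.0]
```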

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

Title MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
Authors Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo
Abstract Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user-modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. In particular, it introduces dual-editing consistency as an auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.
Tasks
Published 2019-07-27
URL https://arxiv.org/abs/1907.11922v2
PDF https://arxiv.org/pdf/1907.11922v2.pdf
PWC https://paperswithcode.com/paper/maskgan-towards-diverse-and-interactive
Repo https://github.com/switchablenorms/CelebAMask-HQ
Framework pytorch

TF-Replicator: Distributed Machine Learning for Researchers

Title TF-Replicator: Distributed Machine Learning for Researchers
Authors Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov
Abstract We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
Tasks Continuous Control, Image Generation
Published 2019-02-01
URL http://arxiv.org/abs/1902.00465v1
PDF http://arxiv.org/pdf/1902.00465v1.pdf
PWC https://paperswithcode.com/paper/tf-replicator-distributed-machine-learning
Repo https://github.com/hoondori/what-i-want-to-learn
Framework tf
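
TF-Replicator's core abstraction is synchronous data-parallel training: each replica computes gradients on its shard of the batch, and the gradients are all-reduced before a single shared update. A minimal numpy sketch of that pattern (not TF-Replicator's API; the model and function names here are illustrative):

```python
import numpy as np

def replica_gradient(w, x_shard, y_shard):
    """Per-replica gradient of mean squared error for a linear model,
    standing in for an arbitrary model's backward pass."""
    pred = x_shard @ w
    return 2 * x_shard.T @ (pred - y_shard) / len(y_shard)

def synchronous_step(w, x, y, num_replicas=4, lr=0.1):
    """One synchronous data-parallel step: shard the batch, compute
    per-replica gradients, all-reduce (average), apply one update."""
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [replica_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    g = np.mean(grads, axis=0)  # the "all-reduce"
    return w - lr * g

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w = np.zeros(3)
for _ in range(200):
    w = synchronous_step(w, x, y)
print(np.round(w, 3))  # approaches [1.0, -2.0, 0.5]
```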

Class-Balanced Loss Based on Effective Number of Samples

Title Class-Balanced Loss Based on Effective Number of Samples
Authors Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie
Abstract With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
Tasks Image Classification
Published 2019-01-16
URL http://arxiv.org/abs/1901.05555v1
PDF http://arxiv.org/pdf/1901.05555v1.pdf
PWC https://paperswithcode.com/paper/class-balanced-loss-based-on-effective-number
Repo https://github.com/feidfoe/AdjustBnd4Imbalance
Framework pytorch
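
The re-weighting scheme follows directly from the formula in the abstract: each class is weighted by the inverse of its effective number of samples $(1-\beta^{n})/(1-\beta)$. A minimal sketch (the normalization convention is an assumption; the reference implementation is in the linked repo):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weights proportional to the inverse effective number of samples,
    E_n = (1 - beta^n) / (1 - beta), normalized here to sum to the
    number of classes (a common convention)."""
    samples_per_class = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(weights)

# long-tailed toy distribution: head class has 10k samples, tail has 10
print(class_balanced_weights([10000, 1000, 100, 10]))
```

The resulting per-class weights then rescale a standard loss, e.g. via the `weight` argument of `torch.nn.CrossEntropyLoss`.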

PR Product: A Substitute for Inner Product in Neural Networks

Title PR Product: A Substitute for Inner Product in Neural Networks
Authors Zhennan Wang, Wenbin Zou, Chen Xu
Abstract In this paper, we analyze the inner product of a weight vector w and a data vector x in neural networks from the perspective of vector orthogonal decomposition, and prove that the direction gradient of w diminishes as the angle between them approaches 0 or $\pi$. We propose the Projection and Rejection Product (PR Product) to make the direction gradient of w independent of the angle and consistently larger than that of the standard inner product, while keeping the forward propagation identical. As a reliable substitute for the standard inner product, the PR Product can be applied to many existing deep learning modules, so we develop PR Product versions of the fully connected, convolutional, and LSTM layers. In static image classification, experiments on the CIFAR10 and CIFAR100 datasets demonstrate that the PR Product can robustly enhance the ability of various state-of-the-art classification networks. On the task of image captioning, even without any bells and whistles, our PR Product version of the captioning model can compete with or outperform state-of-the-art models on the MS COCO dataset. Code has been made available at: https://github.com/wzn0828/PR_Product.
Tasks Image Captioning, Image Classification
Published 2019-04-30
URL https://arxiv.org/abs/1904.13148v2
PDF https://arxiv.org/pdf/1904.13148v2.pdf
PWC https://paperswithcode.com/paper/pr-product-a-substitute-for-inner-product-in
Repo https://github.com/wzn0828/PR_Product
Framework pytorch
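
The vector orthogonal decomposition the abstract refers to splits x into a projection onto w and a rejection orthogonal to w; the PR Product's gradient trick is built on these two components. A minimal numpy sketch of the decomposition itself (the actual PR Product gradient surgery lives in the linked repo):

```python
import numpy as np

def project_reject(x, w):
    """Orthogonal decomposition of x with respect to w:
    x = x_proj + x_rej, with x_proj parallel to w and x_rej orthogonal."""
    w_unit = w / np.linalg.norm(w)
    x_proj = (x @ w_unit) * w_unit  # projection of x onto w
    x_rej = x - x_proj              # rejection of x from w
    return x_proj, x_rej

x = np.array([3.0, 4.0])
w = np.array([1.0, 0.0])
p, r = project_reject(x, w)
print(p, r, p @ r)  # [3. 0.] [0. 4.] 0.0 (orthogonal)
# the standard inner product depends only on the projection component:
assert np.isclose(w @ x, w @ p)
```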

Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling

Title Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling
Authors Satyajit Neogi, Justin Dauwels
Abstract Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving labeling performance. Such a model is known as a Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models to two single-label (one with nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets - UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model, in all our experiments. In addition, LSTM-based models display inconsistent performance across validation and test data, which made model selection on validation data difficult during our experiments. FLDCRF offers easier model selection, consistency between validation and test performance, and lucid model intuition. FLDCRF is also much faster to train than LSTM, even without a GPU. FLDCRF outperforms the best LSTM model by ~4% on a single-label task on the UCI gesture phase data and by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on the UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models.
Tasks Model Selection
Published 2019-11-09
URL https://arxiv.org/abs/1911.03667v2
PDF https://arxiv.org/pdf/1911.03667v2.pdf
PWC https://paperswithcode.com/paper/factored-latent-dynamic-conditional-random
Repo https://github.com/satyajitneogiju/FLDCRF-for-sequence-labeling
Framework none
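
FLDCRF builds on the forward recursion shared by all CRF-family sequence models, run over a factored latent state space. As a hedged sketch of that building block only (a plain linear-chain forward algorithm in log space, not FLDCRF itself):

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_partition(emissions, transitions):
    """Log partition function of a linear-chain CRF via the forward
    algorithm. emissions: (T, S) per-step scores; transitions: (S, S)."""
    alpha = emissions[0]  # log-potentials at t = 0
    for t in range(1, len(emissions)):
        # alpha'[s] = logsumexp_{s'}(alpha[s'] + transitions[s', s]) + em[t, s]
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha)

T, S = 6, 4
rng = np.random.default_rng(0)
print(crf_log_partition(rng.normal(size=(T, S)), rng.normal(size=(S, S))))
```

FLDCRF runs this kind of recursion with several interacting latent chains, so the effective state space at each step is the product of the per-chain latent states.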

Don’t ignore Dropout in Fully Convolutional Networks

Title Don’t ignore Dropout in Fully Convolutional Networks
Authors Thomas Spilsbury, Paavo Camps
Abstract Data for image segmentation models can be costly to obtain due to the precision required of human annotators. We run a series of experiments showing the effect of different kinds of Dropout on the DeepLabv3+ image segmentation model when trained on a small dataset. We find that when appropriate forms of Dropout are applied in the right place in the model architecture, a non-negligible improvement in Mean Intersection over Union (mIoU) can be observed. In our best case, applying Dropout scheduling in conjunction with SpatialDropout improves baseline mIoU from 0.49 to 0.59. This result shows that even where a model architecture makes extensive use of Batch Normalization, Dropout can still be an effective way of improving performance in low-data situations.
Tasks Semantic Segmentation
Published 2019-08-24
URL https://arxiv.org/abs/1908.09162v1
PDF https://arxiv.org/pdf/1908.09162v1.pdf
PWC https://paperswithcode.com/paper/dont-ignore-dropout-in-fully-convolutional
Repo https://github.com/smspillaz/seg-reg
Framework pytorch
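
SpatialDropout, mentioned in the abstract, drops entire feature maps rather than individual activations, which suits spatially correlated convolutional features. A minimal PyTorch sketch; the ramp-up schedule shown is an assumption in the spirit of curriculum dropout, not necessarily the paper's exact schedule:

```python
import torch.nn as nn

class SpatialDropoutHead(nn.Module):
    """Toy segmentation head with SpatialDropout: nn.Dropout2d zeroes
    whole channels rather than single units."""
    def __init__(self, in_ch, num_classes, p=0.5):
        super().__init__()
        self.drop = nn.Dropout2d(p=p)
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, x):
        return self.classifier(self.drop(x))

def schedule_dropout(module, epoch, max_epochs, p_max=0.5):
    """One plausible schedule (an assumption): ramp the rate from 0 up
    to p_max over training, curriculum-dropout style."""
    for m in module.modules():
        if isinstance(m, nn.Dropout2d):
            m.p = p_max * min(1.0, epoch / max_epochs)
```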

CvxPnPL: A Unified Convex Solution to the Absolute Pose Estimation Problem from Point and Line Correspondences

Title CvxPnPL: A Unified Convex Solution to the Absolute Pose Estimation Problem from Point and Line Correspondences
Authors Sérgio Agostinho, João Gomes, Alessio Del Bue
Abstract We present a new convex method to estimate 3D pose from mixed combinations of 2D-3D point and line correspondences, the Perspective-n-Points-and-Lines problem (PnPL). We merge the contributions of each point and line into a unified Quadratically Constrained Quadratic Program (QCQP) and then relax it into a Semidefinite Program (SDP) through Shor's relaxation. This makes it possible to gracefully handle mixed configurations of points and lines. Furthermore, the proposed relaxation allows us to recover a finite number of solutions under ambiguous configurations. In such cases, the 3D pose candidates are found by further enforcing geometric constraints on the solution space and then retrieving the poses from the intersections of multiple quadrics. Experiments yield results in line with the best-performing state-of-the-art methods while providing the flexibility to solve for an arbitrary number of points and lines.
Tasks Pose Estimation
Published 2019-07-24
URL https://arxiv.org/abs/1907.10545v2
PDF https://arxiv.org/pdf/1907.10545v2.pdf
PWC https://paperswithcode.com/paper/cvxpnpl-a-unified-convex-solution-to-the
Repo https://github.com/SergioRAgostinho/cvxpnpl
Framework none
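
For readers unfamiliar with Shor's relaxation, the generic lifting (shown here without the paper's specific PnPL constraints) replaces the quadratic variable by a positive semidefinite matrix:

```latex
% Lift x to X = x x^T, use x^T A x = tr(A X), and drop the nonconvex
% rank-one constraint on X:
\begin{align*}
\text{(QCQP)} \quad & \min_{x \in \mathbb{R}^d} \; x^\top A_0 x
    \quad \text{s.t.} \quad x^\top A_i x = b_i, \; i = 1, \dots, m \\
\text{(SDP)}  \quad & \min_{X \succeq 0} \; \operatorname{tr}(A_0 X)
    \quad \text{s.t.} \quad \operatorname{tr}(A_i X) = b_i, \; i = 1, \dots, m
\end{align*}
```

When the SDP optimum has rank one, $X = x x^\top$ and the pose parameters are recovered exactly; the ambiguous configurations the abstract mentions correspond to higher-rank solutions, which the paper resolves by intersecting quadrics.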

Interpreting Deep Neural Networks Through Variable Importance

Title Interpreting Deep Neural Networks Through Variable Importance
Authors Jonathan Ish-Horowicz, Dana Udwin, Seth Flaxman, Sarah Filippi, Lorin Crawford
Abstract While the success of deep neural networks (DNNs) is well-established across a variety of domains, our ability to explain and interpret these methods is limited. Unlike previously proposed local methods which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable dependence into feature ranking. Our methodological contributions in this paper are two-fold. First, we propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed “RelATive cEntrality” (RATE) measure (Crawford et al., 2019) to the Bayesian deep learning setting. RATE applies an information theoretic criterion to the posterior distribution of effect sizes to assess feature significance. We apply our framework to three broad application areas: computer vision, natural language processing, and social science.
Tasks
Published 2019-01-28
URL https://arxiv.org/abs/1901.09839v2
PDF https://arxiv.org/pdf/1901.09839v2.pdf
PWC https://paperswithcode.com/paper/interpreting-deep-neural-networks-through
Repo https://github.com/lorinanthony/RATE
Framework tf
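
A minimal sketch of the effect size analogue, assuming (as in the RATE line of work by Crawford et al.) that it is the pseudoinverse projection of the network's fitted values onto the design matrix; the pseudoinverse is what lets the estimate tolerate the highly collinear predictors the abstract mentions:

```python
import numpy as np

def effect_size_analogue(X, f_hat):
    """Project the network's fitted values f_hat = f(X) onto the column
    space of the design matrix X via the Moore-Penrose pseudoinverse,
    yielding linear effect sizes despite collinear predictors."""
    return np.linalg.pinv(X) @ f_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=200)  # highly collinear pair
f_hat = np.tanh(2 * X[:, 0] - X[:, 1])           # stand-in for DNN output
print(np.round(effect_size_analogue(X, f_hat), 2))
```

RATE then ranks variables by an information-theoretic centrality computed over the posterior distribution of these effect sizes; that step is in the linked repo.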

Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations

Title Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations
Authors Fred Hohman, Haekyu Park, Caleb Robinson, Duen Horng Chau
Abstract Deep learning is increasingly used in decision-making tasks. However, understanding how neural networks produce final predictions remains a fundamental challenge. Existing work on interpreting neural network predictions for images often focuses on explaining predictions for single images or neurons. As predictions are often computed from millions of weights that are optimized over millions of images, such explanations can easily miss a bigger picture. We present Summit, an interactive system that scalably and systematically summarizes and visualizes what features a deep learning model has learned and how those features interact to make predictions. Summit introduces two new scalable summarization techniques: (1) activation aggregation discovers important neurons, and (2) neuron-influence aggregation identifies relationships among such neurons. Summit combines these techniques to create the novel attribution graph that reveals and summarizes crucial neuron associations and substructures that contribute to a model’s outcomes. Summit scales to large data, such as the ImageNet dataset with 1.2M images, and leverages neural network feature visualization and dataset examples to help users distill large, complex neural network models into compact, interactive visualizations. We present neural network exploration scenarios where Summit helps us discover multiple surprising insights into a prevalent, large-scale image classifier’s learned representations and informs future neural network architecture design. The Summit visualization runs in modern web browsers and is open-sourced.
Tasks Decision Making
Published 2019-04-04
URL https://arxiv.org/abs/1904.02323v3
PDF https://arxiv.org/pdf/1904.02323v3.pdf
PWC https://paperswithcode.com/paper/summit-scaling-deep-learning-interpretability
Repo https://github.com/fredhohman/summit
Framework none
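
A toy version of activation aggregation in the spirit of Summit (a sketch of the idea, not the tool's actual pipeline):

```python
import numpy as np

def aggregate_important_channels(activations, labels, num_classes, top_k=5):
    """For each class, count how often each channel is among the top-k
    most activated channels over that class's images. High counts mark
    the "important neurons" Summit's attribution graph is built from.

    activations: (N, C) per-image channel activations (e.g., max-pooled)."""
    N, C = activations.shape
    counts = np.zeros((num_classes, C), dtype=int)
    top = np.argsort(-activations, axis=1)[:, :top_k]  # top-k channels per image
    for img_idx, cls in enumerate(labels):
        counts[cls, top[img_idx]] += 1
    return counts
```

Summit's second technique, neuron-influence aggregation, then links these important neurons across layers via attribution to form the graph the visualization renders.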

Multi-mapping Image-to-Image Translation via Learning Disentanglement

Title Multi-mapping Image-to-Image Translation via Learning Disentanglement
Authors Xiaoming Yu, Yuanqi Chen, Thomas Li, Shan Liu, Ge Li
Abstract Recent advances in image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, existing methods consider only one of the two perspectives, which makes them unable to solve each other's problem. To address this issue, we propose a novel unified model that bridges these two objectives. First, we disentangle the input images into latent representations using an encoder-decoder architecture with conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings via random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.
Tasks Image-to-Image Translation
Published 2019-09-17
URL https://arxiv.org/abs/1909.07877v2
PDF https://arxiv.org/pdf/1909.07877v2.pdf
PWC https://paperswithcode.com/paper/multi-mapping-image-to-image-translation-via
Repo https://github.com/Xiaoming-Yu/DMIT
Framework pytorch

What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models

Title What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models
Authors Allyson Ettinger
Abstract Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pre-training processes confer upon models. In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about the information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can generally distinguish good from bad completions involving shared category or role reversal, albeit with less sensitivity than humans, and it robustly retrieves noun hypernyms, but it struggles with challenging inferences and role-based event prediction – and in particular, it shows clear insensitivity to the contextual impacts of negation.
Tasks Language Modelling
Published 2019-07-31
URL https://arxiv.org/abs/1907.13528v1
PDF https://arxiv.org/pdf/1907.13528v1.pdf
PWC https://paperswithcode.com/paper/what-bert-is-not-lessons-from-a-new-suite-of
Repo https://github.com/aetting/lm-diagnostics
Framework pytorch
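
The diagnostics are cloze-style probes: mask a word and compare the probabilities BERT assigns to candidate completions. A minimal sketch using the Hugging Face transformers API (the paper's own code is in the linked repo); the robin/negation pair mirrors the insensitivity to negation reported in the abstract:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def completion_logprob(sentence_with_mask, candidate):
    """Log-probability BERT assigns to `candidate` in the [MASK] slot."""
    inputs = tok(sentence_with_mask, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = logits[0, mask_pos].log_softmax(dim=-1)
    return log_probs[tok.convert_tokens_to_ids(candidate)].item()

# a model sensitive to negation should score these very differently:
print(completion_logprob("A robin is a [MASK].", "bird"))
print(completion_logprob("A robin is not a [MASK].", "bird"))
```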

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Title Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Authors Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
Abstract Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge of interest in using large-batch stochastic optimization methods to tackle this issue. The most prominent algorithm in this line of research is LARS, which, by employing layerwise adaptive learning rates, trains ResNet on ImageNet in a few minutes. However, LARS performs poorly for attention models like BERT, indicating that its performance gains are not consistent across tasks. In this paper, we first study a principled layerwise adaptation strategy to accelerate the training of deep neural networks using large mini-batches. Using this strategy, we develop a new layerwise adaptive large-batch optimization technique called LAMB; we then provide convergence analysis of both LAMB and LARS, showing convergence to a stationary point in general nonconvex settings. Our empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning. In particular, for BERT training, our optimizer enables the use of very large batch sizes of 32868 without any degradation in performance. By increasing the batch size to the memory limit of a TPUv3 Pod, BERT training time can be reduced from 3 days to just 76 minutes (Table 1). The LAMB implementation is available at https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
Tasks Question Answering, Stochastic Optimization
Published 2019-04-01
URL https://arxiv.org/abs/1904.00962v5
PDF https://arxiv.org/pdf/1904.00962v5.pdf
PWC https://paperswithcode.com/paper/reducing-bert-pre-training-time-from-3-days
Repo https://github.com/btahir/tensorflow-LAMB
Framework tf
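
The key ingredient of LAMB is the layerwise trust ratio: an Adam-style update direction is rescaled by the ratio of the weight norm to the update norm, computed per layer. A minimal numpy sketch of one step (the production implementation is the linked TF Addons file):

```python
import numpy as np

def lamb_update(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
                eps=1e-6, weight_decay=0.01):
    """One LAMB step for a single layer's weights."""
    m = b1 * m + (1 - b1) * g             # first moment (Adam-style)
    v = b2 * v + (1 - b2) * g * g         # second moment
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # layerwise trust ratio: scale the step by ||w|| / ||update||
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = w - lr * trust * update
    return w, m, v
```

Because the trust ratio is computed per layer, layers whose updates are large relative to their weights are automatically slowed down, which is what keeps very large batches stable.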

Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution

Title Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution
Authors Yinchuan Xu, Junlin Yang
Abstract Gender bias has been found in existing coreference resolvers. In order to eliminate gender bias, a gender-balanced dataset, Gendered Ambiguous Pronouns (GAP), has been released, on which the best baseline model achieves only 66.9% F1. Bidirectional Encoder Representations from Transformers (BERT) has broken several NLP task records and can be applied to the GAP dataset. However, fine-tuning BERT on a specific task is computationally expensive. In this paper, we propose an end-to-end resolver that combines pre-trained BERT with a Relational Graph Convolutional Network (R-GCN). The R-GCN is used to digest structural syntactic information and learn better task-specific embeddings. Empirical results demonstrate that, under explicit syntactic supervision and without the need to fine-tune BERT, the R-GCN's embeddings outperform the original BERT embeddings on the coreference task. Our work significantly improves the snippet-context baseline F1 score on the GAP dataset from 66.9% to 80.3%. We participated in the 2019 GAP Coreference Shared Task, and our code is available online.
Tasks
Published 2019-05-21
URL https://arxiv.org/abs/1905.08868v3
PDF https://arxiv.org/pdf/1905.08868v3.pdf
PWC https://paperswithcode.com/paper/look-again-at-the-syntax-relational-graph
Repo https://github.com/ianycxu/RGCN-with-BERT
Framework pytorch
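
An R-GCN layer aggregates neighbors separately per relation type (here, per syntactic relation), each with its own weight matrix, plus a self-connection. A minimal numpy sketch of one such layer (illustrative only; the paper's model is in the linked repo):

```python
import numpy as np

def rgcn_layer(H, adjacency_per_relation, W_rel, W_self):
    """One R-GCN layer:
    h_i' = ReLU(W_self h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j).
    adjacency_per_relation: list of (N, N) 0/1 matrices, one per relation."""
    out = H @ W_self.T
    for A, W in zip(adjacency_per_relation, W_rel):
        deg = A.sum(axis=1, keepdims=True)  # normalization constants c_{i,r}
        norm = np.divide(A, deg, out=np.zeros_like(A, dtype=float),
                         where=deg > 0)
        out += norm @ H @ W.T               # relation-specific aggregation
    return np.maximum(out, 0.0)             # ReLU
```

In the paper's setting, H would hold frozen BERT token embeddings and the relations come from the dependency parse, so syntax is injected without fine-tuning BERT.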

XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning

Title XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning
Authors Yue Zhao, Maciej K. Hryniewicki
Abstract A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient Boosting Outlier Detection) is proposed, described and demonstrated for the enhanced detection of outliers from normal observations in various practical datasets. The proposed framework combines the strengths of both supervised and unsupervised machine learning methods by creating a hybrid approach that exploits each of their individual performance capabilities in outlier detection. XGBOD uses multiple unsupervised outlier mining algorithms to extract useful representations from the underlying data that augment the predictive capabilities of an embedded supervised classifier on an improved feature space. The novel approach is shown to provide superior performance in comparison to competing individual detectors, the full ensemble and two existing representation learning based algorithms across seven outlier datasets.
Tasks Outlier Detection, Representation Learning, Unsupervised Representation Learning
Published 2019-12-01
URL https://arxiv.org/abs/1912.00290v1
PDF https://arxiv.org/pdf/1912.00290v1.pdf
PWC https://paperswithcode.com/paper/xgbod-improving-supervised-outlier-detection
Repo https://github.com/yzhao062/XGBOD
Framework none
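
XGBOD's recipe is feature augmentation: append unsupervised outlier scores to the raw features, then train a supervised booster on the enriched space. A minimal sketch of that pipeline (the detector choices here are illustrative assumptions; the reference implementation is in the linked repo):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from xgboost import XGBClassifier  # any gradient booster would do

def xgbod_fit(X_train, y_train):
    """Fit unsupervised detectors, use their scores as extra features,
    then train a supervised booster on the augmented feature space."""
    detectors = [
        IsolationForest(random_state=0).fit(X_train),
        LocalOutlierFactor(novelty=True).fit(X_train),
    ]

    def augment(X):
        scores = np.column_stack([d.score_samples(X) for d in detectors])
        return np.hstack([X, scores])

    clf = XGBClassifier().fit(augment(X_train), y_train)
    # returns a scorer: outlier probability for new samples
    return lambda X: clf.predict_proba(augment(X))[:, 1]
```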