Paper Group AWR 302
Catalyst.RL: A Distributed Framework for Reproducible RL Research. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. TF-Replicator: Distributed Machine Learning for Researchers. Class-Balanced Loss Based on Effective Number of Samples. PR Product: A Substitute for Inner Product in Neural Networks. Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling. Don’t ignore Dropout in Fully Convolutional Networks. CvxPnPL: A Unified Convex Solution to the Absolute Pose Estimation Problem from Point and Line Correspondences. Interpreting Deep Neural Networks Through Variable Importance. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. Multi-mapping Image-to-Image Translation via Learning Disentanglement. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning.
Catalyst.RL: A Distributed Framework for Reproducible RL Research
Title | Catalyst.RL: A Distributed Framework for Reproducible RL Research |
Authors | Sergey Kolesnikov, Oleksii Hrinchuk |
Abstract | Despite the recent progress in the field of deep reinforcement learning (RL), and arguably because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open source framework for RL research with a focus on reproducibility and flexibility. Main features of our library include large-scale asynchronous distributed training, easy-to-use configuration files with the complete list of hyperparameters for particular experiments, and efficient implementations of various RL algorithms and auxiliary tricks, such as frame stacking, n-step returns, value distributions, etc. To demonstrate the usefulness of our framework, we evaluate it on a range of benchmarks in continuous control, as well as on the task of developing a controller to enable a physiologically-based human model with a prosthetic leg to walk and run. The latter task was introduced at the NeurIPS 2018 AI for Prosthetics Challenge, where our team took 3rd place, capitalizing on the ability of catalyst.RL to train high-quality and sample-efficient RL agents. |
Tasks | Continuous Control |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1903.00027v1 |
http://arxiv.org/pdf/1903.00027v1.pdf | |
PWC | https://paperswithcode.com/paper/catalystrl-a-distributed-framework-for |
Repo | https://github.com/catalyst-team/catalyst-rl-framework |
Framework | none |
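The abstract mentions auxiliary tricks such as frame stacking and n-step returns. Below is a minimal sketch of how n-step returns can be computed for a trajectory segment; it is a generic illustration only, not the catalyst.RL implementation, and the function name and arguments are hypothetical.

```python
import numpy as np

def n_step_returns(rewards, values, gamma=0.99, n=5):
    """n-step returns for one trajectory segment (generic sketch).

    rewards: array of shape (T,), rewards r_0 .. r_{T-1}.
    values:  array of shape (T + 1,), critic estimates V(s_0) .. V(s_T);
             the last entry bootstraps at the segment boundary.
    Returns G_t = sum_{k=0}^{m-1} gamma^k r_{t+k} + gamma^m V(s_{t+m}),
    where m = min(n, T - t).
    """
    T = len(rewards)
    returns = np.empty(T, dtype=np.float64)
    for t in range(T):
        m = min(n, T - t)                      # shrink the horizon near the segment end
        discounts = gamma ** np.arange(m)
        returns[t] = discounts @ rewards[t:t + m] + gamma ** m * values[t + m]
    return returns

# Toy usage: 6 steps of reward 1.0, constant value estimates of 0.5.
print(n_step_returns(np.ones(6), np.full(7, 0.5), gamma=0.9, n=3))
```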
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
Title | MaskGAN: Towards Diverse and Interactive Facial Image Manipulation |
Authors | Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo |
Abstract | Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user-modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. In particular, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ. |
Tasks | |
Published | 2019-07-27 |
URL | https://arxiv.org/abs/1907.11922v2 |
https://arxiv.org/pdf/1907.11922v2.pdf | |
PWC | https://paperswithcode.com/paper/maskgan-towards-diverse-and-interactive |
Repo | https://github.com/switchablenorms/CelebAMask-HQ |
Framework | pytorch |
TF-Replicator: Distributed Machine Learning for Researchers
Title | TF-Replicator: Distributed Machine Learning for Researchers |
Authors | Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov |
Abstract | We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25). |
Tasks | Continuous Control, Image Generation |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00465v1 |
http://arxiv.org/pdf/1902.00465v1.pdf | |
PWC | https://paperswithcode.com/paper/tf-replicator-distributed-machine-learning |
Repo | https://github.com/hoondori/what-i-want-to-learn |
Framework | tf |
Class-Balanced Loss Based on Effective Number of Samples
Title | Class-Balanced Loss Based on Effective Number of Samples |
Authors | Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie |
Abstract | With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets. |
Tasks | Image Classification |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05555v1 |
http://arxiv.org/pdf/1901.05555v1.pdf | |
PWC | https://paperswithcode.com/paper/class-balanced-loss-based-on-effective-number |
Repo | https://github.com/feidfoe/AdjustBnd4Imbalance |
Framework | pytorch |
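A minimal sketch of the weighting scheme from the abstract: class weights proportional to the inverse effective number (1 - β^n)/(1 - β), plugged into a softmax cross-entropy loss. The normalization of the weights to sum to the number of classes is a common convention, assumed here rather than taken from the abstract.

```python
import numpy as np
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weights proportional to the inverse effective number (1 - beta^n) / (1 - beta)."""
    samples_per_class = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - beta ** samples_per_class) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes (an assumed convention).
    weights = weights / weights.sum() * len(samples_per_class)
    return torch.tensor(weights, dtype=torch.float32)

# Long-tailed toy setup: 3 classes with 5000, 500, and 50 samples.
weights = class_balanced_weights([5000, 500, 50], beta=0.999)
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = F.cross_entropy(logits, targets, weight=weights)  # class-balanced cross-entropy
print(weights, loss.item())
```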
PR Product: A Substitute for Inner Product in Neural Networks
Title | PR Product: A Substitute for Inner Product in Neural Networks |
Authors | Zhennan Wang, Wenbin Zou, Chen Xu |
Abstract | In this paper, we analyze the inner product of weight vector w and data vector x in neural networks from the perspective of vector orthogonal decomposition and prove that the direction gradient of w decreases as the angle between them approaches 0 or π. We propose the Projection and Rejection Product (PR Product) to make the direction gradient of w independent of the angle and consistently larger than the one in the standard inner product, while keeping the forward propagation identical. As a reliable substitute for the standard inner product, the PR Product can be applied to many existing deep learning modules, so we develop PR Product versions of the fully connected layer, convolutional layer and LSTM layer. In static image classification, experiments on the CIFAR10 and CIFAR100 datasets demonstrate that the PR Product can robustly enhance the ability of various state-of-the-art classification networks. On the task of image captioning, even without any bells and whistles, our PR Product version of the captioning model can compete with or outperform the state-of-the-art models on the MS COCO dataset. Code has been made available at: https://github.com/wzn0828/PR_Product. |
Tasks | Image Captioning, Image Classification |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13148v2 |
https://arxiv.org/pdf/1904.13148v2.pdf | |
PWC | https://paperswithcode.com/paper/pr-product-a-substitute-for-inner-product-in |
Repo | https://github.com/wzn0828/PR_Product |
Framework | pytorch |
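For readers unfamiliar with the vector orthogonal decomposition the abstract refers to, the snippet below illustrates only that decomposition: splitting x into its projection onto w and its rejection (the orthogonal remainder), and checking that the standard inner product depends only on the projection. It is not the PR Product operator itself, whose modified backward pass is defined in the paper and the linked repository.

```python
import torch

# Projection / rejection decomposition of the data vector x with respect to the
# weight vector w -- the viewpoint analyzed in the abstract. This is NOT the PR
# Product; it only shows the decomposition the operator is built on.
w = torch.randn(5, requires_grad=True)
x = torch.randn(5)

w_hat = w / w.norm()                 # unit vector along w
projection = (x @ w_hat) * w_hat     # component of x along w
rejection = x - projection           # component of x orthogonal to w

# The standard inner product only "sees" the projection component:
assert torch.allclose(w @ x, w @ projection, atol=1e-6)
print(float(w @ x), float(w @ projection), float(w @ rejection))  # last value is ~0
```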
Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling
Title | Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling |
Authors | Satyajit Neogi, Justin Dauwels |
Abstract | Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models on two single-label (one nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets - UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model in all our experiments. In addition, LSTM based models display inconsistent performance across validation and test data, and pose difficulty in selecting models on validation data during our experiments. FLDCRF offers easier model selection, consistency across validation and test performance and lucid model intuition. FLDCRF is also much faster to train compared to LSTM, even without a GPU. FLDCRF outshines the best LSTM model by ~4% on a single-label task on UCI gesture phase data and outperforms LSTM performance by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models. |
Tasks | Model Selection |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03667v2 |
https://arxiv.org/pdf/1911.03667v2.pdf | |
PWC | https://paperswithcode.com/paper/factored-latent-dynamic-conditional-random |
Repo | https://github.com/satyajitneogiju/FLDCRF-for-sequence-labeling |
Framework | none |
Don’t ignore Dropout in Fully Convolutional Networks
Title | Don’t ignore Dropout in Fully Convolutional Networks |
Authors | Thomas Spilsbury, Paavo Camps |
Abstract | Data for image segmentation models can be costly to obtain due to the precision required by human annotators. We run a series of experiments showing the effect of different kinds of Dropout training on the DeepLabv3+ image segmentation model when trained using a small dataset. We find that when appropriate forms of Dropout are applied in the right place in the model architecture, a non-trivial improvement in Mean Intersection over Union (mIoU) score can be observed. In our best case, we find that applying Dropout scheduling in conjunction with SpatialDropout improves baseline mIoU from 0.49 to 0.59. This result shows that even where a model architecture makes extensive use of Batch Normalization, Dropout can still be an effective way of improving performance in low data situations. |
Tasks | Semantic Segmentation |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09162v1 |
https://arxiv.org/pdf/1908.09162v1.pdf | |
PWC | https://paperswithcode.com/paper/dont-ignore-dropout-in-fully-convolutional |
Repo | https://github.com/smspillaz/seg-reg |
Framework | pytorch |
CvxPnPL: A Unified Convex Solution to the Absolute Pose Estimation Problem from Point and Line Correspondences
Title | CvxPnPL: A Unified Convex Solution to the Absolute Pose Estimation Problem from Point and Line Correspondences |
Authors | Sérgio Agostinho, João Gomes, Alessio Del Bue |
Abstract | We present a new convex method to estimate 3D pose from mixed combinations of 2D-3D point and line correspondences, the Perspective-n-Points-and-Lines problem (PnPL). We merge the contributions of each point and line into a unified Quadratically Constrained Quadratic Program (QCQP) and then relax it into a Semidefinite Program (SDP) through Shor’s relaxation. This makes it possible to gracefully handle mixed configurations of points and lines. Furthermore, the proposed relaxation allows us to recover a finite number of solutions under ambiguous configurations. In such cases, the 3D pose candidates are found by further enforcing geometric constraints on the solution space and then retrieving such poses from the intersections of multiple quadrics. Experiments provide results in line with the best performing state of the art methods while providing the flexibility of solving for an arbitrary number of points and lines. |
Tasks | Pose Estimation |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10545v2 |
https://arxiv.org/pdf/1907.10545v2.pdf | |
PWC | https://paperswithcode.com/paper/cvxpnpl-a-unified-convex-solution-to-the |
Repo | https://github.com/SergioRAgostinho/cvxpnpl |
Framework | none |
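Shor's relaxation, which the abstract applies to the PnPL QCQP, lifts the non-convex outer product x xᵀ into a matrix variable X constrained so that [[X, x], [xᵀ, 1]] is positive semidefinite. The snippet below is a generic CVXPY sketch of that relaxation on a toy QCQP, not the paper's PnPL-specific formulation; all problem data are made up.

```python
import cvxpy as cp
import numpy as np

# Toy QCQP:  minimize x^T P0 x + q0^T x  s.t.  x^T P1 x + q1^T x + r1 <= 0
n = 3
rng = np.random.default_rng(0)
P0, q0 = np.eye(n), rng.standard_normal(n)
P1, q1, r1 = np.eye(n), rng.standard_normal(n), -1.0

x = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)      # relaxation of the outer product x x^T
x_col = cp.reshape(x, (n, 1))
lifted = cp.bmat([[X, x_col],
                  [x_col.T, np.ones((1, 1))]])   # Shor: [[X, x], [x^T, 1]] must be PSD

prob = cp.Problem(
    cp.Minimize(cp.trace(P0 @ X) + q0 @ x),
    [cp.trace(P1 @ X) + q1 @ x + r1 <= 0,
     lifted >> 0],
)
prob.solve()
print(prob.status, prob.value, x.value)
```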
Interpreting Deep Neural Networks Through Variable Importance
Title | Interpreting Deep Neural Networks Through Variable Importance |
Authors | Jonathan Ish-Horowicz, Dana Udwin, Seth Flaxman, Sarah Filippi, Lorin Crawford |
Abstract | While the success of deep neural networks (DNNs) is well-established across a variety of domains, our ability to explain and interpret these methods is limited. Unlike previously proposed local methods which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable dependence into feature ranking. Our methodological contributions in this paper are two-fold. First, we propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed “RelATive cEntrality” (RATE) measure (Crawford et al., 2019) to the Bayesian deep learning setting. RATE applies an information theoretic criterion to the posterior distribution of effect sizes to assess feature significance. We apply our framework to three broad application areas: computer vision, natural language processing, and social science. |
Tasks | |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09839v2 |
https://arxiv.org/pdf/1901.09839v2.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-deep-neural-networks-through |
Repo | https://github.com/lorinanthony/RATE |
Framework | tf |
Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations
Title | Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations |
Authors | Fred Hohman, Haekyu Park, Caleb Robinson, Duen Horng Chau |
Abstract | Deep learning is increasingly used in decision-making tasks. However, understanding how neural networks produce final predictions remains a fundamental challenge. Existing work on interpreting neural network predictions for images often focuses on explaining predictions for single images or neurons. As predictions are often computed from millions of weights that are optimized over millions of images, such explanations can easily miss a bigger picture. We present Summit, an interactive system that scalably and systematically summarizes and visualizes what features a deep learning model has learned and how those features interact to make predictions. Summit introduces two new scalable summarization techniques: (1) activation aggregation discovers important neurons, and (2) neuron-influence aggregation identifies relationships among such neurons. Summit combines these techniques to create the novel attribution graph that reveals and summarizes crucial neuron associations and substructures that contribute to a model’s outcomes. Summit scales to large data, such as the ImageNet dataset with 1.2M images, and leverages neural network feature visualization and dataset examples to help users distill large, complex neural network models into compact, interactive visualizations. We present neural network exploration scenarios where Summit helps us discover multiple surprising insights into a prevalent, large-scale image classifier’s learned representations and informs future neural network architecture design. The Summit visualization runs in modern web browsers and is open-sourced. |
Tasks | Decision Making |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02323v3 |
https://arxiv.org/pdf/1904.02323v3.pdf | |
PWC | https://paperswithcode.com/paper/summit-scaling-deep-learning-interpretability |
Repo | https://github.com/fredhohman/summit |
Framework | none |
Multi-mapping Image-to-Image Translation via Learning Disentanglement
Title | Multi-mapping Image-to-Image Translation via Learning Disentanglement |
Authors | Xiaoming Yu, Yuanqi Chen, Thomas Li, Shan Liu, Ge Li |
Abstract | Recent advances of image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, the existing methods only consider one of the two perspectives, which makes them unable to solve each other’s problem. To address this issue, we propose a novel unified model, which bridges these two objectives. First, we disentangle the input images into the latent representations by an encoder-decoder architecture with a conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings by a random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods. |
Tasks | Image-to-Image Translation |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07877v2 |
https://arxiv.org/pdf/1909.07877v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-mapping-image-to-image-translation-via |
Repo | https://github.com/Xiaoming-Yu/DMIT |
Framework | pytorch |
What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models
Title | What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models |
Authors | Allyson Ettinger |
Abstract | Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pre-training processes confer upon models. In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about the information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can generally distinguish good from bad completions involving shared category or role reversal, albeit with less sensitivity than humans, and it robustly retrieves noun hypernyms, but it struggles with challenging inferences and role-based event prediction – and in particular, it shows clear insensitivity to the contextual impacts of negation. |
Tasks | Language Modelling |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13528v1 |
https://arxiv.org/pdf/1907.13528v1.pdf | |
PWC | https://paperswithcode.com/paper/what-bert-is-not-lessons-from-a-new-suite-of |
Repo | https://github.com/aetting/lm-diagnostics |
Framework | pytorch |
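As a quick taste of the kind of cloze-style probing the abstract describes (the actual stimuli and scoring code live in the linked repository), the snippet below compares BERT's top completions for an affirmative and a negated context using the Hugging Face fill-mask pipeline; the sentence pair is illustrative.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for sentence in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    top = fill(sentence, top_k=3)
    print(sentence, [(p["token_str"], round(p["score"], 3)) for p in top])

# The paper reports that BERT is largely insensitive to negation: the negated
# context still tends to receive completions such as "bird" with high probability.
```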
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Title | Large Batch Optimization for Deep Learning: Training BERT in 76 minutes |
Authors | Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh |
Abstract | Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge of interest in using large batch stochastic optimization methods to tackle this issue. The most prominent algorithm in this line of research is LARS, which by employing layerwise adaptive learning rates trains ResNet on ImageNet in a few minutes. However, LARS performs poorly for attention models like BERT, indicating that its performance gains are not consistent across tasks. In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches. Using this strategy, we develop a new layerwise adaptive large batch optimization technique called LAMB; we then provide convergence analysis of LAMB as well as LARS, showing convergence to a stationary point in general nonconvex settings. Our empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning. In particular, for BERT training, our optimizer enables use of very large batch sizes of 32868 without any degradation of performance. By increasing the batch size to the memory limit of a TPUv3 Pod, BERT training time can be reduced from 3 days to just 76 minutes (Table 1). The LAMB implementation is available at https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py |
Tasks | Question Answering, Stochastic Optimization |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.00962v5 |
https://arxiv.org/pdf/1904.00962v5.pdf | |
PWC | https://paperswithcode.com/paper/reducing-bert-pre-training-time-from-3-days |
Repo | https://github.com/btahir/tensorflow-LAMB |
Framework | tf |
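The core idea the abstract describes is a layerwise trust ratio that rescales an Adam-style update by ||w|| / ||update|| per weight tensor. Below is a simplified NumPy sketch of one such step; it omits bias correction and trust-ratio clipping, so it is an illustration of the idea rather than the reference optimizer (the TensorFlow Addons implementation linked in the abstract).

```python
import numpy as np

def lamb_step(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One simplified LAMB-style update for a single weight tensor (sketch only)."""
    m = beta1 * m + (1 - beta1) * grad            # Adam-style first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # Adam-style second moment
    update = m / (np.sqrt(v) + eps) + weight_decay * param
    w_norm, u_norm = np.linalg.norm(param), np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    param = param - lr * trust_ratio * update     # layerwise adaptive step
    return param, m, v

w = np.random.randn(4, 4)
g = np.random.randn(4, 4)
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = lamb_step(w, g, m, v)
print(w.shape)
```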
Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution
Title | Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution |
Authors | Yinchuan Xu, Junlin Yang |
Abstract | Gender bias has been found in existing coreference resolvers. In order to eliminate gender bias, a gender-balanced dataset, Gendered Ambiguous Pronouns (GAP), has been released, and the best baseline model achieves only 66.9% F1. Bidirectional Encoder Representations from Transformers (BERT) has broken several NLP task records and can be used on the GAP dataset. However, fine-tuning BERT on a specific task is computationally expensive. In this paper, we propose an end-to-end resolver by combining pre-trained BERT with a Relational Graph Convolutional Network (R-GCN). R-GCN is used for digesting structural syntactic information and learning better task-specific embeddings. Empirical results demonstrate that, under explicit syntactic supervision and without the need to fine-tune BERT, R-GCN’s embeddings outperform the original BERT embeddings on the coreference task. Our work significantly improves the snippet-context baseline F1 score on the GAP dataset from 66.9% to 80.3%. We participated in the 2019 GAP Coreference Shared Task, and our code is available online. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08868v3 |
https://arxiv.org/pdf/1905.08868v3.pdf | |
PWC | https://paperswithcode.com/paper/look-again-at-the-syntax-relational-graph |
Repo | https://github.com/ianycxu/RGCN-with-BERT |
Framework | pytorch |
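A rough sketch of the combination the abstract describes: contextual BERT token embeddings as node features, refined by a relational GCN over syntactic edges. The graph construction below (a simple chain with a single relation type) is a stand-in for the real dependency-based graph built in the linked repository, and the layer sizes are arbitrary.

```python
import torch
from transformers import BertModel, BertTokenizer
from torch_geometric.nn import RGCNConv

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "Mary told John that she would attend the meeting."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    node_feats = bert(**inputs).last_hidden_state.squeeze(0)   # (num_tokens, 768)

num_tokens = node_feats.size(0)
# Toy graph: chain edges between adjacent tokens, all with relation type 0.
src = torch.arange(num_tokens - 1)
edge_index = torch.stack([src, src + 1])
edge_type = torch.zeros(num_tokens - 1, dtype=torch.long)

rgcn = RGCNConv(in_channels=768, out_channels=128, num_relations=3)
task_embeddings = rgcn(node_feats, edge_index, edge_type)      # (num_tokens, 128)
print(task_embeddings.shape)
```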
XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning
Title | XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning |
Authors | Yue Zhao, Maciej K. Hryniewicki |
Abstract | A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient Boosting Outlier Detection) is proposed, described and demonstrated for the enhanced detection of outliers from normal observations in various practical datasets. The proposed framework combines the strengths of both supervised and unsupervised machine learning methods by creating a hybrid approach that exploits each of their individual performance capabilities in outlier detection. XGBOD uses multiple unsupervised outlier mining algorithms to extract useful representations from the underlying data that augment the predictive capabilities of an embedded supervised classifier on an improved feature space. The novel approach is shown to provide superior performance in comparison to competing individual detectors, the full ensemble and two existing representation learning based algorithms across seven outlier datasets. |
Tasks | Outlier Detection, Representation Learning, Unsupervised Representation Learning |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00290v1 |
https://arxiv.org/pdf/1912.00290v1.pdf | |
PWC | https://paperswithcode.com/paper/xgbod-improving-supervised-outlier-detection |
Repo | https://github.com/yzhao062/XGBOD |
Framework | none |
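The sketch below illustrates the stacking idea from the abstract: scores from unsupervised outlier detectors are appended to the original features, and an XGBoost classifier is trained on the augmented space. It uses two scikit-learn detectors on synthetic data as stand-ins; the actual XGBOD detector pool and score selection are in the linked repository (and in PyOD).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

detectors = [IsolationForest(random_state=0),
             LocalOutlierFactor(n_neighbors=20, novelty=True)]

def outlier_scores(fit_X, score_X):
    # Fit each unsupervised detector on the training data and score `score_X`.
    cols = []
    for det in detectors:
        det.fit(fit_X)
        cols.append(-det.score_samples(score_X))   # higher = more outlying
    return np.column_stack(cols)

X_tr_aug = np.hstack([X_tr, outlier_scores(X_tr, X_tr)])
X_te_aug = np.hstack([X_te, outlier_scores(X_tr, X_te)])

clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
clf.fit(X_tr_aug, y_tr)
print("test accuracy:", clf.score(X_te_aug, y_te))
```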