April 1, 2020

2860 words 14 mins read

Paper Group NANR 70

PROTOTYPE-ASSISTED ADVERSARIAL LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION

Title PROTOTYPE-ASSISTED ADVERSARIAL LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION
Authors Anonymous
Abstract This paper presents a generic framework to tackle the crucial class mismatch problem in unsupervised domain adaptation (UDA) for multi-class distributions. Previous adversarial learning methods condition domain alignment only on pseudo labels, but noisy and inaccurate pseudo labels may perturb the multi-class distribution embedded in probabilistic predictions and hence provide insufficient relief for the latent mismatch problem. Compared with pseudo labels, class prototypes are more accurate and reliable, since they summarize over all instances and can represent the inherent semantic distribution shared across domains. Therefore, we propose a novel Prototype-Assisted Adversarial Learning (PAAL) scheme, which incorporates instance probabilistic predictions and class prototypes together to provide reliable indicators for adversarial domain alignment. With the PAAL scheme, we align both the instance feature representations and the class prototype representations to alleviate the mismatch among semantically different classes. We also exploit the class prototypes as a proxy to minimize the within-class variance in the target domain, mitigating the mismatch among semantically similar classes. With these novelties, we constitute a Prototype-Assisted Conditional Domain Adaptation (PACDA) framework that effectively tackles the class mismatch problem. We demonstrate the strong performance and generalization ability of the PAAL scheme and the PACDA framework on two UDA tasks, i.e., object recognition (Office-Home, ImageCLEF-DA, and Office) and synthetic-to-real semantic segmentation (GTA5→Cityscapes and Synthia→Cityscapes).
Tasks Domain Adaptation, Object Recognition, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=Byg79h4tvB
PDF https://openreview.net/pdf?id=Byg79h4tvB
PWC https://paperswithcode.com/paper/prototype-assisted-adversarial-learning-for
Repo
Framework
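
The abstract does not give PAAL's exact prototype formula, but a common way to build class prototypes from unlabeled data is a prediction-weighted average of instance features, which "summarizes over all the instances" without ever committing to a hard pseudo label. A minimal sketch of that idea, assuming softmax predictions `probs` and instance features `features` (both names invented here):

```python
import numpy as np

def class_prototypes(features, probs):
    """features: (N, D) instance features; probs: (N, C) softmax outputs.
    Returns (C, D): each class prototype is the prediction-weighted
    mean of all instance features, so no pseudo label is committed."""
    weighted = probs.T @ features               # (C, D) weighted feature sums
    mass = probs.sum(axis=0)[:, None]           # (C, 1) soft class counts
    return weighted / np.maximum(mass, 1e-8)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))                 # 5 instances, 3-d features
probs = rng.dirichlet(np.ones(2), size=5)       # 2 classes
print(class_prototypes(feats, probs).shape)     # (2, 3)
```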

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Title Implicit Bias of Gradient Descent based Adversarial Training on Separable Data
Authors Anonymous
Abstract Adversarial training is a principled approach for training robust neural networks. Despite tremendous successes in practice, its theoretical properties remain largely unexplored. In this paper, we provide new theoretical insights into gradient descent based adversarial training by studying its computational properties, specifically its implicit bias. We take the binary classification task on linearly separable data as an illustrative example, where the loss asymptotically attains its infimum as the parameter diverges to infinity along certain directions. Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has properly bounded L2 norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum L2 norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data. In addition, when the adversarial perturbation during training has bounded Lq norm, the resulting classifier converges in direction to a maximum mixed-norm margin classifier, which has a natural interpretation of robustness: it is the maximum L2 norm margin classifier under worst-case bounded Lq norm perturbation of the data. Our findings provide theoretical backing for adversarial training, showing that it indeed promotes robustness against adversarial perturbation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgTTh4FDH
PDF https://openreview.net/pdf?id=HkgTTh4FDH
PWC https://paperswithcode.com/paper/implicit-bias-of-gradient-descent-based
Repo
Framework
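
The paper's setting is easy to reproduce in a few lines: for a linear classifier on separable data, the inner maximization over L2-bounded perturbations has the closed form $\max_{\|\delta\|_2\le\epsilon} \ell(w; x+\delta, y) = \log(1+\exp(-(y\,w^\top x - \epsilon\|w\|_2)))$, so gradient descent based adversarial training reduces to gradient descent on this robust logistic loss. A toy sketch (data and hyperparameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3.0, 1.0, (50, 2)),        # two well-separated
               rng.normal(-3.0, 1.0, (50, 2))])      # clusters
y = np.concatenate([np.ones(50), -np.ones(50)])

w, eps, lr = np.zeros(2), 0.1, 0.1
for t in range(2000):
    margins = y * (X @ w) - eps * np.linalg.norm(w)  # robust margins
    p = 1.0 / (1.0 + np.exp(margins))                # sigmoid(-margin)
    w_dir = w / (np.linalg.norm(w) + 1e-12)
    grad = -(p[:, None] * (y[:, None] * X - eps * w_dir)).mean(axis=0)
    w -= lr * grad
print("learned direction:", w / np.linalg.norm(w))   # -> max-margin direction
```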

Deep Neural Forests: An Architecture for Tabular Data

Title Deep Neural Forests: An Architecture for Tabular Data
Authors Anonymous
Abstract Deep neural models, such as convolutional and recurrent networks, achieve phenomenal results over spatial data such as images and text. However, when considering tabular data, gradient boosting of decision trees (GBDT) remains the method of choice. Aiming to bridge this gap, we propose \emph{deep neural forests} (DNF) – a novel architecture that combines elements from decision trees with dense residual connections. We present the results of an extensive empirical study in which we examine the performance of GBDTs, DNFs, and (deep) fully-connected networks. These results indicate that DNFs achieve results comparable to GBDTs on tabular data and open the door to end-to-end neural modeling of multi-modal data. To this end, we present a successful application of DNFs as part of a hybrid architecture for a multi-modal driving scene understanding classification task.
Tasks Scene Understanding
Published 2020-01-01
URL https://openreview.net/forum?id=Syg9YyBFvS
PDF https://openreview.net/pdf?id=Syg9YyBFvS
PWC https://paperswithcode.com/paper/deep-neural-forests-an-architecture-for
Repo
Framework
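
The abstract does not describe the DNF block in detail, so the following is only a rough illustration of its named ingredients: a differentiable (soft) decision tree, in the spirit of deep neural decision forests, whose output is densely concatenated with its input as a residual connection. All module names and sizes below are invented:

```python
import torch
import torch.nn as nn

class SoftTreeBlock(nn.Module):
    """A depth-`depth` soft decision tree whose output is densely
    (residually) concatenated with its input."""

    def __init__(self, in_dim, depth=3, out_dim=8):
        super().__init__()
        self.depth = depth
        self.n_leaves = 2 ** depth
        # One sigmoid routing gate per internal node (2^depth - 1 total).
        self.gates = nn.Linear(in_dim, self.n_leaves - 1)
        self.leaves = nn.Parameter(torch.randn(self.n_leaves, out_dim))

    def forward(self, x):
        d = torch.sigmoid(self.gates(x))                  # routing probs
        mu = torch.ones(x.size(0), 1, device=x.device)    # P(reach root) = 1
        idx = 0
        for level in range(self.depth):
            n = 2 ** level
            g = d[:, idx:idx + n]                         # gates at this level
            idx += n
            # Each node splits its probability mass between two children.
            mu = torch.stack([mu * g, mu * (1 - g)], dim=2).reshape(-1, 2 * n)
        out = mu @ self.leaves                            # leaf-prob mixture
        return torch.cat([x, out], dim=1)                 # dense residual

block = SoftTreeBlock(in_dim=5)
print(block(torch.randn(2, 5)).shape)  # torch.Size([2, 13]) = 5 input + 8 tree
```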

Revisiting the Generalization of Adaptive Gradient Methods

Title Revisiting the Generalization of Adaptive Gradient Methods
Authors Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
Abstract A commonplace belief in the machine learning community is that using adaptive gradient methods hurts generalization. We re-examine this belief both theoretically and experimentally, in light of insights and trends from recent years. We revisit some previous oft-cited experiments and theoretical accounts in more depth, and provide a new set of experiments in larger-scale, state-of-the-art settings. We conclude that with proper tuning, the improved training performance of adaptive optimizers does not in general carry an overfitting penalty, especially in contemporary deep learning. Finally, we synthesize a “user’s guide” to adaptive optimizers, including some proposed modifications to AdaGrad to mitigate some of its empirical shortcomings.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJl6t64tvr
PDF https://openreview.net/pdf?id=BJl6t64tvr
PWC https://paperswithcode.com/paper/revisiting-the-generalization-of-adaptive
Repo
Framework
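
For reference, the AdaGrad update the paper proposes to modify divides each coordinate's step by the root of that coordinate's accumulated squared gradients. A minimal sketch (the paper's specific modifications are not reproduced here):

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.5, eps=1e-8):
    """One AdaGrad step: per-coordinate step sizes shrink with the
    accumulated squared gradients of that coordinate."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w, accum = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(500):
    w, accum = adagrad_step(w, 2 * w, accum)
print(w)  # close to [0, 0]
```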

INFERENCE, PREDICTION, AND ENTROPY RATE OF CONTINUOUS-TIME, DISCRETE-EVENT PROCESSES

Title INFERENCE, PREDICTION, AND ENTROPY RATE OF CONTINUOUS-TIME, DISCRETE-EVENT PROCESSES
Authors Sarah Marzen, James P. Crutchfield
Abstract The inference of models, prediction of future symbols, and entropy rate estimation of discrete-time, discrete-event processes is well-worn ground. However, many time series are better conceptualized as continuous-time, discrete-event processes. Here, we provide new methods for inferring models, predicting future symbols, and estimating the entropy rate of continuous-time, discrete-event processes. The methods rely on an extension of Bayesian structural inference that takes advantage of neural networks’ universal approximation power. Based on experiments with simple synthetic data, these new methods seem to be competitive with state-of-the-art methods for prediction and entropy rate estimation as long as the correct model is inferred.
Tasks Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=B1gn-pEKwH
PDF https://openreview.net/pdf?id=B1gn-pEKwH
PWC https://paperswithcode.com/paper/inference-prediction-and-entropy-rate-of
Repo
Framework
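
The paper's inference machinery is not detailed in the abstract, but its input representation is: a continuous-time, discrete-event trajectory, which is naturally encoded as (symbol, dwell-time) pairs before being handed to a neural predictor. A small sketch of that encoding (the function name and format are assumptions, not the paper's API):

```python
import numpy as np

def encode(events):
    """events: time-ordered list of (timestamp, symbol) pairs.
    Returns the symbol sequence and the dwell time preceding each event."""
    times = np.array([t for t, _ in events])
    symbols = [s for _, s in events]
    dwells = np.diff(times, prepend=times[0])   # first dwell set to 0
    return symbols, dwells

symbols, dwells = encode([(0.0, "A"), (0.7, "B"), (1.9, "A")])
print(symbols, dwells)  # ['A', 'B', 'A'] [0.  0.7 1.2]
```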

Graph Residual Flow for Molecular Graph Generation

Title Graph Residual Flow for Molecular Graph Generation
Authors Anonymous
Abstract Statistical generative models for molecular graphs attract attention from many researchers in the fields of bio- and chemo-informatics. Among these models, invertible flow-based approaches have not yet been fully explored. In this paper, we propose a powerful invertible flow for molecular graphs, called Graph Residual Flow (GRF). The GRF is based on residual flows, which are known to admit more flexible and complex non-linear mappings than traditional coupling flows. We theoretically derive non-trivial conditions under which the GRF is invertible, and present a way of keeping the entire flow invertible throughout training and sampling. Experimental results show that a generative model based on the proposed GRF achieves comparable generation performance with a much smaller number of trainable parameters than existing flow-based models.
Tasks Graph Generation
Published 2020-01-01
URL https://openreview.net/forum?id=SyepHTNFDS
PDF https://openreview.net/pdf?id=SyepHTNFDS
PWC https://paperswithcode.com/paper/graph-residual-flow-for-molecular-graph-1
Repo
Framework
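
GRF's exact parameterization is not given in the abstract, but the residual-flow machinery it builds on is standard: a step $y = x + g(x)$ is invertible whenever $g$ is a strict contraction, and the inverse is computable by fixed-point iteration. A minimal sketch, using spectral normalization plus down-scaling as one crude way to enforce the Lipschitz bound:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResidualFlowStep(nn.Module):
    """y = x + g(x), invertible because g is a strict contraction."""

    def __init__(self, dim, hidden=32, scale=0.9):
        super().__init__()
        # Spectral norm makes each linear map 1-Lipschitz, tanh is
        # 1-Lipschitz, so scaling by 0.9 makes g Lipschitz < 1.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(dim, hidden)), nn.Tanh(),
            spectral_norm(nn.Linear(hidden, dim)))
        self.scale = scale

    def g(self, x):
        return self.scale * self.net(x)

    def forward(self, x):
        return x + self.g(x)

    def inverse(self, y, iters=100):
        x = y.clone()
        for _ in range(iters):      # Banach fixed-point iteration
            x = y - self.g(x)
        return x

step = ResidualFlowStep(4)
x = torch.randn(2, 4)
with torch.no_grad():
    print((step.inverse(step(x)) - x).abs().max())  # ~0: inversion recovers x
```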

Carpe Diem, Seize the Samples Uncertain “at the Moment” for Adaptive Batch Selection

Title Carpe Diem, Seize the Samples Uncertain “at the Moment” for Adaptive Batch Selection
Authors Anonymous
Abstract The performance of deep neural networks is significantly affected by how well mini-batches are constructed. In this paper, we propose a novel adaptive batch selection algorithm called Recency Bias that exploits the uncertain samples predicted inconsistently in recent iterations. The historical label predictions of each sample are used to evaluate its predictive uncertainty within a sliding window. By taking advantage of this design, Recency Bias not only accelerates training but also yields a more accurate network. We demonstrate the superiority of Recency Bias by extensive evaluation on two independent tasks. Compared with existing batch selection methods, the results showed that Recency Bias reduced the test error by up to 20.5% in a fixed wall-clock training time. At the same time, it reduced the training time needed to reach the same test error by up to 59.3%.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BklSv34KvB
PDF https://openreview.net/pdf?id=BklSv34KvB
PWC https://paperswithcode.com/paper/carpe-diem-seize-the-samples-uncertain-at-the
Repo
Framework
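
A hedged sketch of the paper's central quantity: a sample's predictive uncertainty measured from the labels predicted for it within a recent sliding window, here computed as the normalized entropy of the empirical label distribution (the paper's exact estimator may differ):

```python
import numpy as np

def window_uncertainty(recent_labels, num_classes):
    """Normalized entropy of a sample's recently predicted labels:
    0 when predictions are consistent, near 1 when they flip often."""
    counts = np.bincount(recent_labels, minlength=num_classes)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(num_classes))

print(window_uncertainty([2, 2, 2, 2], 10))  # 0.0  -> certain, sample less
print(window_uncertainty([2, 7, 1, 2], 10))  # ~0.45 -> uncertain, sample more
```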

Emergence of Collective Policies Inside Simulations with Biased Representations

Title Emergence of Collective Policies Inside Simulations with Biased Representations
Authors Anonymous
Abstract We consider a setting where biases are involved when agents internalise an environment. Agents have different biases, all of which result in imperfect evidence collected for taking optimal actions. Throughout the interactions, each agent asynchronously internalises their own predictive model of the environment and forms a virtual simulation within which the agent plays trials of the episodes in their entirety. In this research, we focus on developing a collective policy trained solely inside agents’ simulations, which can then be transferred to the real-world environment. The key idea is to let agents imagine together: make them take turns hosting virtual episodes within which all agents participate and interact with their own biased representations. Since agents’ biases vary, the collective policy developed while sequentially visiting the internal simulations lets the biases compensate for one another’s shortcomings. In our experiments, the collective policies consistently achieve significantly higher returns than the best individually trained policies.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Skg2pkHFwS
PDF https://openreview.net/pdf?id=Skg2pkHFwS
PWC https://paperswithcode.com/paper/emergence-of-collective-policies-inside
Repo
Framework
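
The turn-taking scheme can be caricatured on a two-armed bandit: one collective policy is updated only inside each agent's biased internal simulation, with agents taking turns as host. Everything below (the bandit, the biases, the REINFORCE update) is invented for illustration and is not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.3, 0.7])          # the real 2-armed bandit
biases = [np.array([0.2, -0.1]),           # each agent's internal model
          np.array([-0.2, 0.1]),           # is off in its own way
          np.array([0.0, 0.05])]

prefs, lr = np.zeros(2), 0.1               # collective softmax policy
for step in range(3000):
    host = biases[step % len(biases)]      # agents take turns hosting
    means = TRUE_MEANS + host              # that agent's biased simulation
    p = np.exp(prefs) / np.exp(prefs).sum()
    a = rng.choice(2, p=p)
    r = rng.normal(means[a], 0.1)          # virtual reward, never the real env
    grad = -p
    grad[a] += 1.0                         # REINFORCE: grad of log pi(a)
    prefs += lr * r * grad
print(np.exp(prefs) / np.exp(prefs).sum())  # favors the truly better arm
```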

Scaling Autoregressive Video Models

Title Scaling Autoregressive Video Models
Authors Anonymous
Abstract Due to the statistical complexity of video, the high degree of inherent stochasticity, and the sheer amount of data, generating natural video remains a challenging task. State-of-the-art video generation models attempt to address these issues by combining sometimes complex, often video-specific neural network architectures, latent variable models, adversarial training and a range of other methods. Despite their often high complexity, these approaches still fall short of generating high quality video continuations outside of narrow domains and often struggle with fidelity. In contrast, we show that conceptually simple, autoregressive video generation models based on a three-dimensional self-attention mechanism achieve highly competitive results across multiple metrics on popular benchmark datasets for which they produce continuations of high fidelity and realism. Furthermore, we find that our models are capable of producing diverse and surprisingly realistic continuations on a subset of videos from Kinetics, a large-scale action recognition dataset of YouTube videos exhibiting phenomena such as camera movement, complex object interactions, and diverse human movement. To our knowledge, this is the first promising application of video-generation models to videos of this complexity.
Tasks Latent Variable Models, Video Generation
Published 2020-01-01
URL https://openreview.net/forum?id=rJgsskrFwH
PDF https://openreview.net/pdf?id=rJgsskrFwH
PWC https://paperswithcode.com/paper/scaling-autoregressive-video-models-1
Repo
Framework
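
The core mechanism is simple to sketch: flatten a video's (time, height, width) positions into one long sequence and apply causally masked self-attention, so each position is predicted from its past. A toy version (the paper's block-local attention and other scaling tricks are omitted):

```python
import torch
import torch.nn as nn

T, H, W, D = 4, 8, 8, 64
video = torch.randn(1, T * H * W, D)    # positions flattened in (t, h, w) order
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

# Causal mask: True entries are disallowed, so each space-time position
# attends only to positions before it in raster order.
L = T * H * W
mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
out, _ = attn(video, video, video, attn_mask=mask)
print(out.shape)  # torch.Size([1, 256, 64])
```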

CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting

Title CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting
Authors Anonymous
Abstract This paper introduces CloudLSTM, a new branch of recurrent neural models tailored to forecasting over data streams generated by geospatial point-cloud sources. We design a Dynamic Point-cloud Convolution (D-Conv) operator as the core component of CloudLSTMs, which performs convolution directly over point-clouds and extracts local spatial features from sets of neighboring points that surround different elements of the input. This operator maintains the permutation invariance of sequence-to-sequence learning frameworks, while representing neighboring correlations at each time step – an important aspect in spatiotemporal predictive learning. The D-Conv operator resolves the grid-structural data requirements of existing spatiotemporal forecasting models and can be easily plugged into traditional LSTM architectures with sequence-to-sequence learning and attention mechanisms. We apply our proposed architecture to two representative, practical use cases that involve point-cloud streams, i.e. mobile service traffic forecasting and air quality indicator forecasting. Our results, obtained with real-world datasets collected in diverse scenarios for each use case, show that CloudLSTM delivers accurate long-term predictions, outperforming a variety of neural network models.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJlowyHYPr
PDF https://openreview.net/pdf?id=BJlowyHYPr
PWC https://paperswithcode.com/paper/cloudlstm-a-recurrent-neural-model-for-1
Repo
Framework
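
The abstract describes D-Conv as convolving directly over point clouds by mixing features from the neighbors of each element. One plausible minimal reading, with neighbors ordered by distance to keep the operator invariant to input permutation (details assumed, not the paper's exact parameterization):

```python
import torch

def dconv(xyz, feats, weight, K=4):
    """xyz: (N, 2) point positions; feats: (N, F); weight: (K*F, F_out).
    Neighbours (including the point itself) are ordered by distance,
    which keeps the operator invariant to input permutation."""
    d = torch.cdist(xyz, xyz)                  # (N, N) pairwise distances
    idx = d.topk(K, largest=False).indices     # (N, K), ascending distance
    neigh = feats[idx].reshape(len(xyz), -1)   # (N, K*F) gathered features
    return neigh @ weight                      # shared weights across points

xyz, feats = torch.rand(10, 2), torch.rand(10, 3)
print(dconv(xyz, feats, torch.rand(4 * 3, 8)).shape)  # torch.Size([10, 8])
```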

A Simple Approach to the Noisy Label Problem Through the Gambler’s Loss

Title A Simple Approach to the Noisy Label Problem Through the Gambler’s Loss
Authors Anonymous
Abstract Learning in the presence of label noise is a challenging yet important task. It is crucial to design models that are robust to noisy labels. In this paper, we discover that a new class of loss functions called the gambler’s loss provides strong robustness to label noise across various levels of corruption. Training with this modified loss function reduces memorization of data points with noisy labels and is a simple yet effective method to improve robustness and generalization. Moreover, using this loss function allows us to derive an analytical early stopping criterion that accurately estimates when memorization of noisy labels begins to occur. Our overall approach achieves strong results, outperforming existing baselines.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJxq3kHKPH
PDF https://openreview.net/pdf?id=rJxq3kHKPH
PWC https://paperswithcode.com/paper/a-simple-approach-to-the-noisy-label-problem
Repo
Framework
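
The gambler's loss itself is known in the literature (Ziyin et al., 2019): the model gets an extra abstention output, and abstaining returns a $1/o$ fraction of the bet, giving $\mathcal{L} = -\log(p_y + p_{\text{abstain}}/o)$. A PyTorch sketch of that formula (this entry's specific early-stopping criterion is not reproduced):

```python
import torch
import torch.nn.functional as F

def gamblers_loss(logits, targets, o=2.5):
    """logits: (B, C+1); the last column is the abstention output.
    Abstaining pays back 1/o of the bet, so 1 < o <= C controls how
    cheap it is to abstain on suspect (possibly mislabeled) samples."""
    p = F.softmax(logits, dim=1)
    p_class = p.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -torch.log(p_class + p[:, -1] / o).mean()

logits = torch.randn(8, 11)                 # 10 classes + 1 abstain output
targets = torch.randint(0, 10, (8,))
print(gamblers_loss(logits, targets))
```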

Address2vec: Generating vector embeddings for blockchain analytics

Title Address2vec: Generating vector embeddings for blockchain analytics
Authors Anonymous
Abstract Bitcoin is a virtual coinage system that enables users to trade virtually free of a central trusted authority. All transactions on the Bitcoin blockchain are publicly available for viewing, yet since Bitcoin is built mainly for security, its original structure does not allow for direct analysis of address transactions. Existing analysis methods for the Bitcoin blockchain can be complicated, computationally expensive, or inaccurate. We propose a computationally efficient model to analyze Bitcoin blockchain addresses and allow for their use with existing machine learning algorithms. We compare our approach against Multi Level Sequence Learners (MLSLs), one of the best performing models on bitcoin address data.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJlJegHFvH
PDF https://openreview.net/pdf?id=SJlJegHFvH
PWC https://paperswithcode.com/paper/address2vec-generating-vector-embeddings-for
Repo
Framework
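
The abstract does not spell out the method, so the following is only one plausible reading of the name: treat the addresses co-occurring in each transaction as a "sentence" and learn embeddings with skip-gram word2vec. Purely illustrative; the toy transactions and the gensim choice are assumptions, not the paper's pipeline:

```python
from gensim.models import Word2Vec

# Toy "transactions": each is the set of addresses appearing together.
transactions = [
    ["addr_a", "addr_b", "addr_c"],
    ["addr_b", "addr_d"],
    ["addr_a", "addr_d", "addr_e"],
]
model = Word2Vec(transactions, vector_size=16, window=5, min_count=1, sg=1)
print(model.wv["addr_a"].shape)   # (16,) embedding usable by downstream ML
```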

Distance-Based Learning from Errors for Confidence Calibration

Title Distance-Based Learning from Errors for Confidence Calibration
Authors Anonymous
Abstract Deep neural networks (DNNs) are poorly calibrated when trained in conventional ways. To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). DBLE bases its confidence estimation on distances in the representation space. We first adapt prototypical learning to train a classification model for DBLE. It yields a representation space where a test sample’s distance to its ground-truth class center can calibrate the model’s performance. At inference, however, these distances are not available due to the lack of ground-truth labels. To circumvent this, we approximately infer the distance for every test sample by training a confidence model jointly with the classification model, learning only from mis-classified training samples, which we show to be highly beneficial for effective learning. On multiple datasets and DNN architectures, we demonstrate that DBLE outperforms alternative single-modal confidence calibration approaches. DBLE also achieves performance comparable to computationally expensive ensemble approaches, with lower computational cost and fewer parameters.
Tasks Calibration
Published 2020-01-01
URL https://openreview.net/forum?id=BJeB5hVtvB
PDF https://openreview.net/pdf?id=BJeB5hVtvB
PWC https://paperswithcode.com/paper/distance-based-learning-from-errors-for
Repo
Framework
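
A sketch of DBLE's central quantity: in a prototypical representation space, distances to class centers drive both the prediction and the confidence score. Here confidence comes from a softmax over negative distances; the separately trained confidence model that approximates these distances at test time is omitted:

```python
import numpy as np

def predict_with_confidence(z, prototypes):
    """z: (D,) test embedding; prototypes: (C, D) class centers."""
    d = np.linalg.norm(prototypes - z, axis=1)   # distance to each center
    p = np.exp(-d) / np.exp(-d).sum()            # softmax over -distance
    return int(p.argmax()), float(p.max())       # (predicted class, confidence)

protos = np.array([[0.0, 0.0], [3.0, 3.0]])
print(predict_with_confidence(np.array([0.2, -0.1]), protos))  # class 0, ~0.98
print(predict_with_confidence(np.array([1.5, 1.5]), protos))   # ambiguous, 0.5
```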

YaoGAN: Learning Worst-case Competitive Algorithms from Self-generated Inputs

Title YaoGAN: Learning Worst-case Competitive Algorithms from Self-generated Inputs
Authors Anonymous
Abstract We tackle the challenge of using machine learning to find algorithms with strong worst-case guarantees for online combinatorial optimization problems. Whereas the previous approach along this direction (Kong et al., 2018) relies on significant domain expertise to provide hard distributions over input instances at training, we ask whether this can be accomplished from first principles, i.e., without any human-provided data beyond specifying the objective of the optimization problem. To answer this question, we draw insights from classic results in game theory, analysis of algorithms, and online learning to introduce a novel framework. At the high level, similar to a generative adversarial network (GAN), our framework has two components whose respective goals are to learn the optimal algorithm as well as a set of input instances that captures the essential difficulty of the given optimization problem. The two components are trained against each other and evolved simultaneously. We test our ideas on the ski rental problem and the fractional AdWords problem. For these well-studied problems, our preliminary results demonstrate that the framework is capable of finding algorithms as well as difficult input instances that are consistent with known optimal results. We believe our new framework points to a promising direction which can facilitate the research of algorithm design by leveraging ML to improve the state of the art both in theory and in practice.
Tasks Combinatorial Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=HygYmJBKwH
PDF https://openreview.net/pdf?id=HygYmJBKwH
PWC https://paperswithcode.com/paper/yaogan-learning-worst-case-competitive
Repo
Framework
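
The two-player idea can be miniaturized on ski rental: the algorithm player randomizes over buy days, the adversary over season lengths, and each learns against the other. The sketch below uses fictitious play on the competitive-ratio matrix instead of the paper's GAN-style networks; with rent 1/day and buy cost B, the game value approaches $e/(e-1) \approx 1.58$ as B grows (about 1.46 for B = 4):

```python
import numpy as np

B, DAYS = 4, 12                               # buy cost, horizon (invented)
cost = np.zeros((DAYS, DAYS))                 # cost[buy_day b, season n]
for b in range(DAYS):
    for n in range(DAYS):
        # Rent for days 1..b at 1/day, then buy; season lasts n+1 days.
        cost[b, n] = (n + 1) if n < b else b + B
opt = np.minimum(np.arange(1, DAYS + 1), B)   # offline optimum per season
ratio = cost / opt                            # competitive-ratio matrix

alg_counts, adv_counts = np.ones(DAYS), np.ones(DAYS)
for _ in range(5000):                         # fictitious play
    adv_p = adv_counts / adv_counts.sum()
    alg_counts[(ratio @ adv_p).argmin()] += 1     # algorithm best-responds
    alg_p = alg_counts / alg_counts.sum()
    adv_counts[(alg_p @ ratio).argmax()] += 1     # adversary best-responds
print("approx game value:", alg_p @ ratio @ adv_p)  # ~1.46 for B = 4
```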

Solving Packing Problems by Conditional Query Learning

Title Solving Packing Problems by Conditional Query Learning
Authors Anonymous
Abstract Neural Combinatorial Optimization (NCO) has recently shown the potential to solve traditional NP-hard problems. Previous studies have shown that NCO outperforms heuristic algorithms in many combinatorial optimization problems, such as routing. However, it is less efficient for more complicated problems such as packing, a type of optimization problem with a mutually conditioned action space. In this paper, we propose a Conditional Query Learning (CQL) method to handle the packing problem in both 2D and 3D settings. By embedding previous actions as a conditional query to the attention model, we design a fully end-to-end model and train it for 2D and 3D packing via reinforcement learning. Through extensive experiments, the results show that our method achieves a lower bin gap ratio and variance for both 2D and 3D packing. Our model improves the space utilization ratio by 7.2% compared with a genetic algorithm for 3D packing (30-box case), and reduces the bin gap ratio by more than 10% in almost every case compared with extant learning approaches. In addition, our model scales well with the number of boxes to pack. Furthermore, we provide a general test environment for 2D and 3D packing for learning algorithms. All source code of the model and the test environment is released.
Tasks Combinatorial Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=BkgTwRNtPB
PDF https://openreview.net/pdf?id=BkgTwRNtPB
PWC https://paperswithcode.com/paper/solving-packing-problems-by-conditional-query
Repo
Framework
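
A hedged sketch of the conditional-query idea: embed the previous placement action and use it as the attention query over the remaining items, so every decision is conditioned on what was placed before. Module shapes and names below are invented, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditionalQueryDecoder(nn.Module):
    """Attention over remaining items, with the previous action's
    embedding serving as the (conditional) query."""

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, prev_action_emb, item_embs):
        """prev_action_emb: (B, 1, D); item_embs: (B, N, D)."""
        _, weights = self.attn(prev_action_emb, item_embs, item_embs)
        return weights.squeeze(1)              # (B, N): next-item distribution

dec = ConditionalQueryDecoder(32)
probs = dec(torch.randn(2, 1, 32), torch.randn(2, 5, 32))
print(probs.shape, probs.sum(dim=1))           # (2, 5); each row sums to 1
```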