April 1, 2020


Paper Group ANR 511

Combating False Negatives in Adversarial Imitation Learning. Domain-Adversarial and -Conditional State Space Model for Imitation Learning. CO-Optimal Transport. A Probabilistic Framework for Imitating Human Race Driver Behavior. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping. Experimental Studies in …

Combating False Negatives in Adversarial Imitation Learning


Title	Combating False Negatives in Adversarial Imitation Learning
Authors	Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio
Abstract	In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent’s trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the ‘False Negatives’ (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.
Tasks	Imitation Learning
Published	2020-02-02
URL	https://arxiv.org/abs/2002.00412v1
PDF	https://arxiv.org/pdf/2002.00412v1.pdf
PWC	https://paperswithcode.com/paper/combating-false-negatives-in-adversarial
Repo
Framework

Domain-Adversarial and -Conditional State Space Model for Imitation Learning


Title	Domain-Adversarial and -Conditional State Space Model for Imitation Learning
Authors	Ryo Okumura, Masashi Okada, Tadahiro Taniguchi
Abstract	State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning (IL). Without these states, IL is hampered by domain-dependent information useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and -conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via IL for continuous control of sparse reward tasks in simulators and compared it with the performance of the existing SRL method. The agents from DAC-SSM achieved performance comparable to that of experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for IL under large domain shifts and can be obtained using DAC-SSM.
Tasks	Continuous Control, Imitation Learning, Representation Learning
Published	2020-01-31
URL	https://arxiv.org/abs/2001.11628v1
PDF	https://arxiv.org/pdf/2001.11628v1.pdf
PWC	https://paperswithcode.com/paper/domain-adversarial-and-conditional-state
Repo
Framework

CO-Optimal Transport


Title	CO-Optimal Transport
Authors	Ievgen Redko, Titouan Vayer, Rémi Flamary, Nicolas Courty
Abstract	Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions. Yet, its original formulation relies on the existence of a cost function between the samples of the two distributions, which makes it impractical for comparing data distributions supported on different topological spaces. To circumvent this limitation, we propose a novel OT problem, named COOT for CO-Optimal Transport, that aims to simultaneously optimize two transport maps between both samples and features. This is different from other approaches that either discard the individual features by focussing on pairwise distances (e.g. Gromov-Wasserstein) or need to model explicitly the relations between the features. COOT leads to interpretable correspondences between both samples and feature representations and holds metric properties. We provide a thorough theoretical analysis of our framework and establish rich connections with the Gromov-Wasserstein distance. We demonstrate its versatility with two machine learning applications in heterogeneous domain adaptation and co-clustering/data summarization, where COOT leads to performance improvements over the competing state-of-the-art methods.
Tasks	Data Summarization, Domain Adaptation
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03731v2
PDF	https://arxiv.org/pdf/2002.03731v2.pdf
PWC	https://paperswithcode.com/paper/co-optimal-transport
Repo
Framework
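
As a hedged sketch of what "simultaneously optimizing two transport maps between both samples and features" can look like, assume data matrices $X \in \mathbb{R}^{n \times d}$ and $X' \in \mathbb{R}^{n' \times d'}$, sample weights $w, w'$, feature weights $v, v'$, and an elementwise loss $L$. All of this notation is an assumption made for illustration; the abstract itself does not state the objective.

```latex
\min_{\pi^{s} \in \Pi(w, w'),\; \pi^{v} \in \Pi(v, v')}
\;\sum_{i=1}^{n} \sum_{j=1}^{n'} \sum_{k=1}^{d} \sum_{l=1}^{d'}
L\!\left(X_{ik},\, X'_{jl}\right)\, \pi^{s}_{ij}\, \pi^{v}_{kl}
```

Here $\pi^{s}$ couples samples and $\pi^{v}$ couples features, and $\Pi(a, b)$ denotes the set of couplings with marginals $a$ and $b$. Because the two spaces are compared only through the entries of $X$ and $X'$, no cost function between samples of the two distributions is needed.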

A Probabilistic Framework for Imitating Human Race Driver Behavior


Title	A Probabilistic Framework for Imitating Human Race Driver Behavior
Authors	Stefan Löckel, Jan Peters, Peter van Vliet
Abstract	Understanding and modeling human driver behavior is crucial for advanced vehicle development. However, unique driving styles, inconsistent behavior, and complex decision processes render it a challenging task, and existing approaches often lack variability or robustness. To approach this problem, we propose Probabilistic Modeling of Driver behavior (ProMoD), a modular framework which splits the task of driver behavior modeling into multiple modules. A global target trajectory distribution is learned with Probabilistic Movement Primitives, clothoids are utilized for local path generation, and the corresponding choice of actions is performed by a neural network. Experiments in a simulated car racing setting show considerable advantages in imitation accuracy and robustness compared to other imitation learning algorithms. The modular architecture of the proposed framework facilitates straightforward extensibility in driving line adaptation and sequencing of multiple movement primitives for future research.
Tasks	Car Racing, Imitation Learning
Published	2020-01-22
URL	https://arxiv.org/abs/2001.08255v2
PDF	https://arxiv.org/pdf/2001.08255v2.pdf
PWC	https://paperswithcode.com/paper/a-probabilistic-framework-for-imitating-human
Repo
Framework

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping


Title	Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Authors	Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith
Abstract	Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds. We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials. Further, we examine two factors influenced by the choice of random seed: weight initialization and training data order. We find that both contribute comparably to the variance of out-of-sample performance, and that some weight initializations perform well across all tasks explored. On small datasets, we observe that many fine-tuning trials diverge part of the way through training, and we offer best practices for practitioners to stop training less promising runs early. We publicly release all of our experimental data, including training and validation scores for 2,100 trials, to encourage further analysis of training dynamics during fine-tuning.
Tasks
Published	2020-02-15
URL	https://arxiv.org/abs/2002.06305v1
PDF	https://arxiv.org/pdf/2002.06305v1.pdf
PWC	https://paperswithcode.com/paper/fine-tuning-pretrained-language-models-weight
Repo
Framework
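
The abstract's phrase "how the performance of the best-found model varies as a function of the number of fine-tuning trials" can be illustrated with a small sketch. It assumes the trials are treated as draws with replacement from the empirical distribution of observed validation scores; the function name and the scores below are hypothetical and are not taken from the paper's released data.

```python
def expected_best(scores, n):
    """Expected maximum of n draws (with replacement) from the
    empirical distribution of `scores`."""
    if not scores or n < 1:
        raise ValueError("need at least one score and n >= 1")
    ordered = sorted(scores)
    total = len(ordered)
    expected = 0.0
    for rank, value in enumerate(ordered, start=1):
        # Probability that the maximum of n draws is exactly the
        # rank-th smallest observed value.
        p = (rank / total) ** n - ((rank - 1) / total) ** n
        expected += value * p
    return expected


# Hypothetical validation accuracies from five fine-tuning seeds.
val_scores = [0.62, 0.71, 0.68, 0.55, 0.74]
for budget in (1, 2, 5, 20):
    print(budget, round(expected_best(val_scores, budget), 4))
```

With a budget of one trial the estimate equals the mean score, and it approaches the best observed score as the budget grows.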

Experimental Studies in General Game Playing: An Experience Report


Title	Experimental Studies in General Game Playing: An Experience Report
Authors	Jakub Kowalski, Marek Szykuła
Abstract	We describe nearly fifteen years of General Game Playing experimental research history in the context of reproducibility and fairness of comparisons between various GGP agents and systems designed to play games described by different formalisms. We think our survey may provide an interesting perspective on how chaotic methods were allowed when nothing better was possible. Finally, from our experience-based view, we would like to propose a few recommendations on how such a specific heterogeneous branch of research should be handled appropriately in the future. The goal of this note is to point out common difficulties and problems in the experimental research in the area. We hope that our recommendations will help in avoiding them in future works and allow fairer and more reproducible comparisons.
Tasks
Published	2020-03-06
URL	https://arxiv.org/abs/2003.03410v1
PDF	https://arxiv.org/pdf/2003.03410v1.pdf
PWC	https://paperswithcode.com/paper/experimental-studies-in-general-game-playing
Repo
Framework

Channel Equilibrium Networks for Learning Deep Representation


Title	Channel Equilibrium Networks for Learning Deep Representation
Authors	Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
Abstract	Convolutional Neural Networks (CNNs) are typically constructed by stacking multiple building blocks, each of which contains a normalization layer such as batch normalization (BN) and a rectified linear function such as ReLU. However, this work shows that the combination of normalization and rectified linear function leads to inhibited channels, which have small magnitude and contribute little to the learned feature representation, impeding the generalization ability of CNNs. Unlike prior work that simply removed the inhibited channels, we propose to “wake them up” during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation. We show that CE is able to prevent inhibited channels both empirically and theoretically. CE has several appealing benefits. (1) It can be integrated into many advanced CNN architectures such as ResNet and MobileNet, outperforming their original networks. (2) CE has an interesting connection with the Nash Equilibrium, a well-known solution of a non-cooperative game. (3) Extensive experiments show that CE achieves state-of-the-art performance on various challenging benchmarks such as ImageNet and COCO.
Tasks
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00214v1
PDF	https://arxiv.org/pdf/2003.00214v1.pdf
PWC	https://paperswithcode.com/paper/channel-equilibrium-networks-for-learning
Repo
Framework

Logarithmic Regret for Adversarial Online Control


Title	Logarithmic Regret for Adversarial Online Control
Authors	Dylan J. Foster, Max Simchowitz
Abstract	We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances. Existing regret bounds for this setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on the disturbance process. We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are given by known quadratic functions. Our algorithm and analysis use a characterization for the optimal offline control law to reduce the online control problem to (delayed) online learning with approximate advantage functions. Compared to previous techniques, our approach does not need to control movement costs for the iterates, leading to logarithmic regret.
Tasks
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00189v2
PDF	https://arxiv.org/pdf/2003.00189v2.pdf
PWC	https://paperswithcode.com/paper/logarithmic-regret-for-adversarial-online
Repo
Framework
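
For readers unfamiliar with the setting, the regret notion in the abstract can be written out as a sketch. The symbols $A$, $B$, $Q$, $R$, $w_t$ and the comparator class $\Pi$ are standard linear-quadratic notation assumed here for illustration; the abstract does not fix them.

```latex
x_{t+1} = A x_t + B u_t + w_t, \qquad
c(x, u) = x^{\top} Q x + u^{\top} R u,

\mathrm{Regret}_T
= \sum_{t=1}^{T} c(x_t, u_t)
\;-\; \inf_{\pi \in \Pi} \sum_{t=1}^{T} c\!\left(x_t^{\pi}, u_t^{\pi}\right)
```

The known system corresponds to known $A$ and $B$, the adversarial disturbances to an arbitrary sequence $w_t$, and the known quadratic costs to fixed $Q$ and $R$. The abstract contrasts earlier bounds that scale as $\sqrt{T}$ with a bound that is logarithmic in $T$.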

Nested Barycentric Coordinate System as an Explicit Feature Map


Title	Nested Barycentric Coordinate System as an Explicit Feature Map
Authors	Lee-Ad Gottlieb, Eran Kaufman, Aryeh Kontorovich, Gabriel Nivasch, Ofir Pele
Abstract	We propose a new embedding method which is particularly well-suited for settings where the sample size greatly exceeds the ambient dimension. Our technique consists of partitioning the space into simplices and then embedding the data points into features corresponding to the simplices’ barycentric coordinates. We then train a linear classifier in the rich feature space obtained from the simplices. The decision boundary may be highly non-linear, though it is linear within each simplex (and hence piecewise-linear overall). Further, our method can approximate any convex body. We give generalization bounds based on empirical margin and a novel hybrid sample compression technique. An extensive empirical evaluation shows that our method consistently outperforms a range of popular kernel embedding methods.
Tasks
Published	2020-02-05
URL	https://arxiv.org/abs/2002.01999v1
PDF	https://arxiv.org/pdf/2002.01999v1.pdf
PWC	https://paperswithcode.com/paper/nested-barycentric-coordinate-system-as-an
Repo
Framework
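
The core operation in the abstract, expressing a point through the barycentric coordinates of the simplex that contains it, can be sketched for the simplest case of a triangle in the plane. This is a minimal illustration under that assumption, not the authors' nested construction; the function names, the triangle, and the per-vertex weights are hypothetical.

```python
def barycentric_coords(point, triangle):
    """Barycentric coordinates of a 2D point with respect to a triangle
    given as three (x, y) vertices."""
    px, py = point
    (ax, ay), (bx, by), (cx, cy) = triangle
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb  # the three coordinates always sum to 1
    return (wa, wb, wc)


def inside(coords, tol=1e-12):
    """A point lies in the simplex iff all coordinates are non-negative."""
    return all(c >= -tol for c in coords)


def linear_score(coords, vertex_weights):
    """A linear classifier on the embedded features: within one simplex
    this is an affine function of the original point."""
    return sum(c * w for c, w in zip(coords, vertex_weights))


# Hypothetical triangle and per-vertex classifier weights.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coords = barycentric_coords((0.25, 0.25), tri)
print(coords, inside(coords), linear_score(coords, [-1.0, 2.0, 2.0]))
```

Since the score is a weighted sum of barycentric coordinates, it is linear inside each simplex, which is why the overall decision boundary described in the abstract is piecewise-linear.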

Energy-Based Processes for Exchangeable Data


Title	Energy-Based Processes for Exchangeable Data
Authors	Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans
Abstract	Recently there has been growing interest in modeling sets with exchangeability such as point clouds. A shortcoming of current approaches is that they restrict the cardinality of the sets considered or can only express limited forms of distribution over unobserved data. To overcome these limitations, we introduce Energy-Based Processes (EBPs), which extend energy based models to exchangeable data while allowing neural network parameterizations of the energy function. A key advantage of these models is the ability to express more flexible distributions over sets without restricting their cardinality. We develop an efficient training procedure for EBPs that demonstrates state-of-the-art performance on a variety of tasks such as point cloud generation, classification, denoising, and image completion.
Tasks	Denoising, Point Cloud Generation
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07521v1
PDF	https://arxiv.org/pdf/2003.07521v1.pdf
PWC	https://paperswithcode.com/paper/energy-based-processes-for-exchangeable-data
Repo
Framework

A Quantitative History of A.I. Research in the United States and China


Title	A Quantitative History of A.I. Research in the United States and China
Authors	Daniel Ish, Andrew Lohn, Christian Curriden
Abstract	Motivated by recent interest in the status and consequences of competition between the U.S. and China in A.I. research, we analyze 60 years of abstract data scraped from Scopus to explore and quantify trends in publications on A.I. topics from institutions affiliated with each country. We find the total volume of publications produced in both countries grows with a remarkable regularity over tens of years. While China initially experienced faster growth in publication volume than the U.S., growth slowed in China when it reached parity with the U.S. and the growth rates of both countries are now similar. We also see both countries undergo a seismic shift in topic choice around 1990, and connect this to an explosion of interest in neural network methods. Finally, we see evidence that between 2000 and 2010, China’s topic choice tended to lag that of the U.S. but that in recent decades the topic portfolios have come into closer alignment.
Tasks
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02763v1
PDF	https://arxiv.org/pdf/2003.02763v1.pdf
PWC	https://paperswithcode.com/paper/a-quantitative-history-of-ai-research-in-the
Repo
Framework

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe


Title	The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Authors	Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette Sandford Pedersen, Inguna Skadiņa, Marko Tadić, Dan Tufiş, Tamás Váradi, Kadri Vider, Andy Way, François Yvon
Abstract	Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13833v1
PDF	https://arxiv.org/pdf/2003.13833v1.pdf
PWC	https://paperswithcode.com/paper/the-european-language-technology-landscape-in
Repo
Framework

ViCE: Visual Counterfactual Explanations for Machine Learning Models


Title	ViCE: Visual Counterfactual Explanations for Machine Learning Models
Authors	Oscar Gomez, Steffen Holter, Jun Yuan, Enrico Bertini
Abstract	The continued improvements in the predictive accuracy of machine learning models have allowed for their widespread practical application. Yet, many decisions made with seemingly accurate models still require verification by domain experts. In addition, end-users of a model also want to understand the reasons behind specific decisions. Thus, the need for interpretability is increasingly paramount. In this paper we present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions. Each sample is assessed to identify the minimal set of changes needed to flip the model’s output. These explanations aim to provide end-users with personalized actionable insights with which to understand, and possibly contest or improve, automated decisions. The results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model. The functionality of the tool is demonstrated by its application to a home equity line of credit dataset.
Tasks
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02428v1
PDF	https://arxiv.org/pdf/2003.02428v1.pdf
PWC	https://paperswithcode.com/paper/vice-visual-counterfactual-explanations-for
Repo
Framework

Parallelization of Monte Carlo Tree Search in Continuous Domains


Title	Parallelization of Monte Carlo Tree Search in Continuous Domains
Authors	Karl Kurzer, Christoph Hörtnagl, J. Marius Zöllner
Abstract	Monte Carlo Tree Search (MCTS) has proven to be capable of solving challenging tasks in domains such as Go, chess and Atari. Previous research has developed parallel versions of MCTS, exploiting today’s multiprocessing architectures. These studies focused on versions of MCTS for the discrete case. Our work builds upon existing parallelization strategies and extends them to continuous domains. In particular, leaf parallelization and root parallelization are studied and two final selection strategies that are required to handle continuous states in root parallelization are proposed. The evaluation of the resulting parallelized continuous MCTS is conducted using a challenging cooperative multi-agent system trajectory planning task in the domain of automated vehicles.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13741v1
PDF	https://arxiv.org/pdf/2003.13741v1.pdf
PWC	https://paperswithcode.com/paper/parallelization-of-monte-carlo-tree-search-in
Repo
Framework

Model-based Asynchronous Hyperparameter Optimization


Title	Model-based Asynchronous Hyperparameter Optimization
Authors	Louis C. Tiao, Aaron Klein, Cedric Archambeau, Matthias Seeger
Abstract	We introduce a model-based asynchronous multi-fidelity hyperparameter optimization (HPO) method, combining strengths of asynchronous Hyperband and Gaussian process-based Bayesian optimization. Our method obtains substantial speed-ups in wall-clock time over both synchronous and asynchronous Hyperband, as well as a prior model-based extension of the former. Candidate hyperparameters to evaluate are selected by a novel jointly dependent Gaussian process-based surrogate model over all resource levels, allowing evaluations at one level to be informed by evaluations gathered at all others. We benchmark several covariance functions and conduct extensive experiments on hyperparameter tuning for multi-layer perceptrons on tabular data, convolutional networks on image classification, and recurrent networks on language modelling, demonstrating the benefits of our approach.
Tasks	Hyperparameter Optimization, Image Classification, Language Modelling
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10865v1
PDF	https://arxiv.org/pdf/2003.10865v1.pdf
PWC	https://paperswithcode.com/paper/model-based-asynchronous-hyperparameter
Repo
Framework
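
The "resource levels" mentioned in the abstract can be made concrete with a small sketch of successive-halving rungs and a top-1/eta promotion rule of the kind used in asynchronous Hyperband variants. This is an illustration under assumed names and values; it does not model the paper's Gaussian process surrogate, and the config ids and scores are hypothetical.

```python
def rung_levels(r_min, r_max, eta):
    """Geometric ladder of resource levels (e.g. epochs): r_min, r_min*eta, ..."""
    levels = []
    r = r_min
    while r <= r_max:
        levels.append(r)
        r *= eta
    return levels


def promotable(rung_scores, eta):
    """Config ids in the top 1/eta of a rung, best first.
    `rung_scores` maps config id -> validation loss (lower is better)."""
    k = len(rung_scores) // eta
    ranked = sorted(rung_scores, key=rung_scores.get)
    return ranked[:k]


# Hypothetical validation losses recorded so far at the lowest rung.
scores_at_rung0 = {"a": 0.3, "b": 0.1, "c": 0.5, "d": 0.2, "e": 0.4, "f": 0.6}
print(rung_levels(1, 27, 3))           # resource ladder
print(promotable(scores_at_rung0, 3))  # configs eligible for the next rung
```

In an asynchronous scheduler a free worker can apply such a rule to whatever results have arrived so far instead of waiting for a rung to fill; a model-based method would additionally use a surrogate to choose which new configurations enter the lowest rung.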