Paper Group ANR 1714
Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation
Title | Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation |
Authors | Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang |
Abstract | Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, through exchanging local estimates with neighbors. Despite their simplicity and widespread use, our theoretical understanding of such decentralized TD learning algorithms remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an additional projection step to control the 'gradient bias' incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions, which is made possible by means of a novel multi-step Lyapunov analysis. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00934v2 |
https://arxiv.org/pdf/1911.00934v2.pdf | |
PWC | https://paperswithcode.com/paper/finite-sample-analysis-of-decentralized |
Repo | |
Framework | |
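To make the setting concrete, here is a minimal numpy sketch of a synchronous decentralized TD(0) update with linear function approximation: each agent mixes its neighbors' parameter estimates through a doubly stochastic matrix and then applies a local TD correction using its private reward. The mixing matrix, step size, and random features below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def decentralized_td0_step(thetas, W, phi_s, phi_s_next, rewards, gamma=0.99, alpha=0.05):
    """One synchronous decentralized TD(0) update.

    thetas     : (N, d) array, agent i's current weight vector in row i
    W          : (N, N) doubly stochastic mixing matrix of the communication graph
    phi_s      : (d,)  feature vector of the current (shared) state
    phi_s_next : (d,)  feature vector of the next state
    rewards    : (N,)  locally observed private rewards
    """
    mixed = W @ thetas                        # consensus step: average neighbors' estimates
    new_thetas = np.empty_like(thetas)
    for i in range(thetas.shape[0]):
        td_error = rewards[i] + gamma * thetas[i] @ phi_s_next - thetas[i] @ phi_s
        new_thetas[i] = mixed[i] + alpha * td_error * phi_s   # local TD(0) correction
    return new_thetas

# Toy usage: 4 agents on a ring graph with self-loops, random features.
rng = np.random.default_rng(0)
N, d = 4, 8
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i + 1) % N] = 0.25
    W[i, (i - 1) % N] = 0.25
thetas = np.zeros((N, d))
thetas = decentralized_td0_step(thetas, W, rng.normal(size=d), rng.normal(size=d), rng.normal(size=N))
```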
Fair Distributions from Biased Samples: A Maximum Entropy Optimization Framework
Title | Fair Distributions from Biased Samples: A Maximum Entropy Optimization Framework |
Authors | L. Elisa Celis, Vijay Keswani, Ozan Yildiz, Nisheeth K. Vishnoi |
Abstract | One reason for the emergence of bias in AI systems is biased data – datasets that may not be true representations of the underlying distributions – and may over- or under-represent groups with respect to protected attributes such as gender or race. We consider the problem of correcting such biases and learning distributions that are “fair”, with respect to measures such as proportional representation and statistical parity, from the given samples. Our approach is based on a novel formulation of the problem of learning a fair distribution as a maximum entropy optimization problem with a given expectation vector and a prior distribution. Technically, our main contributions are: (1) a new second-order method to compute the (dual of the) maximum entropy distribution over an exponentially-sized discrete domain that turns out to be faster than previous methods, and (2) methods to construct prior distributions and expectation vectors that provably guarantee that the learned distributions satisfy a wide class of fairness criteria. Our results also come with quantitative bounds on the total variation distance between the empirical distribution obtained from the samples and the learned fair distribution. Our experimental results include testing our approach on the COMPAS dataset and showing that the fair distributions not only improve disparate impact values but when used to train classifiers only incur a small loss of accuracy. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02164v1 |
https://arxiv.org/pdf/1906.02164v1.pdf | |
PWC | https://paperswithcode.com/paper/fair-distributions-from-biased-samples-a |
Repo | |
Framework | |
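The core object in the paper is a maximum entropy distribution with a given prior and expectation vector. The sketch below solves the standard dual of that problem over a small, explicitly enumerated domain with a generic quasi-Newton solver; the paper's contribution is a faster second-order method and fairness-guaranteeing choices of prior and expectation vector, neither of which is reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def max_entropy_fit(phi, prior, c):
    """Fit p(x) proportional to prior(x) * exp(lambda . phi(x)) so that E_p[phi] = c.

    phi   : (n, k) feature vectors of the n domain elements
    prior : (n,)   prior distribution (positive, sums to 1)
    c     : (k,)   target expectation vector
    Returns the fitted distribution p over the n elements.
    """
    def dual(lam):
        logits = np.log(prior) + phi @ lam
        logZ = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
        p = np.exp(logits - logZ)
        # Dual objective log Z(lambda) - lambda . c and its gradient E_p[phi] - c.
        return logZ - lam @ c, phi.T @ p - c

    res = minimize(dual, x0=np.zeros(phi.shape[1]), jac=True, method="L-BFGS-B")
    logits = np.log(prior) + phi @ res.x
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Toy usage: the domain {0,1}^3 with a uniform prior, matching a marginal of 0.7 per bit.
n_bits = 3
domain = np.array([[(i >> b) & 1 for b in range(n_bits)] for i in range(2 ** n_bits)], float)
p = max_entropy_fit(domain, np.full(2 ** n_bits, 1 / 2 ** n_bits), np.array([0.7, 0.7, 0.7]))
print(domain.T @ p)   # approximately [0.7, 0.7, 0.7]
```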
A New Framework for Multi-Agent Reinforcement Learning – Centralized Training and Exploration with Decentralized Execution via Policy Distillation
Title | A New Framework for Multi-Agent Reinforcement Learning – Centralized Training and Exploration with Decentralized Execution via Policy Distillation |
Authors | Gang Chen |
Abstract | Deep reinforcement learning (DRL) is a booming area of artificial intelligence. Many practical applications of DRL naturally involve more than one collaborative learner, making it important to study DRL in a multi-agent context. Previous research showed that effective learning in complex multi-agent systems demands highly coordinated environment exploration among all the participating agents. Many researchers attempted to cope with this challenge through learning centralized value functions. However, the common strategy of having every agent learn its local policy directly often fails to nurture strong inter-agent collaboration and can be sample inefficient whenever agents alter their communication channels. To address these issues, we propose a new framework known as centralized training and exploration with decentralized execution via policy distillation. Guided by this framework and the maximum-entropy learning technique, we first train agents’ policies with a shared global component to foster coordinated and effective learning. Locally executable policies are subsequently derived from the trained global policies via policy distillation. Experiments show that our new framework and algorithm can achieve significantly better performance and higher sample efficiency than a cutting-edge baseline on several multi-agent DRL benchmarks. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09152v1 |
https://arxiv.org/pdf/1910.09152v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-framework-for-multi-agent-reinforcement |
Repo | |
Framework | |
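A minimal sketch of the distillation step described above: a locally executable student policy is trained to match the action distribution of the centrally trained teacher by minimizing a KL divergence over a batch of states. The softmax parameterization, batch interface, and the role assignment of "teacher" and "student" logits are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over a batch of states.

    Both inputs are (batch, n_actions) action logits; the teacher stands in for a
    centrally trained policy with a shared global component, the student for the
    locally executable policy being distilled from it.
    """
    p = softmax(teacher_logits)                        # teacher action distribution
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits) + 1e-12)
    return np.mean(np.sum(p * (log_p - log_q), axis=-1))

# Toy usage: random logits for a 32-state batch with 5 actions.
rng = np.random.default_rng(1)
print(distillation_loss(rng.normal(size=(32, 5)), rng.normal(size=(32, 5))))
```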
Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching
Title | Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching |
Authors | Ming Zhou, Jiarui Jin, Weinan Zhang, Zhiwei Qin, Yan Jiao, Chenxi Wang, Guobin Wu, Yong Yu, Jieping Ye |
Abstract | Improving the efficiency of dispatching orders to vehicles is a research hotspot in online ride-hailing systems. Most existing solutions for order-dispatching are centralized, requiring all possible matches between available orders and vehicles to be considered. For large-scale ride-sharing platforms, thousands of vehicles and orders must be matched every second, which incurs a very high computational cost. In this paper, we propose a decentralized-execution order-dispatching method based on multi-agent reinforcement learning to address the large-scale order-dispatching problem. Different from previous cooperative multi-agent reinforcement learning algorithms, in our method all agents work independently, guided by an evaluation of the joint policy, without any need for communication or explicit cooperation between agents. Furthermore, we use KL-divergence optimization at each time step to speed up the learning process and to balance the vehicles (supply) and orders (demand). Experiments on both an explanatory environment and a real-world simulator show that the proposed method outperforms the baselines in terms of accumulated driver income (ADI) and order response rate (ORR) in various traffic environments. Besides, with the support of the online platform of Didi Chuxing, we designed a hybrid system to deploy our model. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02591v1 |
https://arxiv.org/pdf/1910.02591v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-agent-reinforcement-learning-for-order |
Repo | |
Framework | |
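A small sketch of the supply-demand matching quantity mentioned in the abstract: the KL divergence between the normalized order (demand) and vehicle (supply) distributions over grid cells. How this term enters the learning objective at each time step follows the paper; the grid layout, KL direction, and smoothing constant here are assumptions.

```python
import numpy as np

def supply_demand_kl(vehicle_counts, order_counts, eps=1e-8):
    """KL divergence between the normalized order (demand) and vehicle (supply)
    distributions over grid cells; a small value means supply matches demand.

    vehicle_counts, order_counts : (n_grids,) non-negative arrays
    """
    supply = (vehicle_counts + eps) / (vehicle_counts + eps).sum()
    demand = (order_counts + eps) / (order_counts + eps).sum()
    return float(np.sum(demand * np.log(demand / supply)))

# Toy usage: 6 grid cells with supply skewed away from a uniform demand.
print(supply_demand_kl(np.array([10, 2, 1, 0, 0, 3]), np.array([3, 3, 3, 3, 3, 3])))
```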
Improved Covariance Matrix Estimator using Shrinkage Transformation and Random Matrix Theory
Title | Improved Covariance Matrix Estimator using Shrinkage Transformation and Random Matrix Theory |
Authors | Samruddhi Deshmukh, Amartansh Dubey |
Abstract | One of the major challenges in multivariate analysis is the estimation of the population covariance matrix from the sample covariance matrix (SCM). Most recent covariance matrix estimators use either shrinkage transformations or asymptotic results from Random Matrix Theory (RMT). Shrinkage techniques help in pulling extreme correlation values towards certain target values, whereas tools from RMT help in removing noisy eigenvalues of the SCM. Both of these techniques use different approaches to achieve a similar goal, which is to remove noisy correlations and add structure to the SCM to overcome the bias-variance trade-off. In this paper, we first critically evaluate the pros and cons of these two techniques and then propose an improved estimator which exploits the advantages of both by taking an optimally weighted convex combination of covariance matrices estimated by an improved shrinkage transformation and an RMT-based filter. It is a generalized estimator which can adapt to changing sampling noise conditions in various datasets by performing hyperparameter optimization. We show the effectiveness of this estimator on the problem of designing a financial portfolio with minimum risk. We have chosen this problem because the complex properties of stock market data provide extreme conditions to test the robustness of a covariance estimator. Using data from four of the world’s largest stock exchanges, we show that our proposed estimator outperforms existing estimators in minimizing the out-of-sample risk of the portfolio and hence predicts population statistics more precisely. Since covariance analysis is a crucial statistical tool, this estimator can be used in a wide range of machine learning, signal processing and high dimensional pattern recognition applications. |
Tasks | Hyperparameter Optimization |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03718v1 |
https://arxiv.org/pdf/1912.03718v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-covariance-matrix-estimator-using |
Repo | |
Framework | |
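A rough numpy sketch of the kind of estimator described above: a shrinkage estimate toward a scaled identity, an RMT-style estimate that replaces eigenvalues below a Marchenko-Pastur edge by their average, and a convex combination of the two. The shrinkage target, the crude noise-variance estimate, and the fixed weights alpha and delta are placeholder assumptions; the paper uses improved variants of both components and tunes the weights by hyperparameter optimization.

```python
import numpy as np

def shrinkage_estimate(X, delta=0.5):
    """Shrink the sample covariance toward a scaled-identity target."""
    S = np.cov(X, rowvar=False)
    target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])
    return (1 - delta) * S + delta * target

def rmt_filtered_estimate(X):
    """Treat eigenvalues below a rough Marchenko-Pastur upper edge as noise
    and replace them by their average before reconstructing the matrix."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam_plus = (1 + np.sqrt(p / n)) ** 2 * np.mean(np.diag(S))  # crude noise-variance proxy
    w, V = np.linalg.eigh(S)
    noise = w < lam_plus
    if noise.any():
        w[noise] = w[noise].mean()
    return (V * w) @ V.T

def combined_estimate(X, alpha=0.5, delta=0.5):
    """Convex combination of the two estimators; alpha and delta stand in for
    hyperparameters that would be optimized per dataset."""
    return alpha * shrinkage_estimate(X, delta) + (1 - alpha) * rmt_filtered_estimate(X)

# Toy usage: 100 observations of a 30-dimensional series.
rng = np.random.default_rng(2)
Sigma_hat = combined_estimate(rng.normal(size=(100, 30)))
```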
Attention-based Fault-tolerant Approach for Multi-agent Reinforcement Learning Systems
Title | Attention-based Fault-tolerant Approach for Multi-agent Reinforcement Learning Systems |
Authors | Mingyang Geng, Kele Xu, Yiying Li, Shuqi Liu, Bo Ding, Huaimin Wang |
Abstract | The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. In many real-world applications, the agents can only acquire a partial view of the world. However, in realistic settings, one or more agents that show arbitrarily faulty or malicious behavior may suffice to make the current coordination mechanisms fail. In this paper, we study a practical scenario considering the security issues in the presence of agents with arbitrarily faulty or malicious behavior. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case that an agent’s policy can be made conditional upon all other agents’ observations. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) algorithm which selects correct and relevant information for each agent at every time step. The multi-head attention mechanism enables the agents to learn effective communication policies through experience concurrently with the action policies. Empirical results have shown that FT-Attn beats previous state-of-the-art methods in some complex environments and can adapt to various kinds of noisy environments without tuning the complexity of the algorithm. Furthermore, FT-Attn can effectively deal with the complex situation where an agent needs to reach multiple agents’ correct observations at the same time. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02240v1 |
https://arxiv.org/pdf/1910.02240v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-fault-tolerant-approach-for |
Repo | |
Framework | |
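A single-head, numpy-only sketch of the attention idea the abstract alludes to: each agent forms a query from its own observation and attends over all agents' messages, so unreliable messages can receive low weights. The projection matrices are random here rather than learned, and the multi-head structure and training procedure of FT-Attn are not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_agents(query, messages, Wq, Wk, Wv):
    """Scaled dot-product attention of one agent's observation over all agents' messages.

    query    : (d,)    the attending agent's encoded observation
    messages : (N, d)  encoded observations broadcast by the N agents
    Wq, Wk, Wv : (h, d) projection matrices (learned in practice, random here)
    """
    q = Wq @ query                   # (h,)
    K = messages @ Wk.T              # (N, h)
    V = messages @ Wv.T              # (N, h)
    scores = K @ q / np.sqrt(q.shape[0])
    weights = softmax(scores)        # low weight ~ message judged irrelevant or faulty
    return weights @ V, weights

# Toy usage: 5 agents, 16-dim observations, 8-dim attention space.
rng = np.random.default_rng(3)
d, h, N = 16, 8, 5
out, w = attend_over_agents(rng.normal(size=d), rng.normal(size=(N, d)),
                            rng.normal(size=(h, d)), rng.normal(size=(h, d)), rng.normal(size=(h, d)))
print(w)   # attention weights over the 5 agents' messages
```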
Fine-Grained Sentence Functions for Short-Text Conversation
Title | Fine-Grained Sentence Functions for Short-Text Conversation |
Authors | Wei Bi, Jun Gao, Xiaojiang Liu, Shuming Shi |
Abstract | Sentence function is an important linguistic feature referring to a user’s purpose in uttering a specific sentence. The use of sentence function has shown promising results to improve the performance of conversation models. However, there is no large conversation dataset annotated with sentence functions. In this work, we collect a new Short-Text Conversation dataset with manually annotated SEntence FUNctions (STC-Sefun). Classification models are trained on this dataset to (i) recognize the sentence function of new data in a large corpus of short-text conversations; (ii) estimate a proper sentence function of the response given a test query. We later train conversation models conditioned on the sentence functions, including information retrieval-based and neural generative models. Experimental results demonstrate that the use of sentence functions can help improve the quality of the returned responses. |
Tasks | Information Retrieval, Short-Text Conversation |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10302v3 |
https://arxiv.org/pdf/1907.10302v3.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-sentence-functions-for-short |
Repo | |
Framework | |
An Efficient Pre-processing Method to Eliminate Adversarial Effects
Title | An Efficient Pre-processing Method to Eliminate Adversarial Effects |
Authors | Hua Wang, Jie Wang, Zhaoxia Yin |
Abstract | Deep Neural Networks (DNNs) are vulnerable to adversarial examples generated by imposing subtle perturbations on inputs that lead a model to predict incorrect outputs. Currently, much of the research on defending against adversarial examples pays little attention to real-world applications, suffering either from high computational complexity or poor defensive effect. Motivated by this observation, we develop an efficient preprocessing method to defend against adversarial images. Specifically, before an adversarial example is fed into the model, we perform two image transformations: WebP compression, which is utilized to remove small adversarial noise, and a flip operation, which flips the image once along one side to destroy the specific structure of the adversarial perturbations. Finally, a de-perturbed sample is obtained and can be correctly classified by DNNs. Experimental results on ImageNet show that our method outperforms state-of-the-art defense methods. It can effectively defend against adversarial attacks while ensuring only a very small accuracy drop on normal images. |
Tasks | Image Classification, Speech Recognition |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.08614v2 |
https://arxiv.org/pdf/1905.08614v2.pdf | |
PWC | https://paperswithcode.com/paper/190508614 |
Repo | |
Framework | |
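A minimal Pillow sketch of the two transformations named in the abstract, applied in sequence before classification: lossy WebP compression followed by a single flip along one side of the image. The quality setting and the choice of a horizontal flip are assumptions, and the classifier call in the usage comment is hypothetical.

```python
import io
from PIL import Image, ImageOps

def deperturb(image, webp_quality=50):
    """Pre-process a (possibly adversarial) RGB image before classification:
    lossy WebP compression to wash out small perturbations, then a single
    horizontal flip to break the spatial structure of the perturbation.
    """
    buf = io.BytesIO()
    image.save(buf, format="WEBP", quality=webp_quality)   # lossy round-trip through WebP
    buf.seek(0)
    compressed = Image.open(buf).convert("RGB")
    return ImageOps.mirror(compressed)                      # flip once along one side

# Toy usage (any RGB image file; the classifier below is a hypothetical placeholder):
# cleaned = deperturb(Image.open("example.jpg").convert("RGB"))
# prediction = model(preprocess(cleaned))
```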
Deep Optics for Single-shot High-dynamic-range Imaging
Title | Deep Optics for Single-shot High-dynamic-range Imaging |
Authors | Christopher A. Metzler, Hayato Ikoma, Yifan Peng, Gordon Wetzstein |
Abstract | High-dynamic-range (HDR) imaging is crucial for many computer graphics and vision applications. Yet, acquiring HDR images with a single shot remains a challenging problem. Whereas modern deep learning approaches are successful at hallucinating plausible HDR content from a single low-dynamic-range (LDR) image, saturated scene details often cannot be faithfully recovered. Inspired by recent deep optical imaging approaches, we interpret this problem as jointly training an optical encoder and electronic decoder where the encoder is parameterized by the point spread function (PSF) of the lens, the bottleneck is the sensor with a limited dynamic range, and the decoder is a convolutional neural network (CNN). The lens surface is then jointly optimized with the CNN in a training phase; we fabricate this optimized optical element and attach it as a hardware add-on to a conventional camera during inference. In extensive simulations and with a physical prototype, we demonstrate that this end-to-end deep optical imaging approach to single-shot HDR imaging outperforms both purely CNN-based approaches and other PSF engineering approaches. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00620v1 |
https://arxiv.org/pdf/1908.00620v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-optics-for-single-shot-high-dynamic |
Repo | |
Framework | |
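A toy PyTorch sketch of the jointly trained encoder-decoder idea: a learnable PSF convolves the HDR scene, the sensor clamps the result to a limited dynamic range, and a small CNN tries to recover the HDR content. The PSF parameterization, decoder architecture, data, and loss are placeholders, not the paper's differentiable optics model or fabrication pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepOpticsHDR(nn.Module):
    """Optical encoder (learnable PSF) -> saturating sensor -> CNN decoder."""

    def __init__(self, psf_size=11):
        super().__init__()
        self.psf_logits = nn.Parameter(torch.zeros(1, 1, psf_size, psf_size))
        self.decoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, hdr):                           # hdr: (B, 1, H, W), values may exceed 1
        # Non-negative PSF that sums to one, via a softmax over its entries.
        psf = torch.softmax(self.psf_logits.flatten(), 0).view_as(self.psf_logits)
        blurred = F.conv2d(hdr, psf, padding=self.psf_logits.shape[-1] // 2)
        ldr = torch.clamp(blurred, 0.0, 1.0)          # sensor saturation: limited dynamic range
        return self.decoder(ldr)

# One joint training step on synthetic 'scenes' with saturated regions.
model = DeepOpticsHDR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
hdr = torch.rand(4, 1, 64, 64) * 4.0
opt.zero_grad()
loss = F.mse_loss(model(hdr), hdr)
loss.backward()
opt.step()
```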
Learning-Based Low-Rank Approximations
Title | Learning-Based Low-Rank Approximations |
Authors | Piotr Indyk, Ali Vakilian, Yang Yuan |
Abstract | We introduce a “learning-based” algorithm for the low-rank decomposition problem: given an $n \times d$ matrix $A$ and a parameter $k$, compute a rank-$k$ matrix $A'$ that minimizes the approximation loss $\|A - A'\|_F$. The algorithm uses a training set of input matrices in order to optimize its performance. Specifically, some of the most efficient approximate algorithms for computing low-rank approximations proceed by computing a projection $SA$, where $S$ is a sparse random $m \times n$ “sketching matrix”, and then performing the singular value decomposition of $SA$. We show how to replace the random matrix $S$ with a “learned” matrix of the same sparsity to reduce the error. Our experiments show that, for multiple types of data sets, a learned sketch matrix can substantially reduce the approximation loss compared to a random matrix $S$, sometimes by one order of magnitude. We also study mixed matrices where only some of the rows are trained and the remaining ones are random, and show that such matrices still offer improved performance while retaining worst-case guarantees. Finally, to understand the theoretical aspects of our approach, we study the special case of $m=1$. In particular, we give an approximation algorithm for minimizing the empirical loss, with approximation factor depending on the stable rank of matrices in the training set. We also show generalization bounds for the sketch matrix learning problem. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13984v1 |
https://arxiv.org/pdf/1910.13984v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-low-rank-approximations |
Repo | |
Framework | |
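A numpy sketch of the sketch-and-solve pipeline that the learned matrix $S$ plugs into: compute $SA$, restrict $A$ to the row space of $SA$, and take the best rank-$k$ approximation within that subspace. The random Gaussian $S$ in the usage example stands in for either a random or a learned sketching matrix; the procedure for learning $S$ is not shown.

```python
import numpy as np

def sketch_low_rank(A, S, k):
    """Rank-k approximation of A restricted to the row space of S @ A.

    A : (n, d) input matrix
    S : (m, n) sketching matrix (random or learned)
    k : target rank, k <= m
    """
    SA = S @ A                                  # (m, d) sketch
    _, _, Vt = np.linalg.svd(SA, full_matrices=False)
    V = Vt.T                                    # (d, m) orthonormal basis of rowspace(SA)
    B = A @ V                                   # project A onto that row space
    Ub, sb, Wt = np.linalg.svd(B, full_matrices=False)
    Bk = (Ub[:, :k] * sb[:k]) @ Wt[:k]          # best rank-k approximation of B
    return Bk @ V.T                             # lift back: rank-k approximation of A

# Toy usage: an approximately rank-5 matrix, sketched down to 10 rows.
rng = np.random.default_rng(4)
n, d, m, k = 200, 50, 10, 5
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
S = rng.normal(size=(m, n))
Ak = sketch_low_rank(A, S, k)
print(np.linalg.norm(A - Ak, "fro"))
```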
A Variant of Gaussian Process Dynamical Systems
Title | A Variant of Gaussian Process Dynamical Systems |
Authors | Jing Zhao, Jingjing Fei, Shiliang Sun |
Abstract | In order to better model high-dimensional sequential data, we propose a collaborative multi-output Gaussian process dynamical system (CGPDS), which is a novel variant of GPDSs. The proposed model assumes that the output on each dimension is controlled by a shared global latent process and a private local latent process. Thus, the dependence among different dimensions of the sequences can be captured, and the unique characteristics of each dimension of the sequences can be maintained. For training the model and making predictions, we introduce inducing points and adopt stochastic variational inference methods. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03647v1 |
https://arxiv.org/pdf/1906.03647v1.pdf | |
PWC | https://paperswithcode.com/paper/a-variant-of-gaussian-process-dynamical |
Repo | |
Framework | |
Multi-Frame Cross-Entropy Training for Convolutional Neural Networks in Speech Recognition
Title | Multi-Frame Cross-Entropy Training for Convolutional Neural Networks in Speech Recognition |
Authors | Tom Sercu, Neil Mallinar |
Abstract | We introduce Multi-Frame Cross-Entropy training (MFCE) for convolutional neural network acoustic models. Recognizing that, like RNNs, CNNs are by nature sequence models that take variable-length inputs, we propose to feed the CNN a portion of an utterance long enough that multiple labels are predicted at once, thereby obtaining cross-entropy loss signal from multiple adjacent frames. This drastically increases the amount of label information for a small marginal computational cost. We show large WER improvements on hub5 and rt02 after training on the 2000-hour Switchboard benchmark. |
Tasks | Speech Recognition |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.13121v1 |
https://arxiv.org/pdf/1907.13121v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-frame-cross-entropy-training-for |
Repo | |
Framework | |
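A toy PyTorch sketch of the multi-frame idea: a convolutional acoustic model reads a window of frames and emits logits for several adjacent output positions, so a single forward pass provides cross-entropy signal from multiple labels. The layer sizes, number of output frames, and label space are illustrative, not the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFrameCNN(nn.Module):
    """CNN over a window of acoustic frames that predicts several adjacent labels."""

    def __init__(self, n_feats=40, n_states=100, n_out_frames=4):
        super().__init__()
        self.n_out_frames = n_out_frames
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5), nn.ReLU(),
        )
        self.out = nn.Conv1d(64, n_states, kernel_size=1)

    def forward(self, feats):                      # feats: (B, n_feats, T)
        h = self.conv(feats)                       # two valid convs shrink T by 8
        logits = self.out(h)                       # (B, n_states, T - 8)
        return logits[:, :, -self.n_out_frames:]   # keep the last n_out_frames positions

# Toy usage: a 20-frame window yields 4 adjacent frame predictions per forward pass.
model = MultiFrameCNN()
feats = torch.randn(8, 40, 20)
labels = torch.randint(0, 100, (8, 4))             # one target label per predicted frame
loss = F.cross_entropy(model(feats), labels)       # cross-entropy over the 4 adjacent frames
loss.backward()
```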
Prototype Rectification for Few-Shot Learning
Title | Prototype Rectification for Few-Shot Learning |
Authors | Jinlu Liu, Liang Song, Yongqiang Qin |
Abstract | Few-shot learning requires recognizing novel classes from scarce labeled data. Prototypical networks are widely used in existing research; however, training on the narrow distribution of scarce data usually yields biased prototypes. In this paper, we identify two key influencing factors of this process: the intra-class bias and the cross-class bias. We then propose a simple yet effective approach for prototype rectification in the transductive setting. The approach utilizes label propagation to diminish the intra-class bias and feature shifting to diminish the cross-class bias. We also conduct a theoretical analysis to establish its rationale as well as a lower bound on the performance. Effectiveness is shown on two few-shot benchmarks. Notably, our approach achieves state-of-the-art performance on both miniImageNet (70.31% on 1-shot and 81.89% on 5-shot) and tieredImageNet (78.74% on 1-shot and 86.92% on 5-shot). |
Tasks | Few-Shot Image Classification, Few-Shot Learning |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10713v2 |
https://arxiv.org/pdf/1911.10713v2.pdf | |
PWC | https://paperswithcode.com/paper/prototype-rectification-for-few-shot-learning |
Repo | |
Framework | |
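A simplified numpy sketch of transductive prototype rectification in the spirit of the abstract: query features are shifted toward the support set to reduce the cross-class bias, and each prototype is then augmented with its most confident pseudo-labeled queries to reduce the intra-class bias. The cosine-similarity pseudo-labeling and the hard top-z selection are simplifications of the paper's label-propagation weighting.

```python
import numpy as np

def rectify_prototypes(support, support_labels, query, n_way, top_z=3):
    """Toy prototype rectification for an n_way episode.

    support : (n_support, d) labeled features, support_labels in {0, ..., n_way-1}
    query   : (n_query, d)   unlabeled features used transductively
    """
    # Cross-class bias: shift query features by the difference of the set means.
    query = query + (support.mean(0) - query.mean(0))

    def normalize(X):
        return X / np.linalg.norm(X, axis=-1, keepdims=True)

    protos = np.stack([support[support_labels == c].mean(0) for c in range(n_way)])
    sims = normalize(query) @ normalize(protos).T           # (n_query, n_way)
    pseudo = sims.argmax(1)

    # Intra-class bias: fold the top-z most confident queries into each prototype.
    new_protos = []
    for c in range(n_way):
        members = [support[support_labels == c]]
        idx = np.where(pseudo == c)[0]
        if len(idx):
            best = idx[np.argsort(-sims[idx, c])[:top_z]]
            members.append(query[best])
        new_protos.append(np.concatenate(members).mean(0))
    return np.stack(new_protos)

# Toy usage: 5-way 1-shot episode with 8-dimensional features and 30 queries.
rng = np.random.default_rng(5)
protos = rectify_prototypes(rng.normal(size=(5, 8)), np.arange(5), rng.normal(size=(30, 8)), n_way=5)
```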
Everything old is new again: A multi-view learning approach to learning using privileged information and distillation
Title | Everything old is new again: A multi-view learning approach to learning using privileged information and distillation |
Authors | Weiran Wang |
Abstract | We adopt a multi-view approach for analyzing two knowledge transfer settings—learning using privileged information (LUPI) and distillation—in a common framework. Under reasonable assumptions about the complexities of hypothesis spaces, and being optimistic about the expected loss achievable by the student (in distillation) and a transformed teacher predictor (in LUPI), we show that encouraging agreement between the teacher and the student leads to reduced search space. As a result, improved convergence rate can be obtained with regularized empirical risk minimization. |
Tasks | MULTI-VIEW LEARNING, Transfer Learning |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03694v1 |
http://arxiv.org/pdf/1903.03694v1.pdf | |
PWC | https://paperswithcode.com/paper/everything-old-is-new-again-a-multi-view |
Repo | |
Framework | |
Learning Modulated Loss for Rotated Object Detection
Title | Learning Modulated Loss for Rotated Object Detection |
Authors | Wen Qian, Xue Yang, Silong Peng, Yue Guo, Junchi Yan |
Abstract | Popular rotated detection methods usually use five parameters (coordinates of the central point, width, height, and rotation angle) to describe the rotated bounding box and an l1 loss as the loss function. In this paper, we argue that this integration can cause training instability and performance degeneration, due to the loss discontinuity resulting from the inherent periodicity of angles and the associated sudden exchange of width and height. This problem is further pronounced given the regression inconsistency among the five parameters with different measurement units. We refer to the above issues as rotation sensitivity error (RSE) and propose a modulated rotation loss to eliminate the loss discontinuity. Our new loss is combined with eight-parameter regression to further address the problem of inconsistent parameter regression. Experiments show the state-of-the-art performance of our method on the public aerial image benchmarks DOTA and UCAS-AOD. Its generalization ability is also verified on ICDAR2015, HRSC2016, and FDDB. Qualitative improvements can be seen in Fig. 1, and the source code will be released with the publication of the paper. |
Tasks | Object Detection |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08299v3 |
https://arxiv.org/pdf/1911.08299v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-modulated-loss-for-rotated-object |
Repo | |
Framework | |
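A toy numpy version of a modulated loss for five-parameter rotated boxes: the prediction is compared against both the target and its boundary-equivalent representation (width and height swapped, angle shifted by 90 degrees), and the smaller of the two losses is kept, which removes the discontinuity at the angular boundary. The smooth-L1 base loss, radian angles, and angle-range convention are assumptions, not the exact formulation in the paper.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta).sum(-1)

def modulated_rotation_loss(pred, target):
    """pred, target: (batch, 5) arrays of (cx, cy, w, h, theta), theta in [-pi/2, pi/2)."""
    cx, cy, w, h, th = target.T
    alt_th = np.where(th < 0, th + np.pi / 2, th - np.pi / 2)   # boundary-equivalent angle
    swapped = np.stack([cx, cy, h, w, alt_th], axis=1)          # w/h exchanged
    return np.minimum(smooth_l1(pred - target), smooth_l1(pred - swapped))

# Toy usage: near the angular boundary, the equivalent box representation
# keeps the loss small instead of spiking.
target = np.array([[50.0, 50.0, 20.0, 80.0, -np.pi / 2 + 0.01]])
pred   = np.array([[50.0, 50.0, 80.0, 20.0,  0.0]])
print(modulated_rotation_loss(pred, target))
```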