April 2, 2020

3280 words 16 mins read

Paper Group ANR 229

Scalable Approximate Inference and Some Applications. A Human-Centered Review of the Algorithms used within the U.S. Child Welfare System. Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data. Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks. Channel Assignment in Uplink Wireless Communicatio …

Scalable Approximate Inference and Some Applications


Title	Scalable Approximate Inference and Some Applications
Authors	Jun Han
Abstract	Approximate inference in probability models is a fundamental task in machine learning. Approximate inference provides powerful tools to Bayesian reasoning, decision making, and Bayesian deep learning. The main goal is to estimate the expectation of interested functions w.r.t. a target distribution. When it comes to high dimensional probability models and large datasets, efficient approximate inference becomes critically important. In this thesis, we propose a new framework for approximate inference, which combines the advantages of these three frameworks and overcomes their limitations. Our proposed four algorithms are motivated by the recent computational progress of Stein’s method. Our proposed algorithms are applied to continuous and discrete distributions under the setting when the gradient information of the target distribution is available or unavailable. Theoretical analysis is provided to prove the convergence of our proposed algorithms. Our adaptive IS algorithm iteratively improves the importance proposal by functionally decreasing the KL divergence between the updated proposal and the target. When the gradient of the target is unavailable, our proposed sampling algorithm leverages the gradient of a surrogate model and corrects induced bias with importance weights, which significantly outperforms other gradient-free sampling algorithms. In addition, our theoretical results enable us to perform the goodness-of-fit test on discrete distributions. At the end of the thesis, we propose an importance-weighted method to efficiently aggregate local models in distributed learning with one-shot communication. Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithm.
Tasks	Decision Making
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03515v1
PDF	https://arxiv.org/pdf/2003.03515v1.pdf
PWC	https://paperswithcode.com/paper/scalable-approximate-inference-and-some
Repo
Framework

A Human-Centered Review of the Algorithms used within the U.S. Child Welfare System


Title	A Human-Centered Review of the Algorithms used within the U.S. Child Welfare System
Authors	Devansh Saxena, Karla Badillo-Urquiola, Pamela J. Wisniewski, Shion Guha
Abstract	The U.S. Child Welfare System (CWS) is charged with improving outcomes for foster youth; yet, they are overburdened and underfunded. To overcome this limitation, several states have turned towards algorithmic decision-making systems to reduce costs and determine better processes for improving CWS outcomes. Using a human-centered algorithmic design approach, we synthesize 50 peer-reviewed publications on computational systems used in CWS to assess how they were being developed, common characteristics of predictors used, as well as the target outcomes. We found that most of the literature has focused on risk assessment models but does not consider theoretical approaches (e.g., child-foster parent matching) nor the perspectives of caseworkers (e.g., case notes). Therefore, future algorithms should strive to be context-aware and theoretically robust by incorporating salient factors identified by past research. We provide the HCI community with research avenues for developing human-centered algorithms that redirect attention towards more equitable outcomes for CWS.
Tasks	Decision Making
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03541v1
PDF	https://arxiv.org/pdf/2003.03541v1.pdf
PWC	https://paperswithcode.com/paper/a-human-centered-review-of-the-algorithms
Repo
Framework

Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data


Title	Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data
Authors	Jiabei Yang, Issa J. Dahabreh, Jon A. Steingrimsson
Abstract	We propose Causal Interaction Trees for identifying subgroups of participants that have enhanced treatment effects using observational data. We extend the Classification and Regression Tree algorithm by using splitting criteria that focus on maximizing between-group treatment effect heterogeneity based on subgroup-specific treatment effect estimators to dictate decision-making in the algorithm. We derive properties of three subgroup-specific treatment effect estimators that account for the observational nature of the data – inverse probability weighting, g-formula and doubly robust estimators. We study the performance of the proposed algorithms using simulations and implement the algorithms in an observational study that evaluates the effectiveness of right heart catheterization on critically ill patients.
Tasks	Decision Making
Published	2020-03-06
URL	https://arxiv.org/abs/2003.03042v1
PDF	https://arxiv.org/pdf/2003.03042v1.pdf
PWC	https://paperswithcode.com/paper/causal-interaction-trees-tree-based-subgroup
Repo
Framework

Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks


Title	Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks
Authors	Liyuan Zheng, Yuanyuan Shi, Lillian J. Ratliff, Baosen Zhang
Abstract	This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.
Tasks
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09488v1
PDF	https://arxiv.org/pdf/2003.09488v1.pdf
PWC	https://paperswithcode.com/paper/safe-reinforcement-learning-of-control-affine
Repo
Framework

Channel Assignment in Uplink Wireless Communication using Machine Learning Approach


Title	Channel Assignment in Uplink Wireless Communication using Machine Learning Approach
Authors	Guangyu Jia, Zhaohui Yang, Hak-Keung Lam, Jianfeng Shi, Mohammad Shikh-Bahaei
Abstract	This letter investigates a channel assignment problem in uplink wireless communication systems. Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints. A convex optimization based algorithm is provided to obtain the optimal channel assignment, where the closed-form solution is obtained in each step. Due to high computational complexity in the convex optimization based algorithm, machine learning approaches are employed to obtain computational efficient solutions. More specifically, the data are generated by using convex optimization based algorithm and the original problem is converted to a regression problem which is addressed by the integration of convolutional neural networks (CNNs), feed-forward neural networks (FNNs), random forest and gated recurrent unit networks (GRUs). The results demonstrate that the machine learning method largely reduces the computation time with slightly compromising of prediction accuracy.
Tasks
Published	2020-01-12
URL	https://arxiv.org/abs/2001.03952v1
PDF	https://arxiv.org/pdf/2001.03952v1.pdf
PWC	https://paperswithcode.com/paper/channel-assignment-in-uplink-wireless
Repo
Framework

Robust Reinforcement Learning via Adversarial training with Langevin Dynamics


Title	Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
Authors	Parameswaran Kamalaruban, Yu-Ting Huang, Ya-Ping Hsieh, Paul Rolland, Cheng Shi, Volkan Cevher
Abstract	We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents. Leveraging the powerful Stochastic Gradient Langevin Dynamics, we present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our algorithm consistently outperforms existing baselines, in terms of generalization across different training and testing conditions, on several MuJoCo environments. Our experiments also show that, even for objective functions that entirely ignore potential environmental shifts, our sampling approach remains highly robust in comparison to standard RL algorithms.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06063v1
PDF	https://arxiv.org/pdf/2002.06063v1.pdf
PWC	https://paperswithcode.com/paper/robust-reinforcement-learning-via-adversarial-1
Repo
Framework

Learning Interpretable Models in the Property Specification Language


Title	Learning Interpretable Models in the Property Specification Language
Authors	Rajarshi Roy, Dana Fisman, Daniel Neider
Abstract	We address the problem of learning human-interpretable descriptions of a complex system from a finite set of positive and negative examples of its behavior. In contrast to most of the recent work in this area, which focuses on descriptions expressed in Linear Temporal Logic (LTL), we develop a learning algorithm for formulas in the IEEE standard temporal logic PSL (Property Specification Language). Our work is motivated by the fact that many natural properties, such as an event happening at every n-th point in time, cannot be expressed in LTL, whereas it is easy to express such properties in PSL. Moreover, formulas in PSL can be more succinct and easier to interpret (due to the use of regular expressions in PSL formulas) than formulas in LTL. Our learning algorithm builds on top of an existing algorithm for learning LTL formulas. Roughly speaking, our algorithm reduces the learning task to a constraint satisfaction problem in propositional logic and then uses a SAT solver to search for a solution in an incremental fashion. We have implemented our algorithm and performed a comparative study between the proposed method and the existing LTL learning algorithm. Our results illustrate the effectiveness of the proposed approach to provide succinct human-interpretable descriptions from examples.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03668v1
PDF	https://arxiv.org/pdf/2002.03668v1.pdf
PWC	https://paperswithcode.com/paper/learning-interpretable-models-in-the-property
Repo
Framework

Federated Learning in the Sky: Joint Power Allocation and Scheduling with UAV Swarms


Title	Federated Learning in the Sky: Joint Power Allocation and Scheduling with UAV Swarms
Authors	Tengchan Zeng, Omid Semiari, Mohammad Mozaffari, Mingzhe Chen, Walid Saad, Mehdi Bennis
Abstract	Unmanned aerial vehicle (UAV) swarms must exploit machine learning (ML) in order to execute various tasks ranging from coordinated trajectory planning to cooperative target recognition. However, due to the lack of continuous connections between the UAV swarm and ground base stations (BSs), using centralized ML will be challenging, particularly when dealing with a large volume of data. In this paper, a novel framework is proposed to implement distributed federated learning (FL) algorithms within a UAV swarm that consists of a leading UAV and several following UAVs. Each following UAV trains a local FL model based on its collected data and then sends this trained local model to the leading UAV who will aggregate the received models, generate a global FL model, and transmit it to followers over the intra-swarm network. To identify how wireless factors, like fading, transmission delay, and UAV antenna angle deviations resulting from wind and mechanical vibrations, impact the performance of FL, a rigorous convergence analysis for FL is performed. Then, a joint power allocation and scheduling design is proposed to optimize the convergence rate of FL while taking into account the energy consumption during convergence and the delay requirement imposed by the swarm’s control system. Simulation results validate the effectiveness of the FL convergence analysis and show that the joint design strategy can reduce the number of communication rounds needed for convergence by as much as 35% compared with the baseline design.
Tasks
Published	2020-02-19
URL	https://arxiv.org/abs/2002.08196v1
PDF	https://arxiv.org/pdf/2002.08196v1.pdf
PWC	https://paperswithcode.com/paper/federated-learning-in-the-sky-joint-power
Repo
Framework

How to trap a gradient flow


Title	How to trap a gradient flow
Authors	Sébastien Bubeck, Dan Mikulincer
Abstract	We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. In contrast with dimension-free approaches such as gradient descent, we focus here on the case where $d$ is finite, and potentially small. This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent. For example for $d=2$, Vavasis’ approach obtains the complexity $O(1/\varepsilon)$. Moreover for $d=2$ he also proved a lower bound of $\Omega(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend this result to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping (GFT), and the analysis of its oracle complexity. In dimension $d=2$, GFT closes the gap with Vavasis’ lower bound (up to a logarithmic factor), as we show that it has complexity $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension $d=3$, we show a complexity of $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving upon Vavasis’ $O\left(1 / \varepsilon^{1.2} \right)$. In higher dimensions, GFT has the remarkable property of being a logarithmic parallel depth strategy, in stark contrast with the polynomial depth of gradient descent or Vavasis’ algorithm. In this higher dimensional regime, the total work of GFT improves quadratically upon the only other known polylogarithmic depth strategy for this problem, namely naive grid search.
Tasks
Published	2020-01-09
URL	https://arxiv.org/abs/2001.02968v2
PDF	https://arxiv.org/pdf/2001.02968v2.pdf
PWC	https://paperswithcode.com/paper/how-to-trap-a-gradient-flow
Repo
Framework

A random-batch Monte Carlo method for many-body systems with singular kernels


Title	A random-batch Monte Carlo method for many-body systems with singular kernels
Authors	Lei Li, Zhenli Xu, Yue Zhao
Abstract	We propose a fast potential splitting Markov Chain Monte Carlo method which costs $O(1)$ time each step for sampling from equilibrium distributions (Gibbs measures) corresponding to particle systems with singular interacting kernels. We decompose the interacting potential into two parts, one is of long range but is smooth, and the other one is of short range but may be singular. To displace a particle, we first evolve a selected particle using the stochastic differential equation (SDE) under the smooth part with the idea of random batches, as commonly used in stochastic gradient Langevin dynamics. Then, we use the short range part to do a Metropolis rejection. Different from the classical Langevin dynamics, we only run the SDE dynamics with random batch for a short duration of time so that the cost in the first step is $O(p)$, where $p$ is the batch size. The cost of the rejection step is $O(1)$ since the interaction used is of short range. We justify the proposed random-batch Monte Carlo method, which combines the random batch and splitting strategies, both in theory and with numerical experiments. While giving comparable results for typical examples of the Dyson Brownian motion and Lennard-Jones fluids, our method can save more time when compared to the classical Metropolis-Hastings algorithm.
Tasks
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06554v1
PDF	https://arxiv.org/pdf/2003.06554v1.pdf
PWC	https://paperswithcode.com/paper/a-random-batch-monte-carlo-method-for-many
Repo
Framework

Deep Learning for Financial Applications : A Survey


Title	Deep Learning for Financial Applications : A Survey
Authors	Ahmet Murat Ozbayoglu, Mehmet Ugur Gudelek, Omer Berat Sezer
Abstract	Computational intelligence in finance has been a very popular topic for both academia and financial industry in the last few decades. Numerous studies have been published resulting in various models. Meanwhile, within the Machine Learning (ML) field, Deep Learning (DL) started getting a lot of attention recently, mostly due to its outperformance over the classical models. Lots of different implementations of DL exist today, and the broad interest is continuing. Finance is one particular area where DL models started getting traction, however, the playfield is wide open, a lot of research opportunities still exist. In this paper, we tried to provide a state-of-the-art snapshot of the developed DL models for financial applications, as of today. We not only categorized the works according to their intended subfield in finance but also analyzed them based on their DL models. In addition, we also aimed at identifying possible future implementations and highlighted the pathway for the ongoing research within the field.
Tasks
Published	2020-02-09
URL	https://arxiv.org/abs/2002.05786v1
PDF	https://arxiv.org/pdf/2002.05786v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-financial-applications-a
Repo
Framework

Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds


Title	Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds
Authors	Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang
Abstract	A recent line of work in deep learning theory has utilized the mean-field analysis to demonstrate the global convergence of noisy (stochastic) gradient descent for training over-parameterized two-layer neural networks. However, existing results in the mean-field setting do not provide the convergence rate of neural network training, and the generalization error bound is largely missing. In this paper, we provide a mean-field analysis in a generalized neural tangent kernel regime, and show that noisy gradient descent with weight decay can still exhibit a “kernel-like” behavior. This implies that the training loss converges linearly up to a certain accuracy in such regime. We also establish a generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay. Our results shed light on the connection between mean field analysis and the neural tangent kernel based analysis.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04026v1
PDF	https://arxiv.org/pdf/2002.04026v1.pdf
PWC	https://paperswithcode.com/paper/mean-field-analysis-of-two-layer-neural
Repo
Framework

Delay-Adaptive Learning in Generalized Linear Contextual Bandits


Title	Delay-Adaptive Learning in Generalized Linear Contextual Bandits
Authors	Jose Blanchet, Renyuan Xu, Zhengyuan Zhou
Abstract	In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision-maker only after some delay, which is unknown and stochastic. We study the performance of two well-known algorithms adapted to this delayed setting: one based on upper confidence bounds, and the other based on Thompson sampling. We describe modifications on how these two algorithms should be adapted to handle delays and give regret characterizations for both algorithms. Our results contribute to the broad landscape of contextual bandits literature by establishing that both algorithms can be made to be robust to delays, thereby helping clarify and reaffirm the empirical success of these two algorithms, which are widely deployed in modern recommendation engines.
Tasks	Multi-Armed Bandits
Published	2020-03-11
URL	https://arxiv.org/abs/2003.05174v1
PDF	https://arxiv.org/pdf/2003.05174v1.pdf
PWC	https://paperswithcode.com/paper/delay-adaptive-learning-in-generalized-linear
Repo
Framework

Online Residential Demand Response via Contextual Multi-Armed Bandits


Title	Online Residential Demand Response via Contextual Multi-Armed Bandits
Authors	Xin Chen, Yutong Nie, Na Li
Abstract	Residential load demands have huge potential to be exploited to enhance the efficiency and reliability of power system operation through demand response (DR) programs. This paper studies the strategies to select the right customers for residential DR from the perspective of load service entities (LSEs). One of the main challenges to implement residential DR is that customer responses to the incentives are uncertain and unknown, which are influenced by various personal and environmental factors. To address this challenge, this paper employs the contextual multi-armed bandit (CMAB) method to model the optimal customer selection problem with uncertainty. Based on Thompson sampling framework, an online learning and decision-making algorithm is proposed to learn customer behaviors and select appropriate customers for load reduction. This algorithm takes the contextual information into consideration and is applicable to complicated DR settings. Numerical simulations are performed to demonstrate the efficiency and learning effectiveness of the proposed algorithm.
Tasks	Decision Making, Multi-Armed Bandits
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03627v1
PDF	https://arxiv.org/pdf/2003.03627v1.pdf
PWC	https://paperswithcode.com/paper/online-residential-demand-response-via
Repo
Framework


Title	MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation
Authors	Marvin Chancán, Michael Milford
Abstract	Autonomous navigation emerges from both motion and local visual perception in real-world environments. However, most successful robotic motion estimation methods (e.g. VO, SLAM, SfM) and vision systems (e.g. CNN, visual place recognition-VPR) are often separately used for mapping and localization tasks. Conversely, recent reinforcement learning (RL) based methods for visual navigation rely on the quality of GPS data reception, which may not be reliable when directly using it as ground truth across multiple, month-spaced traversals in large environments. In this paper, we propose a novel motion and visual perception approach, dubbed MVP, that unifies these two sensor modalities for large-scale, target-driven navigation tasks. Our MVP-based method can learn faster, and is more accurate and robust to both extreme environmental changes and poor GPS data than corresponding vision-only navigation methods. MVP temporally incorporates compact image representations, obtained using VPR, with optimized motion estimation data, including but not limited to those from VO or optimized radar odometry (RO), to efficiently learn self-supervised navigation policies via RL. We evaluate our method on two large real-world datasets, Oxford Robotcar and Nordland Railway, over a range of weather (e.g. overcast, night, snow, sun, rain, clouds) and seasonal (e.g. winter, spring, fall, summer) conditions using the new CityLearn framework; an interactive environment for efficiently training navigation agents. Our experimental results, on traversals of the Oxford RobotCar dataset with no GPS data, show that MVP can achieve 53% and 93% navigation success rate using VO and RO, respectively, compared to 7% for a vision-only method. We additionally report a trade-off between the RL success rate and the motion estimation precision.
Tasks	Autonomous Navigation, Motion Estimation, Visual Navigation, Visual Place Recognition
Published	2020-03-02
URL	https://arxiv.org/abs/2003.00667v1
PDF	https://arxiv.org/pdf/2003.00667v1.pdf
PWC	https://paperswithcode.com/paper/mvp-unified-motion-and-visual-self-supervised
Repo
Framework