April 2, 2020

3454 words 17 mins read

Paper Group ANR 194

Paper Group ANR 194

Bounded Regret for Finitely Parameterized Multi-Armed Bandits. Fair Bandit Learning with Delayed Impact of Actions. Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits. Developing a gender classification approach in human face images using modified local binary patterns and tani-moto based nearest neighbor algorithm. Online L …

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

Title Bounded Regret for Finitely Parameterized Multi-Armed Bandits
Authors Kishan Panaganti, Dileep Kalathil
Abstract We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call FP-UCB algorithm, which uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves a bounded regret under some structural condition on the underlying parameter set. We also show that, if the underlying parameter set does not satisfy the necessary structural condition, the FP-UCB algorithm achieves a logarithmic regret, but with a smaller preceding constant compared to the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.
Tasks Multi-Armed Bandits
Published 2020-03-03
URL https://arxiv.org/abs/2003.01328v3
PDF https://arxiv.org/pdf/2003.01328v3.pdf
PWC https://paperswithcode.com/paper/bounded-regret-for-finitely-parameterized
Repo
Framework

Fair Bandit Learning with Delayed Impact of Actions

Title Fair Bandit Learning with Delayed Impact of Actions
Authors Wei Tang, Chien-Ju Ho, Yang Liu
Abstract Algorithmic fairness has been studied mostly in a static setting where the implicit assumptions are that the frequencies of historically made decisions do not impact the problem structure in subsequent future. However, for example, the capability to pay back a loan for people in a certain group might depend on historically how frequently that group has been approved loan applications. If banks keep rejecting loan applications to people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. This challenge has been noted in several recent works but is under-explored in a more generic sequential learning setting. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits (MAB). We generalize the classical bandit setting to encode the dependency of this action “bias” due to the history of the learning. Our goal is to learn to maximize the collected utilities over time while satisfying fairness constraints imposed over arms’ utilities, which again depend on the decision they have received. We propose an algorithm that achieves a regret of $\tilde{\mathcal{O}}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ denotes the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms.
Tasks Multi-Armed Bandits
Published 2020-02-24
URL https://arxiv.org/abs/2002.10316v1
PDF https://arxiv.org/pdf/2002.10316v1.pdf
PWC https://paperswithcode.com/paper/fair-bandit-learning-with-delayed-impact-of
Repo
Framework

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

Title Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits
Authors Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
Abstract We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.
Tasks Decision Making, Multi-Armed Bandits
Published 2020-03-03
URL https://arxiv.org/abs/2003.01312v1
PDF https://arxiv.org/pdf/2003.01312v1.pdf
PWC https://paperswithcode.com/paper/distributed-cooperative-decision-making-in-1
Repo
Framework

Developing a gender classification approach in human face images using modified local binary patterns and tani-moto based nearest neighbor algorithm

Title Developing a gender classification approach in human face images using modified local binary patterns and tani-moto based nearest neighbor algorithm
Authors Shervan Fekri-Ershad
Abstract Human identification is a much attention problem in computer vision. Gender classification plays an important role in human identification as preprocess step. So far, various methods have been proposed to solve this problem. Absolutely, classification accuracy is the main challenge for researchers in gender classification. But, some challenges such as rotation, gray scale variations, pose, illumination changes may be occurred in smart phone image capturing. In this respect, a multi step approach is proposed in this paper to classify genders in human face images based on improved local binary patters (MLBP). LBP is a texture descriptor, which extract local contrast and local spatial structure information. Some issues such as noise sensitivity, rotation sensitivity and low discriminative features can be considered as disadvantages of the basic LBP. MLBP handle disadvantages using a new theory to categorize extracted binary patterns of basic LBP. The proposed approach includes two stages. First of all, a feature vector is extracted for human face images based on MLBP. Next, non linear classifiers can be used to classify gender. In this paper nearest neighborhood classifier is evaluated based on Tani-Moto metric as distance measure. In the result part, two databases, self-collected and ICPR are used as human face database. Results are compared by some state-ofthe-art algorithms in this literature that shows the high quality of the proposed approach in terms of accuracy rate. Some of other main advantages of the proposed approach are rotation invariant, low noise sensitivity, size invariant and low computational complexity. The proposed approach decreases the computational complexity of smartphone applications because of reducing the number of database comparisons. It can also improve performance of the synchronous applications in the smarphones because of memory and CPU usage reduction.
Tasks
Published 2020-01-29
URL https://arxiv.org/abs/2001.10966v1
PDF https://arxiv.org/pdf/2001.10966v1.pdf
PWC https://paperswithcode.com/paper/developing-a-gender-classification-approach
Repo
Framework

Online Learning in Contextual Bandits using Gated Linear Networks

Title Online Learning in Contextual Bandits using Gated Linear Networks
Authors Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness
Abstract We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. Leveraging data-dependent gating properties of the GLN we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains median first-place despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
Tasks Multi-Armed Bandits
Published 2020-02-21
URL https://arxiv.org/abs/2002.11611v1
PDF https://arxiv.org/pdf/2002.11611v1.pdf
PWC https://paperswithcode.com/paper/online-learning-in-contextual-bandits-using
Repo
Framework

Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

Title Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation
Authors Yihan Du, Yan Yan, Si Chen, Yang Hua
Abstract In recent years, deep learning based visual tracking methods have obtained great success owing to the powerful feature representation ability of Convolutional Neural Networks (CNNs). Among these methods, classification-based tracking methods exhibit excellent performance while their speeds are heavily limited by the expensive computation for massive proposal feature extraction. In contrast, matching-based tracking methods (such as Siamese networks) possess remarkable speed superiority. However, the absence of online updating renders these methods unadaptable to significant object appearance variations. In this paper, we propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations. For high computational efficiency, we also present a fast proposal selection strategy, which utilizes the matching-based tracking method to pre-estimate dense proposals and selects high-quality ones to feed to the LSTM network for classification. This strategy efficiently filters out some irrelevant proposals and avoids the redundant computation for feature extraction, which enables our method to operate faster than conventional classification-based tracking methods. In addition, to handle the problems of sample inadequacy and class imbalance during online tracking, we adopt a data augmentation technique based on the Generative Adversarial Network (GAN) to facilitate the training of the LSTM network. Extensive experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed, which exhibits great potentials of recurrent structures for visual tracking.
Tasks Data Augmentation, Real-Time Visual Tracking, Visual Tracking
Published 2020-02-07
URL https://arxiv.org/abs/2002.02598v1
PDF https://arxiv.org/pdf/2002.02598v1.pdf
PWC https://paperswithcode.com/paper/object-adaptive-lstm-network-for-real-time
Repo
Framework

Active Learning over DNN: Automated Engineering Design Optimization for Fluid Dynamics Based on Self-Simulated Dataset

Title Active Learning over DNN: Automated Engineering Design Optimization for Fluid Dynamics Based on Self-Simulated Dataset
Authors Yang Chen
Abstract Optimizing fluid-dynamic performance is an important engineering task. Traditionally, experts design shapes based on empirical estimations and verify them through expensive experiments. This costly process, both in terms of time and space, may only explore a limited number of shapes and lead to sub-optimal designs. In this research, a test-proven deep learning architecture is applied to predict the performance under various restrictions and search for better shapes by optimizing the learned prediction function. The major challenge is the vast amount of data points Deep Neural Network (DNN) demands, which is improvident to simulate. To remedy this drawback, a Frequentist active learning is used to explore regions of the output space that DNN predicts promising. This operation reduces the number of data samples demanded from ~8000 to 625. The final stage, a user interface, made the model capable of optimizing with given user input of minimum area and viscosity. Flood fill is used to define a boundary area function so that the optimal shape does not bypass the minimum area. Stochastic Gradient Langevin Dynamics (SGLD) is employed to make sure the ultimate shape is optimized while circumventing the required area. Jointly, shapes with extremely low drags are found explored by a practical user interface with no human domain knowledge and modest computation overhead.
Tasks Active Learning
Published 2020-01-18
URL https://arxiv.org/abs/2001.08075v2
PDF https://arxiv.org/pdf/2001.08075v2.pdf
PWC https://paperswithcode.com/paper/active-learning-over-dnn-automated
Repo
Framework

Resolution Adaptive Networks for Efficient Inference

Title Resolution Adaptive Networks for Efficient Inference
Authors Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang
Abstract Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks. Existing works mainly exploit architecture redundancy in network depth or width. In this paper, we focus on spatial redundancy of input samples and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying “easy” inputs containing large objects with prototypical features, while only some “hard” samples need spatially detailed information. In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations, and those samples with high prediction confidence will exit early from the network without being further processed. Meanwhile, high-resolution paths in the network maintain the capability to recognize the “hard” samples. Therefore, RANet can effectively reduce the spatial redundancy involved in inferring high-resolution inputs. Empirically, we demonstrate the effectiveness of the proposed RANet on the CIFAR-10, CIFAR-100 and ImageNet datasets in both the anytime prediction setting and the budgeted batch classification setting.
Tasks
Published 2020-03-16
URL https://arxiv.org/abs/2003.07326v4
PDF https://arxiv.org/pdf/2003.07326v4.pdf
PWC https://paperswithcode.com/paper/resolution-adaptive-networks-for-efficient
Repo
Framework

On the Fairness of Randomized Trials for Recommendation with Heterogeneous Demographics and Beyond

Title On the Fairness of Randomized Trials for Recommendation with Heterogeneous Demographics and Beyond
Authors Zifeng Wang, Xi Chen, Rui Wen, Shao-Lun Huang
Abstract Observed events in recommendation are consequence of the decisions made by a policy, thus they are usually selectively labeled, namely the data are Missing Not At Random (MNAR), which often causes large bias to the estimate of true outcomes risk. A general approach to correct MNAR bias is performing small Randomized Controlled Trials (RCTs), where an additional uniform policy is employed to randomly assign items to each user. In this work, we concentrate on the fairness of RCTs under both homogeneous and heterogeneous demographics, especially analyzing the bias for the least favorable group on the latter setting. Considering RCTs’ limitations, we propose a novel Counterfactual Robust Risk Minimization (CRRM) framework, which is totally free of expensive RCTs, and derive its theoretical generalization error bound. At last, empirical experiments are performed on synthetic tasks and real-world data sets, substantiating our method’s superiority both in fairness and generalization.
Tasks
Published 2020-01-25
URL https://arxiv.org/abs/2001.09328v2
PDF https://arxiv.org/pdf/2001.09328v2.pdf
PWC https://paperswithcode.com/paper/on-the-fairness-of-randomized-trials-for
Repo
Framework

EllipBody: A Light-weight and Part-based Representation for Human Pose and Shape Recovery

Title EllipBody: A Light-weight and Part-based Representation for Human Pose and Shape Recovery
Authors Min Wang, Feng Qiu, Wentao Liu, Chen Qian, Xiaowei Zhou, Lizhuang Ma
Abstract Human pose and shape recovery is an important task in computer vision and real-world understanding. Current works are tackled due to the lack of 3D annotations for whole body shapes. We find that part segmentation is a very efficient 2D annotation in 3D human body recovery. It not only indicates the location of each part but also contains 3D information through occlusions from the shape of parts, as indicated in Figure 1. To better utilize 3D information contained in part segmentation, we propose a part-level differentiable renderer which model occlusion between parts explicitly. It enhances the performance in both learning-based and optimization-based methods. To further improve the efficiency of the task, we propose a light-weight body model called EllipBody, which uses ellipsoids to indicate each body part. Together with SMPL, the relationship between forward time, performance and number of faces in body models are analyzed. A small number of faces is chosen for achieving good performance and efficiency at the same time. Extensive experiments show that our methods achieve the state-of-the-art results on Human3.6M and LSP dataset for 3D pose estimation and part segmentation.
Tasks 3D Pose Estimation, Pose Estimation
Published 2020-03-24
URL https://arxiv.org/abs/2003.10873v1
PDF https://arxiv.org/pdf/2003.10873v1.pdf
PWC https://paperswithcode.com/paper/ellipbody-a-light-weight-and-part-based
Repo
Framework

Self-Supervised Contextual Bandits in Computer Vision

Title Self-Supervised Contextual Bandits in Computer Vision
Authors Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Charles, Eren Manavoglu, Urun Dogan
Abstract Contextual bandits are a common problem faced by machine learning practitioners in domains as diverse as hypothesis testing to product recommendations. There have been a lot of approaches in exploiting rich data representations for contextual bandit problems with varying degree of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don’t have the advantage of getting implicit labels due to lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self supervision objective. By augmenting contextual bandit learning with self-supervision we get a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn’t perform optimally and give alternative methods for better learning in these cases.
Tasks Colorization, Multi-Armed Bandits
Published 2020-03-18
URL https://arxiv.org/abs/2003.08485v1
PDF https://arxiv.org/pdf/2003.08485v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-contextual-bandits-in
Repo
Framework

Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach

Title Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach
Authors Debamita Ghosh, Arun Verma, Manjesh K. Hanawal
Abstract Recent advances in wireless radio frequency (RF) energy harvesting allows sensor nodes to increase their lifespan by remotely charging their batteries. The amount of energy harvested by the nodes varies depending on their ambient environment, and proximity to the source. The lifespan of the sensor network depends on the minimum amount of energy a node can harvest in the network. It is thus important to learn the least amount of energy harvested by nodes so that the source can transmit on a frequency band that maximizes this amount. We model this learning problem as a novel stochastic Maximin Multi-Armed Bandits (Maximin MAB) problem and propose an Upper Confidence Bound (UCB) based algorithm named Maximin UCB. Maximin MAB is a generalization of standard MAB and enjoys the same performance guarantee as that of the UCB1 algorithm. Experimental results validate the performance guarantees of our algorithm.
Tasks Multi-Armed Bandits
Published 2020-03-13
URL https://arxiv.org/abs/2003.06213v2
PDF https://arxiv.org/pdf/2003.06213v2.pdf
PWC https://paperswithcode.com/paper/learning-and-fairness-in-energy-harvesting-a
Repo
Framework

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

Title A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option
Authors P Sharoff, Nishant A. Mehta, Ravi Ganti
Abstract We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the chosen action yields a stochastic reward. The agent seeks to maximize its cumulative reward over a finite time budget, with the option of “giving up” on a current action — hence forfeiting any reward – in order to choose another action. We cast this problem as a variant of the stochastic multi-armed bandits problem with stochastic consumption of resource. For this problem, we first establish that the optimal arm is the one that maximizes the ratio of the expected reward of the arm to the expected waiting time before the agent sees the reward due to pulling that arm. Using a novel upper confidence bound on this ratio, we then introduce an upper confidence based-algorithm, WAIT-UCB, for which we establish logarithmic, problem-dependent regret bound which has an improved dependence on problem parameters compared to previous works. Simulations on various problem configurations comparing WAIT-UCB against the state-of-the-art algorithms are also presented.
Tasks Decision Making, Multi-Armed Bandits
Published 2020-03-06
URL https://arxiv.org/abs/2003.03456v1
PDF https://arxiv.org/pdf/2003.03456v1.pdf
PWC https://paperswithcode.com/paper/a-farewell-to-arms-sequential-reward
Repo
Framework

Robustness Guarantees for Mode Estimation with an Application to Bandits

Title Robustness Guarantees for Mode Estimation with an Application to Bandits
Authors Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan
Abstract Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding in its robustness properties under possibly adversarial data contamination. In this paper, we give precise robustness guarantees as well as privacy guarantees under simple randomization. We then introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We prove regret guarantees for the problems of top arm identification, top m-arms identification, contextual modal bandits, and infinite continuous arms top arm recovery. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences, thus rendering modal bandits an attractive choice in situations where the rewards may have outliers or adversarial corruptions.
Tasks Multi-Armed Bandits
Published 2020-03-05
URL https://arxiv.org/abs/2003.02932v1
PDF https://arxiv.org/pdf/2003.02932v1.pdf
PWC https://paperswithcode.com/paper/robustness-guarantees-for-mode-estimation
Repo
Framework

PolyGen: An Autoregressive Generative Model of 3D Meshes

Title PolyGen: An Autoregressive Generative Model of 3D Meshes
Authors Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, Peter W. Battaglia
Abstract Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.
Tasks
Published 2020-02-23
URL https://arxiv.org/abs/2002.10880v1
PDF https://arxiv.org/pdf/2002.10880v1.pdf
PWC https://paperswithcode.com/paper/polygen-an-autoregressive-generative-model-of
Repo
Framework
comments powered by Disqus