January 31, 2020


Paper Group ANR 80


Agent Probing Interaction Policies


Title	Agent Probing Interaction Policies
Authors	Siddharth Ghiya, Oluwafemi Azeez, Brendan Miller
Abstract	Reinforcement learning in a multi-agent system is difficult because these systems are inherently non-stationary in nature. In such a case, identifying the type of the opposite agent is crucial and can help us address this non-stationary environment. We have investigated if we can employ some probing policies which help us better identify the type of the other agent in the environment. We’ve made a simplifying assumption that the other agent has a stationary policy that our probing policy is trying to approximate. Our work extends the Environmental Probing Interaction Policy framework to handle multi-agent environments.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09535v3
PDF	https://arxiv.org/pdf/1911.09535v3.pdf
PWC	https://paperswithcode.com/paper/agent-probing-interaction-policies
Repo
Framework

A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access


Title	A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
Authors	Chen Zhong, Ziyang Lu, M. Cenk Gursoy, Senem Velipasalar
Abstract	To make efficient use of limited spectral resources, we in this work propose a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for the actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework’s tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both the average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, Deep-Q network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.
Tasks
Published	2019-08-20
URL	https://arxiv.org/abs/1908.08401v1
PDF	https://arxiv.org/pdf/1908.08401v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-actor-critic-reinforcement-learning
Repo
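Framework

Learning Pose Estimation for UAV Autonomous Navigation and Landing Using Visual-Inertial Sensor Data


Title	Learning Pose Estimation for UAV Autonomous Navigation and Landing Using Visual-Inertial Sensor Data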
Authors	Francesca Baldini, Animashree Anandkumar, Richard M. Murray
Abstract	In this work, we propose a robust network-in-the-loop control system that allows an Unmanned-Aerial-Vehicle to navigate and land autonomously on a desired target. To estimate the global pose of the aerial vehicle, we develop a deep neural network architecture for visual-inertial odometry, which provides a robust alternative to traditional techniques for autonomous navigation of Unmanned-Aerial-Vehicles. We first provide experimental results on the accuracy of the estimation by comparing the prediction of our model to traditional visual-inertial approaches on the publicly available EuRoC MAV dataset. The results indicate a clear improvement in the accuracy of the pose estimation up to 25% against the baseline. Second, we use Airsim, a simulator available as a plugin for Unreal Engine, to create new datasets of photorealistic images and inertial measurement to train and test our model. We finally integrate the proposed architecture for global localization with the Airsim closed-loop control system, and we provide simulation results for the autonomous landing of the aerial vehicle.
Tasks	Autonomous Navigation, Pose Estimation
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04527v1
PDF	https://arxiv.org/pdf/1912.04527v1.pdf
PWC	https://paperswithcode.com/paper/learning-pose-estimation-for-uav-autonomous
Repo
Framework

Binary matrix completion with nonconvex regularizers


Title	Binary matrix completion with nonconvex regularizers
Authors	Chunsheng Liu
Abstract	Many practical problems involve the recovery of a binary matrix from partial information, which has led the binary matrix completion (BMC) technique to receive increasing attention in machine learning. In particular, we consider a special case of the BMC problem, in which only a subset of positive elements can be observed. In recent years, convex regularization based methods have been the mainstream approaches for this task. However, the applications of nonconvex surrogates in standard matrix completion have demonstrated better empirical performance. Accordingly, we propose a novel BMC model with nonconvex regularizers and provide the recovery guarantee for the model. Furthermore, for solving the resultant nonconvex optimization problem, we improve the popular proximal algorithm with acceleration strategies. It can be guaranteed that the convergence rate of the algorithm is in the order of ${1/T}$, where $T$ is the number of iterations. Extensive experiments conducted on both synthetic and real-world data sets demonstrate the superiority of the proposed approach over other competing methods.
Tasks	Matrix Completion
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03807v1
PDF	http://arxiv.org/pdf/1904.03807v1.pdf
PWC	https://paperswithcode.com/paper/binary-matrix-completion-with-nonconvex
Repo
Framework

Stochastic Bandits with Context Distributions


Title	Stochastic Bandits with Context Distributions
Authors	Johannes Kirschner, Andreas Krause
Abstract	We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution. The learner observes only the context distribution while the exact context realization remains hidden. This allows for a broad range of applications where the context is stochastic or when the learner needs to predict the context. We leverage the UCB algorithm to this setting and show that it achieves an order-optimal high-probability bound on the cumulative regret for linear and kernelized reward functions. Our results strictly generalize previous work in the sense that both our model and the algorithm reduce to the standard setting when the environment chooses only Dirac delta distributions and therefore provides the exact context to the learner. We further analyze a variant where the learner observes the realized context after choosing the action. Finally, we demonstrate the proposed method on synthetic and real-world datasets.
Tasks
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02685v2
PDF	https://arxiv.org/pdf/1906.02685v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-bandits-with-context-distributions
Repo
Framework

Cluster Developing 1-Bit Matrix Completion


Title	Cluster Developing 1-Bit Matrix Completion
Authors	Chengkun Zhang, Junbin Gao, Stephen Lu
Abstract	Matrix completion has a long-time history of usage as the core technique of recommender systems. In particular, 1-bit matrix completion, which considers the prediction as a “Recommended” or “Not Recommended” question, has proved its significance and validity in the field. However, while customers and products aggregate into interacted clusters, state-of-the-art model-based 1-bit recommender systems do not take the consideration of grouping bias. To tackle the gap, this paper introduced Group-Specific 1-bit Matrix Completion (GS1MC) by first-time consolidating group-specific effects into 1-bit recommender systems under the low-rank latent variable framework. Additionally, to empower GS1MC even when grouping information is unobtainable, Cluster Developing Matrix Completion (CDMC) was proposed by integrating the sparse subspace clustering technique into GS1MC. Namely, CDMC allows clustering users/items and to leverage their group effects into matrix completion at the same time. Experiments on synthetic and real-world data show that GS1MC outperforms the current 1-bit matrix completion methods. Meanwhile, it is compelling that CDMC can successfully capture items’ genre features only based on sparse binary user-item interactive data. Notably, GS1MC provides a new insight to incorporate and evaluate the efficacy of clustering methods while CDMC can be served as a new tool to explore unrevealed social behavior or market phenomenon.
Tasks	Matrix Completion, Recommendation Systems
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03779v1
PDF	http://arxiv.org/pdf/1904.03779v1.pdf
PWC	https://paperswithcode.com/paper/cluster-developing-1-bit-matrix-completion
Repo
Framework

Improving Word Representations: A Sub-sampled Unigram Distribution for Negative Sampling


Title	Improving Word Representations: A Sub-sampled Unigram Distribution for Negative Sampling
Authors	Wenxiang Jiao, Irwin King, Michael R. Lyu
Abstract	Word2Vec is the most popular model for word representation and has been widely investigated in literature. However, its noise distribution for negative sampling is decided by empirical trials and the optimality has always been ignored. We suggest that the distribution is a sub-optimal choice, and propose to use a sub-sampled unigram distribution for better negative sampling. Our contributions include: (1) proposing the concept of semantics quantification and deriving a suitable sub-sampling rate for the proposed distribution adaptive to different training corpora; (2) demonstrating the advantages of our approach in both negative sampling and noise contrastive estimation by extensive evaluation tasks; and (3) proposing a semantics weighted model for the MSR sentence completion task, resulting in considerable improvements. Our work not only improves the quality of word vectors but also benefits current understanding of Word2Vec.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09362v1
PDF	https://arxiv.org/pdf/1910.09362v1.pdf
PWC	https://paperswithcode.com/paper/improving-word-representations-a-sub-sampled
Repo
Framework
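As a point of reference for the abstract above, here is a minimal sketch of the baseline noise distribution commonly used for negative sampling in Word2Vec: unigram counts raised to the 3/4 power and normalized. This is the baseline the abstract calls sub-optimal, not the sub-sampled distribution the paper proposes. The function name `noise_distribution` and the toy counts are illustrative assumptions.

```python
def noise_distribution(counts, power=0.75):
    """Return a normalized noise distribution from raw word counts.

    Assumption: the baseline is the smoothed unigram distribution
    (count ** 0.75), as commonly used with Word2Vec negative sampling.
    This is NOT the paper's proposed sub-sampled distribution.
    """
    # Smooth the raw counts so rare words are sampled more often
    # than their raw frequency would suggest.
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    # Normalize so the probabilities sum to one.
    return {word: weight / total for word, weight in weights.items()}


# Toy counts, purely illustrative.
probs = noise_distribution({"the": 16, "cat": 1})
```

With these toy counts, the frequent word keeps most of the probability mass, but less than its raw share of 16/17.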

On the Performance of Thompson Sampling on Logistic Bandits


Title	On the Performance of Thompson Sampling on Logistic Bandits
Authors	Shi Dong, Tengyu Ma, Benjamin Van Roy
Abstract	We study the logistic bandit, in which rewards are binary with success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$ and actions $a$ and coefficients $\theta$ are within the $d$-dimensional unit ball. While prior regret bounds for algorithms that address the logistic bandit exhibit exponential dependence on the slope parameter $\beta$, we establish a regret bound for Thompson sampling that is independent of $\beta$. Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is $\tilde{O}(d\sqrt{T})$. We also establish a $\tilde{O}(\sqrt{d\eta T}/\lambda)$ bound that applies more broadly, where $\lambda$ is the worst-case optimal log-odds and $\eta$ is the “fragility dimension,” a new statistic we define to capture the degree to which an optimal action for one model fails to satisfice for others. We demonstrate that the fragility dimension plays an essential role by showing that, for any $\epsilon > 0$, no algorithm can achieve $\mathrm{poly}(d, 1/\lambda)\cdot T^{1-\epsilon}$ regret.
Tasks
Published	2019-05-12
URL	https://arxiv.org/abs/1905.04654v1
PDF	https://arxiv.org/pdf/1905.04654v1.pdf
PWC	https://paperswithcode.com/paper/on-the-performance-of-thompson-sampling-on
Repo
Framework
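The reward model stated in the abstract above can be sketched as follows. This only illustrates the success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$ and a Bernoulli reward draw; it does not implement Thompson sampling or the paper's analysis. The function names and the example vectors are assumptions made for illustration.

```python
import math
import random


def success_probability(action, theta, beta):
    """Logistic success probability exp(z) / (1 + exp(z)), z = beta * <a, theta>."""
    z = beta * sum(a * t for a, t in zip(action, theta))
    # Numerically stable form of the logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sample_reward(action, theta, beta, rng=random):
    """Draw a binary reward with the logistic success probability."""
    return 1 if rng.random() < success_probability(action, theta, beta) else 0


# Orthogonal action and coefficient vectors give probability 0.5
# regardless of the slope parameter beta.
p_orthogonal = success_probability([1.0, 0.0], [0.0, 1.0], beta=5.0)
```

A larger slope parameter makes the probability switch more sharply between 0 and 1 as the inner product changes sign, which is the regime where the abstract says earlier bounds degrade.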

Deep Variable-Block Chain with Adaptive Variable Selection


Title	Deep Variable-Block Chain with Adaptive Variable Selection
Authors	Lixiang Zhang, Lin Lin, Jia Li
Abstract	The architectures of deep neural networks (DNN) rely heavily on the underlying grid structure of variables, for instance, the lattice of pixels in an image. For general high dimensional data with variables not associated with a grid, the multi-layer perceptron and deep belief network are often used. However, it is frequently observed that those networks do not perform competitively and they are not helpful for identifying important variables. In this paper, we propose a framework that imposes on blocks of variables a chain structure obtained by step-wise greedy search so that the DNN architecture can leverage the constructed grid. We call this new neural network Deep Variable-Block Chain (DVC). Because the variable blocks are used for classification in a sequential manner, we further develop the capacity of selecting variables adaptively according to a number of regions trained by a decision tree. Our experiments show that DVC outperforms other generic DNNs and other strong classifiers. Moreover, DVC can achieve high accuracy at much reduced dimensionality and sometimes reveals drastically different sets of relevant variables for different regions.
Tasks
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03573v1
PDF	https://arxiv.org/pdf/1912.03573v1.pdf
PWC	https://paperswithcode.com/paper/deep-variable-block-chain-with-adaptive
Repo
Framework

Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints


Title	Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints
Authors	Pengfei Li, Weichao Qiu, Michael Peven, Gregory D. Hager, Alan L. Yuille
Abstract	Scene context is a powerful constraint on the geometry of objects within the scene in cases, such as surveillance, where the camera geometry is unknown and image quality may be poor. In this paper, we describe a method for estimating the pose of cars in a scene jointly with the ground plane that supports them. We formulate this as a joint optimization that accounts for varying car shape using a statistical atlas, and which simultaneously computes geometry and internal camera parameters. We demonstrate that this method produces significant improvements for car pose estimation, and we show that the resulting 3D geometry, when computed over a video sequence, makes it possible to improve on state of the art classification of car behavior. We also show that introducing the planar constraint allows us to estimate camera focal length in a reliable manner.
Tasks	Pose Estimation
Published	2019-12-09
URL	https://arxiv.org/abs/1912.04363v1
PDF	https://arxiv.org/pdf/1912.04363v1.pdf
PWC	https://paperswithcode.com/paper/car-pose-in-context-accurate-pose-estimation
Repo
Framework

Explicit-Duration Markov Switching Models


Title	Explicit-Duration Markov Switching Models
Authors	Silvia Chiappa
Abstract	Markov switching models (MSMs) are probabilistic models that employ multiple sets of parameters to describe different dynamic regimes that a time series may exhibit at different periods of time. The switching mechanism between regimes is controlled by unobserved random variables that form a first-order Markov chain. Explicit-duration MSMs contain additional variables that explicitly model the distribution of time spent in each regime. This allows to define duration distributions of any form, but also to impose complex dependence between the observations and to reset the dynamics to initial conditions. Models that focus on the first two properties are most commonly known as hidden semi-Markov models or segment models, whilst models that focus on the third property are most commonly known as changepoint models or reset models. In this monograph, we provide a description of explicit-duration modelling by categorizing the different approaches into three groups, which differ in encoding in the explicit-duration variables different information about regime change/reset boundaries. The approaches are described using the formalism of graphical models, which allows to graphically represent and assess statistical dependence and therefore to easily describe the structure of complex models and derive inference routines. The presentation is intended to be pedagogical, focusing on providing a characterization of the three groups in terms of model structure constraints and inference properties. The monograph is supplemented with a software package that contains most of the models and examples described. The material presented should be useful to both researchers wishing to learn about these models and researchers wishing to develop them further.
Tasks	Time Series
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05800v1
PDF	https://arxiv.org/pdf/1909.05800v1.pdf
PWC	https://paperswithcode.com/paper/explicit-duration-markov-switching-models
Repo
Framework

Optimal Minimal Margin Maximization with Boosting


Title	Optimal Minimal Margin Maximization with Boosting
Authors	Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen
Abstract	Boosting algorithms produce a classifier by iteratively combining base hypotheses. It has been observed experimentally that the generalization error keeps improving even after achieving zero training error. One popular explanation attributes this to improvements in margins. A common goal in a long line of research, is to maximize the smallest margin using as few base hypotheses as possible, culminating with the AdaBoostV algorithm by (Rätsch and Warmuth [JMLR’04]). The AdaBoostV algorithm was later conjectured to yield an optimal trade-off between number of hypotheses trained and the minimal margin over all training points (Nie et al. [JMLR’13]). Our main contribution is a new algorithm refuting this conjecture. Furthermore, we prove a lower bound which implies that our new algorithm is optimal.
Tasks
Published	2019-01-30
URL	http://arxiv.org/abs/1901.10789v1
PDF	http://arxiv.org/pdf/1901.10789v1.pdf
PWC	https://paperswithcode.com/paper/optimal-minimal-margin-maximization-with
Repo
Framework

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles


Title	Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles
Authors	Thiago Freitas dos Santos, Paulo E. Santos, Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Pedro Cabalar
Abstract	Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life’s activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.
Tasks	Q-Learning
Published	2019-02-16
URL	http://arxiv.org/abs/1903.03411v1
PDF	http://arxiv.org/pdf/1903.03411v1.pdf
PWC	https://paperswithcode.com/paper/heuristics-answer-set-programming-and-markov
Repo
Framework
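The abstract above relies on Q-Learning to find policies. For readers unfamiliar with it, here is a minimal sketch of the standard tabular Q-Learning update. It is plain Q-Learning, not oASP(MDP) and not the heuristic-accelerated variant the paper studies. The state and action names and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning step and return the updated value.

    q maps (state, action) pairs to estimated returns; unseen pairs
    default to 0.0 when q is a defaultdict(float).
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target and update.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example.
q_table = defaultdict(float)
first = q_learning_update(q_table, "s0", "left", 1.0, "s1",
                          ["left", "right"], alpha=0.5, gamma=0.9)
```

Heuristic variants typically change how actions are chosen during exploration while leaving an update of this general shape in place. Whether the paper does exactly that is not stated in the abstract.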

SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation


Title	SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation
Authors	Xiao Sun, Zhouhui Lian, Jianguo Xiao
Abstract	Point cloud analysis has drawn broad attention due to its increasing demands in various fields. Although impressive performance has been achieved on several databases, researchers neglect the fact that the orientation of those point cloud data is aligned. Varying the orientation of point cloud may lead to the degradation of performance, restricting the capacity of generalizing to real applications where the prior of orientation is often unknown. In this paper, we propose the point projection feature, which is invariant to the rotation of the input point cloud. A novel architecture is designed to mine features of different levels. We adopt a PointNet-based backbone to extract global feature for point cloud, and the graph aggregation operation to perceive local shape structure. Besides, we introduce an efficient key point descriptor to assign each point with different response and help recognize the overall geometry. Mathematical analyses and experimental results demonstrate that the proposed method can extract strictly rotation-invariant representations for point cloud recognition and segmentation without data augmentation, and outperforms other state-of-the-art methods.
Tasks	Data Augmentation
Published	2019-11-06
URL	https://arxiv.org/abs/1911.02163v1
PDF	https://arxiv.org/pdf/1911.02163v1.pdf
PWC	https://paperswithcode.com/paper/srinet-learning-strictly-rotation-invariant
Repo
Framework

Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet)


Title	Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet)
Authors	Ali Habibnia, Esfandiar Maasoumi
Abstract	This paper considers improved forecasting in possibly nonlinear dynamic settings, with high-dimension predictors (“big data” environments). To overcome the curse of dimensionality and manage data and model complexity, we examine shrinkage estimation of a back-propagation algorithm of a deep neural net with skip-layer connections. We expressly include both linear and nonlinear components. This is a high-dimensional learning approach including both sparsity L1 and smoothness L2 penalties, allowing high-dimensionality and nonlinearity to be accommodated in one step. This approach selects significant predictors as well as the topology of the neural network. We estimate optimal values of shrinkage hyperparameters by incorporating a gradient-based optimization technique resulting in robust predictions with improved reproducibility. The latter has been an issue in some approaches. This is statistically interpretable and unravels some network structure, commonly left to a black box. An additional advantage is that the nonlinear part tends to get pruned if the underlying process is linear. In an application to forecasting equity returns, the proposed approach captures nonlinear dynamics between equities to enhance forecast performance. It offers an appreciable improvement over current univariate and multivariate models by RMSE and actual portfolio performance.
Tasks
Published	2019-04-25
URL	http://arxiv.org/abs/1904.11145v1
PDF	http://arxiv.org/pdf/1904.11145v1.pdf
PWC	https://paperswithcode.com/paper/forecasting-in-big-data-environments-an
Repo
Framework
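The abstract above combines a sparsity (L1) and a smoothness (L2) penalty. A minimal sketch of such a combined penalty on a flat list of weights follows. The exact form used by AAShNet is not given in the abstract, so the sum-of-absolute-values and sum-of-squares terms, the function name, and the hyperparameter names `lambda_l1` and `lambda_l2` are assumptions.

```python
def shrinkage_penalty(weights, lambda_l1, lambda_l2):
    """Combined L1 + L2 penalty on a flat iterable of weights.

    Assumed form: lambda_l1 * sum(|w|) + lambda_l2 * sum(w^2).
    """
    l1_term = sum(abs(w) for w in weights)   # encourages sparsity
    l2_term = sum(w * w for w in weights)    # encourages small, smooth weights
    return lambda_l1 * l1_term + lambda_l2 * l2_term


# Illustrative values only.
penalty = shrinkage_penalty([1.0, -2.0, 0.0], lambda_l1=0.5, lambda_l2=0.25)
```

In training, a term like this would be added to the forecasting loss. The abstract says the shrinkage hyperparameters are themselves tuned by a gradient-based procedure, which this sketch does not attempt.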