Paper Group ANR 815
Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems
Title | Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems |
Authors | Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro |
Abstract | We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman’s optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of the stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs that grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a controllable error in the descent direction of the underlying optimization method. We prove that the resulting algorithm, called KQ-Learning, converges with probability 1 to a stationary point of this problem, yielding a fixed point of the Bellman optimality operator under the hypothesis that it belongs to the RKHS. Under constant learning rates, we further obtain convergence to a small Bellman error that depends on the chosen learning rates. Numerical evaluation on the Continuous Mountain Car and Inverted Pendulum tasks yields convergent, parsimonious learned action-value functions and policies that are competitive with the state of the art and exhibit reliable, reproducible learning behavior. |
Tasks | Q-Learning, Stochastic Optimization |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07323v1 |
http://arxiv.org/pdf/1804.07323v1.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-stochastic-compositional |
Repo | |
Framework | |
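The functional update the abstract describes can be illustrated with a toy sketch. This is not the paper's KQ-Learning algorithm: the kernel, the step size, and the weight-threshold pruning (a crude stand-in for Kernel Orthogonal Matching Pursuit) are all illustrative assumptions.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel over concatenated state-action vectors
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

class KernelQ:
    """Toy nonparametric Q-function in an RKHS:
    Q(s, a) = sum_i w_i * k((s, a), (s_i, a_i)).
    Each stochastic update adds one dictionary atom; pruning small
    weights loosely imitates the sparsification role of KOMP."""

    def __init__(self, step=0.5, discount=0.9, tol=1e-3):
        self.dict_, self.w = [], []
        self.step, self.discount, self.tol = step, discount, tol

    def q(self, sa):
        return sum(w * rbf(sa, d) for w, d in zip(self.w, self.dict_))

    def update(self, sa, reward, sa_next_best):
        # Temporal-difference error with a discounted bootstrap target
        td = reward + self.discount * self.q(sa_next_best) - self.q(sa)
        # Functional stochastic gradient step: append a new kernel atom
        self.dict_.append(np.asarray(sa, float))
        self.w.append(self.step * td)
        # Prune atoms whose weights fall below tolerance (sparsification)
        kept = [(w, d) for w, d in zip(self.w, self.dict_) if abs(w) > self.tol]
        self.w = [w for w, _ in kept]
        self.dict_ = [d for _, d in kept]
```

The point of the sketch is the memory behavior: without the pruning step the dictionary grows by one state-action pair per update, which is the "complexity explosion" the abstract refers to.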
Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning
Title | Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning |
Authors | Izumi Karino, Kazutoshi Tanaka, Ryuma Niiyama, Yasuo Kuniyoshi |
Abstract | This paper proposes an exploration method for deep reinforcement learning based on parameter space noise. Recent studies have experimentally shown that parameter space noise results in better exploration than the commonly used action space noise. Previous methods devised a way to update the diagonal covariance matrix of the noise distribution but did not consider the direction of the noise vector or correlations between its components. In addition, fast updates of the noise distribution are required to facilitate policy learning. We propose a method that deforms the noise distribution according to the accumulated returns and the noise vectors that led to those returns. Moreover, this method switches between isotropic and directional exploration in parameter space according to the obtained rewards. We validate our exploration strategy on the OpenAI Gym continuous environments and on modified environments with sparse rewards. The proposed method achieves results that are competitive with a previous method on the baseline tasks. Moreover, our approach exhibits better performance in sparse-reward environments through exploration with the switching strategy. |
Tasks | |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06570v2 |
http://arxiv.org/pdf/1809.06570v2.pdf | |
PWC | https://paperswithcode.com/paper/switching-isotropic-and-directional |
Repo | |
Framework | |
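As a rough illustration of the switching idea only: the paper's actual update deforms the whole noise distribution from accumulated returns, whereas the sketch below merely toggles between isotropic Gaussian noise and noise along one remembered "good" direction. The switching condition and the noise scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(theta, best_direction, improved, sigma=0.1):
    """Toy switching exploration in parameter space: if recent returns
    improved, explore directionally along the accumulated promising
    direction; otherwise fall back to isotropic Gaussian noise."""
    if improved and np.linalg.norm(best_direction) > 0:
        # Directional: noise concentrated along the promising direction
        unit = best_direction / np.linalg.norm(best_direction)
        return theta + sigma * rng.standard_normal() * unit
    # Isotropic: equal noise variance in every parameter dimension
    return theta + sigma * rng.standard_normal(theta.shape)
```

Directional noise reuses information from past rollouts, while the isotropic branch keeps exploration alive when no direction has paid off yet.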
Identifying Land Patterns from Satellite Imagery in Amazon Rainforest using Deep Learning
Title | Identifying Land Patterns from Satellite Imagery in Amazon Rainforest using Deep Learning |
Authors | Somnath Rakshit, Soumyadeep Debnath, Dhiman Mondal |
Abstract | The Amazon rainforests have been suffering widespread damage, via both natural and artificial means. Every minute, it is estimated that the world loses forest cover the size of 48 football fields. Deforestation in the Amazon rainforest has led to drastically reduced biodiversity, loss of habitat, climate change, and other biological losses. In this respect, it has become essential to track how the nature of these forests changes over time. Image classification using deep learning can help speed up this process by removing the manual task of classifying each image. Here, it is shown how convolutional neural networks can be used to track changes in land patterns in the Amazon rainforests. In this work, a testing accuracy of 96.71% was obtained. This can help governments and other agencies track changes in land patterns more effectively and accurately. |
Tasks | Image Classification |
Published | 2018-09-02 |
URL | http://arxiv.org/abs/1809.00340v1 |
http://arxiv.org/pdf/1809.00340v1.pdf | |
PWC | https://paperswithcode.com/paper/identifying-land-patterns-from-satellite |
Repo | |
Framework | |
The Entropy of Artificial Intelligence and a Case Study of AlphaZero from Shannon’s Perspective
Title | The Entropy of Artificial Intelligence and a Case Study of AlphaZero from Shannon’s Perspective |
Authors | Bo Zhang, Bin Chen, Jin-lin Peng |
Abstract | The recently released AlphaZero algorithm achieves superhuman performance in the games of chess, shogi and Go, which raises two open questions. First, since there is a finite number of possibilities in each game, is there a quantifiable measure of intelligence for evaluating intelligent systems such as AlphaZero? Second, AlphaZero introduces sophisticated reinforcement learning and self-play to efficiently encode the possible states; is there a simple information-theoretic model that represents the learning process and offers more insight into fostering strong AI systems? This paper explores these two questions by proposing a simple variant of Shannon’s communication model: the concept of intelligence entropy and the Unified Intelligence-Communication Model, which provide an information-theoretic metric for investigating the intelligence level, as well as a bound for intelligent agents in the form of Shannon’s capacity, namely, the intelligence capacity. The paper then applies the concept and model to AlphaZero as a case study and explains the learning process of an intelligent agent as turbo-like iterative decoding, so that the learning performance of AlphaZero may be quantitatively evaluated. Finally, conclusions are provided along with theoretical and practical remarks. |
Tasks | |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.05794v2 |
http://arxiv.org/pdf/1812.05794v2.pdf | |
PWC | https://paperswithcode.com/paper/the-entropy-of-artificial-intelligence-and-a |
Repo | |
Framework | |
A Generic Approach to Lung Field Segmentation from Chest Radiographs using Deep Space and Shape Learning
Title | A Generic Approach to Lung Field Segmentation from Chest Radiographs using Deep Space and Shape Learning |
Authors | Awais Mansoor, Juan J. Cerrolaza, Geovanny Perez, Elijah Biggs, Kazunori Okada, Gustavo Nino, Marius George Linguraru |
Abstract | Computer-aided diagnosis (CAD) techniques for lung field segmentation from chest radiographs (CXR) have been proposed for adult cohorts, but rarely for pediatric subjects. Statistical shape models (SSMs), the workhorse of most state-of-the-art CXR-based lung field segmentation methods, do not efficiently accommodate shape variation of the lung field during the pediatric developmental stages. The main contributions of our work are: (1) a generic lung field segmentation framework from CXR accommodating large shape variation for adult and pediatric cohorts; (2) a deep representation learning detection mechanism, \emph{ensemble space learning}, for robust object localization; and (3) \emph{marginal shape deep learning} for the shape deformation parameter estimation. Unlike the iterative approach of conventional SSMs, the proposed shape learning mechanism transforms the parameter space into marginal subspaces that are solvable efficiently using the recursive representation learning mechanism. Furthermore, our method is the first to include the challenging retro-cardiac region in the CXR-based lung segmentation for accurate lung capacity estimation. The framework is evaluated on 668 CXRs of patients between 3 months and 89 years of age. We obtain a mean Dice similarity coefficient of $0.96\pm0.03$ (including the retro-cardiac region). For a given accuracy, the proposed approach is also found to be faster than conventional SSM-based iterative segmentation methods. The computational simplicity of the proposed generic framework could be similarly applied to the fast segmentation of other deformable objects. |
Tasks | Object Localization, Representation Learning |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04339v1 |
http://arxiv.org/pdf/1807.04339v1.pdf | |
PWC | https://paperswithcode.com/paper/a-generic-approach-to-lung-field-segmentation |
Repo | |
Framework | |
Deep Reinforcement Fuzzing
Title | Deep Reinforcement Fuzzing |
Authors | Konstantin Böttinger, Patrice Godefroid, Rishabh Singh |
Abstract | Fuzzing is the process of finding security vulnerabilities in input-processing code by repeatedly testing the code with modified inputs. In this paper, we formalize fuzzing as a reinforcement learning problem using the concept of Markov decision processes. This in turn allows us to apply state-of-the-art deep Q-learning algorithms that optimize rewards, which we define from runtime properties of the program under test. By observing the rewards caused by mutating with a specific set of actions performed on an initial program input, the fuzzing agent learns a policy that can next generate new higher-reward inputs. We have implemented this new approach, and preliminary empirical evidence shows that reinforcement fuzzing can outperform baseline random fuzzing. |
Tasks | Q-Learning |
Published | 2018-01-14 |
URL | http://arxiv.org/abs/1801.04589v1 |
http://arxiv.org/pdf/1801.04589v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-fuzzing |
Repo | |
Framework | |
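The MDP formulation of fuzzing can be sketched in miniature. The paper uses deep Q-networks with rewards defined from runtime properties of the program under test (e.g. coverage); the toy state abstraction (input length), the mutation set, and the tabular Q-update below are illustrative assumptions only.

```python
import random

# Hypothetical mutation actions over a byte string
ACTIONS = ["flip_first_byte", "append_byte", "truncate"]

def mutate(data: bytes, action: str) -> bytes:
    if action == "flip_first_byte" and data:
        return bytes([data[0] ^ 0xFF]) + data[1:]
    if action == "append_byte":
        return data + b"A"
    if action == "truncate" and len(data) > 1:
        return data[:-1]
    return data

def fuzz(seed: bytes, reward_fn, episodes=200, eps=0.2, alpha=0.5, gamma=0.9):
    """Tabular Q-learning over (state, action) pairs, where the state is
    a crude abstraction of the current input (its length)."""
    q = {}
    data = seed
    for _ in range(episodes):
        state = len(data)
        # Epsilon-greedy choice of a mutation action
        if random.random() < eps:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        new = mutate(data, action)
        r = reward_fn(new)  # stand-in for coverage/runtime-based reward
        nxt = len(new)
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target
        q[(state, action)] = q.get((state, action), 0.0) + alpha * (
            r + gamma * best_next - q.get((state, action), 0.0))
        data = new
    return q, data
```

With a reward that favors, say, longer inputs, the agent drifts toward mutations that earn it; in the paper the learned policy instead steers mutations toward inputs that exercise new program behavior.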
A Latent Variable Approach to Gaussian Process Modeling with Qualitative and Quantitative Factors
Title | A Latent Variable Approach to Gaussian Process Modeling with Qualitative and Quantitative Factors |
Authors | Yichi Zhang, Siyu Tao, Wei Chen, Daniel W. Apley |
Abstract | Computer simulations often involve both qualitative and numerical inputs. Existing Gaussian process (GP) methods for handling this mainly assume a different response surface for each combination of levels of the qualitative factors and relate them via a multiresponse cross-covariance matrix. We introduce a substantially different approach that maps each qualitative factor to an underlying numerical latent variable (LV), with the mapped value for each level estimated similarly to the correlation parameters. This provides a parsimonious GP parameterization that treats qualitative factors the same as numerical variables and views them as affecting the response via similar physical mechanisms. This has strong physical justification, as the effects of a qualitative factor in any physics-based simulation model must always be due to some underlying numerical variables. Even when the underlying variables are many, sufficient dimension reduction arguments imply that their effects can be represented by a low-dimensional LV. This conjecture is supported by the superior predictive performance observed across a variety of examples. Moreover, the mapped LVs provide substantial insight into the nature and effects of the qualitative factors. |
Tasks | Dimensionality Reduction |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07504v2 |
http://arxiv.org/pdf/1806.07504v2.pdf | |
PWC | https://paperswithcode.com/paper/a-latent-variable-approach-to-gaussian |
Repo | |
Framework | |
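The latent-variable mapping can be made concrete with a small kernel sketch. In a real GP fit, the latent coordinates would be estimated along with the correlation parameters; here the `latent` dictionary, the RBF form, and the one-dimensional latent space are assumptions for illustration.

```python
import numpy as np

def lv_kernel(x1, x2, level1, level2, latent, gamma=1.0):
    """Kernel for mixed numerical/qualitative inputs: each level of the
    qualitative factor is mapped to a low-dimensional latent vector
    (latent: level -> vector), and the kernel then treats the mapped
    values exactly like additional numerical inputs."""
    z1 = np.concatenate([np.atleast_1d(x1), latent[level1]])
    z2 = np.concatenate([np.atleast_1d(x2), latent[level2]])
    return np.exp(-gamma * np.sum((z1 - z2) ** 2))
```

Levels whose latent points sit close together behave like nearly identical inputs, which is how the parsimonious parameterization replaces a full multiresponse cross-covariance matrix.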
Understanding training and generalization in deep learning by Fourier analysis
Title | Understanding training and generalization in deep learning by Fourier analysis |
Authors | Zhiqin John Xu |
Abstract | Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs)—equipped with many more parameters than training data and trained by (stochastic) gradient-based methods—often achieve remarkably low generalization error. Contribution: We study DNN training by Fourier analysis. Our theoretical framework explains: i) a DNN trained with (stochastic) gradient-based methods often gives low-frequency components of the target function a higher priority during training; ii) small initialization leads to good generalization ability of the DNN while preserving its ability to fit any function. These results are further confirmed by experiments in which DNNs fit natural images, one-dimensional functions, and the MNIST dataset. |
Tasks | |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04295v4 |
http://arxiv.org/pdf/1808.04295v4.pdf | |
PWC | https://paperswithcode.com/paper/understanding-training-and-generalization-in |
Repo | |
Framework | |
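The frequency-by-frequency view of training can be sketched with a simple diagnostic (not the paper's framework itself, just the kind of measurement it reasons about): decompose the residual between target and prediction with a DFT and watch which frequencies shrink first.

```python
import numpy as np

def frequency_errors(target, prediction):
    """Magnitude of the DFT of the residual at each nonnegative
    frequency. Under the low-frequency-first picture of DNN training,
    the low-frequency entries are expected to shrink earliest."""
    residual = np.asarray(target, float) - np.asarray(prediction, float)
    return np.abs(np.fft.rfft(residual))
```

For example, a model that has fit nothing of a pure sine target leaves all of its residual energy at that sine's frequency.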
Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion
Title | Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion |
Authors | Randall Balestriero, Herve Glotin, Richard Baraniuk |
Abstract | Deep Neural Networks (DNNs) provide state-of-the-art solutions in several difficult machine perceptual tasks. However, their performance relies on the availability of a large set of labeled training data, which limits the breadth of their applicability. Hence, there is a need for new {\em semi-supervised learning} methods for DNNs that can leverage both (a small amount of) labeled and unlabeled training data. In this paper, we develop a general loss function enabling DNNs of any topology to be trained in a semi-supervised manner without extra hyper-parameters. As opposed to current semi-supervised techniques based on topology-specific or unstable approaches, ours is both robust and general. We demonstrate that our approach reaches state-of-the-art performance on the SVHN ($9.82\%$ test error, with $500$ labels and a Wide ResNet) and CIFAR10 ($16.38\%$ test error, with $8000$ labels and a sigmoid convolutional neural network) data sets. |
Tasks | |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10172v1 |
http://arxiv.org/pdf/1802.10172v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-enabled-by |
Repo | |
Framework | |
Using Taste Groups for Collaborative Filtering
Title | Using Taste Groups for Collaborative Filtering |
Authors | Farhan Khawar, Nevin L. Zhang |
Abstract | Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and domain independent. However, there is a lack of negative examples. Existing works circumvent this problem by making various assumptions regarding the unconsumed items, which fail to hold when the user did not consume an item because she was unaware of it. In this paper, we propose a novel method for addressing the lack of negative examples in implicit feedback. The motivation is that if there is a large group of users who share the same taste and none of them consumed an item, then it is highly likely that the item is irrelevant to this taste. We use Hierarchical Latent Tree Analysis (HLTA) to identify taste-based user groups and make recommendations for a user based on her memberships in the groups. |
Tasks | |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09785v1 |
http://arxiv.org/pdf/1808.09785v1.pdf | |
PWC | https://paperswithcode.com/paper/using-taste-groups-for-collaborative |
Repo | |
Framework | |
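The motivating rule can be sketched directly (this is only the negative-example heuristic, not the HLTA pipeline that actually discovers the taste groups; the group-size threshold is an assumption):

```python
def taste_negatives(groups, consumed, min_group_size=3):
    """If a sufficiently large group of users shares a taste and none
    of them consumed an item, treat that item as a likely negative for
    the taste. groups: taste -> set of users; consumed: user -> set of
    items. Returns taste -> set of inferred negative items."""
    all_items = set().union(*consumed.values())
    negatives = {}
    for taste, users in groups.items():
        if len(users) < min_group_size:
            continue  # too few users to trust the absence of consumption
        seen = set().union(*(consumed[u] for u in users))
        negatives[taste] = all_items - seen
    return negatives
```

The size threshold is what distinguishes "nobody in this taste group wants the item" from "nobody in this taste group has seen the item yet".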
ClassSim: Similarity between Classes Defined by Misclassification Ratios of Trained Classifiers
Title | ClassSim: Similarity between Classes Defined by Misclassification Ratios of Trained Classifiers |
Authors | Kazuma Arino, Yohei Kikuta |
Abstract | Deep neural networks (DNNs) have achieved exceptional performance in many tasks, particularly supervised classification tasks. However, achievements with supervised classification tasks are based on large datasets with well-separated classes. Typically, real-world applications involve wild datasets that include similar classes; thus, evaluating similarities between classes and understanding relations among classes are important. To address this issue, a similarity metric, ClassSim, based on the misclassification ratios of trained DNNs is proposed herein. We conducted image recognition experiments to demonstrate that the proposed method provides better similarities compared with existing methods and is useful for classification problems. Source code including all experimental results is available at https://github.com/karino2/ClassSim/. |
Tasks | |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01267v1 |
http://arxiv.org/pdf/1802.01267v1.pdf | |
PWC | https://paperswithcode.com/paper/classsim-similarity-between-classes-defined |
Repo | |
Framework | |
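A misclassification-based similarity can be sketched from a confusion matrix. The exact ClassSim definition may differ (see the linked repository); the symmetrization below is one plausible form, shown only to make the idea concrete.

```python
import numpy as np

def class_sim(confusion):
    """Class similarity from misclassification ratios: row-normalize
    the confusion matrix to get P(predicted j | true i), then symmetrize
    so classes frequently mistaken for each other score high."""
    c = np.asarray(confusion, float)
    rates = c / c.sum(axis=1, keepdims=True)  # per-true-class error rates
    sim = (rates + rates.T) / 2.0
    np.fill_diagonal(sim, 1.0)  # a class is maximally similar to itself
    return sim
```

Classes the trained classifier never confuses get similarity 0, which matches the intuition of "well-separated classes" in the abstract.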
Low-Precision Floating-Point Schemes for Neural Network Training
Title | Low-Precision Floating-Point Schemes for Neural Network Training |
Authors | Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas |
Abstract | The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating-point arithmetic for enhancing neural network training in terms of performance and energy efficiency. In the first part of this paper, the behaviour of 12-bit fixed-point arithmetic when training a convolutional neural network with the CIFAR-10 dataset is analysed, showing that such arithmetic is not the most appropriate for the training phase. After that, the paper presents and evaluates, under the same conditions, alternative low-precision arithmetics, starting with 12-bit floating-point arithmetic. These two representations are then leveraged using local scaling in order to increase accuracy and get closer to the baseline 32-bit floating-point arithmetic. Finally, the paper introduces a simplified model in which both the outputs and the gradients of the neural networks are constrained to power-of-two values, using just 7 bits for their representation. The evaluation demonstrates a minimal loss in accuracy for the proposed Power-of-Two neural network, which avoids multiplications and divisions and thereby significantly reduces the training time as well as the energy consumption and memory requirements during the training and inference phases. |
Tasks | |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.05267v1 |
http://arxiv.org/pdf/1804.05267v1.pdf | |
PWC | https://paperswithcode.com/paper/low-precision-floating-point-schemes-for |
Repo | |
Framework | |
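The power-of-two constraint can be sketched as a quantizer. The bit layout here (one sign bit plus an exponent, loosely modeled on the paper's 7-bit scheme) is an assumption, as is rounding in the log domain; the paper's exact encoding may differ.

```python
import numpy as np

def pow2_quantize(x, exp_bits=6):
    """Replace each value with the nearest (in log domain) signed power
    of two, clipping the exponent to the representable range. With such
    values, multiplications reduce to bit shifts."""
    x = np.asarray(x, float)
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
    lo, hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1
    exp = np.clip(exp, lo, hi)  # saturate the exponent range
    return np.where(mag > 0, sign * 2.0 ** exp, 0.0)
```

For example, 0.75 snaps to 1.0 and -3.0 snaps to -4.0, which shows both the appeal (shift-only arithmetic) and the cost (coarse value grid) of the representation.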
BriarPatches: Pixel-Space Interventions for Inducing Demographic Parity
Title | BriarPatches: Pixel-Space Interventions for Inducing Demographic Parity |
Authors | Alexey A. Gritsenko, Alex D’Amour, James Atwood, Yoni Halpern, D. Sculley |
Abstract | We introduce the BriarPatch, a pixel-space intervention that obscures sensitive attributes from representations encoded in pre-trained classifiers. The patches encourage internal model representations not to encode sensitive information, which has the effect of pushing downstream predictors towards exhibiting demographic parity with respect to the sensitive information. The net result is that these BriarPatches provide an intervention mechanism available at the user level, complementing prior research on fair representations that was previously applicable only by model developers and ML experts. |
Tasks | |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.06869v1 |
http://arxiv.org/pdf/1812.06869v1.pdf | |
PWC | https://paperswithcode.com/paper/briarpatches-pixel-space-interventions-for |
Repo | |
Framework | |
Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
Title | Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving |
Authors | Peiliang Li, Tong Qin, Shaojie Shen |
Abstract | We propose a stereo vision-based approach for tracking the camera ego-motion and 3D semantic objects in dynamic autonomous driving scenarios. Instead of directly regressing the 3D bounding box using end-to-end approaches, we propose to use easy-to-label 2D detection and discrete viewpoint classification together with a lightweight semantic inference method to obtain rough 3D object measurements. Based on object-aware camera pose tracking, which is robust in dynamic environments, in combination with our novel dynamic object bundle adjustment (BA) approach to fuse temporal sparse feature correspondences and the semantic 3D measurement model, we obtain 3D object pose, velocity and anchored dynamic point cloud estimation with instance accuracy and temporal consistency. The performance of our proposed method is demonstrated in diverse scenarios. Both the ego-motion estimation and object localization are compared with state-of-the-art solutions. |
Tasks | Autonomous Driving, Motion Estimation, Object Localization, Pose Tracking |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02062v3 |
http://arxiv.org/pdf/1807.02062v3.pdf | |
PWC | https://paperswithcode.com/paper/stereo-vision-based-semantic-3d-object-and |
Repo | |
Framework | |
Plenoptic Monte Carlo Object Localization for Robot Grasping under Layered Translucency
Title | Plenoptic Monte Carlo Object Localization for Robot Grasping under Layered Translucency |
Authors | Zheming Zhou, Zhiqiang Sui, Odest Chadwicke Jenkins |
Abstract | In order to fully function in human environments, robot perception will need to account for the uncertainty caused by translucent materials. Translucency poses several open challenges in the form of transparent objects (e.g., drinking glasses), refractive media (e.g., water), and diffuse partial occlusions (e.g., objects behind stained glass panels). This paper presents Plenoptic Monte Carlo Localization (PMCL) as a method for localizing object poses in the presence of translucency using plenoptic (light-field) observations. We propose a new depth descriptor, the Depth Likelihood Volume (DLV), and its use within a Monte Carlo object localization algorithm. We present results of localizing and manipulating objects with translucent materials and objects occluded by layers of translucency. Our PMCL implementation uses observations from a Lytro first generation light field camera to allow a Michigan Progress Fetch robot to perform grasping. |
Tasks | Object Localization |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09769v4 |
http://arxiv.org/pdf/1806.09769v4.pdf | |
PWC | https://paperswithcode.com/paper/plenoptic-monte-carlo-object-localization-for |
Repo | |
Framework | |