January 30, 2020

3076 words 15 mins read

Paper Group ANR 274

Extending and Automating Basic Probability Theory with Propositional Computability Logic. Array Codes with Local Properties. Smoothing Structured Decomposable Circuits. Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning. Scaling data-driven robotics with reward sketching and batch reinforcement learning. Sparse-GAN …

Extending and Automating Basic Probability Theory with Propositional Computability Logic


Title	Extending and Automating Basic Probability Theory with Propositional Computability Logic
Authors	Keehang Kwon
Abstract	Classical probability theory is formulated using sets. In this paper, we extend classical probability theory with propositional computability logic. Unlike other formalisms, computability logic is built on the notion of events/games, which is central to probability theory. The probability theory based on CoL is therefore useful for {\it automating} uncertainty reasoning. We describe some basic properties of this new probability theory. We also discuss a novel isomorphism between the set operations and computability logic operations.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07375v2
PDF	https://arxiv.org/pdf/1909.07375v2.pdf
PWC	https://paperswithcode.com/paper/extending-and-automating-basic-probability
Repo
Framework

Array Codes with Local Properties


Title	Array Codes with Local Properties
Authors	Mario Blaum, Steven R. Hetzler
Abstract	In general, array codes consist of $m\times n$ arrays and in many cases, the arrays satisfy parity constraints along lines of different slopes (generally with a toroidal topology). Such codes are useful for RAID type of architectures, since they allow to replace finite field operations by XORs. We present expansions to traditional array codes of this type, like Blaum-Roth (BR) and extended EVENODD codes, by adding parity on columns. This vertical parity allows for recovery of one or more symbols in a column locally, i.e., by using the remaining symbols in the column without invoking the rest of the array. Properties and applications of the new codes are discussed, in particular to Locally Recoverable (LRC) codes.
Tasks
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11731v2
PDF	https://arxiv.org/pdf/1906.11731v2.pdf
PWC	https://paperswithcode.com/paper/array-codes-with-local-properties
Repo
Framework

Smoothing Structured Decomposable Circuits


Title	Smoothing Structured Decomposable Circuits
Authors	Andy Shih, Guy Van den Broeck, Paul Beame, Antoine Amarilli
Abstract	We study the task of smoothing a circuit, i.e., ensuring that all children of a plus-gate mention the same variables. Circuits serve as the building blocks of state-of-the-art inference algorithms on discrete probabilistic graphical models and probabilistic programs. They are also important for discrete density estimation algorithms. Many of these tasks require the input circuit to be smooth. However, smoothing has not been studied in its own right yet, and only a trivial quadratic algorithm is known. This paper studies efficient smoothing for structured decomposable circuits. We propose a near-linear time algorithm for this task and explore lower bounds for smoothing decomposable circuits, using existing results on range-sum queries. Further, for the important case of All-Marginals, we show a more efficient linear-time algorithm. We validate experimentally the performance of our methods.
Tasks	Density Estimation
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00311v2
PDF	https://arxiv.org/pdf/1906.00311v2.pdf
PWC	https://paperswithcode.com/paper/190600311
Repo
Framework

Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning


Title	Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning
Authors	Arjun Manoharan, Rahul Ramesh, Balaraman Ravindran
Abstract	Option discovery and skill acquisition frameworks are integral to the functioning of a Hierarchically organized Reinforcement learning agent. However, such techniques often yield a large number of options or skills, which can potentially be represented succinctly by filtering out any redundant information. Such a reduction can reduce the required computation while also improving the performance on a target task. In order to compress an array of option policies, we attempt to find a policy basis that accurately captures the set of all options. In this work, we propose Option Encoder, an auto-encoder based framework with intelligently constrained weights, that helps discover a collection of basis policies. The policy basis can be used as a proxy for the original set of skills in a suitable hierarchically organized framework. We demonstrate the efficacy of our method on a collection of grid-worlds and on the high-dimensional Fetch-Reach robotic manipulation task by evaluating the obtained policy basis on a set of downstream tasks.
Tasks
Published	2019-09-09
URL	https://arxiv.org/abs/1909.04134v1
PDF	https://arxiv.org/pdf/1909.04134v1.pdf
PWC	https://paperswithcode.com/paper/option-encoder-a-framework-for-discovering-a
Repo
Framework

Scaling data-driven robotics with reward sketching and batch reinforcement learning


Title	Scaling data-driven robotics with reward sketching and batch reinforcement learning
Authors	Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyu Wang
Abstract	We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.
Tasks
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12200v2
PDF	https://arxiv.org/pdf/1909.12200v2.pdf
PWC	https://paperswithcode.com/paper/a-framework-for-data-driven-robotics
Repo
Framework

Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image


Title	Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image
Authors	Kang Zhou, Shenghua Gao, Jun Cheng, Zaiwang Gu, Huazhu Fu, Zhi Tu, Jianlong Yang, Yitian Zhao, Jiang Liu
Abstract	With the development of convolutional neural network, deep learning has shown its success for retinal disease detection from optical coherence tomography (OCT) images. However, deep learning often relies on large scale labelled data for training, which is oftentimes challenging especially for disease with low occurrence. Moreover, a deep learning system trained from data-set with one or a few diseases is unable to detect other unseen diseases, which limits the practical usage of the system in disease screening. To address the limitation, we propose a novel anomaly detection framework termed Sparsity-constrained Generative Adversarial Network (Sparse-GAN) for disease screening where only healthy data are available in the training set. The contributions of Sparse-GAN are two-folds: 1) The proposed Sparse-GAN predicts the anomalies in latent space rather than image-level; 2) Sparse-GAN is constrained by a novel Sparsity Regularization Net. Furthermore, in light of the role of lesions for disease screening, we present to leverage on an anomaly activation map to show the heatmap of lesions. We evaluate our proposed Sparse-GAN on a publicly available dataset, and the results show that the proposed method outperforms the state-of-the-art methods.
Tasks	Anomaly Detection
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12527v3
PDF	https://arxiv.org/pdf/1911.12527v3.pdf
PWC	https://paperswithcode.com/paper/sparse-gan-sparsity-constrained-generative
Repo
Framework

Online Inference for Advertising Auctions


Title	Online Inference for Advertising Auctions
Authors	Caio Waisman, Harikesh S. Nair, Carlos Carrion, Nan Xu
Abstract	Advertisers that engage in real-time bidding (RTB) to display their ads commonly have two goals: learning their optimal bidding policy and estimating the expected effect of exposing users to their ads. Typical strategies to accomplish one of these goals tend to ignore the other, creating an apparent tension between the two. This paper exploits the economic structure of the bid optimization problem faced by advertisers to show that these two objectives can actually be perfectly aligned. By framing the advertiser’s problem as a multi-armed bandit (MAB) problem, we propose a modified Thompson Sampling (TS) algorithm that concurrently learns the optimal bidding policy and estimates the expected effect of displaying the ad while minimizing economic losses from potential sub-optimal bidding. Simulations show that not only the proposed method successfully accomplishes the advertiser’s goals, but also does so at a much lower cost than more conventional experimentation policies aimed at performing causal inference.
Tasks	Causal Inference
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08600v1
PDF	https://arxiv.org/pdf/1908.08600v1.pdf
PWC	https://paperswithcode.com/paper/online-inference-for-advertising-auctions
Repo
Framework

Blocking Bandits


Title	Blocking Bandits
Authors	Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai
Abstract	We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless randomized exponential time hypothesis is false, by mapping to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically $(1-1/e)$ optimal. When the rewards are unknown, we design a UCB based algorithm which is shown to have $c \log T + o(\log T)$ cumulative regret against the greedy algorithm, leveraging the free exploration of arms due to the unavailability. Finally, when all the delays are equal the problem reduces to Combinatorial Semi-bandits providing us with a lower bound of $c’ \log T+ \omega(\log T)$.
Tasks	Product Recommendation
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11975v1
PDF	https://arxiv.org/pdf/1907.11975v1.pdf
PWC	https://paperswithcode.com/paper/blocking-bandits
Repo
Framework

Generalized Policy Iteration for Optimal Control in Continuous Time


Title	Generalized Policy Iteration for Optimal Control in Continuous Time
Authors	Jingliang Duan, Shengbo Eben Li, Zhengyu Liu, Monimoy Bujarbaruah, Bo Cheng
Abstract	This paper proposes the Deep Generalized Policy Iteration (DGPI) algorithm to find the infinite horizon optimal control policy for general nonlinear continuous-time systems with known dynamics. Unlike existing adaptive dynamic programming algorithms for continuous time systems, DGPI does not require the admissibility of initialized policy, and input-affine nature of controlled systems for convergence. Our algorithm employs the actor-critic architecture to approximate both policy and value functions with the purpose of iteratively solving the Hamilton-Jacobi-Bellman equation. Both the policy and value functions are approximated by deep neural networks. Given any arbitrary initial policy, the proposed DGPI algorithm can eventually converge to an admissible, and subsequently an optimal policy for an arbitrary nonlinear system. We also relax the update termination conditions of both the policy evaluation and improvement processes, which leads to a faster convergence speed than conventional Policy Iteration (PI) methods, for the same architecture of function approximators. We further prove the convergence and optimality of the algorithm with thorough Lyapunov analysis, and demonstrate its generality and efficacy using two detailed numerical examples.
Tasks
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05402v1
PDF	https://arxiv.org/pdf/1909.05402v1.pdf
PWC	https://paperswithcode.com/paper/generalized-policy-iteration-for-optimal
Repo
Framework

A supervised-learning-based strategy for optimal demand response of an HVAC System


Title	A supervised-learning-based strategy for optimal demand response of an HVAC System
Authors	Youngjin Kim
Abstract	The large thermal capacity of buildings enables heating, ventilating, and air-conditioning (HVAC) systems to be exploited as demand response (DR) resources. Optimal DR of HVAC units is challenging, particularly for multi-zone buildings, because this requires detailed physics-based models of zonal temperature variations for HVAC system operation and building thermal conditions. This paper proposes a new strategy for optimal DR of an HVAC system in a multi-zone building, based on supervised learning (SL). Artificial neural networks (ANNs) are trained with data obtained under normal building operating conditions. The ANNs are replicated using piecewise linear equations, which are explicitly integrated into an optimal scheduling problem for price-based DR. The optimization problem is solved for various electricity prices and building thermal conditions. The solutions are further used to train a deep neural network (DNN) to directly determine the optimal DR schedule, referred to here as supervised-learning-aided meta-prediction (SLAMP). Case studies are performed using three different methods: explicit ANN replication (EAR), SLAMP, and physics-based modeling. The case study results verify the effectiveness of the proposed SL-based strategy, in terms of both practical applicability and computational time, while also ensuring the thermal comfort of occupants and cost-effective operation of the HVAC system.
Tasks
Published	2019-04-29
URL	http://arxiv.org/abs/1904.13304v1
PDF	http://arxiv.org/pdf/1904.13304v1.pdf
PWC	https://paperswithcode.com/paper/a-supervised-learning-based-strategy-for
Repo
Framework

Approximation spaces of deep neural networks


Title	Approximation spaces of deep neural networks
Authors	Rémi Gribonval, Gitta Kutyniok, Morten Nielsen, Felix Voigtlaender
Abstract	We study the expressivity of deep neural networks. Measuring a network’s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of “skip connections” does not change the resulting approximation spaces. We also discuss the role of the network’s nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.
Tasks
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01208v2
PDF	https://arxiv.org/pdf/1905.01208v2.pdf
PWC	https://paperswithcode.com/paper/approximation-spaces-of-deep-neural-networks
Repo
Framework

CaDIS: Cataract Dataset for Image Segmentation


Title	CaDIS: Cataract Dataset for Image Segmentation
Authors	Evangello Flouty, Abdolrahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes-Hurtado, Hinde Taleb, Santiago Barbarisi, Gwenole Quellec, Danail Stoyanov
Abstract	Video signals provide a wealth of information about surgical procedures and are the main sensory cue for surgeons. Video processing and understanding can be used to empower computer assisted interventions (CAI) as well as the development of detailed post-operative analysis of the surgical intervention. A fundamental building block to such capabilities is the ability to understand and segment video into semantic labels that differentiate and localize tissue types and different instruments. Deep learning has advanced semantic segmentation techniques dramatically in recent years but is fundamentally reliant on the availability of labelled datasets used to train models. In this paper, we introduce a high quality dataset for semantic segmentation in Cataract surgery. We generated this dataset from the CATARACTS challenge dataset, which is publicly available. To the best of our knowledge, this dataset has the highest quality annotation in surgical data to date. We introduce the dataset and then show the automatic segmentation performance of state-of-the-art models on that dataset as a benchmark.
Tasks	Semantic Segmentation
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11586v4
PDF	https://arxiv.org/pdf/1906.11586v4.pdf
PWC	https://paperswithcode.com/paper/cadss-cataract-dataset-for-semantic
Repo
Framework

Dual Adversarial Learning with Attention Mechanism for Fine-grained Medical Image Synthesis


Title	Dual Adversarial Learning with Attention Mechanism for Fine-grained Medical Image Synthesis
Authors	Dong Nie, Lei Xiang, Qian Wang, Dinggang Shen
Abstract	Medical imaging plays a critical role in various clinical applications. However, due to multiple considerations such as cost and risk, the acquisition of certain image modalities could be limited. To address this issue, many cross-modality medical image synthesis methods have been proposed. However, the current methods cannot well model the hard-to-synthesis regions (e.g., tumor or lesion regions). To address this issue, we propose a simple but effective strategy, that is, we propose a dual-discriminator (dual-D) adversarial learning system, in which, a global-D is used to make an overall evaluation for the synthetic image, and a local-D is proposed to densely evaluate the local regions of the synthetic image. More importantly, we build an adversarial attention mechanism which targets at better modeling hard-to-synthesize regions (e.g., tumor or lesion regions) based on the local-D. Experimental results show the robustness and accuracy of our method in synthesizing fine-grained target images from the corresponding source images. In particular, we evaluate our method on two datasets, i.e., to address the tasks of generating T2 MRI from T1 MRI for the brain tumor images and generating MRI from CT. Our method outperforms the state-of-the-art methods under comparison in all datasets and tasks. And the proposed difficult-region-aware attention mechanism is also proved to be able to help generate more realistic images, especially for the hard-to-synthesize regions.
Tasks	Image Generation
Published	2019-07-07
URL	https://arxiv.org/abs/1907.03297v1
PDF	https://arxiv.org/pdf/1907.03297v1.pdf
PWC	https://paperswithcode.com/paper/dual-adversarial-learning-with-attention
Repo
Framework

Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI


Title	Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI
Authors	Endre Grøvik, Darvin Yi, Michael Iv, Elisabeth Tong, Daniel L. Rubin, Greg Zaharchuk
Abstract	Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolution neural network (CNN). In this retrospective study, a total of 156 patients with brain metastases from several primary cancers were included. Pre-therapy MR images (1.5T and 3T) included pre- and post-gadolinium T1-weighted 3D fast spin echo, post-gadolinium T1-weighted 3D axial IR-prepped FSPGR, and 3D fluid attenuated inversion recovery. The ground truth was established by manual delineation by two experienced neuroradiologists. CNN training/development was performed using 100 and 5 patients, respectively, with a 2.5D network based on a GoogLeNet architecture. The results were evaluated in 51 patients, equally separated into those with few (1-3), multiple (4-10), and many (>10) lesions. Network performance was evaluated using precision, recall, Dice/F1 score, and ROC-curve statistics. For an optimal probability threshold, detection and segmentation performance was assessed on a per metastasis basis. The area under the ROC-curve (AUC), averaged across all patients, was 0.98. The AUC in the subgroups was 0.99, 0.97, and 0.97 for patients having 1-3, 4-10, and >10 metastases, respectively. Using an average optimal probability threshold determined by the development set, precision, recall, and Dice-score were 0.79, 0.53, and 0.79, respectively. At the same probability threshold, the network showed an average false positive rate of 8.3/patient (no lesion-size limit) and 3.4/patient (10 mm3 lesion size limit). In conclusion, a deep learning approach using multi-sequence MRI can aid in the detection and segmentation of brain metastases.
Tasks
Published	2019-03-18
URL	http://arxiv.org/abs/1903.07988v1
PDF	http://arxiv.org/pdf/1903.07988v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-enables-automatic-detection-and
Repo
Framework

A Short Survey on Probabilistic Reinforcement Learning


Title	A Short Survey on Probabilistic Reinforcement Learning
Authors	Reazul Hasan Russel
Abstract	A reinforcement learning agent tries to maximize its cumulative payoff by interacting in an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with highest known rewards. Yet, in sensitive domains, collecting more data with exploration is not always possible, but it is important to find a policy with a certain performance guaranty. In this paper, we present a brief survey of methods available in the literature for balancing exploration-exploitation trade off and computing robust solutions from fixed samples in reinforcement learning.
Tasks
Published	2019-01-21
URL	http://arxiv.org/abs/1901.07010v1
PDF	http://arxiv.org/pdf/1901.07010v1.pdf
PWC	https://paperswithcode.com/paper/a-short-survey-on-probabilistic-reinforcement
Repo
Framework