Paper Group ANR 897
Stochastically Dominant Distributional Reinforcement Learning
Title | Stochastically Dominant Distributional Reinforcement Learning |
Authors | John D. Martin, Michal Lyskawinski, Xiaohu Li, Brendan Englot |
Abstract | We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of the random returns induced by actions, producing a more comprehensive and robust evaluation of the environment’s uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm’s performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures. |
Tasks | Distributional Reinforcement Learning |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07318v3 |
https://arxiv.org/pdf/1905.07318v3.pdf | |
PWC | https://paperswithcode.com/paper/stochastically-dominant-distributional |
Repo | |
Framework | |
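The SSD relation this abstract builds on can be checked directly on empirical return samples. Below is a minimal sketch (not the paper's particle-based algorithm), using the expected-shortfall characterization of second-order stochastic dominance; the sample distributions and threshold grid are illustrative assumptions:

```python
import numpy as np

def ssd_dominates(x, y, n_thresholds=200):
    """True if return sample x second-order stochastically dominates y,
    via the characterization: x SSD y iff E[(t - x)+] <= E[(t - y)+]
    for every threshold t."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    for t in np.linspace(lo, hi, n_thresholds):
        if np.mean(np.maximum(t - x, 0.0)) > np.mean(np.maximum(t - y, 0.0)) + 1e-9:
            return False
    return True

rng = np.random.default_rng(0)
safe  = rng.normal(1.0, 0.1, 2000)   # higher mean, low dispersion
risky = rng.normal(0.8, 1.0, 2000)   # lower mean, high dispersion
```

An SSD policy would prefer the action that generated `safe`, since its shortfall is smaller at every threshold.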
Distributed Parameter Estimation in Randomized One-hidden-layer Neural Networks
Title | Distributed Parameter Estimation in Randomized One-hidden-layer Neural Networks |
Authors | Yinsong Wang, Shahin Shahrampour |
Abstract | This paper addresses distributed parameter estimation in randomized one-hidden-layer neural networks. A group of agents sequentially receive measurements of an unknown parameter that is only partially observable to them. We present a fully distributed estimation algorithm in which agents exchange local estimates with their neighbors to collectively identify the true value of the parameter. We prove that this distributed update provides an asymptotically unbiased estimator of the unknown parameter, i.e., the first moment of the expected global error converges to zero asymptotically. We further analyze the efficiency of the proposed estimation scheme by establishing an asymptotic upper bound on the variance of the global error. Applying our method to a real-world dataset on appliance energy prediction, we find that the empirical results verify our theoretical analysis. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09736v2 |
https://arxiv.org/pdf/1909.09736v2.pdf | |
PWC | https://paperswithcode.com/paper/190909736 |
Repo | |
Framework | |
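The consensus-plus-innovation structure the abstract describes can be illustrated with a toy two-agent network in which each agent observes only one component of the parameter. This is a hedged sketch, not the paper's algorithm; the mixing matrix, step size, and noise model are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([2.0, -1.0])      # unknown parameter (ground truth)
W = np.array([[0.5, 0.5],          # doubly stochastic mixing matrix
              [0.5, 0.5]])         # (fully connected 2-agent network)
est = np.zeros((2, 2))             # est[i] = agent i's local estimate
alpha = 0.1                        # innovation step size

for t in range(2000):
    noisy = theta + rng.normal(0.0, 0.1, size=2)
    est = W @ est                  # consensus: mix neighbors' estimates
    for i in range(2):             # innovation: agent i observes only
        est[i, i] += alpha * (noisy[i] - est[i, i])   # component i
```

Both rows of `est` approach `theta`, so each agent recovers even the component it never measures; the residual fluctuation is set by `alpha` and the measurement noise.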
The Knowledge Within: Methods for Data-Free Model Compression
Title | The Knowledge Within: Methods for Data-Free Model Compression |
Authors | Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry |
Abstract | Background: Recently, an extensive amount of research has focused on compressing and accelerating Deep Neural Networks (DNNs). So far, high-compression-rate algorithms have required the entire training dataset, or a subset of it, for fine-tuning and the low-precision calibration process. However, this requirement is unacceptable when sensitive data is involved, as in medical and biometric use-cases. Contributions: We present three methods for generating synthetic samples from trained models. Then, we demonstrate how these samples can be used to fine-tune or to calibrate quantized models with negligible accuracy degradation compared to the original training set, without using any real data in the process. Furthermore, we suggest that our best-performing method, which leverages the intrinsic batch normalization statistics of a trained model, can be used to evaluate data similarity. Our approach opens a path towards genuine data-free model compression, alleviating the need for training data during deployment. |
Tasks | Calibration, Model Compression |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01274v1 |
https://arxiv.org/pdf/1912.01274v1.pdf | |
PWC | https://paperswithcode.com/paper/the-knowledge-within-methods-for-data-free |
Repo | |
Framework | |
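The statistics-matching idea behind the best-performing method can be sketched in miniature: optimize a random batch of inputs so a frozen layer's batch statistics match the stored BatchNorm running mean and variance. This is an illustrative toy with a single linear layer and hand-derived gradients, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n = 4, 3, 64
W = rng.normal(size=(d_in, d_h))            # frozen layer weights (toy model)
mu_run  = rng.normal(size=d_h)              # stored BN running mean
var_run = rng.uniform(0.5, 2.0, size=d_h)   # stored BN running variance

x = rng.normal(size=(n, d_in))              # synthetic batch to optimize
lr = 0.05
for _ in range(5000):
    z = x @ W                               # pre-activations seen by BN
    mu, var = z.mean(0), z.var(0)
    # Gradient of (mu - mu_run)^2 + (var - var_run)^2 w.r.t. z:
    dz = (2.0 / n) * ((mu - mu_run) + 2.0 * (var - var_run) * (z - mu))
    x -= lr * dz @ W.T                      # chain rule back to the inputs
```

After optimization, `x @ W` reproduces the stored statistics, so `x` behaves like training data as far as this layer's BatchNorm is concerned.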
Understanding LSTM – a tutorial into Long Short-Term Memory Recurrent Neural Networks
Title | Understanding LSTM – a tutorial into Long Short-Term Memory Recurrent Neural Networks |
Authors | Ralf C. Staudemeyer, Eric Rothstein Morris |
Abstract | Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are one of the most powerful dynamic classifiers publicly known. The network itself and the related learning algorithms are reasonably well documented, enough to get an idea of how they work. This paper sheds more light on how LSTM-RNNs evolved and why they work impressively well, focusing on the early, ground-breaking publications. We significantly improved the documentation and fixed a number of errors and inconsistencies that accumulated in previous publications. To support understanding, we also revised and unified the notation used. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.09586v1 |
https://arxiv.org/pdf/1909.09586v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-lstm-a-tutorial-into-long-short |
Repo | |
Framework | |
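The cell that the tutorial unpacks can be written in a few lines. A minimal NumPy sketch of the now-standard LSTM variant with forget, input, and output gates (the dimensions and initialization are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, p):
    """One step of a standard LSTM cell."""
    z = np.concatenate([x, h])             # input and previous hidden state
    f = sigmoid(p["Wf"] @ z + p["bf"])     # forget gate
    i = sigmoid(p["Wi"] @ z + p["bi"])     # input gate
    g = np.tanh(p["Wg"] @ z + p["bg"])     # candidate cell update
    o = sigmoid(p["Wo"] @ z + p["bo"])     # output gate
    c = f * c + i * g                      # constant-error-carousel update
    h = o * np.tanh(c)                     # new hidden state
    return h, c

rng = np.random.default_rng(0)
nx, nh = 3, 4
p = {f"W{k}": rng.normal(scale=0.5, size=(nh, nx + nh)) for k in "figo"}
p.update({f"b{k}": np.zeros(nh) for k in "figo"})
h, c = np.zeros(nh), np.zeros(nh)
for x in rng.normal(size=(5, nx)):         # run a length-5 input sequence
    h, c = lstm_step(x, h, c, p)
```

The multiplicative `f * c + i * g` update is what lets gradients flow over long time lags, which is the central point the tutorial develops.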
Communication-Efficient Distributed Online Learning with Kernels
Title | Communication-Efficient Distributed Online Learning with Kernels |
Authors | Michael Kamp, Sebastian Bothe, Mario Boley, Michael Mock |
Abstract | We propose an efficient distributed online learning protocol for low-latency real-time services. It extends a previously presented protocol to kernelized online learners that represent their models by a support vector expansion. While such learners often achieve higher predictive performance than their linear counterparts, communicating the support vector expansions becomes inefficient for large numbers of support vectors. The proposed extension allows for a larger class of online learning algorithms—including those alleviating the problem above through model compression. In addition, we characterize the quality of the proposed protocol by introducing a novel criterion that requires the communication to be bounded by the loss suffered. |
Tasks | Model Compression |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12899v1 |
https://arxiv.org/pdf/1911.12899v1.pdf | |
PWC | https://paperswithcode.com/paper/communication-efficient-distributed-online |
Repo | |
Framework | |
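The communication problem described above arises because the model itself is a support vector expansion. A toy budgeted kernel perceptron (an illustration of the compression idea, not the paper's protocol or criterion) shows how capping the expansion keeps the message size fixed:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

class BudgetKernelPerceptron:
    """Online kernel learner whose model is a support vector expansion,
    truncated to a fixed budget so it stays cheap to communicate."""
    def __init__(self, budget=15):
        self.budget, self.sv, self.alpha = budget, [], []

    def predict(self, x):
        s = sum(a * rbf(v, x) for a, v in zip(self.alpha, self.sv))
        return 1 if s >= 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:            # mistake-driven update
            self.sv.append(x)
            self.alpha.append(float(y))
            if len(self.sv) > self.budget:  # compression: drop oldest SV
                self.sv.pop(0)
                self.alpha.pop(0)

rng = np.random.default_rng(0)
model = BudgetKernelPerceptron(budget=15)
for _ in range(200):
    x = rng.normal(size=2)
    model.update(x, 1 if x[0] + x[1] > 0 else -1)
```

Synchronizing this model costs at most `budget` (vector, coefficient) pairs per exchange, regardless of how many examples the learner has seen.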
Atomistic structure learning
Title | Atomistic structure learning |
Authors | Mathias S. Jørgensen, Henrik L. Mortensen, Søren A. Meldgaard, Esben L. Kolsbjerg, Thomas L. Jacobsen, Knud H. Sørensen, Bjørk Hammer |
Abstract | One endeavour of modern physical chemistry is to use bottom-up approaches to design materials and drugs with desired properties. Here we introduce an atomistic structure learning algorithm (ASLA) that utilizes a convolutional neural network to build 2D compounds and layered structures atom by atom. The algorithm takes no prior data or knowledge of atomic interactions; instead, it queries a first-principles quantum mechanical program for physical properties. Using reinforcement learning, the algorithm accumulates knowledge of chemical compound space for a given number and type of atoms and stores this in the neural network, ultimately learning the blueprint for the optimal structural arrangement of the atoms for a given target property. ASLA is demonstrated to work on diverse problems, including grain boundaries in graphene sheets, organic compound formation and a surface oxide structure. This approach to structure prediction is a first step toward direct manipulation of atoms with artificially intelligent first-principles computer codes. |
Tasks | |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10501v1 |
http://arxiv.org/pdf/1902.10501v1.pdf | |
PWC | https://paperswithcode.com/paper/atomistic-structure-learning |
Repo | |
Framework | |
Morphy: A Datamorphic Software Test Automation Tool
Title | Morphy: A Datamorphic Software Test Automation Tool |
Authors | Hong Zhu, Ian Bayley, Dongmei Liu, Xiaoyu Zheng |
Abstract | This paper presents an automated tool called Morphy for datamorphic testing. It classifies software test artefacts into test entities and test morphisms, which are mappings on test entities. In addition to datamorphisms, metamorphisms and seed test case makers, Morphy also employs a set of other test morphisms, including test case metrics and filters, test set metrics and filters, test result analysers and test executors, to realise test automation. In particular, basic testing activities can be automated by invoking test morphisms. Test strategies can be realised as complex combinations of test morphisms, and test processes can be automated by recording, editing and playing test scripts that invoke test morphisms and strategies. Three types of test strategies have been implemented in Morphy: datamorphism combination strategies, cluster border exploration strategies and strategies for test set optimisation via genetic algorithms. This paper focuses on the datamorphism combination strategies, giving their definitions and implementation algorithms, and illustrates their use for testing both traditional software and AI applications with three case studies. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09881v1 |
https://arxiv.org/pdf/1912.09881v1.pdf | |
PWC | https://paperswithcode.com/paper/morphy-a-datamorphic-software-test-automation |
Repo | |
Framework | |
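To make "test morphisms" concrete, here is a hedged toy sketch of a datamorphism combination strategy in the spirit described above; the names and the pairwise strategy are illustrative, not Morphy's actual API:

```python
# Seed test case makers produce initial test cases; datamorphisms are
# mappings from test cases to test cases; a combination strategy decides
# which compositions of datamorphisms to apply.

def seed_maker():
    return [1.0, -2.5, 3.3]

datamorphisms = [
    lambda x: -x,           # negate the input
    lambda x: x + 1.0,      # shift the input
    lambda x: x * 2.0,      # scale the input
]

def pairwise_combination(seeds, morphs):
    """A simple combination strategy: apply every ordered pair of
    datamorphisms to every seed test case."""
    out = []
    for s in seeds:
        for m1 in morphs:
            for m2 in morphs:
                out.append(m2(m1(s)))
    return out

derived = pairwise_combination(seed_maker(), datamorphisms)
```

From 3 seeds and 3 datamorphisms, the strategy derives 27 follow-up test cases; richer strategies differ only in which compositions they enumerate.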
Financial Time Series Forecasting with Deep Learning : A Systematic Literature Review: 2005-2019
Title | Financial Time Series Forecasting with Deep Learning : A Systematic Literature Review: 2005-2019 |
Authors | Omer Berat Sezer, Mehmet Ugur Gudelek, Ahmet Murat Ozbayoglu |
Abstract | Financial time series forecasting is, without a doubt, a top choice for computational-intelligence researchers in finance, from both academia and industry, due to its broad application areas and substantial impact. Machine Learning (ML) researchers have proposed various models, and a vast number of studies have been published accordingly. As such, a significant number of surveys exist covering ML for financial time series forecasting. Lately, Deep Learning (DL) models have started appearing within the field, with results that significantly outperform their traditional ML counterparts. Even though there is growing interest in developing models for financial time series forecasting, there is a lack of review papers focused solely on DL for finance. Hence, our motivation in this paper is to provide a comprehensive literature review of DL studies on financial time series forecasting. We not only categorized the studies according to their intended forecasting application areas, such as index, forex, and commodity forecasting, but also grouped them based on their DL model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory (LSTM). We also tried to envision the future of the field by highlighting possible setbacks and opportunities, so that interested researchers can benefit. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13288v1 |
https://arxiv.org/pdf/1911.13288v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-time-series-forecasting-with-deep |
Repo | |
Framework | |
Horizontal Flows and Manifold Stochastics in Geometric Deep Learning
Title | Horizontal Flows and Manifold Stochastics in Geometric Deep Learning |
Authors | Stefan Sommer, Alex Bronstein |
Abstract | We introduce two constructions in geometric deep learning for 1) transporting orientation-dependent convolutional filters over a manifold in a continuous way, thereby defining a convolution operator that naturally incorporates the rotational effect of holonomy; and 2) allowing efficient evaluation of manifold convolution layers by sampling manifold-valued random variables centered around a weighted Brownian motion maximum likelihood mean. Both methods are inspired by stochastics on manifolds and geometric statistics, and provide examples of how stochastic methods (here, horizontal frame bundle flows and non-linear bridge sampling schemes) can be used in geometric deep learning. We outline the theoretical foundation of the two methods, discuss their relation to Euclidean deep networks and existing methodology in geometric deep learning, and establish important properties of the proposed constructions. |
Tasks | |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06397v1 |
https://arxiv.org/pdf/1909.06397v1.pdf | |
PWC | https://paperswithcode.com/paper/horizontal-flows-and-manifold-stochastics-in |
Repo | |
Framework | |
Dragonfly Algorithm and its Applications in Applied Science – Survey
Title | Dragonfly Algorithm and its Applications in Applied Science – Survey |
Authors | Chnoor M. Rahman, Tarik A. Rashid |
Abstract | One of the most recently developed heuristic optimization algorithms is the dragonfly algorithm, proposed by Mirjalili. The dragonfly algorithm has shown its ability to optimize various real-world problems, and it has three variants. In this work, an overview of the algorithm and its variants is presented, and the hybridized versions of the algorithm are discussed. Furthermore, the results of applications that have utilized the dragonfly algorithm in applied science are presented for the following areas: machine learning, image processing, wireless, and networking. The algorithm is then compared with some other metaheuristic algorithms and tested on the CEC-C06 2019 benchmark functions. The results show that the algorithm has strong exploration ability and that its convergence rate is better than that of other algorithms in the literature, such as PSO and GA. Finally, the strong and weak points of the algorithm are discussed, and future work that would help address its weak points is recommended. This study is conducted in the hope of offering useful information about the dragonfly algorithm to researchers who want to study it. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/2001.02292v1 |
https://arxiv.org/pdf/2001.02292v1.pdf | |
PWC | https://paperswithcode.com/paper/dragonfly-algorithm-and-its-applications-in |
Repo | |
Framework | |
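The dragonfly update combines five behaviours: separation, alignment, cohesion, attraction to the food source (best solution), and distraction from the enemy (worst solution). A compact sketch on a toy objective follows; the behaviour weights, clipping, and the exact distraction term are assumptions, not the tuned originals:

```python
import numpy as np

def sphere(x):                              # toy objective to minimize
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
n, dim, iters = 20, 2, 200
X = rng.uniform(-5, 5, (n, dim))            # dragonfly positions
V = np.zeros((n, dim))                      # step vectors (Delta X)
s, a, c, f, e, w = 0.1, 0.1, 0.7, 1.0, 1.0, 0.9  # behaviour weights (assumed)

best_x, best_val = None, np.inf
for _ in range(iters):
    fit = np.array([sphere(x) for x in X])
    if fit.min() < best_val:                # track the best solution found
        best_val, best_x = float(fit.min()), X[fit.argmin()].copy()
    food, enemy = X[fit.argmin()].copy(), X[fit.argmax()].copy()
    for i in range(n):
        S = -np.sum(X - X[i], axis=0) / n   # separation from neighbours
        A = V.mean(axis=0)                  # alignment of step vectors
        C = X.mean(axis=0) - X[i]           # cohesion toward the centre
        F = food - X[i]                     # attraction to the food source
        E = X[i] - enemy                    # distraction away from the enemy
        V[i] = np.clip(w * V[i] + s * S + a * A + c * C + f * F + e * E,
                       -1.0, 1.0)
    X += V
```

The food/enemy terms give the swarm its exploitation pressure, while separation and alignment preserve exploration, which is the balance the survey's benchmark comparison examines.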
Foreground-aware Image Inpainting
Title | Foreground-aware Image Inpainting |
Authors | Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo |
Abstract | Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects, due to a lack of information about the actual extent of the foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as the removal of distracting objects. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that through this disentanglement, the contour completion model predicts reasonable object contours and substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions. |
Tasks | Image Inpainting |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.05945v3 |
http://arxiv.org/pdf/1901.05945v3.pdf | |
PWC | https://paperswithcode.com/paper/foreground-aware-image-inpainting |
Repo | |
Framework | |
Building a Production Model for Retrieval-Based Chatbots
Title | Building a Production Model for Retrieval-Based Chatbots |
Authors | Kyle Swanson, Lili Yu, Christopher Fox, Jeremy Wohlwend, Tao Lei |
Abstract | Response suggestion is an important task for building human-computer conversation systems. Recent approaches to conversation modeling have introduced new model architectures with impressive results, but relatively little attention has been paid to whether these models would be practical in a production setting. In this paper, we describe the unique challenges of building a production retrieval-based conversation system, which selects outputs from a whitelist of candidate responses. To address these challenges, we propose a dual encoder architecture which performs rapid inference and scales well with the size of the whitelist. We also introduce and compare two methods for generating whitelists, and we carry out a comprehensive analysis of the model and whitelists. Experimental results on a large, proprietary help desk chat dataset, including both offline metrics and a human evaluation, indicate production-quality performance and illustrate key lessons about conversation modeling in practice. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03209v2 |
https://arxiv.org/pdf/1906.03209v2.pdf | |
PWC | https://paperswithcode.com/paper/building-a-production-model-for-retrieval |
Repo | |
Framework | |
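The dual encoder's serving-time advantage is that whitelist responses are encoded once offline, so answering a query costs one encoder pass plus a matrix-vector product over the whitelist. A hedged toy sketch with a bag-of-words stand-in for the two learned encoders (the whitelist strings and vocabulary scheme are illustrative):

```python
import numpy as np

whitelist = [
    "please restart the application",
    "reset your password from the settings page",
    "contact support for billing questions",
]
vocab = {w: i for i, w in enumerate(sorted(set(" ".join(whitelist).split())))}

def encode(text):
    """Stand-in encoder: normalized bag-of-words over the whitelist
    vocabulary. (The real system uses two learned neural encoders.)"""
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Precomputed once, offline: one embedding per whitelisted response.
R = np.stack([encode(r) for r in whitelist])

def suggest(query):
    # Serving a query is a single matrix-vector product plus argmax.
    return whitelist[int((R @ encode(query)).argmax())]
```

Because `R` is fixed, inference scales as one dot product per whitelist entry, which is what makes the architecture practical for large whitelists.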
Spotting insects from satellites: modeling the presence of Culicoides imicola through Deep CNNs
Title | Spotting insects from satellites: modeling the presence of Culicoides imicola through Deep CNNs |
Authors | Stefano Vincenzi, Angelo Porrello, Pietro Buzzega, Annamaria Conte, Carla Ippoliti, Luca Candeloro, Alessio Di Lorenzo, Andrea Capobianco Dondona, Simone Calderara |
Abstract | Nowadays, Vector-Borne Diseases (VBDs) pose a severe threat to public health, accounting for a considerable share of human illness. Recently, several surveillance plans have been put in place to limit the spread of such diseases, typically involving on-field measurements. A systematic and effective plan is still missing, however, owing to the high costs and effort required to implement one. Ideally, any attempt in this field should consider the vector-host-pathogen triangle, which is strictly linked to environmental and climatic conditions. In this paper, we exploit satellite imagery from the Sentinel-2 mission, as we believe it encodes the environmental factors responsible for the vector’s spread. Our analysis, conducted in a data-driven fashion, couples spectral images with ground-truth information on the abundance of Culicoides imicola. We frame our task as a binary classification problem, relying on Convolutional Neural Networks (CNNs) to learn useful representations from multi-band images. Additionally, we provide a multi-instance variant aimed at extracting temporal patterns from short sequences of spectral images. Experiments show promising results, providing the foundations for novel supportive tools that could indicate where surveillance and prevention measures should be prioritized. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10024v1 |
https://arxiv.org/pdf/1911.10024v1.pdf | |
PWC | https://paperswithcode.com/paper/spotting-insects-from-satellites-modeling-the |
Repo | |
Framework | |
1D-Convolutional Capsule Network for Hyperspectral Image Classification
Title | 1D-Convolutional Capsule Network for Hyperspectral Image Classification |
Authors | Haitao Zhang, Lingguo Meng, Xian Wei, Xiaoliang Tang, Xuan Tang, Xingping Wang, Bo Jin, Wei Yao |
Abstract | Recently, convolutional neural networks (CNNs) have achieved excellent performance in many computer vision tasks. For hyperspectral image (HSI) classification in particular, CNNs often require very complex structures due to the high dimensionality of HSIs, and this complexity results in prohibitive training effort. Moreover, labeled samples are commonly scarce in HSI classification, which degrades CNN accuracy. In this work, we develop an easy-to-implement capsule network, the 1D-convolution capsule network (1D-ConvCapsNet), to alleviate these problems. Firstly, 1D-ConvCapsNet extracts spatial and spectral information separately in the spatial and spectral domains, which is more lightweight than 3D convolution due to fewer parameters. Secondly, 1D-ConvCapsNet utilizes a capsule-wise constraint-window method to reduce the parameter count and computational complexity of the conventional capsule network. Finally, 1D-ConvCapsNet obtains accurate predictions for input samples via dynamic routing. The effectiveness of 1D-ConvCapsNet is verified on three representative HSI datasets. Experimental results demonstrate that 1D-ConvCapsNet is superior to state-of-the-art methods in both accuracy and training effort. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2019-03-23 |
URL | http://arxiv.org/abs/1903.09834v1 |
http://arxiv.org/pdf/1903.09834v1.pdf | |
PWC | https://paperswithcode.com/paper/1d-convolutional-capsule-network-for |
Repo | |
Framework | |
Understanding Spatial Language in Radiology: Representation Framework, Annotation, and Spatial Relation Extraction from Chest X-ray Reports using Deep Learning
Title | Understanding Spatial Language in Radiology: Representation Framework, Annotation, and Spatial Relation Extraction from Chest X-ray Reports using Deep Learning |
Authors | Surabhi Datta, Yuqi Si, Laritza Rodriguez, Sonya E Shooshan, Dina Demner-Fushman, Kirk Roberts |
Abstract | We define a representation framework for extracting spatial information from radiology reports (Rad-SpRL). We annotated a total of 2000 chest X-ray reports with 4 spatial roles corresponding to common radiology entities. Our focus is on extracting detailed information from a radiologist’s interpretation, including a radiographic finding, its anatomical location, corresponding probable diagnoses, and associated hedging terms. For this, we propose a deep learning-based natural language processing (NLP) method involving both word- and character-level encodings. Specifically, we utilize a bidirectional long short-term memory (Bi-LSTM) conditional random field (CRF) model for extracting the spatial roles. The model achieved average F1 measures of 90.28 and 94.61 for extracting the Trajector and Landmark roles, respectively, whereas performance was moderate for the Diagnosis and Hedge roles, with average F1 of 71.47 and 73.27, respectively. The corpus will soon be made available upon request. |
Tasks | Relation Extraction |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04485v1 |
https://arxiv.org/pdf/1908.04485v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-spatial-language-in-radiology |
Repo | |
Framework | |