April 3, 2020

3170 words 15 mins read

Paper Group ANR 59

Paper Group ANR 59

Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM). Towards Accurate Scene Text Recognition with Semantic Reasoning Networks. The Node Weight Dependent Traveling Salesperson Problem: Approximation Algorithms and Randomized Search Heuristics. PixelHop++: A Small Successive-Subspace-Learning-Based …

Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)

Title Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)
Authors Andrew E. Brereton, Stephen MacKinnon, Zhaleh Safikhani, Shawn Reeves, Sana Alwash, Vijay Shahani, Andreas Windemuth
Abstract The prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) of small molecules from their molecular structure is a central problem in medicinal chemistry with great practical importance in drug discovery. Creating predictive models conventionally requires substantial trial-and-error for the selection of molecular representations, machine learning (ML) algorithms, and hyperparameter tuning. A generally applicable method that performs well on all datasets without tuning would be of great value but is currently lacking. Here, we describe Pareto-Optimal Embedded Modeling (POEM), a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without need for optimization. POEMs predictive strength is obtained by combining multiple different representations of molecular structures in a context-specific manner, while maintaining low dimensionality. We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classifications tasks. POEM performs well in all cases and reduces the risk of overfitting.
Tasks Drug Discovery
Published 2020-02-11
URL https://arxiv.org/abs/2002.04555v1
PDF https://arxiv.org/pdf/2002.04555v1.pdf
PWC https://paperswithcode.com/paper/predicting-drug-properties-with-parameter

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Title Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Authors Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding
Abstract Scene text image contains two levels of contents: visual texture and semantic information. Although the previous scene text recognition methods have made great progress over the past few years, the research on mining semantic information to assist text recognition attracts less attention, only RNN-like structures are explored to implicitly model semantic information. However, we observe that RNN based methods have some obvious shortcomings, such as time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit the help of semantic information and the computation efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. The state-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, the speed of SRN has significant advantages over the RNN based methods, demonstrating its value in practical use.
Tasks Scene Text Recognition
Published 2020-03-27
URL https://arxiv.org/abs/2003.12294v1
PDF https://arxiv.org/pdf/2003.12294v1.pdf
PWC https://paperswithcode.com/paper/towards-accurate-scene-text-recognition-with

The Node Weight Dependent Traveling Salesperson Problem: Approximation Algorithms and Randomized Search Heuristics

Title The Node Weight Dependent Traveling Salesperson Problem: Approximation Algorithms and Randomized Search Heuristics
Authors Jakob Bossek, Katrin Casel, Pascal Kerschke, Frank Neumann
Abstract Several important optimization problems in the area of vehicle routing can be seen as a variant of the classical Traveling Salesperson Problem (TSP). In the area of evolutionary computation, the traveling thief problem (TTP) has gained increasing interest over the last 5 years. In this paper, we investigate the effect of weights on such problems, in the sense that the cost of traveling increases with respect to the weights of nodes already visited during a tour. This provides abstractions of important TSP variants such as the Traveling Thief Problem and time dependent TSP variants, and allows to study precisely the increase in difficulty caused by weight dependence. We provide a 3.59-approximation for this weight dependent version of TSP with metric distances and bounded positive weights. Furthermore, we conduct experimental investigations for simple randomized local search with classical mutation operators and two variants of the state-of-the-art evolutionary algorithm EAX adapted to the weighted TSP. Our results show the impact of the node weights on the position of the nodes in the resulting tour.
Published 2020-02-04
URL https://arxiv.org/abs/2002.01070v1
PDF https://arxiv.org/pdf/2002.01070v1.pdf
PWC https://paperswithcode.com/paper/the-node-weight-dependent-traveling

PixelHop++: A Small Successive-Subspace-Learning-Based (SSL-based) Model for Image Classification

Title PixelHop++: A Small Successive-Subspace-Learning-Based (SSL-based) Model for Image Classification
Authors Yueru Chen, Mozhdeh Rouhsedaghat, Suya You, Raghuveer Rao, C. -C. Jay Kuo
Abstract The successive subspace learning (SSL) principle was developed and used to design an interpretable learning model, known as the PixelHop method,for image classification in our prior work. Here, we propose an improved PixelHop method and call it PixelHop++. First, to make the PixelHop model size smaller, we decouple a joint spatial-spectral input tensor to multiple spatial tensors (one for each spectral component) under the spatial-spectral separability assumption and perform the Saab transform in a channel-wise manner, called the channel-wise (c/w) Saab transform.Second, by performing this operation from one hop to another successively, we construct a channel-decomposed feature tree whose leaf nodes contain features of one dimension (1D). Third, these 1D features are ranked according to their cross-entropy values, which allows us to select a subset of discriminant features for image classification. In PixelHop++, one can control the learning model size of fine-granularity,offering a flexible tradeoff between the model size and the classification performance. We demonstrate the flexibility of PixelHop++ on MNIST, Fashion MNIST, and CIFAR-10 three datasets.
Tasks Image Classification
Published 2020-02-08
URL https://arxiv.org/abs/2002.03141v1
PDF https://arxiv.org/pdf/2002.03141v1.pdf
PWC https://paperswithcode.com/paper/pixelhop-a-small-successive-subspace-learning

RAID: Randomized Adversarial-Input Detection for Neural Networks

Title RAID: Randomized Adversarial-Input Detection for Neural Networks
Authors Hasan Ferit Eniser, Maria Christakis, Valentin Wüstholz
Abstract In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, there have emerged numerous detection mechanisms that aim to automatically determine if an input is adversarial. However, state-of-the-art detection mechanisms either rely on being tuned for each type of attack, or they do not generalize across different attack types. To alleviate these issues, we propose a novel technique for adversarial-image detection, RAID, that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs. Our technique is both more reliable and more effective than the state of the art when evaluated against six popular attacks. Moreover, a straightforward extension of RAID increases its robustness against detection-aware adversaries without affecting its effectiveness.
Tasks Image Classification
Published 2020-02-07
URL https://arxiv.org/abs/2002.02776v1
PDF https://arxiv.org/pdf/2002.02776v1.pdf
PWC https://paperswithcode.com/paper/raid-randomized-adversarial-input-detection

Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge

Title Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge
Authors Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas
Abstract Cross-modal knowledge distillation deals with transferring knowledge from a model trained with superior modalities (Teacher) to another model trained with weak modalities (Student). Existing approaches require paired training examples exist in both modalities. However, accessing the data from superior modalities may not always be feasible. For example, in the case of 3D hand pose estimation, depth maps, point clouds, or stereo images usually capture better hand structures than RGB images, but most of them are expensive to be collected. In this paper, we propose a novel scheme to train the Student in a Target dataset where the Teacher is unavailable. Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student. We name our method “Cross-Modal Knowledge Generalization” and demonstrate that our scheme results in competitive performance for 3D hand pose estimation on standard benchmark datasets.
Tasks Hand Pose Estimation, Pose Estimation
Published 2020-04-01
URL https://arxiv.org/abs/2004.00176v1
PDF https://arxiv.org/pdf/2004.00176v1.pdf
PWC https://paperswithcode.com/paper/knowledge-as-priors-cross-modal-knowledge
Title Bio-Inspired Hashing for Unsupervised Similarity Search
Authors Chaitanya K. Ryali, John J. Hopfield, Leopold Grinberg, Dmitry Krotov
Abstract The fruit fly Drosophila’s olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm BioHash that produces sparse high dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant BioConvHash that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
Published 2020-01-14
URL https://arxiv.org/abs/2001.04907v1
PDF https://arxiv.org/pdf/2001.04907v1.pdf
PWC https://paperswithcode.com/paper/bio-inspired-hashing-for-unsupervised-1

Convergence of Q-value in case of Gaussian rewards

Title Convergence of Q-value in case of Gaussian rewards
Authors Konatsu Miyamoto, Masaya Suzuki, Yuma Kigami, Kodai Satake
Abstract In this paper, as a study of reinforcement learning, we converge the Q function to unbounded rewards such as Gaussian distribution. From the central limit theorem, in some real-world applications it is natural to assume that rewards follow a Gaussian distribution , but existing proofs cannot guarantee convergence of the Q-function. Furthermore, in the distribution-type reinforcement learning and Bayesian reinforcement learning that have become popular in recent years, it is better to allow the reward to have a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition of $E[r(s,a)^2]<\infty$, which is much more relaxed than the existing research. Finally, as a bonus, a proof of the policy gradient theorem for distributed reinforcement learning is also posted.
Published 2020-03-07
URL https://arxiv.org/abs/2003.03526v1
PDF https://arxiv.org/pdf/2003.03526v1.pdf
PWC https://paperswithcode.com/paper/convergence-of-q-value-in-case-of-gaussian

AliCoCo: Alibaba E-commerce Cognitive Concept Net

Title AliCoCo: Alibaba E-commerce Cognitive Concept Net
Authors Xusheng Luo, Luxin Liu, Yonghua Yang, Le Bo, Yuanpeng Cao, Jinhang Wu, Qiang Li, Keping Yang, Kenny Q. Zhu
Abstract One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much efforts are devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has the enough depth and breadth for universal user needs understanding. The semantic gap in-between prevents shopping experience from being more intelligent. In this paper, we propose to construct a large-scale e-commerce cognitive concept net named “AliCoCo”, which is practiced in Alibaba, the largest Chinese e-commerce platform in the world. We formally define user needs in e-commerce, then conceptualize them as nodes in the net. We present details on how AliCoCo is constructed semi-automatically and its successful, ongoing and potential applications in e-commerce.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13230v1
PDF https://arxiv.org/pdf/2003.13230v1.pdf
PWC https://paperswithcode.com/paper/alicoco-alibaba-e-commerce-cognitive-concept

Smart Data based Ensemble for Imbalanced Big Data Classification

Title Smart Data based Ensemble for Imbalanced Big Data Classification
Authors Diego García-Gil, Johan Holmberg, Salvador García, Ning Xiong, Francisco Herrera
Abstract Big Data scenarios pose a new challenge to traditional data mining algorithms, since they are not prepared to work with such amount of data. Smart Data refers to data of enough quality to improve the outcome from a data mining algorithm. Existing data mining algorithms unability to handle Big Datasets prevents the transition from Big to Smart Data. Automation in data acquisition that characterizes Big Data also brings some problems, such as differences in data size per class. This will lead classifiers to lean towards the most represented classes. This problem is known as imbalanced data distribution, where one class is underrepresented in the dataset. Ensembles of classifiers are machine learning methods that improve the performance of a single base classifier by the combination of several of them. Ensembles are not exempt from the imbalanced classification problem. To deal with this issue, the ensemble method have to be designed specifically. In this paper, a data preprocessing ensemble for imbalanced Big Data classification is presented, with focus on two-class problems. Experiments carried out in 21 Big Datasets have proved that our ensemble classifier outperforms classic machine learning models with an added data balancing method, such as Random Forests.
Published 2020-01-16
URL https://arxiv.org/abs/2001.05759v1
PDF https://arxiv.org/pdf/2001.05759v1.pdf
PWC https://paperswithcode.com/paper/smart-data-based-ensemble-for-imbalanced-big

Stochastic Modified Equations for Continuous Limit of Stochastic ADMM

Title Stochastic Modified Equations for Continuous Limit of Stochastic ADMM
Authors Xiang Zhou, Huizhuo Yuan, Chris Junchi Li, Qingyun Sun
Abstract Stochastic version of alternating direction method of multiplier (ADMM) and its variants (linearized ADMM, gradient-based ADMM) plays a key role for modern large scale machine learning problems. One example is the regularized empirical risk minimization problem. In this work, we put different variants of stochastic ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach. We adapt the mathematical framework of stochastic modified equation (SME), and show that the dynamics of stochastic ADMM is approximated by a class of stochastic differential equations with small noise parameters in the sense of weak approximation. The continuous-time analysis would uncover important analytical insights into the behaviors of the discrete-time algorithm, which are non-trivial to gain otherwise. For example, we could characterize the fluctuation of the solution paths precisely, and decide optimal stopping time to minimize the variance of solution paths.
Published 2020-03-07
URL https://arxiv.org/abs/2003.03532v1
PDF https://arxiv.org/pdf/2003.03532v1.pdf
PWC https://paperswithcode.com/paper/stochastic-modified-equations-for-continuous

Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Title Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
Authors Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban
Abstract Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning where the problem is non-convex and the gradient noise might exhibit a heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a \emph{continuous-time} variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy-tails of such perturbations can result in a bias even when the step-size is small, in the sense that \emph{the optima of stationary distribution} of the dynamics might not match \emph{the optima of the cost function to be optimized}. As a remedy, we develop a novel framework, which we coin as \emph{fractional} ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We observe that the Euler discretization of FULD has noteworthy algorithmic similarities with \emph{natural gradient} methods and \emph{gradient clipping}, bringing a new perspective on understanding their role in deep learning. We support our theory with experiments conducted on a synthetic model and neural networks.
Published 2020-02-13
URL https://arxiv.org/abs/2002.05685v1
PDF https://arxiv.org/pdf/2002.05685v1.pdf
PWC https://paperswithcode.com/paper/fractional-underdamped-langevin-dynamics

Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation

Title Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation
Authors Luchen Li, Ignacio Albert-Smet, Aldo A. Faisal
Abstract Our aim is to establish a framework where reinforcement learning (RL) of optimizing interventions retrospectively allows us a regulatory compliant pathway to prospective clinical testing of the learned policies in a clinical deployment. We focus on infections in intensive care units which are one of the major causes of death and difficult to treat because of the complex and opaque patient dynamics, and the clinically debated, highly-divergent set of intervention policies required by each individual patient, yet intensive care units are naturally data rich. In our work, we build on RL approaches in healthcare (“AI Clinicians”), and learn off-policy continuous dosing policy of pharmaceuticals for sepsis treatment using historical intensive care data under partially observable MDPs (POMDPs). POMPDs capture uncertainty in patient state better by taking in all historical information, yielding an efficient representation, which we investigate through ablations. We compensate for the lack of exploration in our retrospective data by evaluating each encountered state with a best-first tree search. We mitigate state distributional shift by optimizing our policy in the vicinity of the clinicians’ compound policy. Crucially, we evaluate our model recommendations using not only conventional policy evaluations but a novel framework that incorporates human experts: a model-agnostic pre-clinical evaluation method to estimate the accuracy and uncertainty of clinician’s decisions versus our system recommendations when confronted with the same individual patient history (“shadow mode”).
Published 2020-03-13
URL https://arxiv.org/abs/2003.06474v2
PDF https://arxiv.org/pdf/2003.06474v2.pdf
PWC https://paperswithcode.com/paper/optimizing-medical-treatment-for-sepsis-in

A Tensor Network Approach to Finite Markov Decision Processes

Title A Tensor Network Approach to Finite Markov Decision Processes
Authors Edward Gillman, Dominic C. Rose, Juan P. Garrahan
Abstract Tensor network (TN) techniques - often used in the context of quantum many-body physics - have shown promise as a tool for tackling machine learning (ML) problems. The application of TNs to ML, however, has mostly focused on supervised and unsupervised learning. Yet, with their direct connection to hidden Markov chains, TNs are also naturally suited to Markov decision processes (MDPs) which provide the foundation for reinforcement learning (RL). Here we introduce a general TN formulation of finite, episodic and discrete MDPs. We show how this formulation allows us to exploit algorithms developed for TNs for policy optimisation, the key aim of RL. As an application we consider the issue - formulated as an RL problem - of finding a stochastic evolution that satisfies specific dynamical conditions, using the simple example of random walk excursions as an illustration.
Published 2020-02-12
URL https://arxiv.org/abs/2002.05185v1
PDF https://arxiv.org/pdf/2002.05185v1.pdf
PWC https://paperswithcode.com/paper/a-tensor-network-approach-to-finite-markov

A proto-object based audiovisual saliency map

Title A proto-object based audiovisual saliency map
Authors Sudarshan Ramenahalli
Abstract Natural environment and our interaction with it is essentially multisensory, where we may deploy visual, tactile and/or auditory senses to perceive, learn and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with $360 \degree$ Field of View, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of proto-object based audiovisual saliency map in detecting and localizing salient objects/events is in agreement with human judgment. In addition, the proto-object based AVSM that we compute as a linear combination of visual and auditory feature conspicuity maps captures a higher number of valid salient events compared to unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications.
Tasks Video Compression
Published 2020-03-15
URL https://arxiv.org/abs/2003.06779v1
PDF https://arxiv.org/pdf/2003.06779v1.pdf
PWC https://paperswithcode.com/paper/a-proto-object-based-audiovisual-saliency-map
comments powered by Disqus