January 30, 2020

3390 words 16 mins read

Paper Group ANR 311


Human eye inspired log-polar pre-processing for neural networks. Solving Continual Combinatorial Selection via Deep Reinforcement Learning. A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance. Learning definable hypotheses on trees. A Generalized Framework for Population Based Training. Multi-Modal Citizen Sci …

Human eye inspired log-polar pre-processing for neural networks

Title Human eye inspired log-polar pre-processing for neural networks
Authors Leendert A Remmelzwaal, Amit Mishra, George F R Ellis
Abstract In this paper we draw inspiration from the human visual system, and present a bio-inspired pre-processing stage for neural networks. We implement this by applying a log-polar transformation as a pre-processing step, and to demonstrate, we have used a naive convolutional neural network (CNN). We demonstrate that a bio-inspired pre-processing stage can achieve rotation and scale robustness in CNNs. A key point in this paper is that the CNN does not need to be trained to identify rotation or scaling permutations; rather it is the log-polar pre-processing step that converts the image into a format that allows the CNN to handle rotation and scaling permutations. In addition we demonstrate how adding a log-polar transformation as a pre-processing step can reduce the image size to ~20% of the Euclidean image size, without significantly compromising classification accuracy of the CNN. The pre-processing stage presented in this paper is modelled after the retina and therefore is only tested against an image dataset. Note: This paper has been submitted for SAUPEC/RobMech/PRASA 2020.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01141v1
PDF https://arxiv.org/pdf/1911.01141v1.pdf
PWC https://paperswithcode.com/paper/human-eye-inspired-log-polar-pre-processing
Repo
Framework
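The log-polar resampling the authors describe can be sketched in plain NumPy. This is a minimal nearest-neighbour version for illustration only, not the paper's implementation; the output resolution and sampling scheme are assumptions. Its key property is that a rotation of the input becomes a row shift and a rescaling becomes a column shift, which is what lets a downstream CNN handle both cheaply:

```python
import numpy as np

def log_polar(image, out_shape=(64, 64)):
    """Resample a square grayscale image onto a log-polar grid.

    Rows of the output sweep angle [0, 2*pi); columns sweep
    log-radius from ~1 pixel out to the image half-width.
    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    n_theta, n_rho = out_shape
    max_rho = min(cy, cx)
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    # Logarithmic radii: rotation -> row shift, scaling -> column shift.
    rhos = np.exp(np.linspace(0.0, np.log(max_rho), n_rho))
    rr = rhos[None, :] * np.sin(thetas[:, None]) + cy
    cc = rhos[None, :] * np.cos(thetas[:, None]) + cx
    rr = np.clip(np.rint(rr).astype(int), 0, h - 1)
    cc = np.clip(np.rint(cc).astype(int), 0, w - 1)
    return image[rr, cc]
```

Choosing `out_shape` smaller than the input is also where the ~20% size reduction mentioned in the abstract would come from.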

Solving Continual Combinatorial Selection via Deep Reinforcement Learning

Title Solving Continual Combinatorial Selection via Deep Reinforcement Learning
Authors Hyungseok Song, Hyeryung Jang, Hai H. Tran, Se-eun Yoon, Kyunghwan Son, Donggyu Yun, Hyoju Chung, Yung Yi
Abstract We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, reducing the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.03638v1
PDF https://arxiv.org/pdf/1909.03638v1.pdf
PWC https://paperswithcode.com/paper/solving-continual-combinatorial-selection-via
Repo
Framework
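The S-MDP-to-IS-MDP conversion can be illustrated with a toy sketch: a joint choice of K items becomes K single-item picks, with the partial selection folded into an augmented state. The function names and state encoding below are illustrative assumptions, not the paper's code:

```python
import itertools

def joint_actions(n_items, k):
    """Joint S-MDP action space: all size-k subsets chosen at once.
    Its size, C(n_items, k), is what makes the S-MDP intractable."""
    return list(itertools.combinations(range(n_items), k))

def iterative_step(state, partial_selection, item, k):
    """One IS-MDP step: pick a single item (at most n_items actions).
    The partial selection is folded into an augmented state until
    k picks have been made, at which point the joint action is done."""
    assert item not in partial_selection
    new_partial = partial_selection | {item}
    done = len(new_partial) == k
    return (state, frozenset(new_partial)), done
```

For n = 10 items and K = 3, the joint action space has 120 actions, while each iterative step chooses among at most 10 — at the cost of tracking the partial selection in the state, which is the blow-up the paper's weight-shared Q-networks address.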

A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance

Title A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
Authors Adi Shamir, Itay Safran, Eyal Ronen, Orr Dunkelman
Abstract The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al. in 2013, who called them “Intriguing properties of neural networks”. Since then, this topic has become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of $\mathbb{R}^n$ with the $L_0$ (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly $m$ in arbitrarily deep neural networks which are designed to distinguish between $m$ input classes.
Tasks
Published 2019-01-30
URL http://arxiv.org/abs/1901.10861v1
PDF http://arxiv.org/pdf/1901.10861v1.pdf
PWC https://paperswithcode.com/paper/a-simple-explanation-for-the-existence-of
Repo
Framework
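The flavour of a small-Hamming-distance targeted attack can be sketched on a toy linear classifier, standing in for the deep networks the paper analyzes. The greedy coordinate-picking heuristic below is an illustrative assumption, not the paper's construction; it just shows what "few coordinates changed, decision flipped" means operationally:

```python
import numpy as np

def greedy_l0_attack(W, x, target, step=5.0, max_changes=None):
    """Perturb one coordinate at a time until argmax(W @ x) == target.
    Returns the perturbed input and the number of distinct coordinates
    touched, i.e. the Hamming (L0) distance of the change."""
    x = x.astype(float).copy()
    touched = set()
    if max_changes is None:
        max_changes = x.size
    while np.argmax(W @ x) != target and len(touched) < max_changes:
        current = np.argmax(W @ x)
        # Direction that raises the target logit relative to the current one.
        g = W[target] - W[current]
        order = np.argsort(-np.abs(g))
        i = next(j for j in order if j not in touched)
        x[i] += step * np.sign(g[i])
        touched.add(i)
    return x, len(touched)
```

On a small three-class example this reaches the target after touching two coordinates, loosely echoing the paper's prediction of Hamming distance roughly m for m classes.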

Learning definable hypotheses on trees

Title Learning definable hypotheses on trees
Authors Emilie Grienenberger, Martin Ritzert
Abstract We study the problem of learning properties of nodes in tree structures. Those properties are specified by logical formulas, such as formulas from first-order or monadic second-order logic. We think of the tree as a database encoding a large dataset and therefore aim for learning algorithms which depend at most sublinearly on the size of the tree. We present a learning algorithm for quantifier-free formulas where the running time only depends polynomially on the number of training examples, but not on the size of the background structure. By a previous result on strings we know that for general first-order or monadic second-order (MSO) formulas a sublinear running time cannot be achieved. However, we show that by building an index on the tree in a linear time preprocessing phase, we can achieve a learning algorithm for MSO formulas with a logarithmic learning phase.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10994v1
PDF https://arxiv.org/pdf/1909.10994v1.pdf
PWC https://paperswithcode.com/paper/learning-definable-hypotheses-on-trees
Repo
Framework

A Generalized Framework for Population Based Training

Title A Generalized Framework for Population Based Training
Authors Ang Li, Ola Spyra, Sagi Perel, Valentin Dalibard, Max Jaderberg, Chenjie Gu, David Budden, Tim Harley, Pramod Gupta
Abstract Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters, periodically copying the weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous “trials” (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller. The black-box design makes no assumptions about model architectures, loss functions or training procedures. Our system supports dynamic hyperparameter schedules to optimize both differentiable and non-differentiable metrics. We apply our system to train a state-of-the-art WaveNet generative model for human voice synthesis. We show that our PBT system achieves better accuracy, less sensitivity and faster convergence compared to existing methods, given the same computational resource.
Tasks
Published 2019-02-05
URL http://arxiv.org/abs/1902.01894v1
PDF http://arxiv.org/pdf/1902.01894v1.pdf
PWC https://paperswithcode.com/paper/a-generalized-framework-for-population-based
Repo
Framework
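The exploit/explore loop at the heart of PBT can be sketched in a few lines. This synchronous toy omits the asynchronous trial scheduling and controller that are the paper's actual contribution; the truncation-selection rule and the `train`/`mutate` interfaces are common PBT conventions, assumed here for illustration:

```python
import random

def pbt_round(population, train, mutate, truncation=0.25):
    """One PBT round, synchronous for clarity. Each member is a dict
    with 'params', 'hparams', 'score'. After a short training trial,
    the bottom performers exploit (copy) a top performer and explore
    (mutate) its hyperparameters before the next round."""
    for member in population:
        member['score'] = train(member['params'], member['hparams'])
    ranked = sorted(population, key=lambda m: m['score'], reverse=True)
    cut = max(1, int(len(ranked) * truncation))
    for loser in ranked[-cut:]:
        winner = random.choice(ranked[:cut])
        loser['params'] = dict(winner['params'])            # exploit: warm start
        loser['hparams'] = mutate(dict(winner['hparams']))  # explore: perturb
    return ranked[0]['score']
```

Because mutation happens every round, the hyperparameters form a schedule over training rather than a single fixed setting, which is what lets PBT optimize non-differentiable metrics as well.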

Multi-Modal Citizen Science: From Disambiguation to Transcription of Classical Literature

Title Multi-Modal Citizen Science: From Disambiguation to Transcription of Classical Literature
Authors Maryam Foradi, Jan Kaßel, Johannes Pein, Gregory R. Crane
Abstract The engagement of citizens in research projects, including Digital Humanities projects, has risen in prominence in recent years. This type of engagement not only leads to incidental learning for participants but also indicates the added value of corpus enrichment via different types of annotations undertaken by users, generating so-called smart texts. Our work focuses on the continuous task of adding new layers of annotation to Classical Literature. We aim to provide more extensive tools for readers of smart texts, enhancing their reading comprehension and at the same time supporting language learning by introducing intellectual tasks, i.e., linking, tagging, and disambiguation. The current study adds a new mode of annotation, audio annotations, to the extensively annotated corpus of poetry by the Persian poet Hafiz. By proposing tasks with three different difficulty levels, we estimate the users’ ability to provide correct annotations in order to rate their answers in further stages of the project, where no ground truth data is available. While proficiency in Persian is beneficial, annotators with no knowledge of Persian are also able to add annotations to the corpus.
Tasks Reading Comprehension
Published 2019-09-27
URL https://arxiv.org/abs/1909.12622v1
PDF https://arxiv.org/pdf/1909.12622v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-citizen-science-from
Repo
Framework

Multi-output Bus Travel Time Prediction with Convolutional LSTM Neural Network

Title Multi-output Bus Travel Time Prediction with Convolutional LSTM Neural Network
Authors Niklas Christoffer Petersen, Filipe Rodrigues, Francisco Camara Pereira
Abstract Accurate and reliable travel time predictions in public transport networks are essential for delivering an attractive service that is able to compete with other modes of transport in urban areas. The traditional application of this information, where arrival and departure predictions are displayed on digital boards, is highly visible in the city landscape of most modern metropolises. More recently, the same information has become critical as input for smart-phone trip planners in order to alert passengers about unreachable connections, alternative route choices and prolonged travel times. More sophisticated Intelligent Transport Systems (ITS) include the predictions of connection assurance, i.e. to hold back services in case a connecting service is delayed. In order to operate such systems, and to ensure the confidence of passengers in the systems, the information provided must be accurate and reliable. Traditional methods have trouble with this as congestion, and thus travel time variability, increases in cities, consequently making travel time predictions in urban areas a non-trivial task. This paper presents a system for bus travel time prediction that leverages the non-static spatio-temporal correlations present in urban bus networks, allowing the discovery of complex patterns not captured by traditional methods. The underlying model is a multi-output, multi-time-step, deep neural network that uses a combination of convolutional and long short-term memory (LSTM) layers. The method is empirically evaluated and compared to other popular approaches for link travel time prediction and currently available services, including the currently deployed model in Copenhagen, Denmark. We find that the proposed model significantly outperforms all the other methods we compare with, and is able to detect small irregular peaks in bus travel times very quickly.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02791v1
PDF http://arxiv.org/pdf/1903.02791v1.pdf
PWC https://paperswithcode.com/paper/multi-output-bus-travel-time-prediction-with
Repo
Framework

Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

Title Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks
Authors Zhenyue Qin, Dongwoo Kim
Abstract Despite the great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over the predicted classes, this normalised exponential transformation still seems artificial. A theoretical framework that incorporates softmax as an intrinsic component is still lacking. In this paper, we view neural networks embedding softmax from an information-theoretic perspective. Under this view, we can naturally and mathematically derive log-softmax as an inherent component in a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks by maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. We also generalise our information-theoretic perspective to neural networks with stochasticity and derive information upper and lower bounds of log-softmax. In theory, such an information-theoretic view offers rationality support for embedding softmax in neural networks; in practice, we eventually demonstrate a computer vision application example of how to employ our information-theoretic view to filter out targeted objects in images.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.02629v3
PDF https://arxiv.org/pdf/1910.02629v3.pdf
PWC https://paperswithcode.com/paper/softmax-is-not-an-artificial-trick-an
Repo
Framework
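The log-softmax quantity the paper analyzes is easy to state concretely: log_softmax(z)_i = z_i - logsumexp(z). A minimal, numerically stable implementation (standard practice, not specific to this paper) makes its two key properties visible — its exponentials form a probability distribution, and it is invariant to constant shifts of the logits:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax along the last axis.
    Subtracting the max first avoids overflow in exp() while
    leaving the result unchanged (shift invariance)."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
```

Maximising the log-softmax value of the true label is exactly minimising cross-entropy loss, which is the training objective the paper re-reads as enlarging conditional mutual information.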

Measuring Conceptual Entanglement in Collections of Documents

Title Measuring Conceptual Entanglement in Collections of Documents
Authors Tomas Veloz, Xiazhao Zhao, Diederik Aerts
Abstract Conceptual entanglement is a crucial phenomenon in quantum cognition because it implies that classical probabilities cannot model non-compositional conceptual phenomena. While several psychological experiments have been developed to test conceptual entanglement, this has not been explored in the context of Natural Language Processing. In this paper, we apply the hypothesis that words of a document are traces of the concepts that a person has in mind when writing the document. Therefore, if these concepts are entangled, we should be able to observe traces of their entanglement in the documents. In particular, we test conceptual entanglement by contrasting language simulations with results obtained from a text corpus. Our analysis indicates that conceptual entanglement is strongly linked to the way in which language is structured. We discuss the implications of this finding in the context of conceptual modeling and of Natural Language Processing.
Tasks
Published 2019-09-20
URL https://arxiv.org/abs/1909.09708v1
PDF https://arxiv.org/pdf/1909.09708v1.pdf
PWC https://paperswithcode.com/paper/190909708
Repo
Framework

Large-scale traffic signal control using machine learning: some traffic flow considerations

Title Large-scale traffic signal control using machine learning: some traffic flow considerations
Authors Jorge A. Laval, Hao Zhou
Abstract This paper uses supervised learning, random search and deep reinforcement learning (DRL) methods to control large signalized intersection networks. The traffic model is Cellular Automaton rule 184, which has been shown to be a parameter-free representation of traffic flow, and is the most efficient implementation of the Kinematic Wave model with triangular fundamental diagram. We are interested in the steady-state performance of the system, both spatially and temporally: we consider a homogeneous grid network inscribed on a torus, which makes the network boundary-free, and drivers choose random routes. As a benchmark we use the longest-queue-first (LQF) greedy algorithm. We find that: (i) a policy trained with supervised learning with only two examples outperforms LQF, (ii) random search is able to generate near-optimal policies, (iii) the prevailing average network occupancy during training is the major determinant of the effectiveness of DRL policies. When trained under free-flow conditions one obtains DRL policies that are optimal for all traffic conditions, but this performance deteriorates as the occupancy during training increases. For occupancies > 75% during training, DRL policies perform very poorly for all traffic conditions, which means that DRL methods cannot learn under highly congested conditions. We conjecture that DRL’s inability to learn under congestion might be explained by a property of urban networks found here, whereby even a very bad policy produces an intersection throughput higher than downstream capacity. This means that the actual throughput tends to be independent of the policy. Our findings imply that it is advisable for current DRL methods in the literature to discard any congested data when training, and that doing this will improve their performance under all traffic conditions.
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02673v1
PDF https://arxiv.org/pdf/1908.02673v1.pdf
PWC https://paperswithcode.com/paper/large-scale-traffic-signal-control-using
Repo
Framework
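Cellular Automaton rule 184, the traffic model used here, is compact enough to state exactly: a car advances one cell per step iff the cell ahead is empty. A minimal implementation on a ring (matching the paper's torus, boundary-free setting):

```python
import numpy as np

def rule184_step(cells):
    """One synchronous update of CA rule 184 on a ring of 0/1 ints.
    A car (1) moves right iff the cell ahead is empty (0)."""
    left = np.roll(cells, 1)    # what was one cell behind
    right = np.roll(cells, -1)  # what is one cell ahead
    # Occupied next step if a car arrives from behind into an empty cell,
    # or a car stays put because the cell ahead is occupied.
    return (left & (1 - cells)) | (cells & right)
```

The update conserves the number of cars, so network occupancy (the quantity the paper finds decisive for DRL training) is fixed by the initial condition.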

Bayesian Persuasion with Sequential Games

Title Bayesian Persuasion with Sequential Games
Authors Andrea Celli, Stefano Coniglio, Nicola Gatti
Abstract We study an information-structure design problem (a.k.a. persuasion) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal distributions. As in the standard Bayesian persuasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the case where the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ in the time at which the receivers commit to following the sender’s signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the sender’s expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. In contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to a novel algorithm based on the ellipsoid method which we propose.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.00877v1
PDF https://arxiv.org/pdf/1908.00877v1.pdf
PWC https://paperswithcode.com/paper/bayesian-persuasion-with-sequential-games
Repo
Framework

Identifying Sub-Phenotypes of Acute Kidney Injury using Structured and Unstructured Electronic Health Record Data with Memory Networks

Title Identifying Sub-Phenotypes of Acute Kidney Injury using Structured and Unstructured Electronic Health Record Data with Memory Networks
Authors Zhenxing Xu, Jingyuan Chou, Xi Sheryl Zhang, Yuan Luo, Tamara Isakova, Prakash Adekkanattu, Jessica S. Ancker, Guoqian Jiang, Richard C. Kiefer, Jennifer A. Pacheco, Luke V. Rasmussen, Jyotishman Pathak, Fei Wang
Abstract Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable in-time interventions and treatments. However, AKI is highly heterogeneous, thus identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes: sub-phenotype I has an average age of 63.03$ \pm 17.25 $ years and is characterized by mild loss of kidney excretory function (Serum Creatinine (SCr) $1.55\pm 0.34$ mg/dL, estimated Glomerular Filtration Rate (eGFR) $107.65\pm 54.98$ mL/min/1.73$m^2$). These patients are more likely to develop stage I AKI. Sub-phenotype II has an average age of 66.81$ \pm 10.43 $ years and is characterized by severe loss of kidney excretory function (SCr $1.96\pm 0.49$ mg/dL, eGFR $82.19\pm 55.92$ mL/min/1.73$m^2$). These patients are more likely to develop stage III AKI. Sub-phenotype III has an average age of 65.07$ \pm 11.32 $ years and is characterized by moderate loss of kidney excretory function, and is thus more likely to develop stage II AKI (SCr $1.69\pm 0.32$ mg/dL, eGFR $93.97\pm 56.53$ mL/min/1.73$m^2$). Both SCr and eGFR differ significantly across the three sub-phenotypes under statistical testing with post-hoc analysis, and the conclusion still holds after age adjustment.
Tasks
Published 2019-04-10
URL https://arxiv.org/abs/1904.04990v2
PDF https://arxiv.org/pdf/1904.04990v2.pdf
PWC https://paperswithcode.com/paper/identification-of-predictive-sub-phenotypes
Repo
Framework

A Quantum Annealing-Based Approach to Extreme Clustering

Title A Quantum Annealing-Based Approach to Extreme Clustering
Authors Tim Jaschek, Marko Bucyk, Jaspreet S. Oberoi
Abstract Clustering, or grouping, dataset elements based on similarity can be used not only to classify a dataset into a few categories, but also to approximate it by a relatively large number of representative elements. In the latter scenario, referred to as extreme clustering, datasets are enormous and the number of representative clusters is large. We have devised a distributed method that can efficiently solve extreme clustering problems using quantum annealing. We prove that this method yields optimal clustering assignments under a separability assumption, and show that the generated clustering assignments are of comparable quality to those of assignments generated by common clustering algorithms, yet can be obtained a full order of magnitude faster.
Tasks
Published 2019-03-19
URL https://arxiv.org/abs/1903.08256v3
PDF https://arxiv.org/pdf/1903.08256v3.pdf
PWC https://paperswithcode.com/paper/a-quantum-annealing-based-approach-to-extreme
Repo
Framework
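One standard way to hand a clustering problem to an annealer is to phrase it as a QUBO over one-hot assignment variables. The encoding below is a common textbook formulation assumed for illustration, not taken from the paper (which additionally distributes the work to reach extreme scales); a brute-force solver stands in for the annealer on tiny instances:

```python
import itertools
import numpy as np

def clustering_qubo(D, k, penalty=10.0):
    """QUBO for assigning n points (pairwise distance matrix D) to k
    clusters. Binary variable x[i*k + c] = 1 iff point i is in cluster c.
    Objective: total within-cluster distance, plus a quadratic penalty
    enforcing exactly one cluster per point."""
    n = D.shape[0]
    m = n * k
    Q = np.zeros((m, m))
    for i in range(n):
        for j in range(i + 1, n):
            for c in range(k):
                Q[i * k + c, j * k + c] += D[i, j]   # same-cluster cost
    for i in range(n):
        for c in range(k):
            Q[i * k + c, i * k + c] -= penalty            # reward picking one
            for c2 in range(c + 1, k):
                Q[i * k + c, i * k + c2] += 2 * penalty   # punish picking two
    return Q

def brute_force_min(Q):
    """Exhaustive stand-in for the annealer (tiny instances only)."""
    m = Q.shape[0]
    best, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=m):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best, best_e = x, e
    return best
```

On two tight pairs of points separated by a large gap, the minimum-energy state recovers the obvious two-cluster assignment.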

Sequential Recommender Systems: Challenges, Progress and Prospects

Title Sequential Recommender Systems: Challenges, Progress and Prospects
Authors Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan Z. Sheng, Mehmet Orgun
Abstract The emerging topic of sequential recommender systems (SRSs) has attracted increasing attention in recent years. Different from conventional recommender systems, including collaborative filtering and content-based filtering, SRSs try to understand and model sequential user behaviors, the interactions between users and items, and the evolution of user preferences and item popularity over time. SRSs involve the above aspects for a more precise characterization of user contexts, intent and goals, and item consumption trends, leading to more accurate, customized and dynamic recommendations. In this paper, we provide a systematic review of SRSs. We first present the characteristics of SRSs, then summarize and categorize the key challenges in this research area, followed by the corresponding research progress consisting of the most recent and representative developments on this topic. Finally, we discuss important research directions in this vibrant area.
Tasks Recommendation Systems
Published 2019-12-28
URL https://arxiv.org/abs/2001.04830v1
PDF https://arxiv.org/pdf/2001.04830v1.pdf
PWC https://paperswithcode.com/paper/sequential-recommender-systems-challenges
Repo
Framework

Explanation by Progressive Exaggeration

Title Explanation by Progressive Exaggeration
Authors Sumedha Singla, Brian Pollack, Junxiang Chen, Kayhan Batmanghelich
Abstract As machine learning methods see greater adoption and implementation in high stakes applications such as medical image diagnosis, the need for model interpretability and explanation has become more critical. Classical approaches that assess feature importance (e.g. saliency maps) do not explain how and why a particular region of an image is relevant to the prediction. We propose a method that explains the outcome of a classification black-box by gradually exaggerating the semantic effect of a given class. Given a query input to a classifier, our method produces a progressive set of plausible variations of that query, which gradually changes the posterior probability from its original class to its negation. These counter-factually generated samples preserve features unrelated to the classification decision, such that a user can employ our method as a “tuning knob” to traverse a data manifold while crossing the decision boundary. Our method is model agnostic and only requires the output value and gradient of the predictor with respect to its input.
Tasks Feature Importance
Published 2019-11-01
URL https://arxiv.org/abs/1911.00483v3
PDF https://arxiv.org/pdf/1911.00483v3.pdf
PWC https://paperswithcode.com/paper/explanation-by-progressive-exaggeration-1
Repo
Framework
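The “tuning knob” idea can be sketched on a logistic model, where the direction that changes the posterior is known in closed form. This toy walk along the logit gradient is an illustrative assumption only; the paper's method uses a learned generative model so that the progression stays on the data manifold rather than moving in raw input space:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def exaggerate(w, b, x, steps=5, delta=1.0):
    """Produce a progressive set of variations of x that walk the
    posterior p(y=1|x) of a logistic model from its current class
    toward its negation, one step at a time. Assumes x is not
    exactly on the decision boundary."""
    direction = -np.sign(w @ x + b)  # push the logit toward and past zero
    path = []
    for t in range(1, steps + 1):
        # Only movement along w changes the posterior; everything
        # orthogonal to it is preserved, mimicking the "knob" behaviour.
        x_t = x + direction * t * delta * w / np.linalg.norm(w)
        path.append((x_t, sigmoid(w @ x_t + b)))
    return path
```

Each element of the returned path is a progressively exaggerated variant of the query together with its posterior, crossing 0.5 somewhere along the way.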