October 18, 2019


Paper Group ANR 657

CosmoFlow: Using Deep Learning to Learn the Universe at Scale. Exploration in Structured Reinforcement Learning. Spread Divergences. Generalized Similarity U: A Non-parametric Test of Association Based on Similarity. Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Universal Approximation by a Slim Network with Sparse Shortcut …

CosmoFlow: Using Deep Learning to Learn the Universe at Scale


Title	CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Authors	Amrita Mathuriya, Deborah Bard, Peter Mendygral, Lawrence Meadows, James Arnemann, Lei Shao, Siyu He, Tuomas Karna, Daina Moise, Simon J. Pennycook, Kristyn Maschoff, Jason Sewall, Nalini Kumar, Shirley Ho, Mike Ringenburg, Prabhat, Victor Lee
Abstract	Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented accuracy.
Tasks
Published	2018-08-14
URL	http://arxiv.org/abs/1808.04728v2
PDF	http://arxiv.org/pdf/1808.04728v2.pdf
PWC	https://paperswithcode.com/paper/cosmoflow-using-deep-learning-to-learn-the
Repo
Framework
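As a rough sanity check on the headline numbers, the sustained rate can be divided out to a per-node figure, and parallel efficiency can be expressed as achieved aggregate throughput over ideal linear scaling. A minimal sketch; the single-node baseline passed to `parallel_efficiency` is a hypothetical input, since the abstract reports only the 8192-node figures.

```python
def per_node_rate(aggregate_flops, n_nodes):
    """Average sustained rate per node, in flop/s."""
    return aggregate_flops / n_nodes


def parallel_efficiency(aggregate_throughput, n_nodes, single_node_throughput):
    """Achieved throughput divided by ideal linear scaling (n_nodes x one node).

    Any consistent throughput unit works (samples/s, flop/s, ...).
    The single-node baseline is a hypothetical input here.
    """
    return aggregate_throughput / (n_nodes * single_node_throughput)


# 3.5 Pflop/s sustained over 8192 Cori nodes (figures from the abstract).
rate = per_node_rate(3.5e15, 8192)
print(f"{rate / 1e9:.1f} Gflop/s per node")  # 427.2 Gflop/s per node
```

Read the other way, 77% efficiency on 8192 nodes means the full run delivers roughly 0.77 × 8192 ≈ 6,300 single-node equivalents of throughput.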

Exploration in Structured Reinforcement Learning


Title	Exploration in Structured Reinforcement Learning
Authors	Jungseul Ok, Alexandre Proutiere, Damianos Tranos
Abstract	We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could potentially be exploited to minimize the exploration rates of suboptimal (state, action) pairs. For an arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructured MDPs and for those whose transition probabilities and average reward functions are Lipschitz continuous w.r.t. the state and action. For Lipschitz MDPs, the bounds are shown not to scale with the sizes $S$ and $A$ of the state and action spaces, i.e., they are smaller than $c\log T$ where $T$ is the time horizon and the constant $c$ only depends on the Lipschitz structure, the span of the bias function, and the minimal action sub-optimality gap. This contrasts with unstructured MDPs where the regret lower bound typically scales as $SA\log T$. We devise DEL (Directed Exploration Learning), an algorithm that matches our regret lower bounds. We further simplify the algorithm for Lipschitz MDPs, and show that the simplified version is still able to efficiently exploit the structure.
Tasks
Published	2018-06-03
URL	http://arxiv.org/abs/1806.00775v2
PDF	http://arxiv.org/pdf/1806.00775v2.pdf
PWC	https://paperswithcode.com/paper/exploration-in-structured-reinforcement
Repo
Framework

Spread Divergences


Title	Spread Divergences
Authors	Mingtian Zhang, Peter Hayes, Tom Bird, Raza Habib, David Barber
Abstract	For distributions p and q with different supports, the divergence D(p||q) may not exist. We define a spread divergence on modified p and q and describe sufficient conditions for the existence of such a divergence. We demonstrate how to maximize the discriminatory power of a given divergence by parameterizing and learning the spread. We also give examples of using a spread divergence to train and improve implicit generative models, including linear models (Independent Components Analysis) and non-linear models (Deep Generative Networks).
Tasks
Published	2018-11-21
URL	https://arxiv.org/abs/1811.08968v3
PDF	https://arxiv.org/pdf/1811.08968v3.pdf
PWC	https://paperswithcode.com/paper/spread-divergences
Repo
Framework
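For intuition, take the simplest case of disjoint supports: two point masses. The KL divergence between them is undefined, but after adding the same Gaussian noise to both (one possible choice of spread, assumed here; the paper also learns the spread), it becomes the KL between two Gaussians, which is finite. A minimal sketch under that Gaussian-noise assumption:

```python
import math


def kl_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for 1D Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def spread_kl_point_masses(a, b, noise_var):
    """Spread KL between point masses delta_a and delta_b (Gaussian spread assumed).

    KL(delta_a || delta_b) is not defined when a != b because the supports are
    disjoint. Convolving both with the same N(0, noise_var) noise yields
    N(a, noise_var) and N(b, noise_var), whose KL is (a - b)^2 / (2 * noise_var).
    """
    return kl_gaussians(a, noise_var, b, noise_var)


print(spread_kl_point_masses(0.0, 2.0, 1.0))  # 2.0
```

The value is zero exactly when a == b, so the spread version still separates distinct distributions; increasing the noise variance shrinks it, consistent with the abstract's point that the choice of spread affects discriminatory power.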

Generalized Similarity U: A Non-parametric Test of Association Based on Similarity


Title	Generalized Similarity U: A Non-parametric Test of Association Based on Similarity
Authors	Changshuai Wei, Qing Lu
Abstract	Second-generation sequencing technologies are increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses, or even high-dimensional outcomes. When genotype and phenotype are viewed as two complex objects, this also poses a general statistical problem: testing association between complex objects. We proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on its application to sequencing association studies. Based on theoretical analysis, we proposed using Laplacian-kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU has advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan of Alzheimer's Disease Neuroimaging Initiative (ADNI) data, identifying three genes, APOE, APOC1 and TOMM40, associated with an imaging phenotype. We developed a C++ package for analyzing whole genome sequencing data using GSU. The source code can be downloaded at https://github.com/changshuaiwei/gsu.
Tasks
Published	2018-01-04
URL	http://arxiv.org/abs/1801.01220v1
PDF	http://arxiv.org/pdf/1801.01220v1.pdf
PWC	https://paperswithcode.com/paper/generalized-similarity-u-a-non-parametric
Repo
Framework
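The abstract does not give the GSU formula. As a rough illustration of a similarity-based association test, the sketch below builds Laplacian-kernel similarity matrices for genotypes and phenotypes, double-centers them, averages their elementwise product over distinct pairs (a U-statistic-style score), and calibrates it by permutation. The centering, kernel scale, and permutation scheme are assumptions, not the published GSU.

```python
import math
import random


def laplacian_kernel(x, y, scale=1.0):
    """exp(-||x - y||_1 / scale): the L1-distance (Laplacian) kernel."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / scale)


def centered_similarity(rows, scale=1.0):
    """Pairwise Laplacian-kernel similarities, double-centered (an assumed choice)."""
    n = len(rows)
    s = [[laplacian_kernel(rows[i], rows[j], scale) for j in range(n)] for i in range(n)]
    means = [sum(r) / n for r in s]  # row means equal column means since s is symmetric
    grand = sum(means) / n
    return [[s[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def similarity_u(geno, pheno, scale=1.0):
    """Average product of centered similarities over distinct pairs i != j."""
    n = len(geno)
    g = centered_similarity(geno, scale)
    p = centered_similarity(pheno, scale)
    total = sum(g[i][j] * p[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def permutation_pvalue(geno, pheno, n_perm=200, seed=0):
    """One-sided p-value: how often shuffled phenotypes score at least as high."""
    rng = random.Random(seed)
    observed = similarity_u(geno, pheno)
    hits = 0
    for _ in range(n_perm):
        shuffled = pheno[:]
        rng.shuffle(shuffled)
        if similarity_u(geno, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: one "genotype" feature and a phenotype that tracks it.
geno = [[float(i)] for i in range(12)]
pheno = [[2.0 * i] for i in range(12)]
print(permutation_pvalue(geno, pheno))
```

Each row can be any fixed-length numeric vector (for example, minor-allele counts per variant, or a multivariate phenotype), which is what lets a similarity-based statistic handle complex objects on both sides.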

Large-Scale Visual Active Learning with Deep Probabilistic Ensembles


Title	Large-Scale Visual Active Learning with Deep Probabilistic Ensembles
Authors	Kashyap Chitta, Jose M. Alvarez, Adam Lesnikowski
Abstract	Annotating the right data for training deep neural networks is an important challenge. Active learning using uncertainty estimates from Bayesian Neural Networks (BNNs) could provide an effective solution to this. Despite being theoretically principled, BNNs require approximations to be applied to large-scale problems, where both performance and uncertainty estimation are crucial. In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a scalable technique that uses a regularized ensemble to approximate a deep BNN. We conduct a series of large-scale visual active learning experiments to evaluate DPEs on classification with the CIFAR-10, CIFAR-100 and ImageNet datasets, and semantic segmentation with the BDD100k dataset. Our models require significantly less training data to achieve competitive performance, and steadily improve upon strong active learning baselines as the annotation budget increases.
Tasks	Active Learning, Semantic Segmentation
Published	2018-11-08
URL	http://arxiv.org/abs/1811.03575v3
PDF	http://arxiv.org/pdf/1811.03575v3.pdf
PWC	https://paperswithcode.com/paper/large-scale-visual-active-learning-with-deep
Repo
Framework
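The abstract does not specify the acquisition function. A common choice with ensembles, assumed here, is the entropy of the averaged predictive distribution: samples where members disagree or are jointly unsure score highest and are sent for annotation. A minimal sketch of that acquisition step only; the DPE regularizer itself is not shown.

```python
import math


def predictive_entropy(member_probs):
    """Entropy of the ensemble-averaged class distribution for one sample.

    member_probs: one list of class probabilities per ensemble member.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    return -sum(m * math.log(m) for m in mean if m > 0)


def select_for_annotation(pool, budget):
    """Indices of the `budget` unlabeled samples with the highest predictive entropy."""
    ranked = sorted(range(len(pool)), key=lambda i: predictive_entropy(pool[i]), reverse=True)
    return ranked[:budget]


pool = [
    [[0.90, 0.10], [0.95, 0.05]],  # members agree, confident
    [[0.90, 0.10], [0.10, 0.90]],  # members disagree
    [[0.60, 0.40], [0.70, 0.30]],  # members agree, unsure
]
print(select_for_annotation(pool, 2))  # [1, 2]
```

Averaging before taking the entropy means disagreement between members raises the score even when each member is individually confident, which is the main benefit an ensemble brings over a single network here.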

Universal Approximation by a Slim Network with Sparse Shortcut Connections


Title	Universal Approximation by a Slim Network with Sparse Shortcut Connections
Authors	Fenglei Fan, Dayang Wang, Ge Wang
Abstract	In recent years, deep learning has become a mainstream method in machine learning. More advanced networks are being actively developed to solve real-world problems in many important areas. Among the successful features of network architectures, shortcut connections, which feed the outputs of earlier layers into later layers, are well established and produce excellent results, as in ResNet and DenseNet. Despite the power of shortcuts, important questions remain about the underlying mechanism and associated functionalities. For example, will adding shortcuts lead to a more compact structure? How should shortcuts be used to optimize the efficiency and capacity of a network model? Along this direction, we demonstrate here that, with only one neuron in each layer, shortcuts can be sparsely placed so that the slim network becomes a universal approximator. Potentially, our theoretically guaranteed sparse network model can achieve learning performance comparable to densely connected networks on well-known benchmarks.
Tasks
Published	2018-11-22
URL	http://arxiv.org/abs/1811.09003v1
PDF	http://arxiv.org/pdf/1811.09003v1.pdf
PWC	https://paperswithcode.com/paper/universal-approximation-by-a-slim-network
Repo
Framework
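To see why sparse shortcuts matter when every layer has a single neuron: a plain width-1 ReLU chain computes a composition of monotone maps, so its output is monotone in the input. Routing a few hidden activations straight to the output lets the chain build non-monotone pieces such as a "hat" function, the usual building block of piecewise-linear approximation. This sketch is illustrative only; the wiring (shortcuts from hidden layers to the output) is an assumption, not the construction in the paper.

```python
def relu(z):
    return max(0.0, z)


def slim_forward(x, layers, shortcuts):
    """Width-1 ReLU chain with sparse hidden-to-output shortcuts (illustrative wiring).

    layers:    list of (w, b); layer l computes h_l = relu(w * h_{l-1} + b), h_0 = x.
    shortcuts: {layer index (1-based): output weight}; unlisted layers have no shortcut.
    """
    h, out = x, 0.0
    for l, (w, b) in enumerate(layers, start=1):
        h = relu(w * h + b)
        out += shortcuts.get(l, 0.0) * h
    return out


# h1 = relu(x), h2 = relu(h1 - 1), h3 = relu(h2 - 1); output = h1 - 2*h2 + h3
LAYERS = [(1.0, 0.0), (1.0, -1.0), (1.0, -1.0)]
SHORTCUTS = {1: 1.0, 2: -2.0, 3: 1.0}

for x in (-1.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(x, slim_forward(x, LAYERS, SHORTCUTS))
```

The output rises on [0, 1], falls on [1, 2], and is zero elsewhere; sums of shifted and scaled hats of this kind can approximate a continuous 1D function on a bounded interval.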

A New Result on the Complexity of Heuristic Estimates for the A* Algorithm


Title	A New Result on the Complexity of Heuristic Estimates for the A* Algorithm
Authors	Othar Hansson, Andrew Mayer, Marco Valtorta
Abstract	Relaxed models are abstract problem descriptions generated by ignoring constraints that are present in base-level problems. They play an important role in planning and search algorithms, as it has been shown that the length of an optimal solution to a relaxed model yields a monotone heuristic for an A* search of a base-level problem. Optimal solutions to a relaxed model may be computed algorithmically or by search in a further relaxed model, leading to a search that explores a hierarchy of relaxed models. In this paper, we review the traditional definition of problem relaxation and show that searching in the abstraction hierarchy created by problem relaxation will not reduce the computational effort required to find optimal solutions to the base-level problem, unless the relaxed problem found in the hierarchy can be transformed by some optimization (e.g., subproblem factoring). Specifically, we prove that any A* search of the base-level using a heuristic h2 will largely dominate an A* search of the base-level using a heuristic h1, if h1 must be computed by an A* search of the relaxed model using h2.
Tasks
Published	2018-03-16
URL	http://arxiv.org/abs/1803.06422v1
PDF	http://arxiv.org/pdf/1803.06422v1.pdf
PWC	https://paperswithcode.com/paper/a-new-result-on-the-complexity-of-heuristic
Repo
Framework
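The abstract's starting point, that the optimal solution length of a relaxed model is a monotone heuristic, is easy to see in grid path-finding: dropping the "walls" constraint gives a relaxed problem whose optimal cost is the Manhattan distance. A minimal A* sketch using that relaxed-model heuristic; the grid encoding ('#' for walls, unit step costs) is an assumption for illustration.

```python
import heapq


def manhattan(a, b):
    """Optimal path length in the relaxed model: the same grid with every wall removed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar(grid, start, goal, heuristic=manhattan):
    """A* on a 4-connected grid of strings ('#' = wall, unit step cost).

    Returns (optimal cost or None if unreachable, number of expanded nodes).
    Closing a node on its first expansion is safe because the heuristic is monotone.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    closed = set()
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node in closed:
            continue
        if node == goal:
            return g, expanded
        closed.add(node)
        expanded += 1
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + heuristic((nr, nc), goal), ng, (nr, nc)))
    return None, expanded


GRID = [
    ".#..",
    ".#.#",
    "....",
]
print(astar(GRID, (0, 0), (0, 2))[0])  # 6: the wall forces a detour through the bottom row
```

Passing `heuristic=lambda a, b: 0` turns this into uniform-cost search; on most grids it expands more nodes, which is the search effort a better-informed heuristic is meant to save.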

Alternating Loss Correction for Preterm-Birth Prediction from EHR Data with Noisy Labels


Title	Alternating Loss Correction for Preterm-Birth Prediction from EHR Data with Noisy Labels
Authors	Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Haithum Elhadi
Abstract	In this paper, we are interested in predicting preterm birth from diagnosis codes in longitudinal EHR data. We formulate the prediction problem as supervised classification with noisy labels. Our base classifier is a Recurrent Neural Network with an attention mechanism. We assume the availability of a data subset with both noisy and clean labels. For the cohort definition, most of the pregnancy-related diagnosis codes on mothers’ records are ambiguous for defining the full-term and preterm classes. On the other hand, diagnosis codes on babies’ records provide fine-grained information on prematurity. Due to data de-identification, the links between mothers and babies are not available. We developed a heuristic based on admission and discharge times to match babies to their mothers and hence enrich mothers’ records with additional information on delivery status. The additional dataset obtained from the matching heuristic has noisy labels and was used to augment the training of the deep learning model. We propose an Alternating Loss Correction (ALC) method to train deep models with both clean and noisy labels. First, the label corruption matrix is estimated using the data subset with both noisy and clean labels. Then it is used in the model as a dense output layer to correct for the label noise. The network is trained alternately: one epoch on the clean dataset with a simple cross-entropy loss, and the next epoch on the noisy dataset with a loss corrected by the estimated corruption matrix. The experiments for the prediction of preterm birth 90 days before delivery showed an improvement in performance compared with baseline and state-of-the-art methods.
Tasks
Published	2018-11-24
URL	http://arxiv.org/abs/1811.09782v1
PDF	http://arxiv.org/pdf/1811.09782v1.pdf
PWC	https://paperswithcode.com/paper/alternating-loss-correction-for-preterm-birth
Repo
Framework
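The two core steps of ALC as the abstract describes them, estimating the label corruption matrix from the doubly-labeled subset and correcting the loss on noisy data, can be sketched as follows. Assumptions: labels are integer class ids, the corruption matrix is applied as a plain matrix product (no bias or activation), a class absent from the clean subset falls back to an identity row, and training starts with a clean epoch. This is a sketch, not the authors' code.

```python
import math


def estimate_corruption_matrix(clean, noisy, num_classes):
    """C[i][j] ~ P(noisy label = j | clean label = i), from the subset with both labels."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for c, n in zip(clean, noisy):
        counts[c][n] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumed fallback: no clean examples of class i, so assume no corruption.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([k / total for k in row])
    return matrix


def corrected_nll(clean_probs, noisy_label, C, eps=1e-12):
    """Forward-corrected loss: map the model's clean-label probabilities through C
    (what a fixed linear output layer computes), then score the observed noisy label."""
    k = len(C)
    noisy_probs = [sum(clean_probs[i] * C[i][j] for i in range(k)) for j in range(k)]
    return -math.log(max(noisy_probs[noisy_label], eps))


def loss_for_epoch(epoch):
    """Alternating schedule (starting with clean is an assumption): even epochs use the
    clean set with plain cross-entropy, odd epochs the noisy set with the corrected loss."""
    return "clean_cross_entropy" if epoch % 2 == 0 else "noisy_corrected"


C = estimate_corruption_matrix(clean=[0, 0, 0, 0, 1, 1], noisy=[0, 0, 0, 1, 1, 1], num_classes=2)
print(C)  # [[0.75, 0.25], [0.0, 1.0]]
```

Correcting the output rather than the labels keeps the network's own softmax interpretable as clean-label probabilities, which is what is used at prediction time.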

Multiple topic identification in human/human conversations


Title	Multiple topic identification in human/human conversations
Authors	X. Bost, G. Senay, M. El-Bèze, R. De Mori
Abstract	The paper deals with the automatic analysis of real-life telephone conversations between an agent and a customer of a customer care service (CCS). The application domain is the public transportation system in Paris, and the purpose is to collect statistics about customer problems in order to monitor the service and decide intervention priorities for improving user satisfaction. Of primary importance for the analysis is the detection of themes that are the object of customer problems. Themes are defined in the application requirements and are part of the application ontology that is implicit in the CCS documentation. Due to the variety of the customer population, the structure of conversations with an agent is unpredictable. A conversation may be about one or more themes. Theme mentions can be interleaved with mentions of facts that are irrelevant to the application purpose. Furthermore, in certain conversations theme mentions are localized in specific conversation segments, while in other conversations mentions cannot be localized. As a consequence, approaches to feature extraction with and without mention localization are considered. Application-relevant themes identified by an automatic procedure are expressed by specific sentences whose words are hypothesized by an automatic speech recognition (ASR) system. The ASR system is error-prone, and word error rates can be very high for many reasons, including unpredictable background noise, speaker accent, and various types of speech disfluencies. As the application task requires estimating the proportions of theme mentions, this paper introduces a sequential decision strategy for surveying the large number of conversations made available in a given time period. The strategy has to sample the conversations to form a survey containing enough data analyzed with high accuracy so that proportions can be estimated with sufficient accuracy.
Due to the unpredictable type of theme mentions, it is appropriate to consider methods for theme hypothesization based on global as well as local feature extraction. The strategy considers two systems for each type of feature extraction. One of these four methods is novel: it is based on a new definition of the density of theme mentions and on the localization of high-density zones whose boundaries do not need to be precisely detected. The sequential decision strategy starts by grouping theme hypotheses into sets of different expected accuracy and coverage levels. For those sets whose accuracy can be improved with a consequent increase in coverage, a new system with new features is introduced. Its execution is triggered only when specific preconditions are met on the hypotheses generated by the four basic systems. Experimental results are provided on a corpus collected in the call center of the Paris transportation system known as RATP. The results show that surveys with high accuracy and coverage can be composed with the proposed strategy and systems. This makes it possible to apply a previously published proportion estimation approach that takes hypothesization errors into account.
Tasks	Speech Recognition
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07207v2
PDF	http://arxiv.org/pdf/1812.07207v2.pdf
PWC	https://paperswithcode.com/paper/multiple-topic-identification-in-humanhuman
Repo
Framework
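The abstract does not define its density of theme mentions. Purely as an illustration of locating high-density zones with loose boundaries, the sketch below assumes a hypothetical definition: the fraction of words in a sliding window that the ASR transcript flags as theme mentions, with overlapping above-threshold windows merged into zones. The window length and threshold are illustrative parameters, not values from the paper.

```python
def mention_density(is_mention, window):
    """Hypothetical density: fraction of theme-mention words in each length-`window`
    span, sliding one word at a time."""
    if not 0 < window <= len(is_mention):
        raise ValueError("window must be between 1 and the number of words")
    count = sum(is_mention[:window])
    densities = [count / window]
    for i in range(window, len(is_mention)):
        count += is_mention[i] - is_mention[i - window]
        densities.append(count / window)
    return densities


def high_density_zones(is_mention, window, threshold):
    """Merge overlapping windows whose density reaches `threshold` into
    (start, end) word spans, with `end` exclusive; boundaries are only approximate."""
    zones = []
    for start, density in enumerate(mention_density(is_mention, window)):
        if density >= threshold:
            end = start + window
            if zones and start <= zones[-1][1]:
                zones[-1] = (zones[-1][0], max(zones[-1][1], end))
            else:
                zones.append((start, end))
    return zones


# 1 = word hypothesized as a mention of the theme, 0 = anything else
flags = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(high_density_zones(flags, window=4, threshold=0.5))  # [(1, 8)]
```

Because zones are unions of whole windows, their edges can be off by a few words, which is acceptable when, as the abstract notes, boundaries do not need to be precisely detected.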

From Fair Decision Making to Social Equality

Title	From Fair Decision Making to Social Equality
Authors	Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro
Abstract	The study of fairness in intelligent decision systems has mostly ignored long-term influence on the underlying population. Yet fairness considerations (e.g., affirmative action) often have the implicit goal of achieving balance among groups within the population. The most basic notion of balance is eventual equality between the qualifications of the groups. How can we incorporate influence dynamics in decision making? How well do dynamics-oblivious fairness policies fare in terms of reaching equality? In this paper, we propose a simple yet revealing model that encompasses (1) a selection process where an institution chooses from multiple groups according to their qualifications so as to maximize an institutional utility and (2) dynamics that govern the evolution of the groups’ qualifications according to the imposed policies. We focus on demographic parity as the formalism of affirmative action. We then give conditions under which an unconstrained policy reaches equality on its own. In this case, surprisingly, imposing demographic parity may break equality. When it doesn’t, one would expect the additional constraint to reduce utility; however, we show that utility may in fact increase. In more realistic scenarios, unconstrained policies do not lead to equality. In such cases, we show that although imposing demographic parity may remedy it, there is a danger that groups settle at a worse set of qualifications. As a silver lining, we also identify when the constraint not only leads to equality, but also improves all groups. This gives quantifiable insight into both sides of the mismatch hypothesis. These cases and trade-offs are instrumental in determining when and how imposing demographic parity can be beneficial in selection processes, both for the institution and for society in the long run.
Tasks	Decision Making
Published	2018-12-07
URL	https://arxiv.org/abs/1812.02952v2
PDF	https://arxiv.org/pdf/1812.02952v2.pdf
PWC	https://paperswithcode.com/paper/from-fair-decision-making-to-social-equality
Repo
Framework
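Demographic parity in a selection process means every group is selected at the same rate. The one-shot sketch below contrasts it with unconstrained top-k selection on hypothetical qualification scores; it deliberately omits the paper's dynamics, so it shows only the immediate effect of the constraint, not the long-run outcomes the abstract discusses.

```python
def select_unconstrained(scores_by_group, k):
    """Take the k highest qualification scores, ignoring group membership."""
    pool = sorted(
        ((score, group) for group, scores in scores_by_group.items() for score in scores),
        reverse=True,
    )
    chosen = {group: [] for group in scores_by_group}
    for score, group in pool[:k]:
        chosen[group].append(score)
    return chosen


def select_demographic_parity(scores_by_group, k):
    """Select the same fraction k / N from every group, best-qualified first within each group.

    Quotas are rounded, so for awkward group sizes the total may differ slightly from k.
    """
    n_total = sum(len(scores) for scores in scores_by_group.values())
    chosen = {}
    for group, scores in scores_by_group.items():
        quota = round(k * len(scores) / n_total)
        chosen[group] = sorted(scores, reverse=True)[:quota]
    return chosen


# Hypothetical qualification scores for two groups.
groups = {"A": [0.9, 0.8, 0.7, 0.6], "B": [0.5, 0.4, 0.3, 0.2]}
print(select_unconstrained(groups, 4))       # {'A': [0.9, 0.8, 0.7, 0.6], 'B': []}
print(select_demographic_parity(groups, 4))  # {'A': [0.9, 0.8], 'B': [0.5, 0.4]}
```

In this static snapshot the constraint lowers the total score of the selected set; the paper's point is that once qualifications evolve in response to the policy, the comparison can change, including cases where utility increases.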

DeepWiTraffic: Low Cost WiFi-Based Traffic Monitoring System Using Deep Learning


Title	DeepWiTraffic: Low Cost WiFi-Based Traffic Monitoring System Using Deep Learning
Authors	Myounggyu Won, Sayan Sahu, Kyung-Joon Park
Abstract	A traffic monitoring system (TMS) is an integral part of Intelligent Transportation Systems (ITS) for traffic analysis and planning. This paper addresses the endemic cost issue of deploying a large number of TMSs to cover the vast mileage of two-lane rural highways (119,247 miles in the U.S.). A low-cost and portable TMS called DeepWiTraffic, based on COTS WiFi devices and deep learning, is proposed. DeepWiTraffic enables accurate vehicle detection and classification by exploiting the unique WiFi Channel State Information (CSI) of passing vehicles. Spatial and temporal correlations of preprocessed CSI amplitude and phase data are identified and analyzed using deep learning to classify vehicles into five different types: motorcycle, passenger vehicle, SUV, pickup truck, and large truck. About 120 hours of CSI data from passing vehicles, together with the corresponding ground-truth video data, were collected to validate the effectiveness of DeepWiTraffic. The results show an average detection accuracy of 99.4% and an average classification accuracy of 91.1% (Motorcycle: 97.2%, Passenger Car: 91.1%, SUV: 83.8%, Pickup Truck: 83.3%, and Large Truck: 99.7%), achieved at a very small cost of about $1,000.
Tasks
Published	2018-12-19
URL	http://arxiv.org/abs/1812.08208v1
PDF	http://arxiv.org/pdf/1812.08208v1.pdf
PWC	https://paperswithcode.com/paper/deepwitraffic-low-cost-wifi-based-traffic
Repo
Framework
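The abstract says CSI amplitude and phase are preprocessed but not how. A common step in the CSI literature, assumed here, is to take the amplitude |h| per subcarrier and sanitize the raw phase by unwrapping it across subcarriers and removing a linear trend (slope from the end points, plus a constant offset), which cancels much of the timing and carrier-frequency offset of commodity WiFi cards. A minimal sketch for one CSI snapshot, assuming contiguous, evenly spaced subcarrier indices:

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so consecutive phase differences lie within [-pi, pi]."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out


def sanitize_csi(csi):
    """Return (amplitudes, sanitized phases) for one CSI vector (complex values over
    at least two subcarriers). Linear-trend removal is an assumed preprocessing choice."""
    amplitudes = [abs(h) for h in csi]
    phase = unwrap([cmath.phase(h) for h in csi])
    n = len(phase)
    slope = (phase[-1] - phase[0]) / (n - 1)  # assumes evenly spaced subcarriers
    detrended = [p - slope * k for k, p in enumerate(phase)]
    offset = sum(detrended) / n
    return amplitudes, [p - offset for p in detrended]


# Synthetic snapshot: constant gain 2 and a purely linear phase ramp that wraps past pi.
snapshot = [cmath.rect(2.0, 0.3 * k + 0.5) for k in range(10)]
amps, phases = sanitize_csi(snapshot)
print(max(abs(p) for p in phases) < 1e-9)  # True: a purely linear phase is removed entirely
```

On real captures the residual phase after sanitization is what carries the vehicle-induced variation, while the amplitude stream is typically used as-is or lightly filtered.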

Detecting Reliable Novel Word Senses: A Network-Centric Approach


Title	Detecting Reliable Novel Word Senses: A Network-Centric Approach
Authors	Abhik Jana, Animesh Mukherjee, Pawan Goyal
Abstract	In this era of Big Data, the rapid exchange of information on the web means that words are being used to denote newer meanings, causing linguistic shift. With the recent availability of large amounts of digitized text, automated analysis of the evolution of language has become possible. Our study mainly focuses on improving the detection of new word senses. This paper presents a novel proposal based on network features to improve the precision of new word sense detection. For a candidate word where a new sense (birth) has been detected by comparing the sense clusters induced at two different time points, we further compare the network properties of the subgraphs induced from the novel sense cluster across these two time points. Using the mean fractional change in edge density, structural similarity and average path length as features in an SVM classifier, manual evaluation gives precision values of 0.86 and 0.74 for the task of new sense detection when tested on two distinct time-point pairs, compared with precision values in the range 0.23-0.32 when the proposed scheme is not used. The outlined method can therefore be used as a new post-hoc step to improve the precision of novel word sense detection in a robust and reliable way where the underlying framework uses a graph structure. Another important observation is that even though our proposal is a post-hoc step, it can be used in isolation, and on its own it achieves a precision of 0.54-0.62. Finally, we show that our method is able to detect the well-known historical shifts in 80% of cases.
Tasks
Published	2018-12-14
URL	http://arxiv.org/abs/1812.05936v1
PDF	http://arxiv.org/pdf/1812.05936v1.pdf
PWC	https://paperswithcode.com/paper/detecting-reliable-novel-word-senses-a
Repo
Framework
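Two of the three network features named in the abstract, edge density and average path length, are standard graph statistics. The sketch below computes them for a sense-cluster subgraph at two time points and takes the fractional change. The graph representation, ignoring unreachable pairs, the hypothetical example graphs, and the omission of structural similarity and of the SVM are all assumptions of this sketch.

```python
from collections import deque


def edge_density(nodes, edges):
    """Fraction of possible undirected edges that are present (edges assumed unique)."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))


def average_path_length(nodes, edges):
    """Mean shortest-path length over connected ordered pairs, via BFS from every node
    (unreachable pairs are ignored, an assumed convention)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = pairs = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def fractional_change(old, new):
    if old == 0:
        raise ValueError("fractional change is undefined when the old value is 0")
    return (new - old) / old


# Hypothetical cluster subgraph: a path at time t1 that becomes a complete graph at t2.
nodes = ["w1", "w2", "w3", "w4"]
t1 = [("w1", "w2"), ("w2", "w3"), ("w3", "w4")]
t2 = t1 + [("w1", "w3"), ("w1", "w4"), ("w2", "w4")]
print(fractional_change(edge_density(nodes, t1), edge_density(nodes, t2)))  # 1.0
```

A cluster that becomes denser and more tightly connected between the two time points is the kind of signal such features are meant to capture for a genuinely emerging sense.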

On Theory for BART


Title	On Theory for BART
Authors	Veronika Rockova, Enakshi Saha
Abstract	Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying the foundations for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks. We conclude with a result stating the optimal rate of posterior convergence for BART.
Tasks
Published	2018-10-01
URL	http://arxiv.org/abs/1810.00787v2
PDF	http://arxiv.org/pdf/1810.00787v2.pdf
PWC	https://paperswithcode.com/paper/on-theory-for-bart
Repo
Framework
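The heterogeneous Galton-Watson view can be seen directly in code: under the BART tree prior of Chipman et al. (2010), a node at depth d splits into two children with probability α(1 + d)^(-β), so the offspring law changes with depth, and the total progeny is the total number of nodes in the tree. The sketch uses the common defaults α = 0.95, β = 2 and compares the expected tree size with a simulation; it illustrates the original prior, not the modified prior proposed in the paper.

```python
import random


def split_prob(depth, alpha=0.95, beta=2.0):
    """Original BART prior: a node at this depth splits w.p. alpha * (1 + depth)^-beta."""
    return alpha * (1.0 + depth) ** (-beta)


def expected_total_nodes(alpha=0.95, beta=2.0, max_depth=50):
    """E[total progeny] = sum_d 2^d * prod_{j<d} split_prob(j), truncated at max_depth."""
    total, expected_at_depth = 0.0, 1.0
    for d in range(max_depth + 1):
        total += expected_at_depth
        expected_at_depth *= 2 * split_prob(d, alpha, beta)
    return total


def simulate_total_nodes(rng, alpha=0.95, beta=2.0):
    """Grow one tree from the prior; each split creates two children one level deeper."""
    total, pending = 0, [0]  # pending holds depths of nodes not yet decided
    while pending:
        depth = pending.pop()
        total += 1
        if rng.random() < split_prob(depth, alpha, beta):
            pending.extend([depth + 1, depth + 1])
    return total


rng = random.Random(0)
sims = [simulate_total_nodes(rng) for _ in range(20000)]
print(expected_total_nodes(), sum(sims) / len(sims))
```

The rapidly decaying split probability keeps trees small on average (around four nodes with these defaults); tail bounds on this total progeny are the kind of quantity the abstract says the analysis relies on.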

An Optimal Control Approach to Sequential Machine Teaching


Title	An Optimal Control Approach to Sequential Machine Teaching
Authors	Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu
Abstract	Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.
Tasks
Published	2018-10-15
URL	http://arxiv.org/abs/1810.06175v2
PDF	http://arxiv.org/pdf/1810.06175v2.pdf
PWC	https://paperswithcode.com/paper/an-optimal-control-approach-to-sequential
Repo
Framework

Event Nugget Detection with Forward-Backward Recurrent Neural Networks


Title	Event Nugget Detection with Forward-Backward Recurrent Neural Networks
Authors	Reza Ghaeini, Xiaoli Z. Fern, Liang Huang, Prasad Tadepalli
Abstract	Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem through automatic feature engineering. But such efforts, like traditional methods, have so far focused only on single-token event mentions, whereas in practice events can also be phrases. We instead use forward-backward recurrent neural networks (FBRNNs) to detect events that can be either words or phrases. To the best of our knowledge, this is one of the first efforts to handle multi-word events and also the first attempt to use RNNs for event detection. Experimental results demonstrate that FBRNN is competitive with the state-of-the-art methods on the ACE 2005 and the Rich ERE 2015 event detection tasks.
Tasks	Feature Engineering
Published	2018-02-15
URL	http://arxiv.org/abs/1802.05672v1
PDF	http://arxiv.org/pdf/1802.05672v1.pdf
PWC	https://paperswithcode.com/paper/event-nugget-detection-with-forward-backward
Repo
Framework
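One plausible reading of forward-backward span encoding, sketched here as an assumption rather than the published architecture: for each candidate span [i, j], a forward RNN reads the sentence from the start up to token j, a backward RNN reads from the end down to token i, and the two final states together represent the span. The toy scalar tanh RNN, its fixed weights, and the scalar "embeddings" are hypothetical stand-ins for trained components.

```python
import math


def rnn_states(inputs, w_in=0.8, w_rec=0.5):
    """Toy scalar tanh RNN with fixed (hypothetical) weights; hidden state after each input."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def span_features(token_values, max_len=3):
    """Encode every candidate span [i, j] (up to max_len tokens) as
    (forward state after reading tokens 0..j, backward state after reading tokens n-1..i)."""
    n = len(token_values)
    forward = rnn_states(token_values)
    backward = rnn_states(token_values[::-1])[::-1]  # backward[i] has read tokens n-1 down to i
    features = {}
    for i in range(n):
        for j in range(i, min(n, i + max_len)):
            features[(i, j)] = (forward[j], backward[i])
    return features


feats = span_features([0.1, -0.4, 0.7, 0.2, -0.3], max_len=2)
print(len(feats))  # 9 candidate spans: 5 single tokens + 4 two-token phrases
```

A classifier over these span representations (not shown) would then label each span as an event type or "no event"; enumerating spans rather than single tokens is what allows multi-word events to be detected.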