October 16, 2019

Paper Group ANR 1152

Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers


Title	Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers
Authors	Hiva Ghanbari, Katya Scheinberg
Abstract	The predictive quality of machine learning models is typically measured in terms of their (approximate) expected prediction error or the so-called Area Under the Curve (AUC) for a particular data distribution. However, when the models are constructed by the means of empirical risk minimization, surrogate functions such as the logistic loss are optimized instead. This is done because the empirical approximations of the expected error and AUC functions are nonconvex and nonsmooth, and more importantly have zero derivative almost everywhere. In this work, we show that in the case of linear predictors, and under the assumption that the data has normal distribution, the expected error and the expected AUC are not only smooth, but have closed form expressions, which depend on the first and second moments of the normal distribution. Hence, we derive derivatives of these two functions and use these derivatives in an optimization algorithm to directly optimize the expected error and the AUC. In the case of real data sets, the derivatives can be approximated using empirical moments. We show that even when data is not normally distributed, computed derivatives are sufficiently useful to render an efficient optimization method and high quality solutions. Thus, we propose a gradient-based optimization method for direct optimization of the prediction error and AUC. Moreover, the per-iteration complexity of the proposed algorithm has no dependence on the size of the data set, unlike those for optimizing logistic regression and all other well known empirical risk minimization problems.
Tasks
Published	2018-02-07
URL	http://arxiv.org/abs/1802.02535v1
PDF	http://arxiv.org/pdf/1802.02535v1.pdf
PWC	https://paperswithcode.com/paper/directly-and-efficiently-optimizing
Repo
Framework
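
The closed-form expressions mentioned in the abstract can be illustrated with a minimal sketch. It assumes each class is Gaussian, the two classes are drawn independently, and the classifier is sign(w^T x) with no intercept; the function names are hypothetical and this is not the authors' code. Under those assumptions w^T x is a one-dimensional normal variable, so both quantities reduce to a standard normal CDF evaluated from first and second moments.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quad_form(w, cov):
    """w^T cov w for a covariance matrix given as a list of rows."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


def expected_error(w, mu_pos, cov_pos, mu_neg, cov_neg, p_pos):
    """Misclassification probability of sign(w^T x) for Gaussian classes."""
    # A positive example is missed when w^T x <= 0.
    miss_pos = std_normal_cdf(-dot(w, mu_pos) / math.sqrt(quad_form(w, cov_pos)))
    # A negative example is missed when w^T x > 0.
    miss_neg = std_normal_cdf(dot(w, mu_neg) / math.sqrt(quad_form(w, cov_neg)))
    return p_pos * miss_pos + (1.0 - p_pos) * miss_neg


def expected_auc(w, mu_pos, cov_pos, mu_neg, cov_neg):
    """P(w^T x_pos > w^T x_neg) for independent Gaussian draws."""
    gap = dot(w, mu_pos) - dot(w, mu_neg)
    spread = math.sqrt(quad_form(w, cov_pos) + quad_form(w, cov_neg))
    return std_normal_cdf(gap / spread)
```

For two unit-covariance classes centred at +1 and -1 along w, the error is Φ(-1) (about 0.159). In practice the moments would be replaced by empirical means and covariances, which is why evaluating these functions does not require a pass over the data set.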

DeepMoTIon: Learning to Navigate Like Humans


Title	DeepMoTIon: Learning to Navigate Like Humans
Authors	Mahmoud Hamandi, Mike D’Arcy, Pooyan Fazli
Abstract	We present a novel human-aware navigation approach, where the robot learns to mimic humans to navigate safely in crowds. The presented model, referred to as DeepMoTIon, is trained with pedestrian surveillance data to predict human velocity in the environment. The robot processes LiDAR scans via the trained network to navigate to the target location. We conduct extensive experiments to assess the components of our network and prove their necessity to imitate humans. Our experiments show that DeepMoTIon outperforms all the benchmarks in terms of human imitation, achieving a 24% reduction in time series-based path deviation over the next best approach. In addition, while many other approaches often failed to reach the target, our method reached the target in 100% of the test cases while complying with social norms and ensuring human safety.
Tasks	Time Series
Published	2018-03-09
URL	https://arxiv.org/abs/1803.03719v3
PDF	https://arxiv.org/pdf/1803.03719v3.pdf
PWC	https://paperswithcode.com/paper/deepmotion-learning-to-navigate-like-humans
Repo
Framework

End-to-end named entity extraction from speech


Title	End-to-end named entity extraction from speech
Authors	Sahar Ghannay, Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin
Abstract	Named entity recognition (NER) is among the SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech has been performed through a pipeline process that first applies automatic speech recognition (ASR) to the audio and then applies NER to the ASR outputs. Such an approach has some disadvantages (error propagation, a metric for tuning ASR systems that is sub-optimal with regard to the final task, a reduced search space at the ASR output level…), and it is known that more integrated approaches outperform sequential ones when they can be applied. In this paper, we present a first study of an end-to-end approach that directly extracts named entities from speech through a single neural architecture. In this way, joint optimization is possible for both ASR and NER. Experiments are carried out on easily accessible French data, composed of data distributed in several evaluation campaigns. Experimental results show that this end-to-end approach provides better results (F-measure=0.69 on test data) than a classical pipeline approach to detect named entity categories (F-measure=0.65).
Tasks	Entity Extraction, Named Entity Recognition, Speech Recognition
Published	2018-05-30
URL	http://arxiv.org/abs/1805.12045v1
PDF	http://arxiv.org/pdf/1805.12045v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-named-entity-extraction-from
Repo
Framework

A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems


Title	A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems
Authors	Wafa Aissa, Laure Soulier, Ludovic Denoyer
Abstract	Search-oriented conversational systems rely on information needs expressed in natural language (NL). We focus here on the understanding of NL expressions for building keyword-based queries. We propose a reinforcement-learning-driven translation model framework able to 1) learn the translation from NL expressions to queries in a supervised way, and 2) overcome the lack of large-scale datasets by framing the translation model as a word selection approach and injecting relevance feedback in the learning process. Experiments are carried out on two TREC datasets and outline the effectiveness of our approach.
Tasks
Published	2018-08-29
URL	http://arxiv.org/abs/1809.01495v1
PDF	http://arxiv.org/pdf/1809.01495v1.pdf
PWC	https://paperswithcode.com/paper/a-reinforcement-learning-driven-translation
Repo
Framework

Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines


Title	Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines
Authors	Daniel J. Luckett, Eric B. Laber, Samer S. El-Kamary, Cheng Fan, Ravi Jhaveri, Charles M. Perou, Fatma M. Shebl, Michael R. Kosorok
Abstract	Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve, such as a weighted support vector machine (SVM), are desirable because they are robust to model misspecification. While weighted SVMs have great potential for estimating ROC curves, their theoretical properties were heretofore underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide the theoretical justification for the SVM ROC curve by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method and the superior sensitivity and specificity of the weighted SVM compared to commonly used methods in diagnostic medicine using simulation studies. We present two illustrative examples: diagnosis of hepatitis C and a predictive model for treatment response in breast cancer.
Tasks	Decision Making
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06711v1
PDF	http://arxiv.org/pdf/1807.06711v1.pdf
PWC	https://paperswithcode.com/paper/receiver-operating-characteristic-curves-and
Repo
Framework

Reinforcement Learning for Dynamic Bidding in Truckload Markets: an Application to Large-Scale Fleet Management with Advance Commitments


Title	Reinforcement Learning for Dynamic Bidding in Truckload Markets: an Application to Large-Scale Fleet Management with Advance Commitments
Authors	Yingfei Wang, Juliana Martins Do Nascimento, Warren Powell
Abstract	Truckload brokerages, a $100 billion/year industry in the U.S., play the critical role of matching shippers with carriers, often to move loads several days into the future. Brokerages not only have to find companies that will agree to move a load, they often also have to find a price that both the shipper and carrier will agree to. The price not only varies by shipper and carrier, but also by the traffic lanes and other variables such as commodity type. Brokerages have to learn about shipper and carrier response functions by offering a price and observing whether each accepts the quote. We propose a knowledge gradient policy with bootstrap aggregation for high-dimensional contextual settings to guide price experimentation by maximizing the value of information. The learning policy is tested using a carefully calibrated fleet simulator that includes a stochastic lookahead policy that simulates fleet movements, as well as the stochastic modeling of driver assignments and the carrier’s load commitment policies with advance booking.
Tasks
Published	2018-02-25
URL	https://arxiv.org/abs/1802.08976v2
PDF	https://arxiv.org/pdf/1802.08976v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-bidding-for-advance-commitments-in
Repo
Framework

Using Eigencentrality to Estimate Joint, Conditional and Marginal Probabilities from Mixed-Variable Data: Method and Applications


Title	Using Eigencentrality to Estimate Joint, Conditional and Marginal Probabilities from Mixed-Variable Data: Method and Applications
Authors	Andrew Skabar
Abstract	The ability to estimate joint, conditional and marginal probability distributions over some set of variables is of great utility for many common machine learning tasks. However, estimating these distributions can be challenging, particularly in the case of data containing a mix of discrete and continuous variables. This paper presents a non-parametric method for estimating these distributions directly from a dataset. The data are first represented as a graph consisting of object nodes and attribute value nodes. Depending on the distribution to be estimated, an appropriate eigenvector equation is then constructed. This equation is then solved to find the corresponding stationary distribution of the graph, from which the required distributions can then be estimated and sampled from. The paper demonstrates how the method can be applied to many common machine learning tasks including classification, regression, missing value imputation, outlier detection, random vector generation, and clustering.
Tasks	Imputation, Outlier Detection
Published	2018-09-19
URL	http://arxiv.org/abs/1809.07006v1
PDF	http://arxiv.org/pdf/1809.07006v1.pdf
PWC	https://paperswithcode.com/paper/using-eigencentrality-to-estimate-joint
Repo
Framework
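
The idea of representing records as a graph of object nodes and attribute-value nodes and then reading probabilities off a stationary distribution can be sketched as follows. This is a simplified illustration rather than the paper's eigenvector equations: it assumes an unweighted bipartite graph, uses a lazy random walk (stay in place with probability 0.5) so that power iteration converges, and all names and the toy records are hypothetical.

```python
def stationary_distribution(adjacency, iterations=500):
    """Power iteration for the lazy random walk on an undirected graph."""
    nodes = list(adjacency)
    prob = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Lazy step: half of the mass stays where it is.
        nxt = {n: 0.5 * prob[n] for n in nodes}
        for n in nodes:
            # The other half is split evenly among the neighbours.
            share = 0.5 * prob[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        prob = nxt
    return prob


# Toy data set with two categorical attributes (illustrative values only).
records = [
    {"colour": "red", "size": "big"},
    {"colour": "red", "size": "small"},
    {"colour": "blue", "size": "small"},
]

# Link every object node to the attribute-value nodes it exhibits.
adjacency = {}
for i, rec in enumerate(records):
    obj = ("object", i)
    for attr, value in rec.items():
        val = (attr, value)
        adjacency.setdefault(obj, set()).add(val)
        adjacency.setdefault(val, set()).add(obj)

pi = stationary_distribution(adjacency)

# Marginal over "colour": renormalise the stationary mass on its value nodes.
colour_mass = {node[1]: mass for node, mass in pi.items() if node[0] == "colour"}
total = sum(colour_mass.values())
marginal = {value: mass / total for value, mass in colour_mass.items()}
```

On this unweighted graph the stationary mass of a node is proportional to its degree, so the recovered marginal for `colour` matches the relative frequencies in the records (two thirds red, one third blue). Conditional estimates would need a differently constructed walk, which this sketch does not attempt.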

Composing Modeling and Inference Operations with Probabilistic Program Combinators


Title	Composing Modeling and Inference Operations with Probabilistic Program Combinators
Authors	Eli Sennesh, Adam Ścibior, Hao Wu, Jan-Willem van de Meent
Abstract	Probabilistic programs with dynamic computation graphs can define measures over sample spaces with unbounded dimensionality, which constitute programmatic analogues to Bayesian nonparametrics. Owing to the generality of this model class, inference relies on 'black-box' Monte Carlo methods that are often not able to take advantage of conditional independence and exchangeability, which have historically been the cornerstones of efficient inference. We here seek to develop a 'middle ground' between probabilistic models with fully dynamic and fully static computation graphs. To this end, we introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Combinators provide primitives for both model and inference composition. Model combinators take the form of classic functional programming constructs such as map and reduce. These constructs define a computation graph at a coarsened level of representation, in which nodes correspond to models, rather than individual variables. Inference combinators implement operations such as importance resampling and application of a transition kernel, which alter the evaluation strategy for a model whilst preserving proper weighting. Owing to this property, models defined using combinators can be trained using stochastic methods that optimize either variational or wake-sleep style objectives. As a validation of this principle, we use combinators to implement black box inference for hidden Markov models.
Tasks
Published	2018-11-14
URL	http://arxiv.org/abs/1811.05965v3
PDF	http://arxiv.org/pdf/1811.05965v3.pdf
PWC	https://paperswithcode.com/paper/composing-modeling-and-inference-operations
Repo
Framework
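
The notion of a combinator as a function that accepts a model and returns a transformed model can be shown in plain Python. The sketch below does not use the Probabilistic Torch API; it assumes a model is simply a callable `(rng, input) -> (sample, log_prob)`, and `map_model`, `reduce_model` and the toy models are hypothetical names chosen for illustration.

```python
import math
import random


def normal_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def noisy_observation(rng, latent):
    """Toy model: y ~ Normal(latent, 1); returns (y, log p(y | latent))."""
    y = rng.gauss(latent, 1.0)
    return y, normal_logpdf(y, latent, 1.0)


def random_walk_step(rng, state):
    """Toy transition: next ~ Normal(state, 0.5)."""
    nxt = rng.gauss(state, 0.5)
    return nxt, normal_logpdf(nxt, state, 0.5)


def map_model(model):
    """Combinator: lift a model on one input to a model on a list of inputs."""
    def mapped(rng, inputs):
        results = [model(rng, x) for x in inputs]
        return [v for v, _ in results], sum(lp for _, lp in results)
    return mapped


def reduce_model(step, length):
    """Combinator: chain a transition model `length` times from an initial state."""
    def chained(rng, init):
        state, total, path = init, 0.0, []
        for _ in range(length):
            state, lp = step(rng, state)
            path.append(state)
            total += lp
        return path, total
    return chained


# Composition is static: the combined models are built before any data is seen.
latent_path_model = reduce_model(random_walk_step, length=5)
emission_model = map_model(noisy_observation)

# Evaluation is dynamic: sampling happens only when the models are called.
rng = random.Random(0)
path, lp_path = latent_path_model(rng, 0.0)
observations, lp_obs = emission_model(rng, path)
```

Chaining a transition with `reduce_model` and emitting with `map_model` gives the generative structure of a hidden Markov model with continuous states. The inference combinators described in the abstract (resampling, transition kernels) are not covered here.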

GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization


Title	GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization
Authors	Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang, Yidong Chen
Abstract	Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene set based analyses improve biologists’ capability to discover the functional relevance of their experiment design. While gene sets are elucidated individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets, and to determine the biological relevance and analysis consistency of these combined gene sets by leveraging large genomic data sets. In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model with the incorporation of a priori defined gene sets that retain the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Trained with genomic data from TCGA and evaluated with their accompanying clinical parameters, we showed gene supersets’ ability to discriminate tumor subtypes and their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets. Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility on survival analysis and accurate prediction for cancer subtypes.
Tasks	Survival Analysis
Published	2018-05-21
URL	http://arxiv.org/abs/1805.07874v2
PDF	http://arxiv.org/pdf/1805.07874v2.pdf
PWC	https://paperswithcode.com/paper/gsae-an-autoencoder-with-embedded-gene-set
Repo
Framework

Beyond the Low-Degree Algorithm: Mixtures of Subcubes and Their Applications


Title	Beyond the Low-Degree Algorithm: Mixtures of Subcubes and Their Applications
Authors	Sitan Chen, Ankur Moitra
Abstract	We introduce the problem of learning mixtures of $k$ subcubes over $\{0,1\}^n$, which contains many classic learning theory problems as a special case (and is itself a special case of others). We give a surprising $n^{O(\log k)}$-time learning algorithm based on higher-order multilinear moments. It is not possible to learn the parameters because the same distribution can be represented by quite different models. Instead, we develop a framework for reasoning about how multilinear moments can pinpoint essential features of the mixture, like the number of components. We also give applications of our algorithm to learning decision trees with stochastic transitions (which also capture interesting scenarios where the transitions are deterministic but there are latent variables). Using our algorithm for learning mixtures of subcubes, we can approximate the Bayes optimal classifier within additive error $\epsilon$ on $k$-leaf decision trees with at most $s$ stochastic transitions on any root-to-leaf path in $n^{O(s + \log k)}\cdot\text{poly}(1/\epsilon)$ time. In this stochastic setting, the classic Occam algorithms for learning decision trees with zero stochastic transitions break down, while the low-degree algorithm of Linial et al. inherently has a quasipolynomial dependence on $1/\epsilon$. In contrast, as we will show, mixtures of $k$ subcubes are uniquely determined by their degree $2 \log k$ moments and hence provide a useful abstraction for simultaneously achieving the polynomial dependence on $1/\epsilon$ of the classic Occam algorithms for decision trees and the flexibility of the low-degree algorithm in being able to accommodate stochastic transitions. Using our multilinear moment techniques, we also give the first improved upper and lower bounds since the work of Feldman et al. for the related but harder problem of learning mixtures of binary product distributions.
Tasks
Published	2018-03-17
URL	http://arxiv.org/abs/1803.06521v2
PDF	http://arxiv.org/pdf/1803.06521v2.pdf
PWC	https://paperswithcode.com/paper/beyond-the-low-degree-algorithm-mixtures-of
Repo
Framework

Understanding Individual Neuron Importance Using Information Theory


Title	Understanding Individual Neuron Importance Using Information Theory
Authors	Rana Ali Amjad, Kairen Liu, Bernhard C. Geiger
Abstract	In this work, we investigate the use of three information-theoretic quantities – entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler divergence – to understand and study the behavior of already trained fully-connected feed-forward neural networks. We analyze the connection between these information-theoretic quantities and classification performance on the test set by cumulatively ablating neurons in networks trained on MNIST, FashionMNIST, and CIFAR-10. Our results parallel those recently published by Morcos et al., indicating that class selectivity is not a good indicator for classification performance. However, looking at individual layers separately, both mutual information and class selectivity are positively correlated with classification performance, at least for networks with ReLU activation functions. We provide explanations for this phenomenon and conclude that it is ill-advised to compare the proposed information-theoretic quantities across layers. Finally, we briefly discuss future prospects of employing information-theoretic quantities for different purposes, including neuron pruning and studying the effect that different regularizers and architectures have on the trained neural network. We also draw connections to the information bottleneck theory of neural networks.
Tasks
Published	2018-04-18
URL	https://arxiv.org/abs/1804.06679v3
PDF	https://arxiv.org/pdf/1804.06679v3.pdf
PWC	https://paperswithcode.com/paper/understanding-individual-neuron-importance
Repo
Framework
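
Two of the quantities named in the abstract, the entropy of a neuron's output and its mutual information with the class variable, can be estimated from counts once the output is quantized. The sketch below assumes the simplest possible quantization (active versus inactive, thresholded at zero) and uses made-up activations and labels; the paper's own quantization and its KL-based selectivity measure are not reproduced.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Illustrative ReLU outputs of two neurons over eight test inputs.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
selective_neuron = [0.0, 1.3, 0.0, 2.1, 0.0, 0.7, 0.0, 1.9]
unrelated_neuron = [0.0, 0.0, 0.4, 1.1, 0.0, 0.0, 2.0, 0.3]

# Quantize: 1 if the neuron fired, 0 otherwise.
selective_bits = [int(a > 0.0) for a in selective_neuron]
unrelated_bits = [int(a > 0.0) for a in unrelated_neuron]

mi_selective = mutual_information(selective_bits, labels)
mi_unrelated = mutual_information(unrelated_bits, labels)
```

Both toy neurons have one bit of entropy, but only the first carries information about the class: its mutual information is one bit, while the second neuron's is zero. Ranking neurons by such scores within a layer and ablating them cumulatively is the kind of experiment the abstract describes.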

Ensemble of Convolutional Neural Networks for Dermoscopic Images Classification


Title	Ensemble of Convolutional Neural Networks for Dermoscopic Images Classification
Authors	Tomáš Majtner, Buda Bajić, Sule Yildirim, Jon Yngve Hardeberg, Joakim Lindblad, Nataša Sladoje
Abstract	In this report, we present our automated prediction system for disease classification within dermoscopic images. The proposed solution is based on deep learning, where we employed a transfer learning strategy on VGG16 and GoogLeNet architectures. The key feature of our solution is preprocessing based primarily on image augmentation and colour normalization. The solution was evaluated on Task 3: Lesion Diagnosis of the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection.
Tasks	Image Augmentation, Transfer Learning
Published	2018-08-15
URL	http://arxiv.org/abs/1808.05071v1
PDF	http://arxiv.org/pdf/1808.05071v1.pdf
PWC	https://paperswithcode.com/paper/ensemble-of-convolutional-neural-networks-for-1
Repo
Framework

Unraveling Go gaming nature by Ising Hamiltonian and common fate graphs: tactics and statistics


Title	Unraveling Go gaming nature by Ising Hamiltonian and common fate graphs: tactics and statistics
Authors	Didier Barradas-Bautista, Matías Alvarado
Abstract	Go is a struggle between two adversaries, playing simple black and white stones, who aim to control the most board territory. The rules are simple, but fighting in Go is highly intricate. Stone placement and interaction on the board appear random, like interaction phenomena among basic elements in physics, thermodynamics, chemistry, biology, or social issues. We model the dynamics of the Go game with an Ising model energy function, whose interaction coefficients reflect the application of rules and tactics to build long-term strategies. At any step of the game, the energy function of the model assesses the control and strength of a player over the board. A close fit between the model's predictions and actual game scores is obtained. The AlphaGo computer is the current top Go player, but its behavior does not wholly reveal the nature of Go. The Ising function allows the stochastic evolution of Go gaming patterns to be modeled precisely, and so advances the understanding of Go's own dynamics beyond the player's abilities. The analysis of the frequency and combination of tactics shows the formation of patterns in the groups of stones during a game, depending on the turn of each player and on whether human or computer adversaries are confronted.
Tasks
Published	2018-03-15
URL	http://arxiv.org/abs/1803.05983v1
PDF	http://arxiv.org/pdf/1803.05983v1.pdf
PWC	https://paperswithcode.com/paper/unraveling-go-gaming-nature-by-ising
Repo
Framework
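
As a rough illustration of scoring a board position with an Ising-style energy, the sketch below uses the textbook nearest-neighbour Ising Hamiltonian, H = -J Σ s_i s_j - h Σ s_i, on a small grid. It assumes black stones are +1, white stones are -1 and empty points are 0, with a single coupling constant; the paper's interaction coefficients and its common fate graph representation are not modeled, and `ising_energy` is a hypothetical name.

```python
def ising_energy(board, coupling=1.0, field=0.0):
    """Textbook nearest-neighbour Ising energy of a grid of spins in {-1, 0, +1}."""
    rows, cols = len(board), len(board[0])
    interaction = 0.0
    for r in range(rows):
        for c in range(cols):
            # Count each neighbouring pair once: look only down and right.
            if r + 1 < rows:
                interaction += board[r][c] * board[r + 1][c]
            if c + 1 < cols:
                interaction += board[r][c] * board[r][c + 1]
    magnetisation = sum(sum(row) for row in board)
    return -coupling * interaction - field * magnetisation


# 3x3 toy position: +1 black, -1 white, 0 empty.
board = [
    [1, 1, 0],
    [1, -1, 0],
    [0, -1, -1],
]

energy = ising_energy(board)
```

Adjacent stones of the same colour lower the energy and adjacent stones of opposite colours raise it, so connected groups are favoured. Empty points contribute nothing in this simplified form.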

Bootstrapping Conversational Agents With Weak Supervision


Title	Bootstrapping Conversational Agents With Weak Supervision
Authors	Neil Mallinar, Abhishek Shah, Rajendra Ugrani, Ayush Gupta, Manikandan Gurusankar, Tin Kam Ho, Q. Vera Liao, Yunfeng Zhang, Rachel K. E. Bellamy, Robert Yates, Chris Desmarais, Blake McGregor
Abstract	Many conversational agents in the market today follow a standard bot development framework which requires training intent classifiers to recognize user input. The need to create a proper set of training examples is often the bottleneck in the development process. On many occasions agent developers have access to historical chat logs that can provide a good quantity as well as coverage of training examples. However, the cost of labeling them with tens to hundreds of intents often prohibits taking full advantage of these chat logs. In this paper, we present a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision. The framework reduces hours to days of labeling effort down to minutes of work by using a search engine to find examples, then relies on a data programming approach to automatically expand the labels. We report on a user study that shows positive user feedback for this new approach to build conversational agents, and demonstrates the effectiveness of using data programming for auto-labeling. While the system is developed for training conversational agents, the framework has broader application in significantly reducing labeling effort for training text classifiers.
Tasks
Published	2018-12-14
URL	http://arxiv.org/abs/1812.06176v1
PDF	http://arxiv.org/pdf/1812.06176v1.pdf
PWC	https://paperswithcode.com/paper/bootstrapping-conversational-agents-with-weak
Repo
Framework

Learning to Run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning


Title	Learning to Run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning
Authors	Łukasz Kidziński, Sharada P. Mohanty, Carmichael Ong, Jennifer L. Hicks, Sean F. Carroll, Sergey Levine, Marcel Salathé, Scott L. Delp
Abstract	Synthesizing physiologically-accurate human movement in a variety of conditions can help practitioners plan surgeries, design experiments, or prototype assistive devices in simulated environments, reducing time and costs and improving treatment outcomes. Because of the large and complex solution spaces of biomechanical models, current methods are constrained to specific movements and models, requiring careful design of a controller and hindering many possible applications. We sought to discover if modern optimization methods efficiently explore these complex spaces. To do this, we posed the problem as a competition in which participants were tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible, without using any experimental data. They were provided with a human musculoskeletal model and a physics-based simulation environment. In this paper, we discuss the design of the competition, technical difficulties, results, and analysis of the top controllers. The challenge proved that deep reinforcement learning techniques, despite their high computational cost, can be successfully employed as an optimization method for synthesizing physiologically feasible motion in high-dimensional biomechanical systems.
Tasks
Published	2018-03-31
URL	http://arxiv.org/abs/1804.00198v1
PDF	http://arxiv.org/pdf/1804.00198v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-run-challenge-synthesizing
Repo
Framework