October 20, 2019

3481 words 17 mins read

Paper Group AWR 270

Distributed Distributional Deterministic Policy Gradients. Unbiased Learning to Rank with Unbiased Propensity Estimation. Neural Response Ranking for Social Conversation: A Data-Efficient Approach. Neural State Classification for Hybrid Systems. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks. Scalable Mutual Information Estimation using Dependence Graphs. PHOENICS: A universal deep Bayesian optimizer. Re-evaluating Evaluation. Unsupervised Metric Learning in Presence of Missing Data. Cross-modal Deep Variational Hand Pose Estimation. Neural networks for post-processing ensemble weather forecasts. URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection. An application of cascaded 3D fully convolutional networks for medical image segmentation. Compositional Language Understanding with Text-based Relational Reasoning. Asymptotics for Sketching in Least Squares Regression.

Distributed Distributional Deterministic Policy Gradients

Title Distributed Distributional Deterministic Policy Gradients
Authors Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap
Abstract This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of $N$-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state-of-the-art performance.
Tasks Continuous Control
Published 2018-04-23
URL http://arxiv.org/abs/1804.08617v1
PDF http://arxiv.org/pdf/1804.08617v1.pdf
PWC https://paperswithcode.com/paper/distributed-distributional-deterministic
Repo https://github.com/vgudapati/DRLND_Collaboration_Competetion
Framework pytorch
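
The distributional critic in D4PG is commonly parameterized as a categorical distribution over a fixed support, so the $N$-step Bellman target must be projected back onto that support. Below is a minimal NumPy sketch of that projection step under a categorical parameterization; the hyperparameters, and the assumption that `rewards` already holds accumulated n-step returns, are illustrative rather than the authors' exact implementation.

```python
import numpy as np

def project_categorical(rewards, dones, next_probs, v_min=-10.0, v_max=10.0,
                        n_atoms=51, gamma=0.99, n_step=5):
    """Project the n-step Bellman-updated value distribution back onto the
    fixed support {v_min, ..., v_max} of a categorical distributional critic.

    rewards:    (batch,) accumulated n-step returns
    dones:      (batch,) 1.0 if the episode terminated within the n steps
    next_probs: (batch, n_atoms) critic distribution at the n-th next state
    """
    support = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    # Shift and shrink the support by the n-step Bellman operator.
    tz = np.clip(rewards[:, None]
                 + (1.0 - dones[:, None]) * gamma ** n_step * support[None, :],
                 v_min, v_max)
    b = (tz - v_min) / delta_z                      # fractional atom index
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    projected = np.zeros_like(next_probs)
    rows = np.arange(len(rewards))[:, None]
    # Split each atom's probability mass between its two nearest neighbors.
    np.add.at(projected, (rows, lo), next_probs * (hi - b))
    np.add.at(projected, (rows, hi), next_probs * (b - lo))
    np.add.at(projected, (rows, lo), next_probs * (lo == hi))  # exact hits
    return projected
```

The projected distribution serves as the cross-entropy target for the critic; the distributed actors and prioritized replay sit around this core unchanged.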

Unbiased Learning to Rank with Unbiased Propensity Estimation

Title Unbiased Learning to Rank with Unbiased Propensity Estimation
Authors Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, W. Bruce Croft
Abstract Learning to rank with biased click data is a well-known challenge. A variety of methods have been explored to debias click data for learning to rank, such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting. Despite their differences, most existing studies separate the estimation of click bias (namely the \textit{propensity model}) from the learning of ranking algorithms. To estimate click propensities, they either conduct online result randomization, which can negatively affect the user experience, or offline parameter estimation, which has special requirements for click data and is optimized for objectives (e.g., click likelihood) that are not directly related to the ranking performance of the system. In this work, we address those problems by unifying the learning of propensity models and ranking models. We find that the problem of estimating a propensity model from click data is a dual problem of unbiased learning to rank. Based on this observation, we propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker and an \textit{unbiased propensity model}. DLA is an automatic unbiased learning-to-rank framework as it directly learns unbiased ranking models from biased click data without any preprocessing. It can adapt to changes in bias distributions and is applicable to online learning. Our empirical experiments with synthetic and real-world data show that models trained with DLA significantly outperformed unbiased learning-to-rank algorithms based on result randomization and models trained with relevance signals extracted by click models.
Tasks Learning-To-Rank
Published 2018-04-16
URL http://arxiv.org/abs/1804.05938v2
PDF http://arxiv.org/pdf/1804.05938v2.pdf
PWC https://paperswithcode.com/paper/180405938
Repo https://github.com/QingyaoAi/Dual-Learning-Algorithm-for-Unbiased-Learning-to-Rank
Framework tf
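
The duality in the abstract can be made concrete with a toy listwise update: clicks are inverse-propensity-weighted when training the ranker and inverse-relevance-weighted when training the propensity model. The softmax parameterization and update step below are a simplified sketch of that idea, not the paper's exact losses.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def ipw_softmax_grad(scores, weights):
    # Gradient of L = -sum_i weights[i] * log softmax(scores)[i]
    return weights.sum() * softmax(scores) - weights

def dual_learning_step(rank_scores, prop_scores, clicks, lr=0.1):
    """One joint update on a single displayed result list.

    rank_scores: (n,) relevance scores of the n displayed results
    prop_scores: (n,) examination scores of the n result positions
    clicks:      (n,) binary click indicators
    """
    prop = softmax(prop_scores)
    rel = softmax(rank_scores)
    w_rank = clicks / np.maximum(prop, 1e-6)  # inverse propensity weights
    w_prop = clicks / np.maximum(rel, 1e-6)   # the dual: inverse relevance
    rank_scores -= lr * ipw_softmax_grad(rank_scores, w_rank)
    prop_scores -= lr * ipw_softmax_grad(prop_scores, w_prop)
    return rank_scores, prop_scores
```

Each model debiases the other's training signal, which is why no result randomization or click-data preprocessing is needed.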

Neural Response Ranking for Social Conversation: A Data-Efficient Approach

Title Neural Response Ranking for Social Conversation: A Data-Efficient Approach
Authors Igor Shalyminov, Ondřej Dušek, Oliver Lemon
Abstract The overall objective of ‘social’ dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogue systems can be trained effectively from raw unannotated data. Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting ‘good’ system responses to user utterances, i.e. responses which are likely to lead to long and engaging conversations. We show that (1) our neural ranker consistently outperforms several strong baselines when trained to optimise for user ratings; (2) when trained on larger amounts of data and only using conversation length as the objective, the ranker performs better than the one trained using ratings – ultimately reaching a Precision@1 of 0.87. This advance will make data collection for social conversational agents simpler and less expensive in the future.
Tasks
Published 2018-11-02
URL http://arxiv.org/abs/1811.00967v1
PDF http://arxiv.org/pdf/1811.00967v1.pdf
PWC https://paperswithcode.com/paper/neural-response-ranking-for-social
Repo https://github.com/WattSocialBot/alana_learning_to_rank
Framework none
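
Because the better-performing ranker only needs conversation length as supervision, training data can be built from raw logs alone. The helper below sketches that preprocessing step; the dialogue format, the three-utterance context window, and the length threshold are assumptions for illustration, not the paper's exact pipeline.

```python
def make_ranking_examples(dialogues, length_threshold=20):
    """Build binary training pairs from raw, unannotated dialogues.

    dialogues: list of conversations, each a list of
               (user_utterance, system_response) turns.
    A response is labeled positive if its conversation went on to be long,
    a proxy for engagement in place of user ratings.
    """
    examples = []
    for turns in dialogues:
        label = 1 if len(turns) >= length_threshold else 0
        context = []
        for user, system in turns:
            context.append(user)
            # (recent context, candidate response, engagement label)
            examples.append((" ".join(context[-3:]), system, label))
            context.append(system)
    return examples
```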

Neural State Classification for Hybrid Systems

Title Neural State Classification for Hybrid Systems
Authors Dung Phan, Nicola Paoletti, Timothy Zhang, Radu Grosu, Scott A. Smolka, Scott D. Stoller
Abstract We introduce the State Classification Problem (SCP) for hybrid systems, and present Neural State Classification (NSC) as an efficient solution technique. SCP generalizes the model checking problem as it entails classifying each state $s$ of a hybrid automaton as either positive or negative, depending on whether or not $s$ satisfies a given time-bounded reachability specification. This is an interesting problem in its own right, which NSC solves using machine-learning techniques, Deep Neural Networks in particular. State classifiers produced by NSC tend to be very efficient (run in constant time and space), but may be subject to classification errors. To quantify and mitigate such errors, our approach comprises: i) techniques for certifying, with statistical guarantees, that an NSC classifier meets given accuracy levels; ii) tuning techniques, including a novel technique based on adversarial sampling, that can virtually eliminate false negatives (positive states classified as negative), thereby making the classifier more conservative. We have applied NSC to six nonlinear hybrid system benchmarks, achieving an accuracy of 99.25% to 99.98%, and a false-negative rate of 0.0033 to 0, which we further reduced to 0.0015 to 0 after tuning the classifier. We believe that this level of accuracy is acceptable in many practical applications, and that these results demonstrate the promise of the NSC approach.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.09901v1
PDF http://arxiv.org/pdf/1807.09901v1.pdf
PWC https://paperswithcode.com/paper/neural-state-classification-for-hybrid
Repo https://github.com/moduIo/Neural-State-Classification
Framework none
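
The "more conservative" classifier in (ii) can be approximated, in its simplest form, by lowering the decision threshold until the empirical false-negative rate on held-out samples meets a target. The sketch below shows only that threshold-tuning idea; the paper's adversarial-sampling technique is a separate, stronger mechanism.

```python
import numpy as np

def tune_threshold(probs, labels, target_fn_rate=0.0):
    """Pick the largest decision threshold whose empirical false-negative
    rate (positive states classified as negative) meets the target.

    probs:  predicted P(positive) per state; labels: 1 = truly positive.
    """
    n_pos = max(int((labels == 1).sum()), 1)
    for t in np.sort(np.unique(probs))[::-1]:   # scan high -> low
        preds = probs >= t
        fn_rate = ((labels == 1) & ~preds).sum() / n_pos
        if fn_rate <= target_fn_rate:
            return t
    return 0.0  # classify everything positive as a last resort
```

Lowering the threshold trades false positives (over-conservatism) for fewer false negatives, which is the safe direction for reachability checking.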

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

Title Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
Authors Jason Phang, Thibault Févry, Samuel R. Bowman
Abstract Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary training on BERT (Devlin et al., 2018), we attain a GLUE score of 81.8—the state of the art (as of 02/24/2019) and a 1.4 point improvement over BERT. We also observe reduced variance across random restarts in this setting. Our approach yields similar improvements when applied to ELMo (Peters et al., 2018a) and Radford et al. (2018)'s model. In addition, the benefits of supplementary training are particularly pronounced in data-constrained regimes, as we show in experiments with artificially limited training data.
Tasks Language Modelling, Natural Language Inference, Transfer Learning
Published 2018-11-02
URL http://arxiv.org/abs/1811.01088v2
PDF http://arxiv.org/pdf/1811.01088v2.pdf
PWC https://paperswithcode.com/paper/sentence-encoders-on-stilts-supplementary
Repo https://github.com/zphang/bert_on_stilts
Framework pytorch
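
The STILTs recipe itself is just two consecutive fine-tuning stages sharing one encoder. The toy sketch below uses a stand-in linear encoder and random tensors in place of BERT, NLI data, and a target task; only the control flow mirrors the paper.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stand-in for BERT

def finetune(encoder, n_classes, X, y, epochs=3, lr=1e-3):
    """Fine-tune the shared encoder with a fresh classification head."""
    model = nn.Sequential(encoder, nn.Linear(64, n_classes))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

# Stage 1: supplementary training on a data-rich intermediate task
# (e.g. 3-way natural language inference).
X_nli, y_nli = torch.randn(256, 128), torch.randint(0, 3, (256,))
finetune(encoder, 3, X_nli, y_nli)

# Stage 2: fine-tune the same (now intermediate-trained) encoder on the
# smaller target task with a new head.
X_tgt, y_tgt = torch.randn(64, 128), torch.randint(0, 2, (64,))
target_model = finetune(encoder, 2, X_tgt, y_tgt)
```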

Scalable Mutual Information Estimation using Dependence Graphs

Title Scalable Mutual Information Estimation using Dependence Graphs
Authors Morteza Noshad, Yu Zeng, Alfred O. Hero III
Abstract Mutual Information (MI) is a widely used measure of dependency between two random variables in information theory, statistics, and machine learning. Recently several MI estimators have been proposed that can achieve parametric MSE convergence rate. However, most of the previously proposed estimators have high computational complexity of at least $O(N^2)$. We propose a unified method for empirical non-parametric estimation of general MI functions between random vectors in $\mathbb{R}^d$ based on $N$ i.i.d. samples. The reduced-complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity $O(N)$, and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge, EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity. We illustrate the utility of EDGE for the analysis of the information plane (IP) in deep learning. Using EDGE, we shed light on the controversy over whether the compression property of the information bottleneck (IB) in fact holds for ReLU and other rectification functions in deep neural networks (DNNs).
Tasks
Published 2018-01-27
URL http://arxiv.org/abs/1801.09125v2
PDF http://arxiv.org/pdf/1801.09125v2.pdf
PWC https://paperswithcode.com/paper/scalable-mutual-information-estimation-using
Repo https://github.com/mrtnoshad/EDGE
Framework none
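
The linear-time character of hashing-based MI estimation is easy to see in a simplified form: bucket each variable once, then form a plug-in estimate from joint versus marginal bucket frequencies. The sketch below is an $O(N)$ illustration of that structure only; it replaces EDGE's randomized LSH with fixed axis-aligned bins and omits the ensemble bias correction that yields parametric rates.

```python
import numpy as np
from collections import Counter

def hashed_mi_sketch(X, Y, n_bins=8):
    """Plug-in MI estimate from bucketed samples, linear in sample size.

    X, Y: (N, d_x) and (N, d_y) arrays of paired samples.
    """
    def bucket(Z):
        lo, hi = Z.min(axis=0), Z.max(axis=0)
        idx = np.floor((Z - lo) / (hi - lo + 1e-12) * n_bins)
        return [tuple(r) for r in idx.clip(0, n_bins - 1).astype(int)]

    bx, by = bucket(X), bucket(Y)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # I(X;Y) ~ sum over occupied joint buckets of p log(p / (p_x * p_y))
    return sum(c / n * np.log((c / n) / (px[i] / n * py[j] / n))
               for (i, j), c in joint.items())
```

One pass over the data builds all three counters, which is the source of the $O(N)$ complexity that the full EDGE estimator also enjoys.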

PHOENICS: A universal deep Bayesian optimizer

Title PHOENICS: A universal deep Bayesian optimizer
Authors Florian Häse, Loïc M. Roch, Christoph Kreisbeck, Alán Aspuru-Guzik
Abstract In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce, both qualitatively and quantitatively, the target dynamic behavior of this nonlinear reaction system. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box objective functions.
Tasks Density Estimation, Gaussian Processes
Published 2018-01-04
URL http://arxiv.org/abs/1801.01469v1
PDF http://arxiv.org/pdf/1801.01469v1.pdf
PWC https://paperswithcode.com/paper/phoenics-a-universal-deep-bayesian-optimizer
Repo https://github.com/aspuru-guzik-group/phoenics
Framework tf
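
The acquisition function can be pictured as a kernel-density-weighted average of past objective values, shrunk toward a prior by a sampling parameter that trades exploitation against exploration. The sketch below is a loose simplification under that reading, with an isotropic Gaussian kernel and illustrative names; it is not the paper's exact Bayesian kernel density formulation.

```python
import numpy as np

def kde_acquisition(x, X_obs, y_obs, bandwidth=0.1, lam=1.0):
    """Density-weighted acquisition (to be minimized) at candidate x.

    x: (d,) candidate point; X_obs: (k, d) evaluated points; y_obs: (k,)
    observed objective values. Small lam trusts densities near past
    observations (exploitation); large lam pulls the surface toward the
    prior mean everywhere (exploration).
    """
    k = np.exp(-0.5 * np.sum((X_obs - x) ** 2, axis=1) / bandwidth ** 2)
    prior = y_obs.mean()
    return (k @ y_obs + lam * prior) / (k.sum() + lam)
```

Running parallel workers with different `lam` values gives the intuitive explorative/exploitative parallel search strategy the abstract mentions.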

Re-evaluating Evaluation

Title Re-evaluating Evaluation
Authors David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel
Abstract Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation – since there is no harm (computational cost aside) from including all available tasks and agents.
Tasks
Published 2018-06-07
URL http://arxiv.org/abs/1806.02643v2
PDF http://arxiv.org/pdf/1806.02643v2.pdf
PWC https://paperswithcode.com/paper/re-evaluating-evaluation
Repo https://github.com/PhilipFelizarta/Maxent-Nash-Implementation
Framework none
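
For the agent-vs-agent case, Nash averaging evaluates every agent against a maximum-entropy Nash equilibrium of the zero-sum meta-game defined by an antisymmetric payoff matrix. The sketch below approximates a Nash mixture by fictitious play rather than computing the exact maxent Nash, so it illustrates only the adaptive-weighting behavior.

```python
import numpy as np

def nash_average(A, iters=20000):
    """Approximate Nash averaging on an antisymmetric payoff matrix A,
    where A[i, j] > 0 means agent i beats agent j on average.

    Returns (ratings, mixture): each agent's expected payoff against the
    approximate Nash mixture, and the mixture itself.
    """
    counts = np.ones(A.shape[0])
    for _ in range(iters):
        p = counts / counts.sum()
        counts[np.argmax(A @ p)] += 1   # best response to average play
    p = counts / counts.sum()
    return A @ p, p
```

Duplicating a weak agent leaves the ratings essentially unchanged, since the mixture spreads its mass over the redundant copies; that invariance to redundancy is the property the paper argues makes inclusive evaluation safe.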

Unsupervised Metric Learning in Presence of Missing Data

Title Unsupervised Metric Learning in Presence of Missing Data
Authors Anna C. Gilbert, Rishi Sonthalia
Abstract For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high-dimensional space and, because of this high-dimensional structure, most algorithms are inefficient. The typical solution is to reduce the dimension of the input data using standard dimension-reduction algorithms such as ISOMAP, LAPLACIAN EIGENMAPS, or LLE. This approach, however, does not always work in practice as these algorithms require somewhat ideal data. Unfortunately, most data sets either have missing entries or unacceptably noisy values. That is, real data are far from ideal and we cannot use these algorithms directly. In this paper, we focus on the case of missing data. Some techniques, such as matrix completion, can be used to fill in missing data, but these methods do not capture the non-linear structure of the manifold. Here, we present a new algorithm, MR-MISSING, that extends these previous algorithms and can be used to compute low-dimensional representations of data sets with missing entries. We demonstrate the effectiveness of our algorithm by running three different experiments. We visually verify the effectiveness of our algorithm on synthetic manifolds, we numerically compare our projections against those computed by first filling in data using nlPCA and mDRUR on the MNIST data set, and we show that we can do classification on MNIST with missing data. We also provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.
Tasks Dimensionality Reduction, Matrix Completion, Metric Learning
Published 2018-07-19
URL http://arxiv.org/abs/1807.07610v3
PDF http://arxiv.org/pdf/1807.07610v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-metric-learning-in-presence-of
Repo https://github.com/UnofficialJuliaMirror/MRMissing.jl-d89fa356-2490-5e87-a4cc-d004309f6659
Framework none
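
One way to see both why missing entries break manifold methods and what a repair can look like: rescale distances computed on commonly observed coordinates, then recover long-range distances through neighbors, Isomap-style. The $O(n^3)$ sketch below illustrates that idea only; it is not the MR-MISSING algorithm, whose construction and guarantees differ.

```python
import numpy as np

def masked_geodesics(X, mask, k=5):
    """Approximate manifold distances for data with missing entries.

    X:    (n, d) data, with arbitrary values where mask == 0
    mask: (n, d) binary observation indicator
    Pairwise distances use only coordinates both points observe (rescaled
    for the missing ones); geodesics come from k-NN shortest paths.
    """
    n, d = X.shape
    D = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            common = (mask[i] * mask[j]).astype(bool)
            if common.any():
                diff = X[i, common] - X[j, common]
                D[i, j] = np.sqrt(diff @ diff * d / common.sum())
    G = np.full_like(D, np.inf)
    for i in range(n):                       # keep k nearest neighbors
        nn = np.argsort(D[i])[:k + 1]
        G[i, nn] = D[i, nn]
    G = np.minimum(G, G.T)
    for m in range(n):                       # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G
```

Feeding such a distance matrix to classical MDS gives a low-dimensional embedding despite the missing entries.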

Cross-modal Deep Variational Hand Pose Estimation

Title Cross-modal Deep Variational Hand Pose Estimation
Authors Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges
Abstract The human hand moves in complex and high-dimensional ways, making estimation of 3D hand pose configurations from images alone a challenging task. In this work we propose a method to learn a statistical hand model represented by a cross-modal trained latent space via a generative deep neural network. We derive an objective function from the variational lower bound of the VAE framework and jointly optimize the resulting cross-modal KL-divergence and the posterior reconstruction objective, naturally admitting a training regime that leads to a coherent latent space across multiple modalities such as RGB images, 2D keypoint detections or 3D hand configurations. Additionally, it grants a straightforward way of using semi-supervision. This latent space can be directly used to estimate 3D hand poses from RGB images, outperforming the state of the art in different settings. Furthermore, we show that our proposed method can be used without changes on depth images and performs comparably to specialized methods. Finally, the model is fully generative and can synthesize consistent pairs of hand configurations across modalities. We evaluate our method on both RGB and depth datasets and analyze the latent space qualitatively.
Tasks Hand Pose Estimation, Pose Estimation
Published 2018-03-30
URL http://arxiv.org/abs/1803.11404v1
PDF http://arxiv.org/pdf/1803.11404v1.pdf
PWC https://paperswithcode.com/paper/cross-modal-deep-variational-hand-pose
Repo https://github.com/spurra/vae-hands-3d
Framework tf
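
The core training step pairs an encoder on one modality with a decoder on another and optimizes reconstruction plus a KL term to the prior. The toy sketch below wires that up for image features to 3D joints with stand-in linear networks; the dimensions, KL weight, and single modality pairing are illustrative simplifications of the paper's cross-modal objective.

```python
import torch
import torch.nn as nn

enc = nn.Linear(512, 2 * 32)   # image features -> (mu, logvar) of latent
dec = nn.Linear(32, 21 * 3)    # latent -> 21 3D joint coordinates

def elbo_step(img_feat, joints_3d, kl_weight=1e-3):
    """Encode one modality, decode another, score the cross-modal ELBO."""
    mu, logvar = enc(img_feat).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
    recon = dec(z).view(-1, 21, 3)
    rec_loss = ((recon - joints_3d) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
    return rec_loss + kl_weight * kl

# Toy batch: 8 image-feature vectors paired with 8 hand skeletons.
loss = elbo_step(torch.randn(8, 512), torch.randn(8, 21, 3))
loss.backward()
```

In the full model, encoders/decoders exist per modality and their posteriors are aligned with cross-modal KL terms, which is what makes the shared latent space coherent.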

Neural networks for post-processing ensemble weather forecasts

Title Neural networks for post-processing ensemble weather forecasts
Authors Stephan Rasp, Sebastian Lerch
Abstract Ensemble weather predictions require statistical post-processing of systematic errors to obtain reliable and accurate probabilistic forecasts. Traditionally, this is accomplished with distributional regression models in which the parameters of a predictive distribution are estimated from a training period. We propose a flexible alternative based on neural networks that can incorporate nonlinear relationships between arbitrary predictor variables and forecast distribution parameters that are automatically learned in a data-driven way rather than requiring pre-specified link functions. In a case study of 2-meter temperature forecasts at surface stations in Germany, the neural network approach significantly outperforms benchmark post-processing methods while being computationally more affordable. Key components of this improvement are the use of auxiliary predictor variables and station-specific information with the help of embeddings. Furthermore, the trained neural network can be used to gain insight into the importance of meteorological variables, thereby challenging the notion of neural networks as uninterpretable black boxes. Our approach can easily be extended to other statistical post-processing and forecasting problems. We anticipate that recent advances in deep learning combined with the ever-increasing amounts of model and observation data will transform the post-processing of numerical weather forecasts in the coming decade.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09091v1
PDF http://arxiv.org/pdf/1805.09091v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-for-post-processing-ensemble
Repo https://github.com/slerch/ppnn
Framework none
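
Concretely, such a network maps ensemble-derived predictors to the parameters of a Gaussian predictive distribution and is trained on the closed-form CRPS of that Gaussian. The sketch below shows the loss and a toy network; the ten input features and the architecture are placeholders for ensemble mean, spread, auxiliary variables, and station embeddings.

```python
import math
import torch
import torch.nn as nn

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at obs y."""
    z = (y - mu) / sigma
    pdf = torch.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + torch.erf(z / math.sqrt(2)))
    return (sigma * (z * (2 * cdf - 1) + 2 * pdf
                     - 1 / math.sqrt(math.pi))).mean()

# Toy network mapping predictors to (mu, log sigma) of the forecast.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(64, 10), torch.randn(64)
mu, log_sigma = net(x).unbind(-1)
loss = crps_gaussian(mu, log_sigma.exp(), y)   # minimize CRPS directly
loss.backward()
```

Predicting `log sigma` keeps the spread positive, and minimizing CRPS rather than likelihood matches how probabilistic forecasts are verified.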

URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection

Title URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection
Authors Hung Le, Quang Pham, Doyen Sahoo, Steven C. H. Hoi
Abstract Malicious URLs host unsolicited content and are used to perpetrate cybercrimes. It is imperative to detect them in a timely manner. Traditionally, this is done through the usage of blacklists, which cannot be exhaustive, and cannot detect newly generated malicious URLs. To address this, recent years have witnessed several efforts to perform Malicious URL Detection using Machine Learning. The most popular and scalable approaches use lexical properties of the URL string by extracting bag-of-words-like features, followed by applying machine learning models such as SVMs. There are also other features designed by experts to improve the prediction performance of the model. These approaches suffer from several limitations: (i) Inability to effectively capture semantic meaning and sequential patterns in URL strings; (ii) Requiring substantial manual feature engineering; and (iii) Inability to handle unseen features and generalize to test data. To address these challenges, we propose URLNet, an end-to-end deep learning framework to learn a nonlinear URL embedding for Malicious URL Detection directly from the URL. Specifically, we apply Convolutional Neural Networks to both characters and words of the URL string to learn the URL embedding in a jointly optimized framework. This approach allows the model to capture several types of semantic information, which was not possible by the existing models. We also propose advanced word-embeddings to solve the problem of too many rare words observed in this task. We conduct extensive experiments on a large-scale dataset and show a significant performance gain over existing methods. We also conduct ablation studies to evaluate the performance of various components of URLNet.
Tasks Feature Engineering, Word Embeddings
Published 2018-02-09
URL http://arxiv.org/abs/1802.03162v2
PDF http://arxiv.org/pdf/1802.03162v2.pdf
PWC https://paperswithcode.com/paper/urlnet-learning-a-url-representation-with
Repo https://github.com/Antimalweb/URLNet
Framework tf
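
The character-level branch is a standard embed-convolve-pool text classifier over URL characters. Below is a minimal sketch of that branch; URLNet additionally runs a parallel word-level branch with its advanced word embeddings (omitted here), and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class CharURLNet(nn.Module):
    """Character-level CNN over a URL, in the spirit of URLNet's char branch."""

    def __init__(self, vocab=128, emb=32, n_filters=64, widths=(3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), 1)

    def forward(self, char_ids):                  # (batch, url_len)
        x = self.embed(char_ids).transpose(1, 2)  # (batch, emb, url_len)
        # Convolve with several kernel widths, max-pool each over time.
        pooled = [c(x).relu().max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))  # malicious-URL logit

url_ids = torch.randint(0, 128, (4, 200))  # 4 URLs padded to 200 chars
logits = CharURLNet()(url_ids)
```

Because the inputs are raw characters, no manual feature engineering is needed and previously unseen tokens still map to known character sequences.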

An application of cascaded 3D fully convolutional networks for medical image segmentation

Title An application of cascaded 3D fully convolutional networks for medical image segmentation
Authors Holger R. Roth, Hirohisa Oda, Xiangrong Zhou, Natsuki Shimizu, Ying Yang, Yuichiro Hayashi, Masahiro Oda, Michitaka Fujiwara, Kazunari Misawa, Kensaku Mori
Abstract Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: https://github.com/holgerroth/3Dunet_abdomen_cascade.
Tasks 3D Medical Imaging Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2018-03-14
URL http://arxiv.org/abs/1803.05431v2
PDF http://arxiv.org/pdf/1803.05431v2.pdf
PWC https://paperswithcode.com/paper/an-application-of-cascaded-3d-fully
Repo https://github.com/holgerroth/3Dunet_abdomen_cascade
Framework none
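
The two-stage pipeline reduces to: segment coarsely, crop a padded bounding box around the candidate region, and segment only that crop with the second network. A compact sketch, with `coarse_fcn` and `fine_fcn` as stand-ins for the trained 3D FCNs returning per-voxel foreground probabilities:

```python
import numpy as np

def cascade_segment(volume, coarse_fcn, fine_fcn, margin=8):
    """Coarse-to-fine 3D segmentation over a single organ/vessel class."""
    coarse = coarse_fcn(volume) > 0.5
    if not coarse.any():
        return np.zeros(volume.shape, dtype=bool)
    idx = np.argwhere(coarse)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, volume.shape)
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine = np.zeros(volume.shape, dtype=bool)
    # The second FCN only has to classify ~10% of the voxels.
    fine[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine_fcn(crop) > 0.5
    return fine

# Hypothetical usage with threshold functions standing in for networks:
vol = np.random.rand(64, 64, 64)
mask = cascade_segment(vol, lambda v: (v > 0.9).astype(float),
                       lambda v: (v > 0.8).astype(float))
```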

Compositional Language Understanding with Text-based Relational Reasoning

Title Compositional Language Understanding with Text-based Relational Reasoning
Authors Koustuv Sinha, Shagun Sodhani, William L. Hamilton, Joelle Pineau
Abstract Neural networks for natural language reasoning have largely focused on extractive, fact-based question-answering (QA) and common-sense inference. However, it is also crucial to understand the extent to which neural networks can perform relational reasoning and combinatorial generalization from natural language—abilities that are often obscured by annotation artifacts and the dominance of language modeling in standard QA benchmarks. In this work, we present a novel benchmark dataset for language understanding that isolates performance on relational reasoning. We also present a neural message-passing baseline and show that this model, which incorporates a relational inductive bias, is superior at combinatorial generalization compared to a traditional recurrent neural network approach.
Tasks Common Sense Reasoning, Language Modelling, Question Answering, Relational Reasoning
Published 2018-11-07
URL http://arxiv.org/abs/1811.02959v2
PDF http://arxiv.org/pdf/1811.02959v2.pdf
PWC https://paperswithcode.com/paper/compositional-language-understanding-with
Repo https://github.com/koustuvsinha/clutrr
Framework none
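
The relational inductive bias comes from updating entity states with relation-typed messages from their graph neighbors. The snippet below shows one such round with toy sizes; the paper's model details (readout, number of rounds, how the graph is built from text) are abstracted away.

```python
import torch
import torch.nn as nn

n_nodes, dim, n_edge_types = 5, 16, 4
node_h = torch.randn(n_nodes, dim)              # entity states
edges = [(0, 1, 2), (1, 2, 0), (3, 4, 1)]       # (src, dst, relation type)

msg_fn = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_edge_types))
update = nn.GRUCell(dim, dim)

agg = torch.zeros_like(node_h)
for src, dst, rel in edges:
    agg[dst] += msg_fn[rel](node_h[src])        # relation-specific message
node_h = update(agg, node_h)                    # gated node update
```

Stacking a few such rounds lets information propagate along multi-hop relation chains, which is exactly the combinatorial generalization the benchmark probes; a plain RNN over the text has no such structure to exploit.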

Asymptotics for Sketching in Least Squares Regression

Title Asymptotics for Sketching in Least Squares Regression
Authors Edgar Dobriban, Sifan Liu
Abstract We consider a least squares regression problem where the data have been generated from a linear model, and we wish to learn the unknown regression parameters. We consider “sketch-and-solve” methods that first randomly project the data and then perform regression on the projected problem. Previous works have analyzed the statistical and computational performance of such methods. However, the existing analysis is not fine-grained enough to show the fundamental differences between various methods, such as the Subsampled Randomized Hadamard Transform (SRHT) and Gaussian projections. In this paper, we make progress on this problem, working in an asymptotic framework where the number of datapoints and the feature dimension grow to infinity. We find the limits of the accuracy loss (for estimation and test error) incurred by popular sketching methods. We show a separation between different methods, with SRHT better than Gaussian projections. Our theoretical results are verified on both real and synthetic data. The analysis of SRHT relies on novel methods from random matrix theory that may be of independent interest.
Tasks Dimensionality Reduction
Published 2018-10-14
URL https://arxiv.org/abs/1810.06089v2
PDF https://arxiv.org/pdf/1810.06089v2.pdf
PWC https://paperswithcode.com/paper/a-new-theory-for-sketching-in-linear
Repo https://github.com/liusf15/Sketching-lr
Framework none
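
Sketch-and-solve itself is a few lines: project X and y with a random m-by-n matrix, solve the small problem, and compare against the full solution. The demo below uses a Gaussian sketch; replacing S with a subsampled randomized Hadamard transform gives the SRHT variant that the paper's asymptotics show to be more accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 4096, 20, 400          # n datapoints, d features, sketch size m

X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + rng.standard_normal(n)

# Full least squares solution on all n rows.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Sketch-and-solve: project both X and y down to m rows, then solve the
# much smaller m x d problem.
S = rng.standard_normal((m, n)) / np.sqrt(m)
beta_sketch = np.linalg.lstsq(S @ X, S @ y, rcond=None)[0]

print("estimation error, full  :", np.linalg.norm(beta_full - beta))
print("estimation error, sketch:", np.linalg.norm(beta_sketch - beta))
```

The gap between the two printed errors is the "accuracy loss" whose asymptotic limits the paper characterizes for each sketching method.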