May 5, 2019


Paper Group ANR 525

The Mythos of Model Interpretability. Understanding Probabilistic Sparse Gaussian Process Approximations. Structured Stochastic Linear Bandits. Social-sparsity brain decoders: faster spatial sparsity. Distributed Hessian-Free Optimization for Deep Neural Network. Reward Augmented Maximum Likelihood for Neural Structured Prediction. Finding the diff …

The Mythos of Model Interpretability

Title The Mythos of Model Interpretability
Authors Zachary C. Lipton
Abstract Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.
Tasks
Published 2016-06-10
URL http://arxiv.org/abs/1606.03490v3
PDF http://arxiv.org/pdf/1606.03490v3.pdf
PWC https://paperswithcode.com/paper/the-mythos-of-model-interpretability
Repo
Framework

Understanding Probabilistic Sparse Gaussian Process Approximations

Title Understanding Probabilistic Sparse Gaussian Process Approximations
Authors Matthias Bauer, Mark van der Wilk, Carl Edward Rasmussen
Abstract Good sparse approximations are essential for practical inference in Gaussian Processes as the computational cost of exact methods is prohibitive for large datasets. The Fully Independent Training Conditional (FITC) and the Variational Free Energy (VFE) approximations are two recent popular methods. Despite superficial similarities, these approximations have surprisingly different theoretical properties and behave differently in practice. We thoroughly investigate the two methods for regression both analytically and through illustrative examples, and draw conclusions to guide practical application.
Tasks Gaussian Processes
Published 2016-06-15
URL http://arxiv.org/abs/1606.04820v2
PDF http://arxiv.org/pdf/1606.04820v2.pdf
PWC https://paperswithcode.com/paper/understanding-probabilistic-sparse-gaussian
Repo
Framework
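Both FITC and VFE rest on a low-rank approximation of the kernel matrix built from a small set of inducing points. As a rough, illustrative sketch (not the authors' code), the underlying Nyström-style approximation $K \approx K_{nm} K_{mm}^{-1} K_{mn}$ can be written as:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom_approx(X, Z, lengthscale=1.0, jitter=1e-8):
    """Low-rank approximation K ~ Knm Kmm^{-1} Kmn using inducing inputs Z."""
    Knm = rbf_kernel(X, Z, lengthscale)
    Kmm = rbf_kernel(Z, Z, lengthscale) + jitter * np.eye(len(Z))
    return Knm @ np.linalg.solve(Kmm, Knm.T)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
K = rbf_kernel(X, X)
# With all training inputs used as inducing points the approximation
# becomes (numerically) exact.
K_hat = nystrom_approx(X, X)
```

With far fewer inducing points than data points, this cuts the dominant cost from $O(n^3)$ to $O(nm^2)$, which is what makes sparse GP inference practical; FITC and VFE differ in how they account for the residual uncertainty of this approximation.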

Structured Stochastic Linear Bandits

Title Structured Stochastic Linear Bandits
Authors Nicholas Johnson, Vidyashankar Sivakumar, Arindam Banerjee
Abstract The stochastic linear bandit problem proceeds in rounds: at each round the algorithm selects a vector from a decision set, after which it receives a noisy linear loss parameterized by an unknown vector. The goal is to minimize the (pseudo-)regret, which is the difference between the total expected loss of the algorithm and the total expected loss of the best fixed vector in hindsight. In this paper, we consider settings where the unknown parameter has structure, e.g., sparse, group-sparse, or low-rank, which can be captured by a norm, e.g., $L_1$, $L_{(1,2)}$, or the nuclear norm. We focus on constructing confidence ellipsoids which contain the unknown parameter across all rounds with high probability. We show that the radius of such ellipsoids depends on the Gaussian width of sets associated with the norm capturing the structure. This characterization leads to tighter confidence ellipsoids and, therefore, sharper regret bounds than those in the existing literature, which are based on the ambient dimensionality.
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05693v1
PDF http://arxiv.org/pdf/1606.05693v1.pdf
PWC https://paperswithcode.com/paper/structured-stochastic-linear-bandits
Repo
Framework
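The paper's contribution is a sharper choice of confidence ellipsoid; the surrounding algorithmic loop is the standard optimism-in-the-face-of-uncertainty template. A generic OFUL-style sketch (with a fixed, unoptimized confidence radius, not the authors' structured construction) looks like this:

```python
import numpy as np

def linear_bandit(theta_star, arms, T=500, lam=1.0, beta=2.0, seed=0):
    """Generic linear bandit with an ellipsoidal confidence set.

    theta_star : unknown parameter (known only to the simulator)
    arms       : candidate decision vectors, one per row
    beta       : confidence-ellipsoid radius (the paper derives tighter,
                 structure-aware choices of this quantity)
    """
    rng = np.random.default_rng(seed)
    d = len(theta_star)
    V = lam * np.eye(d)              # regularized Gram matrix
    b = np.zeros(d)
    regret = 0.0
    best = (arms @ theta_star).max()
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)   # ridge estimate
        Vinv = np.linalg.inv(V)
        # Optimistic index: estimated reward plus exploration bonus.
        bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
        x = arms[np.argmax(arms @ theta_hat + bonus)]
        reward = x @ theta_star + 0.1 * rng.normal()
        V += np.outer(x, x)
        b += reward * x
        regret += best - x @ theta_star

    return regret

theta = np.array([1.0, 0.0, 0.0])
arms = np.eye(3)
r = linear_bandit(theta, arms)
```

A tighter ellipsoid means a smaller `beta`, hence smaller exploration bonuses on suboptimal arms and lower regret, which is exactly the paper's argument for exploiting structure.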

Social-sparsity brain decoders: faster spatial sparsity

Title Social-sparsity brain decoders: faster spatial sparsity
Authors Gaël Varoquaux, Matthieu Kowalski, Bertrand Thirion
Abstract Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions. However, the state of the art, based on total variation or graph-net, is computationally costly. Here we introduce sparsity in the local neighborhood of each voxel with social-sparsity, a structured shrinkage operator. We find that, on brain imaging classification problems, social-sparsity performs almost as well as total-variation models and better than graph-net, for a fraction of the computational cost. It also very clearly outlines predictive regions. We give details of the model and the algorithm.
Tasks Brain Decoding
Published 2016-06-21
URL http://arxiv.org/abs/1606.06439v1
PDF http://arxiv.org/pdf/1606.06439v1.pdf
PWC https://paperswithcode.com/paper/social-sparsity-brain-decoders-faster-spatial
Repo
Framework
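Social sparsity shrinks each coefficient using the energy of its spatial neighborhood rather than its own magnitude alone, so spatially grouped nonzeros survive while isolated ones are suppressed. A rough 1-D sketch of such a neighborhood shrinkage operator (illustrative only, not the authors' implementation):

```python
import numpy as np

def social_shrink(w, lam, radius=1):
    """Shrink each coefficient by the l2 energy of its neighborhood.

    A coefficient survives when its surrounding window carries enough
    energy, which favors spatially grouped nonzeros.
    """
    out = np.zeros_like(w)
    for i in range(len(w)):
        lo, hi = max(0, i - radius), min(len(w), i + radius + 1)
        energy = np.sqrt(np.sum(w[lo:hi] ** 2))
        if energy > 0:
            out[i] = w[i] * max(0.0, 1.0 - lam / energy)
    return out

w = np.array([0.0, 0.1, 5.0, 4.0, 0.1, 0.0, 0.2])
out = social_shrink(w, lam=1.0)
```

In this toy example the isolated coefficient at the end is zeroed out, while the small coefficient adjacent to the large cluster is retained, mimicking the spatial grouping that makes the resulting weight maps interpretable.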

Distributed Hessian-Free Optimization for Deep Neural Network

Title Distributed Hessian-Free Optimization for Deep Neural Network
Authors Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, Martin Takáč
Abstract Training a deep neural network is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variants are the current state-of-the-art solvers for this task. However, due to the non-convex nature of the problem, SGD has been observed to slow down near saddle points. Recent empirical work claims that by detecting and escaping saddle points efficiently, training performance is likely to improve. With this objective, we revisit the Hessian-free optimization method for deep networks. We also develop its distributed variant and demonstrate superior scaling potential to SGD, allowing larger computing resources to be utilized more efficiently and thus enabling larger models and a faster time to the desired solution. Furthermore, unlike the truncated Newton method (Martens' HF), which ignores negative curvature information by using a naive conjugate gradient method with Gauss-Newton Hessian approximation information, we propose a novel algorithm to explore negative curvature directions by solving the sub-problem with a stabilized bi-conjugate gradient method involving possibly indefinite stochastic Hessian information. We show that these techniques accelerate the training process on both the standard MNIST dataset and the TIMIT speech recognition problem, demonstrating robust performance with up to an order of magnitude larger batch sizes. This increased scaling potential is illustrated with near-linear speed-up on up to 16 CPU nodes for a simple 4-layer network.
Tasks Speech Recognition
Published 2016-06-02
URL http://arxiv.org/abs/1606.00511v2
PDF http://arxiv.org/pdf/1606.00511v2.pdf
PWC https://paperswithcode.com/paper/distributed-hessian-free-optimization-for
Repo
Framework
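Hessian-free methods never form the Hessian explicitly; the inner solver only needs Hessian-vector products. A minimal sketch of one standard way to obtain them, the finite-difference form $Hv \approx (\nabla f(w + \epsilon v) - \nabla f(w))/\epsilon$ (the paper itself feeds stochastic, possibly indefinite Hessian information into a stabilized bi-conjugate gradient solver):

```python
import numpy as np

def grad_quadratic(A, b, w):
    # Gradient of f(w) = 0.5 w^T A w - b^T w.
    return A @ w - b

def hessian_vector_product(grad_fn, w, v, eps=1e-6):
    """Finite-difference Hessian-vector product: no explicit Hessian needed."""
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = np.array([0.5, -0.5])
v = np.array([1.0, 0.0])
hv = hessian_vector_product(lambda w_: grad_quadratic(A, b, w_), w, v)
# For a quadratic, the Hessian is A, so Hv should equal A @ v.
```

In deep learning practice the exact Pearlmutter trick (a forward-over-reverse automatic-differentiation pass) is usually preferred over finite differences, but the interface is the same: a black box mapping `v` to `Hv`.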

Reward Augmented Maximum Likelihood for Neural Structured Prediction

Title Reward Augmented Maximum Likelihood for Neural Structured Prediction
Authors Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans
Abstract A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework. By establishing a link between the log-likelihood and expected reward objectives, we show that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards. Accordingly, we present a framework to smooth the predictive probability of the outputs using their corresponding rewards. We optimize the conditional log-probability of augmented outputs that are sampled proportionally to their exponentiated scaled rewards. Experiments on neural sequence to sequence models for speech recognition and machine translation show notable improvements over a maximum likelihood baseline by using reward augmented maximum likelihood (RAML), where the rewards are defined as the negative edit distance between the outputs and the ground truth labels.
Tasks Machine Translation, Speech Recognition, Structured Prediction
Published 2016-09-01
URL http://arxiv.org/abs/1609.00150v3
PDF http://arxiv.org/pdf/1609.00150v3.pdf
PWC https://paperswithcode.com/paper/reward-augmented-maximum-likelihood-for
Repo
Framework
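The core of RAML is weighting candidate outputs in proportion to their exponentiated scaled rewards, i.e., a softmax over rewards with temperature $\tau$. An illustrative sketch of that weighting (the rewards and temperature below are made-up values, not the paper's):

```python
import numpy as np

def raml_weights(rewards, tau=1.0):
    """Sampling weights proportional to exp(reward / tau) (a softmax)."""
    z = np.array(rewards, dtype=float) / tau
    z -= z.max()                 # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Candidate outputs scored by negative edit distance to the reference,
# as in the paper's experiments (0 = exact match).
rewards = [0, -1, -1, -3]
w = raml_weights(rewards, tau=0.9)
```

Training then maximizes the log-probability of outputs sampled from this distribution, so the exact match gets the most weight, near misses still contribute, and poor outputs are exponentially down-weighted.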

Finding the different patterns in buildings data using bag of words representation with clustering

Title Finding the different patterns in buildings data using bag of words representation with clustering
Authors Usman Habib, Gerhard Zucker
Abstract Understanding building operation has become a challenging task due to the large amount of data recorded in energy-efficient buildings. Even today, experts rely on visual tools for analyzing the data. To make the task practical, this paper proposes a method to automatically detect the different patterns in buildings. K-Means clustering is used to automatically identify the ON (operational) cycles of the chiller. In the next step, the ON cycles are transformed to a symbolic representation using the Symbolic Aggregate approXimation (SAX) method. The SAX symbols are then converted to a bag-of-words representation for hierarchical clustering. Moreover, the proposed technique is applied to real-life data from an adsorption chiller. Additionally, the results from the proposed method and the dynamic time warping (DTW) approach are discussed and compared.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01398v1
PDF http://arxiv.org/pdf/1602.01398v1.pdf
PWC https://paperswithcode.com/paper/finding-the-different-patterns-in-buildings
Repo
Framework
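The SAX step mentioned above converts a numeric time series into a short symbolic word: z-normalize, average over segments (piecewise aggregate approximation), then map each segment mean to a letter using Gaussian breakpoints. A minimal sketch with a 4-letter alphabet (illustrative, not the authors' code; the input series is made up):

```python
import numpy as np

# Breakpoints that cut the standard normal into 4 equiprobable regions.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax(series, n_segments=4):
    """Symbolic Aggregate approXimation of a 1-D series."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                 # z-normalize
    segments = np.array_split(x, n_segments)     # PAA: mean per segment
    paa = np.array([s.mean() for s in segments])
    idx = np.searchsorted(BREAKPOINTS, paa)      # map means to letter bins
    return "".join(ALPHABET[i] for i in idx)

# A toy ON-cycle-like signal: ramp up, plateau, ramp down.
word = sax([1, 2, 3, 4, 10, 12, 11, 10, 3, 2, 1, 2], n_segments=4)
```

The resulting words can then be treated as documents for the bag-of-words representation that feeds the hierarchical clustering step.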

Towards the Design of Prospect-Theory based Human Decision Rules for Hypothesis Testing

Title Towards the Design of Prospect-Theory based Human Decision Rules for Hypothesis Testing
Authors V. Sriram Siddhardh Nadendla, Swastik Brahma, Pramod K. Varshney
Abstract Detection rules have traditionally been designed for rational agents that minimize the Bayes risk (average decision cost). With the advent of crowd-sensing systems, there is a need to redesign binary hypothesis testing rules for behavioral agents, whose cognitive behavior is not captured by traditional utility functions such as Bayes risk. In this paper, we adopt prospect-theory-based models for decision makers. We consider special agent models, namely optimists and pessimists, and derive optimal detection rules under different scenarios. Using an illustrative example, we also show how the decision rule of a human agent deviates from the Bayesian decision rule under the various behavioral models considered in this paper.
Tasks
Published 2016-10-04
URL http://arxiv.org/abs/1610.01085v1
PDF http://arxiv.org/pdf/1610.01085v1.pdf
PWC https://paperswithcode.com/paper/towards-the-design-of-prospect-theory-based
Repo
Framework
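Prospect theory replaces expected utility with an S-shaped value function: concave for gains, convex and steeper for losses (loss aversion). As a hedged illustration, here is the classic Tversky-Kahneman form with their commonly cited parameters, which may differ from the exact agent models used in the paper:

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function: concave for gains, convex and
    steeper for losses (lam > 1 encodes loss aversion)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

# A loss hurts more than an equal-sized gain helps.
gain = prospect_value(100.0)
loss = prospect_value(-100.0)
```

Replacing Bayes risk with such a value function is what pulls the optimal decision threshold away from the classical Bayesian one for optimist and pessimist agents.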

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

Title An Information Theoretic Feature Selection Framework for Big Data under Apache Spark
Authors Sergio Ramírez-Gallego, Héctor Mouriño-Talín, David Martínez-Rego, Verónica Bolón-Canedo, José Manuel Benítez, Amparo Alonso-Betanzos, Francisco Herrera
Abstract With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets, both in number of instances and features. The purpose of this work is to demonstrate that standard feature selection methods can be parallelized in Big Data platforms like Apache Spark, boosting both performance and accuracy. We thus propose a distributed implementation of a generic feature selection framework which includes a wide group of well-known Information Theoretic methods. Experimental results on a wide set of real-world datasets show that our distributed framework is capable of dealing with ultra-high dimensional datasets as well as those with a huge number of samples in a short period of time, outperforming the sequential version in all the cases studied.
Tasks Dimensionality Reduction, Feature Selection
Published 2016-10-13
URL http://arxiv.org/abs/1610.04154v2
PDF http://arxiv.org/pdf/1610.04154v2.pdf
PWC https://paperswithcode.com/paper/an-information-theoretic-feature-selection
Repo
Framework
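The information-theoretic filters in such frameworks rank features by quantities like the mutual information between each feature and the label. A single-machine sketch of that core computation for discrete variables (the Spark framework distributes exactly these kinds of counts across the cluster; the toy features below are made up):

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete variables,
    computed from their empirical joint distribution."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

y  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
f1 = y.copy()                               # perfectly informative feature
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])    # weakly informative feature
mi1, mi2 = mutual_information(f1, y), mutual_information(f2, y)
```

A filter method then keeps the top-ranked features; the distributed version's job is to make these per-feature statistics cheap at ultra-high dimensionality.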

Fine-To-Coarse Global Registration of RGB-D Scans

Title Fine-To-Coarse Global Registration of RGB-D Scans
Authors Maciej Halber, Thomas Funkhouser
Abstract RGB-D scanning of indoor environments is important for many applications, including real estate, interior design, and virtual reality. However, it is still challenging to register RGB-D images from a hand-held camera over a long video sequence into a globally consistent 3D model. Current methods often can lose tracking or drift and thus fail to reconstruct salient structures in large environments (e.g., parallel walls in different rooms). To address this problem, we propose a “fine-to-coarse” global registration algorithm that leverages robust registrations at finer scales to seed detection and enforcement of new correspondence and structural constraints at coarser scales. To test global registration algorithms, we provide a benchmark with 10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset. During experiments with this benchmark, we find that our fine-to-coarse algorithm registers long RGB-D sequences better than previous methods.
Tasks
Published 2016-07-28
URL http://arxiv.org/abs/1607.08539v3
PDF http://arxiv.org/pdf/1607.08539v3.pdf
PWC https://paperswithcode.com/paper/fine-to-coarse-global-registration-of-rgb-d
Repo
Framework
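At the heart of any such registration pipeline is estimating a rigid transform from point correspondences, which is exactly what the paper's benchmark of manually clicked correspondences evaluates. A minimal sketch of the standard Kabsch/SVD least-squares alignment (a generic building block, not the authors' fine-to-coarse machinery):

```python
import numpy as np

def rigid_align(P, Q):
    """Best-fit rotation R and translation t mapping points P onto Q
    in the least-squares sense (Kabsch algorithm via SVD)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(1)
P = rng.normal(size=(20, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = rigid_align(P, Q)
```

With noiseless synthetic correspondences the transform is recovered exactly; on real scans, residuals after such alignments are what registration benchmarks like the one in the paper measure.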

Ensemble Validation: Selectivity has a Price, but Variety is Free

Title Ensemble Validation: Selectivity has a Price, but Variety is Free
Authors Eric Bax, Farshad Kooti
Abstract Suppose some classifiers are selected from a set of hypothesis classifiers to form an equally-weighted ensemble that selects a member classifier at random for each input example. Then the ensemble has an error bound consisting of the average error bound for the member classifiers, a term for selectivity that varies from zero (if all hypothesis classifiers are selected) to a standard uniform error bound (if only a single classifier is selected), and small constants. There is no penalty for using a richer hypothesis set if the same fraction of the hypothesis classifiers are selected for the ensemble.
Tasks
Published 2016-10-04
URL http://arxiv.org/abs/1610.01234v3
PDF http://arxiv.org/pdf/1610.01234v3.pdf
PWC https://paperswithcode.com/paper/ensemble-validation-selectivity-has-a-price
Repo
Framework
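The first term of the bound, the average member error, is easy to see empirically: an ensemble that answers each example with a uniformly random member has an error rate equal to the mean of its members' error rates. A small simulation makes this concrete (illustrative, not the paper's bound computation; the error rates are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
n_examples, n_members = 10000, 5
member_error_rates = np.array([0.10, 0.15, 0.20, 0.25, 0.30])

# Each member errs independently at its own rate; the ensemble answers
# each example using a member chosen uniformly at random.
errors = rng.random((n_members, n_examples)) < member_error_rates[:, None]
choice = rng.integers(0, n_members, size=n_examples)
ensemble_errors = errors[choice, np.arange(n_examples)]
```

The remaining terms of the bound pay only for *which* members were selected (selectivity), not for the size of the hypothesis set they came from, which is the "variety is free" part of the title.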

Twenty (simple) questions

Title Twenty (simple) questions
Authors Yuval Dagan, Yuval Filmus, Ariel Gabizon, Shay Moran
Abstract A basic combinatorial interpretation of Shannon’s entropy function is via the “20 questions” game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution $\pi$ over the numbers ${1,\ldots,n}$, and announces it to Bob. She then chooses a number $x$ according to $\pi$, and Bob attempts to identify $x$ using as few Yes/No queries as possible, on average. An optimal strategy for the “20 questions” game is given by a Huffman code for $\pi$: Bob’s questions reveal the codeword for $x$ bit by bit. This strategy finds $x$ using fewer than $H(\pi)+1$ questions on average. However, the questions asked by Bob could be arbitrary. In this paper, we investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes, either exactly or approximately? Our first main result shows that for every distribution $\pi$, Bob has a strategy that uses only questions of the form “$x < c$?” and “$x = c$?”, and uncovers $x$ using at most $H(\pi)+1$ questions on average, matching the performance of Huffman codes in this sense. We also give a natural set of $O(rn^{1/r})$ questions that achieve a performance of at most $H(\pi)+r$, and show that $\Omega(rn^{1/r})$ questions are required to achieve such a guarantee. Our second main result gives a set $\mathcal{Q}$ of $1.25^{n+o(n)}$ questions such that for every distribution $\pi$, Bob can implement an optimal strategy for $\pi$ using only questions from $\mathcal{Q}$. We also show that $1.25^{n-o(n)}$ questions are needed, for infinitely many $n$. If we allow a small slack of $r$ over the optimal strategy, then roughly $(rn)^{\Theta(1/r)}$ questions are necessary and sufficient.
Tasks
Published 2016-11-05
URL http://arxiv.org/abs/1611.01655v3
PDF http://arxiv.org/pdf/1611.01655v3.pdf
PWC https://paperswithcode.com/paper/twenty-simple-questions
Repo
Framework
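The benchmark strategy in the abstract is a Huffman code, whose expected number of questions lies between $H(\pi)$ and $H(\pi)+1$. A minimal sketch that checks this bound, using the standard fact that the expected codeword length equals the sum of merged probabilities over all merge steps:

```python
import heapq
import math

def huffman_expected_length(probs):
    """Expected Huffman codeword length for a distribution.

    Equals the sum of merged probabilities over all merge steps of the
    Huffman construction (each merge adds one bit to every leaf below it).
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

pi = [0.4, 0.3, 0.2, 0.1]
L = huffman_expected_length(pi)
H = entropy(pi)
```

For this example $L = 1.9$ while $H(\pi) \approx 1.846$, so $H(\pi) \le L < H(\pi)+1$; the paper asks how much this guarantee degrades when Bob's questions are restricted to simple comparison and equality queries.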

Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer’s Disease Across Multiple Institutions

Title Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer’s Disease Across Multiple Institutions
Authors Qingyang Li, Tao Yang, Liang Zhan, Derrek Paul Hibar, Neda Jahanshad, Yalin Wang, Jieping Ye, Paul M. Thompson, Jie Wang
Abstract Genome-wide association studies (GWAS) offer new opportunities to identify genetic risk factors for Alzheimer’s disease (AD). Recently, collaborative efforts across different institutions emerged that enhance the power of many existing techniques on individual institution data. However, a major barrier to collaborative studies of GWAS is that many institutions need to preserve individual data privacy. To address this challenge, we propose a novel distributed framework, termed Local Query Model (LQM) to detect risk SNPs for AD across multiple research institutions. To accelerate the learning process, we propose a Distributed Enhanced Dual Polytope Projection (D-EDPP) screening rule to identify irrelevant features and remove them from the optimization. To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy while ranking the SNPs that may collectively affect AD. Empirical studies are conducted on 809 subjects with 5.9 million SNP features which are distributed across three individual institutions. D-EDPP achieved a 66-fold speed-up by effectively identifying irrelevant features.
Tasks Model Selection
Published 2016-08-19
URL http://arxiv.org/abs/1608.07251v1
PDF http://arxiv.org/pdf/1608.07251v1.pdf
PWC https://paperswithcode.com/paper/large-scale-collaborative-imaging-genetics
Repo
Framework

Super Mario as a String: Platformer Level Generation Via LSTMs

Title Super Mario as a String: Platformer Level Generation Via LSTMs
Authors Adam Summerville, Michael Mateas
Abstract The procedural generation of video game levels has existed for at least 30 years, but only recently have machine learning approaches been used to generate levels without specifying the rules for generation. A number of these have looked at platformer levels as a sequence of characters and performed generation using Markov chains. In this paper we examine the use of Long Short-Term Memory recurrent neural networks (LSTMs) for the purpose of generating levels trained from a corpus of Super Mario Brothers levels. We analyze a number of different data representations and how the generated levels fit into the space of human authored Super Mario Brothers levels.
Tasks
Published 2016-03-02
URL http://arxiv.org/abs/1603.00930v2
PDF http://arxiv.org/pdf/1603.00930v2.pdf
PWC https://paperswithcode.com/paper/super-mario-as-a-string-platformer-level
Repo
Framework
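The Markov-chain baseline the abstract contrasts against treats a level as a character string and samples each tile from the distribution of characters that follow the current one in the training corpus. A toy first-order sketch with a made-up tile vocabulary (the LSTM approach in the paper conditions on much longer histories):

```python
import random
from collections import defaultdict

def train_markov(text):
    """First-order character Markov model: list of observed successors per char."""
    model = defaultdict(list)
    for a, b in zip(text, text[1:]):
        model[a].append(b)
    return model

def generate(model, start, length, seed=0):
    """Sample a string by repeatedly drawing a successor of the last char."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        out.append(rng.choice(model[out[-1]]))
    return "".join(out)

# Toy "level" string: '-' air, 'X' ground, '?' block (hypothetical tiles,
# not the actual Super Mario Brothers corpus encoding).
level = "XXXX----??--XXXX--?---XXXXXX----XX"
model = train_markov(level)
sample = generate(model, "X", 20)
```

Because each tile depends only on its immediate predecessor, such samples reproduce local texture but not long-range structure, which is the gap the paper's LSTM representations aim to close.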

A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

Title A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective
Authors SamanehSorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi
Abstract Credit cards play a very important role in today’s economy and have become an unavoidable part of household, business, and global activities. Although using credit cards provides enormous benefits when they are used carefully and responsibly, significant credit and financial damage may be caused by fraudulent activity. Many techniques have been proposed to confront the growth in credit card fraud. However, while all of these techniques share the goal of preventing credit card fraud, each has its own drawbacks, advantages, and characteristics. In this paper, after investigating the difficulties of credit card fraud detection, we review the state of the art in credit card fraud detection techniques, data sets, and evaluation criteria. The advantages and disadvantages of fraud detection methods are enumerated and compared. Furthermore, a classification of the mentioned techniques into two main fraud detection approaches, namely misuse (supervised) and anomaly detection (unsupervised), is presented. A further classification of techniques is proposed based on their capability to process numerical and categorical data sets. The different data sets used in the literature are then described and grouped into real and synthesized data, and the effective and common attributes are extracted for further use. Moreover, the evaluation criteria employed in the literature are collected and discussed. Finally, open issues in credit card fraud detection are outlined as guidelines for new researchers.
Tasks Anomaly Detection, Fraud Detection
Published 2016-11-19
URL http://arxiv.org/abs/1611.06439v1
PDF http://arxiv.org/pdf/1611.06439v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-credit-card-fraud-detection
Repo
Framework