Paper Group ANR 1065
The Everlasting Database: Statistical Validity at a Fair Price
Title | The Everlasting Database: Statistical Validity at a Fair Price |
Authors | Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro |
Abstract | The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for $M$ non-adaptive queries is $O(\log M)$, while the cost to a potentially adaptive user who makes $M$ queries that do not depend on any others is $O(\sqrt{M})$. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04307v3 |
http://arxiv.org/pdf/1803.04307v3.pdf | |
PWC | https://paperswithcode.com/paper/the-everlasting-database-statistical-validity |
Repo | |
Framework | |
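A toy sketch of the paper's economic idea may help: answer queries on a sample that grows over time, and fund each growth round with the prices charged for queries. The class below is an illustrative stand-in only — the budget rule and prices are invented for exposition, not the paper's actual mechanism or pricing schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyEverlastingDB:
    """Illustrative only: answers mean-queries on a sample that grows over
    time, charging each query a price that funds the next data collection."""

    def __init__(self, sample_fn, n0=100):
        self.sample_fn = sample_fn            # draws fresh i.i.d. samples
        self.data = sample_fn(n0)
        self.queries_this_round = 0

    def query(self, f, adaptive=False):
        # Invented budget rule: adaptive users exhaust a round's query
        # budget faster, so they trigger (and pay for) collection more often.
        budget = int(np.sqrt(len(self.data))) if adaptive else len(self.data)
        price = 1.0 / budget                  # flat split of the round's cost
        self.queries_this_round += 1
        if self.queries_this_round >= budget:
            self.data = np.concatenate([self.data, self.sample_fn(len(self.data))])
            self.queries_this_round = 0
        return float(np.mean(f(self.data))), price

db = ToyEverlastingDB(lambda n: rng.normal(size=n))
total = sum(db.query(np.abs)[1] for _ in range(1000))
print(f"total price of 1000 non-adaptive queries: {total:.2f}")  # grows ~log M
```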
BELIEF: A distance-based redundancy-proof feature selection method for Big Data
Title | BELIEF: A distance-based redundancy-proof feature selection method for Big Data |
Authors | Sergio Ramírez-Gallego, Salvador García, Ning Xiong, Francisco Herrera |
Abstract | With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge datasets and ease complex learning processes. Concretely, algorithms able to filter relevant dimensions from a set of millions are of huge importance. Although effective, these techniques also suffer from the “scalability” curse. In this work, we propose a distributed feature weighting algorithm, which is able to rank millions of features in parallel using large samples. This method, inspired by the well-known RELIEF algorithm, introduces a novel redundancy elimination measure that provides selection schemes similar to those based on entropy, at a much lower cost. It also scales up smoothly when more instances are demanded in feature estimations. Empirical tests performed on our method show its estimation ability on huge sets — both in number of features and instances — as well as its reduced runtime cost (especially at the redundancy detection step). |
Tasks | Feature Selection |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05774v1 |
http://arxiv.org/pdf/1804.05774v1.pdf | |
PWC | https://paperswithcode.com/paper/belief-a-distance-based-redundancy-proof |
Repo | |
Framework | |
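Since BELIEF builds on RELIEF, a minimal NumPy sketch of classic RELIEF feature weighting shows the estimation step the paper parallelizes; BELIEF's distributed execution and its redundancy-elimination measure are not reproduced here.

```python
import numpy as np

def relief_weights(X, y, n_samples=100, rng=None):
    """Classic RELIEF feature weighting (the algorithm BELIEF builds on)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.choice(n, size=min(n_samples, n), replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)      # L1 distances to X[i]
        dist[i] = np.inf
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(diff, dist, np.inf))   # nearest other-class
        # Reward features that separate classes, penalize those that don't.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_samples

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)                    # only feature 0 is relevant
print(relief_weights(X, y).round(2))             # weight 0 should dominate
```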
Analysis of a Canonical Labeling Algorithm for the Alignment of Correlated Erdős-Rényi Graphs
Title | Analysis of a Canonical Labeling Algorithm for the Alignment of Correlated Erdős-Rényi Graphs |
Authors | Osman Emre Dai, Daniel Cullina, Negar Kiyavash, Matthias Grossglauser |
Abstract | Graph alignment in two correlated random graphs refers to the task of identifying the correspondence between the vertex sets of the graphs. Recent results have characterized the exact information-theoretic threshold for graph alignment in correlated Erdős-Rényi graphs. However, very little is known about the existence of efficient algorithms to achieve graph alignment without seeds. In this work we identify a region in which a straightforward $O(n^{11/5} \log n)$-time canonical labeling algorithm, initially introduced in the context of graph isomorphism, succeeds in aligning correlated Erdős-Rényi graphs. The algorithm has two steps. In the first step, all vertices are labeled by their degrees and a trivial minimum-distance alignment (i.e., sorting vertices according to their degrees) matches a fixed number of highest-degree vertices in the two graphs. Having identified this subset of vertices, the remaining vertices are matched using an alignment algorithm for bipartite graphs. |
Tasks | Graph Matching |
Published | 2018-04-25 |
URL | https://arxiv.org/abs/1804.09758v2 |
https://arxiv.org/pdf/1804.09758v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-performance-of-a-canonical-labeling |
Repo | |
Framework | |
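A simplified sketch of the two-step scheme (not the paper's $O(n^{11/5} \log n)$ algorithm): match the highest-degree vertices by sorted degree, then match the rest by a minimum-distance bipartite assignment on their adjacency patterns to those anchors. The graph sizes, edge-retention rate, and Hamming-cost matching below are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align(A, B, k=10):
    da, db = A.sum(1), B.sum(1)
    top_a = np.argsort(-da)[:k]            # step 1: degree-sorted anchors
    top_b = np.argsort(-db)[:k]
    rest_a = np.setdiff1d(np.arange(len(A)), top_a)
    rest_b = np.setdiff1d(np.arange(len(B)), top_b)
    sig_a = A[np.ix_(rest_a, top_a)]       # adjacency to anchors as signature
    sig_b = B[np.ix_(rest_b, top_b)]
    cost = np.abs(sig_a[:, None, :] - sig_b[None, :, :]).sum(-1)  # Hamming
    r, c = linear_sum_assignment(cost)     # step 2: bipartite min-distance match
    match = dict(zip(top_a, top_b))
    match.update(zip(rest_a[r], rest_b[c]))
    return match

# Two correlated subsamples of one Erdos-Renyi graph (identity is ground truth).
rng = np.random.default_rng(0)
G = rng.random((100, 100)) < 0.3
G = np.triu(G, 1); G = (G | G.T).astype(int)
keep = lambda: rng.random(G.shape) < 0.9
A = G * np.triu(keep(), 1); A = A | A.T
B = G * np.triu(keep(), 1); B = B | B.T
print(sum(v == k for k, v in align(A, B).items()), "of 100 aligned correctly")
```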
Automatic Severity Classification of Coronary Artery Disease via Recurrent Capsule Network
Title | Automatic Severity Classification of Coronary Artery Disease via Recurrent Capsule Network |
Authors | Qi Wang, Jiahui Qiu, Yangming Zhou, Tong Ruan, Daqi Gao, Ju Gao |
Abstract | Coronary artery disease (CAD) is one of the leading causes of cardiovascular disease deaths. The condition progresses rapidly and, if not diagnosed and treated at an early stage, may eventually lead to irreversible death of the heart muscle. Invasive coronary arteriography is the gold-standard technique for CAD diagnosis. Coronary arteriography texts describe in detail which artery segments have stenosis and how severe each stenosis is, so it is crucial to conduct severity classification of CAD from them. In this paper, we employ a recurrent capsule network (RCN) to extract semantic relations between clinical named entities in Chinese coronary arteriography texts, through which we can automatically find the maximal stenosis for each lumen and infer how severe the CAD is according to the improved Gensini method. Experimental results on the corpus collected from Shanghai Shuguang Hospital show that our proposed method achieves an accuracy of 97.0% in the severity classification of CAD. |
Tasks | |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06718v2 |
http://arxiv.org/pdf/1807.06718v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-severity-classification-of-coronary |
Repo | |
Framework | |
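The relation-extraction model itself is beyond a short sketch, but the downstream step the abstract describes — keep the maximal stenosis per lumen, then grade severity — can be illustrated. The thresholds and the additive score below are hypothetical placeholders, not the paper's improved Gensini method.

```python
# Hypothetical post-processing: given (vessel, stenosis%) relations extracted
# by a model such as the paper's RCN, keep the maximal stenosis per lumen and
# map an aggregate score to a severity class. Thresholds are placeholders.
from collections import defaultdict

def severity(relations):
    worst = defaultdict(float)
    for vessel, pct in relations:          # e.g. ("LAD", 75.0)
        worst[vessel] = max(worst[vessel], pct)
    score = sum(worst.values())            # toy aggregate, not Gensini
    return "severe" if score >= 150 else "moderate" if score >= 70 else "mild"

print(severity([("LAD", 75.0), ("LAD", 50.0), ("RCA", 90.0)]))  # severe
```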
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Title | Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? |
Authors | Samet Oymak, Mahdi Soltanolkotabi |
Abstract | Many modern learning tasks involve fitting nonlinear models to data in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Due to this overparameterization, the training loss may have infinitely many global minima and it is critical to understand the properties of the solutions found by first-order optimization schemes such as (stochastic) gradient descent starting from different initializations. In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first-order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optimum even when the loss is nonconvex, (2) among all global optima of the loss, the iterates converge to one with near-minimal distance to the initial point, and (3) the iterates take a near-direct route from the initial point to this global optimum. As part of our proof technique, we introduce a new potential function which captures the precise tradeoff between the loss function and the distance to the initial point as the iterations progress. For Stochastic Gradient Descent (SGD), we develop novel martingale techniques that guarantee SGD never leaves a small neighborhood of the initialization, even with rather large learning rates. We demonstrate the utility of our general theory for a variety of problem domains spanning low-rank matrix recovery to neural network training. Underlying our analysis are novel insights that may have implications for training and generalization of more sophisticated learning problems including those involving deep neural network architectures. |
Tasks | |
Published | 2018-12-25 |
URL | http://arxiv.org/abs/1812.10004v1 |
http://arxiv.org/pdf/1812.10004v1.pdf | |
PWC | https://paperswithcode.com/paper/overparameterized-nonlinear-learning-gradient |
Repo | |
Framework | |
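The "closest global optimum" phenomenon is easy to observe in the linear special case, where gradient descent provably converges to the interpolating solution nearest the initialization. A minimal NumPy check — this is the classical least-squares fact, not the paper's nonlinear analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 50                                # overparameterized: d >> n
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
w0 = rng.normal(size=d)                      # initialization

w = w0.copy()
for _ in range(20000):                       # plain gradient descent
    w -= 0.01 * X.T @ (X @ w - y) / n

# Closest global minimizer to w0: w0 plus the pseudoinverse correction.
w_star = w0 + np.linalg.pinv(X) @ (y - X @ w0)
print(np.abs(X @ w - y).max())               # ~0: GD found a global minimum
print(np.linalg.norm(w - w_star))            # ~0: the one nearest to w0
```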
Deep Learning for Predicting Asset Returns
Title | Deep Learning for Predicting Asset Returns |
Authors | Guanhao Feng, Jingyu He, Nicholas G. Polson |
Abstract | Deep learning searches for nonlinear factors for predicting asset returns. Predictability is achieved via multiple layers of composite factors, as opposed to additive ones. Viewed in this way, asset pricing studies can be revisited using multi-layer deep learners, such as rectified linear units (ReLU) or long short-term memory (LSTM) units for time-series effects. State-of-the-art algorithms, including stochastic gradient descent (SGD), TensorFlow and dropout design, provide implementation and efficient factor exploration. To illustrate our methodology, we revisit the equity market risk premium dataset of Welch and Goyal (2008). We find the existence of nonlinear factors which explain the predictability of returns, in particular at the extremes of the characteristic space. Finally, we conclude with directions for future research. |
Tasks | Time Series |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09314v2 |
http://arxiv.org/pdf/1804.09314v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-predicting-asset-returns |
Repo | |
Framework | |
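A minimal sketch of the setup, with synthetic data standing in for the Welch-Goyal predictors and scikit-learn's MLP in place of a hand-rolled TensorFlow model:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
T = 600
Z = rng.normal(size=(T, 8))                  # 8 synthetic predictor series
r = np.maximum(Z[:, 0], 0) * Z[:, 1] + 0.1 * rng.normal(size=T)  # nonlinear factor

split = 480                                  # chronological train/test split
model = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                     solver="sgd", learning_rate_init=0.01,
                     max_iter=2000, random_state=0)
model.fit(Z[:split], r[:split])
pred = model.predict(Z[split:])
oos_r2 = 1 - np.sum((r[split:] - pred) ** 2) / np.sum((r[split:] - r[:split].mean()) ** 2)
print(f"out-of-sample R^2: {oos_r2:.3f}")    # ReLU layers capture the nonlinearity
```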
Risk-averse estimation, an axiomatic approach to inference, and Wallace-Freeman without MML
Title | Risk-averse estimation, an axiomatic approach to inference, and Wallace-Freeman without MML |
Authors | Michael Brand |
Abstract | We define a new class of Bayesian point estimators, which we refer to as risk averse. Using this definition, we formulate axioms that provide natural requirements for inference, e.g. in a scientific setting, and show that for well-behaved estimation problems the axioms uniquely characterise an estimator. Namely, for estimation problems in which some parameter values have a positive posterior probability (such as problems with a discrete hypothesis space), the axioms characterise Maximum A Posteriori (MAP) estimation, whereas elsewhere (such as in continuous estimation) they characterise the Wallace-Freeman estimator. Our results provide a novel justification for the Wallace-Freeman estimator, which previously was derived only as an approximation to the information-theoretic Strict Minimum Message Length estimator. By contrast, our derivation requires neither approximations nor coding. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10736v4 |
http://arxiv.org/pdf/1806.10736v4.pdf | |
PWC | https://paperswithcode.com/paper/risk-averse-estimation-an-axiomatic-approach |
Repo | |
Framework | |
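For intuition on the discrete case, where the axioms characterise MAP estimation, here is a toy MAP computation over a finite hypothesis space (the grid, prior, and data are invented for illustration):

```python
# MAP estimate of a coin's bias over a discrete hypothesis space,
# after observing k heads in n flips.
import numpy as np
from scipy.stats import binom

thetas = np.array([0.2, 0.5, 0.8])           # discrete hypothesis space
prior = np.array([0.5, 0.3, 0.2])
k, n = 14, 20                                # observed: 14 heads in 20 flips

posterior = prior * binom.pmf(k, n, thetas)  # posterior ∝ prior × likelihood
posterior /= posterior.sum()
print(dict(zip(thetas, posterior.round(3))))
print("MAP estimate:", thetas[np.argmax(posterior)])
```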
Using Inherent Structures to design Lean 2-layer RBMs
Title | Using Inherent Structures to design Lean 2-layer RBMs |
Authors | Abhishek Bansal, Abhinav Anand, Chiranjib Bhattacharyya |
Abstract | Understanding the representational power of Restricted Boltzmann Machines (RBMs) with multiple layers is an ill-understood problem and an area of active research. Motivated by the approach of the \emph{Inherent Structure formalism} (Stillinger & Weber, 1982), extensively used in analysing spin glasses, we propose a novel measure called \emph{Inherent Structure Capacity} (ISC), which characterizes the representation capacity of a fixed-architecture RBM by the expected number of modes of distributions emanating from the RBM with parameters drawn from a prior distribution. Though ISC is intractable, we show that for a single-layer RBM architecture the ISC approaches a finite constant as the number of hidden units is increased, and that to further improve the ISC one needs to add a second layer. Furthermore, we introduce \emph{Lean} RBMs, which are multi-layer RBMs where each layer can have at most $O(n)$ units, with $n$ being the number of visible units. We show that for every single-layer RBM with $\Omega(n^{2+r})$, $r \ge 0$, hidden units there exists a two-layer \emph{lean} RBM with $\Theta(n^2)$ parameters and the same ISC, establishing that 2-layer RBMs can achieve the same representational power as single-layer RBMs using far fewer parameters. To the best of our knowledge, this is the first result which quantitatively establishes the need for layering. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04577v1 |
http://arxiv.org/pdf/1806.04577v1.pdf | |
PWC | https://paperswithcode.com/paper/using-inherent-structures-to-design-lean-2 |
Repo | |
Framework | |
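The ISC definition can be rendered by brute force at toy scale: draw RBM parameters from a prior, enumerate the visible marginal, and count modes under single-bit flips. This only illustrates the quantity — the paper analyzes ISC analytically, and enumeration is infeasible beyond tiny $n$.

```python
import itertools
import numpy as np

def count_modes(n_vis=6, n_hid=6, rng=None):
    """Count modes of a random tiny RBM's visible marginal by enumeration."""
    rng = rng or np.random.default_rng(0)
    W = rng.normal(size=(n_vis, n_hid))
    b, c = rng.normal(size=n_vis), rng.normal(size=n_hid)
    V = np.array(list(itertools.product([0, 1], repeat=n_vis)))
    # Unnormalized log-marginal: log p(v) = b.v + sum_j softplus(c_j + (vW)_j)
    logp = V @ b + np.logaddexp(0, V @ W + c).sum(axis=1)
    # A state is a mode if it beats all Hamming-1 neighbors (index XOR trick).
    return sum(all(logp[i] > logp[i ^ (1 << j)] for j in range(n_vis))
               for i in range(len(V)))

rng = np.random.default_rng(1)
print(np.mean([count_modes(rng=rng) for _ in range(50)]))  # Monte-Carlo ISC estimate
```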
A Sufficient Condition for Convergences of Adam and RMSProp
Title | A Sufficient Condition for Convergences of Adam and RMSProp |
Authors | Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu |
Abstract | Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, which have been shown to be divergent even in the convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been made to promote Adam/RMSProp-type algorithms to converge. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and the combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. Moreover, we show that the convergence of several variants of Adam, such as AdamNC, AdaEMA, etc., can be directly implied via the proposed sufficient condition in the non-convex setting. In addition, we illustrate that Adam is essentially a specifically weighted AdaGrad with exponential moving average momentum, which provides a novel perspective for understanding Adam and RMSProp. This observation, coupled with the sufficient condition, gives a much deeper interpretation of their divergences. Finally, we validate the sufficient condition by applying Adam and RMSProp to tackle a certain counterexample and to train deep neural networks. The numerical results are exactly in accord with our theoretical analysis. |
Tasks | Stochastic Optimization |
Published | 2018-11-23 |
URL | https://arxiv.org/abs/1811.09358v3 |
https://arxiv.org/pdf/1811.09358v3.pdf | |
PWC | https://paperswithcode.com/paper/a-sufficient-condition-for-convergences-of |
Repo | |
Framework | |
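For reference, a plain NumPy Adam step written to expose the quantities the sufficient condition constrains — the base learning rate and the exponential moving average of second-order moments. This is textbook Adam, not the paper's condition itself.

```python
import numpy as np

def adam_step(w, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # EMA momentum (first moment)
    v = beta2 * v + (1 - beta2) * g * g      # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# With beta1 = 0 and beta2 -> 1, v_t approaches a uniform average of squared
# gradients and the step reduces to a specifically weighted AdaGrad update --
# the reading the paper develops.
w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 101):                      # minimize f(w) = ||w - 1||^2 / 2
    g = w - np.ones(2)
    w, m, v = adam_step(w, g, m, v, t, alpha=0.05)
print(w)                                     # approaches [1, 1]
```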
Graph Correspondence Transfer for Person Re-identification
Title | Graph Correspondence Transfer for Person Re-identification |
Authors | Qin Zhou, Heng Fan, Shibao Zheng, Hang Su, Xinzhe Li, Shuang Wu, Haibin Ling |
Abstract | In this paper, we propose a graph correspondence transfer (GCT) approach for person re-identification. Unlike existing methods, the GCT model formulates person re-identification as an off-line graph matching and on-line correspondence transferring problem. Specifically, during training, the GCT model learns off-line a set of correspondence templates from positive training pairs with various pose-pair configurations via patch-wise graph matching. During testing, for each pair of test samples, we select a few training pairs with the most similar pose-pair configurations as references, and transfer the correspondences of these references to the test pair for feature distance calculation. The matching score is derived by aggregating distances from different references. For each probe image, the gallery image with the highest matching score is the re-identification result. Compared to existing algorithms, GCT can handle spatial misalignment caused by large variations in view angles and human poses owing to the benefits of patch-wise graph matching. Extensive experiments on five benchmarks, including VIPeR, Road, PRID450S, 3DPES and CUHK01, evidence the superior performance of the GCT model over other state-of-the-art methods. |
Tasks | Graph Matching, Person Re-Identification |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.00242v1 |
http://arxiv.org/pdf/1804.00242v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-correspondence-transfer-for-person-re |
Repo | |
Framework | |
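A sketch of the test-time transfer logic described above, with random stand-ins for features, pose descriptors, and learned patch correspondences (names and shapes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P, D = 12, 16                                # patches per image, feature dim
templates = [{"pose": rng.normal(size=4),    # pose-pair descriptor
              "corr": rng.permutation(P)}    # learned patch correspondence
             for _ in range(50)]

def gct_distance(fa, fb, pose, k=5):
    # Pick the k reference templates with the most similar pose-pair
    # configuration, apply their patch correspondences, aggregate distances.
    refs = sorted(templates, key=lambda t: np.linalg.norm(t["pose"] - pose))[:k]
    dists = [np.linalg.norm(fa - fb[t["corr"]], axis=1).sum() for t in refs]
    return np.mean(dists)

fa, fb = rng.normal(size=(P, D)), rng.normal(size=(P, D))
print(gct_distance(fa, fb, pose=rng.normal(size=4)))
```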
How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis
Title | How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis |
Authors | Kush R. Varshney |
Abstract | This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author’s lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models. |
Tasks | Interpretable Machine Learning |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11261v1 |
http://arxiv.org/pdf/1803.11261v1.pdf | |
PWC | https://paperswithcode.com/paper/how-an-electrical-engineer-became-an |
Repo | |
Framework | |
UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row
Title | UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row |
Authors | Andrei M. Butnaru, Radu Tudor Ionescu |
Abstract | We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Task of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech or phonetic transcripts, we also use a kernel based on dialectal embeddings generated from audio recordings by the organizers. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Preliminary experiments indicate that KRR provides better classification results. Our approach is shallow and simple, but the empirical results obtained in the 2018 ADI Closed Shared Task prove that it achieves the best performance. Furthermore, our top macro-F1 score (58.92%) is significantly better than the second-best score (57.59%) in the 2018 ADI Shared Task, according to the statistical significance test performed by the organizers. Nevertheless, we obtain even better post-competition results (a macro-F1 score of 62.28%) using the audio embeddings released by the organizers after the competition. With a very similar approach (that did not include phonetic features), we also ranked first in the ADI Closed Shared Task of the 2017 VarDial Evaluation Campaign, surpassing the second-best method by 4.62%. We therefore conclude that our multiple kernel learning method is the best approach to date for Arabic dialect identification. |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04876v4 |
http://arxiv.org/pdf/1805.04876v4.pdf | |
PWC | https://paperswithcode.com/paper/unibuckernel-reloaded-first-place-in-arabic |
Repo | |
Framework | |
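The core pipeline — character p-gram kernels summed (multiple kernel learning with uniform weights), then kernel ridge regression — can be sketched with scikit-learn. The strings and labels are dummies, the kernels are plain linear kernels on p-gram counts rather than the paper's exact kernels, and a real setup would train one-vs-rest rather than regressing on label ids.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.kernel_ridge import KernelRidge

train = ["shlon hada", "izzayak", "labas alik", "kifak inta"]
labels = np.array([0, 1, 2, 3])              # toy dialect ids

def pgram_kernel(docs, p):
    # Linear kernel on character p-gram counts.
    X = CountVectorizer(analyzer="char", ngram_range=(p, p)).fit_transform(docs)
    return (X @ X.T).toarray().astype(float)

K = sum(pgram_kernel(train, p) for p in (3, 4, 5))   # uniform kernel sum
krr = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, labels)
print(krr.predict(K).round(2))               # fits the toy training set
```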
Fast Object Classification in Single-pixel Imaging
Title | Fast Object Classification in Single-pixel Imaging |
Authors | Shuming Jiao |
Abstract | In single-pixel imaging (SPI), the target object is illuminated with varying patterns sequentially and an intensity sequence is recorded by a single-pixel detector without spatial resolution. A high-quality object image can only be computationally reconstructed after a large number of illuminations, with the disadvantages of long imaging time and high cost. Conventionally, object classification is performed after a reconstructed object image with good fidelity is available. In this paper, we propose to classify the target object with a small number of illuminations in a fast manner for Fourier SPI. A naive Bayes classifier is employed to classify the target objects based on the single-pixel intensity sequence, without any image reconstruction; each sequence element is regarded as an object feature in the classifier. Simulation results demonstrate that our proposed scheme can classify digit object images with high accuracy (e.g. 80% accuracy using only 13 illuminations, at a sampling ratio of 0.3%). |
Tasks | Image Reconstruction, Object Classification |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07582v1 |
http://arxiv.org/pdf/1805.07582v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-object-classification-in-single-pixel |
Repo | |
Framework | |
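A sketch of the classify-without-reconstruction idea on scikit-learn's digits, with a 2-D FFT standing in for Fourier SPI acquisition and the first few coefficients playing the role of the intensity sequence (the feature choice is an assumption, not the paper's sampling scheme):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

digits = load_digits()
spectra = np.fft.fft2(digits.images)               # (n, 8, 8) complex spectra
k = 13                                             # a few "illuminations"
feats = np.abs(spectra.reshape(len(spectra), -1))[:, :k]

Xtr, Xte, ytr, yte = train_test_split(feats, digits.target, random_state=0)
clf = GaussianNB().fit(Xtr, ytr)                   # classify, no reconstruction
print(f"accuracy with {k} measurements: {clf.score(Xte, yte):.2f}")
```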
Learning in Non-convex Games with an Optimization Oracle
Title | Learning in Non-convex Games with an Optimization Oracle |
Authors | Naman Agarwal, Alon Gonen, Elad Hazan |
Abstract | We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle. In the general setting of prediction with expert advice, Hazan et al. (2016) established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and the statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, which include GANs (generative adversarial networks) as a special case. |
Tasks | |
Published | 2018-10-17 |
URL | https://arxiv.org/abs/1810.07362v7 |
https://arxiv.org/pdf/1810.07362v7.pdf | |
PWC | https://paperswithcode.com/paper/learning-in-non-convex-games-with-an |
Repo | |
Framework | |
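For intuition about the oracle model, here is classic Follow-the-Perturbed-Leader run through an offline optimization oracle on a 1-D non-convex loss; the paper's algorithm and its strengthened oracle differ in the details.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 501)                 # decision set [0, 1]

def oracle(losses, sigma):
    """Offline oracle: exact minimizer of summed losses + linear perturbation."""
    total = sum(l(grid) for l in losses) + sigma * grid
    return grid[np.argmin(total)]

losses, incurred = [], []
for t in range(200):                          # adversary: shifting non-convex loss
    c = 0.3 if t < 100 else 0.8
    loss = (lambda c=c: lambda x: np.sin(6 * x) ** 2 + (x - c) ** 2)()
    x = oracle(losses, sigma=rng.exponential(1.0) * rng.choice([-1, 1]))
    losses.append(loss)
    incurred.append(loss(x))

best = min(sum(l(grid) for l in losses))      # best fixed decision in hindsight
print(f"regret over 200 rounds: {sum(incurred) - best:.2f}")
```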
Open Loop Execution of Tree-Search Algorithms, extended version
Title | Open Loop Execution of Tree-Search Algorithms, extended version |
Authors | Erwan Lecarpentier, Guillaume Infantes, Charles Lesire, Emmanuel Rachelson |
Abstract | In the context of tree-search stochastic planning algorithms where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate the question of avoiding re-planning in subsequent decision steps by directly using sub-trees as action recommenders. First, we propose a method for open-loop control via a new algorithm that decides at each time step whether to re-plan, based on an analysis of the statistics of the sub-tree. Second, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero. Moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain. |
Tasks | |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01367v2 |
http://arxiv.org/pdf/1805.01367v2.pdf | |
PWC | https://paperswithcode.com/paper/open-loop-execution-of-tree-search-algorithms |
Repo | |
Framework | |
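A sketch of the re-plan-or-not decision: after executing an action, keep the corresponding sub-tree as the next recommender only if its statistics look trustworthy. The visit-count and variance test below is an illustrative placeholder for the paper's criterion.

```python
import statistics

class Node:
    """Search-tree node carrying the statistics the decision rule inspects."""
    def __init__(self):
        self.children = {}          # action -> Node
        self.visits = 0
        self.values = []            # sampled returns passing through this node

def next_recommender(tree, action, plan_fn, min_visits=50, max_std=0.1):
    # Keep the executed action's sub-tree if it was visited enough and its
    # return estimates are stable; otherwise build a fresh tree (re-plan).
    sub = tree.children.get(action)
    reliable = (sub is not None and sub.visits >= min_visits
                and statistics.pstdev(sub.values or [0.0]) <= max_std)
    return sub if reliable else plan_fn()
```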