January 28, 2020

3096 words 15 mins read

Paper Group ANR 773

Proving the NP-completeness of optimal moral graph triangulation

Title Proving the NP-completeness of optimal moral graph triangulation
Authors Yang Li, Lloyd Allison, Kevin Korb
Abstract Moral graphs were introduced in the 1980s as an intermediate step when transforming a Bayesian network into a junction tree, on which exact belief propagation can be performed efficiently. The moral graph of a Bayesian network is obtained trivially by connecting the non-adjacent parents of each node and dropping the direction of every edge (a minimal code sketch of this step follows below). Perhaps because the moralization process looks simple, little attention has been paid to the properties of moral graphs and their impact on belief propagation in Bayesian networks. This paper addresses the mistaken claim that optimal moral graph triangulation under constraints of minimum fill-in, treewidth or total states had already been proved NP-complete. The problems are in fact NP-complete, but proofs had not previously been given; we supply them here.
Tasks
Published 2019-03-06
URL http://arxiv.org/abs/1903.02201v1
PDF http://arxiv.org/pdf/1903.02201v1.pdf
PWC https://paperswithcode.com/paper/proving-the-np-completeness-of-optimal-moral
Repo
Framework
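
The moralization step described in the abstract is easy to make concrete. A minimal sketch, not the authors' code; the encoding of the DAG as a dict of parent lists is our assumption:

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {node: list of parent nodes}:
    keep every edge (undirected) and pairwise connect ("marry")
    the parents of each node."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                        # drop edge directions
            edges.add(frozenset((p, child)))
        for u, v in combinations(pa, 2):    # marry co-parents
            edges.add(frozenset((u, v)))
    return edges

# The v-structure A -> C <- B gains the moral edge A - B:
print(moralize({"A": [], "B": [], "C": ["A", "B"]}))
```

The NP-complete part is not this step but the subsequent triangulation of the resulting undirected graph under the minimum fill-in, treewidth or total-states constraints.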

Supervised Online Hashing via Similarity Distribution Learning

Title Supervised Online Hashing via Similarity Distribution Learning
Authors Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang
Abstract Online hashing has attracted extensive research attention for streaming data. Most online hashing methods learn binary codes from pairwise similarities of training instances; they fail to capture the semantic relationship and generalize poorly in large-scale applications due to large variations. In this paper, we propose to model the similarity distributions between the input data and the hashing codes, and on this basis we develop a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), which preserves the intrinsic semantic relationship in the produced Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution issue. Then we introduce a scaled Student t-distribution to solve the challenging initialization problem and efficiently bridge the gap between the known and unknown distributions. Lastly, we align the two distributions by minimizing the Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD), imposing an intuitive similarity constraint that updates the hashing model on new streaming data while generalizing well to past data (a toy version of this alignment objective is sketched below). Extensive experiments on three widely used benchmarks validate the superiority of the proposed SDOH over state-of-the-art methods in the online retrieval task.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13382v1
PDF https://arxiv.org/pdf/1905.13382v1.pdf
PWC https://paperswithcode.com/paper/supervised-online-hashing-via-similarity
Repo
Framework
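
A toy version of the alignment objective, heavily simplified from the paper: the softmax target, the exact t-kernel and the relaxation of the codes to real values are our assumptions, and SDOH's formulation differs in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def student_t_probs(codes):
    """t-SNE-style heavy-tailed distribution over pairwise distances
    between relaxed (real-valued) hash codes."""
    d2 = ((codes[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
    q = 1.0 / (1.0 + d2)
    np.fill_diagonal(q, 0.0)
    return q / q.sum()

def kl(p, q, eps=1e-12):
    """KL(P || Q), the quantity SGD would push down."""
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum())

S = rng.standard_normal((8, 8)); S = (S + S.T) / 2     # toy similarities
P = np.exp(S); np.fill_diagonal(P, 0.0); P /= P.sum()  # probability matrix
codes = rng.standard_normal((8, 16))                   # relaxed codes
print(kl(P, student_t_probs(codes)))
```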

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Title TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
Authors Youngeun Kwon, Yunjae Lee, Minsoo Rhu
Abstract Recent studies from several hyperscalers pinpoint embedding layers as the most memory-intensive deep learning (DL) algorithm deployed in today’s datacenters. This paper addresses the memory capacity and bandwidth challenges of embedding layers and the associated tensor operations. We present a vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-data processing cores tailored for DL tensor operations. These custom DIMMs are populated inside a GPU-centric system interconnect as a remote memory pool that GPUs can utilize for scalable memory bandwidth and capacity expansion (the bandwidth-bound operation being targeted is sketched below). A prototype implementation of our proposal on real DL systems shows an average 6.2-17.6x performance improvement on state-of-the-art recommender systems.
Tasks Recommendation Systems
Published 2019-08-08
URL https://arxiv.org/abs/1908.03072v2
PDF https://arxiv.org/pdf/1908.03072v2.pdf
PWC https://paperswithcode.com/paper/tensordimm-a-practical-near-memory-processing
Repo
Framework
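
For context, the tensor operation that dominates recommender inference is an embedding gather-reduce, which is almost pure memory traffic. A generic sketch (sizes are illustrative, not from the paper):

```python
import numpy as np

# Embedding gather-reduce: fetch a handful of rows from a huge table
# and sum them. Roughly one FLOP per 4 bytes read, so throughput is
# bound by memory bandwidth -- the case for executing it near DRAM.
table = np.random.rand(1_000_000, 64).astype(np.float32)  # embedding table
ids = np.random.randint(0, len(table), size=40)           # sparse lookups
pooled = table[ids].sum(axis=0)                           # one dense vector
```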

A Compared Study Between Some Subspace Based Algorithms

Title A Compared Study Between Some Subspace Based Algorithms
Authors Xing Liu, Xiao-Jun Wu, Zhen Liu, He-Feng Yin
Abstract The technology of face recognition has made progress in recent years. After studying the PCA, 2DPCA, R1-PCA, L1-PCA, KPCA and KECA algorithms, this paper proposes ECA (2DECA), which extracts features in PCA (2DPCA) based on Renyi entropy contribution. We then study the 2DL1-PCA and 2DR1-PCA algorithms. On the basis of experiments, the paper compares the recognition accuracy and operational efficiency of the above algorithms (the plain PCA baseline is sketched below).
Tasks Face Recognition
Published 2019-12-23
URL https://arxiv.org/abs/1912.10657v1
PDF https://arxiv.org/pdf/1912.10657v1.pdf
PWC https://paperswithcode.com/paper/a-compared-study-between-some-subspace-based
Repo
Framework
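
For reference, the plain PCA baseline among the compared methods, as a minimal sketch; ECA/2DECA additionally rank components by Renyi entropy contribution, which is not shown here:

```python
import numpy as np

def pca_features(X, k):
    """Project mean-centered samples (rows of X) onto the top-k
    eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    _, V = np.linalg.eigh(cov)       # eigenvalues in ascending order
    return Xc @ V[:, ::-1][:, :k]    # top-k principal components
```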

Distributionally Robust Removal of Malicious Nodes from Networks

Title Distributionally Robust Removal of Malicious Nodes from Networks
Authors Sixie Yu, Yevgeniy Vorobeychik
Abstract An important problem in networked systems is the detection and removal of suspected malicious nodes. A crucial consideration in such settings is the uncertainty endemic in detection, coupled with considerations of network connectivity, which impose indirect costs from mistakenly removing benign nodes as well as from failing to remove malicious nodes. A recent approach proposed to address this problem directly tackles these considerations, but has a significant limitation: it assumes that the decision maker has accurate knowledge of the joint maliciousness probability of the nodes on the network. This is clearly not the case in practice, where such a distribution is at best an estimate from limited evidence. To address this problem, we propose a distributionally robust framework for optimal node removal. While the problem is NP-hard, we propose a principled algorithmic technique for solving it approximately, based on duality combined with a semidefinite programming (SDP) relaxation (a generic relaxation of this shape is sketched below). A combination of theoretical and empirical analysis, the latter using both synthetic and real data, provides strong evidence that our algorithmic approach is highly effective and, in particular, significantly more robust than the state of the art.
Tasks
Published 2019-01-31
URL http://arxiv.org/abs/1901.11463v1
PDF http://arxiv.org/pdf/1901.11463v1.pdf
PWC https://paperswithcode.com/paper/distributionally-robust-removal-of-malicious
Repo
Framework
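
The paper's formulation is not reproduced here, but the general shape of an SDP relaxation for a quadratic keep/remove decision x ∈ {-1, +1}^n is standard: lift xxᵀ to a PSD matrix and drop the rank-one constraint. A generic CVXPY sketch, with a placeholder cost matrix:

```python
import numpy as np
import cvxpy as cp

n = 6
Q = np.random.rand(n, n); Q = (Q + Q.T) / 2   # toy removal/connectivity costs

X = cp.Variable((n, n), PSD=True)              # relaxation of x @ x.T
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), [cp.diag(X) == 1])
prob.solve()

# Round the relaxation: signs of the top eigenvector give keep/remove labels.
x = np.sign(np.linalg.eigh(X.value)[1][:, -1])
```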

How Coupled are Mass Spectrometry and Capillary Electrophoresis?

Title How Coupled are Mass Spectrometry and Capillary Electrophoresis?
Authors Caroline Ceribeli, Henrique F. de Arruda, Luciano da F. Costa
Abstract Understanding how science works can help make scientific development more effective. In this paper, we report an analysis of the organization and interconnection of two important topics in chemistry, namely mass spectrometry (MS) and capillary electrophoresis (CE). For that purpose, we employed science-of-science techniques based on complex networks. More specifically, we considered a citation network in which the nodes and connections represent papers and citations, respectively (a toy version of this construction is sketched below). Interesting results were found, including a good separation between some clusters of articles devoted to instrumentation techniques and to applications. However, the papers describing CE-MS did not form a well-defined cluster. To better understand the organization of the citation network, we performed a multi-scale analysis using information about sub-clusters. First, we analyzed the sub-cluster around the first article devoted to coupling CE with MS, and found that article to be a good representative of its sub-cluster. The second analysis concerned the sub-cluster of a seminal paper known to be the first to analyze proteins using CE-MS. With the proposed methodologies, our paper paves the way for researchers working with both techniques, since it elucidates how the knowledge is organized and can therefore lead to better literature reviews.
Tasks
Published 2019-10-18
URL https://arxiv.org/abs/1910.13819v1
PDF https://arxiv.org/pdf/1910.13819v1.pdf
PWC https://paperswithcode.com/paper/how-coupled-are-mass-spectrometry-and
Repo
Framework
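
A toy version of the underlying construction, not the authors' pipeline; the paper IDs, edges and choice of clustering algorithm are placeholders:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Nodes are papers; a directed edge u -> v means "u cites v".
G = nx.DiGraph()
G.add_edges_from([("p1", "p0"), ("p2", "p0"), ("p3", "p1"),
                  ("p4", "p3"), ("p5", "p4"), ("p5", "p3")])

# Cluster the citation network; sub-clusters around seminal papers can
# then be examined, as in the paper's multi-scale analysis.
clusters = greedy_modularity_communities(G.to_undirected())
print([sorted(c) for c in clusters])
```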

Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent

Title Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
Authors Wenqing Hu, Zhanxing Zhu, Haoyi Xiong, Jun Huan
Abstract We interpret the variational inference of Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case when the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is related to the noise covariance structure of SGD via a partial differential equation of Hamilton-Jacobi type (the standard form of this relation is restated below). This relation helps us show that anisotropic noise leads to faster escape than isotropic noise. We then consider the dynamics of SGD in the case when the loss function is non-convex and admits several different local minima. In this case, we present an example showing how the noise covariance structure plays a role in “implicit regularization”, a phenomenon in which SGD favors some particular local minimum points. This is done through the relation between the noise covariance structure and the quasi-potential function. Our analysis is based on Large Deviations Theory (LDT) and is validated by numerical experiments.
Tasks
Published 2019-01-18
URL http://arxiv.org/abs/1901.06054v1
PDF http://arxiv.org/pdf/1901.06054v1.pdf
PWC https://paperswithcode.com/paper/quasi-potential-as-an-implicit-regularizer
Repo
Framework
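
For reference, in Freidlin-Wentzell LDT the quasi-potential $V$ of a diffusion with drift $-\nabla L(x)$ (for loss $L$) and noise covariance $\Sigma(x)$ solves a Hamilton-Jacobi equation; its standard form (our restatement, the paper's exact statement may differ) is

$$ \frac{1}{2}\,\nabla V(x)^{\top}\,\Sigma(x)\,\nabla V(x) \;-\; \nabla L(x)^{\top}\,\nabla V(x) \;=\; 0 . $$

Anisotropy enters through $\Sigma(x)$, which is how the noise covariance structure shapes the escape behavior discussed above.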

Learning a Fixed-Length Fingerprint Representation

Title Learning a Fixed-Length Fingerprint Representation
Authors Joshua J. Engelsma, Kai Cao, Anil K. Jain
Abstract We present DeepPrint, a deep network that learns to extract fixed-length fingerprint representations of only 200 bytes. DeepPrint incorporates fingerprint domain knowledge, including alignment and minutiae detection, into the deep network architecture to maximize the discriminative power of its representation. The compact DeepPrint representation has several advantages over the prevailing variable-length minutiae representation, which (i) requires computationally expensive graph-matching techniques, (ii) is difficult to secure using strong encryption schemes (e.g. homomorphic encryption), and (iii) has low discriminative power for poor-quality fingerprints where minutiae extraction is unreliable. We benchmark DeepPrint against two top-performing COTS SDKs (Verifinger and Innovatrics) from the NIST and FVC evaluations. Coupled with a re-ranking scheme, the DeepPrint rank-1 search accuracy on the NIST SD4 dataset against a gallery of 1.1 million fingerprints is comparable to the top COTS matcher, but significantly faster (DeepPrint: 98.80% in 0.3 seconds vs. COTS A: 98.85% in 27 seconds; the fixed-length search that enables this is sketched below). To the best of our knowledge, the DeepPrint representation is the most compact and discriminative fixed-length fingerprint representation reported in the academic literature.
Tasks Graph Matching
Published 2019-09-21
URL https://arxiv.org/abs/1909.09901v2
PDF https://arxiv.org/pdf/1909.09901v2.pdf
PWC https://paperswithcode.com/paper/190909901
Repo
Framework
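
The speed gap quoted above follows directly from the representation: ranking a gallery of fixed-length vectors is one matrix-vector product rather than per-pair minutiae graph matching. An illustrative sketch; the dimension and cosine scoring are our assumptions, not the DeepPrint code, which quantizes features to reach 200 bytes and then re-ranks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gallery of unit-normalized fixed-length prints (scaled down from 1.1M).
gallery = rng.standard_normal((100_000, 192)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

probe = rng.standard_normal(192).astype(np.float32)
probe /= np.linalg.norm(probe)

scores = gallery @ probe          # cosine similarity to every print
top = np.argsort(scores)[-100:]   # shortlist, e.g. for minutiae re-ranking
```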

Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications

Title Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications
Authors Zhipeng Wang, David W. Scott
Abstract Density estimation is one of the central areas of statistics, whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization, and machine learning, and is widely adopted in unsupervised learning, especially for clustering. As big data become pervasive in almost every area of data science, analyzing high-dimensional data with many features and variables is a major focus in both academia and industry. High-dimensional data pose challenges not only for the theory of statistical inference, but also for the algorithmic and computational side of machine learning and data analytics. This paper reviews a selection of nonparametric density estimation algorithms for high-dimensional data, some of them recently published and offering interesting mathematical insights. Important application domains of nonparametric density estimation, such as modal clustering, are also covered (a minimal classical baseline is sketched below). We conclude by suggesting several research directions related to density estimation and high-dimensional data analysis.
Tasks Density Estimation
Published 2019-03-30
URL http://arxiv.org/abs/1904.00176v1
PDF http://arxiv.org/pdf/1904.00176v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-density-estimation-for-high
Repo
Framework
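
As a baseline for the methods reviewed, the classical one-dimensional kernel density estimate; fittingly, SciPy's default bandwidth is Scott's rule. The bimodal toy data are our own:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 300)])

kde = gaussian_kde(data)          # bandwidth: Scott's rule by default
grid = np.linspace(-5, 6, 400)
density = kde(grid)               # estimated pdf on the grid
print(grid[density.argmax()])     # highest mode (cf. modal clustering)
```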

A RAD approach to deep mixture models

Title A RAD approach to deep mixture models
Authors Laurent Dinh, Jascha Sohl-Dickstein, Razvan Pascanu, Hugo Larochelle
Abstract Flow-based models such as Real NVP are an extremely powerful approach to density estimation. However, existing flow-based models are restricted to transforming continuous densities over a continuous input space into similarly continuous distributions over continuous latent variables. This makes them poorly suited for modeling and representing discrete structures in data distributions, for example class membership or discrete symmetries. To address this difficulty, we present a normalizing flow architecture that relies on domain partitioning using locally invertible functions, and possesses both real- and discrete-valued latent variables. This Real and Discrete (RAD) approach retains the desirable normalizing flow properties of exact sampling, exact inference, and analytically computable probabilities (the underlying change-of-variables identity is restated below), while allowing simultaneous modeling of both continuous and discrete structure in a data distribution.
Tasks Density Estimation
Published 2019-03-18
URL http://arxiv.org/abs/1903.07714v1
PDF http://arxiv.org/pdf/1903.07714v1.pdf
PWC https://paperswithcode.com/paper/a-rad-approach-to-deep-mixture-models
Repo
Framework
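
The machinery RAD generalizes is the change-of-variables identity behind flows such as Real NVP: for a globally invertible $f$ with latent density $p_Z$,

$$ \log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log\left|\det \frac{\partial f(x)}{\partial x}\right| . $$

RAD replaces the globally invertible $f$ with functions that are invertible only on each cell of a partition of the input space, with a discrete latent variable recording the cell; this is our one-line paraphrase of the construction.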

Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach

Title Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach
Authors Kasra Safari, Scott Sanner
Abstract Twitter has grown into an important platform for immediate information about major events and dynamic topics. As one example, recent work has shown that classifiers trained to detect topical content on Twitter can generalize well beyond the training data. Since access to Twitter data is hidden behind a limited search API, it is impossible (for most users) to apply these classifiers directly to Twitter’s unfiltered data stream (the “firehose”). Rather, applications must first decide what content to retrieve through the search API before filtering that content with topical classifiers. Thus, it is critically important to query the Twitter API, relative to the intended topical classifier, in a way that minimizes the amount of negatively classified data retrieved. In this paper, we propose a sequence of query optimization methods that generalize notions of the maximum coverage problem to find the subset of query terms within the API limits that covers most of the topically relevant tweets without sacrificing precision (the classical greedy approximation these methods build on is sketched below). We evaluate the proposed methods on a large dataset of Twitter data collected during 2013 and 2014, labeled using manually curated hashtags for eight topics. Among many insights, our analysis shows that the best of the proposed methods can significantly outperform the firehose on precision and F1-score while achieving high recall within strict API limitations.
Tasks
Published 2019-04-23
URL https://arxiv.org/abs/1904.10403v2
PDF https://arxiv.org/pdf/1904.10403v2.pdf
PWC https://paperswithcode.com/paper/optimizing-search-api-queries-for-twitter
Repo
Framework
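
The classical greedy (1 − 1/e)-approximation for maximum coverage that the paper's methods generalize, as a minimal sketch; the term-to-tweet-set encoding and the budget semantics are simplified assumptions:

```python
def greedy_max_coverage(term_to_tweets, budget):
    """Repeatedly pick the query term covering the most not-yet-covered
    relevant tweets, up to the API's term budget."""
    covered, chosen = set(), []
    for _ in range(budget):
        term = max(term_to_tweets,
                   key=lambda t: len(term_to_tweets[t] - covered))
        if not term_to_tweets[term] - covered:
            break                      # nothing new is coverable
        chosen.append(term)
        covered |= term_to_tweets[term]
    return chosen, covered

queries = {"#nba": {1, 2, 3}, "finals": {3, 4}, "dunk": {4, 5, 6}}
print(greedy_max_coverage(queries, budget=2))  # -> (['#nba', 'dunk'], {1..6})
```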

Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression

Title Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression
Authors Gil Kur, Yuval Dagan, Alexander Rakhlin
Abstract In this paper, we study two problems: (1) estimation of a $d$-dimensional log-concave distribution and (2) bounded multivariate convex regression with random design, with an underlying log-concave density or a compactly supported distribution with a continuous density. First, we show that for all $d \ge 4$ the maximum likelihood estimators of both problems achieve an optimal risk of $\Theta_d(n^{-2/(d+1)})$ (up to a logarithmic factor) in terms of squared Hellinger distance and squared $L_2$ distance, respectively. Previously, the optimality of both estimators was known only for $d\le 3$. We also prove that the $\epsilon$-entropy numbers of the two aforementioned families are equal up to logarithmic factors. We complement these results by proving a sharp bound $\Theta_d(n^{-2/(d+4)})$ on the minimax rate (up to logarithmic factors) with respect to the total variation distance. Finally, we prove that estimating a log-concave density - even a uniform distribution on a convex set - up to a fixed accuracy requires a number of samples at least exponential in the dimension. We do so by improving the dimensional constant in the best known lower bound for the minimax rate from $2^{-d}\cdot n^{-2/(d+1)}$ to $c\cdot n^{-2/(d+1)}$ (for $d\geq 2$).
Tasks Density Estimation
Published 2019-03-13
URL https://arxiv.org/abs/1903.05315v4
PDF https://arxiv.org/pdf/1903.05315v4.pdf
PWC https://paperswithcode.com/paper/the-log-concave-maximum-likelihood-estimator
Repo
Framework

Neural Empirical Bayes

Title Neural Empirical Bayes
Authors Saeed Saremi, Aapo Hyvarinen
Abstract We formulate a novel framework that unifies kernel density estimation and empirical Bayes, addressing a broad set of problems in unsupervised learning with a geometric interpretation rooted in the concentration-of-measure phenomenon. We start with energy estimation based on a denoising objective that recovers the original/clean data X from its measured/noisy version Y with the empirical Bayes least-squares estimator (restated below). The setup is rooted in kernel density estimation, but the log-pdf in Y is parametrized with a neural network, and, crucially, the learning objective is derived for any level of noise/kernel bandwidth. Learning is efficient with double backpropagation and stochastic gradient descent. An elegant physical picture emerges of an interacting system of high-dimensional spheres around each data point, together with a globally defined probability flow field. The picture is powerful: it leads to a novel sampling algorithm, a new notion of associative memory, and it is instrumental in designing experiments. We start with extreme denoising experiments. Walk-jump sampling is defined by Langevin MCMC walks in Y, along with asynchronous empirical Bayes jumps to X. Robbins associative memory is defined by a deterministic flow to attractors of the learned probability flow field. Finally, we observe the emergence of remarkably rich creative modes in the regime of highly overlapping spheres.
Tasks Denoising, Density Estimation
Published 2019-03-06
URL http://arxiv.org/abs/1903.02334v1
PDF http://arxiv.org/pdf/1903.02334v1.pdf
PWC https://paperswithcode.com/paper/neural-empirical-bayes
Repo
Framework
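
The empirical Bayes least-squares estimator invoked here is the classical Robbins/Miyasawa result (our restatement): for $Y = X + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^{2} I_d)$, the posterior mean of the clean data is a single gradient step on the noisy log-density,

$$ \hat{x}(y) \;=\; \mathbb{E}[X \mid Y = y] \;=\; y + \sigma^{2}\,\nabla_{y}\log p_{Y}(y), $$

which is why learning $\log p_Y$ with a network suffices for the “jump” move of walk-jump sampling.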

Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network

Title Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network
Authors Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao
Abstract Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps (how counts are read off such maps is sketched below). The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hierarchically aggregate features at different encoding stages, which can handle large variations of objects. Second, we design dense skip connections interleaved across paths to facilitate sufficient multi-scale feature fusion and to absorb the supervision information. Third, we propose a new combinatorial loss to enforce local coherence and spatial correlation in density maps. By imposing this combinatorial loss on intermediate outputs in a distributed fashion, gradient vanishing can be largely alleviated for better back-propagation and faster convergence. Finally, our TEDnet achieves new state-of-the-art performance on four benchmarks, with an improvement of up to 14% in terms of MAE.
Tasks Crowd Counting, Density Estimation
Published 2019-03-03
URL http://arxiv.org/abs/1903.00853v2
PDF http://arxiv.org/pdf/1903.00853v2.pdf
PWC https://paperswithcode.com/paper/crowd-counting-and-density-estimation-by
Repo
Framework
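
Independent of the TEDnet architecture, the mechanic of density-map counting is simple: the predicted map integrates to the count, and MAE is the error of that integral. A generic sketch, with random maps standing in for network output:

```python
import numpy as np

rng = np.random.default_rng(0)
pred_maps = [rng.random((64, 64)) * 0.01 for _ in range(4)]  # stand-in outputs
gt_counts = np.array([38.0, 41.0, 35.0, 44.0])

pred_counts = np.array([m.sum() for m in pred_maps])  # count = map integral
mae = np.abs(pred_counts - gt_counts).mean()          # the benchmark metric
print(pred_counts, mae)
```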

Federated Machine Learning: Concept and Applications

Title Federated Machine Learning: Concept and Applications
Authors Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong
Abstract Today’s AI still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016 (a minimal aggregation round is sketched below), we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, along with a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations, based on federated mechanisms, as an effective solution to allow knowledge to be shared without compromising user privacy.
Tasks Transfer Learning
Published 2019-02-13
URL http://arxiv.org/abs/1902.04885v1
PDF http://arxiv.org/pdf/1902.04885v1.pdf
PWC https://paperswithcode.com/paper/federated-machine-learning-concept-and
Repo
Framework
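
The 2016 framework referenced above is federated averaging; a minimal sketch of one server-side aggregation round under horizontal federated learning, where clients share a feature space but hold different samples (names and the numpy encoding are ours):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Average locally trained model weights, weighting each client by
    its sample count; raw data never leaves the clients."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients upload weights (never data) after local training.
clients = [np.array([0.9, 1.1]), np.array([1.2, 0.8]), np.array([1.0, 1.0])]
global_w = fed_avg(clients, [100, 50, 150])
print(global_w)
```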