Paper Group ANR 773
Proving the NP-completeness of optimal moral graph triangulation
Title | Proving the NP-completeness of optimal moral graph triangulation |
Authors | Yang Li, Lloyd Allison, Kevin Korb |
Abstract | Moral graphs were introduced in the 1980s as an intermediate step when transforming a Bayesian network to a junction tree, on which exact belief propagation can be efficiently done. The moral graph of a Bayesian network can be trivially obtained by connecting non-adjacent parents for each node in the Bayesian network and dropping the direction of each edge. Perhaps because the moralization process looks simple, little attention has been paid to the properties of moral graphs and their impact on belief propagation in Bayesian networks. This paper addresses the mistaken claim that optimal moral graph triangulation under the constraints of minimum fill-in, treewidth or total states has previously been proved NP-complete. The problems are in fact NP-complete, but no proofs have previously been given; we supply them here. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02201v1 |
http://arxiv.org/pdf/1903.02201v1.pdf | |
PWC | https://paperswithcode.com/paper/proving-the-np-completeness-of-optimal-moral |
Repo | |
Framework | |
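The moralization step the abstract describes is mechanical enough to show in a few lines. A minimal sketch using networkx, assuming a DAG as input; the function name `moralize` is ours (recent networkx versions also ship an equivalent `nx.moral_graph`):

```python
import networkx as nx
from itertools import combinations

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Connect non-adjacent parents of each node, then drop directions."""
    moral = dag.to_undirected()
    for node in dag.nodes:
        # "Marry" every pair of parents that is not already adjacent.
        for u, v in combinations(dag.predecessors(node), 2):
            moral.add_edge(u, v)
    return moral

# The classic v-structure A -> C <- B gains the moral edge A - B.
dag = nx.DiGraph([("A", "C"), ("B", "C")])
print(sorted(moralize(dag).edges))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The NP-completeness results concern the triangulation of the resulting moral graph under fill-in, treewidth, or total-state constraints, not this moralization step itself.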
Supervised Online Hashing via Similarity Distribution Learning
Title | Supervised Online Hashing via Similarity Distribution Learning |
Authors | Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang |
Abstract | Online hashing has attracted extensive research attention when facing streaming data. Most online hashing methods, which learn binary codes from pairwise similarities of training instances, fail to capture the semantic relationship and suffer from poor generalization in large-scale applications due to large variations. In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), is proposed to preserve the intrinsic semantic relationship in the produced Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution issue. Then, we introduce a scaling Student t-distribution to solve the challenging initialization problem and efficiently bridge the gap between the known and unknown distributions. Lastly, we align the two distributions by minimizing the Kullback-Leibler divergence (KL-divergence) with stochastic gradient descent (SGD), by which an intuitive similarity constraint is imposed to update the hashing model on new streaming data while generalizing well to past data. Extensive experiments on three widely-used benchmarks validate the superiority of the proposed SDOH over state-of-the-art methods in the online retrieval task. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13382v1 |
https://arxiv.org/pdf/1905.13382v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-online-hashing-via-similarity |
Repo | |
Framework | |
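The three-stage pipeline in the abstract (normalize the similarity matrix, model code similarities with a Student t kernel, align with KL-divergence) can be sketched compactly. A hedged numpy illustration; the exact normalization, scaling, and constants below are our assumptions, not the paper's formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.choice([0.0, 1.0], size=(32, 32))      # discrete pairwise similarities
B = np.sign(rng.standard_normal((32, 16)))     # current 16-bit binary codes

# Step 1: Gaussian-style normalization of S into a probability matrix P.
Z = (S - S.mean()) / (S.std() + 1e-12)
P = np.exp(Z) / np.exp(Z).sum()

# Step 2: Student t kernel over code similarities -> distribution Q.
D = B @ B.T / B.shape[1]                       # scaled inner products in [-1, 1]
T = 1.0 / (1.0 + (1.0 - D))                    # heavier tails than a Gaussian
Q = T / T.sum()

# Step 3: KL(P || Q), the objective SGD would drive down as codes update.
kl = np.sum(P * (np.log(P + 1e-12) - np.log(Q + 1e-12)))
print(f"KL(P || Q) = {kl:.4f}")
```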
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
Title | TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning |
Authors | Youngeun Kwon, Yunjae Lee, Minsoo Rhu |
Abstract | Recent studies from several hyperscalers pinpoint embedding layers as the most memory-intensive deep learning (DL) algorithm deployed in today's datacenters. This paper addresses the memory capacity and bandwidth challenges of embedding layers and the associated tensor operations. We present our vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-data processing cores tailored for DL tensor operations. These custom DIMMs are populated inside a GPU-centric system interconnect as a remote memory pool, which GPUs can utilize for scalable memory bandwidth and capacity expansion. A prototype implementation of our proposal on real DL systems shows an average 6.2-17.6x performance improvement on state-of-the-art recommender systems. |
Tasks | Recommendation Systems |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03072v2 |
https://arxiv.org/pdf/1908.03072v2.pdf | |
PWC | https://paperswithcode.com/paper/tensordimm-a-practical-near-memory-processing |
Repo | |
Framework | |
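The workload motivating TensorDIMM is easy to picture: an embedding lookup is a gather over a very large table followed by a cheap reduction, so memory bandwidth, not compute, is the bottleneck. A numpy sketch of that access pattern (sizes are illustrative; production tables reach tens of gigabytes, which is what pushes them off the GPU):

```python
import numpy as np

table = np.random.rand(1_000_000, 64).astype(np.float32)  # embedding table
ids = np.random.randint(0, len(table), size=40)           # sparse feature ids

gathered = table[ids]           # memory-bound gather: the expensive part
pooled = gathered.sum(axis=0)   # near-free compute: sum-pool to one vector
print(pooled.shape)             # (64,)
```

Placing this gather-reduce next to the DRAM devices, as the custom DIMMs do, avoids shipping every gathered row across the system interconnect.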
A Compared Study Between Some Subspace Based Algorithms
Title | A Compared Study Between Some Subspace Based Algorithms |
Authors | Xing Liu, Xiao-Jun Wu, Zhen Liu, He-Feng Yin |
Abstract | Face recognition technology has made progress in recent years. After studying the PCA, 2DPCA, R1-PCA, L1-PCA, KPCA and KECA algorithms, this paper proposes ECA (2DECA), which extracts features in PCA (2DPCA) based on their Renyi entropy contribution. We then study the 2DL1-PCA and 2DR1-PCA algorithms. On the basis of experiments, this paper compares the recognition accuracy and operational efficiency of the above algorithms. |
Tasks | Face Recognition |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10657v1 |
https://arxiv.org/pdf/1912.10657v1.pdf | |
PWC | https://paperswithcode.com/paper/a-compared-study-between-some-subspace-based |
Repo | |
Framework | |
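The baselines being compared differ mainly in how the covariance is formed. A compact numpy sketch contrasting PCA (vectorized images) with 2DPCA (image matrices kept intact); the ECA/2DECA variants would rank components by Renyi entropy contribution rather than by eigenvalue, a step only noted in a comment here:

```python
import numpy as np

X = np.random.rand(100, 32, 32)              # 100 face images of size 32x32

# PCA: flatten images; the covariance is 1024 x 1024.
flat = X.reshape(len(X), -1)
flat -= flat.mean(axis=0)
cov_pca = flat.T @ flat / len(X)

# 2DPCA: keep images as matrices; the covariance is only 32 x 32.
Xc = X - X.mean(axis=0)
cov_2d = sum(img.T @ img for img in Xc) / len(Xc)

vals, vecs = np.linalg.eigh(cov_2d)
W = vecs[:, ::-1][:, :8]                     # top-8 axes by eigenvalue
features = X @ W                             # (100, 32, 8) feature matrices
# ECA/2DECA would instead select components by Renyi entropy contribution.
print(features.shape)
```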
Distributionally Robust Removal of Malicious Nodes from Networks
Title | Distributionally Robust Removal of Malicious Nodes from Networks |
Authors | Sixie Yu, Yevgeniy Vorobeychik |
Abstract | An important problem in networked systems is the detection and removal of suspected malicious nodes. A crucial consideration in such settings is the uncertainty endemic in detection, coupled with considerations of network connectivity, which impose indirect costs from mistakenly removing benign nodes as well as failing to remove malicious nodes. A recent approach proposed to address this problem directly tackles these considerations, but has a significant limitation: it assumes that the decision maker has accurate knowledge of the joint maliciousness probability of the nodes on the network. This is clearly not the case in practice, where such a distribution is at best an estimate from limited evidence. To address this problem, we propose a distributionally robust framework for optimal node removal. While the problem is NP-hard, we propose a principled algorithmic technique for solving it approximately, based on duality combined with a semidefinite programming relaxation. A combination of theoretical and empirical analysis, the latter using both synthetic and real data, provides strong evidence that our algorithmic approach is highly effective and, in particular, significantly more robust than the state of the art. |
Tasks | |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11463v1 |
http://arxiv.org/pdf/1901.11463v1.pdf | |
PWC | https://paperswithcode.com/paper/distributionally-robust-removal-of-malicious |
Repo | |
Framework | |
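The paper's solver (duality plus a semidefinite programming relaxation) is beyond a snippet, but the objective it robustifies is easy to state: removing a node trades off the chance it is malicious against the connectivity lost if it is benign. A toy evaluator under known marginal probabilities, precisely the assumption the distributionally robust framework relaxes; the cost weights are ours:

```python
import networkx as nx

def expected_cost(g, p_malicious, removed, c_miss=1.0, c_conn=0.1):
    """Expected cost of kept threats plus connectivity damage from removal."""
    missed = sum(p_malicious[v] for v in g if v not in removed)
    kept = g.subgraph([n for n in g if n not in removed])
    lost_edges = g.number_of_edges() - kept.number_of_edges()
    return c_miss * missed + c_conn * lost_edges

g = nx.karate_club_graph()
p = {v: 0.9 if v in (0, 33) else 0.05 for v in g}  # toy maliciousness marginals
print(expected_cost(g, p, removed={0, 33}))        # cost of removing the two hubs
```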
How Coupled are Mass Spectrometry and Capillary Electrophoresis?
Title | How Coupled are Mass Spectrometry and Capillary Electrophoresis? |
Authors | Caroline Ceribeli, Henrique F. de Arruda, Luciano da F. Costa |
Abstract | The understanding of how science works can contribute to making scientific development more effective. In this paper, we report an analysis of the organization and interconnection between two important issues in chemistry, namely mass spectrometry (MS) and capillary electrophoresis (CE). For that purpose, we employed science of science techniques based on complex networks. More specifically, we considered a citation network in which the nodes and connections represent papers and citations, respectively. Interesting results were found, including a good separation between some clusters of articles devoted to instrumentation techniques and applications. However, the papers that describe CE-MS did not lead to a well-defined cluster. In order to better understand the organization of the citation network, we considered a multi-scale analysis, in which we used the information regarding sub-clusters. Firstly, we analyzed the sub-cluster of the first article devoted to the coupling between CE and MS, which was found to be a good representation of its sub-cluster. The second analysis was about the sub-cluster of a seminal paper known to be the first that dealt with proteins by using CE-MS. By considering the proposed methodologies, our paper paves the way for researchers working with both techniques, since it elucidates the knowledge organization and can therefore lead to better literature reviews. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.13819v1 |
https://arxiv.org/pdf/1910.13819v1.pdf | |
PWC | https://paperswithcode.com/paper/how-coupled-are-mass-spectrometry-and |
Repo | |
Framework | |
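The methodology reduces to building a citation network and clustering it, which takes only a few lines with standard tools. A minimal sketch with an invented toy edge list (nodes = papers, edges = citations); the community routine stands in for the multi-scale cluster analysis used in the study:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

edges = [("CE-MS-1987", "CE-1981"), ("CE-MS-1987", "MS-1968"),
         ("proteins-1989", "CE-MS-1987"), ("review-2005", "proteins-1989"),
         ("review-2005", "CE-MS-1987")]
cites = nx.DiGraph(edges)                 # paper -> paper it cites

# Community detection on the undirected projection of the citation graph.
for community in greedy_modularity_communities(cites.to_undirected()):
    print(sorted(community))
```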
Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
Title | Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent |
Authors | Wenqing Hu, Zhanxing Zhu, Haoyi Xiong, Jun Huan |
Abstract | We interpret the variational inference of the Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case when the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is related to the noise covariance structure of SGD via a partial differential equation of Hamilton-Jacobi type. This relation helps us to show that anisotropic noise leads to faster escape than isotropic noise. We then consider the dynamics of SGD in the case when the loss function is non-convex and admits several different local minima. In this case, we present an example showing how the noise covariance structure plays a role in “implicit regularization”, a phenomenon in which SGD favors some particular local minimum points. This is done through the relation between the noise covariance structure and the quasi-potential function. Our analysis is based on Large Deviations Theory (LDT) and is validated by numerical experiments. |
Tasks | |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06054v1 |
http://arxiv.org/pdf/1901.06054v1.pdf | |
PWC | https://paperswithcode.com/paper/quasi-potential-as-an-implicit-regularizer |
Repo | |
Framework | |
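The abstract's claim that anisotropic noise escapes minima faster than isotropic noise can be checked numerically. A toy experiment, assuming a quadratic bowl and Langevin-style updates; all constants are illustrative, and this is not the paper's construction of the quasi-potential:

```python
import numpy as np

def mean_escape_time(cov, lr=0.01, radius=2.0, trials=100, seed=0):
    """Average steps until the iterate first leaves a ball around the minimum."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)
    times = []
    for _ in range(trials):
        x, t = np.zeros(2), 0
        while np.linalg.norm(x) < radius and t < 100_000:
            x = x - lr * x + np.sqrt(lr) * (L @ rng.standard_normal(2))
            t += 1                        # gradient of 0.5*||x||^2 is x
        times.append(t)
    return np.mean(times)

iso = np.eye(2)                           # isotropic noise, trace 2
aniso = np.diag([1.9, 0.1])               # anisotropic noise, same trace
print("isotropic  :", mean_escape_time(iso))
print("anisotropic:", mean_escape_time(aniso))   # typically escapes sooner
```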
Learning a Fixed-Length Fingerprint Representation
Title | Learning a Fixed-Length Fingerprint Representation |
Authors | Joshua J. Engelsma, Kai Cao, Anil K. Jain |
Abstract | We present DeepPrint, a deep network which learns to extract fixed-length fingerprint representations of only 200 bytes. DeepPrint incorporates fingerprint domain knowledge, including alignment and minutiae detection, into the deep network architecture to maximize the discriminative power of its representation. The compact DeepPrint representation has several advantages over the prevailing variable-length minutiae representation, which (i) requires computationally expensive graph matching techniques, (ii) is difficult to secure using strong encryption schemes (e.g. homomorphic encryption), and (iii) has low discriminative power in poor quality fingerprints where minutiae extraction is unreliable. We benchmark DeepPrint against two top performing COTS SDKs (Verifinger and Innovatrics) from the NIST and FVC evaluations. Coupled with a re-ranking scheme, the DeepPrint rank-1 search accuracy on the NIST SD4 dataset against a gallery of 1.1 million fingerprints is comparable to the top COTS matcher, but it is significantly faster (DeepPrint: 98.80% in 0.3 seconds vs. COTS A: 98.85% in 27 seconds). To the best of our knowledge, the DeepPrint representation is the most compact and discriminative fixed-length fingerprint representation reported in the academic literature. |
Tasks | Graph Matching |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1909.09901v2 |
https://arxiv.org/pdf/1909.09901v2.pdf | |
PWC | https://paperswithcode.com/paper/190909901 |
Repo | |
Framework | |
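The practical payoff of a fixed-length representation is that large-scale search collapses to one dense matrix-vector product, with no per-pair graph matching. A numpy sketch with a random stand-in gallery; 192 one-byte dimensions roughly match the 200-byte budget, though DeepPrint's actual encoding is not specified here:

```python
import numpy as np

gallery = np.random.randint(0, 256, size=(100_000, 192), dtype=np.uint8)
probe = gallery[42]                        # pretend this finger was enrolled

g = gallery.astype(np.float32)
g /= np.linalg.norm(g, axis=1, keepdims=True)
p = probe.astype(np.float32)
p /= np.linalg.norm(p)

scores = g @ p                             # one dot product scores everything
print("rank-1 id:", int(np.argmax(scores)))  # 42
```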
Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications
Title | Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications |
Authors | Zhipeng Wang, David W. Scott |
Abstract | Density estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization, and machine learning. Density estimation is widely adopted in the domain of unsupervised learning, especially for the application of clustering. As big data become pervasive in almost every area of data science, analyzing high-dimensional data that have many features and variables is a major focus in both academia and industry. High-dimensional data pose challenges not only for the theory of statistical inference, but also for the algorithmic and computational considerations of machine learning and data analytics. This paper reviews a selection of nonparametric density estimation algorithms for high-dimensional data, some of which were recently published and provide interesting mathematical insights. Important application domains of nonparametric density estimation, such as modal clustering, are also covered. Several research directions related to density estimation and high-dimensional data analysis are suggested by the authors. |
Tasks | Density Estimation |
Published | 2019-03-30 |
URL | http://arxiv.org/abs/1904.00176v1 |
http://arxiv.org/pdf/1904.00176v1.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-density-estimation-for-high |
Repo | |
Framework | |
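The workhorse such surveys start from is the kernel density estimator. A short sklearn sketch on 1-D data; the bandwidth is picked by eye here, and the survey's high-dimensional methods are precisely about doing better than this naive recipe as dimension grows:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 1.0, 700)])

kde = KernelDensity(kernel="gaussian", bandwidth=0.3)
kde.fit(data.reshape(-1, 1))

grid = np.linspace(-5, 5, 5).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))  # score_samples returns log-density
print(density.round(3))
```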
A RAD approach to deep mixture models
Title | A RAD approach to deep mixture models |
Authors | Laurent Dinh, Jascha Sohl-Dickstein, Razvan Pascanu, Hugo Larochelle |
Abstract | Flow-based models such as Real NVP are an extremely powerful approach to density estimation. However, existing flow-based models are restricted to transforming continuous densities over a continuous input space into similarly continuous distributions over continuous latent variables. This makes them poorly suited for modeling and representing discrete structures in data distributions, for example class membership or discrete symmetries. To address this difficulty, we present a normalizing flow architecture which relies on domain partitioning using locally invertible functions, and possesses both real- and discrete-valued latent variables. This Real and Discrete (RAD) approach retains the desirable normalizing flow properties of exact sampling, exact inference, and analytically computable probabilities, while at the same time allowing simultaneous modeling of both continuous and discrete structure in a data distribution. |
Tasks | Density Estimation |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07714v1 |
http://arxiv.org/pdf/1903.07714v1.pdf | |
PWC | https://paperswithcode.com/paper/a-rad-approach-to-deep-mixture-models |
Repo | |
Framework | |
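The core RAD idea, a partitioned input space with a locally invertible map per piece and the piece index kept as a discrete latent, fits in a 1-D toy. The partition and maps below are ours and ignore the density bookkeeping (the change-of-variables terms); the point is only that x -> (z, k) -> x is exactly invertible:

```python
import numpy as np

def forward(x):
    k = int(x >= 0.0)                  # discrete latent: which piece x is in
    z = np.log(x + 1.0) if k == 0 else np.sqrt(x)   # per-piece invertible map
    return z, k

def inverse(z, k):
    return np.exp(z) - 1.0 if k == 0 else z * z

for x in (-0.5, 0.25, 4.0):
    z, k = forward(x)
    print(x, "->", (round(float(z), 4), k), "->", round(float(inverse(z, k)), 4))
```

A full RAD model would also learn p(k) and apply the usual log-det-Jacobian correction per piece so that sampling and exact likelihoods remain available.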
Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach
Title | Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach |
Authors | Kasra Safari, Scott Sanner |
Abstract | Twitter has grown to become an important platform to access immediate information about major events and dynamic topics. As one example, recent work has shown that classifiers trained to detect topical content on Twitter can generalize well beyond the training data. Since access to Twitter data is hidden behind a limited search API, it is impossible (for most users) to apply these classifiers directly to the Twitter unfiltered data streams (“firehose”). Rather, applications must first decide what content to retrieve through the search API before filtering that content with topical classifiers. Thus, it is critically important to query the Twitter API relative to the intended topical classifier in a way that minimizes the amount of negatively classified data retrieved. In this paper, we propose a sequence of query optimization methods that generalize notions of the maximum coverage problem to find the subset of query terms within the API limits that cover most of the topically relevant tweets without sacrificing precision. We evaluate the proposed methods on a large dataset of Twitter data collected during 2013 and 2014 labeled using manually curated hashtags for eight topics. Among many insights, our analysis shows that the best of the proposed methods can significantly outperform the firehose on precision and F1-score while achieving high recall within strict API limitations. |
Tasks | |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1904.10403v2 |
https://arxiv.org/pdf/1904.10403v2.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-search-api-queries-for-twitter |
Repo | |
Framework | |
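The classical greedy heuristic for maximum set coverage, which the proposed query optimizers generalize, is worth seeing concretely: repeatedly pick the term whose tweets cover the most not-yet-covered relevant tweets until the API's term budget is spent. Toy data; the paper's methods additionally guard precision, which plain coverage ignores:

```python
def greedy_max_coverage(term_to_tweets, budget):
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(term_to_tweets, key=lambda t: len(term_to_tweets[t] - covered))
        if not term_to_tweets[best] - covered:
            break                          # no term adds anything new
        chosen.append(best)
        covered |= term_to_tweets[best]
    return chosen, covered

terms = {"#election": {1, 2, 3, 4}, "#vote": {3, 4, 5}, "#poll": {6}}
print(greedy_max_coverage(terms, budget=2))
# (['#election', '#vote'], {1, 2, 3, 4, 5})
```

The greedy rule is the textbook (1 - 1/e)-approximation for maximum coverage, which is why it is a natural backbone for query selection under a hard term limit.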
Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression
Title | Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression |
Authors | Gil Kur, Yuval Dagan, Alexander Rakhlin |
Abstract | In this paper, we study two problems: (1) estimation of a $d$-dimensional log-concave distribution and (2) bounded multivariate convex regression with random design with an underlying log-concave density or a compactly supported distribution with a continuous density. First, we show that for all $d \ge 4$ the maximum likelihood estimators of both problems achieve an optimal risk of $\Theta_d(n^{-2/(d+1)})$ (up to a logarithmic factor) in terms of squared Hellinger distance and $L_2$ squared distance, respectively. Previously, the optimality of both these estimators was known only for $d\le 3$. We also prove that the $\epsilon$-entropy numbers of the two aforementioned families are equal up to logarithmic factors. We complement these results by proving a sharp bound $\Theta_d(n^{-2/(d+4)})$ on the minimax rate (up to logarithmic factors) with respect to the total variation distance. Finally, we prove that estimating a log-concave density - even a uniform distribution on a convex set - up to a fixed accuracy requires the number of samples \emph{at least} exponential in the dimension. We do that by improving the dimensional constant in the best known lower bound for the minimax rate from $2^{-d}\cdot n^{-2/(d+1)}$ to $c\cdot n^{-2/(d+1)}$ (when $d\geq 2$). |
Tasks | Density Estimation |
Published | 2019-03-13 |
URL | https://arxiv.org/abs/1903.05315v4 |
https://arxiv.org/pdf/1903.05315v4.pdf | |
PWC | https://paperswithcode.com/paper/the-log-concave-maximum-likelihood-estimator |
Repo | |
Framework | |
Neural Empirical Bayes
Title | Neural Empirical Bayes |
Authors | Saeed Saremi, Aapo Hyvarinen |
Abstract | We formulate a novel framework that unifies kernel density estimation and empirical Bayes, where we address a broad set of problems in unsupervised learning with a geometric interpretation rooted in the concentration of measure phenomenon. We start with energy estimation based on a denoising objective that recovers the original/clean data X from its measured/noisy version Y with the empirical Bayes least-squares estimator. The setup is rooted in kernel density estimation, but the log-pdf in Y is parametrized with a neural network, and crucially, the learning objective is derived for any level of noise/kernel bandwidth. Learning is efficient with double backpropagation and stochastic gradient descent. An elegant physical picture emerges of an interacting system of high-dimensional spheres around each data point, together with a globally defined probability flow field. The picture is powerful: it leads to a novel sampling algorithm, a new notion of associative memory, and it is instrumental in designing experiments. We start with extreme denoising experiments. Walk-jump sampling is defined by Langevin MCMC walks in Y, along with asynchronous empirical Bayes jumps to X. Robbins associative memory is defined by a deterministic flow to attractors of the learned probability flow field. Finally, we observe the emergence of remarkably rich creative modes in the regime of highly overlapping spheres. |
Tasks | Denoising, Density Estimation |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02334v1 |
http://arxiv.org/pdf/1903.02334v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-empirical-bayes |
Repo | |
Framework | |
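The denoising objective described above can be sketched in PyTorch: a network parametrizes the energy (negative log-pdf) of noisy Y, and the empirical Bayes least-squares estimator xhat = y + sigma^2 * grad log p(y) turns denoising into the training signal, with the double backpropagation the abstract mentions arising from differentiating through that gradient. Architecture and sizes are illustrative:

```python
import torch

sigma = 0.5
energy = torch.nn.Sequential(              # E(y) ~ -log p(y) up to a constant
    torch.nn.Linear(2, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(128, 2)                # clean data (toy distribution)
    y = (x + sigma * torch.randn_like(x)).requires_grad_(True)
    grad_log_p = -torch.autograd.grad(energy(y).sum(), y, create_graph=True)[0]
    xhat = y + sigma**2 * grad_log_p       # empirical Bayes least-squares jump
    loss = ((x - xhat) ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

After training, the same gradient field supports the walk-jump scheme: Langevin steps in Y followed by a deterministic jump to X via the estimator above.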
Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network
Title | Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network |
Authors | Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao |
Abstract | Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hierarchically aggregate features at different encoding stages, which can handle large variations of objects. Second, we design dense skip connections interleaved across paths to facilitate sufficient multi-scale feature fusion and to absorb the supervision information. Third, we propose a new combinatorial loss to enforce local coherence and spatial correlation in density maps. By distributedly imposing this combinatorial loss on intermediate outputs, gradient vanishing can be largely alleviated for better back-propagation and faster convergence. Finally, our TEDnet achieves new state-of-the-art performance on four benchmarks, with an improvement of up to 14% in terms of MAE. |
Tasks | Crowd Counting, Density Estimation |
Published | 2019-03-03 |
URL | http://arxiv.org/abs/1903.00853v2 |
http://arxiv.org/pdf/1903.00853v2.pdf | |
PWC | https://paperswithcode.com/paper/crowd-counting-and-density-estimation-by |
Repo | |
Framework | |
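For context on what TEDnet regresses: ground-truth density maps are conventionally built by placing a small Gaussian at each annotated head so the map integrates to the crowd count. A scipy sketch of that standard preprocessing (the trellis architecture and combinatorial loss then operate on such maps; the sigma here is illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Place unit mass at each head location, then blur."""
    canvas = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        canvas[r, c] += 1.0
    return gaussian_filter(canvas, sigma=sigma)

heads = [(10, 12), (11, 40), (30, 30)]
dm = density_map(heads, shape=(64, 64))
print(round(float(dm.sum()), 3))   # ~3.0: the map integrates to the count
```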
Federated Machine Learning: Concept and Applications
Title | Federated Machine Learning: Concept and Applications |
Authors | Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong |
Abstract | Today’s AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy. |
Tasks | Transfer Learning |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.04885v1 |
http://arxiv.org/pdf/1902.04885v1.pdf | |
PWC | https://paperswithcode.com/paper/federated-machine-learning-concept-and |
Repo | |
Framework | |
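Horizontal federated learning, one of the three settings the survey defines, fits in a screen of numpy: each party trains on its own rows, and only model parameters, never raw data, leave the silo. A FedAvg-style sketch for linear regression; the averaging scheme and round counts are illustrative, not a prescription from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
parties = []
for _ in range(3):                      # three "data islands", shared features
    X = rng.standard_normal((100, 2))
    parties.append((X, X @ true_w + 0.1 * rng.standard_normal(100)))

w = np.zeros(2)
for round_ in range(20):
    local_models = []
    for X, y in parties:                # local step: a few SGD updates on-site
        wi = w.copy()
        for _ in range(5):
            wi -= 0.1 * X.T @ (X @ wi - y) / len(y)
        local_models.append(wi)
    w = np.mean(local_models, axis=0)   # server step: average parameters only
print(w.round(3))                       # close to [ 2., -1.]
```

Vertical federated learning and federated transfer learning, the other two settings, require additional machinery (entity alignment and exchanged intermediate values under encryption) that this sketch omits.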