Paper Group ANR 465
Topic modeling of public repositories at scale using names in source code
Title | Topic modeling of public repositories at scale using names in source code |
Authors | Vadim Markovtsev, Eiso Kant |
Abstract | Programming languages themselves have a limited number of reserved keywords and character based tokens that define the language specification. However, programmers have a rich use of natural language within their code through comments, text literals and naming entities. The programmer defined names that can be found in source code are a rich source of information to build a high level understanding of the project. The goal of this paper is to apply topic modeling to names used in over 13.6 million repositories and perceive the inferred topics. One of the problems in such a study is the occurrence of duplicate repositories not officially marked as forks (obscure forks). We show how to address it using the same identifiers which are extracted for topic modeling. We open with a discussion on naming in source code, we then elaborate on our approach to remove exact duplicate and fuzzy duplicate repositories using Locality Sensitive Hashing on the bag-of-words model and then discuss our work on topic modeling; and finally present the results from our data analysis together with open-access to the source code, tools and datasets. |
Tasks | |
Published | 2017-04-01 |
URL | http://arxiv.org/abs/1704.00135v2 |
http://arxiv.org/pdf/1704.00135v2.pdf | |
PWC | https://paperswithcode.com/paper/topic-modeling-of-public-repositories-at |
Repo | |
Framework | |
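The abstract above hinges on two steps: fuzzy-duplicate removal over bags of identifiers and topic modeling of what survives. Below is a minimal sketch of that pipeline, using MinHash signatures with LSH banding for duplicate detection and scikit-learn's LDA for the topics; the toy repositories, hash family and all parameter values are illustrative assumptions, not the authors' actual pipeline or dataset.

```python
# Hedged sketch, not the authors' pipeline: MinHash + LSH banding over bags of
# identifiers to drop obscure forks, then LDA topics on the surviving repos.
from collections import defaultdict
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
NUM_PERM, BANDS = 64, 16                       # 16 bands x 4 rows per band
SEEDS = rng.integers(1, 2**63, NUM_PERM, dtype=np.uint64)

def minhash_signature(tokens):
    """MinHash signature of an identifier set (one minimum per hash seed)."""
    sig = np.full(NUM_PERM, np.iinfo(np.uint64).max, dtype=np.uint64)
    for t in set(tokens):
        h = np.uint64(hash(t) & (2**64 - 1))
        sig = np.minimum(sig, h ^ SEEDS)       # cheap per-seed hash family
    return sig

# Toy repositories represented as bags of programmer-defined names.
repos = {
    "web_client":      "parse_args load_config http_get retry_request json_decode",
    "web_client_fork": "parse_args load_config http_get retry_request json_decode",
    "cnn_trainer":     "conv2d relu max_pool train_step loss_fn batch_norm",
}

# LSH banding: repositories agreeing on any whole band become candidate dupes.
buckets, duplicates = defaultdict(list), set()
for name, text in repos.items():
    sig = minhash_signature(text.split())
    for b_idx, band in enumerate(np.split(sig, BANDS)):
        key = (b_idx, tuple(band))
        if buckets[key]:
            duplicates.add(name)               # an earlier repo owns this bucket
        buckets[key].append(name)

kept = [r for r in repos if r not in duplicates]
print("kept after fuzzy-duplicate removal:", kept)

# Topic modeling over the identifier bags of the de-duplicated corpus.
vec = CountVectorizer(token_pattern=r"[A-Za-z_]+")
X = vec.fit_transform([repos[r] for r in kept])
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = np.array(vec.get_feature_names_out())
for k, comp in enumerate(lda.components_):
    print(f"topic {k}:", ", ".join(terms[comp.argsort()[::-1][:3]]))
```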
Universal representations: The missing link between faces, text, planktons, and cat breeds
Title | Universal representations: The missing link between faces, text, planktons, and cat breeds |
Authors | Hakan Bilen, Andrea Vedaldi |
Abstract | With the advent of large labelled datasets and high-capacity models, the performance of machine vision systems has been improving rapidly. However, the technology has still major limitations, starting from the fact that different vision problems are still solved by different models, trained from scratch or fine-tuned on the target data. The human visual system, in stark contrast, learns a universal representation for vision in the early life of an individual. This representation works well for an enormous variety of vision problems, with little or no change, with the major advantage of requiring little training data to solve any of them. |
Tasks | |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07275v1 |
http://arxiv.org/pdf/1701.07275v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-representationsthe-missing-link |
Repo | |
Framework | |
Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting
Title | Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting |
Authors | Christopher Grimm, Yuhang Song, Michael L. Littman |
Abstract | Generative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems—using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the context of problems with one-dimensional outputs. We identify a class of function decompositions with properties that make them well suited to the critic role in a leading approach to GANs known as Wasserstein GANs. We show that Taylor and Fourier series decompositions belong to our class, provide examples of these critics outperforming standard GAN approaches, and suggest how they can be scaled to higher dimensional problems in the future. |
Tasks | Density Estimation |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06533v1 |
http://arxiv.org/pdf/1709.06533v1.pdf | |
PWC | https://paperswithcode.com/paper/summable-reparameterizations-of-wasserstein |
Repo | |
Framework | |
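A minimal sketch of the one-dimensional idea follows, assuming a truncated Fourier parameterization of the critic: because the Wasserstein critic objective E_real[f] - E_fake[f] is linear in the series coefficients, it decomposes into a sum of per-basis expectation gaps, and a 1-Lipschitz critic can be read off from those gaps. The distributions, truncation order and the sufficient Lipschitz condition used below are illustrative choices, not the paper's exact construction.

```python
# Hedged sketch, not the paper's exact construction: a 1-D Wasserstein critic
# as a truncated Fourier series, exploiting that the objective is a sum of
# per-basis expectation gaps.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.3, 5000)              # "data" samples on the line
fake = rng.normal(-0.5, 0.5, 5000)             # "generator" samples
K = 12
ks = np.arange(1, K + 1)

def basis(x):
    """sin(kx) and cos(kx) features, each of shape (len(x), K)."""
    return np.sin(np.outer(x, ks)), np.cos(np.outer(x, ks))

# Per-basis expectation gaps; E_real[f] - E_fake[f] is their weighted sum.
sr, cr = basis(real)
sf, cf = basis(fake)
gap_sin = sr.mean(0) - sf.mean(0)
gap_cos = cr.mean(0) - cf.mean(0)

# A sufficient 1-Lipschitz condition is sum_k k * (|a_k| + |b_k|) <= 1, which
# makes the feasible set a weighted L1 ball; the linear objective is then
# maximized at a single vertex, i.e. one harmonic scaled to slope 1.
ratios = np.concatenate([np.abs(gap_sin), np.abs(gap_cos)]) / np.concatenate([ks, ks])
w_lower_bound = ratios.max()
print(f"best single-harmonic critic: W1 lower bound {w_lower_bound:.3f} "
      f"(the true W1 here is roughly 1.5)")
```

The bound is loose because the critic is restricted to a single periodic harmonic, but it illustrates how the critic objective becomes summable over basis functions.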
Uncertainty quantification in graph-based classification of high dimensional data
Title | Uncertainty quantification in graph-based classification of high dimensional data |
Authors | Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis |
Abstract | Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms. |
Tasks | |
Published | 2017-03-26 |
URL | http://arxiv.org/abs/1703.08816v2 |
http://arxiv.org/pdf/1703.08816v2.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-quantification-in-graph-based |
Repo | |
Framework | |
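A minimal sketch of the graph-based setting follows, using the harmonic (Gaussian field) relaxation of [Zhu et al 2003] that the abstract mentions, with a per-node posterior variance as a crude uncertainty measure; the kNN graph construction, dataset and parameters are illustrative, and none of the paper's probit, level-set or Ginzburg-Landau samplers are reproduced here.

```python
# Hedged sketch: the harmonic (Gaussian field) relaxation with per-node
# posterior variance as uncertainty; not the paper's probit / level-set /
# Ginzburg-Landau methods, and all graph parameters are illustrative.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
W = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
W = np.maximum(W, W.T)                          # symmetrize the kNN graph
L = np.diag(W.sum(1)) - W                       # unnormalized graph Laplacian

labeled = np.arange(0, 200, 20)                 # a few labeled nodes
unlabeled = np.setdiff1d(np.arange(200), labeled)
y_pm = 2.0 * y - 1.0                            # labels in {-1, +1}

L_uu = L[np.ix_(unlabeled, unlabeled)] + 1e-6 * np.eye(len(unlabeled))
L_ul = L[np.ix_(unlabeled, labeled)]
u_mean = -np.linalg.solve(L_uu, L_ul @ y_pm[labeled])   # harmonic extension
u_var = np.diag(np.linalg.inv(L_uu))                    # Gaussian-field variances

pred = (u_mean > 0).astype(int)
print(f"accuracy on unlabeled nodes: {(pred == y[unlabeled]).mean():.2f}")
print("least confident nodes:", unlabeled[np.argsort(-u_var)[:5]])
```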
KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
Title | KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition |
Authors | Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, Gerhard Weikum |
Abstract | KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish), performing amongst state-of-the-art systems in all of them. |
Tasks | Named Entity Recognition |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03544v1 |
http://arxiv.org/pdf/1709.03544v1.pdf | |
PWC | https://paperswithcode.com/paper/knowner-incremental-multilingual-knowledge-in |
Repo | |
Framework | |
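Below is a hedged sketch of the feature-based CRF setup the abstract describes, with one external-knowledge (gazetteer) feature alongside standard lexical features. sklearn-crfsuite is used here as a stand-in CRF, and the toy sentences, gazetteer and feature set are illustrative assumptions, not KnowNER's knowledge categories.

```python
# Hedged sketch with sklearn-crfsuite as a stand-in CRF; the gazetteer,
# features and toy sentences are illustrative, not KnowNER's categories.
import sklearn_crfsuite

GAZETTEER = {"berlin", "germany", "madrid", "spain"}   # "knowledge-base" name list

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "suffix3": w.lower()[-3:],
        "in_gazetteer": w.lower() in GAZETTEER,        # external-knowledge feature
        "prev_lower": sent[i - 1].lower() if i > 0 else "<BOS>",
    }

train_sents = [
    (["Angela", "lives", "in", "Berlin"], ["B-PER", "O", "O", "B-LOC"]),
    (["Madrid", "is", "in", "Spain"],     ["B-LOC", "O", "O", "B-LOC"]),
]
X = [[token_features(s, i) for i in range(len(s))] for s, _ in train_sents]
y = [labels for _, labels in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)

test = ["Olaf", "visited", "Germany"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]])[0])
```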
Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey
Title | Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey |
Authors | Lorenzo Ferrone, Fabio Massimo Zanzotto |
Abstract | Natural language is inherently a discrete symbolic representation of human knowledge. Recent advances in machine learning (ML) and in natural language processing (NLP) seem to contradict the above intuition: discrete symbols are fading away, erased by vectors or tensors called distributed and distributional representations. However, there is a strict link between distributed/distributional representations and discrete symbols, the former being an approximation of the latter. A clearer understanding of the strict link between distributed/distributional representations and symbols may certainly lead to radically new deep learning networks. In this paper we present a survey that aims to renew the link between symbolic representations and distributed/distributional representations. This is the right time to revitalize the area of interpreting how discrete symbols are represented inside neural networks. |
Tasks | |
Published | 2017-02-02 |
URL | https://arxiv.org/abs/1702.00764v2 |
https://arxiv.org/pdf/1702.00764v2.pdf | |
PWC | https://paperswithcode.com/paper/symbolic-distributed-and-distributional |
Repo | |
Framework | |
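A small illustration of the three representation families the survey contrasts: local/symbolic one-hot vectors, distributional co-occurrence vectors, and a distributed encoding (a random projection) that only approximates the discrete symbols. The toy corpus and dimensions are assumptions made for the example.

```python
# Toy illustration (assumed corpus and dimensions): symbolic one-hot vectors,
# distributional co-occurrence vectors, and a distributed random projection
# that only approximates the discrete symbols.
import numpy as np

rng = np.random.default_rng(0)
corpus = [["dog", "barks", "loudly"], ["cat", "meows", "loudly"],
          ["dog", "chases", "ball"], ["cat", "chases", "mouse"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# 1. Symbolic / local: one-hot vectors, every pair of symbols is orthogonal.
one_hot = np.eye(len(vocab))

# 2. Distributional: within-sentence co-occurrence counts.
cooc = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    for w in s:
        for v in s:
            if w != v:
                cooc[idx[w], idx[v]] += 1

# 3. Distributed: a low-dimensional random projection of the one-hot space;
#    symbols stay approximately (not exactly) distinguishable.
distributed = one_hot @ rng.normal(0, 0.5, size=(len(vocab), 4))

cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("one-hot        dog~cat:", round(cos(one_hot[idx["dog"]], one_hot[idx["cat"]]), 2))
print("co-occurrence  dog~cat:", round(cos(cooc[idx["dog"]], cooc[idx["cat"]]), 2))
print("projected      dog~cat:", round(cos(distributed[idx["dog"]], distributed[idx["cat"]]), 2))
```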
Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network
Title | Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network |
Authors | Bo Li, Huahui Chen, Yucheng Chen, Yuchao Dai, Mingyi He |
Abstract | Action recognition from well-segmented 3D skeleton video has been intensively studied. However, due to the difficulty in representing the 3D skeleton video and the lack of training data, action detection from streaming 3D skeleton video still lags far behind its recognition counterpart and image based object detection. In this paper, we propose a novel approach for this problem, which leverages both effective skeleton video encoding and deep regression based object detection from images. Our framework consists of two parts: skeleton-based video image mapping, which encodes a skeleton video to a color image in a temporal preserving way, and an end-to-end trainable fast skeleton action detector (Skeleton Boxes) based on image detection. Experimental results on the latest and largest PKU-MMD benchmark dataset demonstrate that our method outperforms the state-of-the-art methods with a large margin. We believe our idea would inspire and benefit future research in this important area. |
Tasks | Action Detection, Object Detection, Temporal Action Localization |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05643v1 |
http://arxiv.org/pdf/1704.05643v1.pdf | |
PWC | https://paperswithcode.com/paper/skeleton-boxes-solving-skeleton-based-action |
Repo | |
Framework | |
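A hedged sketch of the skeleton-video-to-image mapping idea: joints become image rows, frames become columns, and the (x, y, z) coordinates become RGB channels after per-channel normalization, so temporal order is preserved along the image width. The exact encoding in the paper may differ; the array shapes and toy data below are assumptions.

```python
# Hedged sketch of a temporal-preserving skeleton-to-image encoding; the
# paper's exact mapping and detector are not reproduced here.
import numpy as np

T, J = 120, 25                                   # frames, joints (Kinect-style)
skeleton = np.random.default_rng(0).normal(size=(T, J, 3))   # toy 3D skeleton video

def skeleton_to_image(seq):
    """Map a (T, J, 3) skeleton sequence to a (J, T, 3) uint8 color image."""
    img = np.transpose(seq, (1, 0, 2)).astype(np.float64)    # joints x time x xyz
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    img = (img - lo) / (hi - lo + 1e-8)          # per-channel min-max to [0, 1]
    return (255 * img).astype(np.uint8)

image = skeleton_to_image(skeleton)
print(image.shape, image.dtype)                  # (25, 120, 3) uint8
# `image` can now be fed to an image-based detector to localize actions along
# the temporal (width) axis, which is the division of labor the abstract describes.
```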
Clustering-based Source-aware Assessment of True Robustness for Learning Models
Title | Clustering-based Source-aware Assessment of True Robustness for Learning Models |
Authors | Ozsel Kilinc, Ismail Uysal |
Abstract | We introduce a novel validation framework to measure the true robustness of learning models for real-world applications by creating source-inclusive and source-exclusive partitions in a dataset via clustering. We develop a robustness metric derived from source-aware lower and upper bounds of model accuracy even when data source labels are not readily available. We clearly demonstrate that even on a well-explored dataset like MNIST, challenging training scenarios can be constructed under the proposed assessment framework for two separate yet equally important applications: i) more rigorous learning model comparison and ii) dataset adequacy evaluation. In addition, our findings not only promise a more complete identification of trade-offs between model complexity, accuracy and robustness but can also help researchers optimize their efforts in data collection by identifying the less robust and more challenging class labels. |
Tasks | |
Published | 2017-04-01 |
URL | http://arxiv.org/abs/1704.00158v1 |
http://arxiv.org/pdf/1704.00158v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-based-source-aware-assessment-of |
Repo | |
Framework | |
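A minimal sketch of the source-aware assessment idea, assuming k-means clusters as pseudo-sources: a source-inclusive split mixes every cluster into training, a source-exclusive split holds whole clusters out, and the gap between the two accuracies acts as a rough robustness signal. The dataset, classifier and the specific gap metric are illustrative, not the paper's bounds.

```python
# Hedged sketch of source-inclusive vs source-exclusive evaluation with
# k-means clusters as pseudo-sources; not the paper's exact metric.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
sources = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

def accuracy(train_idx, test_idx):
    clf = LogisticRegression(max_iter=2000).fit(X[train_idx], y[train_idx])
    return clf.score(X[test_idx], y[test_idx])

# Source-inclusive: ordinary random split, every pseudo-source seen in training.
inc_train, inc_test = train_test_split(np.arange(len(y)), test_size=0.3, random_state=0)
acc_inclusive = accuracy(inc_train, inc_test)

# Source-exclusive: two whole clusters are never seen during training.
held_out = np.isin(sources, [0, 1])
acc_exclusive = accuracy(np.where(~held_out)[0], np.where(held_out)[0])

print(f"inclusive accuracy (upper-bound proxy): {acc_inclusive:.3f}")
print(f"exclusive accuracy (lower-bound proxy): {acc_exclusive:.3f}")
print(f"robustness gap: {acc_inclusive - acc_exclusive:.3f}")
```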
A Web of Hate: Tackling Hateful Speech in Online Social Spaces
Title | A Web of Hate: Tackling Hateful Speech in Online Social Spaces |
Authors | Haji Mohammad Saleem, Kelly P Dillon, Susan Benesch, Derek Ruths |
Abstract | Online social platforms are beset with hateful speech - content that expresses hatred for a person or group of people. Such content can frighten, intimidate, or silence platform users, and some of it can inspire other users to commit violence. Despite widespread recognition of the problems posed by such content, reliable solutions even for detecting hateful speech are lacking. In the present work, we establish why keyword-based methods are insufficient for detection. We then propose an approach to detecting hateful speech that uses content produced by self-identifying hateful communities as training data. Our approach bypasses the expensive annotation process often required to train keyword systems and performs well across several established platforms, making substantial improvements over current state-of-the-art approaches. |
Tasks | |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10159v1 |
http://arxiv.org/pdf/1709.10159v1.pdf | |
PWC | https://paperswithcode.com/paper/a-web-of-hate-tackling-hateful-speech-in |
Repo | |
Framework | |
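A hedged sketch of the community-sourced training idea: posts drawn from a self-identified hateful community serve as positive examples and posts from a neutral community as negatives, so no keyword list or manual annotation is needed. The toy posts, features and classifier below are assumptions for illustration, not the paper's corpus or model.

```python
# Hedged sketch of community-sourced training data for hateful-speech
# detection; toy posts and a generic text classifier, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hateful_community_posts = [
    "those people do not belong here and never will",
    "they are ruining everything, send them all back",
]
neutral_community_posts = [
    "great game last night, what a comeback in the final minutes",
    "does anyone have a good recipe for lentil soup",
]
texts = hateful_community_posts + neutral_community_posts
labels = [1] * len(hateful_community_posts) + [0] * len(neutral_community_posts)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Note the test example contains no slur a keyword filter would catch.
print(model.predict(["they should all be sent back where they came from"]))
```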
An Improved Modified Cholesky Decomposition Method for Precision Matrix Estimation
Title | An Improved Modified Cholesky Decomposition Method for Precision Matrix Estimation |
Authors | Xiaoning Kang, Xinwei Deng |
Abstract | The modified Cholesky decomposition is commonly used for precision matrix estimation given a specified order of random variables. However, the order of variables is often not available or cannot be pre-determined. In this work, we propose to address the variable order issue in the modified Cholesky decomposition for sparse precision matrix estimation. The key idea is to effectively combine a set of estimates obtained from multiple permutations of variable orders, and to efficiently encourage the sparse structure for the resultant estimate by the thresholding technique on the ensemble Cholesky factor matrix. The consistent property of the proposed estimate is established under some weak regularity conditions. Simulation studies are conducted to evaluate the performance of the proposed method in comparison with several existing approaches. The proposed method is also applied into linear discriminant analysis of real data for classification. |
Tasks | |
Published | 2017-10-14 |
URL | https://arxiv.org/abs/1710.05163v2 |
https://arxiv.org/pdf/1710.05163v2.pdf | |
PWC | https://paperswithcode.com/paper/an-improved-modified-cholesky-decomposition |
Repo | |
Framework | |
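A minimal sketch of the permutation-ensemble idea follows, assuming lasso regressions for the modified Cholesky factors and hard thresholding of the averaged estimate; the paper's particular regularization and its thresholding of the ensemble Cholesky factor itself are not reproduced exactly.

```python
# Hedged sketch: modified Cholesky precision estimates averaged over random
# variable orders; lasso regressions and thresholding the average stand in
# for the paper's exact regularization and factor-level thresholding.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, n = 8, 400
true_prec = np.eye(p) + np.diag(0.4 * np.ones(p - 1), 1) + np.diag(0.4 * np.ones(p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_prec), size=n)

def mod_cholesky_precision(Z, alpha=0.05):
    """Omega = T' D^{-1} T from regressing each column on its predecessors."""
    _, p = Z.shape
    T, d = np.eye(p), np.empty(p)
    d[0] = Z[:, 0].var()
    for j in range(1, p):
        reg = Lasso(alpha=alpha, fit_intercept=False).fit(Z[:, :j], Z[:, j])
        T[j, :j] = -reg.coef_
        d[j] = (Z[:, j] - Z[:, :j] @ reg.coef_).var()
    return T.T @ np.diag(1.0 / d) @ T

estimates = []
for _ in range(30):                              # ensemble over random variable orders
    perm = rng.permutation(p)
    inv = np.argsort(perm)
    omega_perm = mod_cholesky_precision(X[:, perm])
    estimates.append(omega_perm[np.ix_(inv, inv)])  # map back to the original order
omega = np.mean(estimates, axis=0)
omega[np.abs(omega) < 0.15] = 0.0                # encourage sparsity by thresholding

print("recovered nonzero pattern:\n", (omega != 0).astype(int))
```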
Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm
Title | Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm |
Authors | Tongyi Cao, Akshay Krishnamurthy |
Abstract | We design new algorithms for the combinatorial pure exploration problem in the multi-arm bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\mathcal{V} \subset 2^{[K]}$ of these distributions, and we would like to find the subset $v \in \mathcal{V}$ that has largest mean, while collecting, in a sequential fashion, as few samples from the distributions as possible. In both the fixed budget and fixed confidence settings, our algorithms achieve new sample-complexity bounds that provide polynomial improvements on previous results in some settings. Via an information-theoretic lower bound, we show that no approach based on uniform sampling can improve on ours in any regime, yielding the first interactive algorithms for this problem with this basic property. Computationally, we show how to efficiently implement our fixed confidence algorithm whenever $\mathcal{V}$ supports efficient linear optimization. Our results involve precise concentration-of-measure arguments and a new algorithm for linear programming with exponentially many constraints. |
Tasks | |
Published | 2017-11-21 |
URL | https://arxiv.org/abs/1711.08018v4 |
https://arxiv.org/pdf/1711.08018v4.pdf | |
PWC | https://paperswithcode.com/paper/disagreement-based-combinatorial-pure |
Repo | |
Framework | |
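For orientation, here is a hedged sketch of the problem setup with a naive uniform-sampling, confidence-interval stopping baseline of the kind the paper's lower bound is measured against; this is not the authors' disagreement-based algorithm, and the arm distributions, subset family and Hoeffding-style rule are illustrative assumptions (a subset's value is taken to be the sum of its arms' means).

```python
# Hedged sketch of combinatorial pure exploration with a naive round-robin
# sampling baseline and Hoeffding-style stopping; not the paper's algorithm.
import itertools
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.9, 0.8, 0.3, 0.2])             # K = 4 Bernoulli arms
subsets = [frozenset(s) for s in itertools.combinations(range(4), 2)]  # V = all pairs
delta = 0.05

counts, sums = np.zeros(4), np.zeros(4)
for t in range(1, 200001):
    arm = (t - 1) % 4                                    # uniform round-robin sampling
    sums[arm] += rng.random() < true_means[arm]
    counts[arm] += 1
    if t % 4:                                            # only check after full rounds
        continue
    mu_hat = sums / counts
    rad = np.sqrt(np.log(4 * t**2 / delta) / (2 * counts))   # per-arm Hoeffding radius
    vals = np.array([sum(mu_hat[i] for i in s) for s in subsets])
    lows = np.array([sum(mu_hat[i] - rad[i] for i in s) for s in subsets])
    highs = np.array([sum(mu_hat[i] + rad[i] for i in s) for s in subsets])
    best = int(np.argmax(vals))
    if all(lows[best] >= highs[j] for j in range(len(subsets)) if j != best):
        print(f"stopped after {t} samples, chose subset {sorted(subsets[best])}")
        break
```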
Armstrong’s Axioms and Navigation Strategies
Title | Armstrong’s Axioms and Navigation Strategies |
Authors | Kaya Deuser, Pavel Naumov |
Abstract | The paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong’s axioms from the database theory. If the assumption of perfect recall is omitted, then Armstrong’s transitivity axiom is not valid, but it can be replaced by two new weaker principles. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall. |
Tasks | |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.04106v2 |
http://arxiv.org/pdf/1707.04106v2.pdf | |
PWC | https://paperswithcode.com/paper/armstrongs-axioms-and-navigation-strategies |
Repo | |
Framework | |
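As a concrete reminder of the database-theory side of the correspondence, the attribute-closure routine below captures exactly what is derivable from Armstrong's axioms (reflexivity, augmentation, transitivity): X -> Y follows iff Y lies in closure(X). The functional dependencies are a toy example; the paper's navigation semantics is not modeled here.

```python
# Standard attribute-closure computation; the consequences of Armstrong's
# axioms are exactly the dependencies X -> Y with Y inside closure(X).
def closure(attrs, fds):
    """Closure of a set of attributes under functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("CD", "E")]     # A->B, B->C, CD->E
print(closure({"A"}, fds))                      # {'A', 'B', 'C'}: transitivity at work
print(closure({"A", "D"}, fds))                 # also picks up 'E' via CD->E
```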
Convergence of Unregularized Online Learning Algorithms
Title | Convergence of Unregularized Online Learning Algorithms |
Authors | Yunwen Lei, Lei Shi, Zheng-Chu Guo |
Abstract | In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess generalization errors in expectation. A sufficient condition for the almost sure convergence is also given. With high probability, we provide explicit convergence rates of the excess generalization errors for both averaged iterates and the last iterate, which in turn also imply convergence rates with probability one. To our best knowledge, this is the first high-probability convergence rate for the last iterate of online gradient descent algorithms without strong convexity. Without any boundedness assumptions on iterates, our results are derived by a novel use of two measures of the algorithm’s one-step progress, respectively by generalization errors and by distances in RKHSs, where the variances of the involved martingales are cancelled out by the descent property of the algorithm. |
Tasks | |
Published | 2017-08-09 |
URL | http://arxiv.org/abs/1708.02939v1 |
http://arxiv.org/pdf/1708.02939v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-of-unregularized-online-learning |
Repo | |
Framework | |
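A minimal sketch of unregularized online gradient descent in an RKHS for least squares follows, tracking both the last iterate and the running average the abstract discusses; the Gaussian kernel, decaying step sizes and target function are toy assumptions.

```python
# Hedged sketch: functional SGD in an RKHS without regularization,
# f_{t+1} = f_t - eta_t (f_t(x_t) - y_t) K(x_t, .), with last vs averaged iterate.
import numpy as np

rng = np.random.default_rng(0)
gamma = 20.0                                     # Gaussian-kernel bandwidth (assumed)
target = lambda x: np.sin(2 * np.pi * x)

xs, last_coef, avg_coef = [], [], []
for t in range(1, 1001):
    x = rng.random()
    y = target(x) + 0.1 * rng.normal()
    xa = np.array(xs)
    pred = float(np.dot(last_coef, np.exp(-gamma * (xa - x) ** 2))) if xs else 0.0
    eta = 0.5 / np.sqrt(t)                       # decaying, unregularized step size
    xs.append(x)
    last_coef.append(-eta * (pred - y))          # functional gradient step
    # running average of iterates, expanded over the same kernel centers
    avg_coef = [(t - 1) / t * c for c in avg_coef] + [0.0]
    avg_coef = [a + l / t for a, l in zip(avg_coef, last_coef)]

xa = np.array(xs)
evaluate = lambda coef, x: float(np.dot(coef, np.exp(-gamma * (xa - x) ** 2)))
grid = np.linspace(0, 1, 200)
mse = lambda coef: np.mean([(evaluate(coef, g) - target(g)) ** 2 for g in grid])
print(f"test MSE -- last iterate: {mse(last_coef):.4f}, averaged iterate: {mse(avg_coef):.4f}")
```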
Identifying Mirror Symmetry Density with Delay in Spiking Neural Networks
Title | Identifying Mirror Symmetry Density with Delay in Spiking Neural Networks |
Authors | Jonathan K. George, Cesare Soci, Volker J. Sorger |
Abstract | The ability to rapidly identify symmetry and anti-symmetry is an essential attribute of intelligence. Symmetry perception is a central process in human vision and may be key to human 3D visualization. While previous work in understanding neuron symmetry perception has concentrated on the neuron as an integrator, here we show how the coincidence detecting property of the spiking neuron can be used to reveal symmetry density in spatial data. We develop a method for synchronizing symmetry-identifying spiking artificial neural networks to enable layering and feedback in the network. We show a method for building a network capable of identifying symmetry density between sets of data and present a digital logic implementation demonstrating an 8x8 leaky-integrate-and-fire symmetry detector in a field programmable gate array. Our results show that the efficiencies of spiking neural networks can be harnessed to rapidly identify symmetry in spatial data with applications in image processing, 3D computer vision, and robotics. |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1709.02684v1 |
http://arxiv.org/pdf/1709.02684v1.pdf | |
PWC | https://paperswithcode.com/paper/identifying-mirror-symmetry-density-with |
Repo | |
Framework | |
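A hedged sketch of the coincidence-detection mechanism: spikes from each active pixel reach a leaky integrate-and-fire neuron with a delay proportional to the pixel's distance from a candidate mirror axis, so mirror-symmetric input delivers coincident spike pairs that cross threshold while asymmetric input leaks away. The 1-D pattern and neuron parameters are illustrative; the paper's synchronized, layered 8x8 FPGA design is not reproduced here.

```python
# Hedged sketch: distance-proportional delays turn a LIF neuron into a
# mirror-symmetry coincidence detector; parameters and pattern are toy choices.
import numpy as np

def lif_symmetry_score(pattern, axis, threshold=1.5, decay=0.3, weight=1.0):
    """Count output spikes of a LIF neuron driven by distance-delayed inputs."""
    delays = np.abs(np.flatnonzero(pattern) - axis)     # one spike per active pixel
    horizon = int(delays.max(initial=0)) + 1
    drive = np.zeros(horizon)
    for d in delays:
        drive[d] += weight                               # mirrored pairs sum here
    v, spikes = 0.0, 0
    for t in range(horizon):
        v = decay * v + drive[t]                         # leaky integration
        if v >= threshold:                               # needs roughly two coincident spikes
            spikes += 1
            v = 0.0                                      # reset after firing
    return spikes

symmetric  = np.array([1, 0, 1, 1, 1, 0, 1])             # mirror-symmetric about index 3
asymmetric = np.array([1, 1, 0, 1, 0, 0, 0])
print("symmetric pattern  ->", lif_symmetry_score(symmetric, axis=3), "spikes")
print("asymmetric pattern ->", lif_symmetry_score(asymmetric, axis=3), "spikes")
```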
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Title | A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks |
Authors | Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro |
Abstract | We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis. |
Tasks | |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09564v2 |
http://arxiv.org/pdf/1707.09564v2.pdf | |
PWC | https://paperswithcode.com/paper/a-pac-bayesian-approach-to-spectrally |
Repo | |
Framework | |
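A small sketch of the capacity quantity the bound is built from: the product of squared spectral norms across layers times the sum of the layers' stable ranks ||W_i||_F^2 / ||W_i||_2^2, scaled by 1/margin^2 (constants, depth/width log factors and the input-norm term are omitted). The random weights and margin below are placeholders, not an experiment from the paper.

```python
# Hedged sketch of the spectral/Frobenius capacity term, up to constants and
# log factors; random weights and margin are placeholders.
import numpy as np

rng = np.random.default_rng(0)
layer_shapes = [(784, 512), (512, 512), (512, 10)]        # a toy feedforward net
weights = [rng.normal(0, 1 / np.sqrt(m), size=(m, n)) for m, n in layer_shapes]

spec = np.array([np.linalg.norm(W, 2) for W in weights])   # spectral norms
frob = np.array([np.linalg.norm(W, "fro") for W in weights])

margin, n_samples = 5.0, 60000
capacity = np.prod(spec**2) * np.sum(frob**2 / spec**2)
bound_proxy = np.sqrt(capacity / (margin**2 * n_samples))

print("per-layer spectral norms:", np.round(spec, 2))
print("per-layer stable ranks  :", np.round(frob**2 / spec**2, 1))
print(f"sqrt(capacity / (margin^2 m)): {bound_proxy:.3f}")
```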