July 27, 2019

Paper Group ANR 465

Topic modeling of public repositories at scale using names in source code

Title Topic modeling of public repositories at scale using names in source code
Authors Vadim Markovtsev, Eiso Kant
Abstract Programming languages themselves have a limited number of reserved keywords and character-based tokens that define the language specification. However, programmers make rich use of natural language within their code through comments, text literals and naming entities. The programmer-defined names that can be found in source code are a rich source of information for building a high-level understanding of the project. The goal of this paper is to apply topic modeling to names used in over 13.6 million repositories and examine the inferred topics. One of the problems in such a study is the occurrence of duplicate repositories not officially marked as forks (obscure forks). We show how to address it using the same identifiers which are extracted for topic modeling. We open with a discussion on naming in source code, then elaborate on our approach to removing exact duplicate and fuzzy duplicate repositories using Locality Sensitive Hashing on the bag-of-words model, then discuss our work on topic modeling, and finally present the results from our data analysis together with open access to the source code, tools and datasets.
Tasks
Published 2017-04-01
URL http://arxiv.org/abs/1704.00135v2
PDF http://arxiv.org/pdf/1704.00135v2.pdf
PWC https://paperswithcode.com/paper/topic-modeling-of-public-repositories-at
Repo
Framework
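
The fuzzy-duplicate step mentioned in the abstract above can be illustrated with a small sketch. It uses plain MinHash with banding over sets of identifier tokens, which is one common form of Locality Sensitive Hashing; it is not the authors' pipeline, and the function names, the number of hashes and the number of bands are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature of a non-empty collection of identifier tokens."""
    unique = set(tokens)
    signature = []
    for seed in range(num_hashes):
        # A different salt per seed gives a different hash function.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for t in unique
        ))
    return signature


def lsh_candidate_pairs(signatures, bands=16):
    """Return pairs of names whose signatures agree on at least one band."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


# Hypothetical repositories described by the names found in their code.
repos = {
    "repo_a": ["parse", "token", "lexer", "ast", "visitor"],
    "repo_b": ["visitor", "ast", "lexer", "token", "parse"],  # same names, reordered
    "repo_c": ["matrix", "tensor", "gradient", "optimizer", "layer"],
}
signatures = {name: minhash_signature(toks) for name, toks in repos.items()}
candidates = lsh_candidate_pairs(signatures)
```

Candidate pairs would normally be verified with an exact similarity measure afterwards, since banding only narrows down which pairs are worth comparing.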

Universal representations: The missing link between faces, text, planktons, and cat breeds

Title Universal representations: The missing link between faces, text, planktons, and cat breeds
Authors Hakan Bilen, Andrea Vedaldi
Abstract With the advent of large labelled datasets and high-capacity models, the performance of machine vision systems has been improving rapidly. However, the technology still has major limitations, starting from the fact that different vision problems are still solved by different models, trained from scratch or fine-tuned on the target data. The human visual system, in stark contrast, learns a universal representation for vision in the early life of an individual. This representation works well for an enormous variety of vision problems, with little or no change, with the major advantage of requiring little training data to solve any of them.
Tasks
Published 2017-01-25
URL http://arxiv.org/abs/1701.07275v1
PDF http://arxiv.org/pdf/1701.07275v1.pdf
PWC https://paperswithcode.com/paper/universal-representationsthe-missing-link
Repo
Framework

Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting

Title Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting
Authors Christopher Grimm, Yuhang Song, Michael L. Littman
Abstract Generative adversarial networks (GANs) are an exciting alternative to algorithms for solving density estimation problems—using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper looks particularly at this matching capability in the context of problems with one-dimensional outputs. We identify a class of function decompositions with properties that make them well suited to the critic role in a leading approach to GANs known as Wasserstein GANs. We show that Taylor and Fourier series decompositions belong to our class, provide examples of these critics outperforming standard GAN approaches, and suggest how they can be scaled to higher dimensional problems in the future.
Tasks Density Estimation
Published 2017-09-19
URL http://arxiv.org/abs/1709.06533v1
PDF http://arxiv.org/pdf/1709.06533v1.pdf
PWC https://paperswithcode.com/paper/summable-reparameterizations-of-wasserstein
Repo
Framework
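
As background for the one-dimensional setting in the abstract above: for two equally sized samples on the real line, the Wasserstein-1 distance between their empirical distributions has a closed form, namely the mean absolute difference of the sorted samples. The sketch below computes that reference quantity. It is not the critic construction proposed in the paper, and the function name is an assumption.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two equally sized 1-D empirical samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # In 1-D the optimal coupling matches the i-th smallest point of each sample.
    return float(np.mean(np.abs(x - y)))


distance = wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

A learned critic in one dimension can be sanity-checked against this value, because it is the quantity the critic's objective is meant to approximate.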

Uncertainty quantification in graph-based classification of high dimensional data

Title Uncertainty quantification in graph-based classification of high dimensional data
Authors Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis
Abstract Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.
Tasks
Published 2017-03-26
URL http://arxiv.org/abs/1703.08816v2
PDF http://arxiv.org/pdf/1703.08816v2.pdf
PWC https://paperswithcode.com/paper/uncertainty-quantification-in-graph-based
Repo
Framework
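
The harmonic function approach of [Zhu et al 2003], which the abstract above describes as the method that probit and level-set classification relax, can be sketched in a few lines. Given a weighted graph with Laplacian L, the values on unlabeled nodes solve L_uu f_u = -L_ul f_l. This is a minimal illustration of that baseline only, not the Bayesian models of the paper; the function name and the toy graph are assumptions.

```python
import numpy as np


def harmonic_labels(W, labeled_idx, labeled_values):
    """Extend labels to all nodes by solving the harmonic equations on the graph."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    laplacian = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    labeled_idx = np.asarray(labeled_idx)
    labeled_values = np.asarray(labeled_values, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = laplacian[np.ix_(unlabeled_idx, labeled_idx)]

    f = np.empty(n)
    f[labeled_idx] = labeled_values
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ labeled_values)
    return f


# Path graph 0 - 1 - 2 with the two end nodes labeled.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
f = harmonic_labels(W, labeled_idx=[0, 2], labeled_values=[1.0, 0.0])
```

With labels coded as +1 and -1, the sign of each entry gives the predicted class. The result is a point estimate and carries no uncertainty measure by itself, which is the gap the Bayesian formulations are meant to fill.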

KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition

Title KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
Authors Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, Gerhard Weikum
Abstract KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy, and we compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish), performing amongst state-of-the-art systems in all of them.
Tasks Named Entity Recognition
Published 2017-09-11
URL http://arxiv.org/abs/1709.03544v1
PDF http://arxiv.org/pdf/1709.03544v1.pdf
PWC https://paperswithcode.com/paper/knowner-incremental-multilingual-knowledge-in
Repo
Framework

Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey

Title Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey
Authors Lorenzo Ferrone, Fabio Massimo Zanzotto
Abstract Natural language is inherently a discrete symbolic representation of human knowledge. Recent advances in machine learning (ML) and in natural language processing (NLP) seem to contradict the above intuition: discrete symbols are fading away, erased by vectors or tensors called distributed and distributional representations. However, there is a strict link between distributed/distributional representations and discrete symbols, the former being an approximation of the latter. A clearer understanding of the strict link between distributed/distributional representations and symbols may certainly lead to radically new deep learning networks. In this paper we present a survey that aims to renew the link between symbolic representations and distributed/distributional representations. This is the right time to revitalize the area of interpreting how discrete symbols are represented inside neural networks.
Tasks
Published 2017-02-02
URL https://arxiv.org/abs/1702.00764v2
PDF https://arxiv.org/pdf/1702.00764v2.pdf
PWC https://paperswithcode.com/paper/symbolic-distributed-and-distributional
Repo
Framework

Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network

Title Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network
Authors Bo Li, Huahui Chen, Yucheng Chen, Yuchao Dai, Mingyi He
Abstract Action recognition from well-segmented 3D skeleton video has been intensively studied. However, due to the difficulty in representing the 3D skeleton video and the lack of training data, action detection from streaming 3D skeleton video still lags far behind its recognition counterpart and image-based object detection. In this paper, we propose a novel approach for this problem, which leverages both effective skeleton video encoding and deep regression-based object detection from images. Our framework consists of two parts: skeleton-based video image mapping, which encodes a skeleton video to a color image in a temporal-preserving way, and an end-to-end trainable fast skeleton action detector (Skeleton Boxes) based on image detection. Experimental results on the latest and largest PKU-MMD benchmark dataset demonstrate that our method outperforms the state-of-the-art methods by a large margin. We believe our idea would inspire and benefit future research in this important area.
Tasks Action Detection, Object Detection, Temporal Action Localization
Published 2017-04-19
URL http://arxiv.org/abs/1704.05643v1
PDF http://arxiv.org/pdf/1704.05643v1.pdf
PWC https://paperswithcode.com/paper/skeleton-boxes-solving-skeleton-based-action
Repo
Framework

Clustering-based Source-aware Assessment of True Robustness for Learning Models

Title Clustering-based Source-aware Assessment of True Robustness for Learning Models
Authors Ozsel Kilinc, Ismail Uysal
Abstract We introduce a novel validation framework to measure the true robustness of learning models for real-world applications by creating source-inclusive and source-exclusive partitions in a dataset via clustering. We develop a robustness metric derived from source-aware lower and upper bounds of model accuracy even when data source labels are not readily available. We clearly demonstrate that even on a well-explored dataset like MNIST, challenging training scenarios can be constructed under the proposed assessment framework for two separate yet equally important applications: i) more rigorous learning model comparison and ii) dataset adequacy evaluation. In addition, our findings not only promise a more complete identification of trade-offs between model complexity, accuracy and robustness but can also help researchers optimize their efforts in data collection by identifying the less robust and more challenging class labels.
Tasks
Published 2017-04-01
URL http://arxiv.org/abs/1704.00158v1
PDF http://arxiv.org/pdf/1704.00158v1.pdf
PWC https://paperswithcode.com/paper/clustering-based-source-aware-assessment-of
Repo
Framework

A Web of Hate: Tackling Hateful Speech in Online Social Spaces

Title A Web of Hate: Tackling Hateful Speech in Online Social Spaces
Authors Haji Mohammad Saleem, Kelly P Dillon, Susan Benesch, Derek Ruths
Abstract Online social platforms are beset with hateful speech: content that expresses hatred for a person or group of people. Such content can frighten, intimidate, or silence platform users, and some of it can inspire other users to commit violence. Despite widespread recognition of the problems posed by such content, reliable solutions even for detecting hateful speech are lacking. In the present work, we establish why keyword-based methods are insufficient for detection. We then propose an approach to detecting hateful speech that uses content produced by self-identifying hateful communities as training data. Our approach bypasses the expensive annotation process often required to train keyword systems and performs well across several established platforms, making substantial improvements over current state-of-the-art approaches.
Tasks
Published 2017-09-28
URL http://arxiv.org/abs/1709.10159v1
PDF http://arxiv.org/pdf/1709.10159v1.pdf
PWC https://paperswithcode.com/paper/a-web-of-hate-tackling-hateful-speech-in
Repo
Framework

An Improved Modified Cholesky Decomposition Method for Precision Matrix Estimation

Title An Improved Modified Cholesky Decomposition Method for Precision Matrix Estimation
Authors Xiaoning Kang, Xinwei Deng
Abstract The modified Cholesky decomposition is commonly used for precision matrix estimation given a specified order of random variables. However, the order of variables is often not available or cannot be pre-determined. In this work, we propose to address the variable order issue in the modified Cholesky decomposition for sparse precision matrix estimation. The key idea is to effectively combine a set of estimates obtained from multiple permutations of variable orders, and to efficiently encourage the sparse structure for the resultant estimate by the thresholding technique on the ensemble Cholesky factor matrix. The consistency of the proposed estimate is established under some weak regularity conditions. Simulation studies are conducted to evaluate the performance of the proposed method in comparison with several existing approaches. The proposed method is also applied to linear discriminant analysis of real data for classification.
Tasks
Published 2017-10-14
URL https://arxiv.org/abs/1710.05163v2
PDF https://arxiv.org/pdf/1710.05163v2.pdf
PWC https://paperswithcode.com/paper/an-improved-modified-cholesky-decomposition
Repo
Framework
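
For readers unfamiliar with the building block named in the abstract above, the following sketch shows the plain modified Cholesky decomposition for one fixed variable order: each variable is regressed on its predecessors, the negated coefficients fill a unit lower-triangular matrix T, the residual variances fill a diagonal matrix D, and the precision estimate is T' D^{-1} T. It does not implement the paper's ensemble over permutations or its thresholding step, and the function name is an assumption.

```python
import numpy as np


def mcd_precision(X):
    """Precision matrix estimate via modified Cholesky decomposition (fixed order)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # center the columns

    T = np.eye(p)            # unit lower-triangular matrix of negated coefficients
    d = np.empty(p)          # residual variances
    d[0] = Xc[:, 0] @ Xc[:, 0] / n
    for j in range(1, p):
        # Least-squares regression of variable j on variables 0..j-1.
        coef, *_ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
        T[j, :j] = -coef
        resid = Xc[:, j] - Xc[:, :j] @ coef
        d[j] = resid @ resid / n
    return T.T @ np.diag(1.0 / d) @ T


rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.0, 0.2],
                   [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 4)) @ mixing
omega = mcd_precision(X)
```

With unpenalized regressions, as here, the result equals the inverse of the sample covariance for any variable order. The dependence on order appears once the regressions are regularized or the factor is sparsified, which is the situation the abstract addresses.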

Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm

Title Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm
Authors Tongyi Cao, Akshay Krishnamurthy
Abstract We design new algorithms for the combinatorial pure exploration problem in the multi-armed bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\mathcal{V} \subset 2^{[K]}$ of these distributions, and we would like to find the subset $v \in \mathcal{V}$ that has largest mean, while collecting, in a sequential fashion, as few samples from the distributions as possible. In both the fixed budget and fixed confidence settings, our algorithms achieve new sample-complexity bounds that provide polynomial improvements on previous results in some settings. Via an information-theoretic lower bound, we show that no approach based on uniform sampling can improve on ours in any regime, yielding the first interactive algorithms for this problem with this basic property. Computationally, we show how to efficiently implement our fixed confidence algorithm whenever $\mathcal{V}$ supports efficient linear optimization. Our results involve precise concentration-of-measure arguments and a new algorithm for linear programming with exponentially many constraints.
Tasks
Published 2017-11-21
URL https://arxiv.org/abs/1711.08018v4
PDF https://arxiv.org/pdf/1711.08018v4.pdf
PWC https://paperswithcode.com/paper/disagreement-based-combinatorial-pure
Repo
Framework

Armstrong’s Axioms and Navigation Strategies

Title Armstrong’s Axioms and Navigation Strategies
Authors Kaya Deuser, Pavel Naumov
Abstract The paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong’s axioms from the database theory. If the assumption of perfect recall is omitted, then Armstrong’s transitivity axiom is not valid, but it can be replaced by two new weaker principles. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall.
Tasks
Published 2017-07-13
URL http://arxiv.org/abs/1707.04106v2
PDF http://arxiv.org/pdf/1707.04106v2.pdf
PWC https://paperswithcode.com/paper/armstrongs-axioms-and-navigation-strategies
Repo
Framework

Convergence of Unregularized Online Learning Algorithms

Title Convergence of Unregularized Online Learning Algorithms
Authors Yunwen Lei, Lei Shi, Zheng-Chu Guo
Abstract In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess generalization errors in expectation. A sufficient condition for the almost sure convergence is also given. With high probability, we provide explicit convergence rates of the excess generalization errors for both averaged iterates and the last iterate, which in turn also imply convergence rates with probability one. To the best of our knowledge, this is the first high-probability convergence rate for the last iterate of online gradient descent algorithms without strong convexity. Without any boundedness assumptions on iterates, our results are derived by a novel use of two measures of the algorithm’s one-step progress, respectively by generalization errors and by distances in RKHSs, where the variances of the involved martingales are cancelled out by the descent property of the algorithm.
Tasks
Published 2017-08-09
URL http://arxiv.org/abs/1708.02939v1
PDF http://arxiv.org/pdf/1708.02939v1.pdf
PWC https://paperswithcode.com/paper/convergence-of-unregularized-online-learning
Repo
Framework
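
The algorithm family studied in the abstract above can be sketched as follows: starting from the zero function, each new example (x_t, y_t) triggers the update f_{t+1} = f_t - eta_t * loss'(f_t(x_t), y_t) * K(x_t, .), with no regularization term. The sketch assumes the squared loss 0.5 * (f(x) - y)^2 and a Gaussian kernel; the loss, the kernel, the step-size schedule and all names are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points."""
    diff = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(z, dtype=float))
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * bandwidth ** 2)))


def online_kernel_gd(stream, step_size, kernel=gaussian_kernel):
    """Unregularized online gradient descent in an RKHS with squared loss.

    The iterate is stored as a kernel expansion f = sum_i coeffs[i] * K(centers[i], .).
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        prediction = sum(c * kernel(z, x) for z, c in zip(centers, coeffs))
        # Gradient of 0.5 * (f(x) - y)^2 with respect to f is (f(x) - y) * K(x, .).
        centers.append(x)
        coeffs.append(-step_size(t) * (prediction - y))

    def predict(x):
        return sum(c * kernel(z, x) for z, c in zip(centers, coeffs))

    return predict


# One update from f = 0 on the example (x, y) = (0.0, 2.0) with step size 0.5.
predict = online_kernel_gd([(0.0, 2.0)], step_size=lambda t: 0.5)
```

A polynomially decaying schedule such as `lambda t: 0.5 * t ** -0.5` is a common choice for the step size. The returned `predict` corresponds to the last iterate, one of the two iterate types the abstract discusses.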

Identifying Mirror Symmetry Density with Delay in Spiking Neural Networks

Title Identifying Mirror Symmetry Density with Delay in Spiking Neural Networks
Authors Jonathan K. George, Cesare Soci, Volker J. Sorger
Abstract The ability to rapidly identify symmetry and anti-symmetry is an essential attribute of intelligence. Symmetry perception is a central process in human vision and may be key to human 3D visualization. While previous work in understanding neuron symmetry perception has concentrated on the neuron as an integrator, here we show how the coincidence detecting property of the spiking neuron can be used to reveal symmetry density in spatial data. We develop a method for synchronizing symmetry-identifying spiking artificial neural networks to enable layering and feedback in the network. We show a method for building a network capable of identifying symmetry density between sets of data and present a digital logic implementation demonstrating an 8x8 leaky-integrate-and-fire symmetry detector in a field programmable gate array. Our results show that the efficiencies of spiking neural networks can be harnessed to rapidly identify symmetry in spatial data with applications in image processing, 3D computer vision, and robotics.
Tasks
Published 2017-08-25
URL http://arxiv.org/abs/1709.02684v1
PDF http://arxiv.org/pdf/1709.02684v1.pdf
PWC https://paperswithcode.com/paper/identifying-mirror-symmetry-density-with
Repo
Framework

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

Title A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Authors Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro
Abstract We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.
Tasks
Published 2017-07-29
URL http://arxiv.org/abs/1707.09564v2
PDF http://arxiv.org/pdf/1707.09564v2.pdf
PWC https://paperswithcode.com/paper/a-pac-bayesian-approach-to-spectrally
Repo
Framework