May 6, 2019

3249 words 16 mins read

Paper Group ANR 224


Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs

Title Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs
Authors Mayank Kejriwal
Abstract Several ‘edge-discovery’ applications over graph-based data models are known to have worst-case quadratic time complexity in the nodes, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has invited research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large datasets continue to proliferate, reducing quadratic complexity to make the task practical is an important research problem. Within the entity resolution community, the problem is commonly referred to as blocking. A particular class of learnable blocking schemes, known as Disjunctive Normal Form (DNF) blocking schemes, has emerged as the state of the art for homogeneous (i.e. same-schema) tabular data. Despite the promise of these schemes, a formalism or learning framework has not been developed for them when input data instances are generic, attributed graphs possessing both node and edge heterogeneity. With such a development, the complexity-reducing scope of DNF schemes becomes applicable to a variety of problems, including entity resolution and type alignment between heterogeneous graphs, and link prediction in networks represented as attributed graphs. This paper presents a graph-theoretic formalism for DNF schemes, and investigates their learnability in an optimization framework. We also briefly describe an empirical case study encapsulating some of the principles in this paper.
Tasks Entity Resolution, Link Prediction
Published 2016-05-02
URL http://arxiv.org/abs/1605.00686v2
PDF http://arxiv.org/pdf/1605.00686v2.pdf
PWC https://paperswithcode.com/paper/adaptive-candidate-generation-for-scalable
Repo
Framework
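
As a concrete illustration of how a DNF blocking scheme cuts down the quadratic comparison space, here is a minimal Python sketch. The predicates, the learned scheme, and the record fields (`name`, `zip`) are hypothetical examples for illustration, not taken from the paper.

```python
# Illustrative sketch of a DNF blocking scheme for entity resolution.
# The predicates and the scheme below are hypothetical examples.
from itertools import combinations

def first3_name(rec):   # blocking predicate: first 3 chars of the name
    return rec["name"][:3].lower()

def zip_code(rec):      # blocking predicate: postal code
    return rec["zip"]

# A DNF scheme is a disjunction of conjunctions of predicates:
# here (first3_name AND zip_code) OR (zip_code alone).
SCHEME = [(first3_name, zip_code), (zip_code,)]

def block_key(rec, conjunction):
    return tuple(p(rec) for p in conjunction)

def candidate_pairs(records, scheme):
    """Two records become a candidate pair if they share a block under
    at least one conjunction -- avoiding the all-pairs comparison."""
    pairs = set()
    for conjunction in scheme:
        blocks = {}
        for i, rec in enumerate(records):
            blocks.setdefault(block_key(rec, conjunction), []).append(i)
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs

records = [
    {"name": "Jonathan Smith", "zip": "90210"},
    {"name": "Jon Smith",      "zip": "90210"},
    {"name": "Alice Wong",     "zip": "10001"},
]
print(candidate_pairs(records, SCHEME))  # {(0, 1)}
```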

Mapping Tractography Across Subjects

Title Mapping Tractography Across Subjects
Authors Thien Bao Nguyen, Emanuele Olivetti, Paolo Avesani
Abstract Diffusion magnetic resonance imaging (dMRI) and tractography provide means to study the anatomical structures within the white matter of the brain. When studying tractography data across subjects, it is usually necessary to align, i.e. to register, tractographies together. This registration step is most often performed by applying the transformation resulting from the registration of other volumetric images (T1, FA). In contrast with registration methods that “transform” tractographies, in this work we try to find which streamline in one tractography corresponds to which streamline in the other tractography, without any transformation. In other words, we try to find a “mapping” between the tractographies. We propose a graph-based solution for the tractography mapping problem and we explain similarities and differences with the related, well-known graph matching problem. Specifically, we define a loss function based on the pairwise streamline distance and reformulate the mapping problem as combinatorial optimization of that loss function. We show preliminary promising results in which we compare the proposed method, implemented with simulated annealing, against a standard registration technique in a task of segmentation of the corticospinal tract.
Tasks Combinatorial Optimization, Graph Matching
Published 2016-01-29
URL http://arxiv.org/abs/1601.08165v1
PDF http://arxiv.org/pdf/1601.08165v1.pdf
PWC https://paperswithcode.com/paper/mapping-tractography-across-subjects
Repo
Framework
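
To make the combinatorial-optimization view concrete, here is a hedged simulated-annealing sketch for the mapping problem. The objective below, preserving pairwise streamline distances between the two tractographies, is an assumed graph-matching-style loss, not necessarily the paper's exact one; `DA` and `DB` stand for precomputed pairwise streamline-distance matrices.

```python
# A minimal simulated-annealing sketch for streamline mapping: find a
# map m from tractography A to tractography B that preserves pairwise
# streamline distances. The loss is an assumed graph-matching-style
# objective; DA and DB are precomputed pairwise distance matrices.
import math
import random
import numpy as np

def mapping_loss(m, DA, DB):
    """Distortion of pairwise streamline distances under the mapping m."""
    return float(np.abs(DA - DB[np.ix_(m, m)]).sum())

def map_tractographies(DA, DB, iters=20000, t0=1.0, seed=0):
    """Anneal a map m: streamlines of A -> streamlines of B."""
    rng = random.Random(seed)
    nA, nB = DA.shape[0], DB.shape[0]
    m = [rng.randrange(nB) for _ in range(nA)]      # random initial mapping
    cur = mapping_loss(m, DA, DB)
    for k in range(iters):
        t = t0 * (1.0 - k / iters) + 1e-9           # linear cooling schedule
        i, b = rng.randrange(nA), rng.randrange(nB) # reassign one streamline
        old, m[i] = m[i], b
        new = mapping_loss(m, DA, DB)
        if new < cur or rng.random() < math.exp((cur - new) / t):
            cur = new                               # accept the move
        else:
            m[i] = old                              # reject, roll back
    return m, cur

# toy demo: DB is DA with streamlines relabeled by a random permutation,
# so a zero-loss mapping exists; the annealed loss drops far below the
# random initialization even if it does not always reach zero
n = 12
DA = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
P = np.random.RandomState(0).permutation(n)
m, final_loss = map_tractographies(DA, DA[np.ix_(P, P)])
print(final_loss)
```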

A new primal-dual algorithm for minimizing the sum of three functions with a linear operator

Title A new primal-dual algorithm for minimizing the sum of three functions with a linear operator
Authors Ming Yan
Abstract In this paper, we propose a new primal-dual algorithm for minimizing $f(x) + g(x) + h(Ax)$, where $f$, $g$, and $h$ are proper lower semi-continuous convex functions, $f$ is differentiable with a Lipschitz continuous gradient, and $A$ is a bounded linear operator. The proposed algorithm includes several famous primal-dual algorithms for minimizing the sum of two functions as special cases: for example, it reduces to the Chambolle-Pock algorithm when $f = 0$ and to the proximal alternating predictor-corrector when $g = 0$. For the general convex case, we prove the convergence of the new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator. In addition, we prove the $O(1/k)$ ergodic convergence rate in the primal-dual gap. With additional assumptions, we derive the linear convergence rate in terms of the distance to the fixed point. Compared to other primal-dual algorithms for solving the same problem, this algorithm extends the range of acceptable parameters that ensure convergence and has a smaller per-iteration cost. Numerical experiments show the efficiency of this algorithm.
Tasks
Published 2016-11-29
URL http://arxiv.org/abs/1611.09805v4
PDF http://arxiv.org/pdf/1611.09805v4.pdf
PWC https://paperswithcode.com/paper/a-new-primal-dual-algorithm-for-minimizing
Repo
Framework
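
For context, here is a minimal sketch of a classic baseline for the same three-function problem, a Condat-Vũ-style primal-dual iteration (explicitly not the new algorithm proposed in the paper). The example instance, f(x) = 0.5||x - b||^2, g = lam*||.||_1, h = mu*||.||_1, and the parameter choices are assumptions for illustration.

```python
# A hedged sketch of a Condat-Vu-style primal-dual iteration for
# min_x f(x) + g(x) + h(Ax) -- a classic baseline for this problem,
# NOT the new algorithm proposed in the paper.
import numpy as np

def soft(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def condat_vu(A, b, lam=0.1, mu=0.1, iters=500):
    """min_x 0.5||x - b||^2 + lam||x||_1 + mu||Ax||_1 (b lives in x's space)."""
    m, n = A.shape
    L = 1.0                                        # Lipschitz constant of grad f
    normA2 = np.linalg.norm(A, 2) ** 2
    tau = 0.5
    sigma = 0.9 * (1.0 / tau - L / 2.0) / normA2   # 1/tau - sigma*||A||^2 >= L/2
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        grad = x - b                               # gradient of f
        x_new = soft(x - tau * (grad + A.T @ y), tau * lam)   # prox of tau*g
        u = y + sigma * A @ (2 * x_new - x)
        y = u - sigma * soft(u / sigma, mu / sigma)  # prox of sigma*h* (Moreau)
        x = x_new
    return x

x_hat = condat_vu(np.random.randn(30, 60), np.random.randn(60))
```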

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Title On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays
Authors Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin
Abstract Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and large numbers of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents a convergence analysis of an async-parallel method from a probabilistic viewpoint that allows for large unbounded delays. An explicit stepsize formula that guarantees convergence is given, depending on the delays’ statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay-induced stepsize is too conservative, often slowing down the convergence of the algorithm.
Tasks
Published 2016-12-13
URL http://arxiv.org/abs/1612.04425v2
PDF http://arxiv.org/pdf/1612.04425v2.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-asynchronous-parallel
Repo
Framework
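
A toy simulation of the setting, illustrative only: gradient steps computed from stale reads with Poisson-distributed delays on a simple quadratic. The stepsize discount by the expected delay is a stand-in heuristic, not the explicit formula derived in the paper.

```python
# Illustrative simulation of async-parallel gradient descent with
# Poisson-distributed staleness on f(x) = 0.5*||x||^2. The stepsize
# discount is a stand-in heuristic, not the paper's formula.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 8                        # problem dimension; p+1 "processors"
x_hist = [rng.standard_normal(n)]   # history of iterates
step = 0.5 / (1.0 + p)              # conservative discount by E[delay] = p

for _ in range(2000):
    delay = min(rng.poisson(p), len(x_hist) - 1)  # age of the read iterate
    stale_x = x_hist[-1 - delay]                  # outdated information
    grad = stale_x                                # gradient of 0.5*||x||^2
    x_hist.append(x_hist[-1] - step * grad)

print(np.linalg.norm(x_hist[-1]))   # typically near 0 despite the staleness
```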

Quantum spectral analysis: frequency in time, with applications to signal and image processing

Title Quantum spectral analysis: frequency in time, with applications to signal and image processing
Authors Mario Mastriani
Abstract A quantum time-dependent spectrum analysis, or simply quantum spectral analysis (QSA), is presented in this work. It is based on the Schrodinger equation, a partial differential equation that describes how the quantum state of a non-relativistic physical system changes with time. In the classical world it is called frequency in time (FIT), and it is presented here in opposition to, and as a complement of, traditional frequency-dependent spectral analysis based on Fourier theory. FIT is also a metric that assesses the impact of the flanks of a signal on its frequency spectrum, which is not taken into account by Fourier theory, much less in real time. Moreover, unlike all tools derived from Fourier theory (i.e., the continuous, discrete, fast, short-time, fractional and quantum Fourier transforms, as well as Gabor), FIT has the following advantages: a) compact support with excellent energy-output treatment; b) low computational cost, O(N) for signals and O(N^2) for images; c) no phase uncertainties (indeterminate phase for magnitude = 0), unlike the Discrete and Fast Fourier Transforms (DFT and FFT, respectively); d) among others. In fact, FIT constitutes one side of a triangle (which is hereby closed) consisting of the original signal in time, spectral analysis based on Fourier theory, and FIT. This completes a toolbox that is essential for all applications of Digital Signal Processing (DSP) and Digital Image Processing (DIP); in the latter, FIT enables edge detection (called flank detection in the case of signals), denoising, despeckling, compression, and superresolution of still images. Such applications include signals intelligence and imagery intelligence. We also present other DIP tools derived from the Schrodinger equation.
Tasks Denoising, Edge Detection
Published 2016-10-11
URL https://arxiv.org/abs/1611.02302v7
PDF https://arxiv.org/pdf/1611.02302v7.pdf
PWC https://paperswithcode.com/paper/quantum-spectral-analysis-frequency-in-time
Repo
Framework

Fractional Calculus In Image Processing: A Review

Title Fractional Calculus In Image Processing: A Review
Authors Qi Yang, Dali Chen, Tiebiao Zhao, YangQuan Chen
Abstract Over the last decade, it has been demonstrated that many systems in science and engineering can be modeled more accurately by fractional-order than integer-order derivatives, and many methods have been developed to solve the problem of fractional systems. Due to the extra free parameter (the order), fractional-order methods provide an additional degree of freedom for optimizing performance. Not surprisingly, many fractional-order methods have been used in the image processing field. Here, recent studies are reviewed in ten sub-fields: image enhancement, image denoising, image edge detection, image segmentation, image registration, image recognition, image fusion, image encryption, image compression and image restoration. In sum, it is well established that, as a fundamental mathematical tool, the fractional-order derivative shows great success in image processing.
Tasks Denoising, Edge Detection, Image Compression, Image Denoising, Image Enhancement, Image Registration, Image Restoration, Semantic Segmentation
Published 2016-08-10
URL http://arxiv.org/abs/1608.03240v1
PDF http://arxiv.org/pdf/1608.03240v1.pdf
PWC https://paperswithcode.com/paper/fractional-calculus-in-image-processing-a
Repo
Framework
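
As one representative building block from this literature, here is a sketch of the Grünwald-Letnikov fractional-derivative mask that underlies many fractional-order edge detectors; the order `alpha` and mask length `K` are illustrative choices.

```python
# A minimal sketch of a Grunwald-Letnikov fractional-derivative mask,
# a standard construction behind fractional-order edge detectors;
# alpha and K below are illustrative assumptions.
import numpy as np

def gl_coeffs(alpha, K):
    """Coefficients c_k = (-1)^k * binom(alpha, k), computed recursively."""
    c = np.empty(K)
    c[0] = 1.0
    for k in range(1, K):
        c[k] = c[k - 1] * (1.0 - (alpha + 1.0) / k)
    return c

def frac_derivative_1d(signal, alpha=0.5, K=8):
    """Approximate D^alpha along one axis: sum_k c_k * f[i - k]."""
    c = gl_coeffs(alpha, K)
    out = np.zeros_like(signal, dtype=float)
    for k in range(K):
        out[k:] += c[k] * signal[:len(signal) - k]
    return out

edge = np.r_[np.zeros(10), np.ones(10)]     # a step edge
print(np.round(frac_derivative_1d(edge), 3))
# the response peaks at the step and decays slowly -- the long-memory
# behavior that distinguishes fractional from integer derivatives
```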

Semantically Guided Depth Upsampling

Title Semantically Guided Depth Upsampling
Authors Nick Schneider, Lukas Schneider, Peter Pinggera, Uwe Franke, Marc Pollefeys, Christoph Stiller
Abstract We present a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues only and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth interpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines globally consistent solutions and preserves fine details and sharp depth boundaries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state-of-the-art, even for very sparse measurements.
Tasks Edge Detection, Scene Labeling
Published 2016-08-02
URL http://arxiv.org/abs/1608.00753v1
PDF http://arxiv.org/pdf/1608.00753v1.pdf
PWC https://paperswithcode.com/paper/semantically-guided-depth-upsampling
Repo
Framework
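
The guidance idea can be sketched in a very reduced form: propagate each sparse depth measurement to the pixels that are geodesically nearest, with a step cost that grows across intensity edges. This toy version omits the paper's semantic labels, locally planar elements, and global energy minimization.

```python
# Simplified sketch of image-guided depth upsampling: each pixel takes
# the depth of the geodesically nearest sparse measurement, where the
# geodesic cost penalizes crossing intensity edges. Illustrative only.
import heapq
import numpy as np

def geodesic_upsample(intensity, sparse_depth, beta=10.0):
    """intensity: (H,W) float image; sparse_depth: (H,W), NaN where unknown."""
    H, W = intensity.shape
    dist = np.full((H, W), np.inf)
    depth = np.full((H, W), np.nan)
    heap = []
    for i, j in zip(*np.where(~np.isnan(sparse_depth))):
        dist[i, j] = 0.0
        depth[i, j] = sparse_depth[i, j]
        heapq.heappush(heap, (0.0, i, j))
    while heap:                                   # Dijkstra over the pixel grid
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < H and 0 <= b < W:
                # stepping across an intensity edge is expensive
                cost = 1.0 + beta * abs(intensity[a, b] - intensity[i, j])
                if d + cost < dist[a, b]:
                    dist[a, b] = d + cost
                    depth[a, b] = depth[i, j]     # inherit nearest seed's depth
                    heapq.heappush(heap, (d + cost, a, b))
    return depth

I, D = np.ones((4, 6)), np.full((4, 6), np.nan)
I[:, 3:] = 0.0                  # an intensity edge between columns 2 and 3
D[0, 0], D[0, 5] = 1.0, 9.0     # two sparse depth measurements
print(geodesic_upsample(I, D))  # depths do not bleed across the edge
```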

Poor starting points in machine learning

Title Poor starting points in machine learning
Authors Mark Tygert
Abstract Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor ones: for poor starting points, Nesterov acceleration can help during the initial iterations, even though Nesterov methods that are not designed for stochastic approximation can hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
Tasks
Published 2016-02-09
URL http://arxiv.org/abs/1602.02823v1
PDF http://arxiv.org/pdf/1602.02823v1.pdf
PWC https://paperswithcode.com/paper/poor-starting-points-in-machine-learning
Repo
Framework
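
A small sketch contrasting the two updates discussed above, plain SGD and a Nesterov-accelerated variant, from a deliberately poor starting point on a least-squares toy problem; the problem, stepsize, and momentum value are illustrative assumptions, not the paper's experiments.

```python
# Plain SGD vs. a Nesterov-accelerated update on a quadratic, started
# far from the optimum. Problem and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50)) / np.sqrt(200)
b = A @ np.ones(50)                       # the optimum is the all-ones vector

def grad(x, batch):
    """Minibatch gradient of 0.5*||Ax - b||^2 (rescaled to the full sum)."""
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ x - bb) * (len(b) / len(batch))

x_sgd = np.full(50, 100.0)                # a deliberately poor starting point
x_nes = x_sgd.copy()
v = np.zeros(50)
eta, mu = 0.1, 0.9
for _ in range(100):
    batch = rng.choice(len(b), size=20, replace=False)
    x_sgd = x_sgd - eta * grad(x_sgd, batch)
    v = mu * v - eta * grad(x_nes + mu * v, batch)   # Nesterov look-ahead
    x_nes = x_nes + v

# the accelerated run typically reaches a lower error in the same budget
print(np.linalg.norm(x_sgd - 1.0), np.linalg.norm(x_nes - 1.0))
```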

Causality and Responsibility for Formal Verification and Beyond

Title Causality and Responsibility for Formal Verification and Beyond
Authors Hana Chockler
Abstract The theory of actual causality, defined by Halpern and Pearl, and its quantitative measure, the degree of responsibility, have been shown to be extremely useful in various areas of computer science due to a good match between the results they produce and our intuition. In this paper, I describe the applications of causality to formal verification, namely, explanation of counterexamples, refinement of coverage metrics, and symbolic trajectory evaluation. I also briefly discuss recent applications of causality to legal reasoning.
Tasks
Published 2016-08-29
URL http://arxiv.org/abs/1608.07879v1
PDF http://arxiv.org/pdf/1608.07879v1.pdf
PWC https://paperswithcode.com/paper/causality-and-responsibility-for-formal
Repo
Framework

Semantics derived automatically from language corpora contain human-like biases

Title Semantics derived automatically from language corpora contain human-like biases
Authors Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan
Abstract Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language—the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model—namely, the GloVe word embedding—trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
Tasks
Published 2016-08-25
URL http://arxiv.org/abs/1608.07187v4
PDF http://arxiv.org/pdf/1608.07187v4.pdf
PWC https://paperswithcode.com/paper/semantics-derived-automatically-from-language
Repo
Framework
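
The WEAT statistic itself is simple to state: the differential association of two sets of target words with two sets of attribute words, measured by cosine similarity of their embeddings. A compact sketch follows, with made-up 2-d vectors standing in for real word embeddings.

```python
# A compact sketch of the WEAT effect size. The toy 2-d "embeddings"
# below are made up purely for illustration.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """s(w, A, B): mean cosine to attributes A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Differential association of targets X vs. Y with attributes A vs. B."""
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# toy vectors: "flowers" near "pleasant", "insects" near "unpleasant"
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]   # flowers
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]   # insects
A = [np.array([1.0, 0.0])]                          # pleasant
B = [np.array([0.0, 1.0])]                          # unpleasant
print(weat_effect_size(X, Y, A, B))                 # large positive effect
```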

Clustering Via Crowdsourcing

Title Clustering Via Crowdsourcing
Authors Arya Mazumdar, Barna Saha
Abstract In recent years, crowdsourcing, a.k.a. human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensation. Therefore, a crowd-based algorithm must judiciously use any information computed through an automated process, and adaptively ask a minimum number of questions to the crowd. One such problem which has received significant attention is entity resolution. Formally, we are given a graph $G=(V,E)$ with unknown edge set $E$, where $G$ is a union of $k$ (again unknown, but typically large, $O(n^\alpha)$ for $\alpha>0$) disjoint cliques $G_i(V_i, E_i)$, $i = 1, \dots, k$. The goal is to retrieve the sets $V_i$ by making a minimum number of pairwise queries $V \times V \to \{\pm 1\}$ to an oracle (the crowd). When the answer to each query is correct, e.g. via resampling, this reduces to finding connected components in a graph. On the other hand, when crowd answers may be incorrect, it corresponds to clustering over a minimum number of noisy inputs. Even with perfect answers, a simple matching lower and upper bound of $\Theta(nk)$ on the query complexity can be shown. A major contribution of this paper is to reduce the query complexity to linear or even sublinear in $n$ when mild side information is provided by a machine, even in the presence of crowd errors that are not correctable via resampling. We develop new information-theoretic lower bounds on the query complexity of clustering with side information and errors, and our upper bounds closely match them. Our algorithms are naturally parallelizable, and also give near-optimal bounds on the number of adaptive rounds required to match the query complexity.
Tasks Entity Resolution
Published 2016-04-07
URL http://arxiv.org/abs/1604.01839v1
PDF http://arxiv.org/pdf/1604.01839v1.pdf
PWC https://paperswithcode.com/paper/clustering-via-crowdsourcing
Repo
Framework
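
The $\Theta(nk)$ baseline with a perfect oracle is easy to sketch: compare each new item against one representative per existing cluster. The paper's contributions, using machine-generated side information and tolerating erroneous answers, are deliberately omitted from this toy version.

```python
# Sketch of the O(nk) baseline with a perfect oracle: compare each new
# element against one representative per existing cluster. The paper's
# side-information and noise-handling machinery is omitted.
def cluster_with_oracle(items, same_cluster):
    """same_cluster(u, v) -> bool plays the role of the crowd oracle."""
    clusters = []                      # each cluster keeps its members
    queries = 0
    for u in items:
        for c in clusters:
            queries += 1
            if same_cluster(u, c[0]):  # ask about one representative
                c.append(u)
                break
        else:
            clusters.append([u])       # no match: open a new cluster
    return clusters, queries

# toy ground truth: cluster = value mod 3
items = list(range(12))
clusters, q = cluster_with_oracle(items, lambda u, v: u % 3 == v % 3)
print(clusters, q)   # 3 clusters; about n*k queries in the worst case
```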

Fast Learning from Distributed Datasets without Entity Matching

Title Fast Learning from Distributed Datasets without Entity Matching
Authors Giorgio Patrini, Richard Nock, Stephen Hardy, Tiberio Caetano
Abstract Consider the following data fusion scenario: two datasets/peers contain the same real-world entities described using partially shared features, e.g. banking and insurance company records of the same customer base. Our goal is to learn a classifier in the cross product space of the two domains, in the hard case in which no shared ID is available, e.g. due to anonymization. Traditionally, the problem is approached by first addressing entity matching and subsequently learning the classifier in a standard manner. We present an end-to-end solution which bypasses matching entities, based on the recently introduced concept of Rademacher observations (rados). Informally, we replace the minimisation of a loss over examples, which requires solving entity resolution, by the equivalent minimisation of a (different) loss over rados. Among the key properties we show are (i) that a potentially huge subset of these rados does not require entity matching, and (ii) that the algorithm that provably minimizes the rado loss over these rados has smaller time and space complexities than the algorithm minimizing the equivalent example loss. Lastly, we relax a key assumption of the model, that the data is vertically partitioned among peers; in this case, we would not even know of the existence of a solution to entity resolution. In this more general setting, experiments validate the possibility of significantly beating even the optimal peer in hindsight.
Tasks Entity Resolution
Published 2016-03-13
URL http://arxiv.org/abs/1603.04002v1
PDF http://arxiv.org/pdf/1603.04002v1.pdf
PWC https://paperswithcode.com/paper/fast-learning-from-distributed-datasets
Repo
Framework

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Title ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Authors Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou
Abstract Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities, and merging them into single representations. Relatively recently, declarative rules called “matching dependencies” (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language “LogiQL”, an extended form of Datalog supported by the “LogicBlox” platform, for all activities related to data processing, and for the specification and enforcement of MDs.
Tasks Entity Resolution
Published 2016-02-07
URL http://arxiv.org/abs/1602.02334v3
PDF http://arxiv.org/pdf/1602.02334v3.pdf
PWC https://paperswithcode.com/paper/erblox-combining-matching-dependencies-with
Repo
Framework

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

Title Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning
Authors Noam Brown, Tuomas Sandholm
Abstract Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly-performing actions to be temporarily pruned, thus speeding up CFR. We introduce Total RBP, a new form of RBP that reduces the space requirements of CFR as actions are pruned. We prove that in zero-sum games it asymptotically prunes any action that is not part of a best response to some Nash equilibrium. This leads to provably faster convergence and lower space requirements. Experiments show that Total RBP results in an order of magnitude reduction in space, and the reduction factor increases with game size.
Tasks
Published 2016-09-12
URL http://arxiv.org/abs/1609.03234v1
PDF http://arxiv.org/pdf/1609.03234v1.pdf
PWC https://paperswithcode.com/paper/reduced-space-and-faster-convergence-in-1
Repo
Framework
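
For readers unfamiliar with CFR's inner loop, here is a sketch of regret matching, the strategy-update rule that CFR iterates at every decision point; RBP and Total RBP prune and de-allocate poorly performing actions on top of this rule, which the sketch does not show. The utilities are made-up numbers for a single decision point.

```python
# Sketch of regret matching, the core strategy-update rule inside CFR:
# play each action with probability proportional to its positive
# cumulative regret. RBP / Total RBP are not shown.
import numpy as np

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy over actions."""
    pos = np.maximum(cum_regret, 0.0)
    if pos.sum() > 0:
        return pos / pos.sum()
    return np.full(len(pos), 1.0 / len(pos))       # fall back to uniform

# repeated updates at a single decision point with fixed toy utilities
cum_regret = np.zeros(3)
utils = np.array([1.0, -0.5, 0.2])                 # counterfactual utilities
for _ in range(100):
    sigma = regret_matching(cum_regret)
    ev = sigma @ utils                             # expected value under sigma
    cum_regret += utils - ev                       # regret of each pure action
print(regret_matching(cum_regret))                 # mass concentrates on action 0
```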

Mapping Data to Ontologies with Exceptions Using Answer Set Programming

Title Mapping Data to Ontologies with Exceptions Using Answer Set Programming
Authors Daniel P. Lupp, Evgenij Thorstensen
Abstract In ontology-based data access, databases are connected to an ontology via mappings from queries over the database to queries over the ontology. In this paper, we consider mappings from relational databases to first-order ontologies, and define an ASP-based framework for GLAV mappings with queries over the ontology in the mapping rule bodies. We show that this type of mapping can be used to express constraints and exceptions, as well as being a powerful mechanism for succinctly representing OBDA mappings. We give an algorithm for brave reasoning in this setting, and show that this problem either has the same data complexity as ASP (NP-complete), or is at least as hard as checking entailment for the ontology queries. Furthermore, we show that for ontologies with UCQ-rewritable queries there exists a natural reduction from mapping programs to $\exists$-ASP, an extension of ASP with existential variables that itself admits a natural reduction to ASP.
Tasks
Published 2016-07-07
URL http://arxiv.org/abs/1607.02018v1
PDF http://arxiv.org/pdf/1607.02018v1.pdf
PWC https://paperswithcode.com/paper/mapping-data-to-ontologies-with-exceptions
Repo
Framework