May 6, 2019

3249 words 16 mins read

Paper Group ANR 224


Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs

Title Adaptive Candidate Generation for Scalable Edge-discovery Tasks on Data Graphs
Authors Mayank Kejriwal
Abstract Several ‘edge-discovery’ applications over graph-based data models are known to have worst-case quadratic time complexity in the nodes, even if the discovered edges are sparse. One example is the generic link discovery problem between two graphs, which has invited research interest in several communities. Specific versions of this problem include link prediction in social networks, ontology alignment between metadata-rich RDF data, approximate joins, and entity resolution between instance-rich data. As large datasets continue to proliferate, reducing quadratic complexity to make the task practical is an important research problem. Within the entity resolution community, the problem is commonly referred to as blocking. A particular class of learnable blocking schemes, known as Disjunctive Normal Form (DNF) blocking schemes, has emerged as the state of the art for homogeneous (i.e. same-schema) tabular data. Despite the promise of these schemes, a formalism or learning framework has not been developed for them when input data instances are generic, attributed graphs possessing both node and edge heterogeneity. With such a development, the complexity-reducing scope of DNF schemes becomes applicable to a variety of problems, including entity resolution and type alignment between heterogeneous graphs, and link prediction in networks represented as attributed graphs. This paper presents a graph-theoretic formalism for DNF schemes, and investigates their learnability in an optimization framework. We also briefly describe an empirical case study encapsulating some of the principles in this paper.
Tasks Entity Resolution, Link Prediction
Published 2016-05-02
URL http://arxiv.org/abs/1605.00686v2
PDF http://arxiv.org/pdf/1605.00686v2.pdf
PWC https://paperswithcode.com/paper/adaptive-candidate-generation-for-scalable
Repo
Framework
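
As a concrete illustration of how a DNF blocking scheme cuts down the quadratic comparison space, here is a minimal Python sketch. The predicates, the learned scheme, and the record fields (`name`, `zip`) are hypothetical examples for illustration, not taken from the paper.

```python
# Illustrative sketch of a DNF blocking scheme for entity resolution.
# The predicates and the scheme below are hypothetical examples.
from itertools import combinations

def first3_name(rec):   # blocking predicate: first 3 chars of the name
    return rec["name"][:3].lower()

def zip_code(rec):      # blocking predicate: postal code
    return rec["zip"]

# A DNF scheme is a disjunction of conjunctions of predicates:
# here (first3_name AND zip_code) OR (zip_code alone).
SCHEME = [(first3_name, zip_code), (zip_code,)]

def block_key(rec, conjunction):
    return tuple(p(rec) for p in conjunction)

def candidate_pairs(records, scheme):
    """Two records become a candidate pair if they share a block under
    at least one conjunction -- avoiding the all-pairs comparison."""
    pairs = set()
    for conjunction in scheme:
        blocks = {}
        for i, rec in enumerate(records):
            blocks.setdefault(block_key(rec, conjunction), []).append(i)
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs

records = [
    {"name": "Jonathan Smith", "zip": "90210"},
    {"name": "Jon Smith",      "zip": "90210"},
    {"name": "Alice Wong",     "zip": "10001"},
]
print(candidate_pairs(records, SCHEME))  # {(0, 1)}
```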

Mapping Tractography Across Subjects

Title Mapping Tractography Across Subjects
Authors Thien Bao Nguyen, Emanuele Olivetti, Paolo Avesani
Abstract Diffusion magnetic resonance imaging (dMRI) and tractography provide means to study the anatomical structures within the white matter of the brain. When studying tractography data across subjects, it is usually necessary to align, i.e. to register, tractographies together. This registration step is most often performed by applying the transformation resulting from the registration of other volumetric images (T1, FA). In contrast with registration methods that “transform” tractographies, in this work we try to find which streamline in one tractography corresponds to which streamline in the other tractography, without any transformation. In other words, we try to find a “mapping” between the tractographies. We propose a graph-based solution for the tractography mapping problem and we explain similarities and differences with the related, well-known graph matching problem. Specifically, we define a loss function based on the pairwise streamline distance and reformulate the mapping problem as combinatorial optimization of that loss function. We show preliminary promising results in which we compare the proposed method, implemented with simulated annealing, against a standard registration technique in a task of segmentation of the corticospinal tract.
Tasks Combinatorial Optimization, Graph Matching
Published 2016-01-29
URL http://arxiv.org/abs/1601.08165v1
PDF http://arxiv.org/pdf/1601.08165v1.pdf
PWC https://paperswithcode.com/paper/mapping-tractography-across-subjects
Repo
Framework
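
To make the combinatorial-optimization view concrete, here is a hedged simulated-annealing sketch for the mapping problem. The objective below, preserving pairwise streamline distances between the two tractographies, is an assumed graph-matching-style loss, not necessarily the paper's exact one; `DA` and `DB` stand for precomputed pairwise streamline-distance matrices.

```python
# A minimal simulated-annealing sketch for streamline mapping: find a
# map m from tractography A to tractography B that preserves pairwise
# streamline distances. The loss is an assumed graph-matching-style
# objective; DA and DB are precomputed pairwise distance matrices.
import math
import random
import numpy as np

def mapping_loss(m, DA, DB):
    """Distortion of pairwise streamline distances under the mapping m."""
    return float(np.abs(DA - DB[np.ix_(m, m)]).sum())

def map_tractographies(DA, DB, iters=20000, t0=1.0, seed=0):
    """Anneal a map m: streamlines of A -> streamlines of B."""
    rng = random.Random(seed)
    nA, nB = DA.shape[0], DB.shape[0]
    m = [rng.randrange(nB) for _ in range(nA)]      # random initial mapping
    cur = mapping_loss(m, DA, DB)
    for k in range(iters):
        t = t0 * (1.0 - k / iters) + 1e-9           # linear cooling schedule
        i, b = rng.randrange(nA), rng.randrange(nB) # reassign one streamline
        old, m[i] = m[i], b
        new = mapping_loss(m, DA, DB)
        if new < cur or rng.random() < math.exp((cur - new) / t):
            cur = new                               # accept the move
        else:
            m[i] = old                              # reject, roll back
    return m, cur

# toy demo: DB is DA with streamlines relabeled by a random permutation,
# so a zero-loss mapping exists; the annealed loss drops far below the
# random initialization even if it does not always reach zero
n = 12
DA = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
P = np.random.RandomState(0).permutation(n)
m, final_loss = map_tractographies(DA, DA[np.ix_(P, P)])
print(final_loss)
```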

A new primal-dual algorithm for minimizing the sum of three functions with a linear operator

Title A new primal-dual algorithm for minimizing the sum of three functions with a linear operator
Authors Ming Yan
Abstract In this paper, we propose a new primal-dual algorithm for minimizing $f(x) + g(x) + h(Ax)$, where $f$, $g$, and $h$ are proper lower semi-continuous convex functions, $f$ is differentiable with a Lipschitz continuous gradient, and $A$ is a bounded linear operator. The proposed algorithm includes several famous primal-dual algorithms for minimizing the sum of two functions as special cases: for example, it reduces to the Chambolle-Pock algorithm when $f = 0$ and to the proximal alternating predictor-corrector when $g = 0$. For the general convex case, we prove the convergence of the new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator. In addition, we prove the $O(1/k)$ ergodic convergence rate in the primal-dual gap. With additional assumptions, we derive the linear convergence rate in terms of the distance to the fixed point. Compared to other primal-dual algorithms for solving the same problem, this algorithm extends the range of acceptable parameters that ensure convergence and has a smaller per-iteration cost. Numerical experiments show the efficiency of this algorithm.
Tasks
Published 2016-11-29
URL http://arxiv.org/abs/1611.09805v4
PDF http://arxiv.org/pdf/1611.09805v4.pdf
PWC https://paperswithcode.com/paper/a-new-primal-dual-algorithm-for-minimizing
Repo
Framework
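
For context, here is a minimal sketch of a classic baseline for the same three-function problem, a Condat-Vũ-style primal-dual iteration (explicitly not the new algorithm proposed in the paper). The example instance, f(x) = 0.5||x - b||^2, g = lam*||.||_1, h = mu*||.||_1, and the parameter choices are assumptions for illustration.

```python
# A hedged sketch of a Condat-Vu-style primal-dual iteration for
# min_x f(x) + g(x) + h(Ax) -- a classic baseline for this problem,
# NOT the new algorithm proposed in the paper.
import numpy as np

def soft(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def condat_vu(A, b, lam=0.1, mu=0.1, iters=500):
    """min_x 0.5||x - b||^2 + lam||x||_1 + mu||Ax||_1 (b lives in x's space)."""
    m, n = A.shape
    L = 1.0                                        # Lipschitz constant of grad f
    normA2 = np.linalg.norm(A, 2) ** 2
    tau = 0.5
    sigma = 0.9 * (1.0 / tau - L / 2.0) / normA2   # 1/tau - sigma*||A||^2 >= L/2
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        grad = x - b                               # gradient of f
        x_new = soft(x - tau * (grad + A.T @ y), tau * lam)   # prox of tau*g
        u = y + sigma * A @ (2 * x_new - x)
        y = u - sigma * soft(u / sigma, mu / sigma)  # prox of sigma*h* (Moreau)
        x = x_new
    return x

x_hat = condat_vu(np.random.randn(30, 60), np.random.randn(60))
```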

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Title On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays
Authors Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin
Abstract Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and large numbers of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents a convergence analysis of an async-parallel method from a probabilistic viewpoint that allows for large unbounded delays. An explicit stepsize formula that guarantees convergence is given, depending on the delays’ statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay-induced stepsize is too conservative, often slowing down the convergence of the algorithm.
Tasks
Published 2016-12-13
URL http://arxiv.org/abs/1612.04425v2
PDF http://arxiv.org/pdf/1612.04425v2.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-asynchronous-parallel
Repo
Framework
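
A toy simulation of the setting, illustrative only: gradient steps computed from stale reads with Poisson-distributed delays on a simple quadratic. The stepsize discount by the expected delay is a stand-in heuristic, not the explicit formula derived in the paper.

```python
# Illustrative simulation of async-parallel gradient descent with
# Poisson-distributed staleness on f(x) = 0.5*||x||^2. The stepsize
# discount is a stand-in heuristic, not the paper's formula.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 8                        # problem dimension; p+1 "processors"
x_hist = [rng.standard_normal(n)]   # history of iterates
step = 0.5 / (1.0 + p)              # conservative discount by E[delay] = p

for _ in range(2000):
    delay = min(rng.poisson(p), len(x_hist) - 1)  # age of the read iterate
    stale_x = x_hist[-1 - delay]                  # outdated information
    grad = stale_x                                # gradient of 0.5*||x||^2
    x_hist.append(x_hist[-1] - step * grad)

print(np.linalg.norm(x_hist[-1]))   # typically near 0 despite the staleness
```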

Quantum spectral analysis: frequency in time, with applications to signal and image processing

Title Quantum spectral analysis: frequency in time, with applications to signal and image processing
Authors Mario Mastriani
Abstract A quantum time-dependent spectrum analysis, or simply quantum spectral analysis (QSA), is presented in this work. It is based on the Schrodinger equation, a partial differential equation that describes how the quantum state of a non-relativistic physical system changes with time. In the classical world it is called frequency in time (FIT), and it is presented here in opposition to, and as a complement of, traditional frequency-dependent spectral analysis based on Fourier theory. FIT is also a metric that assesses the impact of the flanks of a signal on its frequency spectrum, which is not taken into account by Fourier theory, much less in real time. Moreover, unlike all tools derived from Fourier theory (i.e., the continuous, discrete, fast, short-time, fractional and quantum Fourier transforms, as well as Gabor), FIT has the following advantages: a) compact support with excellent energy-output treatment; b) low computational cost, O(N) for signals and O(N^2) for images; c) no phase uncertainties (indeterminate phase for magnitude = 0), unlike the Discrete and Fast Fourier Transforms (DFT and FFT, respectively); d) among others. In fact, FIT constitutes one side of a triangle (which is hereby closed) consisting of the original signal in time, spectral analysis based on Fourier theory, and FIT. This completes a toolbox that is essential for all applications of Digital Signal Processing (DSP) and Digital Image Processing (DIP); in the latter, FIT enables edge detection (called flank detection in the case of signals), denoising, despeckling, compression, and superresolution of still images. Such applications include signals intelligence and imagery intelligence. We also present other DIP tools derived from the Schrodinger equation.
Tasks Denoising, Edge Detection
Published 2016-10-11
URL https://arxiv.org/abs/1611.02302v7
PDF https://arxiv.org/pdf/1611.02302v7.pdf
PWC https://paperswithcode.com/paper/quantum-spectral-analysis-frequency-in-time
Repo
Framework

Fractional Calculus In Image Processing: A Review

Title Fractional Calculus In Image Processing: A Review
Authors Qi Yang, Dali Chen, Tiebiao Zhao, YangQuan Chen
Abstract Over the last decade, it has been demonstrated that many systems in science and engineering can be modeled more accurately by fractional-order than integer-order derivatives, and many methods have been developed to solve the problem of fractional systems. Due to the extra free parameter (the order), fractional-order methods provide an additional degree of freedom for optimizing performance. Not surprisingly, many fractional-order methods have been used in the image processing field. Here, recent studies are reviewed in ten sub-fields: image enhancement, image denoising, image edge detection, image segmentation, image registration, image recognition, image fusion, image encryption, image compression and image restoration. In sum, it is well established that, as a fundamental mathematical tool, the fractional-order derivative shows great success in image processing.
Tasks Denoising, Edge Detection, Image Compression, Image Denoising, Image Enhancement, Image Registration, Image Restoration, Semantic Segmentation
Published 2016-08-10
URL http://arxiv.org/abs/1608.03240v1
PDF http://arxiv.org/pdf/1608.03240v1.pdf
PWC https://paperswithcode.com/paper/fractional-calculus-in-image-processing-a
Repo
Framework
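
As one representative building block from this literature, here is a sketch of the Grünwald-Letnikov fractional-derivative mask that underlies many fractional-order edge detectors; the order `alpha` and mask length `K` are illustrative choices.

```python
# A minimal sketch of a Grunwald-Letnikov fractional-derivative mask,
# a standard construction behind fractional-order edge detectors;
# alpha and K below are illustrative assumptions.
import numpy as np

def gl_coeffs(alpha, K):
    """Coefficients c_k = (-1)^k * binom(alpha, k), computed recursively."""
    c = np.empty(K)
    c[0] = 1.0
    for k in range(1, K):
        c[k] = c[k - 1] * (1.0 - (alpha + 1.0) / k)
    return c

def frac_derivative_1d(signal, alpha=0.5, K=8):
    """Approximate D^alpha along one axis: sum_k c_k * f[i - k]."""
    c = gl_coeffs(alpha, K)
    out = np.zeros_like(signal, dtype=float)
    for k in range(K):
        out[k:] += c[k] * signal[:len(signal) - k]
    return out

edge = np.r_[np.zeros(10), np.ones(10)]     # a step edge
print(np.round(frac_derivative_1d(edge), 3))
# the response peaks at the step and decays slowly -- the long-memory
# behavior that distinguishes fractional from integer derivatives
```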

Semantically Guided Depth Upsampling

Title Semantically Guided Depth Upsampling
Authors Nick Schneider, Lukas Schneider, Peter Pinggera, Uwe Franke, Marc Pollefeys, Christoph Stiller
Abstract We present a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues only and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth interpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines globally consistent solutions and preserves fine details and sharp depth boundaries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state-of-the-art, even for very sparse measurements.
Tasks Edge Detection, Scene Labeling
Published 2016-08-02
URL http://arxiv.org/abs/1608.00753v1
PDF http://arxiv.org/pdf/1608.00753v1.pdf
PWC https://paperswithcode.com/paper/semantically-guided-depth-upsampling
Repo
Framework
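
The guidance idea can be sketched in a very reduced form: propagate each sparse depth measurement to the pixels that are geodesically nearest, with a step cost that grows across intensity edges. This toy version omits the paper's semantic labels, locally planar elements, and global energy minimization.

```python
# Simplified sketch of image-guided depth upsampling: each pixel takes
# the depth of the geodesically nearest sparse measurement, where the
# geodesic cost penalizes crossing intensity edges. Illustrative only.
import heapq
import numpy as np

def geodesic_upsample(intensity, sparse_depth, beta=10.0):
    """intensity: (H,W) float image; sparse_depth: (H,W), NaN where unknown."""
    H, W = intensity.shape
    dist = np.full((H, W), np.inf)
    depth = np.full((H, W), np.nan)
    heap = []
    for i, j in zip(*np.where(~np.isnan(sparse_depth))):
        dist[i, j] = 0.0
        depth[i, j] = sparse_depth[i, j]
        heapq.heappush(heap, (0.0, i, j))
    while heap:                                   # Dijkstra over the pixel grid
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < H and 0 <= b < W:
                # stepping across an intensity edge is expensive
                cost = 1.0 + beta * abs(intensity[a, b] - intensity[i, j])
                if d + cost < dist[a, b]:
                    dist[a, b] = d + cost
                    depth[a, b] = depth[i, j]     # inherit nearest seed's depth
                    heapq.heappush(heap, (d + cost, a, b))
    return depth

I, D = np.ones((4, 6)), np.full((4, 6), np.nan)
I[:, 3:] = 0.0                  # an intensity edge between columns 2 and 3
D[0, 0], D[0, 5] = 1.0, 9.0     # two sparse depth measurements
print(geodesic_upsample(I, D))  # depths do not bleed across the edge
```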

Poor starting points in machine learning

Title Poor starting points in machine learning
Authors Mark Tygert
Abstract Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor ones: for poor starting points, Nesterov acceleration can help during the initial iterations, even though Nesterov methods that are not designed for stochastic approximation can hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
Tasks
Published 2016-02-09
URL http://arxiv.org/abs/1602.02823v1
PDF http://arxiv.org/pdf/1602.02823v1.pdf
PWC https://paperswithcode.com/paper/poor-starting-points-in-machine-learning
Repo
Framework
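
A small sketch contrasting the two updates discussed above, plain SGD and a Nesterov-accelerated variant, from a deliberately poor starting point on a least-squares toy problem; the problem, stepsize, and momentum value are illustrative assumptions, not the paper's experiments.

```python
# Plain SGD vs. a Nesterov-accelerated update on a quadratic, started
# far from the optimum. Problem and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50)) / np.sqrt(200)
b = A @ np.ones(50)                       # the optimum is the all-ones vector

def grad(x, batch):
    """Minibatch gradient of 0.5*||Ax - b||^2 (rescaled to the full sum)."""
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ x - bb) * (len(b) / len(batch))

x_sgd = np.full(50, 100.0)                # a deliberately poor starting point
x_nes = x_sgd.copy()
v = np.zeros(50)
eta, mu = 0.1, 0.9
for _ in range(100):
    batch = rng.choice(len(b), size=20, replace=False)
    x_sgd = x_sgd - eta * grad(x_sgd, batch)
    v = mu * v - eta * grad(x_nes + mu * v, batch)   # Nesterov look-ahead
    x_nes = x_nes + v

# the accelerated run typically reaches a lower error in the same budget
print(np.linalg.norm(x_sgd - 1.0), np.linalg.norm(x_nes - 1.0))
```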

Causality and Responsibility for Formal Verification and Beyond

Title Causality and Responsibility for Formal Verification and Beyond
Authors Hana Chockler
Abstract The theory of actual causality, defined by Halpern and Pearl, and its quantitative measure, the degree of responsibility, have been shown to be extremely useful in various areas of computer science due to a good match between the results they produce and our intuition. In this paper, I describe the applications of causality to formal verification, namely, explanation of counterexamples, refinement of coverage metrics, and symbolic trajectory evaluation. I also briefly discuss recent applications of causality to legal reasoning.
Tasks
Published 2016-08-29
URL http://arxiv.org/abs/1608.07879v1
PDF http://arxiv.org/pdf/1608.07879v1.pdf
PWC https://paperswithcode.com/paper/causality-and-responsibility-for-formal
Repo
Framework

Semantics derived automatically from language corpora contain human-like biases

Title Semantics derived automatically from language corpora contain human-like biases
Authors Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan
Abstract Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language—the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model—namely, the GloVe word embedding—trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
Tasks
Published 2016-08-25
URL http://arxiv.org/abs/1608.07187v4
PDF http://arxiv.org/pdf/1608.07187v4.pdf
PWC https://paperswithcode.com/paper/semantics-derived-automatically-from-language
Repo
Framework
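
The WEAT statistic itself is simple to state: the differential association of two sets of target words with two sets of attribute words, measured by cosine similarity of their embeddings. A compact sketch follows, with made-up 2-d vectors standing in for real word embeddings.

```python
# A compact sketch of the WEAT effect size. The toy 2-d "embeddings"
# below are made up purely for illustration.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """s(w, A, B): mean cosine to attributes A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Differential association of targets X vs. Y with attributes A vs. B."""
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# toy vectors: "flowers" near "pleasant", "insects" near "unpleasant"
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]   # flowers
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]   # insects
A = [np.array([1.0, 0.0])]                          # pleasant
B = [np.array([0.0, 1.0])]                          # unpleasant
print(weat_effect_size(X, Y, A, B))                 # large positive effect
```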

Clustering Via Crowdsourcing

Title Clustering Via Crowdsourcing
Authors Arya Mazumdar, Barna Saha
Abstract In recent years, crowdsourcing, a.k.a. human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensation. Therefore, a crowd-based algorithm must judiciously use any information computed through an automated process, and adaptively ask a minimum number of questions to the crowd. One such problem which has received significant attention is entity resolution. Formally, we are given a graph $G=(V,E)$ with unknown edge set $E$, where $G$ is a union of $k$ (again unknown, but typically large, $O(n^\alpha)$ for $\alpha>0$) disjoint cliques $G_i(V_i, E_i)$, $i = 1, \dots, k$. The goal is to retrieve the sets $V_i$ by making a minimum number of pairwise queries $V \times V \to \{\pm 1\}$ to an oracle (the crowd). When the answer to each query is correct, e.g. via resampling, this reduces to finding connected components in a graph. On the other hand, when crowd answers may be incorrect, it corresponds to clustering over a minimum number of noisy inputs. Even with perfect answers, a simple matching lower and upper bound of $\Theta(nk)$ on the query complexity can be shown. A major contribution of this paper is to reduce the query complexity to linear or even sublinear in $n$ when mild side information is provided by a machine, even in the presence of crowd errors that are not correctable via resampling. We develop new information-theoretic lower bounds on the query complexity of clustering with side information and errors, and our upper bounds closely match them. Our algorithms are naturally parallelizable, and also give near-optimal bounds on the number of adaptive rounds required to match the query complexity.
Tasks Entity Resolution
Published 2016-04-07
URL http://arxiv.org/abs/1604.01839v1
PDF http://arxiv.org/pdf/1604.01839v1.pdf
PWC https://paperswithcode.com/paper/clustering-via-crowdsourcing
Repo
Framework
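
The $\Theta(nk)$ baseline with a perfect oracle is easy to sketch: compare each new item against one representative per existing cluster. The paper's contributions, using machine-generated side information and tolerating erroneous answers, are deliberately omitted from this toy version.

```python
# Sketch of the O(nk) baseline with a perfect oracle: compare each new
# element against one representative per existing cluster. The paper's
# side-information and noise-handling machinery is omitted.
def cluster_with_oracle(items, same_cluster):
    """same_cluster(u, v) -> bool plays the role of the crowd oracle."""
    clusters = []                      # each cluster keeps its members
    queries = 0
    for u in items:
        for c in clusters:
            queries += 1
            if same_cluster(u, c[0]):  # ask about one representative
                c.append(u)
                break
        else:
            clusters.append([u])       # no match: open a new cluster
    return clusters, queries

# toy ground truth: cluster = value mod 3
items = list(range(12))
clusters, q = cluster_with_oracle(items, lambda u, v: u % 3 == v % 3)
print(clusters, q)   # 3 clusters; about n*k queries in the worst case
```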

Fast Learning from Distributed Datasets without Entity Matching

Title Fast Learning from Distributed Datasets without Entity Matching
Authors Giorgio Patrini, Richard Nock, Stephen Hardy, Tiberio Caetano
Abstract Consider the following data fusion scenario: two datasets/peers contain the same real-world entities described using partially shared features, e.g. banking and insurance company records of the same customer base. Our goal is to learn a classifier in the cross product space of the two domains, in the hard case in which no shared ID is available, e.g. due to anonymization. Traditionally, the problem is approached by first addressing entity matching and subsequently learning the classifier in a standard manner. We present an end-to-end solution which bypasses matching entities, based on the recently introduced concept of Rademacher observations (rados). Informally, we replace the minimisation of a loss over examples, which requires solving entity resolution, by the equivalent minimisation of a (different) loss over rados. Among the key properties we show are (i) that a potentially huge subset of these rados does not require entity matching, and (ii) that the algorithm that provably minimizes the rado loss over these rados has smaller time and space complexities than the algorithm minimizing the equivalent example loss. Lastly, we relax a key assumption of the model, that the data is vertically partitioned among peers; in this case, we would not even know of the existence of a solution to entity resolution. In this more general setting, experiments validate the possibility of significantly beating even the optimal peer in hindsight.
Tasks Entity Resolution
Published 2016-03-13
URL http://arxiv.org/abs/1603.04002v1
PDF http://arxiv.org/pdf/1603.04002v1.pdf
PWC https://paperswithcode.com/paper/fast-learning-from-distributed-datasets
Repo
Framework

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Title ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Authors Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou
Abstract Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities, and merging them into single representations. Relatively recently, declarative rules called “matching dependencies” (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language “LogiQL”, an extended form of Datalog supported by the “LogicBlox” platform, for all activities related to data processing, and for the specification and enforcement of MDs.
Tasks Entity Resolution
Published 2016-02-07
URL http://arxiv.org/abs/1602.02334v3
PDF http://arxiv.org/pdf/1602.02334v3.pdf
PWC https://paperswithcode.com/paper/erblox-combining-matching-dependencies-with
Repo
Framework

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

Title Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning
Authors Noam Brown, Tuomas Sandholm
Abstract Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly-performing actions to be temporarily pruned, thus speeding up CFR. We introduce Total RBP, a new form of RBP that reduces the space requirements of CFR as actions are pruned. We prove that in zero-sum games it asymptotically prunes any action that is not part of a best response to some Nash equilibrium. This leads to provably faster convergence and lower space requirements. Experiments show that Total RBP results in an order of magnitude reduction in space, and the reduction factor increases with game size.
Tasks
Published 2016-09-12
URL http://arxiv.org/abs/1609.03234v1
PDF http://arxiv.org/pdf/1609.03234v1.pdf
PWC https://paperswithcode.com/paper/reduced-space-and-faster-convergence-in-1
Repo
Framework
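
For readers unfamiliar with CFR's inner loop, here is a sketch of regret matching, the strategy-update rule that CFR iterates at every decision point; RBP and Total RBP prune and de-allocate poorly performing actions on top of this rule, which the sketch does not show. The utilities are made-up numbers for a single decision point.

```python
# Sketch of regret matching, the core strategy-update rule inside CFR:
# play each action with probability proportional to its positive
# cumulative regret. RBP / Total RBP are not shown.
import numpy as np

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy over actions."""
    pos = np.maximum(cum_regret, 0.0)
    if pos.sum() > 0:
        return pos / pos.sum()
    return np.full(len(pos), 1.0 / len(pos))       # fall back to uniform

# repeated updates at a single decision point with fixed toy utilities
cum_regret = np.zeros(3)
utils = np.array([1.0, -0.5, 0.2])                 # counterfactual utilities
for _ in range(100):
    sigma = regret_matching(cum_regret)
    ev = sigma @ utils                             # expected value under sigma
    cum_regret += utils - ev                       # regret of each pure action
print(regret_matching(cum_regret))                 # mass concentrates on action 0
```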

Mapping Data to Ontologies with Exceptions Using Answer Set Programming

Title Mapping Data to Ontologies with Exceptions Using Answer Set Programming
Authors Daniel P. Lupp, Evgenij Thorstensen
Abstract In ontology-based data access, databases are connected to an ontology via mappings from queries over the database to queries over the ontology. In this paper, we consider mappings from relational databases to first-order ontologies, and define an ASP-based framework for GLAV mappings with queries over the ontology in the mapping rule bodies. We show that this type of mapping can be used to express constraints and exceptions, as well as being a powerful mechanism for succinctly representing OBDA mappings. We give an algorithm for brave reasoning in this setting, and show that this problem either has the same data complexity as ASP (NP-complete), or is at least as hard as checking entailment for the ontology queries. Furthermore, we show that for ontologies with UCQ-rewritable queries there exists a natural reduction from mapping programs to $\exists$-ASP, an extension of ASP with existential variables that itself admits a natural reduction to ASP.
Tasks
Published 2016-07-07
URL http://arxiv.org/abs/1607.02018v1
PDF http://arxiv.org/pdf/1607.02018v1.pdf
PWC https://paperswithcode.com/paper/mapping-data-to-ontologies-with-exceptions
Repo
Framework