Paper Group ANR 348
A Stable Cardinality Distance for Topological Classification
Title | A Stable Cardinality Distance for Topological Classification |
Authors | Vasileios Maroulas, Cassie Putman Micucci, Adam Spannaus |
Abstract | This work incorporates topological features via persistence diagrams to classify point cloud data arising from materials science. Persistence diagrams are multisets summarizing the connectedness and holes of given data. A new distance on the space of persistence diagrams generates relevant input features for a classification algorithm for materials science data. This distance measures the similarity of persistence diagrams using the cost of matching points and a regularization term corresponding to cardinality differences between diagrams. Establishing stability properties of this distance provides theoretical justification for the use of the distance in comparisons of such diagrams. The classification scheme succeeds in determining the crystal structure of materials on noisy and sparse data retrieved from synthetic atom probe tomography experiments. |
Tasks | |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01664v2 |
https://arxiv.org/pdf/1812.01664v2.pdf | |
PWC | https://paperswithcode.com/paper/a-stable-cardinality-distance-for-topological |
Repo | |
Framework | |
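As a rough illustration of the distance described above: a minimal sketch that scores two persistence diagrams by an optimal partial-matching cost plus a penalty on the cardinality difference. The Euclidean point cost, the penalty weight `c`, and the function name are illustrative assumptions, not the paper's exact definition.

```python
# A minimal sketch, assuming Euclidean point costs and a penalty weight c;
# the paper's exact matching cost and regularization may differ.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cardinality_distance(D1, D2, c=1.0):
    """D1, D2: (n, 2) arrays of (birth, death) points; c: cardinality penalty."""
    D1, D2 = np.atleast_2d(D1), np.atleast_2d(D2)
    if len(D1) > len(D2):                      # make D1 the smaller diagram
        D1, D2 = D2, D1
    cost = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal partial matching
    return cost[rows, cols].sum() + c * (len(D2) - len(D1))

# Toy usage: two diagrams differing by one extra (noise) point.
A = np.array([[0.0, 1.0], [0.2, 0.9]])
B = np.array([[0.0, 1.1], [0.2, 0.8], [0.05, 0.1]])
print(cardinality_distance(A, B, c=0.5))
```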
High-dimensional estimation via sum-of-squares proofs
Title | High-dimensional estimation via sum-of-squares proofs |
Authors | Prasad Raghavendra, Tselil Schramm, David Steurer |
Abstract | Estimation is the computational task of recovering a hidden parameter $x$ associated with a distribution $D_x$, given a measurement $y$ sampled from the distribution. High dimensional estimation problems arise naturally in statistics, machine learning, and complexity theory. Many high dimensional estimation problems can be formulated as systems of polynomial equations and inequalities, and thus give rise to natural probability distributions over polynomial systems. Sum-of-squares proofs provide a powerful framework to reason about polynomial systems, and further there exist efficient algorithms to search for low-degree sum-of-squares proofs. Understanding and characterizing the power of sum-of-squares proofs for estimation problems has been a subject of intense study in recent years. On one hand, there is a growing body of work utilizing sum-of-squares proofs for recovering solutions to polynomial systems when the system is feasible. On the other hand, a general technique referred to as pseudocalibration has been developed towards showing lower bounds on the degree of sum-of-squares proofs. Finally, the existence of sum-of-squares refutations of a polynomial system has been shown to be intimately connected to the existence of spectral algorithms. In this article we survey these developments. |
Tasks | |
Published | 2018-07-30 |
URL | https://arxiv.org/abs/1807.11419v2 |
https://arxiv.org/pdf/1807.11419v2.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-estimation-via-sum-of |
Repo | |
Framework | |
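The survey's central object — a sum-of-squares proof found by semidefinite programming — can be illustrated on a toy instance. The sketch below, assuming the cvxpy modeling library, certifies that $x^4 + 2x^2 + 1$ is a sum of squares by finding a PSD Gram matrix; it is a minimal degree-4 example, not the high-dimensional estimation setting of the survey.

```python
# Certify p(x) = x^4 + 2x^2 + 1 is a sum of squares: find a PSD Gram matrix Q
# with z^T Q z = p(x) for the monomial vector z = [1, x, x^2]. A toy example
# assuming the cvxpy modeling library.
import cvxpy as cp

Q = cp.Variable((3, 3), PSD=True)   # Gram matrix over monomials [1, x, x^2]
constraints = [
    Q[0, 0] == 1,                   # constant coefficient
    2 * Q[0, 1] == 0,               # x coefficient
    2 * Q[0, 2] + Q[1, 1] == 2,     # x^2 coefficient
    2 * Q[1, 2] == 0,               # x^3 coefficient
    Q[2, 2] == 1,                   # x^4 coefficient
]
prob = cp.Problem(cp.Minimize(0), constraints)   # feasibility problem
prob.solve()
print(prob.status)   # 'optimal' => a degree-4 SOS certificate exists
print(Q.value)
```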
Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples
Title | Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples |
Authors | Massimo Fornasier, Jan Vybíral, Ingrid Daubechies |
Abstract | We address the structure identification and the uniform approximation of sums of ridge functions $f(x)=\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing a general form of a shallow feed-forward neural network, from a small number of query samples. Higher order differentiation, as used in our constructive approximations, of sums of ridge functions or of their compositions, as in deeper neural networks, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second order differentiation and tensors of order two (i.e., matrices) suffice, as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal, actively, and randomly designed, and passive sampling, where the sampling points are preselected at random from a distribution with known density. Based on multiple gathered approximated first and second order differentials, our general approximation strategy is developed as a sequence of algorithms to perform individual sub-tasks. We first perform an active subspace search by approximating the span of the weight vectors $a_1,\dots,a_m$. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the stable and efficient approximation of weights expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program. We prove the successful identification by this program of weight vectors that are close to orthonormal, and we also show how to constructively reduce to this case by a whitening procedure, without any loss of generality. |
Tasks | |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01592v2 |
http://arxiv.org/pdf/1804.01592v2.pdf | |
PWC | https://paperswithcode.com/paper/identification-of-shallow-neural-networks-by |
Repo | |
Framework | |
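The first algorithmic step described in the abstract — approximating the span of the weight vectors from second-order differentials — admits a small numerical sketch. Since the Hessian of $f(x)=\sum_i g_i(a_i\cdot x)$ lies in span$\{a_i \otimes a_i\}$, aggregating finite-difference Hessians at random points and eigendecomposing approximately recovers span$\{a_i\}$. The choice $g_i = \tanh$, the sampling points, and all constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 3
A = np.linalg.qr(rng.standard_normal((d, m)))[0]    # orthonormal weights a_i

def f(x):                                   # f(x) = sum_i tanh(a_i . x)
    return np.tanh(A.T @ x).sum()

def fd_hessian(x, h=1e-3):
    """Central finite-difference Hessian of f at x."""
    H = np.zeros((d, d))
    E = np.eye(d) * h
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

# Aggregate Hessians at random sampling points; their top eigenvectors
# approximate span{a_1, ..., a_m}.
S = sum(fd_hessian(rng.standard_normal(d)) for _ in range(20))
eigvals, eigvecs = np.linalg.eigh(S)
span_est = eigvecs[:, np.argsort(np.abs(eigvals))[-m:]]

# The estimated subspace should (approximately) contain the columns of A.
print(np.linalg.norm(A - span_est @ (span_est.T @ A)))   # small
```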
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Title | Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes |
Authors | Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach |
Abstract | We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, we show that for hard problems, multiple passes lead to statistically optimal predictions while a single pass does not; we also show that in these hard models, the optimal number of passes over the data increases with sample size. In order to define the notion of hardness and show that our predictive performances are optimal, we consider potentially infinite-dimensional models and notions typically associated with kernel methods, namely, the decay of eigenvalues of the covariance matrix of the features and the complexity of the optimal predictor as measured through the covariance matrix. We illustrate our results on synthetic experiments with non-linear kernel methods and on a classical benchmark with a linear model. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10074v3 |
http://arxiv.org/pdf/1805.10074v3.pdf | |
PWC | https://paperswithcode.com/paper/statistical-optimality-of-stochastic-gradient |
Repo | |
Framework | |
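A minimal sketch of the object under study: multi-pass SGD for least-squares regression. The synthetic problem, the constant step size, and the pass counts are illustrative choices; the paper's point is that on hard problems the statistically optimal number of passes grows with the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 50
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def sgd_passes(n_passes, gamma=0.5):
    w = np.zeros(d)
    for _ in range(n_passes):
        for i in rng.permutation(n):               # one pass over the data
            w -= gamma * (X[i] @ w - y[i]) * X[i]  # single-sample LS gradient
    return w

for p in (1, 5, 20):
    w = sgd_passes(p)
    print(f"{p:>2} passes, parameter error: {np.linalg.norm(w - w_star):.4f}")
```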
Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences
Title | Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences |
Authors | Atticus Geiger, Ignacio Cases, Lauri Karttunen, Christopher Potts |
Abstract | Standard evaluations of deep learning models for semantics using naturalistic corpora are limited in what they can tell us about the fidelity of the learned representations, because the corpora rarely come with good measures of semantic complexity. To overcome this limitation, we present a method for generating data sets of multiply-quantified natural language inference (NLI) examples in which semantic complexity can be precisely characterized, and we use this method to show that a variety of common architectures for NLI inevitably fail to encode crucial information; only a model with forced lexical alignments avoids this damaging information loss. |
Tasks | Natural Language Inference |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.13033v1 |
http://arxiv.org/pdf/1810.13033v1.pdf | |
PWC | https://paperswithcode.com/paper/stress-testing-neural-models-of-natural |
Repo | |
Framework | |
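A minimal sketch of the kind of template-driven generation the paper describes for multiply-quantified NLI pairs. The tiny vocabulary and templates are illustrative assumptions; in the paper, gold labels are derived exactly from the semantics of the quantifiers and the noun hierarchy rather than assigned by hand.

```python
import itertools

QUANTIFIERS = ["every", "some", "no"]
NOUNS = ["dog", "animal"]                 # 'dog' is a subtype of 'animal'
VERBS = ["chased", "saw"]

def sentence(q1, n1, v, q2, n2):
    return f"{q1} {n1} {v} {q2} {n2}"

# Enumerate hypothesis candidates against a fixed premise; each pair varies
# only in quantifiers, nouns, and verb, so its semantic complexity is known
# by construction.
premise = sentence("every", "dog", "chased", "some", "animal")
pairs = [(premise, sentence(q1, n1, v, q2, n2))
         for q1, q2 in itertools.product(QUANTIFIERS, repeat=2)
         for n1, n2 in itertools.product(NOUNS, repeat=2)
         for v in VERBS]
for p, h in pairs[:3]:
    print(f"premise: {p!r}   hypothesis: {h!r}")
print(len(pairs), "candidate pairs")
```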
Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog
Title | Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog |
Authors | Shachi H Kumar, Eda Okur, Saurav Sahay, Juan Jose Alvarado Leanos, Jonathan Huang, Lama Nachman |
Abstract | With the recent advancements in AI, Intelligent Virtual Assistants (IVA) have become a ubiquitous part of every home. Going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling the IVAs to learn audio-visual groundings of utterances and have conversations with users about the objects, activities and events surrounding them. As part of the Audio Visual Scene-Aware Dialog (AVSD) track of the 7th Dialog System Technology Challenges (DSTC7), we explore 'topics' of the dialog as an important contextual feature of the architecture, along with explorations of multimodal attention. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We present a detailed analysis of the experiments and show that some of our model variations outperform the baseline system presented for this task. |
Tasks | Audio Classification |
Published | 2018-12-20 |
URL | http://arxiv.org/abs/1812.08407v1 |
http://arxiv.org/pdf/1812.08407v1.pdf | |
PWC | https://paperswithcode.com/paper/context-attention-and-audio-feature |
Repo | |
Framework | |
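A minimal sketch of one ingredient explored above: a multimodal attention block in which a dialog-context query attends separately over video and audio feature streams before fusion. All dimensions, the concatenation-based fusion, and the module name are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalAttention(nn.Module):
    def __init__(self, ctx_dim=256, vid_dim=512, aud_dim=128):
        super().__init__()
        self.vid_attn = nn.MultiheadAttention(ctx_dim, num_heads=4,
                                              kdim=vid_dim, vdim=vid_dim,
                                              batch_first=True)
        self.aud_attn = nn.MultiheadAttention(ctx_dim, num_heads=4,
                                              kdim=aud_dim, vdim=aud_dim,
                                              batch_first=True)
        self.fuse = nn.Linear(2 * ctx_dim, ctx_dim)

    def forward(self, ctx, video, audio):
        v, _ = self.vid_attn(ctx, video, video)   # attend over video frames
        a, _ = self.aud_attn(ctx, audio, audio)   # attend over audio frames
        return self.fuse(torch.cat([v, a], dim=-1))

# Toy usage: batch of 2, one context query, 40 video and 100 audio steps.
block = MultimodalAttention()
out = block(torch.randn(2, 1, 256), torch.randn(2, 40, 512),
            torch.randn(2, 100, 128))
print(out.shape)   # torch.Size([2, 1, 256])
```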
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
Title | Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates |
Authors | Arnulf Jentzen, Philippe von Wurstemberger |
Abstract | The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention has been paid to proving lower error bounds for the SGD method. It is the key contribution of this paper to take a step in this direction. More precisely, in this article we establish for every $\gamma, \nu \in (0,\infty)$ essentially matching lower and upper bounds for the mean square error of the SGD process with learning rates $(\frac{\gamma}{n^\nu})_{n \in \mathbb{N}}$ associated with a simple quadratic stochastic optimization problem. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the asymptotic behavior of the learning rates. |
Tasks | Stochastic Optimization |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08600v1 |
http://arxiv.org/pdf/1803.08600v1.pdf | |
PWC | https://paperswithcode.com/paper/lower-error-bounds-for-the-stochastic |
Repo | |
Framework | |
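The paper's setting is concrete enough to simulate directly: SGD with learning rates $\gamma/n^\nu$ on a simple quadratic stochastic optimization problem. The sketch below, with an illustrative target and noise model, empirically shows the mean square error decaying at a rate governed by $\nu$.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, nu = 1.0, 0.5
n_steps, n_runs = 10_000, 2_000

theta = np.ones(n_runs)                      # every run starts at theta_0 = 1
for n in range(1, n_steps + 1):
    Y = rng.standard_normal(n_runs)          # fresh sample, optimum at 0
    theta -= (gamma / n**nu) * (theta - Y)   # SGD step for E[(theta-Y)^2]/2
    if n in (100, 1_000, 10_000):
        print(f"n={n:>6}   mean square error = {np.mean(theta**2):.5f}")
```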
Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Title | Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses |
Authors | Katherine A. Keith, Su Lin Blodgett, Brendan O’Connor |
Abstract | Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full joint distribution of parse trees defined by a transition-based parsing model, and demonstrate the use of the samples in probabilistic dependency analysis. First, we define the new task of dependency path prediction, inferring syntactic substructures over part of a sentence, and provide the first analysis of performance on this task. Second, we demonstrate the usefulness of our Monte Carlo syntax marginal method for parser error analysis and calibration. Finally, we use this method to propagate parse uncertainty to two downstream information extraction applications: identifying persons killed by police and semantic role assignment. |
Tasks | Calibration, Dependency Parsing |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06004v1 |
http://arxiv.org/pdf/1804.06004v1.pdf | |
PWC | https://paperswithcode.com/paper/monte-carlo-syntax-marginals-for-exploring |
Repo | |
Framework | |
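A minimal sketch of transition sampling for dependency-parse marginals: each parser action is sampled from the model's action distribution rather than taken greedily, and edge marginals are estimated by frequency across sampled trees. The toy arc-standard system and the fixed action probabilities stand in for a trained transition-based parser.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def sample_parse(n_words, action_probs):
    """Sample one arc-standard parse; returns a dict dependent -> head."""
    stack, buffer, heads = [0], list(range(1, n_words + 1)), {}
    while buffer or len(stack) > 1:
        actions = []
        if buffer:
            actions.append("shift")
        if len(stack) > 1:
            actions.append("right")            # stack[-2] heads stack[-1]
            if stack[-2] != 0:
                actions.append("left")         # stack[-1] heads stack[-2]
        p = np.array([action_probs[a] for a in actions])
        a = rng.choice(actions, p=p / p.sum())
        if a == "shift":
            stack.append(buffer.pop(0))
        elif a == "right":
            heads[stack[-1]] = stack[-2]; stack.pop()
        else:
            heads[stack[-2]] = stack[-1]; del stack[-2]
    return heads

probs = {"shift": 0.5, "left": 0.2, "right": 0.3}   # stand-in model scores
samples = [sample_parse(4, probs) for _ in range(2000)]
edges = Counter((d, h) for tree in samples for d, h in tree.items())
for (d, h), c in sorted(edges.items()):
    print(f"P(head of w{d} = w{h}) ~ {c / len(samples):.3f}")
```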
Training dynamically balanced excitatory-inhibitory networks
Title | Training dynamically balanced excitatory-inhibitory networks |
Authors | Alessandro Ingrosso, L. F. Abbott |
Abstract | The construction of biologically plausible models of neural circuits is crucial for understanding the computational properties of the nervous system. Constructing functional networks composed of separate excitatory and inhibitory neurons obeying Dale’s law presents a number of challenges. We show how a target-based approach, when combined with a fast online constrained optimization technique, is capable of building functional models of rate and spiking recurrent neural networks in which excitation and inhibition are balanced. Balanced networks can be trained to produce complicated temporal patterns and to solve input-output tasks while retaining biologically desirable features such as Dale’s law and response variability. |
Tasks | |
Published | 2018-12-29 |
URL | http://arxiv.org/abs/1812.11424v1 |
http://arxiv.org/pdf/1812.11424v1.pdf | |
PWC | https://paperswithcode.com/paper/training-dynamically-balanced-excitatory |
Repo | |
Framework | |
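A minimal sketch of the Dale's-law constraint central to the paper: each unit is fixed as excitatory or inhibitory, and every weight update is projected back so outgoing weights keep the unit's sign. The crude error-driven update below is only a stand-in for the authors' target-based constrained optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 500
sign = np.where(rng.random(N) < 0.8, 1.0, -1.0)       # 80% excitatory units
W = np.abs(rng.standard_normal((N, N)) / np.sqrt(N)) * sign[None, :]

target = np.sin(np.linspace(0, 4 * np.pi, T))         # desired readout
w_out = np.zeros(N)
r = np.zeros(N)
lr, dt = 0.01, 0.1
for t in range(T):
    r = (1 - dt) * r + dt * np.tanh(W @ r + 0.1 * rng.standard_normal(N))
    err = w_out @ r - target[t]
    w_out -= lr * err * r                             # readout learning
    W -= lr * err * np.outer(w_out, r)                # crude recurrent update
    W = np.where(sign[None, :] > 0,                   # project back onto
                 np.maximum(W, 0), np.minimum(W, 0))  # Dale's-law signs
print("final readout error:", abs(err))
```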
Formal Modelling of Ontologies: An Event-B based Approach Using the Rodin Platform
Title | Formal Modelling of Ontologies: An Event-B based Approach Using the Rodin Platform |
Authors | Yamine Ait Ameur, Idir Ait Sadoune, Kahina Hacid, Linda Mohand Oussaid |
Abstract | This paper reports on the results of the French ANR IMPEX research project dealing with making explicit domain knowledge in design models. Ontologies are formalised as theories with sets, axioms, theorems and reasoning rules. They are integrated into design models through an annotation mechanism. Event-B has been chosen as the ground formal modelling technique for all our developments. In this paper, we particularly describe how ontologies are formalised as Event-B theories. |
Tasks | |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05518v1 |
http://arxiv.org/pdf/1805.05518v1.pdf | |
PWC | https://paperswithcode.com/paper/formal-modelling-of-ontologies-an-event-b |
Repo | |
Framework | |
Deep Learning in Information Security
Title | Deep Learning in Information Security |
Authors | Stefan Thaler, Vlado Menkovski, Milan Petkovic |
Abstract | Machine learning has a long tradition of helping to solve complex information security problems that are difficult to solve manually. Machine learning techniques learn models from data representations to solve a task. These data representations are hand-crafted by domain experts. Deep learning is a sub-field of machine learning, which uses models that are composed of multiple layers. Consequently, representations that are used to solve a task are learned from the data instead of being manually designed. In this survey, we study the use of DL techniques within the domain of information security. We systematically reviewed 77 papers and present them from a data-centric perspective. This data-centric perspective reflects one of the most crucial advantages of DL techniques – domain independence. If DL methods succeed in solving problems on a data type in one domain, they most likely will also succeed on similar data from another domain. Other advantages of DL methods are unrivaled scalability and efficiency, both regarding the number of examples that can be analyzed and the dimensionality of the input data. DL methods are generally capable of achieving high performance and generalize well. However, information security is a domain with unique requirements and challenges. Based on an analysis of our reviewed papers, we point out shortcomings of DL methods with respect to those requirements and discuss further research opportunities. |
Tasks | |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04332v1 |
http://arxiv.org/pdf/1809.04332v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-in-information-security |
Repo | |
Framework | |
Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities
Title | Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities |
Authors | Dong Huang, Chang-Dong Wang, Hongxing Peng, Jianhuang Lai, Chee-Keong Kwoh |
Abstract | Ensemble clustering has been a popular research topic in data mining and machine learning. Despite its significant progress in recent years, there are still two challenging issues in the current ensemble clustering research. First, most of the existing algorithms tend to investigate the ensemble information at the object-level, yet often lack the ability to explore the rich information at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersection or pair-wise co-occurrence) in the multiple base clusterings, but generally neglect the multi-scale indirect relationship hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph with the base clusters treated as graph nodes and the cluster-wise Jaccard coefficient exploited to compute the initial edge weights. Upon the constructed graph, a transition probability matrix is defined, based on which the random walk process is conducted to propagate the graph structural information. Specifically, by investigating the propagating trajectories starting from different nodes, a new cluster-wise similarity matrix can be derived by considering the trajectory relationship. Then, the newly obtained cluster-wise similarity matrix is mapped from the cluster-level to the object-level to achieve an enhanced co-association (ECA) matrix, which is able to simultaneously capture the object-wise co-occurrence relationship as well as the multi-scale cluster-wise relationship in ensembles. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets have demonstrated the effectiveness and efficiency of our approach. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12544v1 |
http://arxiv.org/pdf/1810.12544v1.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-ensemble-clustering-via-fast |
Repo | |
Framework | |
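The pipeline above is concrete enough to sketch end to end: base clusters become graph nodes with Jaccard edge weights, a short random walk propagates structure, trajectory profiles yield a new cluster-wise similarity, and averaging over cluster memberships maps it back to an object-level enhanced co-association (ECA) matrix. The walk length and the cosine comparison of trajectories are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def eca_matrix(base_clusterings, steps=3):
    # All base clusters as indicator vectors over the objects.
    clusters = [(np.asarray(labels) == k).astype(float)
                for labels in base_clusterings
                for k in np.unique(labels)]
    C = np.array(clusters)                     # (n_clusters, n_objects)
    # Jaccard similarity between clusters -> edge weights of the graph.
    inter = C @ C.T
    sizes = C.sum(axis=1)
    J = inter / (sizes[:, None] + sizes[None, :] - inter + 1e-12)
    np.fill_diagonal(J, 0.0)
    # Random-walk propagation: t steps of the row-normalized transitions.
    P = J / (J.sum(axis=1, keepdims=True) + 1e-12)
    traj = np.linalg.matrix_power(P, steps)    # t-step visit distributions
    # New cluster-wise similarity: cosine between walk trajectories.
    traj = traj / (np.linalg.norm(traj, axis=1, keepdims=True) + 1e-12)
    S = traj @ traj.T
    # Map back to the object level, averaging over cluster memberships.
    memb = C / (C.sum(axis=0, keepdims=True) + 1e-12)
    return memb.T @ S @ memb                   # (n_objects, n_objects) ECA

# Toy usage: three base clusterings of six objects.
base = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 0, 0]]
print(np.round(eca_matrix(base), 2))
```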
ProFlow: Learning to Predict Optical Flow
Title | ProFlow: Learning to Predict Optical Flow |
Authors | Daniel Maurer, Andrés Bruhn |
Abstract | Temporal coherence is a valuable source of information in the context of optical flow estimation. However, finding a suitable motion model to leverage this information is a non-trivial task. In this paper we propose an unsupervised online learning approach based on a convolutional neural network (CNN) that estimates such a motion model individually for each frame. By relating forward and backward motion, these learned models not only allow us to infer valuable motion information from the backward flow, they also help to improve the performance at occlusions, where a reliable prediction is particularly useful. Moreover, our learned models are spatially variant and hence allow us to estimate non-rigid motion by construction. This, in turn, allows us to overcome the major limitation of recent rigidity-based approaches that seek to improve the estimation by incorporating additional stereo/SfM constraints. Experiments demonstrate the usefulness of our new approach. Not only do they show a consistent improvement of up to 27% over a baseline without prediction on all major benchmarks (KITTI 2012, KITTI 2015, MPI Sintel), they also show top results on the MPI Sintel benchmark – the one of the three that contains the largest amount of non-rigid motion. |
Tasks | Optical Flow Estimation |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00800v1 |
http://arxiv.org/pdf/1806.00800v1.pdf | |
PWC | https://paperswithcode.com/paper/proflow-learning-to-predict-optical-flow |
Repo | |
Framework | |
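A minimal sketch of the core idea: a small CNN, trained online for a single frame, maps the backward flow to a prediction of the forward flow, which can then be used where the forward estimate is unreliable (e.g., at occlusions). The architecture depth, channel counts, and training-loop length are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MotionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                 # 2-channel flow in and out
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, backward_flow):
        return self.net(backward_flow)

# Online learning on a single frame pair: fit the model where both flows are
# considered reliable, then use it to predict forward flow at occlusions.
model = MotionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bwd = torch.randn(1, 2, 64, 64)                   # stand-in backward flow
fwd = -bwd + 0.05 * torch.randn(1, 2, 64, 64)     # stand-in forward flow
mask = torch.rand(1, 1, 64, 64) > 0.3             # stand-in reliability mask
for _ in range(50):
    pred = model(bwd)
    loss = ((pred - fwd) ** 2 * mask).mean()      # fit on reliable pixels only
    opt.zero_grad(); loss.backward(); opt.step()
print("online fit loss:", loss.item())
```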
Solving Non-identifiable Latent Feature Models
Title | Solving Non-identifiable Latent Feature Models |
Authors | Ryota Suzuki, Shingo Takahashi, Murtuza Petladwala, Shigeru Kohmoto |
Abstract | Latent feature models (LFMs) are widely employed for extracting latent structures of data. While offering high flexibility, parameter estimation is difficult with LFMs because of the combinatorial nature of latent features, and non-identifiability is a particularly difficult problem: parameter estimation is not unique when equivalent solutions exist. In this paper, a necessary and sufficient condition for non-identifiability is shown. The condition is closely related to dependency among features, which implies that non-identifiability may often occur in real-world applications. A novel method for parameter estimation that solves the non-identifiability problem is also proposed. This method can be combined as a post-process with existing methods and can find an appropriate solution by hopping efficiently through equivalent solutions. We have evaluated the effectiveness of the method on both synthetic and real-world datasets. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.03776v2 |
http://arxiv.org/pdf/1809.03776v2.pdf | |
PWC | https://paperswithcode.com/paper/solving-non-identifiable-latent-feature |
Repo | |
Framework | |
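The non-identifiability the abstract refers to is easy to exhibit: when a latent feature only ever co-occurs with another, distinct parameter pairs $(Z, A)$ reproduce exactly the same data $ZA$. The three-feature construction below is an illustrative example, not the paper's condition in full generality.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
# Feature 3 only ever occurs together with feature 1 (a dependency).
Z1 = np.array([[1, 0, 1],
               [1, 1, 1],
               [0, 1, 0]])
# Equivalent solution: absorb feature 3's loading into feature 1.
Z2 = np.array([[1, 0, 0],
               [1, 1, 0],
               [0, 1, 0]])
A2 = A.copy()
A2[0] += A[2]          # feature 1 now carries feature 3's contribution
A2[2] = 0.0            # feature 3 becomes vacuous

print(np.allclose(Z1 @ A, Z2 @ A2))   # True: same data, different parameters
```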
Conditional Graph Neural Processes: A Functional Autoencoder Approach
Title | Conditional Graph Neural Processes: A Functional Autoencoder Approach |
Authors | Marcel Nassar, Xin Wang, Evren Tumer |
Abstract | We introduce a novel encoder-decoder architecture to embed functional processes into latent vector spaces. This embedding can then be decoded to sample the encoded functions over any arbitrary domain. This autoencoder generalizes the recently introduced Conditional Neural Process (CNP) model of random processes. Our architecture employs the latest advances in graph neural networks to process irregularly sampled functions. Thus, we refer to our model as the Conditional Graph Neural Process (CGNP). Graph neural networks can effectively exploit 'local' structures of the metric spaces over which the functions/processes are defined. The contributions of this paper are twofold: (i) a novel graph-based encoder-decoder architecture for functional and process embeddings, and (ii) a demonstration of the importance of using the structure of metric spaces for this type of representation. |
Tasks | |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05212v1 |
http://arxiv.org/pdf/1812.05212v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-graph-neural-processes-a |
Repo | |
Framework | |
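A minimal sketch of the conditional-neural-process skeleton that CGNP generalizes: encode the context pairs, aggregate them into a permutation-invariant latent vector, and decode at arbitrary target inputs. The paper replaces this plain set encoder with a graph neural network over the sampled points; the MLP sizes and mean aggregation below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalProcess(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, latent),
                                     nn.ReLU(), nn.Linear(latent, latent))
        self.decoder = nn.Sequential(nn.Linear(latent + x_dim, latent),
                                     nn.ReLU(), nn.Linear(latent, y_dim))

    def forward(self, x_ctx, y_ctx, x_tgt):
        # Permutation-invariant embedding of the context set (mean pooling).
        z = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=1)
        z = z.unsqueeze(1).expand(-1, x_tgt.size(1), -1)
        return self.decoder(torch.cat([z, x_tgt], dim=-1))

# Toy usage: condition on 10 observed points, decode at 50 target points.
model = ConditionalProcess()
x_c, y_c = torch.rand(4, 10, 1), torch.rand(4, 10, 1)
x_t = torch.linspace(0, 1, 50).view(1, 50, 1).expand(4, -1, -1)
print(model(x_c, y_c, x_t).shape)   # torch.Size([4, 50, 1])
```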