May 5, 2019

2675 words 13 mins read

Paper Group ANR 489
No bad local minima: Data independent training error guarantees for multilayer neural networks. Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood. Accelerating Deep Learning with Shrinkage and Recall. Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks. Correct classification …

No bad local minima: Data independent training error guarantees for multilayer neural networks

Title No bad local minima: Data independent training error guarantees for multilayer neural networks
Authors Daniel Soudry, Yair Carmon
Abstract We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
Tasks
Published 2016-05-26
URL http://arxiv.org/abs/1605.08361v2
PDF http://arxiv.org/pdf/1605.08361v2.pdf
PWC https://paperswithcode.com/paper/no-bad-local-minima-data-independent-training
Repo
Framework
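The paper's claim that over-parameterized piecewise-linear MNNs reach zero training error at local minima can be probed numerically, as the authors note. The sketch below is not the paper's experiment: it is a minimal, hedged illustration with assumed sizes (20 samples, 100 hidden units, leaky-ReLU slope 0.01) showing that full-batch gradient descent on a one-hidden-layer piecewise-linear net with quadratic loss drives the training loss toward zero on arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 5, 100          # n samples, d inputs, h hidden units (h >> n: over-parameterized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)    # arbitrary targets: essentially nothing assumed about the data

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
w2 = rng.standard_normal(h) / np.sqrt(h)

def forward(X):
    z = X @ W1
    a = np.where(z > 0, z, 0.01 * z)   # leaky ReLU: piecewise linear activation
    return a @ w2, z, a

lr = 0.05
for _ in range(3000):
    pred, z, a = forward(X)
    err = pred - y                      # quadratic loss: 0.5 * mean(err**2)
    g2 = a.T @ err / n                  # gradient w.r.t. output weights
    dz = np.outer(err, w2) / n * np.where(z > 0, 1.0, 0.01)
    g1 = X.T @ dz                       # gradient w.r.t. hidden weights
    W1 -= lr * g1
    w2 -= lr * g2

loss = 0.5 * np.mean((forward(X)[0] - y) ** 2)
print(f"final training loss: {loss:.2e}")
```

With this much over-parameterization the final loss is numerically negligible, consistent with the paper's prediction that differentiable local minima attain zero training error.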

Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood

Title Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood
Authors Janne Leppä-aho, Johan Pensar, Teemu Roos, Jukka Corander
Abstract We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure.
Tasks
Published 2016-02-25
URL http://arxiv.org/abs/1602.07863v1
PDF http://arxiv.org/pdf/1602.07863v1.pdf
PWC https://paperswithcode.com/paper/learning-gaussian-graphical-models-with-1
Repo
Framework
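The pseudo-likelihood idea at the core of this paper factors the joint density into node-wise conditionals; for a Gaussian model, each conditional is a linear regression of a node on its neighbors. The sketch below is only an illustration of that decomposition with a plug-in maximum-likelihood score on a toy chain graph; the paper's actual contribution, the analytical *fractional marginal* pseudo-likelihood that integrates out the parameters, is not reproduced here.

```python
import numpy as np

def pseudo_loglik(X, graph):
    """Plug-in pseudo-log-likelihood of a graph (dict: node -> neighbor list).

    For a Gaussian model, p(x_j | x_rest) depends only on x_j's neighbors,
    so each node contributes a linear-regression log-likelihood.
    """
    n, p = X.shape
    total = 0.0
    for j in range(p):
        nb = graph[j]
        y = X[:, j]
        A = np.column_stack([X[:, nb], np.ones(n)]) if nb else np.ones((n, 1))
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sigma2 = max(resid @ resid / n, 1e-12)
        total += -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return total

# Toy chain x0 -> x1 -> x2: the chain structure should score above the empty graph.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(500)
x1 = x0 + 0.3 * rng.standard_normal(500)
x2 = x1 + 0.3 * rng.standard_normal(500)
X = np.column_stack([x0, x1, x2])

chain = {0: [1], 1: [0, 2], 2: [1]}
empty = {0: [], 1: [], 2: []}
print(pseudo_loglik(X, chain) > pseudo_loglik(X, empty))
```

In practice the plug-in score above would need a complexity penalty or, as in the paper, a marginal (Bayesian) treatment, since otherwise denser graphs always score at least as well.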

Accelerating Deep Learning with Shrinkage and Recall

Title Accelerating Deep Learning with Shrinkage and Recall
Authors Shuai Zheng, Abhinav Vishnu, Chris Ding
Abstract Deep learning is a very powerful machine learning approach, but training its large number of parameters across multiple layers is very slow when the data scale and the architecture size are large. Inspired by the shrinking technique used to accelerate the computation of Support Vector Machines (SVM) and the screening technique used in LASSO, we propose shrinking Deep Learning with recall (sDLr) to speed up deep learning computation. We evaluate sDLr with Deep Neural Networks (DNN), Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN) on 4 datasets. Results show that the speedup from sDLr can exceed 2.0 while still giving competitive classification performance.
Tasks
Published 2016-05-04
URL http://arxiv.org/abs/1605.01369v2
PDF http://arxiv.org/pdf/1605.01369v2.pdf
PWC https://paperswithcode.com/paper/accelerating-deep-learning-with-shrinkage-and
Repo
Framework
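The shrink-and-recall pattern described in the abstract can be sketched generically: periodically drop training examples the model already handles confidently (shrinking), then restore the full set late in training (recall). The toy below is a hedged illustration on logistic regression with made-up thresholds and schedules, not the paper's sDLr algorithm or its DNN/DBN/CNN experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true > 0).astype(float)     # linearly separable toy labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
active = np.arange(n)                  # indices currently trained on

for epoch in range(60):
    Xa, ya = X[active], y[active]
    p = sigmoid(Xa @ w)
    w -= 0.5 * Xa.T @ (p - ya) / len(active)   # gradient step on the active set only
    if epoch % 10 == 9:
        margin = np.abs(sigmoid(X @ w) - y)
        if epoch < 40:
            active = np.where(margin > 0.05)[0]   # shrink: keep hard examples
            if len(active) == 0:
                active = np.arange(n)
        else:
            active = np.arange(n)                 # recall: restore the full set

acc = np.mean((sigmoid(X @ w) > 0.5) == y.astype(bool))
print(f"accuracy: {acc:.3f}")
```

The speedup comes from the gradient step costing O(|active|) rather than O(n) between recalls; the recall phase guards against discarding examples that later become hard again.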

Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks

Title Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks
Authors Nicholas Guttenberg, Martin Biehl, Ryota Kanai
Abstract We present a loss function for neural networks that encompasses an idea of trivial versus non-trivial predictions, such that the network jointly determines its own prediction goals and learns to satisfy them. This permits the network to choose the sub-problems most amenable to its abilities and focus on solving them, while discarding ‘distracting’ elements that interfere with its learning. To do this, the network first transforms the raw data into a higher-level categorical representation, and then trains a predictor from that new time series to its future. To prevent the trivial solution of mapping the signal to zero, we introduce a measure of non-triviality via a contrast between the prediction error of the learned model and that of a naive model of the overall signal statistics. The transform can learn to discard uninformative and unpredictable components of the signal in favor of features that are both highly predictive and highly predictable. This creates a coarse-grained model of the time-series dynamics that focuses on predicting the slowly varying latent parameters controlling the statistics of the time series, rather than predicting the fast details directly. The result is a semi-supervised algorithm capable of extracting latent parameters, segmenting sections of time series with differing statistics, and building a higher-level representation of the underlying dynamics from unlabeled data.
Tasks Time Series
Published 2016-09-01
URL http://arxiv.org/abs/1609.00116v1
PDF http://arxiv.org/pdf/1609.00116v1.pdf
PWC https://paperswithcode.com/paper/neural-coarse-graining-extracting-slowly
Repo
Framework
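The non-triviality contrast can be illustrated without a neural network: score a representation by how much a learned predictor beats a naive predictor that only knows the overall signal statistics (here, its mean). The sketch below uses a least-squares AR(1) predictor as a stand-in for the paper's learned model; the specific score and the AR(1) choice are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def nontriviality(rep, horizon=1):
    """Relative gain of an AR(1)-style predictor over the naive mean predictor.

    rep: 1-D representation time series. Returns (naive_mse - model_mse) / naive_mse;
    a constant (trivial) representation scores 0.
    """
    past, future = rep[:-horizon], rep[horizon:]
    naive_mse = np.mean((future - future.mean()) ** 2)
    if naive_mse < 1e-12:
        return 0.0                                   # constant signal: trivial
    a = past @ future / max(past @ past, 1e-12)      # least-squares AR(1) coefficient
    model_mse = np.mean((future - a * past) ** 2)
    return (naive_mse - model_mse) / naive_mse

rng = np.random.default_rng(2)
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 400)     # slowly varying latent component
noise = rng.standard_normal(2000)      # unpredictable fast details

print(round(nontriviality(slow), 3))   # close to 1: highly predictable
print(round(nontriviality(noise), 3))  # close to 0: nothing beats the mean
print(nontriviality(np.zeros(2000)))   # trivial constant map gets no credit
```

A transform trained to maximize such a score is pushed toward the slowly varying, predictable components and away from both the unpredictable noise and the degenerate constant solution.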

Correct classification for big/smart/fast data machine learning

Title Correct classification for big/smart/fast data machine learning
Authors Sander Stepanov
Abstract Classification of table (relational database) data for big/smart/fast data machine learning is one of the most important tasks in predictive analytics and in extracting valuable information from data. It is a core applied technique underlying what is now understood as data science and artificial intelligence. The widely used Decision Tree (Random Forest) classifiers, and the rarely used rule-based classifiers such as PRISM and VFST, are empirical substitutes for the theoretically correct approach of minimizing Boolean functions. The development of Boolean function minimization algorithms began long ago, with Edward Veitch’s work in 1952, and the broader scientific and industrial community has since devoted great effort to finding feasible solutions to Boolean function minimization. In this paper we propose considering table data classification from a mathematical point of view, as minimization of Boolean functions. We show that the data representation may be transformed into Boolean function form and how known minimization algorithms can then be applied. For simplicity, a binary output function is used in the development, which opens the door to extensions for multi-valued outputs.
Tasks
Published 2016-09-27
URL http://arxiv.org/abs/1609.08550v1
PDF http://arxiv.org/pdf/1609.08550v1.pdf
PWC https://paperswithcode.com/paper/correct-classification-for-bigsmartfast-data
Repo
Framework
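The transformation the abstract describes, treating positive table rows as minterms of a Boolean function and then minimizing, can be sketched with a single Quine-McCluskey-style merging pass. This is a generic illustration of Boolean function minimization, not the paper's specific algorithm; the truth table is a made-up example for the function x0 AND (x1 OR x2).

```python
from itertools import combinations

def to_implicants(rows):
    """Each positive row becomes a minterm string, e.g. (1, 0, 1) -> '101'."""
    return {''.join(map(str, r)) for r in rows}

def merge_once(terms):
    """One Quine-McCluskey pass: combine terms differing in exactly one position,
    replacing the differing bit with a don't-care '-'."""
    merged, used = set(), set()
    for a, b in combinations(sorted(terms), 2):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            merged.add(a[:diff[0]] + '-' + a[diff[0] + 1:])
            used.update({a, b})
    return merged | (terms - used)

# Positive rows (x0, x1, x2) of the target function x0 AND (x1 OR x2):
positives = [(1, 0, 1), (1, 1, 0), (1, 1, 1)]
terms = merge_once(to_implicants(positives))
print(sorted(terms))  # ['1-1', '11-']  i.e. (x0 AND x2) OR (x0 AND x1)
```

A full minimizer iterates the merge until no pair combines and then selects a prime-implicant cover; one pass already shows how three table rows collapse into two rules.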

Differentiable Programs with Neural Libraries

Title Differentiable Programs with Neural Libraries
Authors Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow
Abstract We develop a framework for combining differentiable programming languages with neural networks. Using this framework we create end-to-end trainable systems that learn to write interpretable algorithms with perceptual components. We explore the benefits of inductive biases for strong generalization and modularity that come from the program-like structure of our models. In particular, modularity allows us to learn a library of (neural) functions which grows and improves as more tasks are solved. Empirically, we show that this leads to lifelong learning systems that transfer knowledge to new tasks more effectively than baselines.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02109v2
PDF http://arxiv.org/pdf/1611.02109v2.pdf
PWC https://paperswithcode.com/paper/differentiable-programs-with-neural-libraries
Repo
Framework

Appearance Harmonization for Single Image Shadow Removal

Title Appearance Harmonization for Single Image Shadow Removal
Authors Liqian Ma, Jue Wang, Eli Shechtman, Kalyan Sunkavalli, Shimin Hu
Abstract Shadows often create unwanted artifacts in photographs, and removing them can be very challenging. Previous shadow removal methods often produce de-shadowed regions that are visually inconsistent with the rest of the image. In this work we propose a fully automatic shadow region harmonization approach that improves the appearance compatibility of the de-shadowed region as typically produced by previous methods. It is based on a shadow-guided patch-based image synthesis approach that reconstructs the shadow region using patches sampled from non-shadowed regions. The result is then refined based on the reconstruction confidence to handle unique image patterns. Extensive shadow removal results and comparisons demonstrate the effectiveness of our improvement. Quantitative evaluation on a benchmark dataset suggests that our automatic shadow harmonization approach effectively improves upon the state-of-the-art.
Tasks Image Generation, Image Shadow Removal
Published 2016-03-21
URL http://arxiv.org/abs/1603.06398v1
PDF http://arxiv.org/pdf/1603.06398v1.pdf
PWC https://paperswithcode.com/paper/appearance-harmonization-for-single-image
Repo
Framework

Proceedings First International Workshop on Hammers for Type Theories

Title Proceedings First International Workshop on Hammers for Type Theories
Authors Jasmin Christian Blanchette, Cezary Kaliszyk
Abstract This volume of EPTCS contains the proceedings of the First Workshop on Hammers for Type Theories (HaTT 2016), held on 1 July 2016 as part of the International Joint Conference on Automated Reasoning (IJCAR 2016) in Coimbra, Portugal. The proceedings contain four regular papers, as well as abstracts of the two invited talks by Pierre Corbineau (Verimag, France) and Aleksy Schubert (University of Warsaw, Poland).
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05427v1
PDF http://arxiv.org/pdf/1606.05427v1.pdf
PWC https://paperswithcode.com/paper/proceedings-first-international-workshop-on
Repo
Framework

Patterns of Scalable Bayesian Inference

Title Patterns of Scalable Bayesian Inference
Authors Elaine Angelino, Matthew James Johnson, Ryan P. Adams
Abstract Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward.
Tasks Bayesian Inference
Published 2016-02-16
URL http://arxiv.org/abs/1602.05221v2
PDF http://arxiv.org/pdf/1602.05221v2.pdf
PWC https://paperswithcode.com/paper/patterns-of-scalable-bayesian-inference
Repo
Framework
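One of the recurring patterns this survey covers is minibatch MCMC, where stochastic gradients replace full-data gradients inside a sampler. A standard representative is stochastic gradient Langevin dynamics (SGLD); the sketch below runs SGLD on a toy conjugate-Gaussian posterior with assumed hyperparameters (prior N(0, 10), step size 5e-4), chosen only to make the pattern concrete.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
theta_true = 1.5
data = theta_true + rng.standard_normal(N)      # x_i ~ N(theta, 1)

# SGLD: each step uses a minibatch gradient rescaled by N/batch, plus injected
# Gaussian noise of variance eps, yielding approximate posterior samples.
theta, eps, batch = 0.0, 5e-4, 100
samples = []
for step in range(4000):
    idx = rng.integers(0, N, batch)
    grad_prior = -theta / 10.0                          # d/dtheta log N(theta | 0, 10)
    grad_lik = (N / batch) * np.sum(data[idx] - theta)  # rescaled minibatch gradient
    theta += 0.5 * eps * (grad_prior + grad_lik) + np.sqrt(eps) * rng.standard_normal()
    if step > 1000:                                     # discard burn-in
        samples.append(theta)

print(f"posterior mean estimate: {np.mean(samples):.3f}  (true theta: {theta_true})")
```

Each step touches only 100 of the 2000 observations, which is the scalability win; the cost, discussed in the survey's taxonomy, is an asymptotic bias controlled by the step size.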

Generalizing Prototype Theory: A Formal Quantum Framework

Title Generalizing Prototype Theory: A Formal Quantum Framework
Authors Diederik Aerts, Jan Broekaert, Liane Gabora, Sandro Sozzo
Abstract Theories of natural language and concepts have been unable to model the flexibility, creativity, context-dependence, and emergence exhibited by words, concepts and their combinations. The mathematical formalism of quantum theory has instead been successful in capturing phenomena such as graded membership, situational meaning, composition of categories, and also more complex decision-making situations, which cannot be modeled in traditional probabilistic approaches. We show how a formal quantum approach to concepts and their combinations can provide a powerful extension of prototype theory. We explain how prototypes can interfere in conceptual combinations as a consequence of their contextual interactions, and provide an illustration of this using an intuitive wave-like diagram. This quantum-conceptual approach gives new life to original prototype theory, without however making it a privileged concept theory, as we explain at the end of our paper.
Tasks Decision Making
Published 2016-01-25
URL http://arxiv.org/abs/1601.06610v1
PDF http://arxiv.org/pdf/1601.06610v1.pdf
PWC https://paperswithcode.com/paper/generalizing-prototype-theory-a-formal
Repo
Framework
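The interference effect the abstract refers to can be made concrete with a tiny amplitude calculation: in quantum-style concept models, the membership of an item in a combination "A or B" picks up a cross term beyond the classical average of the two memberships. All the numbers below (membership weights 0.4 and 0.3, relative phase π/3) are hypothetical values chosen for illustration, not from the paper.

```python
import numpy as np

# Membership amplitudes of an item in concepts A and B.
mu_a, mu_b = 0.4, 0.3                    # classical membership weights (hypothetical)
phase = np.pi / 3                        # relative phase between the two concept states
amp_a = np.sqrt(mu_a)
amp_b = np.sqrt(mu_b) * np.exp(1j * phase)

# For an equal-weight superposition (|A> + |B>)/sqrt(2), the membership in
# "A or B" is the classical average plus an interference cross term:
classical = 0.5 * (mu_a + mu_b)                  # average, no interference
interference = np.real(np.conj(amp_a) * amp_b)   # Re(<A|x><x|B>)-style cross term
quantum = classical + interference

print(f"classical average: {classical:.3f}")
print(f"with interference: {quantum:.3f}")
```

Depending on the phase, the cross term can raise the combined membership above both components (constructive) or push it below their average (destructive), which is how these models account for the "guppy"-type deviations that classical prototype averaging cannot produce.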

Confidence-Constrained Maximum Entropy Framework for Learning from Multi-Instance Data

Title Confidence-Constrained Maximum Entropy Framework for Learning from Multi-Instance Data
Authors Behrouz Behmardi, Forrest Briggs, Xiaoli Z. Fern, Raviv Raich
Abstract Multi-instance data, in which each object (bag) contains a collection of instances, are widespread in machine learning, computer vision, bioinformatics, signal processing, and social sciences. We present a maximum entropy (ME) framework for learning from multi-instance data. In this approach each bag is represented as a distribution using the principle of ME. We introduce the concept of confidence-constrained ME (CME) to simultaneously learn the structure of distribution space and infer each distribution. The shared structure underlying each density is used to learn from instances inside each bag. The proposed CME is free of tuning parameters. We devise a fast optimization algorithm capable of handling large scale multi-instance data. In the experimental section, we evaluate the performance of the proposed approach in terms of exact rank recovery in the space of distributions and compare it with the regularized ME approach. Moreover, we compare the performance of CME with Multi-Instance Learning (MIL) state-of-the-art algorithms and show a comparable performance in terms of accuracy with reduced computational complexity.
Tasks
Published 2016-03-07
URL http://arxiv.org/abs/1603.01901v1
PDF http://arxiv.org/pdf/1603.01901v1.pdf
PWC https://paperswithcode.com/paper/confidence-constrained-maximum-entropy
Repo
Framework
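The first step of the framework, representing each bag as a maximum-entropy distribution matched to the bag's empirical feature expectations, can be sketched for a single 1-D feature on a discrete support. This shows only the per-bag ME fit via gradient ascent on the dual; the paper's actual contribution, the confidence constraint that couples bags through a shared low-rank structure, is not reproduced here, and the bag values and support are made up.

```python
import numpy as np

def maxent_bag(instances, bins, iters=500, lr=0.5):
    """Fit a discrete max-entropy distribution p(x) proportional to exp(lam * x)
    on `bins`, whose mean matches the bag's empirical mean (one feature for brevity)."""
    target = np.mean(instances)
    lam = 0.0
    for _ in range(iters):
        logits = lam * bins
        p = np.exp(logits - logits.max())   # stabilized softmax over the support
        p /= p.sum()
        lam += lr * (target - p @ bins)     # gradient of the dual objective
    return p

bins = np.linspace(-3, 3, 61)
bag = np.array([0.8, 1.1, 1.4, 0.9])        # one bag's instances (hypothetical)
p = maxent_bag(bag, bins)
print(f"matched mean: {p @ bins:.3f} (target {bag.mean():.3f})")
```

Among all distributions with the required mean, this exponential-family form has maximum entropy, so the bag is summarized by a single dual parameter per feature rather than by its raw instances.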

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Title Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Authors Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro
Abstract We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
Tasks
Published 2016-05-23
URL http://arxiv.org/abs/1605.07154v1
PDF http://arxiv.org/pdf/1605.07154v1.pdf
PWC https://paperswithcode.com/paper/path-normalized-optimization-of-recurrent
Repo
Framework
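The geometry path-SGD exploits is the path norm: the sum over all input-to-output paths of the product of squared edge weights, which is invariant to the node-wise rescalings that leave a ReLU network's function unchanged. The sketch below computes it for a small two-layer net by forwarding an all-ones vector through the element-wise squared weight matrices, and checks the rescaling invariance; layer sizes are arbitrary, and the optimizer itself is not implemented.

```python
import numpy as np

def path_norm(weights):
    """Squared-path norm of a layered ReLU net: sum over all input-output paths
    of the product of squared edge weights, computed by propagating an all-ones
    vector through the element-wise squared weight matrices."""
    v = np.ones(weights[0].shape[0])
    for W in weights:
        v = (W ** 2).T @ v
    return v.sum()

rng = np.random.default_rng(4)
W1 = rng.standard_normal((3, 4))    # input -> hidden
W2 = rng.standard_normal((4, 2))    # hidden -> output
pn = path_norm([W1, W2])

# ReLU nets are invariant to node-wise rescaling: multiply a hidden unit's
# incoming weights by c and its outgoing weights by 1/c. The path norm,
# unlike per-layer norms, is unchanged by this.
c = 2.5
W1s, W2s = W1.copy(), W2.copy()
W1s[:, 0] *= c
W2s[0, :] /= c
print(np.isclose(pn, path_norm([W1s, W2s])))  # True
```

Path-SGD rescales gradient steps by quantities derived from this norm, so optimization behaves identically across all rescaled parameterizations of the same function, which is the property the paper argues matters for training plain ReLU RNNs.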

A Scalable and Robust Framework for Intelligent Real-time Video Surveillance

Title A Scalable and Robust Framework for Intelligent Real-time Video Surveillance
Authors Shreenath Dutt, Ankita Kalra
Abstract In this paper, we present an intelligent, reliable and storage-efficient video surveillance system using Apache Storm and OpenCV. As a Storm topology, we have added multiple information extraction modules that only write important content to the disk. Our topology is extensible and capable of adding novel algorithms as the use case requires, without affecting the existing ones, since all the processing is independent. This framework is also highly scalable and fault tolerant, which makes it an excellent option for organisations that need to monitor a large network of surveillance cameras.
Tasks
Published 2016-10-30
URL http://arxiv.org/abs/1610.09590v1
PDF http://arxiv.org/pdf/1610.09590v1.pdf
PWC https://paperswithcode.com/paper/a-scalable-and-robust-framework-for
Repo
Framework
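The storage-efficiency idea, writing only "important" frames to disk, can be sketched independently of the Storm/OpenCV stack with simple frame differencing. This is a hedged stand-in for one extraction module, not the paper's topology; the threshold and synthetic frames are assumptions for illustration.

```python
import numpy as np

def important_frames(frames, threshold=10.0):
    """Yield indices of frames whose mean absolute difference from the previous
    frame exceeds `threshold` - only these would be written to disk."""
    prev = None
    for i, f in enumerate(frames):
        cur = f.astype(float)
        if prev is None or np.mean(np.abs(cur - prev)) > threshold:
            yield i
        prev = cur

# Synthetic feed: a static scene with one sudden change at frame 6.
rng = np.random.default_rng(5)
static = rng.integers(0, 255, (48, 64), dtype=np.uint8)
frames = [static.copy() for _ in range(10)]
frames[6] = rng.integers(0, 255, (48, 64), dtype=np.uint8)   # "motion" event

kept = list(important_frames(frames))
print(kept)  # [0, 6, 7]: first frame, the change, and the change back
```

In the described architecture each such module would run as an independent bolt in the Storm topology, so adding a new detector (faces, licence plates) does not disturb this one.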

Word sense disambiguation: a complex network approach

Title Word sense disambiguation: a complex network approach
Authors Edilson A. Correa Jr., Alneu de Andrade Lopes, Diego R. Amancio
Abstract In recent years, concepts and methods of complex networks have been employed to tackle the word sense disambiguation (WSD) task by representing words as nodes, which are connected if they are semantically similar. Despite the increasing number of studies carried out with such models, most of them use networks only to represent the data, while the pattern recognition on the attribute space is performed with traditional learning techniques. In other words, the structural relationships between words have not been explicitly used in the pattern recognition process. In addition, only a few investigations have probed the suitability of representations based on bipartite networks and graphs (bigraphs) for the problem, as many approaches consider all possible links between words. In this context, we assess the relevance of a bipartite network model representing both feature words (i.e. the words characterizing the context) and target (ambiguous) words to solve ambiguities in written texts. Here, we focus on the semantic relationships between these two types of words, disregarding the relationships between feature words. In particular, the proposed method not only serves to represent texts as graphs, but also constructs a structure on which the discrimination of senses is accomplished. Our results reveal that the proposed learning algorithm on such bipartite networks provides excellent results, mostly when topical features are employed to characterize the context. Surprisingly, our method even outperformed the support vector machine algorithm in particular cases, with the advantage of being robust even when only a small training dataset is available. Taken together, the results obtained here show that the proposed representation/classification method might be useful for improving the semantic characterization of written texts.
Tasks Word Sense Disambiguation
Published 2016-06-25
URL http://arxiv.org/abs/1606.07950v2
PDF http://arxiv.org/pdf/1606.07950v2.pdf
PWC https://paperswithcode.com/paper/word-sense-disambiguation-a-complex-network
Repo
Framework
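The bipartite idea, edges only between feature words and target senses, never between feature words, can be sketched with a minimal co-occurrence graph and a score that sums edge weights reachable from a test context. The training contexts and the max-score decision rule below are hypothetical simplifications, not the paper's corpus or learning algorithm.

```python
from collections import defaultdict

# Hypothetical labelled contexts for the ambiguous word "bank":
train = [
    (["money", "deposit", "loan"], "bank/finance"),
    (["loan", "interest", "account"], "bank/finance"),
    (["river", "water", "fishing"], "bank/river"),
    (["water", "shore", "mud"], "bank/river"),
]

# Build the bipartite network: feature word -> sense -> co-occurrence weight.
# Note there are no feature-word-to-feature-word edges.
edges = defaultdict(lambda: defaultdict(int))
for context, sense in train:
    for w in context:
        edges[w][sense] += 1

def disambiguate(context):
    """Score each sense by summing the bipartite edge weights reachable
    from the context's feature words; return the best-scoring sense."""
    scores = defaultdict(int)
    for w in context:
        for sense, weight in edges[w].items():
            scores[sense] += weight
    return max(scores, key=scores.get) if scores else None

print(disambiguate(["loan", "money"]))      # bank/finance
print(disambiguate(["river", "fishing"]))   # bank/river
```

Because classification happens directly on the graph structure, the structural relationships between words participate in the pattern recognition step itself, which is the gap in prior network-based WSD work the abstract points out.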

FusionNet: 3D Object Classification Using Multiple Data Representations

Title FusionNet: 3D Object Classification Using Multiple Data Representations
Authors Vishakh Hegde, Reza Zadeh
Abstract High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations, to achieve leading results on the Princeton ModelNet challenge. The two representations are: 1. Volumetric representation: the 3D object is discretized spatially as binary voxels - $1$ if the voxel is occupied and $0$ otherwise. 2. Pixel representation: the 3D object is represented as a set of projected 2D pixel images. Current leading submissions to the ModelNet Challenge use Convolutional Neural Networks (CNNs) on pixel representations. However, we diverge from this trend and additionally use Volumetric CNNs, bridging the gap between the two representations. We combine both representations and exploit them to learn new features, which yield a significantly better classifier than using either representation in isolation. To do this, we introduce new Volumetric CNN (V-CNN) architectures.
Tasks 3D Object Classification, 3D Object Recognition, Object Classification, Object Recognition
Published 2016-07-19
URL http://arxiv.org/abs/1607.05695v4
PDF http://arxiv.org/pdf/1607.05695v4.pdf
PWC https://paperswithcode.com/paper/fusionnet-3d-object-classification-using
Repo
Framework
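The volumetric representation in the abstract, a binary occupancy grid, is straightforward to construct from a point cloud. The sketch below is a generic voxelizer on a toy sphere, not FusionNet's preprocessing pipeline; the 32-cube resolution is a common choice but assumed here.

```python
import numpy as np

def voxelize(points, grid=32):
    """Binary voxel occupancy grid: 1 if any point falls in a cell, else 0.
    Points are rescaled to fit the [0, grid) cube."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scaled = (points - lo) / np.maximum(hi - lo, 1e-12) * (grid - 1)
    idx = scaled.astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox

# Toy "object": points sampled on the surface of a unit sphere.
rng = np.random.default_rng(6)
p = rng.standard_normal((5000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)

vox = voxelize(p, grid=32)
print(f"occupied voxels: {int(vox.sum())} / {32**3}")
```

A Volumetric CNN then convolves 3D filters over this grid, while the pixel branch renders the same object to 2D views; the paper's contribution is fusing the two.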