Paper Group ANR 489
No bad local minima: Data independent training error guarantees for multilayer neural networks. Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood. Accelerating Deep Learning with Shrinkage and Recall. Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks. Correct classification …
No bad local minima: Data independent training error guarantees for multilayer neural networks
Title | No bad local minima: Data independent training error guarantees for multilayer neural networks |
Authors | Daniel Soudry, Yair Carmon |
Abstract | We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically. |
Tasks | |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08361v2 |
http://arxiv.org/pdf/1605.08361v2.pdf | |
PWC | https://paperswithcode.com/paper/no-bad-local-minima-data-independent-training |
Repo | |
Framework | |
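A minimal numerical check in the spirit of the paper's experiments (not the authors' code, and the toy dimensions are assumptions): with mild over-parametrization, plain gradient-based training of a one-hidden-layer ReLU network with quadratic loss and a single output drives the training error to near zero even on random data.

```python
# Hedged sketch: verify near-zero training loss for an over-parametrized
# one-hidden-layer ReLU MNN on arbitrary data, as the paper's theory predicts.
import torch

torch.manual_seed(0)
n, d, h = 64, 16, 512            # samples, input dim, hidden width (assumed toy sizes)
X, y = torch.randn(n, d), torch.randn(n, 1)   # arbitrary data: "essentially nothing" assumed

model = torch.nn.Sequential(
    torch.nn.Linear(d, h), torch.nn.ReLU(), torch.nn.Linear(h, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    loss = ((model(X) - y) ** 2).mean()       # quadratic loss, single output
    opt.zero_grad(); loss.backward(); opt.step()
print(f"training loss at convergence: {loss.item():.2e}")   # expect near zero
```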
Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood
Title | Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood |
Authors | Janne Leppä-aho, Johan Pensar, Teemu Roos, Jukka Corander |
Abstract | We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure. |
Tasks | |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07863v1 |
http://arxiv.org/pdf/1602.07863v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-gaussian-graphical-models-with-1 |
Repo | |
Framework | |
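A hedged sketch of pseudo-likelihood structure learning by node-wise neighbourhood selection. The score below is an ordinary BIC on linear regressions, used only as a generic stand-in; the paper instead derives an analytical fractional marginal pseudo-likelihood score, which is not reproduced here.

```python
# Generic stand-in sketch: greedy neighbourhood selection per node, then
# AND-rule symmetrization, the common pseudo-likelihood recipe for GGMs.
import numpy as np

def node_bic(X, j, nbrs):
    """BIC of regressing column j on the columns in nbrs (plus an intercept)."""
    n = X.shape[0]
    cols = sorted(nbrs)
    A = np.column_stack([X[:, cols], np.ones(n)]) if cols else np.ones((n, 1))
    resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    rss = float(resid @ resid) + 1e-12
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def learn_graph(X):
    p = X.shape[1]
    nbhd = {j: set() for j in range(p)}
    for j in range(p):
        best = node_bic(X, j, nbhd[j])
        while True:                                   # greedy forward selection
            cand = [(node_bic(X, j, nbhd[j] | {k}), k)
                    for k in range(p) if k != j and k not in nbhd[j]]
            if not cand or min(cand)[0] >= best:
                break
            best, k = min(cand)
            nbhd[j].add(k)
    # AND-rule symmetrization gives the undirected edge-set estimate
    return {(i, j) for i in range(p) for j in nbhd[i] if i < j and i in nbhd[j]}
```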
Accelerating Deep Learning with Shrinkage and Recall
Title | Accelerating Deep Learning with Shrinkage and Recall |
Authors | Shuai Zheng, Abhinav Vishnu, Chris Ding |
Abstract | Deep learning is a very powerful class of machine learning models. It trains a large number of parameters across multiple layers and is very slow when the data is large-scale and the architecture is large. Inspired by the shrinking technique used to accelerate the Support Vector Machine (SVM) algorithm and the screening technique used in LASSO, we propose a shrinking Deep Learning with recall (sDLr) approach to speed up deep learning computation. We evaluate sDLr using Deep Neural Networks (DNNs), Deep Belief Networks (DBNs) and Convolutional Neural Networks (CNNs) on four datasets. Results show that the speedup from sDLr can exceed 2.0 while still giving competitive classification performance. |
Tasks | |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01369v2 |
http://arxiv.org/pdf/1605.01369v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-deep-learning-with-shrinkage-and |
Repo | |
Framework | |
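A hedged sketch of the "shrinking with recall" idea applied to plain SGD on logistic regression, not the authors' exact sDLr procedure (thresholds, schedule, and the toy problem are assumptions): weights whose recent updates are tiny are temporarily frozen, and everything is periodically recalled so early mistakes can be corrected.

```python
# Sketch of shrink-and-recall around a vanilla gradient loop. In a real
# implementation the savings come from not computing gradients for shrunk
# parameters; here the mask only illustrates the bookkeeping.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) * (rng.random(d) < 0.2)     # sparse ground truth
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
active = np.ones(d, dtype=bool)
lr, shrink_tol, recall_every = 0.1, 1e-3, 20            # assumed hyperparameters
for epoch in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))                  # logistic forward pass
    grad = X.T @ (p - y) / n
    grad[~active] = 0.0                                 # shrunk weights skip updates
    w -= lr * grad
    active &= np.abs(lr * grad) > shrink_tol            # shrink stagnant weights
    if (epoch + 1) % recall_every == 0:
        active[:] = True                                # recall: reactivate everything
print("active weights at end:", active.sum(), "of", d)
```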
Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks
Title | Neural Coarse-Graining: Extracting slowly-varying latent degrees of freedom with neural networks |
Authors | Nicholas Guttenberg, Martin Biehl, Ryota Kanai |
Abstract | We present a loss function for neural networks that encompasses an idea of trivial versus non-trivial predictions, such that the network jointly determines its own prediction goals and learns to satisfy them. This permits the network to focus on solving the subsets of a problem that are most amenable to its abilities, while discarding ‘distracting’ elements that interfere with its learning. To do this, the network first transforms the raw data into a higher-level categorical representation, and then trains a predictor from that new time series to its future. To prevent a trivial solution of mapping the signal to zero, we introduce a measure of non-triviality via a contrast between the prediction error of the learned model and that of a naive model of the overall signal statistics. The transform can learn to discard uninformative and unpredictable components of the signal in favor of the features which are both highly predictive and highly predictable. This creates a coarse-grained model of the time-series dynamics, focusing on predicting the slowly varying latent parameters which control the statistics of the time-series, rather than predicting the fast details directly. The result is a semi-supervised algorithm which is capable of extracting latent parameters, segmenting sections of time-series with differing statistics, and building a higher-level representation of the underlying dynamics from unlabeled data. |
Tasks | Time Series |
Published | 2016-09-01 |
URL | http://arxiv.org/abs/1609.00116v1 |
http://arxiv.org/pdf/1609.00116v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-coarse-graining-extracting-slowly |
Repo | |
Framework | |
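A hedged sketch of the contrastive non-triviality objective described in the abstract (the paper's exact formulation may differ; the ratio form and toy shapes are assumptions): an encoder maps the raw series to a latent z_t, a predictor maps z_t to z_{t+1}, and the loss is the predictor's error relative to a naive constant model that only knows the overall statistics of z. Collapsing z to a constant no longer helps, because the naive model then predicts z perfectly too.

```python
# Sketch of a coarse-graining loss: learned prediction error contrasted
# against a naive model of the overall latent statistics.
import torch

def coarse_grain_loss(encoder, predictor, x_t, x_next, eps=1e-8):
    z_t, z_next = encoder(x_t), encoder(x_next)
    pred_err = ((predictor(z_t) - z_next) ** 2).mean()
    # naive model: predict the batch-wide mean of the latent signal
    naive_err = ((z_next.mean(dim=0, keepdim=True) - z_next) ** 2).mean()
    return pred_err / (naive_err + eps)   # < 1 only for non-trivial, predictable z

# toy usage on random windows of a scalar time series (assumed sizes)
enc = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 4))
pred = torch.nn.Linear(4, 4)
x_t, x_next = torch.randn(64, 10), torch.randn(64, 10)
loss = coarse_grain_loss(enc, pred, x_t, x_next)
loss.backward()
```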
Correct classification for big/smart/fast data machine learning
Title | Correct classification for big/smart/fast data machine learning |
Authors | Sander Stepanov |
Abstract | Classification of table (relational database) data for big/smart/fast data machine learning is one of the most important tasks of predictive analytics and of extracting valuable information from data. It is a core applied technique in what is now understood as data science and/or artificial intelligence. The widely used Decision Tree (Random Forest) and the rarely used rule-based classifiers (PRISM, VFST, etc.) are empirical substitutes for the theoretically correct approach of Boolean function minimization. The development of algorithms for minimizing Boolean functions began long ago, with Edward Veitch’s work in 1952. Since then, the wider scientific and industrial community has made substantial efforts to find feasible solutions to Boolean function minimization. In this paper we propose to consider table data classification from a mathematical point of view, as the minimization of Boolean functions. It is shown how the data representation may be transformed into Boolean function form and how known minimization algorithms can then be applied. For simplicity, a binary output function is used in the development, which opens the door to extensions for multivalued outputs. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08550v1 |
http://arxiv.org/pdf/1609.08550v1.pdf | |
PWC | https://paperswithcode.com/paper/correct-classification-for-bigsmartfast-data |
Repo | |
Framework | |
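A hedged illustration of the core proposal: treat a binary-labelled table of binary features as a partially specified Boolean function and minimize it with a classical algorithm. SymPy's `SOPform` implements Quine-McCluskey; the tiny table below is an assumed example, with unseen rows treated as don't-cares.

```python
# Sketch: a minimal sum-of-products classification rule via Quine-McCluskey.
from itertools import product
from sympy import symbols
from sympy.logic import SOPform

a, b, c = symbols("a b c")
# training table: rows of (a, b, c) labelled class 1 (minterms) or class 0
positives = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
negatives = [[0, 0, 0], [0, 1, 0]]
seen = {tuple(r) for r in positives + negatives}
dontcares = [list(r) for r in product([0, 1], repeat=3) if tuple(r) not in seen]

rule = SOPform([a, b, c], positives, dontcares)
print(rule)   # minimal rule covering the positives, here it reduces to `a`
```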
Differentiable Programs with Neural Libraries
Title | Differentiable Programs with Neural Libraries |
Authors | Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow |
Abstract | We develop a framework for combining differentiable programming languages with neural networks. Using this framework we create end-to-end trainable systems that learn to write interpretable algorithms with perceptual components. We explore the benefits of inductive biases for strong generalization and modularity that come from the program-like structure of our models. In particular, modularity allows us to learn a library of (neural) functions which grows and improves as more tasks are solved. Empirically, we show that this leads to lifelong learning systems that transfer knowledge to new tasks more effectively than baselines. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02109v2 |
http://arxiv.org/pdf/1611.02109v2.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-programs-with-neural-libraries |
Repo | |
Framework | |
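A hedged, heavily simplified illustration of the "neural library" ingredient only (the paper builds on end-to-end differentiable interpreters, which this sketch does not attempt): two task "programs" are composed from a shared, trainable library of neural functions, so gradients from either task improve the common component. The module names and dummy losses are assumptions.

```python
# Sketch: a shared library of neural functions reused by two task heads.
import torch

library = torch.nn.ModuleDict({
    # shared perceptual component, grows/improves as more tasks are solved
    "recognize": torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                                     torch.nn.Linear(32, 8)),
})
head_a = torch.nn.Linear(8, 2)   # task A composes library["recognize"] with its own head
head_b = torch.nn.Linear(8, 5)   # task B reuses the same library function

params = list(library.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
x = torch.randn(32, 16)
loss = head_a(library["recognize"](x)).pow(2).mean() \
     + head_b(library["recognize"](x)).pow(2).mean()   # dummy losses for illustration
opt.zero_grad(); loss.backward(); opt.step()           # both tasks update the shared library
```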
Appearance Harmonization for Single Image Shadow Removal
Title | Appearance Harmonization for Single Image Shadow Removal |
Authors | Liqian Ma, Jue Wang, Eli Shechtman, Kalyan Sunkavalli, Shimin Hu |
Abstract | Shadows often create unwanted artifacts in photographs, and removing them can be very challenging. Previous shadow removal methods often produce de-shadowed regions that are visually inconsistent with the rest of the image. In this work we propose a fully automatic shadow region harmonization approach that improves the appearance compatibility of the de-shadowed region as typically produced by previous methods. It is based on a shadow-guided patch-based image synthesis approach that reconstructs the shadow region using patches sampled from non-shadowed regions. The result is then refined based on the reconstruction confidence to handle unique image patterns. Extensive shadow removal results and comparisons show the effectiveness of our improvement. Quantitative evaluation on a benchmark dataset suggests that our automatic shadow harmonization approach effectively improves upon the state-of-the-art. |
Tasks | Image Generation, Image Shadow Removal |
Published | 2016-03-21 |
URL | http://arxiv.org/abs/1603.06398v1 |
http://arxiv.org/pdf/1603.06398v1.pdf | |
PWC | https://paperswithcode.com/paper/appearance-harmonization-for-single-image |
Repo | |
Framework | |
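A hedged, much-simplified stand-in for the harmonization step: match the per-channel mean and standard deviation of a de-shadowed region to those of the surrounding non-shadow pixels. The paper instead uses shadow-guided patch-based synthesis with confidence-based refinement; this global statistics transfer only conveys the "appearance compatibility" goal.

```python
# Sketch: Reinhard-style statistics matching of a de-shadowed region to its
# non-shadow surroundings (a crude substitute for patch-based synthesis).
import numpy as np

def harmonize(img, shadow_mask):
    """img: float HxWx3 in [0,1]; shadow_mask: boolean HxW (True = de-shadowed region)."""
    out = img.copy()
    for ch in range(3):
        src = img[shadow_mask, ch]          # region produced by a shadow-removal method
        ref = img[~shadow_mask, ch]         # surrounding non-shadow statistics
        out[shadow_mask, ch] = (src - src.mean()) / (src.std() + 1e-8) \
                               * ref.std() + ref.mean()
    return np.clip(out, 0.0, 1.0)

# toy usage: a flat image whose left half is darker, harmonized to the right half
img = np.concatenate([np.full((8, 4, 3), 0.3), np.full((8, 4, 3), 0.6)], axis=1)
mask = np.zeros((8, 8), dtype=bool); mask[:, :4] = True
print(harmonize(img, mask)[0, 0])   # left half now matches right-half statistics
```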
Proceedings First International Workshop on Hammers for Type Theories
Title | Proceedings First International Workshop on Hammers for Type Theories |
Authors | Jasmin Christian Blanchette, Cezary Kaliszyk |
Abstract | This volume of EPTCS contains the proceedings of the First Workshop on Hammers for Type Theories (HaTT 2016), held on 1 July 2016 as part of the International Joint Conference on Automated Reasoning (IJCAR 2016) in Coimbra, Portugal. The proceedings contain four regular papers, as well as abstracts of the two invited talks by Pierre Corbineau (Verimag, France) and Aleksy Schubert (University of Warsaw, Poland). |
Tasks | |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05427v1 |
http://arxiv.org/pdf/1606.05427v1.pdf | |
PWC | https://paperswithcode.com/paper/proceedings-first-international-workshop-on |
Repo | |
Framework | |
Patterns of Scalable Bayesian Inference
Title | Patterns of Scalable Bayesian Inference |
Authors | Elaine Angelino, Matthew James Johnson, Ryan P. Adams |
Abstract | Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward. |
Tasks | Bayesian Inference |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.05221v2 |
http://arxiv.org/pdf/1602.05221v2.pdf | |
PWC | https://paperswithcode.com/paper/patterns-of-scalable-bayesian-inference |
Repo | |
Framework | |
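A hedged sketch of one representative technique from the families this review covers, not a method the review introduces: stochastic gradient Langevin dynamics (SGLD), which scales MCMC by replacing full-data gradients with minibatch estimates plus injected Gaussian noise. The model, step-size schedule, and sizes below are assumptions.

```python
# Sketch: SGLD for Bayesian linear regression with a N(0, I) prior.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + rng.normal(size=n)

theta = np.zeros(d)
batch, samples = 100, []
for t in range(2000):
    eps = 1e-4 / (1 + t) ** 0.55                        # decaying step size
    idx = rng.integers(0, n, size=batch)
    # minibatch estimate of the gradient of the log posterior
    grad = (n / batch) * X[idx].T @ (y[idx] - X[idx] @ theta) - theta
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=d)
    if t > 1000:                                        # discard burn-in
        samples.append(theta.copy())
print("posterior mean estimate:", np.mean(samples, axis=0).round(2))
```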
Generalizing Prototype Theory: A Formal Quantum Framework
Title | Generalizing Prototype Theory: A Formal Quantum Framework |
Authors | Diederik Aerts, Jan Broekaert, Liane Gabora, Sandro Sozzo |
Abstract | Theories of natural language and concepts have been unable to model the flexibility, creativity, context-dependence, and emergence exhibited by words, concepts and their combinations. The mathematical formalism of quantum theory has instead been successful in capturing such phenomena, including graded membership, situational meaning, and composition of categories, as well as more complex decision-making situations, which cannot be modeled in traditional probabilistic approaches. We show how a formal quantum approach to concepts and their combinations can provide a powerful extension of prototype theory. We explain how prototypes can interfere in conceptual combinations as a consequence of their contextual interactions, and provide an illustration of this using an intuitive wave-like diagram. This quantum-conceptual approach gives new life to the original prototype theory, without, however, making it a privileged concept theory, as we explain at the end of our paper. |
Tasks | Decision Making |
Published | 2016-01-25 |
URL | http://arxiv.org/abs/1601.06610v1 |
http://arxiv.org/pdf/1601.06610v1.pdf | |
PWC | https://paperswithcode.com/paper/generalizing-prototype-theory-a-formal |
Repo | |
Framework | |
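A hedged numerical illustration of the interference effect discussed in the abstract (the paper's formalism is richer, and the vectors and concept names below are invented for illustration): concepts are unit vectors in a complex Hilbert space, graded membership is a squared projection, and the membership for a combined concept, modelled as a superposition, deviates from the classical average by an interference term.

```python
# Sketch: interference in graded membership under a superposed concept state.
import numpy as np

rng = np.random.default_rng(1)
def unit(v): return v / np.linalg.norm(v)

fruit = unit(rng.normal(size=4) + 1j * rng.normal(size=4))
veg   = unit(rng.normal(size=4) + 1j * rng.normal(size=4))
combo = unit(fruit + veg)                  # superposition state for "fruit or vegetable"
exemplar = unit(rng.normal(size=4) + 1j * rng.normal(size=4))   # e.g. "tomato" (assumed)

mu = lambda concept: abs(np.vdot(exemplar, concept)) ** 2       # graded membership
classical_avg = 0.5 * (mu(fruit) + mu(veg))
print(f"mu(combo) = {mu(combo):.3f} vs classical average {classical_avg:.3f}")
# the difference is the interference term, which classical probability lacks
```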
Confidence-Constrained Maximum Entropy Framework for Learning from Multi-Instance Data
Title | Confidence-Constrained Maximum Entropy Framework for Learning from Multi-Instance Data |
Authors | Behrouz Behmardi, Forrest Briggs, Xiaoli Z. Fern, Raviv Raich |
Abstract | Multi-instance data, in which each object (bag) contains a collection of instances, are widespread in machine learning, computer vision, bioinformatics, signal processing, and social sciences. We present a maximum entropy (ME) framework for learning from multi-instance data. In this approach each bag is represented as a distribution using the principle of ME. We introduce the concept of confidence-constrained ME (CME) to simultaneously learn the structure of distribution space and infer each distribution. The shared structure underlying each density is used to learn from instances inside each bag. The proposed CME is free of tuning parameters. We devise a fast optimization algorithm capable of handling large scale multi-instance data. In the experimental section, we evaluate the performance of the proposed approach in terms of exact rank recovery in the space of distributions and compare it with the regularized ME approach. Moreover, we compare the performance of CME with Multi-Instance Learning (MIL) state-of-the-art algorithms and show a comparable performance in terms of accuracy with reduced computational complexity. |
Tasks | |
Published | 2016-03-07 |
URL | http://arxiv.org/abs/1603.01901v1 |
http://arxiv.org/pdf/1603.01901v1.pdf | |
PWC | https://paperswithcode.com/paper/confidence-constrained-maximum-entropy |
Repo | |
Framework | |
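A hedged sketch of only the first ingredient named in the abstract: representing each bag as a maximum-entropy (exponential-family) distribution fitted by moment matching. The paper's actual contribution (the confidence-constrained joint learning of a shared structure across bags) is not reproduced here, and the toy binary domain is an assumption.

```python
# Sketch: per-bag MaxEnt distribution p(x) ∝ exp(theta · x) over a small
# binary domain, fitted by matching empirical feature moments.
import numpy as np
from itertools import product

DOMAIN = np.array(list(product([0, 1], repeat=4)), dtype=float)  # enumerable domain

def fit_maxent(bag, iters=500, lr=0.5):
    """bag: (n_instances, 4) binary array -> natural parameters theta."""
    target = np.clip(bag.mean(axis=0), 0.05, 0.95)   # clip to keep moments feasible
    theta = np.zeros(4)
    for _ in range(iters):
        logits = DOMAIN @ theta
        p = np.exp(logits - logits.max()); p /= p.sum()
        model_moments = p @ DOMAIN
        theta += lr * (target - model_moments)       # gradient of the log-likelihood
    return theta

rng = np.random.default_rng(0)
bag = (rng.random((20, 4)) < [0.8, 0.2, 0.5, 0.9]).astype(float)
print("fitted natural parameters:", fit_maxent(bag).round(2))
```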
Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Title | Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations |
Authors | Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro |
Abstract | We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.07154v1 |
http://arxiv.org/pdf/1605.07154v1.pdf | |
PWC | https://paperswithcode.com/paper/path-normalized-optimization-of-recurrent |
Repo | |
Framework | |
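A hedged sketch of a single path-SGD step for a two-layer ReLU network (the paper's contribution is the adaptation to RNNs, which this does not show): each weight's gradient is divided by the corresponding second derivative of the squared path regularizer, which for W1[j, i] is the sum over k of W2[k, j]^2 and for W2[k, j] is the sum over i of W1[j, i]^2.

```python
# Sketch: one path-normalized update for a 2-layer ReLU network.
import torch

d, m, k = 8, 16, 1                      # assumed toy sizes
W1 = torch.randn(m, d, requires_grad=True)
W2 = torch.randn(k, m, requires_grad=True)
x, y = torch.randn(32, d), torch.randn(32, k)

loss = ((torch.relu(x @ W1.T) @ W2.T - y) ** 2).mean()
loss.backward()

lr = 0.1
with torch.no_grad():
    kappa1 = (W2 ** 2).sum(dim=0).unsqueeze(1)   # (m, 1): path weight above hidden unit j
    kappa2 = (W1 ** 2).sum(dim=1).unsqueeze(0)   # (1, m): path weight below hidden unit j
    W1 -= lr * W1.grad / (kappa1 + 1e-8)         # path-normalized updates
    W2 -= lr * W2.grad / (kappa2 + 1e-8)
    W1.grad.zero_(); W2.grad.zero_()
```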
A Scalable and Robust Framework for Intelligent Real-time Video Surveillance
Title | A Scalable and Robust Framework for Intelligent Real-time Video Surveillance |
Authors | Shreenath Dutt, Ankita Kalra |
Abstract | In this paper, we present an intelligent, reliable and storage-efficient video surveillance system using Apache Storm and OpenCV. As a Storm topology, we have added multiple information extraction modules that only write important content to the disk. Our topology is extensible and capable of adding novel algorithms as per the use case without affecting the existing ones, since all the processing is independent of each other. The framework is also highly scalable and fault tolerant, which makes it a strong option for organisations that need to monitor a large network of surveillance cameras. |
Tasks | |
Published | 2016-10-30 |
URL | http://arxiv.org/abs/1610.09590v1 |
http://arxiv.org/pdf/1610.09590v1.pdf | |
PWC | https://paperswithcode.com/paper/a-scalable-and-robust-framework-for |
Repo | |
Framework | |
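A hedged sketch of the kind of information-extraction module described above. The paper wires such modules into an Apache Storm topology; this standalone script only shows the storage-saving idea, and the input filename and thresholds are assumptions: keep a frame on disk only when enough foreground motion is detected.

```python
# Sketch: write only "important" (motion-containing) frames to disk.
import cv2

cap = cv2.VideoCapture("camera_feed.mp4")     # hypothetical input file or stream
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
MIN_MOTION_PIXELS = 500                        # assumed importance threshold
frame_id = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = bg.apply(frame)                       # per-frame foreground mask
    if cv2.countNonZero(fg) > MIN_MOTION_PIXELS:
        cv2.imwrite(f"important_{frame_id:06d}.jpg", frame)   # only important content
    frame_id += 1
cap.release()
```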
Word sense disambiguation: a complex network approach
Title | Word sense disambiguation: a complex network approach |
Authors | Edilson A. Correa Jr., Alneu de Andrade Lopes, Diego R. Amancio |
Abstract | In recent years, concepts and methods of complex networks have been employed to tackle the word sense disambiguation (WSD) task by representing words as nodes, which are connected if they are semantically similar. Despite the increasing number of studies carried out with such models, most of them use networks only to represent the data, while the pattern recognition is performed on the attribute space using traditional learning techniques. In other words, the structural relationships between words have not been explicitly used in the pattern recognition process. In addition, only a few investigations have probed the suitability of representations based on bipartite networks and graphs (bigraphs) for the problem, as many approaches consider all possible links between words. In this context, we assess the relevance of a bipartite network model representing both feature words (i.e. the words characterizing the context) and target (ambiguous) words to solve ambiguities in written texts. Here, we focus on the semantic relationships between these two types of words, disregarding the relationships between feature words. In particular, the proposed method not only serves to represent texts as graphs, but also constructs a structure on which the discrimination of senses is accomplished. Our results revealed that the proposed learning algorithm in such bipartite networks provides excellent results mostly when topical features are employed to characterize the context. Surprisingly, our method even outperformed the support vector machine algorithm in particular cases, with the advantage of being robust even if a small training dataset is available. Taken together, the results obtained here show that the proposed representation/classification method might be useful to improve the semantic characterization of written texts. |
Tasks | Word Sense Disambiguation |
Published | 2016-06-25 |
URL | http://arxiv.org/abs/1606.07950v2 |
http://arxiv.org/pdf/1606.07950v2.pdf | |
PWC | https://paperswithcode.com/paper/word-sense-disambiguation-a-complex-network |
Repo | |
Framework | |
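A hedged, simplified sketch of the bipartite idea (the paper's learning algorithm on the network is more involved, and the example words and sense labels are invented): feature words are linked to the senses of a target word with co-occurrence weights, and a new occurrence is disambiguated by summing the edge weights of its context.

```python
# Sketch: weighted bipartite graph between feature words and senses,
# with disambiguation by accumulated edge weight.
from collections import defaultdict

def train(examples):
    """examples: list of (context_words, sense_label) for one ambiguous target."""
    edges = defaultdict(float)                 # bipartite edges: (feature word, sense)
    for context, sense in examples:
        for w in context:
            edges[(w, sense)] += 1.0
    return edges

def disambiguate(edges, context):
    senses = {s for (_, s) in edges}
    score = {s: sum(edges[(w, s)] for w in context) for s in senses}
    return max(score, key=score.get)

edges = train([
    (["money", "deposit", "loan"], "bank/finance"),
    (["river", "water", "fishing"], "bank/river"),
])
print(disambiguate(edges, ["loan", "interest", "money"]))   # -> bank/finance
```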
FusionNet: 3D Object Classification Using Multiple Data Representations
Title | FusionNet: 3D Object Classification Using Multiple Data Representations |
Authors | Vishakh Hegde, Reza Zadeh |
Abstract | High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations, to achieve leading results on the Princeton ModelNet challenge. The two representations: 1. Volumetric representation: the 3D object is discretized spatially as binary voxels - $1$ if the voxel is occupied and $0$ otherwise. 2. Pixel representation: the 3D object is represented as a set of projected 2D pixel images. Current leading submissions to the ModelNet Challenge use Convolutional Neural Networks (CNNs) on pixel representations. However, we diverge from this trend and, additionally, use Volumetric CNNs to bridge the gap between the efficiency of the above two representations. We combine both representations and exploit them to learn new features, which yield a significantly better classifier than using either of the representations in isolation. To do this, we introduce new Volumetric CNN (V-CNN) architectures. |
Tasks | 3D Object Classification, 3D Object Recognition, Object Classification, Object Recognition |
Published | 2016-07-19 |
URL | http://arxiv.org/abs/1607.05695v4 |
http://arxiv.org/pdf/1607.05695v4.pdf | |
PWC | https://paperswithcode.com/paper/fusionnet-3d-object-classification-using |
Repo | |
Framework | |
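A hedged sketch of the fusion idea only, not the paper's V-CNN architectures (layer sizes, voxel/view resolutions, and the 40-class output are assumptions in the spirit of ModelNet40): a tiny volumetric 3D-conv branch and a tiny multi-view 2D-conv branch, shared across views, are combined by averaging class scores.

```python
# Sketch: late fusion of a voxel branch and a multi-view pixel branch.
import torch
import torch.nn as nn

class TinyFusionNet(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        self.vox = nn.Sequential(                       # volumetric branch: binary voxels
            nn.Conv3d(1, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool3d(1),
            nn.Flatten(), nn.Linear(8, n_classes))
        self.view = nn.Sequential(                      # pixel branch, shared over views
            nn.Conv2d(1, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(8, n_classes))

    def forward(self, voxels, views):
        # voxels: (B, 1, 30, 30, 30); views: (B, V, 1, 32, 32)
        b, v = views.shape[:2]
        view_logits = self.view(views.flatten(0, 1)).view(b, v, -1).mean(dim=1)
        return 0.5 * (self.vox(voxels) + view_logits)   # late score fusion

net = TinyFusionNet()
logits = net(torch.randn(2, 1, 30, 30, 30), torch.randn(2, 12, 1, 32, 32))
print(logits.shape)   # torch.Size([2, 40])
```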