Paper Group ANR 445
A wake-sleep algorithm for recurrent, spiking neural networks. On Quadratic Penalties in Elastic Weight Consolidation. A Generative Approach to Question Answering. Segmentation of Intracranial Arterial Calcification with Deeply Supervised Residual Dropout Networks. LDMNet: Low Dimensional Manifold Regularized Neural Networks. Recent Progress of Fac …
A wake-sleep algorithm for recurrent, spiking neural networks
Title | A wake-sleep algorithm for recurrent, spiking neural networks |
Authors | Johannes Thiele, Peter Diehl, Matthew Cook |
Abstract | We investigate a recently proposed model for cortical computation which performs relational inference. It consists of several interconnected, structurally equivalent populations of leaky integrate-and-fire (LIF) neurons, which are trained in a self-organized fashion with spike-timing dependent plasticity (STDP). Despite its robust learning dynamics, the model is susceptible to a problem typical for recurrent networks which use a correlation based (Hebbian) learning rule: if trained with high learning rates, the recurrent connections can cause strong feedback loops in the network dynamics, which lead to the emergence of attractor states. This causes a strong reduction in the number of representable patterns and a decay in the inference ability of the network. As a solution, we introduce a conceptually very simple “wake-sleep” algorithm: during the wake phase, training is executed normally, while during the sleep phase, the network “dreams” samples from its generative model, which are induced by random input. This process allows us to activate the attractor states in the network, which can then be unlearned effectively by an anti-Hebbian mechanism. The algorithm allows us to increase learning rates up to a factor of ten while avoiding clustering, which allows the network to learn several times faster. Also for low learning rates, where clustering is not an issue, it improves convergence speed and reduces the final inference error. |
Tasks | |
Published | 2017-03-18 |
URL | http://arxiv.org/abs/1703.06290v1 |
http://arxiv.org/pdf/1703.06290v1.pdf | |
PWC | https://paperswithcode.com/paper/a-wake-sleep-algorithm-for-recurrent-spiking |
Repo | |
Framework | |
On Quadratic Penalties in Elastic Weight Consolidation
Title | On Quadratic Penalties in Elastic Weight Consolidation |
Authors | Ferenc Huszár |
Abstract | Elastic weight consolidation (EWC, Kirkpatrick et al, 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation (Eskin et al, 2004), and this view is consistent with the motivation given by Kirkpatrick et al (2017). In this note, I present an extended derivation that covers the case when there are more than two tasks. I show that the quadratic penalties in EWC are inconsistent with this derivation and might lead to double-counting data from earlier tasks. |
Tasks | |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03847v1 |
http://arxiv.org/pdf/1712.03847v1.pdf | |
PWC | https://paperswithcode.com/paper/on-quadratic-penalties-in-elastic-weight |
Repo | |
Framework | |
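As an illustration of the issue the abstract raises, the following minimal sketch contrasts two ways of penalizing drift from earlier tasks, assuming diagonal Fisher estimates. The function names, the `lam` weight, and the toy numbers are assumptions made for this example and do not come from the note. `per_task_penalty` keeps one quadratic term per earlier task, each anchored at that task's own optimum. `single_anchor_penalty` keeps a single term anchored at the most recent optimum with accumulated Fisher information, which is one reading of the Laplace-propagation view.

```python
def quadratic_penalty(theta, anchor, fisher, lam=1.0):
    """(lam / 2) * sum_i fisher[i] * (theta[i] - anchor[i]) ** 2."""
    return 0.5 * lam * sum(
        f * (t - a) ** 2 for t, a, f in zip(theta, anchor, fisher)
    )


def per_task_penalty(theta, anchors, fishers, lam=1.0):
    """One quadratic term per earlier task, anchored at that task's optimum."""
    return sum(quadratic_penalty(theta, a, f, lam) for a, f in zip(anchors, fishers))


def single_anchor_penalty(theta, latest_anchor, fishers, lam=1.0):
    """One quadratic term anchored at the latest optimum, with summed Fisher."""
    total_fisher = [sum(column) for column in zip(*fishers)]
    return quadratic_penalty(theta, latest_anchor, total_fisher, lam)


# Hypothetical two-parameter model after training on tasks A and then B.
theta_a, fisher_a = [0.0, 1.0], [2.0, 1.0]
theta_b, fisher_b = [0.5, 1.0], [1.0, 3.0]
theta = [1.0, 1.0]

per_task = per_task_penalty(theta, [theta_a, theta_b], [fisher_a, fisher_b])
single = single_anchor_penalty(theta, theta_b, [fisher_a, fisher_b])
print(per_task, single)
```

The two penalties differ for the same parameters, which is the kind of discrepancy the note examines; the sketch makes no claim about which numbers a real network would produce.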
A Generative Approach to Question Answering
Title | A Generative Approach to Question Answering |
Authors | Rajarshee Mitra |
Abstract | Question Answering has come a long way from answer sentence selection and relational QA to reading and comprehension. We shift our attention to generative question answering (gQA), in which we enable a machine to read passages and answer questions by learning to generate the answers. We frame the problem as a generative task where the encoder is a network that models the relationship between question and passage and encodes them into a vector, allowing the decoder to directly form an abstraction of the answer. Failing to retain facts and making repetitions are common mistakes that affect the overall legibility of answers. To counter these issues, we employ a copying mechanism and maintain a coverage vector in our model, respectively. Our results on MS-MARCO demonstrate its superiority over baselines, and we also show qualitative examples where we improved in terms of correctness and readability. |
Tasks | Question Answering |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06238v2 |
http://arxiv.org/pdf/1711.06238v2.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-approach-to-question-answering |
Repo | |
Framework | |
Segmentation of Intracranial Arterial Calcification with Deeply Supervised Residual Dropout Networks
Title | Segmentation of Intracranial Arterial Calcification with Deeply Supervised Residual Dropout Networks |
Authors | Gerda Bortsova, Gijs van Tulder, Florian Dubost, Tingying Peng, Nassir Navab, Aad van der Lugt, Daniel Bos, Marleen de Bruijne |
Abstract | Intracranial carotid artery calcification (ICAC) is a major risk factor for stroke, and might contribute to dementia and cognitive decline. Reliance on time-consuming manual annotation of ICAC hampers much demanded further research into the relationship between ICAC and neurological diseases. Automation of ICAC segmentation is therefore highly desirable, but difficult due to the proximity of the lesions to bony structures with a similar attenuation coefficient. In this paper, we propose a method for automatic segmentation of ICAC; the first to our knowledge. Our method is based on a 3D fully convolutional neural network that we extend with two regularization techniques. Firstly, we use deep supervision (hidden layers supervision) to encourage discriminative features in the hidden layers. Secondly, we augment the network with skip connections, as in the recently developed ResNet, and dropout layers, inserted in a way that skip connections circumvent them. We investigate the effect of skip connections and dropout. In addition, we propose a simple problem-specific modification of the network objective function that restricts the focus to the most important image regions and simplifies the optimization. We train and validate our model using 882 CT scans and test on 1,000. Our regularization techniques and objective improve the average Dice score by 7.1%, yielding an average Dice of 76.2% and 97.7% correlation between predicted ICAC volumes and manual annotations. |
Tasks | |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01148v1 |
http://arxiv.org/pdf/1706.01148v1.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-of-intracranial-arterial |
Repo | |
Framework | |
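The abstract reports segmentation quality as a Dice score. For readers unfamiliar with the metric, here is a minimal sketch of how it is computed for two binary masks. The function name, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details from the paper.

```python
def dice_score(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for flat 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size == 0:
        # Both masks are empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / size


# Toy masks: 3 predicted voxels, 3 annotated voxels, 2 of them shared.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
dice = dice_score(pred, truth)
print(round(dice, 4))
```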
LDMNet: Low Dimensional Manifold Regularized Neural Networks
Title | LDMNet: Low Dimensional Manifold Regularized Neural Networks |
Authors | Wei Zhu, Qiang Qiu, Jiaji Huang, Robert Calderbank, Guillermo Sapiro, Ingrid Daubechies |
Abstract | Deep neural networks have proved very successful on archetypal tasks for which large training sets are available, but when the training data are scarce, their performance suffers from overfitting. Many existing methods of reducing overfitting are data-independent, and their efficacy is often limited when the training set is very small. Data-dependent regularizations are mostly motivated by the observation that data of interest lie close to a manifold, which is typically hard to parametrize explicitly and often requires human input of tangent vectors. These methods typically only focus on the geometry of the input data, and do not necessarily encourage the networks to produce geometrically meaningful features. To resolve this, we propose a new framework, the Low-Dimensional-Manifold-regularized neural Network (LDMNet), which incorporates a feature regularization method that focuses on the geometry of both the input data and the output features. In LDMNet, we regularize the network by encouraging the combination of the input data and the output features to sample a collection of low dimensional manifolds, which are searched efficiently without explicit parametrization. To achieve this, we directly use the manifold dimension as a regularization term in a variational functional. The resulting Euler-Lagrange equation is a Laplace-Beltrami equation over a point cloud, which is solved by the point integral method without increasing the computational complexity. We demonstrate two benefits of LDMNet in the experiments. First, we show that LDMNet significantly outperforms widely-used network regularizers such as weight decay and DropOut. Second, we show that LDMNet can be designed to extract common features of an object imaged via different modalities, which proves to be very useful in real-world applications such as cross-spectral face recognition. |
Tasks | Face Recognition |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06246v1 |
http://arxiv.org/pdf/1711.06246v1.pdf | |
PWC | https://paperswithcode.com/paper/ldmnet-low-dimensional-manifold-regularized |
Repo | |
Framework | |
Recent Progress of Face Image Synthesis
Title | Recent Progress of Face Image Synthesis |
Authors | Zhihe Lu, Zhihang Li, Jie Cao, Ran He, Zhenan Sun |
Abstract | Face synthesis has been a fascinating yet challenging problem in computer vision and machine learning. Its main research effort is to design algorithms that generate photo-realistic face images from a given semantic domain. It has been a crucial preprocessing step of mainstream face recognition approaches and an excellent test of AI's ability to use complicated probability distributions. In this paper, we provide a comprehensive review of typical face synthesis works that involve traditional methods as well as advanced deep learning approaches. In particular, the Generative Adversarial Net (GAN) is highlighted for generating photo-realistic and identity-preserving results. Furthermore, the publicly available databases and evaluation metrics are introduced in detail. We end the review by discussing unsolved difficulties and promising directions for future research. |
Tasks | Face Generation, Face Recognition, Image Generation |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04717v1 |
http://arxiv.org/pdf/1706.04717v1.pdf | |
PWC | https://paperswithcode.com/paper/recent-progress-of-face-image-synthesis |
Repo | |
Framework | |
Multibiometric Secure System Based on Deep Learning
Title | Multibiometric Secure System Based on Deep Learning |
Authors | Veeru Talreja, Matthew C. Valenti, Nasser M. Nasrabadi |
Abstract | In this paper, we propose a secure multibiometric system that uses deep neural networks and error-correction coding. We present a feature-level fusion framework to generate a secure multibiometric template from each user’s multiple biometrics. Two fusion architectures, fully connected architecture and bilinear architecture, are implemented to develop a robust multibiometric shared representation. The shared representation is used to generate a cancelable biometric template that involves the selection of a different set of reliable and discriminative features for each user. This cancelable template is a binary vector and is passed through an appropriate error-correcting decoder to find a closest codeword and this codeword is hashed to generate the final secure template. The efficacy of the proposed approach is shown using a multimodal database where we achieve state-of-the-art matching performance, along with cancelability and security. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02314v1 |
http://arxiv.org/pdf/1708.02314v1.pdf | |
PWC | https://paperswithcode.com/paper/multibiometric-secure-system-based-on-deep |
Repo | |
Framework | |
Avoiding Discrimination through Causal Reasoning
Title | Avoiding Discrimination through Causal Reasoning |
Authors | Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, Bernhard Schölkopf |
Abstract | Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively. Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from “What is the right fairness criterion?” to “What do we want to assume about the causal data generating process?” Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02744v2 |
http://arxiv.org/pdf/1706.02744v2.pdf | |
PWC | https://paperswithcode.com/paper/avoiding-discrimination-through-causal |
Repo | |
Framework | |
Adaptivity to Noise Parameters in Nonparametric Active Learning
Title | Adaptivity to Noise Parameters in Nonparametric Active Learning |
Authors | Andrea Locatelli, Alexandra Carpentier, Samory Kpotufe |
Abstract | This work addresses various open questions in the theory of active learning for nonparametric classification. Our contributions are both statistical and algorithmic: (1) We establish new minimax rates for active learning under common noise conditions. These rates display interesting transitions – due to the interaction between noise smoothness and margin – not present in the passive setting. Some such transitions were previously conjectured, but remained unconfirmed. (2) We present a generic algorithmic strategy for adaptivity to unknown noise smoothness and margin; our strategy achieves optimal rates in many general situations; furthermore, unlike in previous work, we avoid the need for adaptive confidence sets, resulting in strictly milder distributional requirements. |
Tasks | Active Learning |
Published | 2017-03-16 |
URL | http://arxiv.org/abs/1703.05841v1 |
http://arxiv.org/pdf/1703.05841v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptivity-to-noise-parameters-in |
Repo | |
Framework | |
Bayesian stochastic blockmodeling
Title | Bayesian stochastic blockmodeling |
Authors | Tiago P. Peixoto |
Abstract | This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks. |
Tasks | Bayesian Inference, Model Selection |
Published | 2017-05-29 |
URL | https://arxiv.org/abs/1705.10225v8 |
https://arxiv.org/pdf/1705.10225v8.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-stochastic-blockmodeling |
Repo | |
Framework | |
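The chapter is about inferring block structure, but it can help to see the generative model that inference inverts. The sketch below samples an undirected graph from the simplest Bernoulli form of the stochastic blockmodel, where the probability of an edge depends only on the blocks of its two endpoints. The function name, the adjacency-matrix representation, and the choice of the Bernoulli variant (rather than the degree-corrected or overlapping generalizations the abstract mentions) are assumptions of this example.

```python
import random


def sample_sbm(block_of, edge_prob, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with no self-loops.

    block_of[i] is the block label of node i; edge_prob[r][s] is the
    probability of an edge between a node in block r and one in block s.
    """
    rng = random.Random(seed)
    n = len(block_of)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob[block_of[i]][block_of[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Two assortative blocks of three nodes each (toy parameters).
blocks = [0, 0, 0, 1, 1, 1]
graph = sample_sbm(blocks, [[0.9, 0.05], [0.05, 0.9]])
for row in graph:
    print(row)
```

Inference runs in the opposite direction: given only `graph`, it asks which assignment `blocks` (and how many blocks) best explains the observed edges.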
Image Forgery Localization Based on Multi-Scale Convolutional Neural Networks
Title | Image Forgery Localization Based on Multi-Scale Convolutional Neural Networks |
Authors | Yaqi Liu, Qingxiao Guan, Xianfeng Zhao, Yun Cao |
Abstract | In this paper, we propose to utilize Convolutional Neural Networks (CNNs) and the segmentation-based multi-scale analysis to locate tampered areas in digital images. First, to deal with color input sliding windows of different scales, a unified CNN architecture is designed. Then, we elaborately design the training procedures of CNNs on sampled training patches. With a set of robust multi-scale tampering detectors based on CNNs, complementary tampering possibility maps can be generated. Last but not least, a segmentation-based method is proposed to fuse the maps and generate the final decision map. By exploiting the benefits of both the small-scale and large-scale analyses, the segmentation-based multi-scale analysis can lead to a performance leap in forgery localization of CNNs. Numerous experiments are conducted to demonstrate the effectiveness and efficiency of our method. |
Tasks | |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.07842v4 |
http://arxiv.org/pdf/1706.07842v4.pdf | |
PWC | https://paperswithcode.com/paper/image-forgery-localization-based-on-multi |
Repo | |
Framework | |
Evaluating vector-space models of analogy
Title | Evaluating vector-space models of analogy |
Authors | Dawn Chen, Joshua C. Peterson, Thomas L. Griffiths |
Abstract | Vector-space representations provide geometric tools for reasoning about the similarity of a set of objects and their relationships. Recent machine learning methods for deriving vector-space embeddings of words (e.g., word2vec) have achieved considerable success in natural language processing. These vector spaces have also been shown to exhibit a surprising capacity to capture verbal analogies, with similar results for natural images, giving new life to a classic model of analogies as parallelograms that was first proposed by cognitive scientists. We evaluate the parallelogram model of analogy as applied to modern word embeddings, providing a detailed analysis of the extent to which this approach captures human relational similarity judgments in a large benchmark dataset. We find that some semantic relationships are better captured than others. We then provide evidence for deeper limitations of the parallelogram model based on the intrinsic geometric constraints of vector spaces, paralleling classic results for first-order similarity. |
Tasks | Word Embeddings |
Published | 2017-05-12 |
URL | http://arxiv.org/abs/1705.04416v2 |
http://arxiv.org/pdf/1705.04416v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-vector-space-models-of-analogy |
Repo | |
Framework | |
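The parallelogram model mentioned in the abstract can be stated in a few lines: to complete "a is to b as c is to ?", form the point c + (b - a) and return the word whose vector is closest to it. The sketch below does this with cosine similarity over a hand-made two-dimensional vocabulary. The toy vectors, the function names, and the common practice of excluding the three query words from the candidates are assumptions of this example, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def complete_analogy(a, b, c, embeddings):
    """Return the word d whose vector is most similar to c + (b - a)."""
    target = [
        ci + bi - ai
        for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Hypothetical 2-D embeddings chosen so the four words form a parallelogram.
toy = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}
answer = complete_analogy("man", "woman", "king", toy)
print(answer)
```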
Embedding Feature Selection for Large-scale Hierarchical Classification
Title | Embedding Feature Selection for Large-scale Hierarchical Classification |
Authors | Azad Naik, Huzefa Rangwala |
Abstract | Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select the subset of discriminant features, is an effective strategy to deal with the large-scale HC problem. It speeds up the training process, reduces the prediction time and minimizes the memory requirements by compressing the total size of learned model weight vectors. The majority of studies have also shown feature selection to be competent and successful in improving the classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distribution of features, classes and instances shows up to 3x order of speed-up on massive datasets and up to 45% less memory requirements for storing the weight vectors of the learned model without any significant loss (improvement for some datasets) in the classification accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection. |
Tasks | Dimensionality Reduction, Feature Selection |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01581v1 |
http://arxiv.org/pdf/1706.01581v1.pdf | |
PWC | https://paperswithcode.com/paper/embedding-feature-selection-for-large-scale |
Repo | |
Framework | |
Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap
Title | Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap |
Authors | Aryan Mokhtari, Hamed Hassani, Amin Karbasi |
Abstract | In this paper, we study the problem of constrained and stochastic continuous submodular maximization. Even though the objective function is not concave (nor convex) and is defined in terms of an expectation, we develop a variant of the conditional gradient method, called \alg, which achieves a tight approximation guarantee. More precisely, for a monotone and continuous DR-submodular function and subject to a general convex body constraint, we prove that \alg achieves a $[(1-1/e)\text{OPT} - \epsilon]$ guarantee (in expectation) with $\mathcal{O}(1/\epsilon^3)$ stochastic gradient computations. This guarantee matches the known hardness results and closes the gap between deterministic and stochastic continuous submodular maximization. By using stochastic continuous optimization as an interface, we also provide the first $(1-1/e)$ tight approximation guarantee for maximizing a monotone but stochastic submodular set function subject to a general matroid constraint. |
Tasks | |
Published | 2017-11-05 |
URL | http://arxiv.org/abs/1711.01660v1 |
http://arxiv.org/pdf/1711.01660v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-gradient-method-for-stochastic |
Repo | |
Framework | |
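To give a feel for what a conditional gradient method with stochastic gradients looks like, here is a small sketch, not the paper's algorithm verbatim. It starts at the origin, keeps a running average of noisy gradients, and at each step moves a fraction 1/T toward the feasible point that best aligns with that average. The toy objective (a separable sum of weighted logarithms, which is monotone and DR-submodular), the constraint set (a box with a budget on the sum), the noise level, and the averaging schedule `rho` are all assumptions chosen for this example.

```python
import random


def noisy_gradient(x, weights, rng, noise=0.1):
    """Gradient of sum_i w_i * log(1 + x_i), plus Gaussian noise."""
    return [w / (1.0 + xi) + rng.gauss(0.0, noise) for w, xi in zip(weights, x)]


def linear_maximizer(direction, budget):
    """argmax of <v, direction> over {v in [0, 1]^n, sum(v) <= budget}."""
    order = sorted(range(len(direction)), key=lambda i: direction[i], reverse=True)
    v = [0.0] * len(direction)
    for i in order[:budget]:
        if direction[i] > 0:
            v[i] = 1.0
    return v


def stochastic_conditional_gradient(weights, budget, steps=200, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    x = [0.0] * n  # start at the origin, which is feasible
    d = [0.0] * n  # running average of stochastic gradients
    for t in range(1, steps + 1):
        rho = 4.0 / (t + 8) ** (2.0 / 3.0)  # assumed decaying averaging weight
        g = noisy_gradient(x, weights, rng)
        d = [(1.0 - rho) * di + rho * gi for di, gi in zip(d, g)]
        v = linear_maximizer(d, budget)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x


solution = stochastic_conditional_gradient([3.0, 2.0, 0.1, 0.1], budget=2)
print([round(xi, 3) for xi in solution])
```

Because the final point is an average of `steps` feasible points, it stays inside the convex constraint set without any projection, which is the main practical appeal of conditional gradient methods.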
An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation
Title | An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation |
Authors | Conrad Sanderson, Ryan Curtin |
Abstract | Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can achieve higher modelling accuracy than a previously well-established publicly accessible implementation. The multi-threaded implementation is included as a user-friendly class in recent releases of the open source Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products. |
Tasks | |
Published | 2017-07-28 |
URL | http://arxiv.org/abs/1707.09094v1 |
http://arxiv.org/pdf/1707.09094v1.pdf | |
PWC | https://paperswithcode.com/paper/an-open-source-c-implementation-of-multi |
Repo | |
Framework | |
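The MapReduce-like reformulation described in the abstract can be sketched for a single k-means update: each worker computes per-cluster sums and counts over its own chunk of the data (map), and the partial results are then combined into new centroids (reduce). The sketch below is written in Python for brevity and does not use or imitate the Armadillo API; the function names, the one-dimensional data, and the chunking scheme are assumptions of this example. Python threads will not deliver the speedup reported for the C++ implementation, so the code only shows the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    """Map step: per-cluster sum and count for one chunk of 1-D points."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in chunk:
        nearest = min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
        sums[nearest] += x
        counts[nearest] += 1
    return sums, counts


def kmeans_step(data, centroids, workers=4):
    """One k-means update with chunked map and a final reduce."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: partial_sums(c, centroids), chunks))
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(sums[k] for sums, _ in results)
        count = sum(counts[k] for _, counts in results)
        # Keep the old centroid if no point was assigned to this cluster.
        new_centroids.append(total / count if count else old)
    return new_centroids


points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
updated = kmeans_step(points, [0.0, 10.0])
print(updated)
```

The same pattern extends to EM for GMMs, where each worker would accumulate responsibility-weighted sufficient statistics instead of hard-assignment sums.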