Paper Group ANR 221
Network Modeling of Short Over-Dispersed Spike-Counts: A Hierarchical Parametric Empirical Bayes Framework
Title | Network Modeling of Short Over-Dispersed Spike-Counts: A Hierarchical Parametric Empirical Bayes Framework |
Authors | Qi She, Beth Jelfs, Adam S. Charles, Rosa H. M. Chan |
Abstract | Accurate statistical models of neural spike responses can characterize the information carried by neural populations. Yet, challenges in recording at the level of individual neurons commonly result in relatively limited samples of spike counts, which can lead to model overfitting. Moreover, current models assume spike counts to be Poisson-distributed, which ignores the fact that many neurons demonstrate over-dispersed spiking behavior. The Negative Binomial Generalized Linear Model (NB-GLM) provides a powerful tool for modeling over-dispersed spike counts. However, standard maximum-likelihood estimation of the NB-GLM leads to unstable and inaccurate parameter estimates. Thus, we propose a hierarchical parametric empirical Bayes method for estimating the parameters of the NB-GLM. Our method integrates Generalized Linear Models (GLMs) and empirical Bayes theory to: (1) effectively capture the over-dispersed nature of spike counts from retinal ganglion neural responses; (2) significantly reduce the mean square error of parameter estimates compared to the maximum-likelihood-based method for NB-GLMs; (3) provide an efficient alternative to fully Bayesian inference with low computational cost for hierarchical models; and (4) give insightful findings on both neural interactions and spiking behaviors of real retina cells. We apply our approach to study both simulated data and experimental neural data from the retina. The simulation results indicate the new framework can efficiently and accurately retrieve the weights of functional connections among neural populations and predict mean spike counts. The results from the retinal datasets demonstrate the proposed method outperforms both standard Poisson and Negative Binomial GLMs in terms of the predictive log-likelihood of held-out data. |
Tasks | Bayesian Inference |
Published | 2016-05-10 |
URL | http://arxiv.org/abs/1605.02869v3 |
http://arxiv.org/pdf/1605.02869v3.pdf | |
PWC | https://paperswithcode.com/paper/network-modeling-of-short-over-dispersed |
Repo | |
Framework | |
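To make the baseline concrete: below is a minimal sketch, on simulated data, of the standard maximum-likelihood NB-GLM fit that the paper improves upon, using statsmodels. The hierarchical parametric empirical Bayes estimator itself is not reproduced; the sizes, weights, and dispersion value are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated design: past spike counts of 3 "presynaptic" neurons, plus intercept.
n_bins, n_inputs = 500, 3
X = sm.add_constant(rng.poisson(2.0, size=(n_bins, n_inputs)).astype(float))
true_w = np.array([0.1, 0.2, 0.3, -0.2])   # intercept first, then connection weights

# Over-dispersed counts: negative binomial with log link, mean mu = exp(X w),
# variance mu + alpha * mu^2 (strictly larger than the Poisson variance).
alpha = 0.5
mu = np.exp(X @ true_w)
y = rng.negative_binomial(n=1.0 / alpha, p=1.0 / (1.0 + alpha * mu))

# Standard maximum-likelihood NB-GLM: the estimator whose instability the
# paper's hierarchical empirical Bayes method is designed to fix.
fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha)).fit()
print(fit.params)   # estimated intercept and connection weights
```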
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
Title | Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation |
Authors | Hoo-Chang Shin, Kirk Roberts, Le Lu, Dina Demner-Fushman, Jianhua Yao, Ronald M Summers |
Abstract | Despite the recent advances in automatically describing image contents, their applications have been mostly limited to image caption datasets containing natural images (e.g., Flickr 30k, MSCOCO). In this paper, we present a deep learning model to efficiently detect a disease from an image and annotate its contexts (e.g., location, severity and the affected organs). We employ a publicly available radiology dataset of chest x-rays and their reports, and use its image annotations to mine disease names to train convolutional neural networks (CNNs). In doing so, we adopt various regularization techniques to circumvent the large normal-vs-diseased cases bias. Recurrent neural networks (RNNs) are then trained to describe the contexts of a detected disease, based on the deep CNN features. Moreover, we introduce a novel approach to use the weights of the already trained pair of CNN/RNN on the domain-specific image/text dataset, to infer the joint image/text contexts for composite image labeling. Significantly improved image annotation results are demonstrated using the recurrent neural cascade model by taking the joint image/text contexts into account. |
Tasks | |
Published | 2016-03-28 |
URL | http://arxiv.org/abs/1603.08486v1 |
http://arxiv.org/pdf/1603.08486v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-read-chest-x-rays-recurrent |
Repo | |
Framework | |
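The cascade idea (a CNN embedding of the image seeding the recurrent state of an RNN that emits annotation tokens such as location and severity) can be sketched compactly in PyTorch. The model below is an illustrative toy with made-up layer sizes and vocabulary, not the authors' architecture; it shows only the wiring.

```python
import torch
import torch.nn as nn

class CnnRnnCascade(nn.Module):
    """Sketch of a CNN->RNN annotation cascade: the CNN embedding of the
    x-ray initializes the LSTM state, which then emits context tokens
    (e.g. location, severity). All sizes are illustrative, not the paper's."""
    def __init__(self, vocab_size, feat_dim=256, hid_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # toy stand-in for the deep CNN
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.rnn = nn.LSTM(feat_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, image, tokens):
        feat = self.cnn(image)                          # (B, feat_dim)
        h0 = feat.unsqueeze(0)                          # init hidden state from image
        c0 = torch.zeros_like(h0)
        seq, _ = self.rnn(self.embed(tokens), (h0, c0))
        return self.out(seq)                            # per-step vocabulary logits

model = CnnRnnCascade(vocab_size=100)
logits = model(torch.randn(2, 1, 64, 64), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```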
Investigating echo state networks dynamics by means of recurrence analysis
Title | Investigating echo state networks dynamics by means of recurrence analysis |
Authors | Filippo Maria Bianchi, Lorenzo Livi, Cesare Alippi |
Abstract | In this paper, we elaborate on the well-known interpretability issue in echo state networks (ESNs). The idea is to investigate the dynamics of reservoir neurons with time-series analysis techniques taken from research on complex systems. Notably, we analyze time-series of neuron activations with Recurrence Plots (RPs) and Recurrence Quantification Analysis (RQA), which make it possible to visualize and characterize high-dimensional dynamical systems. We show that this approach is useful in a number of ways. First, the two-dimensional representation offered by RPs provides a way of visualizing the high-dimensional dynamics of a reservoir. Our results suggest that, if the network is stable, the reservoir and the input exhibit similar line patterns in their respective RPs. Conversely, the more unstable the ESN, the more instability patterns appear in the RP of the reservoir. As a second result, we show that the $\mathrm{L_{max}}$ measure is highly correlated with the well-established maximal local Lyapunov exponent. This suggests that complexity measures based on the distribution of RP diagonal lines provide a valuable tool for quantifying the degree of network stability. Finally, our analysis shows that all RQA measures fluctuate in the proximity of the so-called edge of stability, where an ESN typically achieves maximum computational capability. We verify that the determination of the edge of stability provided by such RQA measures is more accurate than two well-known criteria based on the Jacobian matrix of the reservoir. Therefore, we claim that RPs and RQA-based analyses can be used as valuable tools to design an effective network for a given problem. |
Tasks | Time Series, Time Series Analysis |
Published | 2016-01-26 |
URL | http://arxiv.org/abs/1601.07381v2 |
http://arxiv.org/pdf/1601.07381v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-echo-state-networks-dynamics-by |
Repo | |
Framework | |
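A recurrence plot of a reservoir trajectory takes only a few lines. The sketch below drives a small ESN with a sine input, collects its states, and thresholds pairwise distances; the reservoir size, spectral radius, and 10th-percentile threshold are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

# A minimal ESN driven by a sine input: x_{t+1} = tanh(W x_t + w_in u_t).
n_res, T, rho = 100, 400, 0.9
W = rng.normal(size=(n_res, n_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius to rho
w_in = rng.uniform(-0.5, 0.5, size=n_res)
u = np.sin(np.linspace(0, 20 * np.pi, T))

x = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

# Recurrence plot of the reservoir trajectory: R[i, j] = 1 iff the states at
# times i and j are closer than a threshold (10th percentile of all pairwise
# distances here, a common heuristic rather than the paper's exact choice).
D = squareform(pdist(states))
R = (D < np.percentile(D, 10)).astype(int)
print(R.shape, R.mean())   # R.mean() is the recurrence rate, a basic RQA measure
```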
Dynamic Key-Value Memory Networks for Knowledge Tracing
Title | Dynamic Key-Value Memory Networks for Knowledge Tracing |
Authors | Jiani Zhang, Xingjian Shi, Irwin King, Dit-Yan Yeung |
Abstract | Knowledge Tracing (KT) is the task of tracing the evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of learning activities. One important purpose of KT is to personalize the practice sequence to help students learn knowledge concepts efficiently. However, existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing either model the knowledge state for each predefined concept separately or fail to pinpoint exactly which concepts a student is good at or unfamiliar with. To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student’s mastery level of each concept. Unlike standard memory-augmented neural networks that use a single memory matrix or two static memory matrices, our model has one static matrix, called the key, which stores the knowledge concepts, and one dynamic matrix, called the value, which stores and updates the mastery levels of the corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model on a range of KT datasets. Moreover, the DKVMN model can automatically discover the underlying concepts of exercises, a task typically performed by human annotators, and depict the changing knowledge state of a student. |
Tasks | Knowledge Tracing |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08108v2 |
http://arxiv.org/pdf/1611.08108v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-key-value-memory-networks-for |
Repo | |
Framework | |
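The read/write mechanics described in the abstract (attention over a static key matrix, a weighted read of the value matrix, and an erase-then-add write) can be traced in a few lines of NumPy. The projections below are random and untrained, purely to show the data flow; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: N latent concepts, key/value dimensions.
N, d_k, d_v = 5, 8, 10
M_key = rng.normal(size=(N, d_k))        # static: one slot per latent concept
M_val = rng.normal(size=(N, d_v))        # dynamic: per-concept mastery state

# Hypothetical stand-ins for the learned erase/add projections (random here).
W_erase = rng.normal(size=(d_v, d_v))
W_add = rng.normal(size=(d_v, d_v))

k_t = rng.normal(size=d_k)               # embedding of the attempted exercise
v_t = rng.normal(size=d_v)               # embedding of the (exercise, answer) pair

# Read: attention over keys gives the correlation weight; a weighted read of
# the value matrix summarizes the student's mastery for this exercise.
w = softmax(M_key @ k_t)
r_t = w @ M_val                          # read content, fed to the prediction net

# Write: erase-then-add update of the value matrix, gated by the same weights.
e = 1.0 / (1.0 + np.exp(-W_erase @ v_t)) # erase vector in (0, 1)
a = np.tanh(W_add @ v_t)                 # add vector
M_val = M_val * (1 - np.outer(w, e)) + np.outer(w, a)
print(w.round(3), r_t.shape)
```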
How deep is knowledge tracing?
Title | How deep is knowledge tracing? |
Authors | Mohammad Khajah, Robert V. Lindsey, Michael C. Mozer |
Abstract | In theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret. The former typically provide more insight into cognition but the latter often perform better. This tension has recently surfaced in the realm of educational data mining, where a deep learning approach to predicting students’ performance as they work through a series of exercises—termed deep knowledge tracing or DKT—has demonstrated a stunning performance advantage over the mainstay of the field, Bayesian knowledge tracing or BKT. In this article, we attempt to understand the basis for DKT’s advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot. We hypothesize four forms of regularity that BKT fails to exploit: recency effects, the contextualized trial sequence, inter-skill similarity, and individual variation in ability. We demonstrate that when BKT is extended to allow it more flexibility in modeling statistical regularities—using extensions previously proposed in the literature—BKT achieves a level of performance indistinguishable from that of DKT. We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations—the fundamental advantage of deep learning. To answer the question posed in our title, knowledge tracing may be a domain that does not require ‘depth’; shallow models like BKT can perform just as well and offer us greater interpretability and explanatory power. |
Tasks | Knowledge Tracing |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1604.02416v2 |
http://arxiv.org/pdf/1604.02416v2.pdf | |
PWC | https://paperswithcode.com/paper/how-deep-is-knowledge-tracing |
Repo | |
Framework | |
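For reference, the standard BKT update that the article extends is a single application of Bayes' rule followed by a learning transition. The sketch below uses illustrative parameter values and omits the article's extensions (recency effects, individual ability, and so on).

```python
def bkt_update(p_learn, correct, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """One step of standard Bayesian Knowledge Tracing: condition the mastery
    probability on the observed answer, then apply the learning transition.
    Parameter values are illustrative, not fitted."""
    if correct:
        evidence = p_learn * (1 - p_slip) + (1 - p_learn) * p_guess
        posterior = p_learn * (1 - p_slip) / evidence
    else:
        evidence = p_learn * p_slip + (1 - p_learn) * (1 - p_guess)
        posterior = p_learn * p_slip / evidence
    return posterior + (1 - posterior) * p_transit

# Trace a student's inferred mastery over a short answer sequence (1 = correct).
p = 0.3  # prior p(L0)
for obs in [1, 1, 0, 1, 1]:
    p = bkt_update(p, obs)
    print(round(p, 3))
```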
Operator-Valued Bochner Theorem, Fourier Feature Maps for Operator-Valued Kernels, and Vector-Valued Learning
Title | Operator-Valued Bochner Theorem, Fourier Feature Maps for Operator-Valued Kernels, and Vector-Valued Learning |
Authors | Ha Quang Minh |
Abstract | This paper presents a framework for computing random operator-valued feature maps for operator-valued positive definite kernels. This is a generalization of the random Fourier features for scalar-valued kernels to the operator-valued case. Our general setting is that of operator-valued kernels corresponding to RKHS of functions with values in a Hilbert space. We show that in general, for a given kernel, there are potentially infinitely many random feature maps, which can be bounded or unbounded. Most importantly, given a kernel, we present a general, closed form formula for computing a corresponding probability measure, which is required for the construction of the Fourier features, and which, unlike the scalar case, is not uniquely and automatically determined by the kernel. We also show that, under appropriate conditions, random bounded feature maps can always be computed. Furthermore, we show the uniform convergence, under the Hilbert-Schmidt norm, of the resulting approximate kernel to the exact kernel on any compact subset of Euclidean space. Our convergence requires differentiable kernels, an improvement over the twice-differentiability requirement in previous work in the scalar setting. We then show how operator-valued feature maps and their approximations can be employed in a general vector-valued learning framework. The mathematical formulation is illustrated by numerical examples on matrix-valued kernels. |
Tasks | |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05639v1 |
http://arxiv.org/pdf/1608.05639v1.pdf | |
PWC | https://paperswithcode.com/paper/operator-valued-bochner-theorem-fourier |
Repo | |
Framework | |
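The scalar-valued special case that the paper generalizes is easy to state in code: by Bochner's theorem the Gaussian kernel has a Gaussian spectral measure, so random cosine features approximate it. The sketch below covers only this scalar case; the operator-valued construction, with its non-unique spectral measure, is the paper's contribution and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random Fourier features for the scalar Gaussian kernel k(x, y) =
# exp(-gamma ||x - y||^2): sample frequencies from the Gaussian spectral
# measure N(0, 2 * gamma * I); then E[z(x) . z(y)] = k(x, y).
n, d, D, gamma = 200, 5, 2000, 0.5
X = rng.normal(size=(n, d))

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

K_approx = Z @ Z.T
K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
print(np.abs(K_approx - K_exact).max())   # shrinks as D grows
```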
It’s Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos
Title | It’s Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos |
Authors | Pia Bideau, Erik Learned-Miller |
Abstract | The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer, and even camouflage. In addition to all of this, the ability to detect motion is nearly instantaneous. While there has been much recent progress in motion segmentation, it still appears we are far from human capabilities. In this work, we derive from first principles a new likelihood function for assessing the probability of an optical flow vector given the 3D motion direction of an object. This likelihood uses a novel combination of the angle and magnitude of the optical flow to maximize the information about the true motions of objects. Using this new likelihood and several innovations in initialization, we develop a motion segmentation algorithm that beats current state-of-the-art methods by a large margin. We compare to five state-of-the-art methods on two established benchmarks, and a third new data set of camouflaged animals, which we introduce to push motion segmentation to the next level. |
Tasks | Motion Segmentation, Optical Flow Estimation |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00136v1 |
http://arxiv.org/pdf/1604.00136v1.pdf | |
PWC | https://paperswithcode.com/paper/its-moving-a-probabilistic-model-for-causal |
Repo | |
Framework | |
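The paper's exact likelihood is not reproduced here, but its key intuition (the direction of a flow vector is informative about the 3D motion, and longer vectors pin down their angle more reliably) can be illustrated with a hedged stand-in: a von Mises density on the flow angle whose concentration grows with flow magnitude. Everything below, including the function name and the concentration scaling, is an assumption for illustration, not the authors' derivation.

```python
import numpy as np
from scipy.stats import vonmises

def flow_angle_loglik(flow, predicted_angle, kappa_per_pixel=2.0):
    """Illustrative stand-in for an angle/magnitude flow likelihood: score
    each flow vector by how well its direction matches the direction
    predicted by a candidate 3D motion, with angular confidence growing
    with flow magnitude."""
    u, v = flow[..., 0], flow[..., 1]
    angle = np.arctan2(v, u)
    mag = np.hypot(u, v)
    kappa = kappa_per_pixel * mag + 1e-6   # confidence scales with magnitude
    return vonmises.logpdf(angle, kappa, loc=predicted_angle)

# Toy field: flow mostly pointing right (angle 0), scored under two hypotheses.
rng = np.random.default_rng(4)
flow = np.stack([1.0 + 0.1 * rng.normal(size=(8, 8)),
                 0.1 * rng.normal(size=(8, 8))], axis=-1)
print(flow_angle_loglik(flow, 0.0).sum())        # high: consistent motion
print(flow_angle_loglik(flow, np.pi / 2).sum())  # low: wrong direction
```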
Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach
Title | Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach |
Authors | Naresh R. Shanbhag |
Abstract | This position paper advocates a communications-inspired approach to the design of machine learning systems on energy-constrained embedded ‘always-on’ platforms. The communications-inspired approach has two versions: (1) a deterministic version where existing low-power communication IC design methods are repurposed, and (2) a stochastic version referred to as Shannon-inspired statistical information processing, employing information-based metrics, statistical error compensation (SEC), and retraining-based methods to implement ML systems on stochastic circuit/device fabrics operating at the limits of energy-efficiency. The communications-inspired approach has the potential to fully leverage the opportunities afforded by ML algorithms and applications in order to address the challenges inherent in their deployment on energy-constrained platforms. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1611.03109v1 |
http://arxiv.org/pdf/1611.03109v1.pdf | |
PWC | https://paperswithcode.com/paper/energy-efficient-machine-learning-in-silicon |
Repo | |
Framework | |
Probabilistic Dimensionality Reduction via Structure Learning
Title | Probabilistic Dimensionality Reduction via Structure Learning |
Authors | Li Wang |
Abstract | We propose a novel probabilistic dimensionality reduction framework that can naturally integrate the generative model and the locality information of data. Based on this framework, we present a new model, which is able to learn a smooth skeleton of embedding points in a low-dimensional space from high-dimensional noisy data. The formulation of the new model can be equivalently interpreted as two coupled learning problems, i.e., structure learning and the learning of the projection matrix. This interpretation motivates the learning of embedding points that can directly form an explicit graph structure. We develop a new method to learn embedding points that form a spanning tree, which is further extended to obtain a discriminative and compact feature representation for clustering problems. Unlike traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. This can greatly facilitate data visualization and scientific discovery in downstream analysis. Extensive experiments demonstrate that the proposed framework is able to obtain discriminative feature representations and to correctly recover the intrinsic structures of various real-world datasets. |
Tasks | Dimensionality Reduction |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04929v1 |
http://arxiv.org/pdf/1610.04929v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-dimensionality-reduction-via |
Repo | |
Framework | |
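The explicit graph structure mentioned above can be illustrated by its simplest ingredient: a spanning tree over embedding points. The sketch below runs only that tree step on fixed 2-D points using SciPy; the paper learns the embedding and the structure jointly, which is not attempted here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(5)

# Toy noisy data along a 1-D curve embedded in 2-D.
t = np.linspace(0, 4, 120)
X = np.stack([t, np.sin(t)], axis=1) + 0.05 * rng.normal(size=(120, 2))

# The structure-learning half of the coupled problem, in miniature: connect
# the points by a minimum spanning tree so the recovered skeleton is an
# explicit graph over the embedding.
D = squareform(pdist(X))
T = minimum_spanning_tree(D)              # sparse matrix encoding the skeleton
edges = np.array(T.nonzero()).T
print(len(edges), "tree edges over", len(X), "points")
```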
Sharing Network Parameters for Crosslingual Named Entity Recognition
Title | Sharing Network Parameters for Crosslingual Named Entity Recognition |
Authors | Rudra Murthy V, Mitesh Khapra, Pushpak Bhattacharyya |
Abstract | Most state-of-the-art approaches to Named Entity Recognition rely on hand-crafted features and annotated corpora. Recently, neural-network-based models have been proposed which do not require hand-crafted features but still require annotated corpora. However, such annotated corpora may not be available for many languages. In this paper, we propose a neural-network-based model which allows sharing the decoder as well as word- and character-level parameters between two languages, thereby allowing a resource-fortunate language to aid a resource-deprived language. Specifically, we focus on the case when limited annotated corpora are available in one language ($L_1$) and abundant annotated corpora are available in another language ($L_2$). Sharing the network architecture and parameters between $L_1$ and $L_2$ leads to improved performance in $L_1$. Further, our approach does not require any hand-crafted features but instead directly learns meaningful feature representations from the training data itself. We experiment with 4 language pairs and show that, in a resource-constrained setup (less annotated data), a model jointly trained with data from another language indeed performs better than a model trained only on the limited corpora in one language. |
Tasks | Named Entity Recognition |
Published | 2016-07-01 |
URL | http://arxiv.org/abs/1607.00198v1 |
http://arxiv.org/pdf/1607.00198v1.pdf | |
PWC | https://paperswithcode.com/paper/sharing-network-parameters-for-crosslingual |
Repo | |
Framework | |
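A minimal sketch of the sharing scheme, assuming PyTorch: each language keeps its own word embeddings while the encoder and decoder parameters are shared, so abundant $L_2$ data shapes the weights that $L_1$ also uses. Character-level sharing is omitted and all sizes are made up; this shows the wiring, not the authors' exact model.

```python
import torch
import torch.nn as nn

class SharedNerTagger(nn.Module):
    """Cross-lingual parameter sharing: per-language word embeddings, one
    shared BiLSTM encoder, one shared tag decoder."""
    def __init__(self, vocab_sizes, n_tags, emb=64, hid=64):
        super().__init__()
        self.embeddings = nn.ModuleDict(
            {lang: nn.Embedding(v, emb) for lang, v in vocab_sizes.items()})
        self.encoder = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.decoder = nn.Linear(2 * hid, n_tags)   # shared across languages

    def forward(self, tokens, lang):
        h, _ = self.encoder(self.embeddings[lang](tokens))
        return self.decoder(h)                      # per-token tag logits

model = SharedNerTagger({"L1": 5000, "L2": 20000}, n_tags=9)
# Joint training would interleave batches from both languages:
logits_l1 = model(torch.randint(0, 5000, (4, 12)), "L1")
logits_l2 = model(torch.randint(0, 20000, (4, 12)), "L2")
print(logits_l1.shape, logits_l2.shape)
```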
Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification
Title | Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification |
Authors | Giuseppe Lisanti, Svebor Karaman, Iacopo Masi |
Abstract | In this paper we introduce a method to overcome one of the main challenges of person re-identification in multi-camera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance in different camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kernels is exploited to learn several projection spaces in which the appearance correlation between samples of the same person observed from different cameras is maximized. An iterative logistic regression is finally used to select and weight the contribution of each feature projection and perform the matching between the two views. Experimental evaluation shows that the proposed solution obtains comparable performance on the VIPeR and PRID 450s datasets and improves on the state of the art on the PRID and CUHK01 datasets. |
Tasks | Cross-Modal Person Re-Identification, Person Re-Identification |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02204v2 |
http://arxiv.org/pdf/1607.02204v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-channel-kernel-canonical-correlation |
Repo | |
Framework | |
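A rough sketch of the core projection step, with two stand-ins: exact KCCA is replaced by linear CCA on explicit Nystroem kernel features, and the per-feature repetition plus iterative logistic regression fusion is omitted. The data, kernel parameters, and sizes are illustrative.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)

# Two toy "camera views" of the same 60 identities: shared latent plus noise.
latent = rng.normal(size=(60, 5))
view_a = latent @ rng.normal(size=(5, 40)) + 0.3 * rng.normal(size=(60, 40))
view_b = latent @ rng.normal(size=(5, 40)) + 0.3 * rng.normal(size=(60, 40))

# Approximate KCCA: explicit kernel feature maps per view, then linear CCA
# in that feature space maximizes cross-view correlation.
feat_a = Nystroem(kernel="rbf", gamma=0.002, n_components=30, random_state=0)
feat_b = Nystroem(kernel="rbf", gamma=0.002, n_components=30, random_state=0)
Pa, Pb = CCA(n_components=4).fit_transform(feat_a.fit_transform(view_a),
                                           feat_b.fit_transform(view_b))

# Match by cosine similarity in the correlated space; the diagonal should win.
Pa /= np.linalg.norm(Pa, axis=1, keepdims=True)
Pb /= np.linalg.norm(Pb, axis=1, keepdims=True)
sim = Pa @ Pb.T
print("rank-1 match rate:", (sim.argmax(axis=1) == np.arange(60)).mean())
```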
A Randomized Approach to Efficient Kernel Clustering
Title | A Randomized Approach to Efficient Kernel Clustering |
Authors | Farhad Pourkamali-Anaraki, Stephen Becker |
Abstract | Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nyström-based approximations. |
Tasks | |
Published | 2016-08-26 |
URL | http://arxiv.org/abs/1608.07597v3 |
http://arxiv.org/pdf/1608.07597v3.pdf | |
PWC | https://paperswithcode.com/paper/a-randomized-approach-to-efficient-kernel |
Repo | |
Framework | |
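The pipeline is simple to sketch: map the data through a low-rank randomized kernel approximation (O(nm) memory for m << n, instead of the O(n^2) kernel matrix), then run ordinary K-means on the transformed points. scikit-learn's Nystroem transformer is used below as a stand-in for the paper's specific one-pass randomized approximation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

# Approximate kernel K-means on a toy non-linearly-separable dataset.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Low-rank kernel feature map: n x 50 instead of the n x n kernel matrix.
Z = Nystroem(kernel="rbf", gamma=20.0, n_components=50,
             random_state=0).fit_transform(X)

# Standard K-means on the transformed data completes the pipeline.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))
```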
A fuzzy expert system for earthquake prediction, case study: the Zagros range
Title | A fuzzy expert system for earthquake prediction, case study: the Zagros range |
Authors | Arash Andalib, Mehdi Zare, Farid Atry |
Abstract | A methodology for the development of a fuzzy expert system (FES) with application to earthquake prediction is presented. The idea is to reproduce the performance of a human expert in earthquake prediction. To do this, in the first step, rules provided by the human expert are used to generate a fuzzy rule base. These rules are then fed into an inference engine to produce a fuzzy inference system (FIS) and to infer the results. In this paper, we have used a Sugeno-type fuzzy inference system to build the FES. In the next step, the adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES parameters and improve its performance. The proposed framework is then employed to attain the performance of a human expert in predicting earthquakes in the Zagros area based on the idea of coupled earthquakes. While the prediction results are promising on parts of the testing set, the general performance indicates that a prediction methodology based on coupled earthquakes needs more investigation and a more complicated reasoning procedure to yield satisfactory predictions. |
Tasks | |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04028v2 |
http://arxiv.org/pdf/1610.04028v2.pdf | |
PWC | https://paperswithcode.com/paper/a-fuzzy-expert-system-for-earthquake |
Repo | |
Framework | |
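A first-order Sugeno inference step is compact enough to show in full: Gaussian memberships, product firing strengths, and a firing-strength-weighted average of linear consequents. The two rules below, their inputs, and all coefficients are purely hypothetical illustrations, not the system's calibrated rule base.

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def sugeno(x1, x2, rules):
    """First-order Sugeno inference: the product of a rule's antecedent
    memberships gives its firing strength; the crisp output is the
    firing-strength-weighted average of the rules' linear consequents."""
    w = np.array([gauss(x1, *r["mf1"]) * gauss(x2, *r["mf2"]) for r in rules])
    z = np.array([r["a"] * x1 + r["b"] * x2 + r["c"] for r in rules])
    return float(w @ z / w.sum())

# Two hypothetical expert rules over made-up, normalized precursors:
rules = [
    # IF rate is LOW  AND time-since-last-event is LONG  THEN risk = 0.1*x1 + 0.9*x2
    {"mf1": (0.2, 0.2), "mf2": (0.8, 0.2), "a": 0.1, "b": 0.9, "c": 0.0},
    # IF rate is HIGH AND time-since-last-event is SHORT THEN risk = 0.7*x1 + 0.1*x2 + 0.1
    {"mf1": (0.8, 0.2), "mf2": (0.2, 0.2), "a": 0.7, "b": 0.1, "c": 0.1},
]
print(sugeno(0.75, 0.3, rules))   # crisp risk score; ANFIS would tune mf/a/b/c
```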
Optimal Number of Choices in Rating Contexts
Title | Optimal Number of Choices in Rating Contexts |
Authors | Sam Ganzfried, Farzana Yusuf |
Abstract | In many settings people must give numerical scores to entities from a small discrete set. For instance, rating physical attractiveness from 1–5 on dating sites, or papers from 1–10 for conference reviewing. We study the problem of understanding when using a different number of options is optimal. We consider the cases in which scores are uniformly random or Gaussian. We study computationally when using 2, 3, 4, 5, and 10 options out of a total of 100 is optimal in these models (though our theoretical analysis is for a more general setting with $k$ choices from $n$ total options as well as a continuous underlying space). One may expect that using more options would always improve performance in this model, but we show that this is not necessarily the case, and that using fewer choices—even just two—can surprisingly be optimal in certain situations. While in theory for this setting it would be optimal to use all 100 options, in practice this is prohibitive, and it is preferable to utilize a smaller number of options due to humans’ limited computational resources. Our results could have many potential applications, as settings requiring entities to be ranked by humans are ubiquitous. There could also be applications to other fields such as signal or image processing where input values from a large set must be mapped to output values in a smaller set. |
Tasks | |
Published | 2016-05-21 |
URL | http://arxiv.org/abs/1605.06588v9 |
http://arxiv.org/pdf/1605.06588v9.pdf | |
PWC | https://paperswithcode.com/paper/optimal-number-of-choices-in-rating-contexts |
Repo | |
Framework | |
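The computational setup can be approximated in a few lines: draw true scores, quantize them to k equal-width options, and measure the error. Under this plain MSE objective more options always help; the paper's finding that fewer options can be optimal arises under its richer rater models, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

def quantization_loss(scores, k):
    """Mean squared error from compressing scores in [0, 1] down to k
    equal-width options, each reported as its bin midpoint."""
    edges = np.linspace(0, 1, k + 1)
    bins = np.clip(np.digitize(scores, edges) - 1, 0, k - 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return np.mean((scores - midpoints[bins]) ** 2)

# Uniform true scores; compare the option counts studied in the paper.
scores = rng.uniform(0, 1, size=100_000)
for k in [2, 3, 4, 5, 10]:
    print(k, round(quantization_loss(scores, k), 5))
```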
Spectral Inference Methods on Sparse Graphs: Theory and Applications
Title | Spectral Inference Methods on Sparse Graphs: Theory and Applications |
Authors | Alaa Saade |
Abstract | In an era of unprecedented deluge of (mostly unstructured) data, graphs are proving more and more useful, across the sciences, as a flexible abstraction to capture complex relationships between complex objects. One of the main challenges arising in the study of such networks is the inference of macroscopic, large-scale properties affecting a large number of objects, based solely on the microscopic interactions between their elementary constituents. Statistical physics, precisely created to recover the macroscopic laws of thermodynamics from an idealized model of interacting particles, provides significant insight to tackle such complex networks. In this dissertation, we use methods derived from the statistical physics of disordered systems to design and study new algorithms for inference on graphs. Our focus is on spectral methods, based on certain eigenvectors of carefully chosen matrices, and sparse graphs, containing only a small amount of information. We develop an original theory of spectral inference based on a relaxation of various mean-field free energy optimizations. Our approach is therefore fully probabilistic, and contrasts with more traditional motivations based on the optimization of a cost function. We illustrate the efficiency of our approach on various problems, including community detection, randomized similarity-based clustering, and matrix completion. |
Tasks | Community Detection, Matrix Completion |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04337v1 |
http://arxiv.org/pdf/1610.04337v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-inference-methods-on-sparse-graphs |
Repo | |
Framework | |
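One concrete method from this line of work is spectral clustering with the Bethe Hessian H(r) = (r^2 - 1) I - r A + D, whose negative eigenvalues count communities even on sparse graphs where the adjacency spectrum fails. A minimal sketch on a planted-partition graph, with illustrative graph parameters, follows.

```python
import numpy as np
import networkx as nx

# Bethe Hessian community detection: H(r) = (r^2 - 1) I - r A + D with
# r = sqrt(mean degree), a statistical-physics-derived spectral operator.
G = nx.planted_partition_graph(2, 250, p_in=0.04, p_out=0.005, seed=0)
A = nx.to_numpy_array(G)
deg = A.sum(axis=1)
r = np.sqrt(deg.mean())

H = (r ** 2 - 1) * np.eye(len(A)) - r * A + np.diag(deg)
vals, vecs = np.linalg.eigh(H)              # eigenvalues in ascending order
print("negative eigenvalues:", (vals < 0).sum())   # should match the 2 groups

labels = (vecs[:, 1] > 0).astype(int)       # sign of 2nd eigenvector splits groups
truth = np.array([0] * 250 + [1] * 250)
acc = max((labels == truth).mean(), ((1 - labels) == truth).mean())
print("accuracy:", acc)
```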