Paper Group ANR 221
Network Modeling of Short Over-Dispersed Spike-Counts: A Hierarchical Parametric Empirical Bayes Framework
Title | Network Modeling of Short Over-Dispersed Spike-Counts: A Hierarchical Parametric Empirical Bayes Framework |
Authors | Qi She, Beth Jelfs, Adam S. Charles, Rosa H. M. Chan |
Abstract | Accurate statistical models of neural spike responses can characterize the information carried by neural populations. Yet, challenges in recording at the level of individual neurons commonly result in relatively limited samples of spike counts, which can lead to model overfitting. Moreover, current models assume spike counts to be Poisson-distributed, which ignores the fact that many neurons demonstrate over-dispersed spiking behavior. The Negative Binomial Generalized Linear Model (NB-GLM) provides a powerful tool for modeling over-dispersed spike counts. However, standard maximum-likelihood estimation of the NB-GLM leads to unstable and inaccurate parameter estimates. Thus, we propose a hierarchical parametric empirical Bayes method for estimating the parameters of the NB-GLM. Our method integrates Generalized Linear Models (GLMs) and empirical Bayes theory to: (1) effectively capture the over-dispersed nature of spike counts from retinal ganglion neural responses; (2) significantly reduce the mean square error of parameter estimates compared to the maximum-likelihood-based method for NB-GLMs; (3) provide an efficient alternative to fully Bayesian inference with low computational cost for hierarchical models; and (4) give insightful findings on both neural interactions and spiking behaviors of real retina cells. We apply our approach to study both simulated data and experimental neural data from the retina. The simulation results indicate the new framework can efficiently and accurately retrieve the weights of functional connections among neural populations and predict mean spike counts. The results from the retinal datasets demonstrate the proposed method outperforms both standard Poisson and Negative Binomial GLMs in terms of the predictive log-likelihood of held-out data. |
Tasks | Bayesian Inference |
Published | 2016-05-10 |
URL | http://arxiv.org/abs/1605.02869v3 |
http://arxiv.org/pdf/1605.02869v3.pdf | |
PWC | https://paperswithcode.com/paper/network-modeling-of-short-over-dispersed |
Repo | |
Framework | |
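To make the baseline concrete: below is a minimal sketch, on simulated data, of the standard maximum-likelihood NB-GLM fit that the paper improves upon, using statsmodels. The hierarchical parametric empirical Bayes estimator itself is not reproduced; the sizes, weights, and dispersion value are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated design: past spike counts of 3 "presynaptic" neurons, plus intercept.
n_bins, n_inputs = 500, 3
X = sm.add_constant(rng.poisson(2.0, size=(n_bins, n_inputs)).astype(float))
true_w = np.array([0.1, 0.2, 0.3, -0.2])   # intercept first, then connection weights

# Over-dispersed counts: negative binomial with log link, mean mu = exp(X w),
# variance mu + alpha * mu^2 (strictly larger than the Poisson variance).
alpha = 0.5
mu = np.exp(X @ true_w)
y = rng.negative_binomial(n=1.0 / alpha, p=1.0 / (1.0 + alpha * mu))

# Standard maximum-likelihood NB-GLM: the estimator whose instability the
# paper's hierarchical empirical Bayes method is designed to fix.
fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha)).fit()
print(fit.params)   # estimated intercept and connection weights
```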
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
Title | Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation |
Authors | Hoo-Chang Shin, Kirk Roberts, Le Lu, Dina Demner-Fushman, Jianhua Yao, Ronald M Summers |
Abstract | Despite the recent advances in automatically describing image contents, their applications have been mostly limited to image caption datasets containing natural images (e.g., Flickr 30k, MSCOCO). In this paper, we present a deep learning model to efficiently detect a disease from an image and annotate its contexts (e.g., location, severity and the affected organs). We employ a publicly available radiology dataset of chest x-rays and their reports, and use its image annotations to mine disease names to train convolutional neural networks (CNNs). In doing so, we adopt various regularization techniques to circumvent the large normal-vs-diseased cases bias. Recurrent neural networks (RNNs) are then trained to describe the contexts of a detected disease, based on the deep CNN features. Moreover, we introduce a novel approach to use the weights of the already trained pair of CNN/RNN on the domain-specific image/text dataset, to infer the joint image/text contexts for composite image labeling. Significantly improved image annotation results are demonstrated using the recurrent neural cascade model by taking the joint image/text contexts into account. |
Tasks | |
Published | 2016-03-28 |
URL | http://arxiv.org/abs/1603.08486v1 |
http://arxiv.org/pdf/1603.08486v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-read-chest-x-rays-recurrent |
Repo | |
Framework | |
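The cascade idea (a CNN embedding of the image seeding the recurrent state of an RNN that emits annotation tokens such as location and severity) can be sketched compactly in PyTorch. The model below is an illustrative toy with made-up layer sizes and vocabulary, not the authors' architecture; it shows only the wiring.

```python
import torch
import torch.nn as nn

class CnnRnnCascade(nn.Module):
    """Sketch of a CNN->RNN annotation cascade: the CNN embedding of the
    x-ray initializes the LSTM state, which then emits context tokens
    (e.g. location, severity). All sizes are illustrative, not the paper's."""
    def __init__(self, vocab_size, feat_dim=256, hid_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # toy stand-in for the deep CNN
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.rnn = nn.LSTM(feat_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, image, tokens):
        feat = self.cnn(image)                          # (B, feat_dim)
        h0 = feat.unsqueeze(0)                          # init hidden state from image
        c0 = torch.zeros_like(h0)
        seq, _ = self.rnn(self.embed(tokens), (h0, c0))
        return self.out(seq)                            # per-step vocabulary logits

model = CnnRnnCascade(vocab_size=100)
logits = model(torch.randn(2, 1, 64, 64), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```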
Investigating echo state networks dynamics by means of recurrence analysis
Title | Investigating echo state networks dynamics by means of recurrence analysis |
Authors | Filippo Maria Bianchi, Lorenzo Livi, Cesare Alippi |
Abstract | In this paper, we elaborate on the well-known interpretability issue in echo state networks (ESNs). The idea is to investigate the dynamics of reservoir neurons with time-series analysis techniques taken from research on complex systems. Notably, we analyze time-series of neuron activations with Recurrence Plots (RPs) and Recurrence Quantification Analysis (RQA), which make it possible to visualize and characterize high-dimensional dynamical systems. We show that this approach is useful in a number of ways. First, the two-dimensional representation offered by RPs provides a way of visualizing the high-dimensional dynamics of a reservoir. Our results suggest that, if the network is stable, the reservoir and the input exhibit similar line patterns in their respective RPs. Conversely, the more unstable the ESN, the more instability patterns appear in the RP of the reservoir. As a second result, we show that the $\mathrm{L_{max}}$ measure is highly correlated with the well-established maximal local Lyapunov exponent. This suggests that complexity measures based on the distribution of RP diagonal lines provide a valuable tool for quantifying the degree of network stability. Finally, our analysis shows that all RQA measures fluctuate in the proximity of the so-called edge of stability, where an ESN typically achieves maximum computational capability. We verify that the determination of the edge of stability provided by such RQA measures is more accurate than two well-known criteria based on the Jacobian matrix of the reservoir. Therefore, we claim that RPs and RQA-based analyses can be used as valuable tools to design an effective network for a given problem. |
Tasks | Time Series, Time Series Analysis |
Published | 2016-01-26 |
URL | http://arxiv.org/abs/1601.07381v2 |
http://arxiv.org/pdf/1601.07381v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-echo-state-networks-dynamics-by |
Repo | |
Framework | |
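A recurrence plot of a reservoir trajectory takes only a few lines. The sketch below drives a small ESN with a sine input, collects its states, and thresholds pairwise distances; the reservoir size, spectral radius, and 10th-percentile threshold are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

# A minimal ESN driven by a sine input: x_{t+1} = tanh(W x_t + w_in u_t).
n_res, T, rho = 100, 400, 0.9
W = rng.normal(size=(n_res, n_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius to rho
w_in = rng.uniform(-0.5, 0.5, size=n_res)
u = np.sin(np.linspace(0, 20 * np.pi, T))

x = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

# Recurrence plot of the reservoir trajectory: R[i, j] = 1 iff the states at
# times i and j are closer than a threshold (10th percentile of all pairwise
# distances here, a common heuristic rather than the paper's exact choice).
D = squareform(pdist(states))
R = (D < np.percentile(D, 10)).astype(int)
print(R.shape, R.mean())   # R.mean() is the recurrence rate, a basic RQA measure
```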
Dynamic Key-Value Memory Networks for Knowledge Tracing
Title | Dynamic Key-Value Memory Networks for Knowledge Tracing |
Authors | Jiani Zhang, Xingjian Shi, Irwin King, Dit-Yan Yeung |
Abstract | Knowledge Tracing (KT) is the task of tracing the evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of learning activities. One important purpose of KT is to personalize the practice sequence to help students learn knowledge concepts efficiently. However, existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing either model the knowledge state for each predefined concept separately or fail to pinpoint exactly which concepts a student is good at or unfamiliar with. To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student’s mastery level of each concept. Unlike standard memory-augmented neural networks that use a single memory matrix or two static memory matrices, our model has one static matrix, called the key, which stores the knowledge concepts, and one dynamic matrix, called the value, which stores and updates the mastery levels of the corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model on a range of KT datasets. Moreover, the DKVMN model can automatically discover the underlying concepts of exercises, a task typically performed by human annotators, and depict the changing knowledge state of a student. |
Tasks | Knowledge Tracing |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08108v2 |
http://arxiv.org/pdf/1611.08108v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-key-value-memory-networks-for |
Repo | |
Framework | |
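The read/write mechanics described in the abstract (attention over a static key matrix, a weighted read of the value matrix, and an erase-then-add write) can be traced in a few lines of NumPy. The projections below are random and untrained, purely to show the data flow; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: N latent concepts, key/value dimensions.
N, d_k, d_v = 5, 8, 10
M_key = rng.normal(size=(N, d_k))        # static: one slot per latent concept
M_val = rng.normal(size=(N, d_v))        # dynamic: per-concept mastery state

# Hypothetical stand-ins for the learned erase/add projections (random here).
W_erase = rng.normal(size=(d_v, d_v))
W_add = rng.normal(size=(d_v, d_v))

k_t = rng.normal(size=d_k)               # embedding of the attempted exercise
v_t = rng.normal(size=d_v)               # embedding of the (exercise, answer) pair

# Read: attention over keys gives the correlation weight; a weighted read of
# the value matrix summarizes the student's mastery for this exercise.
w = softmax(M_key @ k_t)
r_t = w @ M_val                          # read content, fed to the prediction net

# Write: erase-then-add update of the value matrix, gated by the same weights.
e = 1.0 / (1.0 + np.exp(-W_erase @ v_t)) # erase vector in (0, 1)
a = np.tanh(W_add @ v_t)                 # add vector
M_val = M_val * (1 - np.outer(w, e)) + np.outer(w, a)
print(w.round(3), r_t.shape)
```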
How deep is knowledge tracing?
Title | How deep is knowledge tracing? |
Authors | Mohammad Khajah, Robert V. Lindsey, Michael C. Mozer |
Abstract | In theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret. The former typically provide more insight into cognition but the latter often perform better. This tension has recently surfaced in the realm of educational data mining, where a deep learning approach to predicting students’ performance as they work through a series of exercises—termed deep knowledge tracing or DKT—has demonstrated a stunning performance advantage over the mainstay of the field, Bayesian knowledge tracing or BKT. In this article, we attempt to understand the basis for DKT’s advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot. We hypothesize four forms of regularity that BKT fails to exploit: recency effects, the contextualized trial sequence, inter-skill similarity, and individual variation in ability. We demonstrate that when BKT is extended to allow it more flexibility in modeling statistical regularities—using extensions previously proposed in the literature—BKT achieves a level of performance indistinguishable from that of DKT. We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations—the fundamental advantage of deep learning. To answer the question posed in our title, knowledge tracing may be a domain that does not require ‘depth’; shallow models like BKT can perform just as well and offer us greater interpretability and explanatory power. |
Tasks | Knowledge Tracing |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1604.02416v2 |
http://arxiv.org/pdf/1604.02416v2.pdf | |
PWC | https://paperswithcode.com/paper/how-deep-is-knowledge-tracing |
Repo | |
Framework | |
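For reference, the standard BKT update that the article extends is a single application of Bayes' rule followed by a learning transition. The sketch below uses illustrative parameter values and omits the article's extensions (recency effects, individual ability, and so on).

```python
def bkt_update(p_learn, correct, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """One step of standard Bayesian Knowledge Tracing: condition the mastery
    probability on the observed answer, then apply the learning transition.
    Parameter values are illustrative, not fitted."""
    if correct:
        evidence = p_learn * (1 - p_slip) + (1 - p_learn) * p_guess
        posterior = p_learn * (1 - p_slip) / evidence
    else:
        evidence = p_learn * p_slip + (1 - p_learn) * (1 - p_guess)
        posterior = p_learn * p_slip / evidence
    return posterior + (1 - posterior) * p_transit

# Trace a student's inferred mastery over a short answer sequence (1 = correct).
p = 0.3  # prior p(L0)
for obs in [1, 1, 0, 1, 1]:
    p = bkt_update(p, obs)
    print(round(p, 3))
```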
Operator-Valued Bochner Theorem, Fourier Feature Maps for Operator-Valued Kernels, and Vector-Valued Learning
Title | Operator-Valued Bochner Theorem, Fourier Feature Maps for Operator-Valued Kernels, and Vector-Valued Learning |
Authors | Ha Quang Minh |
Abstract | This paper presents a framework for computing random operator-valued feature maps for operator-valued positive definite kernels. This is a generalization of the random Fourier features for scalar-valued kernels to the operator-valued case. Our general setting is that of operator-valued kernels corresponding to RKHS of functions with values in a Hilbert space. We show that in general, for a given kernel, there are potentially infinitely many random feature maps, which can be bounded or unbounded. Most importantly, given a kernel, we present a general, closed form formula for computing a corresponding probability measure, which is required for the construction of the Fourier features, and which, unlike the scalar case, is not uniquely and automatically determined by the kernel. We also show that, under appropriate conditions, random bounded feature maps can always be computed. Furthermore, we show the uniform convergence, under the Hilbert-Schmidt norm, of the resulting approximate kernel to the exact kernel on any compact subset of Euclidean space. Our convergence requires differentiable kernels, an improvement over the twice-differentiability requirement in previous work in the scalar setting. We then show how operator-valued feature maps and their approximations can be employed in a general vector-valued learning framework. The mathematical formulation is illustrated by numerical examples on matrix-valued kernels. |
Tasks | |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05639v1 |
http://arxiv.org/pdf/1608.05639v1.pdf | |
PWC | https://paperswithcode.com/paper/operator-valued-bochner-theorem-fourier |
Repo | |
Framework | |
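The scalar-valued special case that the paper generalizes is easy to state in code: by Bochner's theorem the Gaussian kernel has a Gaussian spectral measure, so random cosine features approximate it. The sketch below covers only this scalar case; the operator-valued construction, with its non-unique spectral measure, is the paper's contribution and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random Fourier features for the scalar Gaussian kernel k(x, y) =
# exp(-gamma ||x - y||^2): sample frequencies from the Gaussian spectral
# measure N(0, 2 * gamma * I); then E[z(x) . z(y)] = k(x, y).
n, d, D, gamma = 200, 5, 2000, 0.5
X = rng.normal(size=(n, d))

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

K_approx = Z @ Z.T
K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
print(np.abs(K_approx - K_exact).max())   # shrinks as D grows
```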
It’s Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos
Title | It’s Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos |
Authors | Pia Bideau, Erik Learned-Miller |
Abstract | The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer, and even camouflage. In addition to all of this, the ability to detect motion is nearly instantaneous. While there has been much recent progress in motion segmentation, it still appears we are far from human capabilities. In this work, we derive from first principles a new likelihood function for assessing the probability of an optical flow vector given the 3D motion direction of an object. This likelihood uses a novel combination of the angle and magnitude of the optical flow to maximize the information about the true motions of objects. Using this new likelihood and several innovations in initialization, we develop a motion segmentation algorithm that beats current state-of-the-art methods by a large margin. We compare to five state-of-the-art methods on two established benchmarks, and a third new data set of camouflaged animals, which we introduce to push motion segmentation to the next level. |
Tasks | Motion Segmentation, Optical Flow Estimation |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00136v1 |
http://arxiv.org/pdf/1604.00136v1.pdf | |
PWC | https://paperswithcode.com/paper/its-moving-a-probabilistic-model-for-causal |
Repo | |
Framework | |
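The paper's exact likelihood is not reproduced here, but its key intuition (the direction of a flow vector is informative about the 3D motion, and longer vectors pin down their angle more reliably) can be illustrated with a hedged stand-in: a von Mises density on the flow angle whose concentration grows with flow magnitude. Everything below, including the function name and the concentration scaling, is an assumption for illustration, not the authors' derivation.

```python
import numpy as np
from scipy.stats import vonmises

def flow_angle_loglik(flow, predicted_angle, kappa_per_pixel=2.0):
    """Illustrative stand-in for an angle/magnitude flow likelihood: score
    each flow vector by how well its direction matches the direction
    predicted by a candidate 3D motion, with angular confidence growing
    with flow magnitude."""
    u, v = flow[..., 0], flow[..., 1]
    angle = np.arctan2(v, u)
    mag = np.hypot(u, v)
    kappa = kappa_per_pixel * mag + 1e-6   # confidence scales with magnitude
    return vonmises.logpdf(angle, kappa, loc=predicted_angle)

# Toy field: flow mostly pointing right (angle 0), scored under two hypotheses.
rng = np.random.default_rng(4)
flow = np.stack([1.0 + 0.1 * rng.normal(size=(8, 8)),
                 0.1 * rng.normal(size=(8, 8))], axis=-1)
print(flow_angle_loglik(flow, 0.0).sum())        # high: consistent motion
print(flow_angle_loglik(flow, np.pi / 2).sum())  # low: wrong direction
```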
Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach
Title | Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach |
Authors | Naresh R. Shanbhag |
Abstract | This position paper advocates a communications-inspired approach to the design of machine learning systems on energy-constrained embedded ‘always-on’ platforms. The communications-inspired approach has two versions: (1) a deterministic version where existing low-power communication IC design methods are repurposed, and (2) a stochastic version referred to as Shannon-inspired statistical information processing, employing information-based metrics, statistical error compensation (SEC), and retraining-based methods to implement ML systems on stochastic circuit/device fabrics operating at the limits of energy-efficiency. The communications-inspired approach has the potential to fully leverage the opportunities afforded by ML algorithms and applications in order to address the challenges inherent in their deployment on energy-constrained platforms. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1611.03109v1 |
http://arxiv.org/pdf/1611.03109v1.pdf | |
PWC | https://paperswithcode.com/paper/energy-efficient-machine-learning-in-silicon |
Repo | |
Framework | |
Probabilistic Dimensionality Reduction via Structure Learning
Title | Probabilistic Dimensionality Reduction via Structure Learning |
Authors | Li Wang |
Abstract | We propose a novel probabilistic dimensionality reduction framework that can naturally integrate the generative model and the locality information of data. Based on this framework, we present a new model, which is able to learn a smooth skeleton of embedding points in a low-dimensional space from high-dimensional noisy data. The formulation of the new model can be equivalently interpreted as two coupled learning problems, i.e., structure learning and the learning of the projection matrix. This interpretation motivates the learning of embedding points that can directly form an explicit graph structure. We develop a new method to learn embedding points that form a spanning tree, which is further extended to obtain a discriminative and compact feature representation for clustering problems. Unlike traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. This can greatly facilitate data visualization and scientific discovery in downstream analysis. Extensive experiments demonstrate that the proposed framework is able to obtain discriminative feature representations and to correctly recover the intrinsic structures of various real-world datasets. |
Tasks | Dimensionality Reduction |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04929v1 |
http://arxiv.org/pdf/1610.04929v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-dimensionality-reduction-via |
Repo | |
Framework | |
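The explicit graph structure mentioned above can be illustrated by its simplest ingredient: a spanning tree over embedding points. The sketch below runs only that tree step on fixed 2-D points using SciPy; the paper learns the embedding and the structure jointly, which is not attempted here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(5)

# Toy noisy data along a 1-D curve embedded in 2-D.
t = np.linspace(0, 4, 120)
X = np.stack([t, np.sin(t)], axis=1) + 0.05 * rng.normal(size=(120, 2))

# The structure-learning half of the coupled problem, in miniature: connect
# the points by a minimum spanning tree so the recovered skeleton is an
# explicit graph over the embedding.
D = squareform(pdist(X))
T = minimum_spanning_tree(D)              # sparse matrix encoding the skeleton
edges = np.array(T.nonzero()).T
print(len(edges), "tree edges over", len(X), "points")
```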
Sharing Network Parameters for Crosslingual Named Entity Recognition
Title | Sharing Network Parameters for Crosslingual Named Entity Recognition |
Authors | Rudra Murthy V, Mitesh Khapra, Pushpak Bhattacharyya |
Abstract | Most state-of-the-art approaches to Named Entity Recognition rely on hand-crafted features and annotated corpora. Recently, neural-network-based models have been proposed which do not require hand-crafted features but still require annotated corpora. However, such annotated corpora may not be available for many languages. In this paper, we propose a neural-network-based model which allows sharing the decoder as well as word- and character-level parameters between two languages, thereby allowing a resource-fortunate language to aid a resource-deprived language. Specifically, we focus on the case when limited annotated corpora are available in one language ($L_1$) and abundant annotated corpora are available in another language ($L_2$). Sharing the network architecture and parameters between $L_1$ and $L_2$ leads to improved performance in $L_1$. Further, our approach does not require any hand-crafted features but instead directly learns meaningful feature representations from the training data itself. We experiment with 4 language pairs and show that, in a resource-constrained setup (less annotated data), a model jointly trained with data from another language indeed performs better than a model trained only on the limited corpora in one language. |
Tasks | Named Entity Recognition |
Published | 2016-07-01 |
URL | http://arxiv.org/abs/1607.00198v1 |
http://arxiv.org/pdf/1607.00198v1.pdf | |
PWC | https://paperswithcode.com/paper/sharing-network-parameters-for-crosslingual |
Repo | |
Framework | |
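A minimal sketch of the sharing scheme, assuming PyTorch: each language keeps its own word embeddings while the encoder and decoder parameters are shared, so abundant $L_2$ data shapes the weights that $L_1$ also uses. Character-level sharing is omitted and all sizes are made up; this shows the wiring, not the authors' exact model.

```python
import torch
import torch.nn as nn

class SharedNerTagger(nn.Module):
    """Cross-lingual parameter sharing: per-language word embeddings, one
    shared BiLSTM encoder, one shared tag decoder."""
    def __init__(self, vocab_sizes, n_tags, emb=64, hid=64):
        super().__init__()
        self.embeddings = nn.ModuleDict(
            {lang: nn.Embedding(v, emb) for lang, v in vocab_sizes.items()})
        self.encoder = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.decoder = nn.Linear(2 * hid, n_tags)   # shared across languages

    def forward(self, tokens, lang):
        h, _ = self.encoder(self.embeddings[lang](tokens))
        return self.decoder(h)                      # per-token tag logits

model = SharedNerTagger({"L1": 5000, "L2": 20000}, n_tags=9)
# Joint training would interleave batches from both languages:
logits_l1 = model(torch.randint(0, 5000, (4, 12)), "L1")
logits_l2 = model(torch.randint(0, 20000, (4, 12)), "L2")
print(logits_l1.shape, logits_l2.shape)
```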
Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification
Title | Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification |
Authors | Giuseppe Lisanti, Svebor Karaman, Iacopo Masi |
Abstract | In this paper we introduce a method to overcome one of the main challenges of person re-identification in multi-camera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance in different camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kernels is exploited to learn several projection spaces in which the appearance correlation between samples of the same person observed from different cameras is maximized. An iterative logistic regression is finally used to select and weight the contribution of each feature projection and perform the matching between the two views. Experimental evaluation shows that the proposed solution obtains comparable performance on the VIPeR and PRID 450s datasets and improves on the state of the art on the PRID and CUHK01 datasets. |
Tasks | Cross-Modal Person Re-Identification, Person Re-Identification |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02204v2 |
http://arxiv.org/pdf/1607.02204v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-channel-kernel-canonical-correlation |
Repo | |
Framework | |
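A rough sketch of the core projection step, with two stand-ins: exact KCCA is replaced by linear CCA on explicit Nystroem kernel features, and the per-feature repetition plus iterative logistic regression fusion is omitted. The data, kernel parameters, and sizes are illustrative.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)

# Two toy "camera views" of the same 60 identities: shared latent plus noise.
latent = rng.normal(size=(60, 5))
view_a = latent @ rng.normal(size=(5, 40)) + 0.3 * rng.normal(size=(60, 40))
view_b = latent @ rng.normal(size=(5, 40)) + 0.3 * rng.normal(size=(60, 40))

# Approximate KCCA: explicit kernel feature maps per view, then linear CCA
# in that feature space maximizes cross-view correlation.
feat_a = Nystroem(kernel="rbf", gamma=0.002, n_components=30, random_state=0)
feat_b = Nystroem(kernel="rbf", gamma=0.002, n_components=30, random_state=0)
Pa, Pb = CCA(n_components=4).fit_transform(feat_a.fit_transform(view_a),
                                           feat_b.fit_transform(view_b))

# Match by cosine similarity in the correlated space; the diagonal should win.
Pa /= np.linalg.norm(Pa, axis=1, keepdims=True)
Pb /= np.linalg.norm(Pb, axis=1, keepdims=True)
sim = Pa @ Pb.T
print("rank-1 match rate:", (sim.argmax(axis=1) == np.arange(60)).mean())
```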
A Randomized Approach to Efficient Kernel Clustering
Title | A Randomized Approach to Efficient Kernel Clustering |
Authors | Farhad Pourkamali-Anaraki, Stephen Becker |
Abstract | Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nyström-based approximations. |
Tasks | |
Published | 2016-08-26 |
URL | http://arxiv.org/abs/1608.07597v3 |
http://arxiv.org/pdf/1608.07597v3.pdf | |
PWC | https://paperswithcode.com/paper/a-randomized-approach-to-efficient-kernel |
Repo | |
Framework | |
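The pipeline is simple to sketch: map the data through a low-rank randomized kernel approximation (O(nm) memory for m << n, instead of the O(n^2) kernel matrix), then run ordinary K-means on the transformed points. scikit-learn's Nystroem transformer is used below as a stand-in for the paper's specific one-pass randomized approximation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

# Approximate kernel K-means on a toy non-linearly-separable dataset.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Low-rank kernel feature map: n x 50 instead of the n x n kernel matrix.
Z = Nystroem(kernel="rbf", gamma=20.0, n_components=50,
             random_state=0).fit_transform(X)

# Standard K-means on the transformed data completes the pipeline.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))
```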
A fuzzy expert system for earthquake prediction, case study: the Zagros range
Title | A fuzzy expert system for earthquake prediction, case study: the Zagros range |
Authors | Arash Andalib, Mehdi Zare, Farid Atry |
Abstract | A methodology for the development of a fuzzy expert system (FES) with application to earthquake prediction is presented. The idea is to reproduce the performance of a human expert in earthquake prediction. To do this, in the first step, rules provided by the human expert are used to generate a fuzzy rule base. These rules are then fed into an inference engine to produce a fuzzy inference system (FIS) and to infer the results. In this paper, we have used a Sugeno-type fuzzy inference system to build the FES. In the next step, the adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES parameters and improve its performance. The proposed framework is then employed to attain the performance of a human expert in predicting earthquakes in the Zagros area based on the idea of coupled earthquakes. While the prediction results are promising on parts of the testing set, the general performance indicates that a prediction methodology based on coupled earthquakes needs more investigation and a more complicated reasoning procedure to yield satisfactory predictions. |
Tasks | |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04028v2 |
http://arxiv.org/pdf/1610.04028v2.pdf | |
PWC | https://paperswithcode.com/paper/a-fuzzy-expert-system-for-earthquake |
Repo | |
Framework | |
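A first-order Sugeno inference step is compact enough to show in full: Gaussian memberships, product firing strengths, and a firing-strength-weighted average of linear consequents. The two rules below, their inputs, and all coefficients are purely hypothetical illustrations, not the system's calibrated rule base.

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def sugeno(x1, x2, rules):
    """First-order Sugeno inference: the product of a rule's antecedent
    memberships gives its firing strength; the crisp output is the
    firing-strength-weighted average of the rules' linear consequents."""
    w = np.array([gauss(x1, *r["mf1"]) * gauss(x2, *r["mf2"]) for r in rules])
    z = np.array([r["a"] * x1 + r["b"] * x2 + r["c"] for r in rules])
    return float(w @ z / w.sum())

# Two hypothetical expert rules over made-up, normalized precursors:
rules = [
    # IF rate is LOW  AND time-since-last-event is LONG  THEN risk = 0.1*x1 + 0.9*x2
    {"mf1": (0.2, 0.2), "mf2": (0.8, 0.2), "a": 0.1, "b": 0.9, "c": 0.0},
    # IF rate is HIGH AND time-since-last-event is SHORT THEN risk = 0.7*x1 + 0.1*x2 + 0.1
    {"mf1": (0.8, 0.2), "mf2": (0.2, 0.2), "a": 0.7, "b": 0.1, "c": 0.1},
]
print(sugeno(0.75, 0.3, rules))   # crisp risk score; ANFIS would tune mf/a/b/c
```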
Optimal Number of Choices in Rating Contexts
Title | Optimal Number of Choices in Rating Contexts |
Authors | Sam Ganzfried, Farzana Yusuf |
Abstract | In many settings people must give numerical scores to entities from a small discrete set. For instance, rating physical attractiveness from 1–5 on dating sites, or papers from 1–10 for conference reviewing. We study the problem of understanding when using a different number of options is optimal. We consider the cases in which scores are uniformly random or Gaussian. We study computationally when using 2, 3, 4, 5, and 10 options out of a total of 100 is optimal in these models (though our theoretical analysis is for a more general setting with $k$ choices from $n$ total options as well as a continuous underlying space). One may expect that using more options would always improve performance in this model, but we show that this is not necessarily the case, and that using fewer choices—even just two—can surprisingly be optimal in certain situations. While in theory for this setting it would be optimal to use all 100 options, in practice this is prohibitive, and it is preferable to utilize a smaller number of options due to humans’ limited computational resources. Our results could have many potential applications, as settings requiring entities to be ranked by humans are ubiquitous. There could also be applications to other fields such as signal or image processing where input values from a large set must be mapped to output values in a smaller set. |
Tasks | |
Published | 2016-05-21 |
URL | http://arxiv.org/abs/1605.06588v9 |
http://arxiv.org/pdf/1605.06588v9.pdf | |
PWC | https://paperswithcode.com/paper/optimal-number-of-choices-in-rating-contexts |
Repo | |
Framework | |
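The computational setup can be approximated in a few lines: draw true scores, quantize them to k equal-width options, and measure the error. Under this plain MSE objective more options always help; the paper's finding that fewer options can be optimal arises under its richer rater models, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

def quantization_loss(scores, k):
    """Mean squared error from compressing scores in [0, 1] down to k
    equal-width options, each reported as its bin midpoint."""
    edges = np.linspace(0, 1, k + 1)
    bins = np.clip(np.digitize(scores, edges) - 1, 0, k - 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return np.mean((scores - midpoints[bins]) ** 2)

# Uniform true scores; compare the option counts studied in the paper.
scores = rng.uniform(0, 1, size=100_000)
for k in [2, 3, 4, 5, 10]:
    print(k, round(quantization_loss(scores, k), 5))
```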
Spectral Inference Methods on Sparse Graphs: Theory and Applications
Title | Spectral Inference Methods on Sparse Graphs: Theory and Applications |
Authors | Alaa Saade |
Abstract | In an era of unprecedented deluge of (mostly unstructured) data, graphs are proving more and more useful, across the sciences, as a flexible abstraction to capture complex relationships between complex objects. One of the main challenges arising in the study of such networks is the inference of macroscopic, large-scale properties affecting a large number of objects, based solely on the microscopic interactions between their elementary constituents. Statistical physics, precisely created to recover the macroscopic laws of thermodynamics from an idealized model of interacting particles, provides significant insight to tackle such complex networks. In this dissertation, we use methods derived from the statistical physics of disordered systems to design and study new algorithms for inference on graphs. Our focus is on spectral methods, based on certain eigenvectors of carefully chosen matrices, and sparse graphs, containing only a small amount of information. We develop an original theory of spectral inference based on a relaxation of various mean-field free energy optimizations. Our approach is therefore fully probabilistic, and contrasts with more traditional motivations based on the optimization of a cost function. We illustrate the efficiency of our approach on various problems, including community detection, randomized similarity-based clustering, and matrix completion. |
Tasks | Community Detection, Matrix Completion |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04337v1 |
http://arxiv.org/pdf/1610.04337v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-inference-methods-on-sparse-graphs |
Repo | |
Framework | |
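One concrete method from this line of work is spectral clustering with the Bethe Hessian H(r) = (r^2 - 1) I - r A + D, whose negative eigenvalues count communities even on sparse graphs where the adjacency spectrum fails. A minimal sketch on a planted-partition graph, with illustrative graph parameters, follows.

```python
import numpy as np
import networkx as nx

# Bethe Hessian community detection: H(r) = (r^2 - 1) I - r A + D with
# r = sqrt(mean degree), a statistical-physics-derived spectral operator.
G = nx.planted_partition_graph(2, 250, p_in=0.04, p_out=0.005, seed=0)
A = nx.to_numpy_array(G)
deg = A.sum(axis=1)
r = np.sqrt(deg.mean())

H = (r ** 2 - 1) * np.eye(len(A)) - r * A + np.diag(deg)
vals, vecs = np.linalg.eigh(H)              # eigenvalues in ascending order
print("negative eigenvalues:", (vals < 0).sum())   # should match the 2 groups

labels = (vecs[:, 1] > 0).astype(int)       # sign of 2nd eigenvector splits groups
truth = np.array([0] * 250 + [1] * 250)
acc = max((labels == truth).mean(), ((1 - labels) == truth).mean())
print("accuracy:", acc)
```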