October 16, 2019

2923 words 14 mins read

Paper Group ANR 1102

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model. Automatic Summarization of Student Course Feedback. Learning with Interpretable Structure from Gated RNN. Stationary Geometric Graphical Model Selection. AgileNet: Lightweight Dictionary-based Few-shot Learning. No New-Net. Degrees of Freedom and Model Select …

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

Title Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model
Authors Daniel Andrade, Akiko Takeda, Kenji Fukumizu
Abstract Variable clustering is important for explanatory analysis. However, only a few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Worse, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating, for example, with the Bayesian Information Criterion (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligibly small, but not necessarily zero, partial correlations. Based on our model, we propose to evaluate a variable clustering result using the marginal likelihood. To address the intractable calculation of the marginal likelihood, we propose two solutions: one based on a variational approximation, and another based on MCMC. Experiments on simulated data show that the proposed method is as accurate as BIC in the no-noise setting, but considerably more accurate when there are noisy partial correlations. Furthermore, on real data the proposed method provides clustering results that are intuitively sensible, which is not always the case when using BIC or its extensions.
Tasks Model Selection
Published 2018-06-15
URL http://arxiv.org/abs/1806.05924v1
PDF http://arxiv.org/pdf/1806.05924v1.pdf
PWC https://paperswithcode.com/paper/robust-bayesian-model-selection-for-variable
Repo
Framework
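The BIC baseline the abstract criticizes is easy to make concrete. The sketch below is an illustration, not the paper's code: it scores a candidate variable clustering under a block-diagonal Gaussian model, per-block log-likelihood plus a parameter-count penalty. Small noisy partial correlations perturb the per-block covariances and can flip which partition scores best, which is the failure mode the paper targets.

```python
import numpy as np

def clustering_bic(X, clusters):
    """BIC score of a variable clustering under a block-diagonal
    Gaussian model; lower is better.

    X: (n, p) data matrix; clusters: list of lists of column indices."""
    n, _ = X.shape
    log_lik, n_params = 0.0, 0
    for block in clusters:
        d = len(block)
        cov = np.cov(X[:, block], rowvar=False).reshape(d, d) + 1e-9 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        # Gaussian log-likelihood at the MLE, up to constants
        # that are shared by all candidate partitions
        log_lik += -0.5 * n * logdet
        n_params += d * (d + 1) // 2  # free covariance entries in this block
    return -2.0 * log_lik + n_params * np.log(n)
```

For truly independent variable groups the correct partition usually wins, because merging blocks barely improves the likelihood while the parameter penalty grows.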

Automatic Summarization of Student Course Feedback

Title Automatic Summarization of Student Course Feedback
Authors Wencan Luo, Fei Liu, Zitao Liu, Diane Litman
Abstract Student course feedback is generated daily in both classrooms and online course discussion forums. Traditionally, instructors manually analyze these responses in a costly manner. In this work, we propose a new approach to summarizing student course feedback based on the integer linear programming (ILP) framework. Our approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. Experimental results on a student feedback corpus show that our approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10395v1
PDF http://arxiv.org/pdf/1805.10395v1.pdf
PWC https://paperswithcode.com/paper/automatic-summarization-of-student-course
Repo
Framework
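A minimal stand-in for the ILP formulation: greedy concept coverage, with word bigrams as "concepts". Greedy selection is the standard approximation to this set-cover-style objective; the exact ILP, the co-occurrence sharing across responses, and ROUGE evaluation are all simplified away here.

```python
def summarize(responses, budget=10):
    """Greedy concept-coverage summarizer (a simple stand-in for the
    ILP formulation): repeatedly pick the response covering the most
    not-yet-covered word bigrams, within a total word budget."""
    def concepts(text):
        words = text.lower().split()
        return {(a, b) for a, b in zip(words, words[1:])}

    covered, summary, used = set(), [], 0
    candidates = list(responses)
    while candidates:
        best = max(candidates, key=lambda r: len(concepts(r) - covered))
        gain = len(concepts(best) - covered)
        if gain == 0 or used + len(best.split()) > budget:
            break  # nothing new to cover, or budget exceeded
        summary.append(best)
        covered |= concepts(best)
        used += len(best.split())
        candidates.remove(best)
    return summary
```

Note how the duplicate response contributes no new concepts and is skipped, which is exactly the redundancy control the coverage objective provides.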

Learning with Interpretable Structure from Gated RNN

Title Learning with Interpretable Structure from Gated RNN
Authors Bo-Jian Hou, Zhi-Hua Zhou
Abstract The interpretability of deep learning models has attracted increasing attention in recent years. It would be beneficial if we could learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs), especially gated RNNs, whose inner mechanism is still not clearly understood. We find that a Finite State Automaton (FSA) that processes sequential data has a more interpretable inner mechanism according to the definition of interpretability, and can be learned from RNNs as the interpretable structure. We propose two methods to learn an FSA from an RNN, based on two different clustering methods. With the learned FSA, and via experiments on artificial and real datasets, we find that the FSA is more trustworthy than the RNN from which it was learned, which gives the FSA a chance to substitute for RNNs in applications involving human lives or dangerous facilities. Besides, we analyze how the number of gates affects the performance of an RNN. Our results suggest that gates in RNNs are important, but that fewer is better, which could guide the design of other RNNs. Finally, we observe that the FSA learned from the RNN yields semantically aggregated states, and its transition graph offers an interesting view of how RNNs intrinsically handle text classification tasks.
Tasks Text Classification
Published 2018-10-25
URL https://arxiv.org/abs/1810.10708v2
PDF https://arxiv.org/pdf/1810.10708v2.pdf
PWC https://paperswithcode.com/paper/learning-with-interpretable-structure-from
Repo
Framework
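The pipeline in the abstract (cluster hidden states, then read transitions off the clustered trajectory) can be sketched in a few lines. Everything below is a toy under stated assumptions: plain k-means with a deterministic initialization stands in for the paper's two clustering methods, and each transition is taken by majority vote over the observed (state, input symbol) pairs.

```python
import numpy as np
from collections import Counter, defaultdict

def learn_fsa(hidden_states, inputs, n_states=2, iters=20):
    """Sketch: cluster RNN hidden-state vectors with k-means, then build
    an FSA table (state, input symbol) -> most frequent next state."""
    H = np.asarray(hidden_states, dtype=float)
    # deterministic init: points spread along the norm ordering
    order = np.argsort(np.linalg.norm(H, axis=1))
    centers = H[order[np.linspace(0, len(H) - 1, n_states).astype(int)]].copy()
    labels = np.zeros(len(H), dtype=int)
    for _ in range(iters):  # plain Lloyd iterations
        labels = np.argmin(((H[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(n_states):
            if (labels == k).any():
                centers[k] = H[labels == k].mean(axis=0)
    # count transitions: (state_t, input_{t+1}) -> state_{t+1}
    counts = defaultdict(Counter)
    for t in range(len(labels) - 1):
        counts[(int(labels[t]), inputs[t + 1])][int(labels[t + 1])] += 1
    fsa = {key: ctr.most_common(1)[0][0] for key, ctr in counts.items()}
    return fsa, labels
```

On a real RNN the hidden states would come from running the trained network over a corpus; here any list of vectors works.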

Stationary Geometric Graphical Model Selection

Title Stationary Geometric Graphical Model Selection
Authors Ilya Soloveychik, Vahid Tarokh
Abstract We consider the problem of model selection in Gaussian Markov fields in the sample-deficient scenario. In many practically important cases, the underlying networks are embedded in Euclidean spaces. Using the natural geometric structure, we introduce the notion of spatially stationary distributions over geometric graphs. This directly generalizes the notion of stationary time series to the multidimensional setting lacking a time axis. We show that the idea of spatial stationarity leads to a dramatic decrease in the sample complexity of model selection compared to abstract graphs with the same level of sparsity. For geometric graphs on randomly spread vertices with edges of bounded length, we develop tight information-theoretic bounds on sample complexity and show that a finite number of independent samples is sufficient for consistent recovery. Finally, we develop an efficient technique capable of reliably and consistently reconstructing graphs with a bounded number of measurements.
Tasks Model Selection, Time Series
Published 2018-06-10
URL http://arxiv.org/abs/1806.03571v2
PDF http://arxiv.org/pdf/1806.03571v2.pdf
PWC https://paperswithcode.com/paper/stationary-geometric-graphical-model
Repo
Framework

AgileNet: Lightweight Dictionary-based Few-shot Learning

Title AgileNet: Lightweight Dictionary-based Few-shot Learning
Authors Mohammad Ghasemzadeh, Fang Lin, Bita Darvish Rouhani, Farinaz Koushanfar, Ke Huang
Abstract The success of deep learning models is heavily tied to the use of massive amounts of labeled data and excessively long training times. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with dynamic changes in the data. We propose AgileNet, a novel lightweight dictionary-based few-shot learning methodology which provides a reduced-complexity deep neural network for efficient execution at the edge while enabling low-cost updates to capture the dynamics of new data. Evaluations on state-of-the-art few-shot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior art. Additionally, AgileNet is the first few-shot learning approach that prevents model updates from eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel end-to-end structured decomposition, which also reduces the memory footprint and computational complexity to match edge device constraints.
Tasks Few-Shot Learning
Published 2018-05-21
URL http://arxiv.org/abs/1805.08311v1
PDF http://arxiv.org/pdf/1805.08311v1.pdf
PWC https://paperswithcode.com/paper/agilenet-lightweight-dictionary-based-few
Repo
Framework
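To make "dictionary-based" concrete, here is one common pattern; this is an assumption for illustration, since the abstract does not spell out AgileNet's classifier. Each class keeps a small dictionary of atoms, and a query is assigned to the class whose atoms reconstruct it with the lowest least-squares error. Adding a class then only means adding a dictionary, which is the kind of low-cost update the abstract describes.

```python
import numpy as np

def classify_few_shot(x, dictionaries):
    """Dictionary-based classification sketch: pick the class whose
    dictionary atoms give the smallest reconstruction error.

    x: (features,) query vector
    dictionaries: {label: (features, atoms) array}"""
    errors = {}
    for label, D in dictionaries.items():
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)  # best linear combo of atoms
        errors[label] = np.linalg.norm(D @ coef - x)  # residual reconstruction error
    return min(errors, key=errors.get)
```

In practice the atoms would come from a learned structured decomposition of network features rather than being hand-specified.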

No New-Net

Title No New-Net
Authors Fabian Isensee, Philipp Kickingereder, Wolfgang Wick, Martin Bendszus, Klaus H. Maier-Hein
Abstract In this paper we demonstrate the effectiveness of a well-trained U-Net in the context of the BraTS 2018 challenge. This endeavour is particularly interesting given that researchers are currently besting each other with architectural modifications intended to improve segmentation performance. We instead focus on the training process, arguing that a well-trained U-Net is hard to beat. Our baseline U-Net, which has only minor modifications and is trained with a large patch size and a Dice loss function, indeed achieved competitive Dice scores on the BraTS 2018 validation data. By incorporating additional measures such as region-based training, additional training data, a simple postprocessing technique and a combination of loss functions, we obtain Dice scores of 77.88, 87.81 and 80.62, and Hausdorff distances (95th percentile) of 2.90, 6.03 and 5.08 for the enhancing tumor, whole tumor and tumor core, respectively, on the test data. This setup achieved rank two in BraTS 2018, with more than 60 teams participating in the challenge.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10483v2
PDF http://arxiv.org/pdf/1809.10483v2.pdf
PWC https://paperswithcode.com/paper/no-new-net
Repo
Framework
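The Dice loss mentioned in the abstract is standard and worth writing out: one minus the soft Dice coefficient between prediction and target, which directly optimizes the overlap metric the challenge reports.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), computed on (soft)
    prediction and binary target arrays of the same shape."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = (pred * target).sum()
    # eps keeps the loss defined when both masks are empty
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

A perfect segmentation gives a loss near 0, a fully disjoint one a loss near 1; in training this would be computed on network logits after a sigmoid/softmax.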

Degrees of Freedom and Model Selection for k-means Clustering

Title Degrees of Freedom and Model Selection for k-means Clustering
Authors David P. Hofmeyr
Abstract This paper investigates the model degrees of freedom in k-means clustering. An extension of Stein’s lemma provides an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression, however empirical studies evince the appropriateness of our proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high quality clustering solutions. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/edfkmeans.
Tasks Model Selection
Published 2018-06-06
URL https://arxiv.org/abs/1806.02034v4
PDF https://arxiv.org/pdf/1806.02034v4.pdf
PWC https://paperswithcode.com/paper/degrees-of-freedom-and-model-selection-for
Repo
Framework
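Model selection for k-means via BIC, as used in the paper's experiments, can be sketched as follows. Note the caveat that motivates the paper: the naive parameter count k·p used below understates the effective degrees of freedom of k-means, which is exactly what the proposed formulation corrects.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic init (points spread
    along the norm ordering), sufficient for this sketch."""
    order = np.argsort(np.linalg.norm(X, axis=1))
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def bic_naive(X, labels, centers):
    """BIC with the naive df = k*p (center coordinates only); the paper
    argues the effective degrees of freedom exceed this count."""
    n, p = X.shape
    rss = ((X - centers[labels]) ** 2).sum()
    sigma2 = rss / (n * p) + 1e-12
    log_lik = -0.5 * n * p * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2.0 * log_lik + centers.size * np.log(n)
```

Scanning k and taking the BIC minimizer selects the number of clusters; on well-separated data even the naive count recovers the truth.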

Cyberattack Detection using Deep Generative Models with Variational Inference

Title Cyberattack Detection using Deep Generative Models with Variational Inference
Authors Sarin E. Chandy, Amin Rasekh, Zachary A. Barker, M. Ehsan Shafiee
Abstract Recent years have witnessed a rise in the frequency and intensity of cyberattacks targeted at critical infrastructure systems. This study designs a versatile, data-driven cyberattack detection platform for infrastructure systems cybersecurity, with a special demonstration in the water sector. A deep generative model with variational inference autonomously learns normal system behavior and detects attacks as they occur. The model can process the natural data in its raw form and automatically discover and learn its representations, hence augmenting system knowledge discovery and reducing the need for laborious human engineering and domain expertise. The proposed model is applied to a simulated cyberattack detection problem involving a drinking water distribution system subject to programmable logic controller hacks, malicious actuator activation, and deception attacks. The model is only provided with observations of the system, such as pump pressure and tank water level readings, and is blind to the internal structures and workings of the water distribution system. The simulated attacks are manifested in the model’s generated reproduction probability plot, indicating its ability to discern the attacks. There is, however, a need for improvement in reducing false alarms, especially by optimizing detection thresholds. Altogether, the results indicate the model’s ability to distinguish attacks and their repercussions from normal system operation in water distribution systems, and the promise it holds for cyberattack detection in other domains.
Tasks
Published 2018-05-31
URL http://arxiv.org/abs/1805.12511v1
PDF http://arxiv.org/pdf/1805.12511v1.pdf
PWC https://paperswithcode.com/paper/cyberattack-detection-using-deep-generative
Repo
Framework
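A heavily simplified stand-in for the detection logic: fit a density model to normal sensor readings and flag observations the model finds unlikely. A diagonal Gaussian replaces the paper's deep generative model with variational inference; the thresholding issue the abstract mentions corresponds to choosing a cutoff on this score.

```python
import numpy as np

def fit_normal_model(X):
    """Fit a diagonal Gaussian to normal-operation sensor readings
    (e.g. pump pressures, tank levels). Stand-in for the deep model."""
    return X.mean(axis=0), X.std(axis=0) + 1e-6

def anomaly_score(x, mu, sigma):
    """Negative log-likelihood of an observation under the
    normal-behavior model; attacks should score higher."""
    z = (x - mu) / sigma
    return 0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi * sigma ** 2))
```

Raising the alarm threshold on this score trades missed attacks against false alarms, which is the tuning problem the abstract highlights.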

Structural Learning of Multivariate Regression Chain Graphs via Decomposition

Title Structural Learning of Multivariate Regression Chain Graphs via Decomposition
Authors Mohammad Ali Javidian, Marco Valtorta
Abstract We extend the decomposition approach for learning Bayesian networks (BNs) proposed by (Xie et al.) to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduced complexity and increased power of computational independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to an MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. Simulations under a variety of settings demonstrate the competitive performance of our method in comparison with the PC-like algorithm (Sonntag and Pena). In fact, the decomposition-based algorithm usually outperforms the PC-like algorithm except in running time. The performance of both algorithms is much better when the underlying graph is sparse.
Tasks Model Selection
Published 2018-06-03
URL https://arxiv.org/abs/1806.00882v2
PDF https://arxiv.org/pdf/1806.00882v2.pdf
PWC https://paperswithcode.com/paper/structural-learning-of-multivariate
Repo
Framework

Prior Attention for Style-aware Sequence-to-Sequence Models

Title Prior Attention for Style-aware Sequence-to-Sequence Models
Authors Lucas Sterckx, Johannes Deleu, Chris Develder, Thomas Demeester
Abstract We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector. After training an initial attention-based sequence-to-sequence model, we use a variational auto-encoder conditioned on representations of input sequences and a latent code vector space to generate attention matrices. By sampling the code vector from specific regions of this latent space during decoding and imposing prior attention generated from it in the seq2seq model, output can be steered towards having certain attributes. This is demonstrated for the task of sentence simplification, where the latent code vector allows control over output length and lexical simplification, and enables fine-tuning to optimize for different evaluation metrics.
Tasks Lexical Simplification
Published 2018-06-25
URL http://arxiv.org/abs/1806.09439v1
PDF http://arxiv.org/pdf/1806.09439v1.pdf
PWC https://paperswithcode.com/paper/prior-attention-for-style-aware-sequence-to
Repo
Framework

Filtering and Mining Parallel Data in a Joint Multilingual Space

Title Filtering and Mining Parallel Data in a Joint Multilingual Space
Authors Holger Schwenk
Abstract We learn a joint multilingual sentence embedding and use the distance between sentences in different languages to filter noisy parallel data and to mine for parallel data in large news collections. We are able to improve a competitive baseline on the WMT’14 English to German task by 0.3 BLEU by filtering out 25% of the training data. The same approach is used to mine additional bitexts for the WMT’14 system and to obtain competitive results on the BUCC shared task to identify parallel sentences in comparable corpora. The approach is generic, it can be applied to many language pairs and it is independent of the architecture of the machine translation system.
Tasks Machine Translation, Sentence Embedding
Published 2018-05-24
URL http://arxiv.org/abs/1805.09822v1
PDF http://arxiv.org/pdf/1805.09822v1.pdf
PWC https://paperswithcode.com/paper/filtering-and-mining-parallel-data-in-a-joint
Repo
Framework
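The filtering step can be sketched as a similarity threshold in the joint embedding space. The threshold value and the plain cosine scoring are assumptions for illustration; the paper's actual scoring and its 25% filtering rate come from tuning on the translation task.

```python
import numpy as np

def filter_parallel(src_emb, tgt_emb, pairs, threshold=0.8):
    """Keep candidate sentence pairs whose multilingual embeddings are
    close (cosine similarity >= threshold); drop likely-noisy pairs.

    src_emb, tgt_emb: (n_sentences, dim) arrays; pairs: list of (i, j)."""
    kept = []
    for i, j in pairs:
        a, b = src_emb[i], tgt_emb[j]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if cos >= threshold:
            kept.append((i, j))
    return kept
```

Mining works the same way in reverse: score all cross-lingual candidate pairs and keep the highest-similarity ones as new bitext.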

Learning to attend in a brain-inspired deep neural network

Title Learning to attend in a brain-inspired deep neural network
Authors Hossein Adeli, Gregory Zelinsky
Abstract Recent machine learning models have shown that including attention as a component results in improved model accuracy and interpretability, despite the concept of attention in these approaches only loosely approximating the brain’s attention mechanism. Here we extend this work by building a more brain-inspired deep network model of the primate ATTention Network (ATTNet) that learns to shift its attention so as to maximize reward. Using deep reinforcement learning, ATTNet learned to shift its attention to the visual features of a target category in the context of a search task. ATTNet’s dorsal layers also learned to prioritize these shifts of attention so as to maximize success of the ventral pathway classification and receive greater reward. Model behavior was tested against the fixations made by subjects searching images for the same cued category. Both subjects and ATTNet showed evidence for attention being preferentially directed to target goals, behaviorally measured as oculomotor guidance to targets. More fundamentally, ATTNet learned to shift its attention to target-like objects and spatially route its visual inputs to accomplish the task. This work takes a step toward a better understanding of the role of attention in the brain and other computational systems.
Tasks
Published 2018-11-23
URL http://arxiv.org/abs/1811.09699v1
PDF http://arxiv.org/pdf/1811.09699v1.pdf
PWC https://paperswithcode.com/paper/learning-to-attend-in-a-brain-inspired-deep
Repo
Framework

An Average of the Human Ear Canal: Recovering Acoustical Properties via Shape Analysis

Title An Average of the Human Ear Canal: Recovering Acoustical Properties via Shape Analysis
Authors Sune Darkner, Stefan Sommer, Andreas Schuhmacher, Henrik Ingerslev, Anders O. Baandrup, Carsten Thomsen, Søren Jønsson
Abstract Humans are highly dependent on the ability to process audio in order to interact through conversation and navigate from sound. For this, the shape of the ear acts as a mechanical audio filter. The anatomy of the outer human ear canal to approximately 15-20 mm beyond the Tragus is well described because of its importance for customized hearing aid production. This is, however, not the case for the part of the ear canal that is embedded in the skull, up to the tympanic membrane. Due to the sensitivity of the outer ear, this part, referred to as the bony part, has only been described in a few population studies, and only ex-vivo. We present a study of the entire ear canal, including the bony part and the tympanic membrane. We form an average ear canal from a number of MRI scans using standard image registration methods. We show that the obtained representation is realistic in the sense that it has acoustical properties almost identical to those of a real ear.
Tasks Image Registration
Published 2018-11-09
URL http://arxiv.org/abs/1811.03848v1
PDF http://arxiv.org/pdf/1811.03848v1.pdf
PWC https://paperswithcode.com/paper/an-average-of-the-human-ear-canal-recovering
Repo
Framework

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Title Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Authors Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
Abstract The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why don’t trained neural networks overfit when they are overparameterized (namely, having more parameters than statistically needed to overfit training data)? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including those represented by two- and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can simply be done by SGD (stochastic gradient descent) or its variants in polynomial time, using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the overparameterized network.
Tasks
Published 2018-11-12
URL https://arxiv.org/abs/1811.04918v5
PDF https://arxiv.org/pdf/1811.04918v5.pdf
PWC https://paperswithcode.com/paper/learning-and-generalization-in
Repo
Framework

Generalisation of structural knowledge in the hippocampal-entorhinal system

Title Generalisation of structural knowledge in the hippocampal-entorhinal system
Authors James C. R. Whittington, Timothy H. Muller, Shirley Mark, Caswell Barry, Timothy E. J. Behrens
Abstract A central problem in understanding intelligence is the concept of generalisation. This allows previously learnt structure to be exploited to solve tasks in novel situations that differ in their particularities. We take inspiration from neuroscience, specifically the hippocampal-entorhinal system, which is known to be important for generalisation. We propose that to generalise structural knowledge, the representations of the structure of the world, i.e. how entities in the world relate to each other, need to be separated from representations of the entities themselves. We show that, under these principles, artificial neural networks embedded with hierarchy and fast Hebbian memory can learn the statistics of memories and generalise structural knowledge. Spatial neuronal representations mirroring those found in the brain emerge, suggesting that spatial cognition is an instance of more general organising principles. We further unify many entorhinal cell types as basis functions for constructing transition graphs, and show that these representations effectively utilise memories. We experimentally support the model assumptions, showing a preserved relationship between entorhinal grid and hippocampal place cells across environments.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09042v2
PDF http://arxiv.org/pdf/1805.09042v2.pdf
PWC https://paperswithcode.com/paper/generalisation-of-structural-knowledge-in-the
Repo
Framework