January 26, 2020

3606 words 17 mins read

Paper Group ANR 1516

Contextual Bandits with Random Projection

Title Contextual Bandits with Random Projection
Authors Xiaotian Yu
Abstract Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual data. Due to the curse of dimensionality, there are two challenges in most of the current bandit algorithms: the first is high time complexity, and the second is extremely large upper regret bounds with high-dimensional data. In this paper, in order to attack the above two challenges effectively, we develop an algorithm of Contextual Bandits via RAndom Projection (\texttt{CBRAP}) in the setting of linear payoffs, which works especially well for high-dimensional contextual data. The proposed \texttt{CBRAP} algorithm is time-efficient and flexible, because it enables players to choose an arm in a low-dimensional space, and relaxes the sparsity assumption of a constant number of non-zero components in previous work. Besides, we provide a linear upper regret bound for the proposed algorithm, which is associated with reduced dimensions.
Tasks Multi-Armed Bandits
Published 2019-03-20
URL http://arxiv.org/abs/1903.08600v1
PDF http://arxiv.org/pdf/1903.08600v1.pdf
PWC https://paperswithcode.com/paper/contextual-bandits-with-random-projection
Repo
Framework
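
The core idea lends itself to a short sketch: project high-dimensional contexts into a low-dimensional space with a fixed random Gaussian matrix and run a standard LinUCB-style update there. The snippet below is an illustrative sketch of that idea, not the authors' exact CBRAP algorithm; the dimensions, exploration constant, and reward model are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n_arms, alpha = 1000, 20, 5, 1.0   # high dim, reduced dim, arms, exploration weight

# Random Gaussian projection, drawn once up front.
P = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))

# Per-arm ridge-regression statistics, kept in the reduced space.
A = [np.eye(d) for _ in range(n_arms)]        # d x d design matrices
b = [np.zeros(d) for _ in range(n_arms)]      # reward-weighted context sums

def choose_arm(x_high):
    """LinUCB-style choice on the projected context (illustrative only)."""
    z = P @ x_high
    scores = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]
        scores.append(theta @ z + alpha * np.sqrt(z @ A_inv @ z))
    return int(np.argmax(scores)), z

def update(arm, z, reward):
    A[arm] += np.outer(z, z)
    b[arm] += reward * z

# Toy interaction loop with a synthetic linear reward.
true_theta = rng.normal(size=(n_arms, D))
for t in range(200):
    x = rng.normal(size=D)
    arm, z = choose_arm(x)
    reward = true_theta[arm] @ x + rng.normal(scale=0.1)
    update(arm, z, reward)
```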

Extending Signature-based Intrusion Detection Systems With Bayesian Abductive Reasoning

Title Extending Signature-based Intrusion Detection Systems With Bayesian Abductive Reasoning
Authors Ashwinkumar Ganesan, Pooja Parameshwarappa, Akshay Peshave, Zhiyuan Chen, Tim Oates
Abstract Evolving cybersecurity threats are a persistent challenge for system administrators and security experts as new malware is continually released. Attackers may look for vulnerabilities in commercial products or execute sophisticated reconnaissance campaigns to understand a target’s network and gather information on security products like firewalls and intrusion detection / prevention systems (network or host-based). Many new attacks tend to be modifications of existing ones. In such a scenario, rule-based systems fail to detect the attack, even though there are minor differences in conditions / attributes between rules to identify the new and existing attack. To detect these differences the IDS must be able to isolate the subset of conditions that are true and predict the likely conditions (different from the original) that must be observed. In this paper, we propose a probabilistic abductive reasoning approach that augments an existing rule-based IDS (snort [29]) to detect these evolved attacks by (a) predicting rule conditions that are likely to occur (based on existing rules) and (b) generating new snort rules when provided with a seed rule (i.e. a starting rule) to reduce the burden on experts to constantly update them. We demonstrate the effectiveness of the approach by generating new rules from the snort 2012 rule set and testing it on the MACCDC 2012 dataset [6].
Tasks Intrusion Detection
Published 2019-03-28
URL http://arxiv.org/abs/1903.12101v1
PDF http://arxiv.org/pdf/1903.12101v1.pdf
PWC https://paperswithcode.com/paper/extending-signature-based-intrusion-detection
Repo
Framework
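
As a loose illustration of the abductive step, predicting which unobserved rule conditions are most plausible given a partially matched rule, the toy sketch below scores candidate conditions by their empirical co-occurrence with the observed ones across an existing rule base. The rule base, condition strings, and scoring heuristic are invented for illustration and are not Snort syntax or the paper's probabilistic model.

```python
from collections import Counter
from itertools import combinations

# Toy "rule base": each rule is a set of condition strings (not real Snort rules).
rules = [
    {"tcp", "dst_port:80", "content:cmd.exe"},
    {"tcp", "dst_port:80", "content:powershell"},
    {"tcp", "dst_port:443", "content:cmd.exe"},
    {"udp", "dst_port:53", "content:tunnel"},
]

# Empirical co-occurrence counts between pairs of conditions.
pair_counts, single_counts = Counter(), Counter()
for r in rules:
    single_counts.update(r)
    pair_counts.update(frozenset(p) for p in combinations(sorted(r), 2))

def abduce(observed, candidates):
    """Rank candidate conditions by average co-occurrence with the observed ones."""
    scores = {}
    for c in candidates:
        co = sum(pair_counts[frozenset((c, o))] for o in observed)
        scores[c] = co / (single_counts[c] or 1)
    return sorted(scores.items(), key=lambda kv: -kv[1])

observed = {"tcp", "dst_port:80"}
candidates = {"content:cmd.exe", "content:powershell", "content:tunnel"}
print(abduce(observed, candidates))
```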

Grid Anchor based Image Cropping: A New Benchmark and An Efficient Model

Title Grid Anchor based Image Cropping: A New Benchmark and An Efficient Model
Authors Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang
Abstract Image cropping aims to improve the composition as well as aesthetic quality of an image by removing extraneous content from it. Most of the existing image cropping databases provide only one or several human-annotated bounding boxes as the groundtruths, which can hardly reflect the non-uniqueness and flexibility of image cropping in practice. The employed evaluation metrics such as intersection-over-union cannot reliably reflect the real performance of a cropping model, either. This work revisits the problem of image cropping, and presents a grid anchor based formulation by considering the special properties and requirements (e.g., local redundancy, content preservation, aspect ratio) of image cropping. Our formulation reduces the searching space of candidate crops from millions to no more than ninety. Consequently, a grid anchor based cropping benchmark is constructed, where all crops of each image are annotated and more reliable evaluation metrics are defined. To meet the practical demands of robust performance and high efficiency, we also design an effective and lightweight cropping model. By simultaneously considering the region of interest and region of discard, and leveraging multi-scale information, our model can robustly output visually pleasing crops for images of different scenes. With less than 2.5M parameters, our model runs at a speed of 200 FPS on one single GTX 1080Ti GPU and 12 FPS on one i7-6800K CPU. The code is available at: \url{https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping-Pytorch}.
Tasks Image Cropping
Published 2019-09-18
URL https://arxiv.org/abs/1909.08989v1
PDF https://arxiv.org/pdf/1909.08989v1.pdf
PWC https://paperswithcode.com/paper/grid-anchor-based-image-cropping-a-new
Repo
Framework
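
The key formulation detail, reducing the crop search space to a small set of grid-anchored candidates, can be illustrated by enumerating crops whose corners lie on a coarse grid and that satisfy area and aspect-ratio constraints. The grid size and thresholds below are illustrative guesses, not the paper's exact settings.

```python
from itertools import product

def grid_anchor_crops(width, height, bins=4, min_area_ratio=0.5,
                      aspect_ratios=(1.0, 4 / 3, 16 / 9)):
    """Enumerate candidate crops anchored on a coarse (bins x bins) grid."""
    xs = [round(i * width / bins) for i in range(bins + 1)]
    ys = [round(i * height / bins) for i in range(bins + 1)]
    crops = []
    for x0, x1, y0, y1 in product(xs, xs, ys, ys):
        w, h = x1 - x0, y1 - y0
        if w <= 0 or h <= 0:
            continue
        if w * h < min_area_ratio * width * height:      # content preservation
            continue
        ratio = w / h
        if not any(abs(ratio - ar) < 0.15 for ar in aspect_ratios):
            continue
        crops.append((x0, y0, x1, y1))
    return crops

candidates = grid_anchor_crops(1920, 1080)
print(len(candidates), "candidate crops")
```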

Syntax-aware Neural Semantic Role Labeling with Supertags

Title Syntax-aware Neural Semantic Role Labeling with Supertags
Authors Jungo Kasai, Dan Friedman, Robert Frank, Dragomir Radev, Owen Rambow
Abstract We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.
Tasks Semantic Role Labeling
Published 2019-03-12
URL http://arxiv.org/abs/1903.05260v2
PDF http://arxiv.org/pdf/1903.05260v2.pdf
PWC https://paperswithcode.com/paper/syntax-aware-neural-semantic-role-labeling
Repo
Framework
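
A minimal sketch of the input pipeline described in the abstract, concatenating word, part-of-speech, and supertag embeddings before a BiLSTM tagger, is shown below in PyTorch. Vocabulary sizes, embedding dimensions, and the single BiLSTM layer are placeholder choices; the actual model uses a deep highway BiLSTM.

```python
import torch
import torch.nn as nn

class SupertagSRL(nn.Module):
    """Toy BiLSTM role labeler over word + POS + supertag embeddings."""
    def __init__(self, n_words=10000, n_pos=50, n_supertags=500, n_roles=60,
                 w_dim=100, p_dim=16, s_dim=64, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.pos_emb = nn.Embedding(n_pos, p_dim)
        self.sup_emb = nn.Embedding(n_supertags, s_dim)
        self.encoder = nn.LSTM(w_dim + p_dim + s_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.scorer = nn.Linear(2 * hidden, n_roles)

    def forward(self, words, pos, supertags):
        x = torch.cat([self.word_emb(words),
                       self.pos_emb(pos),
                       self.sup_emb(supertags)], dim=-1)
        h, _ = self.encoder(x)
        return self.scorer(h)            # (batch, seq_len, n_roles) role logits

model = SupertagSRL()
words = torch.randint(0, 10000, (2, 12))
pos = torch.randint(0, 50, (2, 12))
sups = torch.randint(0, 500, (2, 12))
print(model(words, pos, sups).shape)     # torch.Size([2, 12, 60])
```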

Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

Title Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)
Authors Amelia Drew, Alexander Heinecke
Abstract We implement a Tensor Train layer in the TensorFlow Neural Machine Translation (NMT) model using the t3f library. We perform training runs on the IWSLT English-Vietnamese ‘15 and WMT German-English ‘16 datasets with learning rates $\in {0.0004,0.0008,0.0012}$, maximum ranks $\in {2,4,8,16}$ and a range of core dimensions. We compare against a target BLEU test score of 24.0, obtained by our benchmark run. For the IWSLT English-Vietnamese training, we obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using core dimensions $(2, 2, 256) \times (2, 2, 512)$ with learning rate 0.0012 and rank distributions $(1,4,4,1)$ and $(1,4,16,1)$ respectively. These runs use 113% and 397% of the flops of the benchmark run respectively. We find that, of the parameters surveyed, a higher learning rate and more ‘rectangular’ core dimensions generally produce higher BLEU scores. For the WMT German-English dataset, we obtain BLEU scores of 24.0/23.8 using core dimensions $(4, 4, 128) \times (4, 4, 256)$ with learning rate 0.0012 and rank distribution $(1,2,2,1)$. We discuss the potential for future optimization and application of Tensor Train decomposition to other NMT models.
Tasks Machine Translation
Published 2019-11-05
URL https://arxiv.org/abs/1911.01933v1
PDF https://arxiv.org/pdf/1911.01933v1.pdf
PWC https://paperswithcode.com/paper/training-neural-machine-translation-nmt
Repo
Framework
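
The underlying technique, replacing a dense weight matrix with a tensor-train (TT) factorization whose input and output dimensions are split across cores, can be sketched in NumPy. The core shapes and ranks below mirror the notation in the abstract (input dims $(2,2,256)$, output dims $(2,2,512)$, ranks $(1,4,4,1)$), but the reconstruction code is a generic TT contraction, not the t3f implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# TT factorization of a (2*2*256) x (2*2*512) weight matrix.
in_dims, out_dims, ranks = (2, 2, 256), (2, 2, 512), (1, 4, 4, 1)

# One core per mode, with shape (r_{k-1}, in_k, out_k, r_k).
cores = [rng.normal(scale=0.02,
                    size=(ranks[k], in_dims[k], out_dims[k], ranks[k + 1]))
         for k in range(3)]

def tt_to_matrix(cores, in_dims, out_dims):
    """Contract TT cores back into a dense (prod(in) x prod(out)) matrix."""
    full = cores[0]
    for core in cores[1:]:
        # Merge over the shared rank index: (..., r) x (r, n, m, r') -> (..., n, m, r')
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full.squeeze(axis=(0, -1))                # drop boundary ranks r0 = r3 = 1
    full = full.transpose(0, 2, 4, 1, 3, 5)          # group input modes, then output modes
    return full.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))

W = tt_to_matrix(cores, in_dims, out_dims)
print(W.shape)                                       # (1024, 2048)
print("TT parameters:", sum(c.size for c in cores), "dense parameters:", W.size)
```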

PT-MMD: A Novel Statistical Framework for the Evaluation of Generative Systems

Title PT-MMD: A Novel Statistical Framework for the Evaluation of Generative Systems
Authors Alexander Potapov, Ian Colbert, Ken Kreutz-Delgado, Alexander Cloninger, Srinjoy Das
Abstract Stochastic-sampling-based Generative Neural Networks, such as Restricted Boltzmann Machines and Generative Adversarial Networks, are now used for applications such as denoising, image occlusion removal, pattern completion, and motion synthesis. In scenarios which involve performing such inference tasks with these models, it is critical to determine metrics that allow for model selection and/or maintenance of requisite generative performance under pre-specified implementation constraints. In this paper, we propose a new metric for evaluating generative model performance based on $p$-values derived from the combined use of Maximum Mean Discrepancy (MMD) and permutation-based (PT-based) resampling, which we refer to as PT-MMD. We demonstrate the effectiveness of this metric for two cases: (1) Selection of bitwidth and activation function complexity to achieve minimum power-at-performance for Restricted Boltzmann Machines; (2) Quantitative comparison of images generated by two types of Generative Adversarial Networks (PGAN and WGAN) to facilitate model selection in order to maximize the fidelity of generated images. For these applications, our results are shown using Euclidean and Haar-based kernels for the PT-MMD two sample hypothesis test. This demonstrates the critical role of distance functions in comparing generated images against their corresponding ground truth counterparts as what would be perceived by human users.
Tasks Denoising, Model Selection
Published 2019-10-28
URL https://arxiv.org/abs/1910.12454v1
PDF https://arxiv.org/pdf/1910.12454v1.pdf
PWC https://paperswithcode.com/paper/pt-mmd-a-novel-statistical-framework-for-the
Repo
Framework
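
The evaluation procedure admits a compact sketch: compute the MMD statistic between generated and reference samples with a chosen kernel, then obtain a p-value by recomputing the statistic under random permutations of the pooled sample. The Gaussian kernel, sample sizes, and permutation count below are illustrative choices; the paper also uses Haar-based kernels.

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased squared MMD with a Gaussian RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def pt_mmd_pvalue(X, Y, n_perm=500, gamma=1.0, seed=0):
    """Permutation-test p-value for the null that X and Y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, gamma)
    pooled = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma) >= observed
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(100, 2))
fake = rng.normal(0.3, 1.0, size=(100, 2))     # slightly shifted "generated" samples
print(pt_mmd_pvalue(real, fake))
```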

A comparative evaluation of novelty detection algorithms for discrete sequences

Title A comparative evaluation of novelty detection algorithms for discrete sequences
Authors Rémi Domingues, Pietro Michiardi, Jérémie Barlet, Maurizio Filippone
Abstract The identification of anomalies in temporal data is a core component of numerous research areas such as intrusion detection, fault prevention, genomics and fraud detection. This article provides an experimental comparison of the novelty detection problem applied to discrete sequences. The objective of this study is to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case. These recommendations rely on extensive novelty detection experiments based on a variety of public datasets in addition to novel industrial datasets. We also perform thorough scalability and memory usage tests, resulting in new supplementary insights into the methods’ performance, a key selection criterion for problems that rely on large volumes of data or must meet strict response time constraints.
Tasks Fraud Detection, Intrusion Detection
Published 2019-02-28
URL https://arxiv.org/abs/1902.10940v2
PDF https://arxiv.org/pdf/1902.10940v2.pdf
PWC https://paperswithcode.com/paper/a-comparative-evaluation-of-novelty-detection
Repo
Framework
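
One simple baseline in this family, scoring a discrete sequence by its negative log-likelihood under a first-order Markov model fit on normal data, can be sketched as follows. It is a generic illustration of the problem setting, not one of the specific methods benchmarked in the paper.

```python
from collections import defaultdict
import math

def fit_markov(sequences, alphabet):
    """Estimate first-order transition probabilities with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + len(alphabet)
        probs[a] = {b: (counts[a][b] + 1) / total for b in alphabet}
    return probs

def novelty_score(seq, probs):
    """Average negative log-likelihood per transition; higher means more novel."""
    nll = sum(-math.log(probs[a][b]) for a, b in zip(seq, seq[1:]))
    return nll / max(len(seq) - 1, 1)

alphabet = list("abcd")
normal = ["abab", "ababab", "abcab", "abab"]
model = fit_markov(normal, alphabet)
print(novelty_score("ababab", model))   # low score: looks normal
print(novelty_score("dddd", model))     # high score: novel pattern
```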

Securing Fog-to-Things Environment Using Intrusion Detection System Based On Ensemble Learning

Title Securing Fog-to-Things Environment Using Intrusion Detection System Based On Ensemble Learning
Authors Poulmanogo Illy, Georges Kaddoum, Christian Miranda Moreira, Kuljeet Kaur, Sahil Garg
Abstract The growing interest in the Internet of Things (IoT) applications is associated with an augmented volume of security threats. In this vein, the Intrusion detection systems (IDS) have emerged as a viable solution for the detection and prevention of malicious activities. Unlike the signature-based detection approaches, machine learning-based solutions are a promising means for detecting unknown attacks. However, the machine learning models need to be accurate enough to reduce the number of false alarms. More importantly, they need to be trained and evaluated on realistic datasets such that their efficacy can be validated on real-time deployments. Many solutions proposed in the literature are reported to have high accuracy but are ineffective in real applications due to the non-representativity of the dataset used for training and evaluation of the underlying models. On the other hand, some of the existing solutions overcome these challenges but yield low accuracy which hampers their implementation for commercial tools. These solutions are majorly based on single learners and are therefore directly affected by the intrinsic limitations of each learning algorithm. The novelty of this paper is to use the most realistic dataset available for intrusion detection, called NSL-KDD, and combine multiple learners to build ensemble learners that increase the accuracy of the detection. Furthermore, a deployment architecture in a fog-to-things environment that employs two levels of classifications is proposed. In such architecture, the first level performs an anomaly detection which reduces the latency of the classification substantially, while the second level executes attack classification, enabling precise prevention measures. Finally, the experimental results demonstrate the effectiveness of the proposed IDS in comparison with other state-of-the-art approaches on the NSL-KDD dataset.
Tasks Anomaly Detection, Intrusion Detection
Published 2019-01-30
URL http://arxiv.org/abs/1901.10933v1
PDF http://arxiv.org/pdf/1901.10933v1.pdf
PWC https://paperswithcode.com/paper/securing-fog-to-things-environment-using
Repo
Framework
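
The two-level design described in the abstract, a fast binary anomaly detector followed by a multi-class attack classifier, can be sketched with scikit-learn ensembles on synthetic data. The features, labels, and estimator choices here are placeholders; the paper trains and evaluates on NSL-KDD.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for NSL-KDD: class 0 = normal, classes 1-3 = attack types.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 1: binary anomaly detector (normal vs. any attack).
level1 = RandomForestClassifier(n_estimators=100, random_state=0)
level1.fit(X_tr, (y_tr != 0).astype(int))

# Level 2: attack-type classifier, trained only on attack traffic.
attack_mask = y_tr != 0
level2 = RandomForestClassifier(n_estimators=100, random_state=0)
level2.fit(X_tr[attack_mask], y_tr[attack_mask])

# Inference: only flows flagged as suspicious are forwarded to the second level.
is_attack = level1.predict(X_te).astype(bool)
pred = np.zeros(len(X_te), dtype=int)
if is_attack.any():
    pred[is_attack] = level2.predict(X_te[is_attack])
print("overall accuracy:", (pred == y_te).mean())
```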

Dataset2Vec: Learning Dataset Meta-Features

Title Dataset2Vec: Learning Dataset Meta-Features
Authors Hadi S. Jomaa, Josif Grabocka, Lars Schmidt-Thieme
Abstract Machine learning tasks such as optimizing the hyper-parameters of a model for a new dataset or few-shot learning can be vastly accelerated if they are not done from scratch for every new dataset, but carry over findings from previous runs. Meta-learning makes use of features of a whole dataset such as its number of instances, its number of predictors, the means of the predictors etc., so-called meta-features, dataset summary statistics or simply dataset characteristics, which so far have been hand-crafted, often specifically for the task at hand. More recently, unsupervised dataset encoding models based on variational auto-encoders have been successful in learning such characteristics for the special case when all datasets follow the same schema, but not beyond. In this paper we design a novel model, Dataset2Vec, that is able to characterize datasets with a latent feature vector based on batches and thus is able to generalize beyond datasets having the same schema to arbitrary (tabular) datasets. To do so, we employ auxiliary learning tasks on batches of datasets, especially to distinguish batches from different datasets. We show empirically that the meta-features collected from batches of similar datasets are concentrated within a small area in the latent space, hence preserving similarity. We also show that using the dataset characteristics learned by Dataset2Vec in a state-of-the-art hyper-parameter optimization model outperforms the hand-crafted meta-features that have been used in the hyper-parameter optimization literature so far. As a result, we advance the current state-of-the-art results for hyper-parameter optimization.
Tasks Auxiliary Learning, Few-Shot Learning, Meta-Learning
Published 2019-05-27
URL https://arxiv.org/abs/1905.11063v1
PDF https://arxiv.org/pdf/1905.11063v1.pdf
PWC https://paperswithcode.com/paper/dataset2vec-learning-dataset-meta-features
Repo
Framework
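
For contrast with the learned meta-features, the hand-crafted dataset characteristics mentioned in the abstract (number of instances, number of predictors, per-predictor statistics) are straightforward to compute; a tiny sketch follows. The specific statistics chosen are an illustrative subset, not the canonical meta-feature set from the meta-learning literature.

```python
import numpy as np

def handcrafted_meta_features(X, y):
    """A few classic dataset summary statistics (illustrative subset)."""
    X = np.asarray(X, dtype=float)
    return {
        "n_instances": X.shape[0],
        "n_predictors": X.shape[1],
        "mean_of_means": float(X.mean(axis=0).mean()),
        "mean_of_stds": float(X.std(axis=0).mean()),
        "mean_abs_corr_with_target": float(
            np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
        ),
        "class_balance": float(np.bincount(y).max() / len(y)),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
print(handcrafted_meta_features(X, y))
```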

Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition

Title Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Authors Angli Liu, Jingfei Du, Veselin Stoyanov
Abstract Traditional language models are unable to efficiently model entity names observed in text. All but the most popular named entities appear infrequently in text providing insufficient context. Recent efforts have recognized that context can be generalized between entity names that share the same type (e.g., \emph{person} or \emph{location}) and have equipped language models with access to an external knowledge base (KB). Our Knowledge-Augmented Language Model (KALM) continues this line of work by augmenting a traditional model with a KB. Unlike previous methods, however, we train with an end-to-end predictive objective optimizing the perplexity of text. We do not require any additional information such as named entity tags. In addition to improving language modeling performance, KALM learns to recognize named entities in an entirely unsupervised way by using entity type information latent in the model. On a Named Entity Recognition (NER) task, KALM achieves performance comparable with state-of-the-art supervised models. Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information.
Tasks Language Modelling, Named Entity Recognition
Published 2019-04-09
URL https://arxiv.org/abs/1904.04458v2
PDF https://arxiv.org/pdf/1904.04458v2.pdf
PWC https://paperswithcode.com/paper/knowledge-augmented-language-model-and-its
Repo
Framework
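
At a high level, the approach factors next-word probability through a latent entity type, $p(w \mid \text{ctx}) = \sum_t p(t \mid \text{ctx}) \, p(w \mid t, \text{ctx})$. The toy sketch below computes such a mixture with fixed, made-up type distributions; it only illustrates the factorization, not KALM's actual architecture or training objective.

```python
import numpy as np

vocab = ["the", "visited", "paris", "london", "einstein", "curie"]
types = ["general", "location", "person"]

# Made-up p(word | type): KB-derived name lists restrict the non-general types.
p_word_given_type = np.array([
    [0.4, 0.4, 0.05, 0.05, 0.05, 0.05],   # general vocabulary
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],       # location names
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],       # person names
])

def next_word_probs(type_logits):
    """Mixture over latent entity types: p(w|ctx) = sum_t p(t|ctx) p(w|t)."""
    p_type = np.exp(type_logits) / np.exp(type_logits).sum()
    return p_type @ p_word_given_type

# A context like "she visited ..." should push probability mass toward locations.
probs = next_word_probs(np.array([0.2, 2.0, 0.1]))
for w, p in zip(vocab, probs):
    print(f"{w:10s} {p:.3f}")
```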

1D Convolutional Neural Network Models for Sleep Arousal Detection

Title 1D Convolutional Neural Network Models for Sleep Arousal Detection
Authors Morteza Zabihi, Ali Bahrami Rad, Serkan Kiranyaz, Simo Särkkä, Moncef Gabbouj
Abstract Sleep arousals transition the depth of sleep to a more superficial stage. The occurrence of such events is often considered as a protective mechanism to alert the body of harmful stimuli. Thus, accurate sleep arousal detection can lead to an enhanced understanding of the underlying causes and can influence the assessment of sleep quality. Previous studies and guidelines have suggested that sleep arousals are linked mainly to abrupt frequency shifts in EEG signals, but the proposed rules are shown to be insufficient for a comprehensive characterization of arousals. This study investigates the application of five recent convolutional neural networks (CNNs) for sleep arousal detection and performs comparative evaluations to determine the best model for this task. The investigated state-of-the-art CNN models have originally been designed for image or speech processing. A detailed set of evaluations is performed on the benchmark dataset provided by PhysioNet/Computing in Cardiology Challenge 2018, and the results show that the best 1D CNN model has achieved an average of 0.31 and 0.84 for the area under the precision-recall curve and the area under the ROC curve, respectively.
Tasks EEG, Sleep Arousal Detection, Sleep Quality
Published 2019-03-01
URL http://arxiv.org/abs/1903.01552v1
PDF http://arxiv.org/pdf/1903.01552v1.pdf
PWC https://paperswithcode.com/paper/1d-convolutional-neural-network-models-for
Repo
Framework
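
A minimal example of the model family evaluated here, a 1D convolutional network over multi-channel polysomnography windows producing per-window arousal probabilities, is sketched below in PyTorch. The channel count, kernel sizes, and depth are illustrative and not those of the best-performing model in the study.

```python
import torch
import torch.nn as nn

class Arousal1DCNN(nn.Module):
    """Toy 1D CNN: multi-channel signal window in, arousal logit out."""
    def __init__(self, in_channels=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, 1)

    def forward(self, x):                      # x: (batch, channels, samples)
        h = self.features(x).squeeze(-1)
        return self.classifier(h).squeeze(-1)  # (batch,) arousal logits

model = Arousal1DCNN()
window = torch.randn(4, 8, 3000)               # e.g. 30 s windows at 100 Hz
print(torch.sigmoid(model(window)))            # per-window arousal probabilities
```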

Restricted Boltzmann Machine Assignment Algorithm: Application to solve many-to-one matching problems on weighted bipartite graph

Title Restricted Boltzmann Machine Assignment Algorithm: Application to solve many-to-one matching problems on weighted bipartite graph
Authors Francesco Curia
Abstract In this work, an iterative algorithm based on unsupervised learning, specifically a Restricted Boltzmann Machine (RBM), is presented to solve a perfect matching problem on a bipartite weighted graph. The weights $w_{ij}$ and the bias parameters $\theta = ( a_i, b_j) $ that maximize the energy function and assign element $i$ to element $j$ are calculated iteratively. An application to a real problem is presented to show the potential of this algorithm.
Tasks
Published 2019-04-30
URL https://arxiv.org/abs/1904.13111v2
PDF https://arxiv.org/pdf/1904.13111v2.pdf
PWC https://paperswithcode.com/paper/restricted-boltzmann-machine-assignment
Repo
Framework
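
For reference, the standard RBM energy function that the weights $w_{ij}$ and biases $\theta = (a_i, b_j)$ from the abstract enter into is, in the usual sign convention,

$$E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i \, w_{ij} \, h_j,$$

so optimizing the energy of a visible/hidden configuration, as the abstract describes, amounts to finding the assignment of elements $i$ to elements $j$ most compatible with the learned $w_{ij}$. This is the textbook definition; the paper's specific iterative assignment procedure is not reproduced here.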

Group level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates

Title Group level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates
Authors Hicham Janati, Thomas Bazeille, Bertrand Thirion, Marco Cuturi, Alexandre Gramfort
Abstract Magnetoencephalography (MEG) and electroencephalography (EEG) are non-invasive modalities that measure the weak electromagnetic fields generated by neural activity. Inferring the location of the current sources that generated these magnetic fields is an ill-posed inverse problem known as source imaging. When considering a group study, a baseline approach consists in carrying out the estimation of these sources independently for each subject. The ill-posedness of each problem is typically addressed using sparsity promoting regularizations. A straightforward way to define a common pattern for these sources is then to average them. A more advanced alternative relies on a joint localization of sources for all subjects taken together, by enforcing some similarity across all estimated sources. An important advantage of this approach is that it consists in a single estimation in which all measurements are pooled together, making the inverse problem better posed. Such a joint estimation poses however a few challenges, notably the selection of a valid regularizer that can quantify such spatial similarities. We propose in this work a new procedure that can do so while taking into account the geometrical structure of the cortex. We call this procedure Minimum Wasserstein Estimates (MWE). The benefits of this model are twofold. First, joint inference allows to pool together the data of different brain geometries, accumulating more spatial information. Second, MWE are defined through Optimal Transport (OT) metrics which provide a tool to model spatial proximity between cortical sources of different subjects, hence not enforcing identical source location in the group. These benefits allow MWE to be more accurate than standard MEG source localization techniques. To support these claims, we perform source localization on realistic MEG simulations based on forward operators derived from MRI scans. On a visual task dataset, we demonstrate how MWE infer neural patterns similar to functional Magnetic Resonance Imaging (fMRI) maps.
Tasks EEG
Published 2019-02-13
URL http://arxiv.org/abs/1902.04812v1
PDF http://arxiv.org/pdf/1902.04812v1.pdf
PWC https://paperswithcode.com/paper/group-level-megeeg-source-imaging-via-optimal
Repo
Framework
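
The optimal transport ingredient of MWE, measuring spatial proximity between two source-amplitude distributions over cortical locations, can be illustrated with a small entropic (Sinkhorn) OT computation. The toy one-dimensional "cortex", ground cost, and regularization strength below are stand-ins for the real mesh geometry and for the estimator actually used in the paper.

```python
import numpy as np

def sinkhorn(a, b, M, reg=0.05, n_iter=500):
    """Entropic OT: approximate transport cost between histograms a and b."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]        # transport plan
    return float((P * M).sum())

# Toy 1D "cortex": two subjects with nearby but not identical active sources.
positions = np.linspace(0, 1, 50)
M = np.abs(positions[:, None] - positions[None, :])   # ground cost = distance

def bump(center, width=0.03):
    d = np.exp(-((positions - center) ** 2) / (2 * width ** 2))
    return d / d.sum()

subj1, subj2 = bump(0.30), bump(0.35)
print("OT cost (nearby sources):", sinkhorn(subj1, subj2, M))
print("OT cost (distant sources):", sinkhorn(subj1, bump(0.80), M))
```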

A Hippocampus Model for Online One-Shot Storage of Pattern Sequences

Title A Hippocampus Model for Online One-Shot Storage of Pattern Sequences
Authors Jan Melchior, Mehdi Bayati, Amir Azizi, Sen Cheng, Laurenz Wiskott
Abstract We present a computational model based on the CRISP theory (Content Representation, Intrinsic Sequences, and Pattern completion) of the hippocampus that allows to continuously store pattern sequences online in a one-shot fashion. Rather than storing a sequence in CA3, CA3 provides a pre-trained sequence that is hetero-associated with the input sequence, which allows the system to perform one-shot learning. Plasticity on a short time scale therefore only happens in the incoming and outgoing connections of CA3. Stored sequences can later be recalled from a single cue pattern. We identify the pattern separation performed by subregion DG to be necessary for storing sequences that contain correlated patterns. A design principle of the model is that we use a single learning rule named Hebbian-descent to train all parts of the system. Hebbian-descent has an inherent forgetting mechanism that allows the system to continuously memorize new patterns while forgetting early stored ones. The model shows a plausible behavior when noisy and new patterns are presented and has a rather high capacity of about 40% in terms of the number of neurons in CA3. One notable property of our model is that it is capable of ‘bootstrapping’ (improving) itself without external input in a process we refer to as ‘dreaming’. Besides artificially generated input sequences we also show that the model works with sequences of encoded handwritten digits or natural images. To our knowledge this is the first model of the hippocampus that allows to store correlated pattern sequences online in a one-shot fashion without a consolidation process, which can instantaneously be recalled later.
Tasks One-Shot Learning
Published 2019-05-30
URL https://arxiv.org/abs/1905.12937v1
PDF https://arxiv.org/pdf/1905.12937v1.pdf
PWC https://paperswithcode.com/paper/a-hippocampus-model-for-online-one-shot
Repo
Framework
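
The hetero-association step described in the abstract, binding an input sequence to a pre-existing intrinsic CA3 sequence so that a single cue can replay the episode, can be illustrated with a classic one-shot outer-product (Hebbian) association. This sketch uses plain Hebbian learning on random bipolar patterns; the paper's Hebbian-descent rule and forgetting dynamics are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_ca3, seq_len = 200, 300, 6

# Pre-trained intrinsic CA3 sequence (random bipolar states stand in for it).
ca3_seq = rng.choice([-1.0, 1.0], size=(seq_len, n_ca3))
# New episode to be stored in one shot.
episode = rng.choice([-1.0, 1.0], size=(seq_len, n_input))

# One-shot hetero-association: input -> CA3 state, and CA3 state -> next input.
W_in = sum(np.outer(ca3_seq[t], episode[t]) for t in range(seq_len)) / n_input
W_out = sum(np.outer(episode[t + 1], ca3_seq[t]) for t in range(seq_len - 1)) / n_ca3

def recall(cue, steps=seq_len - 1):
    """Replay the stored episode from a (possibly noisy) first pattern."""
    out = [np.sign(cue)]
    for _ in range(steps):
        ca3 = np.sign(W_in @ out[-1])
        out.append(np.sign(W_out @ ca3))
    return np.array(out)

noisy_cue = episode[0] * rng.choice([1, -1], p=[0.9, 0.1], size=n_input)
replayed = recall(noisy_cue)
print(f"fraction of bits recalled correctly: {(replayed == episode).mean():.2f}")
```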

Differentially Private M-band Wavelet-Based Mechanisms in Machine Learning Environments

Title Differentially Private M-band Wavelet-Based Mechanisms in Machine Learning Environments
Authors Kenneth Choi, Tony Lee
Abstract In the post-industrial world, data science and analytics have gained paramount importance regarding digital data privacy. Improper methods of establishing privacy for accessible datasets can compromise large amounts of user data even if the adversary has a small amount of preliminary knowledge of a user. Many researchers have been developing high-level privacy-preserving mechanisms that also retain the statistical integrity of the data to apply to machine learning. Recent developments of differential privacy, such as the Laplace and Privelet mechanisms, drastically decrease the probability that an adversary can distinguish the elements in a data set and thus extract user information. In this paper, we develop three privacy-preserving mechanisms with the discrete M-band wavelet transform that embed noise into data. The first two methods (LS and LS+) add noise through a Laplace-Sigmoid distribution that multiplies Laplace-distributed values with the sigmoid function, and the third method utilizes pseudo-quantum steganography to embed noise into the data. We then show that our mechanisms successfully retain both differential privacy and learnability through statistical analysis in various machine learning environments.
Tasks
Published 2019-12-30
URL https://arxiv.org/abs/2001.00012v2
PDF https://arxiv.org/pdf/2001.00012v2.pdf
PWC https://paperswithcode.com/paper/differentially-private-m-band-wavelet-based
Repo
Framework
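
For comparison with the proposed LS, LS+, and steganographic mechanisms, the baseline Laplace mechanism mentioned in the abstract is simple to state: add Laplace noise with scale sensitivity/epsilon to each released statistic. The sketch below applies it to a vector of per-feature means; the sensitivity value is specific to this toy query and would have to be re-derived for any real release.

```python
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, seed=None):
    """Classic epsilon-DP Laplace mechanism: noise scale = sensitivity / epsilon."""
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=np.shape(values))

# Example: privatize per-feature means of data bounded in [0, 1],
# for which changing one of the n records shifts each mean by at most 1/n.
data = np.random.default_rng(0).uniform(0, 1, size=(100, 5))
true_means = data.mean(axis=0)
private_means = laplace_mechanism(true_means, sensitivity=1.0 / len(data), epsilon=0.5)
print(true_means, private_means, sep="\n")
```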