January 31, 2020

Paper Group ANR 192

Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response


Title	Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response
Authors	Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor, Ambareen Siraj, Mike Rogers
Abstract	Artificial Intelligence (AI) has become an integral part of modern-day security solutions for its ability to learn very complex functions and handle “Big Data”. However, the lack of explainability and interpretability of successful AI models is a key stumbling block when trust in a model’s prediction is critical. This leads to human intervention, which in turn results in a delayed response or decision. While there have been major advancements in the speed and performance of AI-based intrusion detection systems, the response is still at human speed when it comes to explaining and interpreting a specific prediction or decision. In this work, we infuse popular domain knowledge (i.e., CIA principles) in our model for better explainability and validate the approach on a network intrusion detection test case. Our experimental results suggest that the infusion of domain knowledge provides better explainability as well as a faster decision or response. In addition, the infused domain knowledge generalizes the model to work well with unknown attacks, and opens the path to adapting to a large stream of network traffic from numerous IoT devices.
Tasks	Intrusion Detection, Network Intrusion Detection
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09853v2
PDF	https://arxiv.org/pdf/1911.09853v2.pdf
PWC	https://paperswithcode.com/paper/domain-knowledge-aided-explainable-artificial
Repo
Framework

Quantifying Exposure Bias for Neural Language Generation


Title	Quantifying Exposure Bias for Neural Language Generation
Authors	Tianxing He, Jingzhao Zhang, Zhiming Zhou, James Glass
Abstract	The exposure bias problem refers to the training-generation discrepancy, caused by teacher forcing, in maximum likelihood estimation (MLE) training for auto-regressive neural network language models (LM). It has been regarded as a central problem for neural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and therefore to alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we first identify the self-recovery ability of MLE-trained LM, which casts doubt on the seriousness of exposure bias. We then propose sequence-level (EB-bleu) and word-level (EB-C) metrics to quantify the impact of exposure bias. We conduct experiments for the LSTM/transformer model, in both real and synthetic settings. In addition to the unconditional NLG task, we also include results for a seq2seq machine translation task. Surprisingly, all our measurements indicate that removing the training-generation discrepancy only brings very little performance gain. In our analysis, we hypothesise that although there exists a mismatch between the model distribution and the data distribution, the mismatch is still in the model’s “comfortable zone”, and is not big enough to induce significant performance loss.
Tasks	Machine Translation, Text Generation
Published	2019-05-25
URL	https://arxiv.org/abs/1905.10617v4
PDF	https://arxiv.org/pdf/1905.10617v4.pdf
PWC	https://paperswithcode.com/paper/quantifying-exposure-bias-for-neural-language
Repo
Framework

An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models


Title	An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
Authors	Khe Chai Sim, Petr Zadrazil, Françoise Beaufays
Abstract	Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific user. In this paper, we investigate the idea of securely training personalized end-to-end speech recognition models on mobile devices so that user data and models never leave the device and are never stored on a server. We study how the mobile training environment impacts performance by simulating on-device data consumption. We conduct experiments using data collected from speech impaired users for personalization. Our results show that personalization achieved 63.7% relative word error rate reduction when trained in a server environment and 58.1% in a mobile environment. Moving to on-device personalization resulted in 18.7% performance degradation, in exchange for improved scalability and data privacy. To train the model on device, we split the gradient computation into two and achieved 45% memory reduction at the expense of 42% increase in training time.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2019-09-14
URL	https://arxiv.org/abs/1909.06678v1
PDF	https://arxiv.org/pdf/1909.06678v1.pdf
PWC	https://paperswithcode.com/paper/an-investigation-into-on-device
Repo
Framework

Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition


Title	Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Authors	Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt
Abstract	In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data selection mechanism to obtain the best hypothesized output, further used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data selection process is also applied on these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and Adobe’s proprietary internal dataset show that the proposed approach significantly reduces ASR errors, compared to the baseline model.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2019-08-08
URL	https://arxiv.org/abs/1908.05227v1
PDF	https://arxiv.org/pdf/1908.05227v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-semi-supervised-training-through-a
Repo
Framework

Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement


Title	Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement
Authors	Alzahra Badi, Sangwook Park, David K. Han, Hanseok Ko
Abstract	Performance of learning based Automatic Speech Recognition (ASR) is susceptible to noise, especially when it is introduced in the testing data while not present in the training data. This work focuses on feature enhancement for a noise-robust end-to-end ASR system by introducing a novel variant of denoising autoencoder (DAE). The proposed method uses skip connections in both encoder and decoder sides by passing speech information of the target frame from input to the model. It also uses a new objective function in training the model that uses a correlation distance measure in penalty terms by measuring dependency of the latent target features and the model (latent features and enhanced features obtained from the DAE). Performance of the proposed method was compared against a conventional model and a state-of-the-art model under both seen and unseen noisy environments of 7 different types of background noise with different SNR levels (0, 5, 10 and 20 dB). The proposed method is also tested using linear and non-linear penalty terms, both of which improve the overall average WER under seen and unseen noisy conditions in comparison to the state-of-the-art model.
Tasks	Denoising, End-To-End Speech Recognition, Speech Recognition
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11361v1
PDF	https://arxiv.org/pdf/1907.11361v1.pdf
PWC	https://paperswithcode.com/paper/correlation-distance-skip-connection
Repo
Framework

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition


Title	Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Authors	Yonatan Belinkov, Ahmed Ali, James Glass
Abstract	End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of the deep neural network.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published	2019-07-09
URL	https://arxiv.org/abs/1907.04224v1
PDF	https://arxiv.org/pdf/1907.04224v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-phonetic-and-graphemic
Repo
Framework

On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages


Title	On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Authors	Yi Zhu, Benjamin Heinzerling, Ivan Vulić, Michael Strube, Roi Reichart, Anna Korhonen
Abstract	Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement.
Tasks	Entity Typing, Morphological Tagging, Named Entity Recognition, Representation Learning, Word Embeddings
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12375v1
PDF	https://arxiv.org/pdf/1909.12375v1.pdf
PWC	https://paperswithcode.com/paper/on-the-importance-of-subword-information-for
Repo
Framework

CAiRE: An End-to-End Empathetic Chatbot


Title	CAiRE: An End-to-End Empathetic Chatbot
Authors	Zhaojiang Lin, Peng Xu, Genta Indra Winata, Farhad Bin Siddique, Zihan Liu, Jamin Shin, Pascale Fung
Abstract	In this paper, we present an end-to-end empathetic conversation agent, CAiRE. Our system adapts the TransferTransfo (Wolf et al., 2019) learning approach, which fine-tunes a large-scale pre-trained language model with multi-task objectives: response language modeling, response prediction and dialogue emotion detection. We evaluate our model on the recently proposed empathetic-dialogues dataset (Rashkin et al., 2019); the experimental results show that CAiRE achieves state-of-the-art performance on dialogue emotion detection and empathetic response generation.
Tasks	Chatbot, Language Modelling
Published	2019-07-28
URL	https://arxiv.org/abs/1907.12108v3
PDF	https://arxiv.org/pdf/1907.12108v3.pdf
PWC	https://paperswithcode.com/paper/caire-an-end-to-end-empathetic-chatbot
Repo
Framework

Coresets for Gaussian Mixture Models of Any Shape


Title	Coresets for Gaussian Mixture Models of Any Shape
Authors	Dan Feldman, Zahi Kfir, Xuan Wu
Abstract	An $\varepsilon$-coreset for a given set $D$ of $n$ points is usually a small weighted set, such that querying the coreset provably yields a $(1+\varepsilon)$-factor approximation to the original (full) dataset, for a given family of queries. Using existing techniques, coresets can be maintained for streaming, dynamic (insertion/deletions), and distributed data in parallel, e.g. on a network, GPU or cloud. We suggest the first coresets that approximate the negative log-likelihood for $k$-Gaussians Mixture Models (GMM) of arbitrary shapes (ratio between eigenvalues of their covariance matrices). For example, for any input set $D$ whose coordinates are integers in $[-n^{100},n^{100}]$ and any fixed $k,d\geq 1$, the coreset size is $(\log n)^{O(1)}/\varepsilon^2$, and can be computed in time near-linear in $n$, with high probability. The optimal GMM may then be approximated quickly by learning the small coreset. Previous results [NIPS’11, JMLR’18] suggested such small coresets for the case of semi-spherical unit Gaussians, i.e., where their corresponding eigenvalues are constants between $\frac{1}{2\pi}$ and $2\pi$. Our main technique is a reduction between coresets for $k$-GMMs and projective clustering problems. We implemented our algorithms, and provide open code and experimental results. Since our coresets are generic, with no special dependency on GMMs, we hope that they will be useful for many other functions.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.04895v1
PDF	https://arxiv.org/pdf/1906.04895v1.pdf
PWC	https://paperswithcode.com/paper/coresets-for-gaussian-mixture-models-of-any
Repo
Framework
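
To make "querying a weighted set" concrete, the following minimal sketch evaluates the weighted negative log-likelihood of a 1-D Gaussian mixture on a full point set and on a smaller weighted set. It does not implement the paper's coreset construction: the small set here simply merges duplicate points, so the two costs match exactly rather than within a $(1+\varepsilon)$ factor. The function name `gmm_weighted_nll`, the toy data, and the model parameters are all hypothetical.

```python
import math


def gmm_weighted_nll(points, weights, mix, means, stds):
    """Weighted negative log-likelihood of a 1-D Gaussian mixture.

    Each point contributes -w * log(sum_k mix_k * N(x; mean_k, std_k)).
    """
    total = 0.0
    for x, w in zip(points, weights):
        density = sum(
            p * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for p, m, s in zip(mix, means, stds)
        )
        total += -w * math.log(density)
    return total


# Hypothetical toy data: the point 0.0 appears twice in the full set.
full_points = [0.0, 0.0, 1.0, 5.0]
full_weights = [1.0, 1.0, 1.0, 1.0]

# Smaller weighted set: the duplicate is merged into one point of weight 2.
small_points = [0.0, 1.0, 5.0]
small_weights = [2.0, 1.0, 1.0]

# Hypothetical query: a two-component mixture.
model = dict(mix=[0.5, 0.5], means=[0.0, 5.0], stds=[1.0, 1.0])

nll_full = gmm_weighted_nll(full_points, full_weights, **model)
nll_small = gmm_weighted_nll(small_points, small_weights, **model)
```

A real coreset would be far smaller than the input and would only guarantee an approximate match for every query in the family; the sketch shows the cost function that such a guarantee is stated about.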

Bilevel Optimization, Deep Learning and Fractional Laplacian Regularization with Applications in Tomography


Title	Bilevel Optimization, Deep Learning and Fractional Laplacian Regularization with Applications in Tomography
Authors	Harbir Antil, Zichao Di, Ratna Khatri
Abstract	In this work we consider a generalized bilevel optimization framework for solving inverse problems. We introduce the fractional Laplacian as a regularizer to improve the reconstruction quality, and compare it with the total variation regularization. We emphasize that the key advantage of using the fractional Laplacian as a regularizer is that it leads to a linear operator, as opposed to the total variation regularization which results in a nonlinear degenerate operator. Inspired by residual neural networks, to learn the optimal strength of regularization and the exponent of the fractional Laplacian, we develop a dedicated bilevel optimization neural network with a variable depth for a general regularized inverse problem. We also draw some parallels between an activation function in a neural network and regularization. We illustrate how to incorporate various regularizer choices into our proposed network. As an example, we consider tomographic reconstruction as a model problem and show an improvement in reconstruction quality, especially for limited data, via fractional Laplacian regularization. We successfully learn the regularization strength and the fractional exponent via our proposed bilevel optimization neural network. We observe that the fractional Laplacian regularization outperforms total variation regularization. This is especially encouraging, and important, in the case of limited and noisy data.
Tasks	Bilevel Optimization
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09605v1
PDF	https://arxiv.org/pdf/1907.09605v1.pdf
PWC	https://paperswithcode.com/paper/bilevel-optimization-deep-learning-and
Repo
Framework

A Transformer-based approach to Irony and Sarcasm detection


Title	A Transformer-based approach to Irony and Sarcasm detection
Authors	Rolandos Alexandros Potamias, Georgios Siolas, Andreas-Georgios Stafylopatis
Abstract	Figurative Language (FL) seems ubiquitous in all social-media discussion forums and chats, posing extra challenges to sentiment analysis endeavors. Identification of FL schemas in short texts remains largely an unresolved issue in the broader field of Natural Language Processing (NLP), mainly due to their contradictory and metaphorical meaning content. The main FL expression forms are sarcasm, irony and metaphor. In the present paper we employ advanced Deep Learning (DL) methodologies to tackle the problem of identifying the aforementioned FL forms. Significantly extending our previous work [71], we propose a neural network methodology that builds on a recently proposed pre-trained transformer-based network architecture, which is further enhanced with the design and employment of a recurrent convolutional neural network (RCNN). With this set-up, data preprocessing is kept to a minimum. The performance of the devised hybrid neural architecture is tested on four benchmark datasets, and contrasted with other relevant state-of-the-art methodologies and systems. Results demonstrate that the proposed methodology achieves state-of-the-art performance on all benchmark datasets, outperforming, even by a large margin, all other methodologies and published studies.
Tasks	Sarcasm Detection, Sentiment Analysis
Published	2019-11-23
URL	https://arxiv.org/abs/1911.10401v1
PDF	https://arxiv.org/pdf/1911.10401v1.pdf
PWC	https://paperswithcode.com/paper/a-transformer-based-approach-to-irony-and
Repo
Framework

libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications


Title	libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications
Authors	Jocelyn Sunseri, David Ryan Koes
Abstract	There are many ways to represent a molecule as input to a machine learning model and each is associated with loss and retention of certain kinds of information. In the interest of preserving three-dimensional spatial information, including bond angles and torsions, we have developed libmolgrid, a general-purpose library for representing three-dimensional molecules using multidimensional arrays. This library also provides functionality for composing batches of data suited to machine learning workflows, including data augmentation, class balancing, and example stratification according to a regression variable or data subgroup, and it further supports temporal and spatial recurrences over that data to facilitate work with recurrent neural networks, dynamical data, and size extensive modeling. It was designed for seamless integration with popular deep learning frameworks, including Caffe, PyTorch, and Keras, providing good performance by leveraging graphical processing units (GPUs) for computationally-intensive tasks and efficient memory usage through the use of memory views over preallocated buffers. libmolgrid is a free and open source project that is actively supported, serving the growing need in the molecular modeling community for tools that streamline the process of data ingestion, representation construction, and principled machine learning model development.
Tasks	Data Augmentation
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04822v1
PDF	https://arxiv.org/pdf/1912.04822v1.pdf
PWC	https://paperswithcode.com/paper/libmolgrid-gpu-accelerated-molecular-gridding
Repo
Framework

MetaCI: Meta-Learning for Causal Inference in a Heterogeneous Population


Title	MetaCI: Meta-Learning for Causal Inference in a Heterogeneous Population
Authors	Ankit Sharma, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, Lovekesh Vig, Gautam Shroff
Abstract	Performing inference on data obtained through observational studies is becoming extremely relevant due to the widespread availability of data in fields such as healthcare, education, retail, etc. Furthermore, this data is accrued from multiple homogeneous subgroups of a heterogeneous population, and hence, generalizing the inference mechanism over such data is essential. We propose the MetaCI framework with the goal of answering counterfactual questions in the context of causal inference (CI), where the factual observations are obtained from several homogeneous subgroups. While the CI network is designed to generalize from factual to counterfactual distribution in order to tackle covariate shift, MetaCI employs the meta-learning paradigm to tackle the shift in data distributions between training and test phase due to the presence of heterogeneity in the population, and due to drifts in the target distribution, also known as concept shift. We benchmark the performance of the MetaCI algorithm using the mean absolute percentage error over the average treatment effect as the metric, and demonstrate that meta initialization has significant gains compared to randomly initialized networks, and other methods.
Tasks	Causal Inference, Meta-Learning
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03960v1
PDF	https://arxiv.org/pdf/1912.03960v1.pdf
PWC	https://paperswithcode.com/paper/metaci-meta-learning-for-causal-inference-in
Repo
Framework
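
The metric named in the abstract can be sketched as follows. The abstract does not say how the percentage error is aggregated, so this sketch assumes it is averaged over subgroups; the function names and all numbers are hypothetical and are not results from the paper.

```python
def average_treatment_effect(treated_outcomes, control_outcomes):
    """ATE as the mean difference between paired treated and control outcomes."""
    diffs = [y1 - y0 for y1, y0 in zip(treated_outcomes, control_outcomes)]
    return sum(diffs) / len(diffs)


def mape_over_ate(true_ates, predicted_ates):
    """Mean absolute percentage error between true and predicted ATEs.

    Assumption: one ATE per subgroup, averaged with equal weight.
    """
    errors = [
        abs(pred - true) / abs(true) * 100.0
        for true, pred in zip(true_ates, predicted_ates)
    ]
    return sum(errors) / len(errors)


# Hypothetical outcomes for one subgroup (factual and counterfactual pairs).
ate_example = average_treatment_effect([3.0, 5.0], [1.0, 3.0])

# Hypothetical true and predicted ATEs for two subgroups.
mape = mape_over_ate(true_ates=[2.0, 4.0], predicted_ates=[1.5, 5.0])
```

Because the error is divided by the true effect, the metric is undefined for a subgroup whose true ATE is zero; a real evaluation would need to handle that case.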

Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging


Title	Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging
Authors	Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, Christopher Ré
Abstract	Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring and describing hidden stratification effects, and characterize these effects on multiple medical imaging datasets. We find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we explore the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging.
Tasks
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12475v2
PDF	https://arxiv.org/pdf/1909.12475v2.pdf
PWC	https://paperswithcode.com/paper/hidden-stratification-causes-clinically
Repo
Framework
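
The effect described in the abstract can be illustrated with a toy evaluation that reports accuracy per subset alongside the overall figure. This is only a sketch of the general idea, not one of the measurement techniques assessed in the paper; the subset names, the data, and the definition of the relative gap are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_subset(labels, predictions, subsets):
    """Return overall accuracy and a dict of accuracy per subset name."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for y, p, s in zip(labels, predictions, subsets):
        counts[s] += 1
        hits[s] += int(y == p)
    overall = sum(hits.values()) / sum(counts.values())
    per_subset = {s: hits[s] / counts[s] for s in counts}
    return overall, per_subset


# Hypothetical data: 18 common cases, all classified correctly,
# and 2 rare-subtype cases, one of which is missed.
labels = [1] * 20
predictions = [1] * 18 + [1, 0]
subsets = ["common"] * 18 + ["rare"] * 2

overall, per_subset = accuracy_by_subset(labels, predictions, subsets)

# Assumed definition: gap of the rare subset relative to overall accuracy.
relative_gap = (overall - per_subset["rare"]) / overall
```

The overall accuracy looks high because the rare subset contributes few examples, which is why the subset has to be identified and labelled before the gap becomes visible.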

DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images


Title	DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images
Authors	W. Ronny Huang, Yike Qi, Qianqian Li, Jonathan Degange
Abstract	Paper-intensive industries like insurance, law, and government have long leveraged optical character recognition (OCR) to automatically transcribe hordes of scanned documents into text strings for downstream processing. Even in 2019, there are still many scanned documents and mail that come into businesses in non-digital format. Text to be extracted from real world documents is often nestled inside rich formatting, such as tabular structures or forms with fill-in-the-blank boxes or underlines whose ink often touches or even strikes through the ink of the text itself. Further, the text region could have random ink smudges or spurious strokes. Such ink artifacts can severely interfere with the performance of recognition algorithms or other downstream processing tasks. In this work, we propose DeepErase, a neural-based preprocessor to erase ink artifacts from text images. We devise a method to programmatically assemble real text images and real artifacts into realistic-looking “dirty” text images, and use them to train an artifact segmentation network in a weakly supervised manner, since pixel-level annotations are automatically obtained during the assembly process. In addition to high segmentation accuracy, we show that our cleansed images achieve a significant boost in recognition accuracy by popular OCR software such as Tesseract 4.0. Finally, we test DeepErase on out-of-distribution datasets (NIST SDB) of scanned IRS tax return forms and achieve double-digit improvements in accuracy. All experiments are performed on both printed and handwritten text. Code for all experiments is available at https://github.com/yikeqicn/DeepErase
Tasks	Optical Character Recognition
Published	2019-10-15
URL	https://arxiv.org/abs/1910.07070v3
PDF	https://arxiv.org/pdf/1910.07070v3.pdf
PWC	https://paperswithcode.com/paper/deeperase-weakly-supervised-ink-artifact
Repo
Framework
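
The assembly idea in the abstract, where a clean text image and an artifact are combined so that the pixel-level artifact mask comes for free, can be sketched on tiny grayscale grids. The compositing rule (pixel-wise minimum, so the darker ink wins) and the threshold are assumptions for illustration and may differ from the procedure in the paper and its repository; `assemble_dirty_image` is a hypothetical name.

```python
def assemble_dirty_image(text_img, artifact_img, ink_threshold=128):
    """Overlay an artifact on a text image and return (dirty image, mask).

    Images are lists of rows of grayscale values, 255 = white paper, 0 = ink.
    The mask marks artifact pixels with 1, so no manual annotation is needed.
    """
    dirty, mask = [], []
    for text_row, art_row in zip(text_img, artifact_img):
        # Assumed compositing rule: the darker of the two pixels is kept.
        dirty.append([min(t, a) for t, a in zip(text_row, art_row)])
        # A pixel belongs to the artifact if the artifact layer has ink there.
        mask.append([1 if a < ink_threshold else 0 for a in art_row])
    return dirty, mask


# Hypothetical 2x3 example: a vertical text stroke and a horizontal underline.
text_img = [[255, 0, 255],
            [255, 0, 255]]
artifact_img = [[255, 255, 255],
                [0, 0, 0]]

dirty_img, artifact_mask = assemble_dirty_image(text_img, artifact_img)
```

The pair `(dirty_img, artifact_mask)` is the kind of input and target a segmentation network could be trained on; note that the mask also covers pixels where the artifact overlaps the text stroke.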