Paper Group ANR 192
Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response. Quantifying Exposure Bias for Neural Language Generation. An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models. Exploiting semi-supervised training through a dropout regularization in end-to-end speech recogn …
Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response
Title | Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response |
Authors | Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor, Ambareen Siraj, Mike Rogers |
Abstract | Artificial Intelligence (AI) has become an integral part of modern-day security solutions for its ability to learn very complex functions and handle “Big Data”. However, the lack of explainability and interpretability of successful AI models is a key stumbling block when trust in a model’s prediction is critical. This leads to human intervention, which in turn results in a delayed response or decision. While there have been major advancements in the speed and performance of AI-based intrusion detection systems, the response is still at human speed when it comes to explaining and interpreting a specific prediction or decision. In this work, we infuse popular domain knowledge (i.e., CIA principles) in our model for better explainability and validate the approach on a network intrusion detection test case. Our experimental results suggest that the infusion of domain knowledge provides better explainability as well as a faster decision or response. In addition, the infused domain knowledge generalizes the model to work well with unknown attacks, and opens the path to adapting to a large stream of network traffic from numerous IoT devices. |
Tasks | Intrusion Detection, Network Intrusion Detection |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09853v2 |
PDF | https://arxiv.org/pdf/1911.09853v2.pdf |
PWC | https://paperswithcode.com/paper/domain-knowledge-aided-explainable-artificial |
Repo | |
Framework | |
Quantifying Exposure Bias for Neural Language Generation
Title | Quantifying Exposure Bias for Neural Language Generation |
Authors | Tianxing He, Jingzhao Zhang, Zhiming Zhou, James Glass |
Abstract | The exposure bias problem refers to the training-generation discrepancy, caused by teacher forcing, in maximum likelihood estimation (MLE) training for auto-regressive neural network language models (LM). It has been regarded as a central problem for neural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and therefore to alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we first identify the self-recovery ability of MLE-trained LM, which casts doubt on the seriousness of exposure bias. We then propose sequence-level (EB-bleu) and word-level (EB-C) metrics to quantify the impact of exposure bias. We conduct experiments for the LSTM/transformer model, in both real and synthetic settings. In addition to the unconditional NLG task, we also include results for a seq2seq machine translation task. Surprisingly, all our measurements indicate that removing the training-generation discrepancy only brings very little performance gain. In our analysis, we hypothesise that although there exists a mismatch between the model distribution and the data distribution, the mismatch is still in the model’s “comfortable zone”, and is not big enough to induce significant performance loss. |
Tasks | Machine Translation, Text Generation |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10617v4 |
PDF | https://arxiv.org/pdf/1905.10617v4.pdf |
PWC | https://paperswithcode.com/paper/quantifying-exposure-bias-for-neural-language |
Repo | |
Framework | |
An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
Title | An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models |
Authors | Khe Chai Sim, Petr Zadrazil, Françoise Beaufays |
Abstract | Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific user. In this paper, we investigate the idea of securely training personalized end-to-end speech recognition models on mobile devices so that user data and models never leave the device and are never stored on a server. We study how the mobile training environment impacts performance by simulating on-device data consumption. We conduct experiments using data collected from speech-impaired users for personalization. Our results show that personalization achieved a 63.7% relative word error rate reduction when trained in a server environment and 58.1% in a mobile environment. Moving to on-device personalization resulted in an 18.7% performance degradation, in exchange for improved scalability and data privacy. To train the model on device, we split the gradient computation into two and achieved a 45% memory reduction at the expense of a 42% increase in training time. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06678v1 |
PDF | https://arxiv.org/pdf/1909.06678v1.pdf |
PWC | https://paperswithcode.com/paper/an-investigation-into-on-device |
Repo | |
Framework | |
Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Title | Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition |
Authors | Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt |
Abstract | In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data selection mechanism to obtain the best hypothesized output, which is then used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data selection process is also applied on these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and Adobe’s proprietary internal dataset show that the proposed approach significantly reduces ASR errors, compared to the baseline model. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.05227v1 |
PDF | https://arxiv.org/pdf/1908.05227v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-semi-supervised-training-through-a |
Repo | |
Framework | |
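The dropout-based sampling and data selection described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the per-frame score dictionaries stand in for a real ASR model, dropout is applied directly to scores rather than to hidden activations, and the agreement-based selection rule is one plausible choice, not necessarily the authors' criterion.

```python
import random
from collections import Counter

def decode_with_dropout(frame_scores, drop_prob, rng):
    """One stochastic pass: randomly zero out token scores, then pick the best token per frame."""
    hypothesis = []
    for scores in frame_scores:
        # Toy stand-in for dropout: each score is dropped (set to 0) with probability drop_prob.
        kept = {tok: (0.0 if rng.random() < drop_prob else s) for tok, s in scores.items()}
        # Iterate tokens in sorted order so ties are broken deterministically.
        hypothesis.append(max(sorted(kept), key=kept.get))
    return " ".join(hypothesis)

def sample_hypotheses(frame_scores, n_passes=20, drop_prob=0.3, seed=0):
    """Run several stochastic passes to obtain multiple hypothesized transcripts."""
    rng = random.Random(seed)
    return [decode_with_dropout(frame_scores, drop_prob, rng) for _ in range(n_passes)]

def agreement(hypotheses):
    """Return the most frequent transcript and the fraction of passes that produced it."""
    best, count = Counter(hypotheses).most_common(1)[0]
    return best, count / len(hypotheses)

def select_pseudo_labels(utterances, threshold=0.6, **sampling_kwargs):
    """Keep only utterances whose sampled transcripts agree often enough (hypothetical rule)."""
    selected = {}
    for utt_id, frame_scores in utterances.items():
        best, score = agreement(sample_hypotheses(frame_scores, **sampling_kwargs))
        if score >= threshold:
            selected[utt_id] = best
    return selected
```

In this sketch, utterances whose transcripts vary a lot across dropout passes are treated as uncertain and left out of the retraining set; the threshold controls how strict that filter is.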
Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement
Title | Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement |
Authors | Alzahra Badi, Sangwook Park, David K. Han, Hanseok Ko |
Abstract | The performance of learning-based Automatic Speech Recognition (ASR) is susceptible to noise, especially when the noise is introduced in the testing data but is not present in the training data. This work focuses on feature enhancement for a noise-robust end-to-end ASR system by introducing a novel variant of the denoising autoencoder (DAE). The proposed method uses skip connections on both the encoder and decoder sides by passing speech information of the target frame from the input to the model. It also uses a new objective function in training the model that uses a correlation distance measure in penalty terms by measuring the dependency of the latent target features and the model (latent features and enhanced features obtained from the DAE). Performance of the proposed method was compared against a conventional model and a state-of-the-art model under both seen and unseen noisy environments of 7 different types of background noise with different SNR levels (0, 5, 10 and 20 dB). The proposed method is also tested using linear and non-linear penalty terms, both of which show an improvement in the overall average WER under seen and unseen noisy conditions in comparison to the state-of-the-art model. |
Tasks | Denoising, End-To-End Speech Recognition, Speech Recognition |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11361v1 |
PDF | https://arxiv.org/pdf/1907.11361v1.pdf |
PWC | https://paperswithcode.com/paper/correlation-distance-skip-connection |
Repo | |
Framework | |
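As a rough illustration of a correlation-distance penalty like the one mentioned in the CDSK-DAE abstract above, the sketch below assumes the distance is one minus the Pearson correlation and adds it to a mean-squared-error term. The paper's exact definition, the features it is applied to, and the weighting are not given in this listing, so `penalized_loss` and its `weight` parameter are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined for constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def correlation_distance(x, y):
    """Assumed definition: 0 for perfectly correlated inputs, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)

def penalized_loss(enhanced, target, weight=0.1):
    """Reconstruction error plus a weighted correlation-distance penalty (illustrative only)."""
    mse = sum((e - t) ** 2 for e, t in zip(enhanced, target)) / len(target)
    return mse + weight * correlation_distance(enhanced, target)
```

The penalty rewards enhanced features that move together with the clean targets even when their scale differs, which a plain squared error does not capture on its own.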
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Title | Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition |
Authors | Yonatan Belinkov, Ahmed Ali, James Glass |
Abstract | End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of the deep neural network. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04224v1 |
PDF | https://arxiv.org/pdf/1907.04224v1.pdf |
PWC | https://paperswithcode.com/paper/analyzing-phonetic-and-graphemic |
Repo | |
Framework | |
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Title | On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages |
Authors | Yi Zhu, Benjamin Heinzerling, Ivan Vulić, Michael Strube, Roi Reichart, Anna Korhonen |
Abstract | Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement. |
Tasks | Entity Typing, Morphological Tagging, Named Entity Recognition, Representation Learning, Word Embeddings |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12375v1 |
PDF | https://arxiv.org/pdf/1909.12375v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-importance-of-subword-information-for |
Repo | |
Framework | |
CAiRE: An End-to-End Empathetic Chatbot
Title | CAiRE: An End-to-End Empathetic Chatbot |
Authors | Zhaojiang Lin, Peng Xu, Genta Indra Winata, Farhad Bin Siddique, Zihan Liu, Jamin Shin, Pascale Fung |
Abstract | In this paper, we present an end-to-end empathetic conversation agent, CAiRE. Our system adapts the TransferTransfo (Wolf et al., 2019) learning approach, which fine-tunes a large-scale pre-trained language model with multi-task objectives: response language modeling, response prediction and dialogue emotion detection. We evaluate our model on the recently proposed empathetic-dialogues dataset (Rashkin et al., 2019); the experimental results show that CAiRE achieves state-of-the-art performance on dialogue emotion detection and empathetic response generation. |
Tasks | Chatbot, Language Modelling |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12108v3 |
PDF | https://arxiv.org/pdf/1907.12108v3.pdf |
PWC | https://paperswithcode.com/paper/caire-an-end-to-end-empathetic-chatbot |
Repo | |
Framework | |
Coresets for Gaussian Mixture Models of Any Shape
Title | Coresets for Gaussian Mixture Models of Any Shape |
Authors | Dan Feldman, Zahi Kfir, Xuan Wu |
Abstract | An $\varepsilon$-coreset for a given set $D$ of $n$ points is usually a small weighted set, such that querying the coreset provably yields a $(1+\varepsilon)$-factor approximation to the original (full) dataset, for a given family of queries. Using existing techniques, coresets can be maintained for streaming, dynamic (insertion/deletions), and distributed data in parallel, e.g. on a network, GPU or cloud. We suggest the first coresets that approximate the negative log-likelihood for $k$-Gaussians Mixture Models (GMM) of arbitrary shapes (ratio between eigenvalues of their covariance matrices). For example, for any input set $D$ whose coordinates are integers in $[-n^{100},n^{100}]$ and any fixed $k,d\geq 1$, the coreset size is $(\log n)^{O(1)}/\varepsilon^2$, and can be computed in time near-linear in $n$, with high probability. The optimal GMM may then be approximated quickly by learning the small coreset. Previous results [NIPS’11, JMLR’18] suggested such small coresets for the case of semi-spherical unit Gaussians, i.e., where their corresponding eigenvalues are constants between $\frac{1}{2\pi}$ and $2\pi$. Our main technique is a reduction between coresets for $k$-GMMs and projective clustering problems. We implemented our algorithms, and provide open code, and experimental results. Since our coresets are generic, with no special dependency on GMMs, we hope that they will be useful for many other functions. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04895v1 |
PDF | https://arxiv.org/pdf/1906.04895v1.pdf |
PWC | https://paperswithcode.com/paper/coresets-for-gaussian-mixture-models-of-any |
Repo | |
Framework | |
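The coreset guarantee stated informally in the abstract above can be written out as follows. This is a sketch of one common formalization, not a quotation from the paper: the weight function $w$, the query family $Q$, and the per-point loss $f$ are notation introduced here, and the two-sided bound is an assumed convention for the "$(1+\varepsilon)$-factor approximation".

```latex
% C \subseteq D is a weighted subset with weights w : C \to [0,\infty).
% For a GMM query q, f(p, q) would be the negative log-likelihood of point p under q.
\[
  \operatorname{cost}(D, q) = \sum_{p \in D} f(p, q),
  \qquad
  \operatorname{cost}(C, q) = \sum_{p \in C} w(p)\, f(p, q),
\]
\[
  (1-\varepsilon)\,\operatorname{cost}(D, q)
  \;\le\; \operatorname{cost}(C, q)
  \;\le\; (1+\varepsilon)\,\operatorname{cost}(D, q)
  \qquad \text{for every query } q \in Q .
\]
```

Because the bound holds for every query at once, a model fitted on the small set $C$ is also approximately optimal for the full set $D$.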
Bilevel Optimization, Deep Learning and Fractional Laplacian Regularization with Applications in Tomography
Title | Bilevel Optimization, Deep Learning and Fractional Laplacian Regularization with Applications in Tomography |
Authors | Harbir Antil, Zichao Di, Ratna Khatri |
Abstract | In this work we consider a generalized bilevel optimization framework for solving inverse problems. We introduce the fractional Laplacian as a regularizer to improve the reconstruction quality, and compare it with the total variation regularization. We emphasize that the key advantage of using the fractional Laplacian as a regularizer is that it leads to a linear operator, as opposed to the total variation regularization, which results in a nonlinear degenerate operator. Inspired by residual neural networks, to learn the optimal strength of regularization and the exponent of the fractional Laplacian, we develop a dedicated bilevel optimization neural network with a variable depth for a general regularized inverse problem. We also draw some parallels between an activation function in a neural network and regularization. We illustrate how to incorporate various regularizer choices into our proposed network. As an example, we consider tomographic reconstruction as a model problem and show an improvement in reconstruction quality, especially for limited data, via fractional Laplacian regularization. We successfully learn the regularization strength and the fractional exponent via our proposed bilevel optimization neural network. We observe that the fractional Laplacian regularization outperforms total variation regularization. This is especially encouraging, and important, in the case of limited and noisy data. |
Tasks | Bilevel Optimization |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09605v1 |
PDF | https://arxiv.org/pdf/1907.09605v1.pdf |
PWC | https://paperswithcode.com/paper/bilevel-optimization-deep-learning-and |
Repo | |
Framework | |
A Transformer-based approach to Irony and Sarcasm detection
Title | A Transformer-based approach to Irony and Sarcasm detection |
Authors | Rolandos Alexandros Potamias, Georgios Siolas, Andreas-Georgios Stafylopatis |
Abstract | Figurative Language (FL) seems ubiquitous in all social-media discussion forums and chats, posing extra challenges to sentiment analysis endeavors. Identification of FL schemas in short texts remains largely an unresolved issue in the broader field of Natural Language Processing (NLP), mainly due to their contradictory and metaphorical meaning content. The main FL expression forms are sarcasm, irony and metaphor. In the present paper we employ advanced Deep Learning (DL) methodologies to tackle the problem of identifying the aforementioned FL forms. Significantly extending our previous work [71], we propose a neural network methodology that builds on a recently proposed pre-trained transformer-based network architecture, which is further enhanced with the design and employment of a recurrent convolutional neural network (RCNN). With this set-up, data preprocessing is kept to a minimum. The performance of the devised hybrid neural architecture is tested on four benchmark datasets, and contrasted with other relevant state-of-the-art methodologies and systems. Results demonstrate that the proposed methodology achieves state-of-the-art performance on all benchmark datasets, outperforming, even by a large margin, all other methodologies and published studies. |
Tasks | Sarcasm Detection, Sentiment Analysis |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10401v1 |
PDF | https://arxiv.org/pdf/1911.10401v1.pdf |
PWC | https://paperswithcode.com/paper/a-transformer-based-approach-to-irony-and |
Repo | |
Framework | |
libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications
Title | libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications |
Authors | Jocelyn Sunseri, David Ryan Koes |
Abstract | There are many ways to represent a molecule as input to a machine learning model and each is associated with loss and retention of certain kinds of information. In the interest of preserving three-dimensional spatial information, including bond angles and torsions, we have developed libmolgrid, a general-purpose library for representing three-dimensional molecules using multidimensional arrays. This library also provides functionality for composing batches of data suited to machine learning workflows, including data augmentation, class balancing, and example stratification according to a regression variable or data subgroup, and it further supports temporal and spatial recurrences over that data to facilitate work with recurrent neural networks, dynamical data, and size extensive modeling. It was designed for seamless integration with popular deep learning frameworks, including Caffe, PyTorch, and Keras, providing good performance by leveraging graphical processing units (GPUs) for computationally-intensive tasks and efficient memory usage through the use of memory views over preallocated buffers. libmolgrid is a free and open source project that is actively supported, serving the growing need in the molecular modeling community for tools that streamline the process of data ingestion, representation construction, and principled machine learning model development. |
Tasks | Data Augmentation |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04822v1 |
PDF | https://arxiv.org/pdf/1912.04822v1.pdf |
PWC | https://paperswithcode.com/paper/libmolgrid-gpu-accelerated-molecular-gridding |
Repo | |
Framework | |
MetaCI: Meta-Learning for Causal Inference in a Heterogeneous Population
Title | MetaCI: Meta-Learning for Causal Inference in a Heterogeneous Population |
Authors | Ankit Sharma, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, Lovekesh Vig, Gautam Shroff |
Abstract | Performing inference on data obtained through observational studies is becoming extremely relevant due to the widespread availability of data in fields such as healthcare, education, retail, etc. Furthermore, this data is accrued from multiple homogeneous subgroups of a heterogeneous population, and hence, generalizing the inference mechanism over such data is essential. We propose the MetaCI framework with the goal of answering counterfactual questions in the context of causal inference (CI), where the factual observations are obtained from several homogeneous subgroups. While the CI network is designed to generalize from factual to counterfactual distribution in order to tackle covariate shift, MetaCI employs the meta-learning paradigm to tackle the shift in data distributions between training and test phase due to the presence of heterogeneity in the population, and due to drifts in the target distribution, also known as concept shift. We benchmark the performance of the MetaCI algorithm using the mean absolute percentage error over the average treatment effect as the metric, and demonstrate that meta initialization has significant gains compared to randomly initialized networks, and other methods. |
Tasks | Causal Inference, Meta-Learning |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03960v1 |
PDF | https://arxiv.org/pdf/1912.03960v1.pdf |
PWC | https://paperswithcode.com/paper/metaci-meta-learning-for-causal-inference-in |
Repo | |
Framework | |
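The evaluation metric named in the MetaCI abstract above, mean absolute percentage error over the average treatment effect (ATE), can be sketched as below. The function names are hypothetical, and averaging the percentage error over a list of subgroups or tasks is an assumption about how the metric is aggregated.

```python
def average_treatment_effect(treated_outcomes, control_outcomes):
    """Difference between the mean outcome under treatment and under control."""
    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    mean_control = sum(control_outcomes) / len(control_outcomes)
    return mean_treated - mean_control

def ate_mape(true_ates, predicted_ates):
    """Mean absolute percentage error between true and predicted ATEs, in percent.

    Assumes every true ATE is non-zero, since the error is taken relative to it.
    """
    errors = [abs(pred - true) / abs(true) for true, pred in zip(true_ates, predicted_ates)]
    return 100.0 * sum(errors) / len(errors)
```

For example, if two subgroups have true ATEs of 2.0 and 4.0 and the model predicts 1.0 and 5.0, the per-subgroup errors are 50% and 25%, giving an average of 37.5%.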
Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging
Title | Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging |
Authors | Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, Christopher Ré |
Abstract | Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring and describing hidden stratification effects, and characterize these effects on multiple medical imaging datasets. We find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we explore the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging. |
Tasks | |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12475v2 |
PDF | https://arxiv.org/pdf/1909.12475v2.pdf |
PWC | https://paperswithcode.com/paper/hidden-stratification-causes-clinically |
Repo | |
Framework | |
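The effect described in the hidden stratification abstract above, where a good overall score hides poor performance on a subset, can be made concrete with a small sketch. The record format, the subset names, and the use of recall as the metric are assumptions for illustration; the paper evaluates several measurement techniques that are not reproduced here.

```python
from collections import defaultdict

def overall_recall(records):
    """Recall over all positive cases; each record is (subset_name, true_label, predicted_label)."""
    positives = [(y_true, y_pred) for _, y_true, y_pred in records if y_true == 1]
    return sum(1 for _, y_pred in positives if y_pred == 1) / len(positives)

def recall_by_subset(records):
    """Recall computed separately for each named subset of the positive cases."""
    hits, totals = defaultdict(int), defaultdict(int)
    for subset, y_true, y_pred in records:
        if y_true == 1:
            totals[subset] += 1
            hits[subset] += int(y_pred == 1)
    return {subset: hits[subset] / totals[subset] for subset in totals}

def relative_gap(overall, subset_value):
    """Relative drop of a subset's score compared with the overall score."""
    return (overall - subset_value) / overall

# Hypothetical data: a common subtype that is always detected and a rare one that is often missed.
records = [("common", 1, 1)] * 8 + [("rare", 1, 1), ("rare", 1, 0)]
per_subset = recall_by_subset(records)
gap = relative_gap(overall_recall(records), per_subset["rare"])
```

With this toy data the overall recall is 0.9 while the rare subtype sits at 0.5, a gap that is invisible unless the subset is labelled and evaluated on its own.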
DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images
Title | DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images |
Authors | W. Ronny Huang, Yike Qi, Qianqian Li, Jonathan Degange |
Abstract | Paper-intensive industries like insurance, law, and government have long leveraged optical character recognition (OCR) to automatically transcribe hordes of scanned documents into text strings for downstream processing. Even in 2019, there are still many scanned documents and mail that come into businesses in non-digital format. Text to be extracted from real world documents is often nestled inside rich formatting, such as tabular structures or forms with fill-in-the-blank boxes or underlines whose ink often touches or even strikes through the ink of the text itself. Further, the text region could have random ink smudges or spurious strokes. Such ink artifacts can severely interfere with the performance of recognition algorithms or other downstream processing tasks. In this work, we propose DeepErase, a neural-based preprocessor to erase ink artifacts from text images. We devise a method to programmatically assemble real text images and real artifacts into realistic-looking “dirty” text images, and use them to train an artifact segmentation network in a weakly supervised manner, since pixel-level annotations are automatically obtained during the assembly process. In addition to high segmentation accuracy, we show that our cleansed images achieve a significant boost in recognition accuracy by popular OCR software such as Tesseract 4.0. Finally, we test DeepErase on out-of-distribution datasets (NIST SDB) of scanned IRS tax return forms and achieve double-digit improvements in accuracy. All experiments are performed on both printed and handwritten text. Code for all experiments is available at https://github.com/yikeqicn/DeepErase |
Tasks | Optical Character Recognition |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.07070v3 |
PDF | https://arxiv.org/pdf/1910.07070v3.pdf |
PWC | https://paperswithcode.com/paper/deeperase-weakly-supervised-ink-artifact |
Repo | |
Framework | |
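The idea in the DeepErase abstract above, assembling clean text images and artifacts so that pixel-level labels come for free, can be sketched as follows. This is not the code from the authors' repository: the grayscale convention (0 is black ink, 255 is white paper), the darkest-pixel compositing rule, and the `ink_threshold` parameter are all assumptions for illustration.

```python
def assemble_dirty_image(text_img, artifact_img, ink_threshold=128):
    """Overlay an artifact on a text image and return (dirty_image, artifact_mask).

    Both inputs are equal-sized grayscale images given as lists of rows.
    The mask marks artifact ink with 1, so no manual annotation is needed.
    """
    same_height = len(text_img) == len(artifact_img)
    if not same_height or any(len(t) != len(a) for t, a in zip(text_img, artifact_img)):
        raise ValueError("text and artifact images must have the same shape")
    dirty, mask = [], []
    for text_row, artifact_row in zip(text_img, artifact_img):
        # Ink is dark, so the darker of the two pixels wins when strokes overlap.
        dirty.append([min(t, a) for t, a in zip(text_row, artifact_row)])
        # A pixel belongs to the artifact wherever the artifact layer itself is dark.
        mask.append([1 if a < ink_threshold else 0 for a in artifact_row])
    return dirty, mask

# Toy 2x3 example: one text pixel on the first row and an underline artifact on the second.
text = [[255, 0, 255], [255, 255, 255]]
underline = [[255, 255, 255], [0, 0, 0]]
dirty_image, artifact_mask = assemble_dirty_image(text, underline)
```

A segmentation network would then be trained on `dirty_image` as input with `artifact_mask` as the target, which is what makes the supervision weak: the labels are a by-product of the assembly step.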