April 3, 2020

# Paper Group ANR 28

Fast Estimation of Information Theoretic Learning Descriptors using Explicit Inner Product Spaces. Accelerated learning algorithms of general fuzzy min-max neural network using a branch-and-bound-based hyperbox selection rule. Deliberation Model Based Two-Pass End-to-End Speech Recognition. A.I. based Embedded Speech to Text Using Deepspeech. Trans …

#### Fast Estimation of Information Theoretic Learning Descriptors using Explicit Inner Product Spaces

Title Fast Estimation of Information Theoretic Learning Descriptors using Explicit Inner Product Spaces
Authors Kan Li, Jose C. Principe
Abstract Kernel methods form a theoretically-grounded, powerful and versatile framework to solve nonlinear problems in signal processing and machine learning. The standard approach relies on the *kernel trick* to perform pairwise evaluations of a kernel function, leading to scalability issues for large datasets due to its linear and superlinear growth with respect to the training data. Recently, we proposed *no-trick* (NT) kernel adaptive filtering (KAF) that leverages explicit feature space mappings using data-independent basis with constant complexity. The inner product defined by the feature mapping corresponds to a positive-definite finite-rank kernel that induces a finite-dimensional reproducing kernel Hilbert space (RKHS). Information theoretic learning (ITL) is a framework where information theory descriptors based on non-parametric estimator of Renyi entropy replace conventional second-order statistics for the design of adaptive systems. An RKHS for ITL defined on a space of probability density functions simplifies statistical inference for supervised or unsupervised learning. ITL criteria take into account the higher-order statistical behavior of the systems and signals as desired. However, this comes at a cost of increased computational complexity. In this paper, we extend the NT kernel concept to ITL for improved information extraction from the signal without compromising scalability. Specifically, we focus on a family of fast, scalable, and accurate estimators for ITL using explicit inner product space (EIPS) kernels. We demonstrate the superior performance of EIPS-ITL estimators and combined NT-KAF using EIPS-ITL cost functions through experiments.
Published 2020-01-01
URL https://arxiv.org/abs/2001.00265v1
PDF https://arxiv.org/pdf/2001.00265v1.pdf
PWC https://paperswithcode.com/paper/fast-estimation-of-information-theoretic
Repo
Framework
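
The contrast this abstract draws between pairwise kernel evaluations and an explicit feature map can be illustrated with a small sketch. It assumes an unnormalized Gaussian kernel and uses random Fourier features as one possible data-independent basis; that choice, and every function name below, is an assumption made for illustration and is not taken from the paper. The quantity estimated is the information potential (the mean of all pairwise kernel values), whose negative logarithm gives Renyi's quadratic entropy up to an additive constant.

```python
import numpy as np

def exact_information_potential(x, sigma):
    """Pairwise (kernel-trick) estimate: mean of exp(-||xi - xj||^2 / (2 sigma^2)), O(N^2)."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return float(np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2))))

def rff_features(x, sigma, num_features, rng):
    """Explicit map z(x) such that z(x) . z(y) approximates the Gaussian kernel."""
    dim = x.shape[1]
    w = rng.normal(scale=1.0 / sigma, size=(dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def explicit_information_potential(x, sigma, num_features, rng):
    """Squared norm of the mean feature vector: O(N * D), no pairwise evaluations."""
    mean_feature = rff_features(x, sigma, num_features, rng).mean(axis=0)
    return float(mean_feature @ mean_feature)

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))  # hypothetical data
v_exact = exact_information_potential(samples, sigma=1.0)
v_fast = explicit_information_potential(samples, sigma=1.0, num_features=4000, rng=rng)

# Renyi quadratic entropy estimates, up to the kernel's normalizing constant.
entropy_exact = -np.log(v_exact)
entropy_fast = -np.log(v_fast)
```

The explicit version works because the average of all pairwise inner products equals the squared norm of the average feature vector, so the cost grows linearly with the number of samples rather than quadratically.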

#### Accelerated learning algorithms of general fuzzy min-max neural network using a branch-and-bound-based hyperbox selection rule

Title Accelerated learning algorithms of general fuzzy min-max neural network using a branch-and-bound-based hyperbox selection rule
Authors Thanh Tung Khuat, Bogdan Gabrys
Abstract This paper proposes a method to accelerate the training process of the general fuzzy min-max neural network. The purpose is to reduce the number of unsuitable hyperboxes selected as potential candidates for the expansion step of existing hyperboxes to cover a new input pattern in the online learning algorithms, or as candidates for the hyperbox aggregation process in the agglomerative learning algorithms. Our proposed approach is based on mathematical formulas that form a branch-and-bound solution aiming to remove the hyperboxes which are certain not to satisfy the expansion or aggregation conditions, in turn decreasing the training time of the learning algorithms. The efficiency of the proposed method is assessed over a number of widely used data sets. The experimental results indicate a significant decrease in training time with the proposed approach for both online and agglomerative learning algorithms. Notably, the training time of the online learning algorithms is reduced by a factor of 1.2 to 12 when using the proposed method, while the agglomerative learning algorithms are accelerated by a factor of 7 to 37 on average.
Published 2020-03-25
URL https://arxiv.org/abs/2003.11333v1
PDF https://arxiv.org/pdf/2003.11333v1.pdf
PWC https://paperswithcode.com/paper/accelerated-learning-algorithms-of-general
Repo
Framework

#### Deliberation Model Based Two-Pass End-to-End Speech Recognition

Title Deliberation Model Based Two-Pass End-to-End Speech Recognition
Authors Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar
Abstract End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR) and perform competitively relative to conventional models. To further improve the quality, a two-pass model has been proposed to rescore streamed hypotheses using the non-streaming Listen, Attend and Spell (LAS) model while maintaining a reasonable latency. The model attends to acoustics to rescore hypotheses, as opposed to a class of neural correction models that use only first-pass text hypotheses. In this work, we propose to attend to both acoustics and first-pass hypotheses using a deliberation network. A bidirectional encoder is used to extract context information from first-pass hypotheses. The proposed deliberation model achieves 12% relative WER reduction compared to LAS rescoring in Google Voice Search (VS) tasks, and 23% reduction on a proper noun test set. Compared to a large conventional model, our best model performs 21% relatively better for VS. In terms of computational complexity, the deliberation decoder has a larger size than the LAS decoder, and hence requires more computations in second-pass decoding.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2020-03-17
URL https://arxiv.org/abs/2003.07962v1
PDF https://arxiv.org/pdf/2003.07962v1.pdf
PWC https://paperswithcode.com/paper/deliberation-model-based-two-pass-end-to-end
Repo
Framework
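
The relative WER figures quoted in this abstract follow the usual definition of a relative reduction. As a minimal sketch, with made-up WER values that are not results from the paper:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer (both in the same units)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers: a baseline WER of 5.0% lowered to 4.4%
# corresponds to a relative reduction of about 12%.
reduction = relative_wer_reduction(5.0, 4.4)
```

A relative reduction is measured against the baseline error, so the same absolute gain counts for more when the baseline WER is already low.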

#### A.I. based Embedded Speech to Text Using Deepspeech

Title A.I. based Embedded Speech to Text Using Deepspeech
Authors Muhammad Hafidh Firmansyah, Anand Paul, Deblina Bhattacharya, Gul Malik Urfa
Abstract Deepspeech is very useful for developing IoT devices that need voice recognition. One such voice recognition system is Deepspeech from Mozilla. Deepspeech is an open-source voice recognition system that uses a neural network to convert a speech spectrogram into a text transcript. This paper shows the implementation process of speech recognition on a low-end computational device. English-language speech recognition, for which many datasets exist, is a good starting point. The models used are the pre-trained models provided with each version of Deepspeech, without any change to the released models. Using a Raspberry Pi as an end-to-end speech recognition device has further benefits: the user can change and modify the speech recognition, and Deepspeech can run as a standalone device without needing a continuous internet connection to process speech. This paper also shows that TensorFlow Lite can make a significant difference to Deepspeech inference compared with non-Lite TensorFlow. The experiments use Deepspeech versions 0.1.0, 0.1.1, and 0.6.0, and show some improvement with version 0.6.0, which is faster at processing speech-to-text on the older Raspberry Pi 3 B+ hardware.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2020-02-25
URL https://arxiv.org/abs/2002.12830v1
PDF https://arxiv.org/pdf/2002.12830v1.pdf
PWC https://paperswithcode.com/paper/ai-based-embedded-speech-to-text-using
Repo
Framework

#### Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

Title Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Authors Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar
Abstract In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. The model is trained with the RNN-T loss, which is well suited to streaming decoding. We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy. We also show that the full attention version of our model beats the state-of-the-art accuracy on the LibriSpeech benchmarks. Our results also show that we can bridge the gap between full attention and limited attention versions of our model by attending to a limited number of future frames.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2020-02-07
URL https://arxiv.org/abs/2002.02562v2
PDF https://arxiv.org/pdf/2002.02562v2.pdf
PWC https://paperswithcode.com/paper/transformer-transducer-a-streamable-speech
Repo
Framework
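
The idea of limiting the left context and allowing only a few future frames can be sketched as an attention mask. This is a single-head NumPy illustration with hypothetical function names and sizes; it is not the model's actual implementation and leaves out multi-head projections, layer stacking, and the RNN-T loss.

```python
import numpy as np

def limited_context_mask(num_frames, left_context, right_context):
    """mask[t, s] is True when frame t may attend to frame s."""
    positions = np.arange(num_frames)
    offset = positions[None, :] - positions[:, None]  # s - t
    return (offset >= -left_context) & (offset <= right_context)

def masked_self_attention(queries, keys, values, mask):
    """Scaled dot-product attention where masked-out positions get zero weight."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Each frame sees at most 2 past frames and 1 future frame (hypothetical sizes).
mask = limited_context_mask(num_frames=5, left_context=2, right_context=1)
```

With `right_context=0` the mask is strictly causal; a small positive value trades a bounded amount of look-ahead latency for extra context, which is the trade-off the abstract describes.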

#### Scaling Up Online Speech Recognition Using ConvNets

Title Scaling Up Online Speech Recognition Using ConvNets
Authors Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
Abstract We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. The system has almost three times the throughput of a well tuned hybrid ASR baseline while also having lower latency and a better word error rate. Also important to the efficiency of the recognizer is our highly optimized beam search decoder. To show the impact of our design choices, we analyze throughput, latency, accuracy, and discuss how these metrics can be tuned based on the user requirements.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2020-01-27
URL https://arxiv.org/abs/2001.09727v1
PDF https://arxiv.org/pdf/2001.09727v1.pdf
PWC https://paperswithcode.com/paper/scaling-up-online-speech-recognition-using
Repo
Framework

#### Semi-supervised ASR by End-to-end Self-training

Title Semi-supervised ASR by End-to-end Self-training
Authors Yang Chen, Weiran Wang, Chao Wang
Abstract While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from the data sparsity issue. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update. Our method retains the simplicity of end-to-end ASR systems, and can be seen as performing alternating optimization over a well-defined learning objective. We also perform empirical investigations of our method, regarding the effect of data augmentation, decoding beamsize for pseudo-label generation, and freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the WSJ corpus, our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
Tasks Data Augmentation, End-To-End Speech Recognition, Speech Recognition
Published 2020-01-24
URL https://arxiv.org/abs/2001.09128v1
PDF https://arxiv.org/pdf/2001.09128v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-asr-by-end-to-end-self
Repo
Framework

#### Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Title Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
Authors Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
Abstract The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.
Tasks Document Layout Analysis, Semantic Segmentation
Published 2020-02-14
URL https://arxiv.org/abs/2002.06144v1
PDF https://arxiv.org/pdf/2002.06144v1.pdf
PWC https://paperswithcode.com/paper/combining-visual-and-textual-features-for-1
Repo
Framework

#### A Block Coordinate Descent-based Projected Gradient Algorithm for Orthogonal Non-negative Matrix Factorization

Title A Block Coordinate Descent-based Projected Gradient Algorithm for Orthogonal Non-negative Matrix Factorization
Abstract This article utilizes the projected gradient method (PG) for a non-negative matrix factorization problem (NMF), where one or both matrix factors must have orthonormal columns or rows. We penalise the orthonormality constraints and apply the PG method via a block coordinate descent approach. This means that at a certain time one matrix factor is fixed and the other is updated by moving along the steepest descent direction computed from the penalised objective function and projecting onto the space of non-negative matrices. Our method is tested on two sets of synthetic data for various values of penalty parameters. The performance is compared to the well-known multiplicative update (MU) method from Ding (2006), and with a modified global convergent variant of the MU algorithm recently proposed by Mirzal (2014). We provide extensive numerical results coupled with appropriate visualizations, which demonstrate that our method is very competitive and usually outperforms the other two methods.
Published 2020-03-23
URL https://arxiv.org/abs/2003.10269v1
PDF https://arxiv.org/pdf/2003.10269v1.pdf
PWC https://paperswithcode.com/paper/a-block-coordinate-descent-based-projected
Repo
Framework
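
The alternating scheme described in this abstract can be sketched as follows. The sketch assumes the penalised objective 0.5·‖X − UV‖² + (α/2)·‖VVᵀ − I‖² with orthonormality imposed on the rows of V only, and a simple backtracking rule for the step size; the exact objective, step-size rule, and all names are assumptions for illustration and may differ from the paper.

```python
import numpy as np

def objective(x, u, v, alpha):
    """Reconstruction error plus penalty on the non-orthonormality of V's rows."""
    k = v.shape[0]
    fit = 0.5 * np.linalg.norm(x - u @ v) ** 2
    penalty = 0.5 * alpha * np.linalg.norm(v @ v.T - np.eye(k)) ** 2
    return fit + penalty

def grad_u(x, u, v):
    return (u @ v - x) @ v.T

def grad_v(x, u, v, alpha):
    k = v.shape[0]
    return u.T @ (u @ v - x) + 2.0 * alpha * (v @ v.T - np.eye(k)) @ v

def projected_step(f, grad, block, step=1.0, shrink=0.5, sigma=1e-4, max_tries=30):
    """One projected gradient step on a single block, with backtracking."""
    f_old = f(block)
    for _ in range(max_tries):
        candidate = np.maximum(block - step * grad, 0.0)  # project onto non-negatives
        if f(candidate) <= f_old + sigma * np.sum(grad * (candidate - block)):
            return candidate
        step *= shrink
    return block  # no acceptable step found; keep the block unchanged

def onmf_pg_bcd(x, rank, alpha=1.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], rank))
    v = rng.random((rank, x.shape[1]))
    history = [objective(x, u, v, alpha)]
    for _ in range(iters):
        # Fix V and update U, then fix U and update V.
        u = projected_step(lambda b: objective(x, b, v, alpha), grad_u(x, u, v), u)
        v = projected_step(lambda b: objective(x, u, b, alpha), grad_v(x, u, v, alpha), v)
        history.append(objective(x, u, v, alpha))
    return u, v, history

x = np.random.default_rng(1).random((20, 30))  # hypothetical non-negative data
u, v, history = onmf_pg_bcd(x, rank=4, alpha=1.0, iters=100)
```

Because each block update is accepted only when it satisfies a sufficient-decrease test, the recorded objective values never increase, and the projection keeps both factors non-negative throughout.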

#### Security & Privacy in IoT Using Machine Learning & Blockchain: Threats & Countermeasures

Title Security & Privacy in IoT Using Machine Learning & Blockchain: Threats & Countermeasures
Abstract Security and privacy have become significant concerns due to the involvement of the Internet of Things (IoT) devices in different applications. Cyber threats are growing at an explosive pace making the existing security and privacy measures inadequate. Hence, everyone on the Internet is a product for hackers. Consequently, Machine Learning (ML) algorithms are used to produce accurate outputs from large complex databases. The generated outputs can be used to predict and detect vulnerabilities in IoT-based systems. Furthermore, Blockchain (BC) technique is becoming popular in modern IoT applications to deal with security and privacy issues. Several studies have been conducted on either ML algorithms or BC techniques. However, these studies target either security or privacy issues using ML algorithms or BC techniques, thus posing a need for a combined survey on efforts made in recent years addressing both security and privacy issues using ML algorithms and BC techniques. In this paper, we have provided a summary of research efforts made in the past few years addressing security and privacy issues using ML algorithms and BC techniques in the IoT domain. First, we discuss and categorize various security and privacy threats in the IoT domain that were reported in the past few years. Secondly, we classify the literature on security and privacy efforts based on ML algorithms and BC techniques in the IoT domain. In the end, various challenges and future research directions using ML algorithms and BC techniques to address security and privacy issues in the IoT domain are identified and discussed.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03488v1
PDF https://arxiv.org/pdf/2002.03488v1.pdf
PWC https://paperswithcode.com/paper/security-privacy-in-iot-using-machine
Repo
Framework

#### A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models

Title A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models
Authors Ziyu Wang, Shuyu Cheng, Yueru Li, Jun Zhu, Bo Zhang
Abstract Score matching provides an effective approach to learning flexible unnormalized models, but its scalability is limited by the need to evaluate a second-order derivative. In this paper, we present a scalable approximation to a general family of learning objectives including score matching, by observing a new connection between these objectives and Wasserstein gradient flows. We present applications with promise in learning neural density estimators on manifolds, and training implicit variational and Wasserstein auto-encoders with a manifold-valued prior.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07501v1
PDF https://arxiv.org/pdf/2002.07501v1.pdf
PWC https://paperswithcode.com/paper/a-wasserstein-minimum-velocity-approach-to
Repo
Framework

#### Federated Clustering via Matrix Factorization Models: From Model Averaging to Gradient Sharing

Title Federated Clustering via Matrix Factorization Models: From Model Averaging to Gradient Sharing
Authors Shuai Wang, Tsung-Hui Chang
Abstract Recently, federated learning (FL) has drawn significant attention due to its capability of training a model over the network without knowing the client’s private raw data. In this paper, we study the unsupervised clustering problem under the FL setting. By adopting a generalized matrix factorization model for clustering, we propose two novel (first-order) federated clustering (FedC) algorithms based on principles of model averaging and gradient sharing, respectively, and present their theoretical convergence conditions. We show that both algorithms have a $\mathcal{O}(1/T)$ convergence rate, where $T$ is the total number of gradient evaluations per client, and the communication cost can be effectively reduced by controlling the local epoch length and allowing partial client participation within each communication round. Numerical experiments show that the FedC algorithm based on gradient sharing outperforms that based on model averaging, especially in scenarios with non-i.i.d. data, and can perform comparably to or better than the centralized clustering algorithms.
Published 2020-02-12
URL https://arxiv.org/abs/2002.04930v1
PDF https://arxiv.org/pdf/2002.04930v1.pdf
PWC https://paperswithcode.com/paper/federated-clustering-via-matrix-factorization
Repo
Framework

#### Hadath: From Social Media Mapping to Multi-Resolution Event-Enriched Maps

Title Hadath: From Social Media Mapping to Multi-Resolution Event-Enriched Maps
Authors Faizan Ur Rehman, Imad Afyouni, Ahmed Lbath, Saleh Basalamah
Abstract Publicly available data is increasing rapidly, and will continue to grow with the advancement of technologies in sensors, smartphones and the Internet of Things. Data from multiple sources can improve coverage and provide more relevant knowledge about surrounding events and points of interest. The strength of one source of data can compensate for the shortcomings of another source by providing supplementary information. Maps are also becoming more popular by the day, and people use them to accomplish their daily tasks smoothly and efficiently. Starting from paper maps a hundred years ago, multiple types of maps are now available, showing points of interest, real-time traffic updates, or micro-blogs from social media. In this paper, we introduce Hadath, a system that displays multi-resolution live events of interest from a variety of available data sources. The system has been designed to handle multiple types of input by encapsulating incoming unstructured data into generic data packets. The system extracts local events of interest from generic data packets and identifies their spatio-temporal scope to display such events on a map, so that as a user changes the zoom level, only events of appropriate scope are displayed. This allows us to show live events in correspondence to the scale of view - when viewing at a city scale, we see events of higher significance, while zooming in to a neighbourhood, events of a more local interest are highlighted. The final output creates a unique and dynamic map browsing experience. Finally, to validate our proposed system, we conducted experiments on social media data.
Published 2020-03-05
URL https://arxiv.org/abs/2003.02615v1
PDF https://arxiv.org/pdf/2003.02615v1.pdf
Repo
Framework

#### Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

Title Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)
Authors Caio Corro
Abstract We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures. In particular, we show that we can build variants of our parser with smaller search spaces and time complexities ranging from $\mathcal O(n^6)$ down to $\mathcal O(n^3)$. The cubic time variant covers 98% of constituents observed in linguistic treebanks while having the same complexity as continuous constituency parsers. We evaluate our approach on German and English treebanks (Negra, Tiger and Discontinuous PTB) and report state-of-the-art results in the fully supervised setting. We also experiment with pre-trained word embeddings and BERT-based neural networks.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13785v1
PDF https://arxiv.org/pdf/2003.13785v1.pdf
PWC https://paperswithcode.com/paper/span-based-discontinuous-constituency-parsing
Repo
Framework

#### DeepFocus: a Few-Shot Microscope Slide Auto-Focus using a Sample Invariant CNN-based Sharpness Function

Title DeepFocus: a Few-Shot Microscope Slide Auto-Focus using a Sample Invariant CNN-based Sharpness Function