October 16, 2019

3351 words 16 mins read

Paper Group ANR 1076

Target Transfer Q-Learning and Its Convergence Analysis. A Novel Neural Sequence Model with Multiple Attentions for Word Sense Disambiguation. Emulating dynamic non-linear simulators using Gaussian processes. Tube-CNN: Modeling temporal evolution of appearance for object detection in video. Dialog-context aware end-to-end speech recognition. Speake …

Target Transfer Q-Learning and Its Convergence Analysis

Title Target Transfer Q-Learning and Its Convergence Analysis
Authors Yue Wang, Qi Meng, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
Abstract Q-learning is one of the most popular methods in reinforcement learning (RL). Transfer learning aims to use knowledge learned on source tasks to reduce the sample complexity of new tasks. Because data collection in RL is time- and cost-intensive and Q-learning converges slowly compared to supervised learning, many kinds of transfer RL algorithms have been designed; most of them, however, are heuristic, with no theoretical guarantee on the convergence rate. It is therefore important to understand clearly when and how transfer learning helps an RL method, and to provide theoretical guarantees for the improvement in sample complexity. In this paper, we propose to transfer the Q-function learned in the source task to serve as the target of Q-learning in the new task whenever certain safe conditions are satisfied; we call this method target transfer Q-learning. The safe conditions are necessary to avoid harming the new task and thus to ensure the convergence of the algorithm. We study the convergence rate of target transfer Q-learning and prove that if the two tasks have similar MDPs, the optimal Q-functions in the source and new RL tasks are similar, which means the error of the transferred target Q-function in the new MDP is small. The convergence-rate analysis also shows that target transfer Q-learning converges faster than Q-learning whenever the error of the transferred target Q-function is smaller than that of the current Q-function in the new task. Based on these theoretical results, we design the safe condition to require that the Bellman error of the transferred target Q-function be smaller than that of the current Q-function. Our experiments are consistent with our theoretical findings and verify the effectiveness of the proposed method. (A tabular sketch of the guarded update rule follows this entry.)
Tasks Q-Learning, Transfer Learning
Published 2018-09-21
URL http://arxiv.org/abs/1809.08923v1
PDF http://arxiv.org/pdf/1809.08923v1.pdf
PWC https://paperswithcode.com/paper/target-transfer-q-learning-and-its
Repo
Framework
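
The guarded update described in the abstract can be written down compactly. Below is a minimal tabular sketch under several assumptions: discrete states and actions, a hypothetical gym-style `env` with `reset`/`step`, and a one-sample Bellman residual standing in for the paper's safe condition; the paper's exact condition and schedules may differ.

```python
import numpy as np

def target_transfer_q_learning(env, q_source, n_states, n_actions,
                               episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning where the TD target may come from a transferred
    source Q-function, guarded by a Bellman-error safe condition (sketch)."""
    q = np.zeros((n_states, n_actions))

    def bellman_error(q_fn, s, a, r, s_next):
        # One-sample Bellman residual for the observed transition.
        return abs(r + gamma * q_fn[s_next].max() - q_fn[s, a])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.action_space.sample() if np.random.rand() < eps else q[s].argmax()
            s_next, r, done, _ = env.step(a)
            # Safe condition (simplified): use the transferred target only
            # while its Bellman error beats that of the current Q-function.
            if bellman_error(q_source, s, a, r, s_next) < bellman_error(q, s, a, r, s_next):
                target = r + gamma * q_source[s_next].max()
            else:
                target = r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q
```

The only change from vanilla Q-learning is the `if` branch: while the transferred `q_source` explains the observed transitions better (smaller Bellman residual), its value forms the TD target.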

A Novel Neural Sequence Model with Multiple Attentions for Word Sense Disambiguation

Title A Novel Neural Sequence Model with Multiple Attentions for Word Sense Disambiguation
Authors Mahtab Ahmed, Muhammad Rifayat Samee, Robert E. Mercer
Abstract Word sense disambiguation (WSD) is a well-researched problem in computational linguistics, and different lines of work have approached it in different ways. In terms of accuracy, some of the state-of-the-art results have been achieved by supervised models, but these often fall behind flexible knowledge-based solutions that use engineered features as well as human annotators to disambiguate every target word. This work focuses on bridging this gap using neural sequence models that incorporate the well-known attention mechanism. The main idea of our work is to combine multiple attentions over different linguistic features through learned weights, and to provide a unified framework for doing so. This weighted attention allows the model to disambiguate the sense of an ambiguous word by attending over a suitable portion of a sentence. Our extensive experiments show that multiple attentions enable a more versatile encoder-decoder model, leading to state-of-the-art results. (A toy numpy sketch of the weighted attention combination follows this entry.)
Tasks Word Sense Disambiguation
Published 2018-09-04
URL http://arxiv.org/abs/1809.01074v1
PDF http://arxiv.org/pdf/1809.01074v1.pdf
PWC https://paperswithcode.com/paper/a-novel-neural-sequence-model-with-multiple
Repo
Framework
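
The "multiple attentions combined through weights" idea can be illustrated in isolation. A minimal numpy sketch, assuming one dot-product attention per linguistic feature stream (e.g. lemma or POS embeddings) and a learned mixing vector `w`; the paper's actual scoring functions and feature set may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_attention_context(query, feature_keys, values, w):
    """Combine attentions over several linguistic feature streams.

    query:        (d,)   decoder state for the ambiguous word
    feature_keys: list of (T, d) arrays, one per linguistic feature
    values:       (T, d) encoder states to attend over
    w:            (n_features,) learned mixing weights
    """
    w = softmax(np.asarray(w))                 # normalize mixing weights
    attn = np.zeros(values.shape[0])
    for w_i, keys in zip(w, feature_keys):
        scores = keys @ query                  # dot-product attention scores
        attn += w_i * softmax(scores)          # weighted sum of distributions
    return attn @ values                       # context vector, shape (d,)
```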

Emulating dynamic non-linear simulators using Gaussian processes

Title Emulating dynamic non-linear simulators using Gaussian processes
Authors Hossein Mohammadi, Peter Challenor, Marc Goodfellow
Abstract The dynamic emulation of non-linear deterministic computer codes whose output is a (possibly multivariate) time series is examined. Such computer models simulate the evolution of some real-world phenomenon over time, for example the climate or the functioning of the human brain. The models we are interested in are highly non-linear and exhibit tipping points, bifurcations, and chaotic behaviour. However, each simulation run may be too time-consuming to permit analyses that require many runs, such as quantifying the variation in model output with respect to changes in the inputs. Therefore, Gaussian process emulators are used to approximate the output of the code. To do this, the flow map of the system under study is emulated over a short time period and then applied iteratively to predict the whole time series. A number of ways are proposed to account for the uncertainty of the emulator inputs, once initial conditions are fixed, and for the correlation between them through the time series. The methodology is illustrated with two examples: the highly non-linear dynamical systems described by the Lorenz and Van der Pol equations. In both cases the predictive performance is relatively high, and the measure of uncertainty provided by the method reflects the extent of predictability in each system. (A runnable sketch of the flow-map emulation follows this entry.)
Tasks Gaussian Processes, Time Series
Published 2018-02-21
URL http://arxiv.org/abs/1802.07575v4
PDF http://arxiv.org/pdf/1802.07575v4.pdf
PWC https://paperswithcode.com/paper/emulating-dynamic-non-linear-simulators-using
Repo
Framework
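
The short-time flow-map emulation described in the abstract is straightforward to sketch. The following assumes the Lorenz system from the paper's examples, an independent Gaussian process per output dimension (via scikit-learn), and mean-only iteration; the paper additionally propagates input uncertainty through the time series, which this sketch omits.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lorenz(t, x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return [sigma * (x[1] - x[0]),
            x[0] * (rho - x[2]) - x[1],
            x[0] * x[1] - beta * x[2]]

dt = 0.05
rng = np.random.default_rng(0)

# Training data: pairs (x_t, x_{t+dt}) from short simulator runs, i.e.
# evaluations of the flow map over one short time step.
X = rng.uniform([-20, -20, 5], [20, 20, 45], size=(300, 3))
Y = np.array([solve_ivp(lorenz, (0, dt), x, t_eval=[dt]).y[:, -1] for x in X])

# One GP per output dimension emulates the flow map.
gps = [GaussianProcessRegressor(kernel=RBF(length_scale=10.0),
                                normalize_y=True).fit(X, Y[:, d])
       for d in range(3)]

# Iterate the emulated flow map to predict a whole trajectory.
x = np.array([1.0, 1.0, 20.0])
traj = [x]
for _ in range(100):
    x = np.array([gp.predict(x.reshape(1, -1))[0] for gp in gps])
    traj.append(x)
```

Each GP learns only the one-step map; iterating its prediction reconstructs a full trajectory from short-run training data, which is what makes emulation of expensive simulators feasible.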

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

Title Tube-CNN: Modeling temporal evolution of appearance for object detection in video
Authors Tuan-Hung Vu, Anton Osokin, Ivan Laptev
Abstract Object detection in video is crucial for many applications. Compared to images, video provides additional cues that can help disambiguate the detection problem. Our goal in this paper is to learn discriminative models of the temporal evolution of object appearance and to use such models for object detection. To model temporal evolution, we introduce space-time tubes corresponding to temporal sequences of bounding boxes. We propose two CNN architectures for generating and classifying tubes, respectively. Our tube proposal network (TPN) first generates a large number of spatio-temporal tube proposals that maximize object recall. The Tube-CNN then implements a tube-level object detector in the video. Our method improves the state of the art on two large-scale datasets for object detection in video: HollywoodHeads and ImageNet VID. Tube models show particular advantages in difficult, dynamic scenes. (A small sketch of a tube overlap measure follows this entry.)
Tasks Object Detection
Published 2018-12-06
URL http://arxiv.org/abs/1812.02619v1
PDF http://arxiv.org/pdf/1812.02619v1.pdf
PWC https://paperswithcode.com/paper/tube-cnn-modeling-temporal-evolution-of
Repo
Framework
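
The tube abstraction, a temporal sequence of bounding boxes, is easy to make concrete. Below is a small sketch of one plausible tube-overlap measure (mean per-frame IoU); the paper's exact matching criterion for proposals may differ.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """A tube is a (T, 4) array: one bounding box per frame over T frames."""
    return np.mean([box_iou(a, b) for a, b in zip(tube_a, tube_b)])
```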

Dialog-context aware end-to-end speech recognition

Title Dialog-context aware end-to-end speech recognition
Authors Suyoun Kim, Florian Metze
Abstract Existing speech recognition systems are typically built at the sentence level, although it is known that dialog context, e.g. higher-level knowledge that spans sentences or speakers, can help the processing of long conversations. Recent progress in end-to-end speech recognition promises to integrate all available information (e.g. acoustic and language resources) into a single, jointly optimized model. It seems natural that such dialog-context information should also be integrated into end-to-end models to further improve recognition accuracy. In this work, we present a dialog-context aware speech recognition model that explicitly uses context information beyond the sentence level, in an end-to-end fashion. Our dialog-context model captures a history of sentence-level context so that the whole system can be trained with dialog-context information in an end-to-end manner. We evaluate the proposed approach on the Switchboard conversational speech corpus and show that our system outperforms a comparable sentence-level end-to-end speech recognition system. (A sketch of context-conditioned decoding follows this entry.)
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2018-08-07
URL http://arxiv.org/abs/1808.02171v1
PDF http://arxiv.org/pdf/1808.02171v1.pdf
PWC https://paperswithcode.com/paper/dialog-context-aware-end-to-end-speech
Repo
Framework
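
One way to realize "context beyond the sentence level" is to carry a dialog-context vector across utterances and condition each decoder step on it. A minimal PyTorch sketch under that assumption; layer sizes and the precise integration point are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ContextAwareDecoder(nn.Module):
    """Decoder cell that conditions each step on a dialog-context vector
    summarizing previous utterances (sketch only)."""

    def __init__(self, vocab, emb=256, hid=512, ctx=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cell = nn.LSTMCell(emb + ctx, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens, context, state):
        # context: (B, ctx) vector carried over from the previous utterance
        x = torch.cat([self.embed(tokens), context], dim=-1)
        h, c = self.cell(x, state)
        return self.out(h), (h, c)
```

After decoding one utterance, its final hidden state (or an embedding of the decoded hypothesis) can serve as `context` for the next utterance, so training on whole dialogs propagates context end to end.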

Speaker Fluency Level Classification Using Machine Learning Techniques

Title Speaker Fluency Level Classification Using Machine Learning Techniques
Authors Alan Preciado-Grijalva, Ramon F. Brena
Abstract Level assessment for foreign-language students is necessary for placing them in the right level group; moreover, interviewing students is a very time-consuming task, so we propose to automate the evaluation of speaker fluency level using machine-learning techniques. This work presents an audio-processing system capable of classifying the fluency level of non-native English speakers using five different machine-learning models. As a first step, we built our own dataset, which consists of labeled audio conversations in English between people in different fluency domains/classes (low, intermediate, high). We segment the conversations into 5 s non-overlapping audio clips and perform feature extraction on them. We start by extracting Mel-frequency cepstral coefficients from the audio, finding that 20 coefficients is an appropriate quantity for our data. We then extract zero-crossing rate, root-mean-square energy, and spectral flux features, showing that they improve model performance. Out of a total of 1424 audio segments, with 70% training data and 30% test data, one of our trained models (a support vector machine) achieved a classification accuracy of 94.39%, while the other four models exceeded an 89% classification accuracy threshold. (A sketch of the feature-extraction pipeline follows this entry.)
Tasks
Published 2018-08-31
URL http://arxiv.org/abs/1808.10556v1
PDF http://arxiv.org/pdf/1808.10556v1.pdf
PWC https://paperswithcode.com/paper/speaker-fluency-level-classification-using
Repo
Framework
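
The feature pipeline in the abstract maps directly onto standard audio tooling. A sketch using librosa and scikit-learn; the variables `clips` and `labels` are hypothetical stand-ins for the authors' own dataset of 5 s segments.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def clip_features(y, sr):
    """Features per 5 s clip: 20 MFCCs + zero-crossing rate + RMS energy
    + spectral flux, each averaged over time (sketch of the pipeline)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    rms = librosa.feature.rms(y=y).mean()
    S = np.abs(librosa.stft(y))
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0)).mean()
    return np.concatenate([mfcc, [zcr, rms, flux]])

# `clips` is a hypothetical list of (waveform, sample_rate) pairs and
# `labels` the matching {low, intermediate, high} classes (not shown here).
X = np.array([clip_features(y, sr) for y, sr in clips])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3)
clf = SVC().fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```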

Only Bayes should learn a manifold (on the estimation of differential geometric structure from data)

Title Only Bayes should learn a manifold (on the estimation of differential geometric structure from data)
Authors Søren Hauberg
Abstract We investigate learning of the differential geometric structure of a data manifold embedded in a high-dimensional Euclidean space. We first analyze kernel-based algorithms and show that under the usual regularizations, non-probabilistic methods cannot recover the differential geometric structure, but instead find mostly linear manifolds or spaces equipped with teleports. To properly learn the differential geometric structure, non-probabilistic methods must apply regularizations that enforce large gradients, which go against common wisdom. We repeat the analysis for probabilistic methods and find that under reasonable priors, the geometric structure can be recovered. Fully exploiting the recovered structure, however, requires the development of stochastic extensions to classic Riemannian geometry. We take early steps in that regard. Finally, we partly extend the analysis to modern models based on neural networks, thereby highlighting geometric and probabilistic shortcomings of current deep generative models. (A brief formal sketch of the pullback metric underlying this analysis follows this entry.)
Tasks
Published 2018-06-13
URL https://arxiv.org/abs/1806.04994v3
PDF https://arxiv.org/pdf/1806.04994v3.pdf
PWC https://paperswithcode.com/paper/only-bayes-should-learn-a-manifold-on-the
Repo
Framework
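
For readers who want the recovered object stated concretely: the differential geometric structure of an embedded manifold is carried by the pullback metric, and for a stochastic map the expected metric acquires a covariance term. A brief sketch in standard notation (the paper's statements are more general):

```latex
% Pullback metric of a deterministic immersion f : Z -> R^D
M(z) \;=\; J_f(z)^{\top} J_f(z),
\qquad
J_f(z) \;=\; \frac{\partial f(z)}{\partial z} \in \mathbb{R}^{D \times d}.

% For a stochastic map (e.g. f drawn from a GP), writing j_d(z) for the
% d-th row of the Jacobian, linearity of expectation gives
\mathbb{E}\!\left[ M(z) \right]
  \;=\; \mathbb{E}[J_f(z)]^{\top}\,\mathbb{E}[J_f(z)]
  \;+\; \sum_{d=1}^{D} \operatorname{Cov}\!\left( j_d(z) \right).
```

The covariance term is exactly what a point estimate discards; informally, this is why the abstract argues that fully exploiting the recovered structure requires stochastic extensions of classic Riemannian geometry.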

Closed-form detector for solid sub-pixel targets in multivariate t-distributed background clutter

Title Closed-form detector for solid sub-pixel targets in multivariate t-distributed background clutter
Authors James Theiler, Beate Zimmer, Amanda Ziemann
Abstract The generalized likelihood ratio test (GLRT) is used to derive a detector for solid sub-pixel targets in hyperspectral imagery. A closed-form solution is obtained that optimizes the replacement target model when the background is a fat-tailed, elliptically contoured multivariate t-distribution. This generalizes GLRT-based detectors that have previously been derived for the replacement target model with a Gaussian background, and for the additive target model with an elliptically contoured background. Experiments with simulated hyperspectral data illustrate the performance of this detector in various parameter regimes. (For orientation, a sketch of the classic additive-model matched filter that this detector generalizes follows this entry.)
Tasks
Published 2018-04-05
URL http://arxiv.org/abs/1804.02062v2
PDF http://arxiv.org/pdf/1804.02062v2.pdf
PWC https://paperswithcode.com/paper/closed-form-detector-for-solid-sub-pixel
Repo
Framework
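
The paper's closed-form replacement-model GLRT is derived in the text itself; for orientation only, here is the classic additive-model matched filter that detectors of this family generalize. This is standard background material, not the paper's new detector.

```python
import numpy as np

def matched_filter(x, target, mu, cov):
    """Classic additive-model matched filter score for a pixel spectrum x
    against target spectrum `target`, given background mean `mu` and
    covariance `cov`. (The paper instead derives a replacement-model GLRT
    for elliptically contoured t-distributed clutter.)"""
    p = np.linalg.solve(cov, target - mu)            # whitened target direction
    return (x - mu) @ p / np.sqrt((target - mu) @ p)  # normalized MF statistic
```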

Back-Translation-Style Data Augmentation for End-to-End ASR

Title Back-Translation-Style Data Augmentation for End-to-End ASR
Authors Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, Kazuya Takeda
Abstract In this paper we propose a novel data-augmentation method for attention-based end-to-end automatic speech recognition (E2E-ASR) that exploits a large amount of text not paired with speech signals. Inspired by the back-translation technique from machine translation, we build a neural text-to-encoder model that predicts, from a sequence of characters, the sequence of hidden states extracted by a pre-trained E2E-ASR encoder. Using hidden states rather than acoustic features as the target enables faster attention learning and reduces computational cost, thanks to sub-sampling in the E2E-ASR encoder; it also avoids modeling speaker dependencies, unlike acoustic features. After training, the text-to-encoder model generates hidden states from a large amount of unpaired text, and the E2E-ASR decoder is then retrained using the generated hidden states as additional training data. Experimental evaluation on the LibriSpeech dataset demonstrates that the proposed method improves ASR performance and reduces the number of unknown words without the need for paired data. (A model skeleton follows this entry.)
Tasks Data Augmentation, End-To-End Speech Recognition, Machine Translation, Speech Recognition
Published 2018-07-28
URL http://arxiv.org/abs/1807.10893v1
PDF http://arxiv.org/pdf/1807.10893v1.pdf
PWC https://paperswithcode.com/paper/back-translation-style-data-augmentation-for
Repo
Framework
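
The text-to-encoder idea can be skeletonized in a few lines. A PyTorch sketch under simplifying assumptions: layer sizes are invented, and the length mismatch introduced by encoder sub-sampling (characters vs. sub-sampled hidden states) is ignored here, though a real implementation must handle it.

```python
import torch
import torch.nn as nn

class TextToEncoder(nn.Module):
    """Predict E2E-ASR encoder hidden states from character sequences
    (sketch of the back-translation-style idea)."""

    def __init__(self, n_chars, enc_dim=512, emb=128, hid=512):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hid, enc_dim)

    def forward(self, chars):                    # chars: (B, L)
        h, _ = self.rnn(self.embed(chars))
        return self.proj(h)                      # (B, L, enc_dim)

# Training regresses onto hidden states produced by the frozen, pre-trained
# ASR encoder on paired data; afterwards the model generates pseudo hidden
# states from unpaired text to retrain the decoder.
model, mse = TextToEncoder(n_chars=50), nn.MSELoss()
# loss = mse(model(char_batch), frozen_encoder_states)  # hypothetical tensors
```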

MOANOFS: Multi-Objective Automated Negotiation based Online Feature Selection System for Big Data Classification

Title MOANOFS: Multi-Objective Automated Negotiation based Online Feature Selection System for Big Data Classification
Authors Fatma BenSaid, Adel M. Alimi
Abstract Feature selection (FS) plays an important role in learning and classification tasks; its object is to select the relevant, non-redundant features. Given the huge number of features in real-world applications, FS methods based on batch learning cannot handle big-data problems, especially when data arrive sequentially. In this paper, we propose an online feature selection system that addresses this problem. More specifically, we treat online supervised feature selection for binary classification as a decision-making problem. This view leads to a hybridization of two domains: feature selection using online learning (OFS) and automated negotiation (AN). The proposed system, called MOANOFS (Multi-Objective Automated Negotiation based Online Feature Selection), uses two levels of decision. In the first level, from n learners (OFS methods), we decide which k are the most trustful (those with the highest confidence or trust values). These elected k learners participate in the second level, where our proposed Multilateral Automated Negotiation based OFS (MANOFS) method decides which features are relevant and constitute the final solution. We show that MOANOFS is applicable to different domains and achieves high accuracy on several real-world applications. Index Terms: feature selection, online learning, multi-objective automated negotiation, trust, classification, big data. (A simplified two-level sketch follows this entry.)
Tasks Decision Making, Feature Selection
Published 2018-10-11
URL https://arxiv.org/abs/1810.04903v2
PDF https://arxiv.org/pdf/1810.04903v2.pdf
PWC https://paperswithcode.com/paper/moanofs-multi-objective-automated-negotiation
Repo
Framework
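
The two-level decision structure can be sketched even without the negotiation machinery. In the simplification below, level 1 keeps the k most trustful learners and level 2 replaces the paper's multilateral automated negotiation with plain averaging of feature-relevance scores; `feature_relevance` is a hypothetical attribute of each learner.

```python
import numpy as np

def select_trustful(learners, trust, k):
    """Level 1: keep the k learners with the highest trust/confidence."""
    idx = np.argsort(trust)[-k:]
    return [learners[i] for i in idx]

def aggregate_relevance(selected, top_m):
    """Level 2 (simplified): average the selected learners' per-feature
    relevance scores in place of the paper's automated negotiation."""
    scores = np.mean([l.feature_relevance for l in selected], axis=0)
    return np.argsort(scores)[-top_m:]          # indices of retained features
```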

Vehicle Instance Segmentation from Aerial Image and Video Using a Multi-Task Learning Residual Fully Convolutional Network

Title Vehicle Instance Segmentation from Aerial Image and Video Using a Multi-Task Learning Residual Fully Convolutional Network
Authors Lichao Mou, Xiao Xiang Zhu
Abstract Object detection and semantic segmentation are two main themes in object retrieval from high-resolution remote sensing images, and both have recently achieved remarkable performance by surfing the wave of deep learning and, more notably, convolutional neural networks (CNNs). In this paper, we are interested in a novel, more challenging problem of vehicle instance segmentation, which entails identifying, at the pixel level, where the vehicles appear, as well as associating each pixel with a physical instance of a vehicle. In contrast, vehicle detection and semantic segmentation each concern only one of the two. We propose to tackle this problem with a semantic boundary-aware multi-task learning network. More specifically, we utilize the philosophy of residual learning (ResNet) to construct a fully convolutional network capable of harnessing multi-level contextual feature representations learned by different residual blocks. We theoretically analyze and discuss why residual networks can produce better probability maps for pixel-wise segmentation tasks. Based on this network architecture, we then propose a unified multi-task learning network that simultaneously learns two complementary tasks: segmenting vehicle regions and detecting semantic boundaries. The latter subproblem helps differentiate closely spaced vehicles, which are usually not correctly separated into instances. Currently, the datasets with pixel-wise annotation suitable for vehicle extraction are the ISPRS dataset and the IEEE GRSS DFC2015 dataset over Zeebrugge, both of which target semantic segmentation. Therefore, we built a new, more challenging dataset for vehicle instance segmentation, called the Busy Parking Lot UAV Video dataset, and we make it available at http://www.sipeo.bgu.tum.de/download so that it can be used to benchmark future vehicle instance segmentation algorithms. (A sketch of the joint multi-task objective follows this entry.)
Tasks Instance Segmentation, Multi-Task Learning, Object Detection, Semantic Segmentation
Published 2018-05-26
URL http://arxiv.org/abs/1805.10485v1
PDF http://arxiv.org/pdf/1805.10485v1.pdf
PWC https://paperswithcode.com/paper/vehicle-instance-segmentation-from-aerial
Repo
Framework
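
The multi-task objective, pixel-wise segmentation plus semantic-boundary detection, reduces to a weighted sum of two standard losses. A PyTorch sketch; the balancing weight `lam` is an assumption, not the paper's value.

```python
import torch
import torch.nn as nn

def multitask_loss(seg_logits, seg_target, bnd_logits, bnd_target, lam=0.5):
    """Joint objective: segmentation + semantic-boundary detection (sketch).

    seg_logits: (B, C, H, W) class scores   seg_target: (B, H, W) labels
    bnd_logits: (B, 1, H, W) boundary map   bnd_target: (B, 1, H, W) in {0,1}
    """
    seg_loss = nn.functional.cross_entropy(seg_logits, seg_target)
    bnd_loss = nn.functional.binary_cross_entropy_with_logits(
        bnd_logits, bnd_target.float())
    return seg_loss + lam * bnd_loss
```

The boundary branch acts as an auxiliary supervision signal: sharper boundary predictions help split touching vehicles into separate instances.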

Adaptive Federated Learning in Resource Constrained Edge Computing Systems

Title Adaptive Federated Learning in Resource Constrained Edge Computing Systems
Authors Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan
Abstract Emerging technologies and applications including the Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local updates and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimental results show that our proposed approach performs close to the optimum with various machine learning models and different data distributions. (A simplified local-update/aggregation loop follows this entry.)
Tasks
Published 2018-04-14
URL http://arxiv.org/abs/1804.05271v3
PDF http://arxiv.org/pdf/1804.05271v3.pdf
PWC https://paperswithcode.com/paper/adaptive-federated-learning-in-resource
Repo
Framework
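
The local-update/global-aggregation trade-off at the heart of the paper is easy to sketch: each node below runs `tau` local gradient steps between aggregations. The paper's contribution is a control algorithm that adapts this trade-off to a resource budget, whereas here `tau` is fixed and the node interface (`num_samples`, `gradient`) is hypothetical.

```python
import numpy as np

def federated_gd(nodes, w0, tau, rounds, lr=0.1):
    """Distributed gradient descent with periodic global aggregation
    (sketch of the local-update vs. aggregation trade-off)."""
    w = w0.copy()
    sizes = np.array([n.num_samples for n in nodes], dtype=float)
    for _ in range(rounds):
        local_models = []
        for node in nodes:
            w_i = w.copy()
            for _ in range(tau):                 # tau local updates per round
                w_i -= lr * node.gradient(w_i)
            local_models.append(w_i)
        # Global aggregation: data-size weighted average of local models.
        w = np.average(local_models, axis=0, weights=sizes)
    return w
```

Larger `tau` saves communication but lets local models drift apart on non-IID data; smaller `tau` does the opposite, which is exactly the trade-off the paper's controller optimizes under a resource budget.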

Acoustic-to-Word Recognition with Sequence-to-Sequence Models

Title Acoustic-to-Word Recognition with Sequence-to-Sequence Models
Authors Shruti Palaskar, Florian Metze
Abstract Acoustic-to-word recognition provides a straightforward solution to end-to-end speech recognition, without the need for external decoding, language-model re-scoring, or a lexicon. While character-based models offer a natural solution to the out-of-vocabulary problem, word models can be simpler to decode and may also be able to directly recognize semantically meaningful units. We present effective methods to train sequence-to-sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in word error rate on the Switchboard corpus compared to prior work. In addition to these promising results, word-based models are more interpretable than character models, which have to be composed into words using a separate decoding step. We analyze the encoder hidden states and the attention behavior, and show that location-aware attention naturally represents words as a single speech-word-vector, despite their spanning multiple frames in the input. We finally show that the acoustic-to-word model also learns to segment speech into words, with a mean standard deviation of 3 frames relative to human-annotated forced alignments for the Switchboard corpus. (A sketch of location-aware attention follows this entry.)
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2018-07-23
URL http://arxiv.org/abs/1807.09597v2
PDF http://arxiv.org/pdf/1807.09597v2.pdf
PWC https://paperswithcode.com/paper/acoustic-to-word-recognition-with-sequence-to
Repo
Framework
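
The location-aware attention analyzed in the abstract convolves the previous alignment and feeds it into the scoring function, which is what allows alignments to behave like word segmentations. A PyTorch sketch in the style of Chorowski et al.; the dimensions are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LocationAwareAttention(nn.Module):
    """Location-aware attention: the previous alignment is convolved and
    added into the additive scoring function (sketch)."""

    def __init__(self, enc=512, dec=512, attn=256, k=64, ks=21):
        super().__init__()
        self.W = nn.Linear(dec, attn, bias=False)   # decoder-state projection
        self.V = nn.Linear(enc, attn, bias=False)   # encoder-state projection
        self.U = nn.Linear(k, attn, bias=False)     # location-feature projection
        self.conv = nn.Conv1d(1, k, ks, padding=ks // 2)
        self.v = nn.Linear(attn, 1, bias=False)

    def forward(self, dec_state, enc_states, prev_align):
        # enc_states: (B, T, enc); prev_align: (B, T) previous attention
        f = self.conv(prev_align.unsqueeze(1)).transpose(1, 2)   # (B, T, k)
        e = self.v(torch.tanh(self.W(dec_state).unsqueeze(1)
                              + self.V(enc_states) + self.U(f))).squeeze(-1)
        a = torch.softmax(e, dim=-1)                             # (B, T)
        return (a.unsqueeze(-1) * enc_states).sum(1), a          # context, align
```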

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

Title Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units
Authors Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin
Abstract In this paper, we present an end-to-end automatic speech recognition system that successfully employs subword units in a hybrid CTC-attention based system. The subword units are obtained with the byte-pair encoding (BPE) compression algorithm. Compared to using words as modeling units, using characters or subword units does not suffer from the out-of-vocabulary (OOV) problem; furthermore, subword units offer the ability to model longer context than characters. We evaluate the different systems on the LibriSpeech 1000h dataset. The subword-based hybrid CTC-attention system obtains a 6.8% word error rate (WER) on the test_clean subset without any dictionary or external language model, a significant improvement (a 12.8% relative WER reduction) over the character-based hybrid CTC-attention system. (A sketch of the hybrid objective follows this entry.)
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2018-07-13
URL http://arxiv.org/abs/1807.04978v2
PDF http://arxiv.org/pdf/1807.04978v2.pdf
PWC https://paperswithcode.com/paper/hybrid-ctc-attention-based-end-to-end-speech
Repo
Framework
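
The hybrid CTC-attention objective is a weighted sum of the two branch losses over subword targets. A PyTorch sketch; the interpolation weight `lam = 0.3` is a common choice in hybrid systems, not necessarily the paper's setting, and the two target tensors use the padding conventions of their respective losses.

```python
import torch
import torch.nn as nn

def hybrid_loss(ctc_log_probs, att_logits, ctc_targets, att_targets,
                input_lens, target_lens, lam=0.3):
    """Hybrid objective L = lam * L_CTC + (1 - lam) * L_attention (sketch).

    ctc_log_probs: (T, B, V) log-softmax outputs of the CTC branch
    att_logits:    (B, L, V) per-step scores from the attention decoder
    ctc_targets:   (B, S) subword (BPE) ids; entries past target_lens ignored
    att_targets:   (B, L) subword ids, padded with -100
    """
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
        ctc_log_probs, ctc_targets, input_lens, target_lens)
    att = nn.functional.cross_entropy(
        att_logits.transpose(1, 2), att_targets, ignore_index=-100)
    return lam * ctc + (1 - lam) * att
```

The CTC branch enforces monotonic alignment while the attention branch models longer context; the weighted sum lets the model benefit from both during training.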

Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin

Title Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Authors Linhao Dong, Shiyu Zhou, Wei Chen, Bo Xu
Abstract End-to-end models have been showing superiority in automatic speech recognition (ASR). At the same time, the capacity for streaming recognition has become a growing requirement for end-to-end models. Following these trends, an encoder-decoder recurrent neural network called the Recurrent Neural Aligner (RNA) was recently proposed and has shown its competitiveness on two English ASR tasks. However, it is not clear whether RNA can be further improved and applied to other spoken languages. In this work, we explore the applicability of RNA to Mandarin Chinese and present four effective extensions: in the encoder, we redesign the temporal down-sampling and introduce a powerful convolutional structure; in the decoder, we utilize a regularizer to smooth the output distribution and conduct joint training with a language model. On two Mandarin Chinese conversational telephone speech recognition (MTS) datasets, our Extended-RNA obtains promising performance. In particular, it achieves a 27.7% character error rate (CER), which is superior to the current state-of-the-art result on the popular HKUST task. (A sketch of the strided-convolution down-sampling follows this entry.)
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2018-06-17
URL http://arxiv.org/abs/1806.06342v2
PDF http://arxiv.org/pdf/1806.06342v2.pdf
PWC https://paperswithcode.com/paper/extending-recurrent-neural-aligner-for
Repo
Framework
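
The first extension (redesigned temporal down-sampling with a convolutional structure) can be sketched as a strided-convolution front end. The channel sizes and strides below are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvDownsampleEncoder(nn.Module):
    """Encoder front-end that downsamples in time with strided convolutions,
    in the spirit of the redesigned temporal down-sampling (sketch)."""

    def __init__(self, n_feats=80, channels=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),                       # ~4x temporal reduction overall
        )

    def forward(self, x):                    # x: (B, T, n_feats)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)   # (B, ~T/4, C)
```

Reducing the frame rate early shortens the sequence the recurrent layers must process, which matters for streaming recognition latency.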