January 31, 2020

3193 words 15 mins read

Paper Group ANR 168

On the Convergence of FedAvg on Non-IID Data. Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed. English-Bhojpuri SMT System: Insights from the Karaka Model. DeepPFCN: Deep Parallel Feature Consensus Network For Person Re-Identification. Improving sequence-to-sequence speech recognition training with on-the-fly data augmenta …

On the Convergence of FedAvg on Non-IID Data

Title On the Convergence of FedAvg on Non-IID Data
Authors Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
Abstract Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimal.
Tasks
Published 2019-07-04
URL https://arxiv.org/abs/1907.02189v3
PDF https://arxiv.org/pdf/1907.02189v3.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-fedavg-on-non-iid-data
Repo
Framework
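
The trade-offs discussed in the abstract are easy to reproduce on a toy problem. Below is a minimal sketch of the FedAvg scheme under partial device participation and a decaying learning rate, on a synthetic non-iid least-squares task; all names, device counts, and hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-iid setup: each device holds its own linear-regression data (hypothetical).
num_devices, dim, local_n = 20, 5, 50
devices = []
for k in range(num_devices):
    w_k = rng.normal(loc=k % 3, size=dim)      # device-specific ground truth -> non-iid
    X = rng.normal(size=(local_n, dim))
    y = X @ w_k + 0.1 * rng.normal(size=local_n)
    devices.append((X, y))

def local_sgd(w, X, y, lr, steps=5, batch=10):
    """Run a few local SGD steps on one device's least-squares objective."""
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
    return w

w = np.zeros(dim)
rounds, sample_frac = 200, 0.3                  # partial participation: 30% of devices per round
for t in range(rounds):
    lr = 0.1 / (1 + t)                          # decaying step size, as the paper's analysis requires
    chosen = rng.choice(num_devices, size=int(sample_frac * num_devices), replace=False)
    updates = [local_sgd(w.copy(), *devices[k], lr) for k in chosen]
    w = np.mean(updates, axis=0)                # server averages the sampled devices' models

print("global model after FedAvg:", np.round(w, 3))
```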

Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed

Title Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
Authors Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar
Abstract The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. The experiments were, however, limited to recurrent sequence-to-sequence architectures and relatively small data settings. We focus on the state-of-the-art Transformer model and use comparably larger corpora. Specifically, we try to promote the knowledge of source-side syntax using multi-task learning, either through simple data manipulation techniques or through a dedicated model component. In particular, we train one of the Transformer attention heads to produce the source-side dependency tree. Overall, our results cast some doubt on the utility of multi-task setups with linguistic information. The data manipulation techniques, recommended in previous works, prove ineffective in large data settings. The treatment of self-attention as dependencies seems much more promising: it helps in translation and reveals that the Transformer model can very easily grasp the syntactic structure. An important but curious result is, however, that identical gains are obtained by using trivial “linear trees” instead of true dependencies. The reason for the gain thus may not be coming from the added linguistic knowledge but from some simpler regularizing effect we induced on self-attention matrices.
Tasks Machine Translation, Multi-Task Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.11218v1
PDF https://arxiv.org/pdf/1910.11218v1.pdf
PWC https://paperswithcode.com/paper/promoting-the-knowledge-of-source-syntax-in
Repo
Framework
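
To make the “linear trees” finding concrete, here is a small sketch of how one attention head can be supervised towards a tree: each row of an attention matrix is pushed, via cross-entropy, towards a one-hot distribution over the token's head, and a trivial linear tree simply uses the previous token as the head. The example parse, the uniform attention matrix, and the loss form are purely illustrative; this is not the paper's training code.

```python
import numpy as np

def head_targets(heads, n):
    """One-hot attention targets: row i attends to its dependency head heads[i]."""
    T = np.zeros((n, n))
    for i, h in enumerate(heads):
        T[i, h] = 1.0
    return T

sentence_len = 6
# True dependency heads (hypothetical parse; the root points to itself here for simplicity).
true_heads = [2, 2, 2, 5, 5, 2]
# Trivial "linear tree": every token's head is simply the previous token.
linear_heads = [0] + list(range(sentence_len - 1))

def guided_attention_loss(attn, target, eps=1e-9):
    """Cross-entropy between one attention head's rows and the target tree rows."""
    return -np.mean(np.sum(target * np.log(attn + eps), axis=-1))

attn = np.full((sentence_len, sentence_len), 1.0 / sentence_len)  # a uniform head, for illustration
print("loss vs. true tree:  ", guided_attention_loss(attn, head_targets(true_heads, sentence_len)))
print("loss vs. linear tree:", guided_attention_loss(attn, head_targets(linear_heads, sentence_len)))
```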

English-Bhojpuri SMT System: Insights from the Karaka Model

Title English-Bhojpuri SMT System: Insights from the Karaka Model
Authors Atul Kr. Ojha
Abstract This thesis has been divided into six chapters, namely: Introduction; Karaka Model and its Impact on Dependency Parsing; LT Resources for Bhojpuri; English-Bhojpuri SMT System: Experiment; Evaluation of the EB-SMT System; and Conclusion. Chapter one introduces this PhD research by detailing the motivation of the study, the methodology used, and the literature review of existing MT-related work on Indian languages. Chapter two covers the theoretical background of Karaka and the Karaka model, along with previous related work. It also discusses the impact of the Karaka model on NLP and dependency parsing, compares Karaka dependencies with Universal Dependencies, and briefly presents the implementation of these models in the SMT system for the English-Bhojpuri language pair.
Tasks Dependency Parsing
Published 2019-05-06
URL https://arxiv.org/abs/1905.02239v1
PDF https://arxiv.org/pdf/1905.02239v1.pdf
PWC https://paperswithcode.com/paper/english-bhojpuri-smt-system-insights-from-the
Repo
Framework

DeepPFCN: Deep Parallel Feature Consensus Network For Person Re-Identification

Title DeepPFCN: Deep Parallel Feature Consensus Network For Person Re-Identification
Authors Shubham Kumar Singh, Krishna P Miyapuram, Shanmuganathan Raman
Abstract Person re-identification aims to associate images of the same person over multiple non-overlapping camera views at different times. Manual re-identification in large camera networks is highly time-consuming and error-prone, as it depends on the human operator. Automated person re-identification is therefore required to cope with the extensive quantity of visual data produced by the rapid growth of large-scale distributed multi-camera systems. State-of-the-art works focus on learning person appearance features and factorizing them into latent discriminative factors at multiple semantic levels. We propose the Deep Parallel Feature Consensus Network (DeepPFCN), a novel network architecture that learns multi-scale person appearance features using convolutional neural networks. This model factorizes the visual appearance of a person into latent discriminative factors at multiple semantic levels and finally builds a consensus among them. The feature representations learned by DeepPFCN are more robust for the person re-identification task, as we learn discriminative scale-specific features and maximize multi-scale feature fusion over multi-scale image inputs. We further exploit average and max pooling at each scale to discriminate features both globally and locally. We demonstrate the re-identification advantages of the proposed DeepPFCN model over state-of-the-art re-identification methods on three benchmark datasets: Market1501, DukeMTMCreID, and CUHK03, achieving mAP results of 75.8%, 64.3%, and 52.6%, respectively.
Tasks Person Re-Identification
Published 2019-11-18
URL https://arxiv.org/abs/1911.07776v1
PDF https://arxiv.org/pdf/1911.07776v1.pdf
PWC https://paperswithcode.com/paper/deeppfcn-deep-parallel-feature-consensus
Repo
Framework
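
As a rough, hypothetical reading of the parallel multi-scale design described above, the PyTorch sketch below runs one small branch per input scale, combines average and max pooling inside each branch, and builds a consensus by averaging per-scale identity logits. Layer sizes, the number of scales, and the 751-identity head (Market-1501's training identity count) are assumptions made for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ScaleBranch(nn.Module):
    """One hypothetical scale-specific branch: a small conv stack plus avg and max pooling."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.avg = nn.AdaptiveAvgPool2d(1)   # global descriptor
        self.max = nn.AdaptiveMaxPool2d(1)   # locally salient descriptor
        self.embed = nn.Linear(2 * 64, out_dim)

    def forward(self, x):
        f = self.features(x)
        pooled = torch.cat([self.avg(f).flatten(1), self.max(f).flatten(1)], dim=1)
        return self.embed(pooled)

class ParallelConsensus(nn.Module):
    """Run one branch per input scale and build consensus by averaging identity logits."""
    def __init__(self, num_ids=751, scales=(256, 128)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([ScaleBranch() for _ in scales])
        self.classifiers = nn.ModuleList([nn.Linear(128, num_ids) for _ in scales])

    def forward(self, x):
        logits = []
        for s, branch, clf in zip(self.scales, self.branches, self.classifiers):
            xs = nn.functional.interpolate(x, size=(s, s // 2), mode="bilinear", align_corners=False)
            logits.append(clf(branch(xs)))
        return torch.stack(logits).mean(0)    # consensus over scales

model = ParallelConsensus()
print(model(torch.randn(2, 3, 256, 128)).shape)   # torch.Size([2, 751])
```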

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Title Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
Authors Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel
Abstract Sequence-to-Sequence (S2S) models have recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing performance improvements that can be obtained from better architectures. One solution to the overfitting problem is increasing the amount of available training data, and the variety it exhibits, with the help of data augmentation. In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. One of the data augmentation methods comes from the literature, while the other two are our own development: a time perturbation in the frequency domain and sub-sequence sampling. Our experiments on Switchboard and Fisher data show state-of-the-art performance for S2S models that are trained solely on the speech training data and do not use additional text data.
Tasks Data Augmentation, Sequence-To-Sequence Speech Recognition, Speech Recognition
Published 2019-10-29
URL https://arxiv.org/abs/1910.13296v2
PDF https://arxiv.org/pdf/1910.13296v2.pdf
PWC https://paperswithcode.com/paper/191013296
Repo
Framework
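
The two augmentations contributed by the authors are described only briefly in the abstract, so the following is a loose numpy sketch of how such on-the-fly transforms could look: a random time stretch of a feature matrix and sampling of an aligned sub-sequence given word-level frame boundaries. Shapes, ranges, and alignments are made up for illustration and are not the paper's exact procedures.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_stretch(spec, low=0.8, high=1.2):
    """Hypothetical time perturbation in the feature domain: resample the spectrogram
    along the time axis by a random factor using linear interpolation."""
    factor = rng.uniform(low, high)
    T, F = spec.shape
    new_T = max(1, int(round(T * factor)))
    old_idx = np.linspace(0, T - 1, new_T)
    lo = np.floor(old_idx).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (old_idx - lo)[:, None]
    return (1 - w) * spec[lo] + w * spec[hi]

def sub_sequence(spec, word_frames, words):
    """Sample an aligned sub-sequence: pick a contiguous span of words and keep
    the corresponding frames (word_frames[i] = (start, end) frame of word i)."""
    i = rng.integers(0, len(words))
    j = rng.integers(i, len(words))
    start, end = word_frames[i][0], word_frames[j][1]
    return spec[start:end], words[i:j + 1]

spec = rng.normal(size=(120, 40))                     # 120 frames x 40 filterbanks (toy)
word_frames = [(0, 40), (40, 75), (75, 120)]
words = ["hello", "there", "friend"]
stretched = time_stretch(spec)
crop, crop_words = sub_sequence(spec, word_frames, words)
print(stretched.shape, crop.shape, crop_words)
```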

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

Title A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Authors Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh
Abstract While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. In this paper, we therefore work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task. We empirically introduce a more compact Speech-Transformer by investigating the impact of discarding particular modules on the performance of the model. Moreover, we evaluate reducing the numerical precision of our network’s weights and activations while maintaining the performance of the full-precision model. Our experiments show that we can reduce the number of parameters of the full-precision model and then further compress the model 4x by fully quantizing to 8-bit fixed-point precision.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2019-11-09
URL https://arxiv.org/abs/1911.03604v4
PDF https://arxiv.org/pdf/1911.03604v4.pdf
PWC https://paperswithcode.com/paper/fully-quantizing-a-simplified-transformer-for
Repo
Framework
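
The 4x compression from 8-bit fixed-point quantization mentioned above follows directly from storing int8 instead of float32 weights. Below is a generic sketch of symmetric per-tensor quantization with an integer matrix multiply and rescale; it illustrates the arithmetic only and is not the paper's quantization scheme.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization: map floats to int8 with one scale."""
    scale = np.max(np.abs(x)) / 127.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)   # a toy weight matrix
x = rng.normal(size=(1, 256)).astype(np.float32)                  # a toy activation

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Integer matmul accumulated in int32, then rescaled -- the usual fixed-point recipe.
y_q = (qx.astype(np.int32) @ qw.astype(np.int32).T) * (sx * sw)
y_f = x @ w.T
print("mean abs error:", np.mean(np.abs(y_q - y_f)))
print("weights are %.0fx smaller" % (w.nbytes / qw.nbytes))
```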

A Pattern Recognition Method for Partial Discharge Detection on Insulated Overhead Conductors

Title A Pattern Recognition Method for Partial Discharge Detection on Insulated Overhead Conductors
Authors Ming Dong, Jessie Sun, Carl Wang
Abstract Today, insulated overhead conductors are increasingly used in many parts of the world due to their higher operational reliability, elimination of phase-to-phase contact, closer distances between phases, and stronger protection for animals. However, standard protection devices are often not able to detect conductor phase-to-ground faults and the more frequent events of trees or tree branches hitting the conductor, as these events only lead to partial discharge (PD) activity instead of causing the overcurrent seen on bare conductors. To solve this problem, in recent years, the Technical University of Ostrava (VSB) devised a special meter to measure the voltage signal of the stray electrical field along insulated overhead conductors, hoping to detect the above hazardous PD activities. In 2018, VSB published a large amount of waveform data recorded by their meter on Kaggle, the world’s largest data science collaboration platform, looking for promising pattern recognition methods for this application. To tackle this challenge, we developed a method based on Seasonal and Trend decomposition using Loess (STL) and Support Vector Machines (SVM) to recognize PD activities on insulated overhead conductors. Different SVM kernels were tested and compared. Satisfactory classification rates on the VSB dataset were achieved with the Gaussian radial basis kernel.
Tasks
Published 2019-05-05
URL https://arxiv.org/abs/1905.01588v2
PDF https://arxiv.org/pdf/1905.01588v2.pdf
PWC https://paperswithcode.com/paper/a-pattern-recognition-method-for-partial
Repo
Framework
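
A minimal end-to-end sketch of the STL-plus-SVM recipe is given below, using statsmodels for the decomposition and scikit-learn's RBF-kernel SVM, on synthetic waveforms that stand in for the VSB measurements (which are not bundled here). The residual statistics used as features, the signal shapes, and the period are assumptions chosen for illustration.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_signal(has_pd):
    """Toy stand-in for a voltage waveform: a sinusoidal component plus noise,
    with extra high-amplitude spikes when a partial discharge is simulated."""
    t = np.linspace(0, 1, 800)
    x = np.sin(2 * np.pi * 8 * t) + 0.1 * rng.normal(size=t.size)
    if has_pd:
        spikes = rng.choice(t.size, size=20, replace=False)
        x[spikes] += rng.normal(scale=1.5, size=20)
    return x

def stl_features(x, period=100):
    """Decompose with STL and summarize the residual, where PD pulses should live."""
    resid = STL(x, period=period).fit().resid
    return [resid.std(), np.abs(resid).max(), np.mean(np.abs(resid) > 3 * resid.std())]

X, y = [], []
for label in (0, 1):
    for _ in range(60):
        X.append(stl_features(make_signal(bool(label))))
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)   # Gaussian radial basis kernel
print("held-out accuracy:", clf.score(X_te, y_te))
```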

Machine-learning non-stationary noise out of gravitational wave detectors

Title Machine-learning non-stationary noise out of gravitational wave detectors
Authors Gabriele Vajente, Yiwen Huang, Maximiliano Isi, Jenne C. Driggers, Jeffrey S. Kissel, Marek J. Szczepanczyk, Salvatore Vitale
Abstract Signal extraction out of background noise is a common challenge in high-precision physics experiments, where the measurement output is often a continuous data stream. To improve the signal-to-noise ratio of the detection, witness sensors are often used to independently measure background noises and subtract them from the main signal. If the noise coupling is linear and stationary, optimal techniques already exist and are routinely implemented in many experiments. However, when the noise coupling is non-stationary, linear techniques often fail or are sub-optimal. Inspired by the properties of the background noise in gravitational wave detectors, this work develops a novel algorithm to efficiently characterize and remove non-stationary noise couplings, provided there exist witnesses of the noise source and of the modulation. The algorithm is described in its most general formulation, and its efficiency is demonstrated with examples from the data of the Advanced LIGO gravitational wave observatory, where we obtain an improvement in the detector’s gravitational-wave reach without introducing any bias in the source parameter estimation.
Tasks
Published 2019-11-20
URL https://arxiv.org/abs/1911.09083v3
PDF https://arxiv.org/pdf/1911.09083v3.pdf
PWC https://paperswithcode.com/paper/machine-learning-non-stationary-noise-out-of
Repo
Framework
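
The core idea, subtracting a noise coupling that varies in time when witnesses of both the noise and the modulation exist, can be illustrated with a few lines of least squares. The sketch below compares a purely linear subtraction against one that adds the bilinear regressor witness × modulation; all signals are synthetic and the method shown is a simplification for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
t = np.arange(n) / 1000.0

witness = rng.normal(size=n)                              # witness of the noise source
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * 0.05 * t)     # slow witness of the coupling drift
signal = 0.01 * np.sin(2 * np.pi * 35.0 * t)              # the part we want to keep
target = signal + modulation * witness                    # detector output with non-stationary noise

# Stationary (linear) subtraction: regress on the witness alone.
coef_lin = np.dot(witness, target) / np.dot(witness, witness)
resid_lin = target - coef_lin * witness

# Non-stationary subtraction: include the bilinear regressor witness * modulation.
A = np.column_stack([witness, witness * modulation])
coef_bil, *_ = np.linalg.lstsq(A, target, rcond=None)
resid_bil = target - A @ coef_bil

print("residual RMS, linear only    :", resid_lin.std())
print("residual RMS, with modulation:", resid_bil.std())
print("clean signal RMS             :", signal.std())
```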

Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study

Title Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study
Authors Guowei Xu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu
Abstract Learning representations has been proven to be helpful in numerous machine learning tasks. The success of the majority of existing representation learning approaches often requires a large amount of consistent and noise-free labels. However, such labels are not accessible in many real-world scenarios, where data are usually annotated by the crowd. In practice, crowdsourced labels are usually inconsistent among crowd workers given their diverse expertise, and the number of crowdsourced labels is very limited. Thus, directly adopting crowdsourced labels in existing representation learning algorithms is inappropriate and suboptimal. In this paper, we investigate this problem and propose a novel framework of \textbf{R}epresentation \textbf{L}earning with crowdsourced \textbf{L}abels, i.e., “RLL”, which learns representations of data with crowdsourced labels by jointly and coherently solving the challenges introduced by limited and inconsistent labels. The proposed representation learning framework is evaluated in two real-world education applications. The experimental results demonstrate the benefits of our approach in learning representations from limited labeled data from the crowds, and show that RLL is able to outperform state-of-the-art baselines. Moreover, detailed experiments are conducted on RLL to fully understand its key components and the corresponding performance.
Tasks Representation Learning
Published 2019-07-18
URL https://arxiv.org/abs/1908.00086v1
PDF https://arxiv.org/pdf/1908.00086v1.pdf
PWC https://paperswithcode.com/paper/learning-effective-embeddings-from
Repo
Framework
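
The paper's RLL framework is not reproduced here, but one generic way to cope with limited, inconsistent crowdsourced labels is to aggregate worker votes into soft targets and train against a soft cross-entropy, as in the toy sketch below; the data, features, and linear model are entirely illustrative and are not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_labels(annotations, num_classes):
    """Turn inconsistent crowdsourced votes into soft targets by normalizing per-item vote counts."""
    counts = np.zeros((len(annotations), num_classes))
    for i, votes in enumerate(annotations):
        for v in votes:
            counts[i, v] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Toy data: each item was labelled by 3 workers who sometimes disagree.
annotations = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0]]
targets = soft_labels(annotations, num_classes=2)

# Train a tiny linear model against the soft targets with a soft cross-entropy.
X = rng.normal(size=(4, 5))                       # toy item features
W = np.zeros((5, 2))
for _ in range(200):
    logits = X @ W
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (probs - targets) / len(X)   # gradient of the soft cross-entropy

print("soft targets:\n", targets)
print("learned predictions:\n", np.round(np.exp(X @ W) / np.exp(X @ W).sum(axis=1, keepdims=True), 2))
```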

Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles

Title Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles
Authors Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
Abstract We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model uses character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. In our experiments, our model outperforms a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.
Tasks Cross-Lingual Transfer, Semantic Role Labeling
Published 2019-04-05
URL http://arxiv.org/abs/1904.03256v1
PDF http://arxiv.org/pdf/1904.03256v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-transfer-of-semantic-roles-from
Repo
Framework
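
The point about replacing supervised features with character-level ones can be illustrated with a simple hashed character n-gram representation: morphologically related word forms share n-grams and therefore end up with similar vectors, without any lemmatizer or tagger. This is a generic illustration under assumed dimensions, not the authors' character-based encoder.

```python
import numpy as np

def char_ngram_vector(word, dim=64, n=3):
    """Supervision-free word representation: hash character n-grams (with boundary
    markers) into a fixed-size, L2-normalized bag-of-ngrams vector."""
    padded = f"<{word}>"
    v = np.zeros(dim)
    for i in range(len(padded) - n + 1):
        v[hash(padded[i:i + n]) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def cosine(a, b):
    return float(a @ b)

# Morphologically related forms share character n-grams, so they end up close
# even without lemmas or POS tags.
print("playing vs played:", cosine(char_ngram_vector("playing"), char_ngram_vector("played")))
print("playing vs bank:  ", cosine(char_ngram_vector("playing"), char_ngram_vector("bank")))
```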

Cross-lingual transfer learning for spoken language understanding

Title Cross-lingual transfer learning for spoken language understanding
Authors Quynh Ngoc Thi Do, Judith Gaspers
Abstract Typically, spoken language understanding (SLU) models are trained on annotated data, which are costly to gather. Aiming to reduce the data needed for bootstrapping an SLU system for a new language, we present a simple but effective weight-transfer approach using data from another language. The approach is evaluated within our multi-task SLU framework, developed to support different languages. We evaluate our approach on ATIS and a real-world SLU dataset, showing that i) our monolingual models outperform the state of the art, ii) we can greatly reduce the amount of data needed for bootstrapping an SLU system for a new language, and iii) while multi-task training improves over separate training, different weight-transfer settings may work best for different SLU modules.
Tasks Cross-Lingual Transfer, Spoken Language Understanding, Transfer Learning
Published 2019-04-03
URL http://arxiv.org/abs/1904.01825v1
PDF http://arxiv.org/pdf/1904.01825v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-transfer-learning-for-spoken
Repo
Framework
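
A weight-transfer approach of the kind described above can be sketched in a few lines of PyTorch: train a model on the source language, then copy every parameter tensor whose shape matches into the target-language model and fine-tune from there. The model class, layer sizes, and the choice to leave the embedding untransferred are assumptions for illustration, not the authors' exact setup.

```python
import torch
import torch.nn as nn

class SLUModel(nn.Module):
    """A minimal stand-in for a multi-task SLU model: shared encoder plus intent
    and slot heads (names and sizes are illustrative, not from the paper)."""
    def __init__(self, vocab_size, num_intents, num_slots, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slots)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.intent_head(h[:, -1]), self.slot_head(h)

# Source-language model (pretend it was already trained on the high-resource language).
src = SLUModel(vocab_size=5000, num_intents=20, num_slots=60)

# Target-language model: transfer every tensor whose shape matches; the embedding
# (different vocabulary) stays randomly initialized and is learned from scratch.
tgt = SLUModel(vocab_size=3000, num_intents=20, num_slots=60)
transferable = {k: v for k, v in src.state_dict().items()
                if k in tgt.state_dict() and v.shape == tgt.state_dict()[k].shape}
tgt.load_state_dict(transferable, strict=False)
print(f"transferred {len(transferable)} of {len(tgt.state_dict())} parameter tensors")
```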

Investigation on the generalization of the Sampled Policy Gradient algorithm

Title Investigation on the generalization of the Sampled Policy Gradient algorithm
Authors Nil Stolt Ansó
Abstract The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient, using the critic to evaluate the sampled actions. SPG offers theoretical promise over similar algorithms such as DPG, as it searches the action-Q-value space independently of the local gradient, enabling it to avoid local minima. This paper compares SPG to two similar actor-critic algorithms, CACLA and DPG. The comparison is made across two different environments and two different network architectures, as well as training on on-policy transitions in contrast to using an experience buffer. The results suggest that although SPG often does not perform the worst, it does not always match the performance of the best-performing algorithm at a particular task. Further experiments are required to get a better estimate of the qualities of SPG.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.03728v1
PDF https://arxiv.org/pdf/1910.03728v1.pdf
PWC https://paperswithcode.com/paper/investigation-on-the-generalization-of-the
Repo
Framework
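
The sampling idea behind SPG can be shown with a toy critic: perturb the current policy action, score the candidates with the critic, and regress the actor towards the best-scoring one. The quadratic critic and simple update rule below are stand-ins for illustration, not the algorithm as implemented in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic_q(state, action):
    """Toy stand-in for a learned critic: Q is highest when action == -state."""
    return -np.sum((action + state) ** 2)

def sampled_policy_gradient_target(state, policy_action, num_samples=16, sigma=0.3):
    """SPG-style target: sample actions around the policy's output, evaluate them
    with the critic, and return the best one as the regression target for the actor."""
    candidates = policy_action + sigma * rng.normal(size=(num_samples, policy_action.size))
    candidates = np.vstack([policy_action, candidates])     # keep the current action too
    scores = np.array([critic_q(state, a) for a in candidates])
    return candidates[np.argmax(scores)]

state = np.array([0.5, -0.2])
policy_action = np.zeros(2)                                  # current (untrained) actor output
for step in range(50):
    target = sampled_policy_gradient_target(state, policy_action)
    policy_action += 0.2 * (target - policy_action)          # move the actor toward the better action

print("actor action:", policy_action, " (optimum is -state =", -state, ")")
```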

How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions

Title How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
Authors Goran Glavas, Robert Litschko, Sebastian Ruder, Ivan Vulic
Abstract Cross-lingual word embeddings (CLEs) enable multilingual modeling of meaning and facilitate cross-lingual transfer of NLP models. Despite their ubiquitous usage in downstream tasks, recent increasingly popular projection-based CLE models are almost exclusively evaluated on a single task only: bilingual lexicon induction (BLI). Even BLI evaluations vary greatly, hindering our ability to correctly interpret performance and properties of different CLE models. In this work, we make the first step towards a comprehensive evaluation of cross-lingual word embeddings. We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. We empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI can result in deteriorated downstream performance. We indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess existing baselines, which still display competitive performance across the board. We hope that our work will catalyze further work on CLE evaluation and model analysis.
Tasks Cross-Lingual Transfer, Word Embeddings
Published 2019-02-01
URL http://arxiv.org/abs/1902.00508v1
PDF http://arxiv.org/pdf/1902.00508v1.pdf
PWC https://paperswithcode.com/paper/how-to-properly-evaluate-cross-lingual-word
Repo
Framework
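
Since bilingual lexicon induction (BLI) is the evaluation the paper scrutinizes, here is a compact sketch of how it is typically scored: project source embeddings into the shared space, retrieve the nearest target neighbour by cosine similarity, and compute precision@1 against a gold dictionary. The synthetic embeddings and the identity gold dictionary are illustrative assumptions; real evaluations also use refinements such as CSLS retrieval.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

# Toy shared cross-lingual space: target embeddings are a noisy rotation of source ones.
dim, vocab = 50, 300
src = normalize(rng.normal(size=(vocab, dim)))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))      # hypothetical learned mapping
tgt = normalize(src @ Q + 0.05 * rng.normal(size=(vocab, dim)))
src_mapped = normalize(src @ Q)                        # source words projected into target space

# Gold dictionary for this toy setup: word i translates to word i.
gold = np.arange(vocab)

sims = src_mapped @ tgt.T                              # cosine similarities (rows are unit norm)
pred = sims.argmax(axis=1)                             # nearest-neighbour retrieval
print("BLI precision@1:", np.mean(pred == gold))
```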

Progressive Sample Mining and Representation Learning for One-Shot Person Re-identification with Adversarial Samples

Title Progressive Sample Mining and Representation Learning for One-Shot Person Re-identification with Adversarial Samples
Authors Hui Li, Jimin Xiao, Mingjie Sun, Eng Gee Lim, Yao Zhao
Abstract In this paper, we aim to tackle the one-shot person re-identification problem, where only one image is labelled for each person while other images are unlabelled. This task is challenging due to the lack of sufficient labelled training data. To tackle this problem, we propose to iteratively guess pseudo labels for the unlabelled image samples, which are later used to update the re-identification model together with the labelled samples. A new sampling mechanism is designed to promote unlabelled samples to pseudo-labelled samples based on the distance matrix, and to form a training triplet batch including both labelled samples and pseudo-labelled samples. We also design an HSoften-Triplet-Loss to soften the negative impact of incorrect pseudo labels, considering the unreliable nature of pseudo-labelled samples. Finally, we deploy an adversarial learning method to expand the image samples to different camera views. Our experiments show that our framework achieves a new state-of-the-art one-shot Re-ID performance on the Market-1501 (mAP 42.7%) and DukeMTMC-ReID (mAP 40.3%) datasets. Code will be available soon.
Tasks Person Re-Identification, Representation Learning
Published 2019-11-02
URL https://arxiv.org/abs/1911.00666v1
PDF https://arxiv.org/pdf/1911.00666v1.pdf
PWC https://paperswithcode.com/paper/progressive-sample-mining-and-representation
Repo
Framework
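
One mining step of the kind the abstract describes, promoting the unlabelled samples closest to a labelled anchor to pseudo-labelled status via a distance matrix, is sketched below on toy features. This is a generic illustration of distance-based pseudo labelling, not the paper's full pipeline (no HSoften-Triplet-Loss or adversarial camera-view expansion).

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_label_by_distance(labeled_feats, labeled_ids, unlabeled_feats, num_to_add):
    """Pick the unlabelled samples closest to any labelled anchor and give them that
    anchor's identity as a pseudo label (one generic mining step)."""
    d = np.linalg.norm(unlabeled_feats[:, None, :] - labeled_feats[None, :, :], axis=-1)
    nearest_anchor = d.argmin(axis=1)
    confidence = -d.min(axis=1)                       # closer = more confident
    chosen = np.argsort(confidence)[::-1][:num_to_add]
    return chosen, labeled_ids[nearest_anchor[chosen]]

# Toy features: 3 identities, one labelled image each, plus unlabelled images around them.
ids = np.array([0, 1, 2])
anchors = rng.normal(size=(3, 16))
unlabeled = np.vstack([anchors[i] + 0.1 * rng.normal(size=(10, 16)) for i in ids])

chosen, pseudo = pseudo_label_by_distance(anchors, ids, unlabeled, num_to_add=6)
true_ids = np.repeat(ids, 10)
print("pseudo-label accuracy on the mined samples:", np.mean(pseudo == true_ids[chosen]))
```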

Deep Learning-Based Automatic Downbeat Tracking: A Brief Review

Title Deep Learning-Based Automatic Downbeat Tracking: A Brief Review
Authors Bijue Jia, Jiancheng Lv, Dayiheng Liu
Abstract As an important form of multimedia, music fills almost everyone’s life. Automatically analyzing music is a significant step toward satisfying people’s need for music retrieval and music recommendation in an effortless way. Among the relevant tasks, downbeat tracking has been a fundamental and long-standing problem in the Music Information Retrieval (MIR) area. Despite significant research efforts, downbeat tracking still remains a challenge. Previous research either focuses on feature engineering (extracting certain features by signal processing, which is a semi-automatic solution) or has limitations: such methods can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and has become the primary approach to feature learning; combining traditional and deep learning methods has also yielded better performance. In this paper, we begin with a background introduction to the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, datasets, and evaluation strategy. In addition, we take a look at the results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as developments in software implementations. Although much has been achieved in the area of automatic downbeat tracking, some problems still remain. We point out these problems and conclude with possible directions and challenges for future research.
Tasks Feature Engineering, Information Retrieval, Music Information Retrieval
Published 2019-06-10
URL https://arxiv.org/abs/1906.03870v1
PDF https://arxiv.org/pdf/1906.03870v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-automatic-downbeat
Repo
Framework