Paper Group ANR 168
On the Convergence of FedAvg on Non-IID Data
Title | On the Convergence of FedAvg on Non-IID Data |
Authors | Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang |
Abstract | Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD steps. Importantly, our bound demonstrates a trade-off between communication efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; a low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for FedAvg on non-iid data: the learning rate $\eta$ must decay, even if the full gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimum. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02189v3 |
https://arxiv.org/pdf/1907.02189v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-of-fedavg-on-non-iid-data |
Repo | |
Framework | |
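The abstract's last claim (with several local steps on non-iid data, a fixed learning rate leaves FedAvg a distance of order $\eta$ from the optimum) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the scalar quadratic objectives f_k(w) = 0.5 * a_k * (w - c_k)^2, the two devices, full participation, and the 1/(1 + 0.1 t) decay schedule.

```python
def local_gd(w, a, c, eta, steps):
    """Run `steps` full-gradient steps on f(w) = 0.5 * a * (w - c)**2."""
    for _ in range(steps):
        w -= eta * a * (w - c)
    return w


def fedavg(devices, rounds, local_steps, lr_schedule, w0=0.0):
    """Full-participation FedAvg on scalar quadratics; returns the final w."""
    w = w0
    for t in range(rounds):
        eta = lr_schedule(t)
        # Every device starts from the shared model and trains locally.
        local = [local_gd(w, a, c, eta, local_steps) for a, c in devices]
        # The server averages the local models (equal weights assumed).
        w = sum(local) / len(local)
    return w


# Two hypothetical devices with different curvature and minimizer (non-iid).
DEVICES = [(1.0, 0.0), (4.0, 1.0)]
# Minimizer of the average objective: sum(a * c) / sum(a).
W_STAR = sum(a * c for a, c in DEVICES) / sum(a for a, _ in DEVICES)

w_fixed = fedavg(DEVICES, rounds=2000, local_steps=5,
                 lr_schedule=lambda t: 0.1)
w_decay = fedavg(DEVICES, rounds=2000, local_steps=5,
                 lr_schedule=lambda t: 0.1 / (1.0 + 0.1 * t))

err_fixed = abs(w_fixed - W_STAR)
err_decay = abs(w_decay - W_STAR)
print(f"fixed lr error: {err_fixed:.4f}, decaying lr error: {err_decay:.4f}")
```

In this toy setting the fixed-rate run settles at a point that is visibly off the optimum, while the decaying-rate run ends much closer to it. With a single local step per round the bias disappears, which is consistent with the trade-off between communication and convergence that the abstract mentions.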
Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
Title | Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed |
Authors | Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar |
Abstract | The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. The experiments were, however, limited to recurrent sequence-to-sequence architectures and relatively small data settings. We focus on the state-of-the-art Transformer model and use comparably larger corpora. Specifically, we try to promote the knowledge of source-side syntax using multi-task learning either through simple data manipulation techniques or through a dedicated model component. In particular, we train one of the Transformer attention heads to produce the source-side dependency tree. Overall, our results cast some doubt on the utility of multi-task setups with linguistic information. The data manipulation techniques, recommended in previous works, prove ineffective in large data settings. The treatment of self-attention as dependencies seems much more promising: it helps in translation and reveals that the Transformer model can very easily grasp the syntactic structure. An important but curious result is, however, that identical gains are obtained by using trivial “linear trees” instead of true dependencies. The reason for the gain thus may not be coming from the added linguistic knowledge but from some simpler regularizing effect we induced on self-attention matrices. |
Tasks | Machine Translation, Multi-Task Learning |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11218v1 |
https://arxiv.org/pdf/1910.11218v1.pdf | |
PWC | https://paperswithcode.com/paper/promoting-the-knowledge-of-source-syntax-in |
Repo | |
Framework | |
English-Bhojpuri SMT System: Insights from the Karaka Model
Title | English-Bhojpuri SMT System: Insights from the Karaka Model |
Authors | Atul Kr. Ojha |
Abstract | This thesis is divided into six chapters, namely: Introduction, Karaka Model and Its Impacts on Dependency Parsing, LT Resources for Bhojpuri, English-Bhojpuri SMT System: Experiment, Evaluation of EB-SMT System, and Conclusion. Chapter one introduces this PhD research by detailing the motivation of the study, the methodology used for the study and the literature review of the existing MT-related work in Indian languages. Chapter two covers the theoretical background of Karaka and the Karaka model, along with previous related work. It also discusses the impacts of the Karaka model on NLP and dependency parsing, and compares Karaka dependency with Universal Dependency. It also presents a brief idea of the implementation of these models in the SMT system for the English-Bhojpuri language pair. |
Tasks | Dependency Parsing |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02239v1 |
https://arxiv.org/pdf/1905.02239v1.pdf | |
PWC | https://paperswithcode.com/paper/english-bhojpuri-smt-system-insights-from-the |
Repo | |
Framework | |
DeepPFCN: Deep Parallel Feature Consensus Network For Person Re-Identification
Title | DeepPFCN: Deep Parallel Feature Consensus Network For Person Re-Identification |
Authors | Shubham Kumar Singh, Krishna P Miyapuram, Shanmuganathan Raman |
Abstract | Person re-identification aims to associate images of the same person over multiple non-overlapping camera views at different times. Depending on the human operator, manual re-identification in large camera networks is highly time-consuming and error-prone. Automated person re-identification is required due to the extensive quantity of visual data produced by the rapid growth of large-scale distributed multi-camera systems. The state-of-the-art works focus on learning and factorizing person appearance features into latent discriminative factors at multiple semantic levels. We propose Deep Parallel Feature Consensus Network (DeepPFCN), a novel network architecture that learns multi-scale person appearance features using convolutional neural networks. This model factorizes the visual appearance of a person into latent discriminative factors at multiple semantic levels. Finally, consensus is built. The feature representations learned by DeepPFCN are more robust for the person re-identification task, as we learn discriminative scale-specific features and maximize multi-scale feature fusion selections in multi-scale image inputs. We further exploit average and max pooling at separate scales for the person-specific task to discriminate features globally and locally. We demonstrate the re-identification advantages of the proposed DeepPFCN model over the state-of-the-art re-identification methods on three benchmark datasets: Market1501, DukeMTMCreID, and CUHK03. We have achieved mAP results of 75.8%, 64.3%, and 52.6% respectively on these benchmark datasets. |
Tasks | Person Re-Identification |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07776v1 |
https://arxiv.org/pdf/1911.07776v1.pdf | |
PWC | https://paperswithcode.com/paper/deeppfcn-deep-parallel-feature-consensus |
Repo | |
Framework | |
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
Title | Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation |
Authors | Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel |
Abstract | Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing performance improvements that can be obtained from better architectures. One solution to the overfitting problem is increasing the amount of available training data and the variety exhibited by the training data with the help of data augmentation. In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. One of the data augmentation methods comes from the literature, while the two other methods are our own development: a time perturbation in the frequency domain and sub-sequence sampling. Our experiments on Switchboard and Fisher data show state-of-the-art performance for S2S models that are trained solely on the speech training data and do not use additional text data. |
Tasks | Data Augmentation, Sequence-To-Sequence Speech Recognition, Speech Recognition |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13296v2 |
https://arxiv.org/pdf/1910.13296v2.pdf | |
PWC | https://paperswithcode.com/paper/191013296 |
Repo | |
Framework | |
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Title | A Simplified Fully Quantized Transformer for End-to-end Speech Recognition |
Authors | Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh |
Abstract | While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. That being said, in this paper, we work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task. We empirically introduce a more compact Speech-Transformer by investigating the impact of discarding particular modules on the performance of the model. Moreover, we evaluate reducing the numerical precision of our network’s weights and activations while maintaining the performance of the full-precision model. Our experiments show that we can reduce the number of parameters of the full-precision model and then further compress the model 4x by fully quantizing to 8-bit fixed point precision. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03604v4 |
https://arxiv.org/pdf/1911.03604v4.pdf | |
PWC | https://paperswithcode.com/paper/fully-quantizing-a-simplified-transformer-for |
Repo | |
Framework | |
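The 4x figure in the abstract is what one gets from storing each value in 8 bits instead of 32. A minimal sketch of the idea follows; it assumes a 32-bit full-precision baseline and uses symmetric uniform quantization with one shared scale per tensor, which is a common textbook scheme chosen for illustration and not necessarily the scheme used in the paper. The function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [x * scale for x in q]


# Hypothetical weights, for illustration only.
weights = [0.42, -1.27, 0.003, 0.9, -0.65]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest level bounds the error by half a step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
compression = 32 / 8  # bits per value before / after
print(q, f"max error {max_err:.4f}", f"{compression:.0f}x smaller")
```

The reconstruction error per value is at most half of `scale`, so tensors with a few large outliers lose more precision on their small entries. That is one reason practical schemes often use per-channel scales or clipping.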
A Pattern Recognition Method for Partial Discharge Detection on Insulated Overhead Conductors
Title | A Pattern Recognition Method for Partial Discharge Detection on Insulated Overhead Conductors |
Authors | Ming Dong, Jessie Sun, Carl Wang |
Abstract | Today, insulated overhead conductors are increasingly used in many parts of the world due to their higher operational reliability, elimination of phase-to-phase contact, closer distances between phases and stronger protection for animals. However, the standard protection devices are often not able to detect the conductor phase-to-ground fault and the more frequent tree/tree branch hitting conductor events, as these events only lead to partial discharge (PD) activities instead of causing the overcurrent seen on bare conductors. To solve this problem, in recent years, the Technical University of Ostrava (VSB) devised a special meter to measure the voltage signal of the stray electrical field along the insulated overhead conductors, hoping to detect the above hazardous PD activities. In 2018, VSB published a large amount of waveform data recorded by their meter on Kaggle, the world’s largest data science collaboration platform, looking for promising pattern recognition methods for this application. To tackle this challenge, we developed a unique method based on Seasonal and Trend decomposition using Loess (STL) and Support Vector Machine (SVM) to recognize PD activities on insulated overhead conductors. Different SVM kernels were tested and compared. Satisfactory classification rates on the VSB dataset were achieved with the use of the Gaussian radial basis kernel. |
Tasks | |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01588v2 |
https://arxiv.org/pdf/1905.01588v2.pdf | |
PWC | https://paperswithcode.com/paper/a-pattern-recognition-method-for-partial |
Repo | |
Framework | |
Machine-learning non-stationary noise out of gravitational wave detectors
Title | Machine-learning non-stationary noise out of gravitational wave detectors |
Authors | Gabriele Vajente, Yiwen Huang, Maximiliano Isi, Jenne C. Driggers, Jeffrey S. Kissel, Marek J. Szczepanczyk, Salvatore Vitale |
Abstract | Signal extraction out of background noise is a common challenge in high precision physics experiments, where the measurement output is often a continuous data stream. To improve the signal to noise ratio of the detection, witness sensors are often used to independently measure background noises and subtract them from the main signal. If the noise coupling is linear and stationary, optimal techniques already exist and are routinely implemented in many experiments. However, when the noise coupling is non-stationary, linear techniques often fail or are sub-optimal. Inspired by the properties of the background noise in gravitational wave detectors, this work develops a novel algorithm to efficiently characterize and remove non-stationary noise couplings, provided there exist witnesses of the noise source and of the modulation. In this work, the algorithm is described in its most general formulation, and its efficiency is demonstrated with examples from the data of the Advanced LIGO gravitational wave observatory, where we could obtain an improvement of the detector gravitational wave reach without introducing any bias on the source parameter estimation. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09083v3 |
https://arxiv.org/pdf/1911.09083v3.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-non-stationary-noise-out-of |
Repo | |
Framework | |
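For context, the linear and stationary case that the abstract treats as already solved can be sketched in a few lines: estimate a constant coupling coefficient between the witness channel and the main channel by least squares, then subtract the scaled witness. This is only the stationary baseline, not the paper's non-stationary algorithm, and the synthetic signals, the coupling value, and the function name below are all hypothetical.

```python
import math


def fit_coupling(main, witness):
    """Least-squares estimate of k in the model main = signal + k * witness."""
    num = sum(m * w for m, w in zip(main, witness))
    den = sum(w * w for w in witness)
    return num / den


n = 1000
# Hypothetical data: a weak, slow signal buried under a strong noise term
# that a witness sensor also records.
signal = [0.1 * math.sin(0.05 * i) for i in range(n)]
witness = [math.sin(0.37 * i) for i in range(n)]
TRUE_COUPLING = 2.5
main = [s + TRUE_COUPLING * w for s, w in zip(signal, witness)]

k_hat = fit_coupling(main, witness)
cleaned = [m - k_hat * w for m, w in zip(main, witness)]
worst_residual = max(abs(c - s) for c, s in zip(cleaned, signal))
print(f"estimated coupling {k_hat:.4f}, worst residual {worst_residual:.5f}")
```

A single constant coefficient works here because the coupling never changes. If the coupling were modulated over time, this fit would only capture its average, which is the failure mode of linear techniques that the abstract describes.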
Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study
Title | Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study |
Authors | Guowei Xu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu |
Abstract | Learning representation has been proven to be helpful in numerous machine learning tasks. The success of the majority of existing representation learning approaches often requires a large amount of consistent and noise-free labels. However, labels are not accessible in many real-world scenarios and they are usually annotated by the crowds. In practice, the crowdsourced labels are usually inconsistent among crowd workers given their diverse expertise and the number of crowdsourced labels is very limited. Thus, directly adopting crowdsourced labels for existing representation learning algorithms is inappropriate and suboptimal. In this paper, we investigate the above problem and propose a novel framework of Representation Learning with crowdsourced Labels, i.e., “RLL”, which learns representation of data with crowdsourced labels by jointly and coherently solving the challenges introduced by limited and inconsistent labels. The proposed representation learning framework is evaluated in two real-world education applications. The experimental results demonstrate the benefits of our approach on learning representation from limited labeled data from the crowds, and show RLL is able to outperform state-of-the-art baselines. Moreover, detailed experiments are conducted on RLL to fully understand its key components and the corresponding performance. |
Tasks | Representation Learning |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1908.00086v1 |
https://arxiv.org/pdf/1908.00086v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-effective-embeddings-from |
Repo | |
Framework | |
Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles
Title | Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles |
Authors | Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab |
Abstract | We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank. |
Tasks | Cross-Lingual Transfer, Semantic Role Labeling |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03256v1 |
http://arxiv.org/pdf/1904.03256v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-transfer-of-semantic-roles-from |
Repo | |
Framework | |
Cross-lingual transfer learning for spoken language understanding
Title | Cross-lingual transfer learning for spoken language understanding |
Authors | Quynh Ngoc Thi Do, Judith Gaspers |
Abstract | Typically, spoken language understanding (SLU) models are trained on annotated data, which are costly to gather. Aiming to reduce data needs for bootstrapping an SLU system for a new language, we present a simple but effective weight transfer approach using data from another language. The approach is evaluated with our promising multi-task SLU framework developed towards different languages. We evaluate our approach on the ATIS and a real-world SLU dataset, showing that i) our monolingual models outperform the state-of-the-art, ii) we can greatly reduce the amount of data needed for bootstrapping an SLU system for a new language, and iii) while multitask training improves over separate training, different weight transfer settings may work best for different SLU modules. |
Tasks | Cross-Lingual Transfer, Spoken Language Understanding, Transfer Learning |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01825v1 |
http://arxiv.org/pdf/1904.01825v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-lingual-transfer-learning-for-spoken |
Repo | |
Framework | |
Investigation on the generalization of the Sampled Policy Gradient algorithm
Title | Investigation on the generalization of the Sampled Policy Gradient algorithm |
Authors | Nil Stolt Ansó |
Abstract | The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient. It does so by using the critic to evaluate the sampled actions. SPG offers theoretical promise over similar algorithms such as DPG as it searches the action-Q-value space independently of the local gradient, enabling it to avoid local minima. This paper aims to compare SPG to two similar actor-critic algorithms, CACLA and DPG. The comparison is made across two different environments, two different network architectures, as well as training on on-policy transitions in contrast to using an experience buffer. Results seem to show that although SPG often does not perform the worst, it doesn’t always match the performance of the best-performing algorithm at a particular task. Further experiments are required to get a better estimate of the qualities of SPG. |
Tasks | |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.03728v1 |
https://arxiv.org/pdf/1910.03728v1.pdf | |
PWC | https://paperswithcode.com/paper/investigation-on-the-generalization-of-the |
Repo | |
Framework | |
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
Title | How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions |
Authors | Goran Glavas, Robert Litschko, Sebastian Ruder, Ivan Vulic |
Abstract | Cross-lingual word embeddings (CLEs) enable multilingual modeling of meaning and facilitate cross-lingual transfer of NLP models. Despite their ubiquitous usage in downstream tasks, recent increasingly popular projection-based CLE models are almost exclusively evaluated on a single task only: bilingual lexicon induction (BLI). Even BLI evaluations vary greatly, hindering our ability to correctly interpret performance and properties of different CLE models. In this work, we make the first step towards a comprehensive evaluation of cross-lingual word embeddings. We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. We empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI can result in deteriorated downstream performance. We indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess existing baselines, which still display competitive performance across the board. We hope that our work will catalyze further work on CLE evaluation and model analysis. |
Tasks | Cross-Lingual Transfer, Word Embeddings |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00508v1 |
http://arxiv.org/pdf/1902.00508v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-properly-evaluate-cross-lingual-word |
Repo | |
Framework | |
Progressive Sample Mining and Representation Learning for One-Shot Person Re-identification with Adversarial Samples
Title | Progressive Sample Mining and Representation Learning for One-Shot Person Re-identification with Adversarial Samples |
Authors | Hui Li, Jimin Xiao, Mingjie Sun, Eng Gee Lim, Yao Zhao |
Abstract | In this paper, we aim to tackle the one-shot person re-identification problem where only one image is labelled for each person, while other images are unlabelled. This task is challenging due to the lack of sufficient labelled training data. To tackle this problem, we propose to iteratively guess pseudo labels for the unlabelled image samples, which are later used to update the re-identification model together with the labelled samples. A new sampling mechanism is designed to select unlabelled samples as pseudo-labelled samples based on the distance matrix, and to form a training triplet batch including both labelled samples and pseudo-labelled samples. We also design an HSoften-Triplet-Loss to soften the negative impact of incorrect pseudo labels, considering the unreliable nature of pseudo-labelled samples. Finally, we deploy an adversarial learning method to expand the image samples to different camera views. Our experiments show that our framework achieves a new state-of-the-art one-shot Re-ID performance on the Market-1501 (mAP 42.7%) and DukeMTMC-Reid (mAP 40.3%) datasets. Code will be available soon. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00666v1 |
https://arxiv.org/pdf/1911.00666v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-sample-mining-and-representation |
Repo | |
Framework | |
Deep Learning-Based Automatic Downbeat Tracking: A Brief Review
Title | Deep Learning-Based Automatic Downbeat Tracking: A Brief Review |
Authors | Bijue Jia, Jiancheng Lv, Dayiheng Liu |
Abstract | As an important format of multimedia, music has filled almost everyone’s life. Automatically analyzing music is a significant step to satisfy people’s need for music retrieval and music recommendation in an effortless way. In particular, downbeat tracking has been a fundamental and continuous problem in the Music Information Retrieval (MIR) area. Despite significant research efforts, downbeat tracking still remains a challenge. Previous research either focuses on feature engineering (extracting certain features by signal processing, which are semi-automatic solutions) or has some limitations: it can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and has become the primary algorithm in feature learning; the combination of traditional and deep learning methods has also achieved better performance. In this paper, we begin with a background introduction of the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, datasets, and evaluation strategy. In addition, we take a look at the results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as the developments in software implementations. Although much has been achieved in the area of automatic downbeat tracking, some problems still remain. We point out these problems and conclude with possible directions and challenges for future research. |
Tasks | Feature Engineering, Information Retrieval, Music Information Retrieval |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03870v1 |
https://arxiv.org/pdf/1906.03870v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-automatic-downbeat |
Repo | |
Framework | |