January 28, 2020

3218 words 16 mins read

Paper Group ANR 991

Two-stage Training for Chinese Dialect Recognition

Title Two-stage Training for Chinese Dialect Recognition
Authors Zongze Ren, Guofu Yang, Shugong Xu
Abstract In this paper, we present a two-stage language identification (LID) system based on a shallow ResNet14 followed by a simple 2-layer recurrent neural network (RNN) architecture, which was used for the Xunfei (iFlyTek) Chinese Dialect Recognition Challenge and won first place among 110 teams. The system first trains an acoustic model (AM) with connectionist temporal classification (CTC) to recognize the given phonetic sequence annotation, and then trains another RNN to classify the dialect category using the intermediate features from the AM as inputs. Compared with a three-stage system we further explore, our results show that the two-stage system can achieve high accuracy for Chinese dialect recognition under both short-utterance and long-utterance conditions with less training time.
Tasks Language Identification
Published 2019-08-06
URL https://arxiv.org/abs/1908.02284v2
PDF https://arxiv.org/pdf/1908.02284v2.pdf
PWC https://paperswithcode.com/paper/two-stage-training-for-chinese-dialect
Repo
Framework
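
The two-stage idea above can be sketched in a few lines of NumPy. This is a toy stand-in, not the paper's system: the feature dimensions, the 10-class dialect count, and the random "pretrained" AM are assumptions; stage 1 in the paper is a ResNet14 trained with CTC, and stage 2 is a 2-layer RNN rather than the mean-pooled linear classifier used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: stand-in "acoustic model" mapping raw frames to intermediate
# features. In the paper this is a CTC-trained ResNet14; here it is a
# fixed random projection, purely for illustration.
W_am = rng.standard_normal((40, 64))

def acoustic_features(frames):           # frames: (T, 40) filterbanks
    return np.tanh(frames @ W_am)        # (T, 64) intermediate features

# Stage 2: dialect classifier trained on the frozen AM features.
# A mean-pooled linear softmax stands in for the paper's 2-layer RNN.
W_cls = rng.standard_normal((64, 10)) * 0.1   # 10 classes (assumed)

def classify_dialect(frames):
    h = acoustic_features(frames).mean(axis=0)  # pool over time
    logits = h @ W_cls
    p = np.exp(logits - logits.max())
    return p / p.sum()

probs = classify_dialect(rng.standard_normal((200, 40)))
```

The key design point is that stage 2 consumes the AM's intermediate features rather than raw audio, so only the small classifier needs training in the second stage.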

Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks

Title Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks
Authors Thomas Mesnard, Gaetan Vignoud, Joao Sacramento, Walter Senn, Yoshua Bengio
Abstract In the past few years, deep learning has transformed artificial intelligence research and led to impressive performance in various difficult tasks. However, it is still unclear how the brain can perform credit assignment across many areas as efficiently as backpropagation does in deep neural networks. In this paper, we introduce a model that relies on a new role for a neuronal inhibitory machinery, referred to as ghost units. By cancelling the feedback coming from the upper layer when no target signal is provided to the top layer, the ghost units enable the network to backpropagate errors and do efficient credit assignment in deep structures. While considering one-compartment neurons and requiring very few biological assumptions, the model is able to approximate the error gradient and achieve good performance on classification tasks. Error backpropagation occurs through the recurrent dynamics of the network and thanks to biologically plausible local learning rules. In particular, it does not require separate feedforward and feedback circuits. Different mechanisms for cancelling the feedback were studied, ranging from complete duplication of the connectivity by long-term processes to online replication of the feedback activity. This reduced system combines the essential elements to have a working biologically abstracted analogue of backpropagation with a simple formulation and proofs of the associated results. Therefore, this model is a step towards understanding how learning and memory are implemented in cortical multilayer structures, but it also raises interesting perspectives for neuromorphic hardware.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.08585v1
PDF https://arxiv.org/pdf/1911.08585v1.pdf
PWC https://paperswithcode.com/paper/ghost-units-yield-biologically-plausible
Repo
Framework
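
The cancellation mechanism can be illustrated with a linear toy model (all sizes and learning rates below are assumptions, not the paper's setup): an inhibitory "ghost" weight matrix is trained with a purely local delta rule to reproduce, and hence cancel, the top-down feedback, so that with no target signal the net feedback is near zero and any residual would carry only the error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_top = 20, 5
W_top = rng.standard_normal((n_top, n_hidden))  # top-down feedback weights
G = np.zeros((n_top, n_hidden))                 # ghost (inhibitory) weights

eta = 0.01
for _ in range(2000):
    h = rng.standard_normal(n_hidden)           # hidden-layer activity
    feedback = W_top @ h                        # excitatory top-down feedback
    ghost = G @ h                               # inhibitory ghost prediction
    G += eta * np.outer(feedback - ghost, h)    # local delta rule

# With no target at the top, excitation and ghost inhibition cancel;
# a target would leave only the error term in the residual.
h = rng.standard_normal(n_hidden)
residual = float(np.linalg.norm(W_top @ h - G @ h))
```

Note that the rule uses only locally available quantities (the feedback signal and the presynaptic activity), which is the biological-plausibility constraint the paper emphasizes.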

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

Title Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Authors Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
Abstract Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods, which suffer from the curse of dimensionality and the curse of modeling. To overcome these issues, Reinforcement Learning (RL) methods are adopted in practice. In this paper, we aim to obtain the optimal admission control policy in a system where different classes of customers are present. Using DP techniques, we prove that it is optimal to admit the $i$-th class of customers only up to a threshold $\tau(i)$, which is a non-increasing function of $i$. Unlike traditional RL algorithms, which do not take the structural properties of the optimal policy into account while learning, we propose a structure-aware learning algorithm which exploits the threshold structure of the optimal policy. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the structure-aware learning algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results also establish the gain in the convergence rate of the proposed algorithm over other RL algorithms. The techniques presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and communication networking.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.10325v1
PDF https://arxiv.org/pdf/1912.10325v1.pdf
PWC https://paperswithcode.com/paper/online-reinforcement-learning-of-optimal
Repo
Framework
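
The threshold structure can be seen on a toy single-class admission-control MDP (all parameter values below are illustrative assumptions, and plain value iteration stands in for the paper's learning algorithm): once the value function is computed, the optimal policy reduces to a single number tau, which is why restricting the search to threshold policies shrinks the policy space so dramatically.

```python
import numpy as np

# Toy admission-control MDP with one customer class: the state is the
# queue length n in {0,...,N}. On an arrival the controller admits
# (earning R) or rejects; every step incurs a holding cost c*n.
N, p, q = 10, 0.4, 0.5            # buffer size, arrival prob, service prob
R, c, gamma = 5.0, 1.0, 0.9       # admit reward, holding cost, discount

V = np.zeros(N + 1)
for _ in range(500):              # value iteration on the full MDP
    Vn = np.empty_like(V)
    for n in range(N + 1):
        stay = gamma * V[n]
        serve = gamma * V[max(n - 1, 0)]
        arrive = max(R + gamma * V[n + 1], stay) if n < N else stay
        Vn[n] = -c * n + p * arrive + q * serve + (1 - p - q) * stay
    V = Vn

# The optimal policy is of threshold type: admit iff n < tau.
admit = [bool(R + gamma * V[n + 1] >= gamma * V[n]) for n in range(N)]
tau = sum(admit)
```

A structure-aware learner only has to search the N+1 threshold policies instead of the 2^N arbitrary admit/reject maps, which is the source of the storage and computational savings the abstract describes.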

Adversarial Training for Multilingual Acoustic Modeling

Title Adversarial Training for Multilingual Acoustic Modeling
Authors Ke Hu, Hasim Sak, Hank Liao
Abstract Multilingual training has been shown to improve acoustic modeling performance by sharing and transferring knowledge in modeling different languages. Knowledge sharing is usually achieved by using common lower-level layers for different languages in a deep neural network. Recently, the domain adversarial network was proposed to reduce domain mismatch of training data and learn domain-invariant features. It is thus worth exploring whether adversarial training can further promote knowledge sharing in multilingual models. In this work, we apply the domain adversarial network to encourage the shared layers of a multilingual model to learn language-invariant features. Bidirectional Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) are used as building blocks. We show that shared layers learned this way contain less language identification information and lead to better performance. In an automatic speech recognition task for seven languages, the resultant acoustic model improves the word error rate (WER) of the multilingual model by 4% relative on average, and the monolingual models by 10%.
Tasks Language Identification, Speech Recognition
Published 2019-06-17
URL https://arxiv.org/abs/1906.07093v1
PDF https://arxiv.org/pdf/1906.07093v1.pdf
PWC https://paperswithcode.com/paper/adversarial-training-for-multilingual
Repo
Framework
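
The core mechanism of domain-adversarial training is the gradient reversal layer sitting between the shared encoder and the language discriminator. A minimal NumPy sketch (the class name and the scaling parameter `lam` are illustrative, not from the paper):

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass; negated, scaled gradient backward.

    Placed between the shared layers and the language discriminator so
    that minimizing the discriminator's loss pushes the shared layers
    toward language-invariant features.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                       # features pass through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out    # reverse the discriminator gradient

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)
g = grl.backward(np.array([0.1, 0.2, -0.3]))
```

Because the forward pass is the identity, the layer costs nothing at inference time; it only flips the sign of the adversarial gradient during training.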

Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets

Title Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Authors Saku Sugawara, Pontus Stenetorp, Kentaro Inui, Akiko Aizawa
Abstract Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems. However, the capacity of datasets to precisely benchmark language understanding has not been assessed. We propose a semi-automated, ablation-based methodology for this challenge: by checking whether questions can be solved even after removing features associated with a skill requisite for language understanding, we evaluate to what degree the questions do not require the skill. Experiments on 10 datasets (e.g., CoQA, SQuAD v2.0, and RACE) with a strong baseline model show that, for example, the relative scores of a baseline model provided with content words only and with shuffled sentence words in the context are on average 89.2% and 78.5% of the original score, respectively. These results suggest that most of the questions already answered correctly by the model do not necessarily require grammatical and complex reasoning. For precise benchmarking, MRC datasets will need to take extra care in their design to ensure that questions can correctly evaluate the intended skills.
Tasks Machine Reading Comprehension, Reading Comprehension
Published 2019-11-21
URL https://arxiv.org/abs/1911.09241v1
PDF https://arxiv.org/pdf/1911.09241v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-benchmarking-capacity-of
Repo
Framework
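
Two of the ablations mentioned in the abstract can be sketched for plain strings: keeping only content words (here approximated by dropping a small, assumed stop-word list) and shuffling the words within each sentence of the context. The paper's actual feature definitions are more careful; this only illustrates the mechanics.

```python
import random

STOP = {"the", "a", "an", "of", "to", "in", "is", "was", "and", "that"}

def content_words_only(context):
    """Ablation 1: drop function words, keep content words."""
    return " ".join(w for w in context.split() if w.lower() not in STOP)

def shuffle_sentence_words(context, seed=0):
    """Ablation 2: destroy word order within each sentence."""
    rng = random.Random(seed)
    out = []
    for sent in context.split(". "):
        words = sent.split()
        rng.shuffle(words)
        out.append(" ".join(words))
    return ". ".join(out)

ctx = "The cat sat on the mat. A dog barked in the yard"
ablated = content_words_only(ctx)
shuffled = shuffle_sentence_words(ctx)
```

If a model's score barely drops under such ablations, the questions evidently did not require the grammatical knowledge the ablation destroyed, which is exactly the paper's diagnostic.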

Analysis of Hydrological and Suspended Sediment Events from Mad River Watershed using Multivariate Time Series Clustering

Title Analysis of Hydrological and Suspended Sediment Events from Mad River Watershed using Multivariate Time Series Clustering
Authors Ali Javed, Scott D. Hamshaw, Donna M. Rizzo, Byung Suk Lee
Abstract Hydrological storm events are a primary driver for transporting water quality constituents such as turbidity, suspended sediments and nutrients. Analyzing the concentration (C) of these water quality constituents in response to increased streamflow discharge (Q), particularly when monitored at high temporal resolution during a hydrological event, helps to characterize the dynamics and flux of such constituents. A conventional approach to storm event analysis is to reduce the C-Q time series to two-dimensional (2-D) hysteresis loops and analyze these 2-D patterns. While effective and informative to some extent, this hysteresis loop approach has limitations because projecting the C-Q time series onto a 2-D plane obscures detail (e.g., temporal variation) associated with the C-Q relationships. In this paper, we address this issue using a multivariate time series clustering approach. Clustering is applied to sequences of river discharge and suspended sediment data (acquired through turbidity-based monitoring) from six watersheds located in the Lake Champlain Basin in the northeastern United States. While clusters of the hydrological storm events using the multivariate time series approach were found to be correlated to 2-D hysteresis loop classifications and watershed locations, the clusters differed from the 2-D hysteresis classifications. Additionally, using available meteorological data associated with storm events, we examine the characteristics of computational clusters of storm events in the study watersheds and identify the features driving the clustering approach.
Tasks Time Series, Time Series Clustering
Published 2019-11-28
URL https://arxiv.org/abs/1911.12466v2
PDF https://arxiv.org/pdf/1911.12466v2.pdf
PWC https://paperswithcode.com/paper/analysis-of-hydrological-and-suspended
Repo
Framework
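
The advantage of clustering the full C-Q time series, rather than a 2-D hysteresis-loop summary, can be illustrated with synthetic storm events whose sediment response lags discharge in opposite directions (clockwise vs. counter-clockwise loops). The event shapes, noise level, and the simple flattened-series k-means below are assumptions for illustration; the paper's clustering methodology is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 30
t = np.linspace(0, 1, T)

def event(lag):
    Q = np.sin(np.pi * t)                # discharge pulse
    C = np.sin(np.pi * (t - lag))        # sediment, lagged: sign sets loop direction
    return np.stack([Q, C], axis=1) + 0.05 * rng.standard_normal((T, 2))

X = np.array([event(0.15) for _ in range(10)] +
             [event(-0.15) for _ in range(10)])
flat = X.reshape(len(X), -1)             # flatten (T, 2) series -> feature vector

def kmeans2(data, iters=50):
    c0 = data[0]                         # deterministic farthest-point init
    c1 = data[((data - c0) ** 2).sum(1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = ((data[:, None] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.stack([data[labels == j].mean(0) for j in range(2)])
    return labels

labels = kmeans2(flat)
```

Because the flattened series keeps every time step of both variables, the temporal lag that distinguishes the two loop directions survives, whereas a 2-D projection would discard it.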

K-TanH: Hardware Efficient Activations For Deep Learning

Title K-TanH: Hardware Efficient Activations For Deep Learning
Authors Abhisek Kundu, Sudarshan Srinivasan, Eric C. Qin, Dhiraj Kalamkar, Naveen K. Mellempudi, Dipankar Das, Kunal Banerjee, Bharat Kaul, Pradeep Dubey
Abstract We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular activation function Tanh for Deep Learning. K-TanH consists of a sequence of parameterized bit/integer operations, such as masking, shift, and add/subtract (no floating-point operations needed), where parameters are stored in a very small look-up table (the bit-masking step can be eliminated). The design of K-TanH is flexible enough to deal with multiple numerical formats, such as FP32 and BFloat16. High-quality approximations to other activation functions, e.g., Swish and GELU, can be derived from K-TanH. We provide an RTL design for K-TanH to demonstrate its area/power/performance efficacy. It is more accurate than existing piecewise approximations for Tanh. For example, K-TanH achieves $\sim 5\times$ speedup and $> 6\times$ reduction in maximum approximation error over a software implementation of Hard TanH. Experimental results for low-precision BFloat16 training of the language translation model GNMT on WMT16 data sets with approximate Tanh and Sigmoid obtained via K-TanH achieve similar accuracy and convergence as training with exact Tanh and Sigmoid.
Tasks
Published 2019-09-17
URL https://arxiv.org/abs/1909.07729v2
PDF https://arxiv.org/pdf/1909.07729v2.pdf
PWC https://paperswithcode.com/paper/k-tanh-hardware-efficient-activations-for
Repo
Framework
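
A look-up-table approximation in the spirit of K-TanH can be sketched in floating point (this is not the paper's bit-level scheme: K-TanH replaces the multiply-add below with parameterized shift/add integer operations, and its table is indexed from the input's exponent/mantissa bits; the interval width and range here are assumptions):

```python
import math

STEP, LIMIT = 0.25, 4.0
TABLE = []                                # per-interval (slope, intercept)
for i in range(int(LIMIT / STEP)):
    x0 = i * STEP
    a = (math.tanh(x0 + STEP) - math.tanh(x0)) / STEP
    b = math.tanh(x0) - a * x0
    TABLE.append((a, b))

def k_tanh(x):
    """One table lookup plus one multiply-add; odd symmetry via sign."""
    s = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x >= LIMIT:
        return s                          # saturate in the flat tail
    a, b = TABLE[int(x / STEP)]
    return s * (a * x + b)

err = max(abs(k_tanh(i * 0.001) - math.tanh(i * 0.001))
          for i in range(-5000, 5001))
```

Even this naive 16-entry table keeps the maximum error below 0.01; the hardware win comes from making the per-interval slope implementable as shifts and adds, so no floating-point multiplier is needed.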

Multi-Rank Sparse and Functional PCA: Manifold Optimization and Iterative Deflation Techniques

Title Multi-Rank Sparse and Functional PCA: Manifold Optimization and Iterative Deflation Techniques
Authors Michael Weylandt
Abstract We consider the problem of estimating multiple principal components using the recently-proposed Sparse and Functional Principal Components Analysis (SFPCA) estimator. We first propose an extension of SFPCA which estimates several principal components simultaneously using manifold optimization techniques to enforce orthogonality constraints. While effective, this approach is computationally burdensome so we also consider iterative deflation approaches which take advantage of existing fast algorithms for rank-one SFPCA. We show that alternative deflation schemes can more efficiently extract signal from the data, in turn improving estimation of subsequent components. Finally, we compare the performance of our manifold optimization and deflation techniques in a scenario where orthogonality does not hold and find that they still lead to significantly improved performance.
Tasks
Published 2019-07-28
URL https://arxiv.org/abs/1907.12012v2
PDF https://arxiv.org/pdf/1907.12012v2.pdf
PWC https://paperswithcode.com/paper/multi-rank-sparse-and-functional-pca-manifold
Repo
Framework
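
The two classic deflation schemes the iterative approach builds on can be sketched with NumPy (power iteration stands in for a rank-one SFPCA solve, and the data are random; which variant extracts later components best under approximate orthogonality is the question the paper studies):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
S = A.T @ A / 50                          # sample covariance

def leading_eigvec(M, iters=200):
    """Power iteration; a stand-in for a rank-one sparse/functional PCA solve."""
    v = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

v1 = leading_eigvec(S)

# Hotelling's deflation: subtract the rank-one component just found.
hotelling = S - (v1 @ S @ v1) * np.outer(v1, v1)

# Projection (Schur) deflation: project the found direction out entirely.
P = np.eye(len(v1)) - np.outer(v1, v1)
projection = P @ S @ P

v2 = leading_eigvec(projection)           # next component, orthogonal to v1
```

Projection deflation annihilates the found direction exactly, so the second solve cannot re-extract it; Hotelling's only zeroes its variance, which can misbehave when the first estimate is inexact.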

Gated Variational AutoEncoders: Incorporating Weak Supervision to Encourage Disentanglement

Title Gated Variational AutoEncoders: Incorporating Weak Supervision to Encourage Disentanglement
Authors Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden
Abstract Variational AutoEncoders (VAEs) provide a means to generate representational latent embeddings. Previous research has highlighted the benefits of achieving representations that are disentangled, particularly for downstream tasks. However, there is some debate about how to encourage disentanglement with VAEs, and evidence indicates that existing implementations of VAEs do not achieve disentanglement consistently. How well a VAE’s latent space has been disentangled is often evaluated against our subjective expectations of which attributes should be disentangled for a given problem. Therefore, by definition, we already have domain knowledge of what should be achieved and yet we use unsupervised approaches to achieve it. We propose a weakly-supervised approach that incorporates any available domain knowledge into the training process to form a Gated-VAE. The process involves partitioning the representational embedding and gating backpropagation. All partitions are utilised on the forward pass but gradients are backpropagated through different partitions according to selected image/target pairings. The approach can be used to modify existing VAE models such as beta-VAE, InfoVAE and DIP-VAE-II. Experiments demonstrate that using gated backpropagation, latent factors are represented in their intended partition. The approach is applied to images of faces for the purpose of disentangling head-pose from facial expression. Quantitative metrics show that using Gated-VAE improves average disentanglement, completeness and informativeness, as compared with un-gated implementations. Qualitative assessment of latent traversals demonstrates its disentanglement of head-pose from expression, even when only weak/noisy supervision is available.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.06443v1
PDF https://arxiv.org/pdf/1911.06443v1.pdf
PWC https://paperswithcode.com/paper/gated-variational-autoencoders-incorporating
Repo
Framework
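
The gating mechanism itself is a gradient mask over latent partitions. A minimal sketch (the partition names and sizes are illustrative assumptions matching the paper's head-pose/expression example, not its actual dimensions):

```python
import numpy as np

# The latent embedding is split into named partitions; on the backward
# pass, gradients flow only through the partition selected for the
# current image/target pairing. The forward pass uses all partitions.
PARTITIONS = {"head_pose": slice(0, 4), "expression": slice(4, 10)}

def gate_gradient(grad, active):
    gated = np.zeros_like(grad)
    s = PARTITIONS[active]
    gated[..., s] = grad[..., s]   # pass-through for the active partition
    return gated                   # zeros elsewhere: those dims are untouched

g = np.ones((2, 10))               # a batch of latent-space gradients
gp = gate_gradient(g, "head_pose")
```

Because only the active partition receives gradient for a given pairing, each factor of variation is steered into its intended slice of the latent code.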

Semi-supervised Domain Adaptation via Minimax Entropy

Title Semi-supervised Domain Adaptation via Minimax Entropy
Authors Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko
Abstract Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. Our base model consists of a feature encoding network, followed by a classification layer that computes the features’ similarity to estimated prototypes (representatives of each class). Adaptation is achieved by alternately maximizing the conditional entropy of unlabeled target data with respect to the classifier and minimizing it with respect to the feature encoder. We empirically demonstrate the superiority of our method over many baselines, including conventional feature alignment and few-shot methods, setting a new state of the art for SSDA.
Tasks Domain Adaptation
Published 2019-04-13
URL https://arxiv.org/abs/1904.06487v5
PDF https://arxiv.org/pdf/1904.06487v5.pdf
PWC https://paperswithcode.com/paper/semi-supervised-domain-adaptation-via-minimax
Repo
Framework
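
The quantity being alternately maximized and minimized is the conditional entropy of the prototype-similarity softmax on unlabeled target data. A NumPy sketch of that objective (the temperature value and dimensions are assumptions; the adversarial alternation itself, done with gradients in the paper, is only described in the docstring):

```python
import numpy as np

def entropy_of_similarity(feats, prototypes, T=0.05):
    """Mean conditional entropy of unlabeled target features.

    The class distribution is a softmax over cosine similarities to
    per-class prototypes, with temperature T. MME ascends this entropy
    w.r.t. the prototypes (classifier step) and descends it w.r.t. the
    feature encoder (encoder step).
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ w.T / T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
H = entropy_of_similarity(rng.standard_normal((32, 16)),
                          rng.standard_normal((4, 16)))
```

The entropy is bounded by log K for K classes, so the classifier step pushes prototypes toward ambiguous target regions while the encoder step pulls target features toward a single prototype.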

Ask to Learn: A Study on Curiosity-driven Question Generation

Title Ask to Learn: A Study on Curiosity-driven Question Generation
Authors Thomas Scialom, Jacopo Staiano
Abstract We propose a novel text generation task, namely Curiosity-driven Question Generation. We start from the observation that the Question Generation task has traditionally been considered as the dual problem of Question Answering, hence tackling the problem of generating a question given the text that contains its answer. Such questions can be used to evaluate machine reading comprehension. However, in real life, and especially in conversational settings, humans tend to ask questions with the goal of enriching their knowledge and/or clarifying aspects of previously gathered information. We refer to these inquisitive questions as Curiosity-driven: these questions are generated with the goal of obtaining new information (the answer) which is not present in the input text. In this work, we experiment on this new task using a conversational Question Answering (QA) dataset; further, since the majority of QA datasets are not built in a conversational manner, we describe a methodology to derive data for this novel task from non-conversational QA data. We investigate several automated metrics to measure the different properties of Curious Questions, and experiment with different approaches on the Curiosity-driven Question Generation task, including model pre-training and reinforcement learning. Finally, we report a qualitative evaluation of the generated outputs.
Tasks Machine Reading Comprehension, Question Answering, Question Generation, Reading Comprehension, Text Generation
Published 2019-11-08
URL https://arxiv.org/abs/1911.03350v1
PDF https://arxiv.org/pdf/1911.03350v1.pdf
PWC https://paperswithcode.com/paper/ask-to-learn-a-study-on-curiosity-driven
Repo
Framework

Towards neural networks that provably know when they don’t know

Title Towards neural networks that provably know when they don’t know
Authors Alexander Meinke, Matthias Hein
Abstract It has recently been shown that ReLU networks produce arbitrarily over-confident predictions far away from the training data. Thus, ReLU networks do not know when they don’t know. However, this is a highly important property in safety critical applications. In the context of out-of-distribution detection (OOD) there have been a number of proposals to mitigate this problem but none of them are able to make any mathematical guarantees. In this paper we propose a new approach to OOD which overcomes both problems. Our approach can be used with ReLU networks and provides provably low confidence predictions far away from the training data as well as the first certificates for low confidence predictions in a neighborhood of an out-distribution point. In the experiments we show that state-of-the-art methods fail in this worst-case setting whereas our model can guarantee its performance while retaining state-of-the-art OOD performance.
Tasks Out-of-Distribution Detection
Published 2019-09-26
URL https://arxiv.org/abs/1909.12180v2
PDF https://arxiv.org/pdf/1909.12180v2.pdf
PWC https://paperswithcode.com/paper/towards-neural-networks-that-provably-know
Repo
Framework
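
The flavor of the guarantee can be illustrated with a toy construction (this is not the paper's exact method: here both density models are single Gaussians and the blend is a simple logistic in the log-density ratio, all assumed for illustration): classifier confidence is interpolated toward the uniform distribution 1/K as the in-distribution density falls away, so confidence provably decays far from the training data.

```python
import numpy as np

K = 3
mu_in, mu_out = np.zeros(2), np.zeros(2)
var_in, var_out = 1.0, 100.0              # out-model is much broader

def log_gauss(x, mu, var):
    d = x - mu
    return -0.5 * (d @ d) / var - len(x) / 2 * np.log(2 * np.pi * var)

def confidence(x, class_probs):
    """Blend classifier output with uniform via the density ratio."""
    li = log_gauss(x, mu_in, var_in)
    lo = log_gauss(x, mu_out, var_out)
    w = 0.5 * (1.0 + np.tanh(0.5 * (li - lo)))  # stable P(in-dist | x)
    p = w * class_probs + (1 - w) * np.ones(K) / K
    return float(p.max())

near = confidence(np.array([0.3, -0.2]), np.array([0.9, 0.05, 0.05]))
far  = confidence(np.array([50.0, 50.0]), np.array([0.9, 0.05, 0.05]))
```

Near the data the classifier's confidence survives; far away the maximum class probability collapses to exactly 1/K, which is the worst-case behavior the paper certifies (for ReLU networks, with mathematical guarantees rather than this toy blend).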

English Broadcast News Speech Recognition by Humans and Machines

Title English Broadcast News Speech Recognition by Humans and Machines
Authors Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko
Abstract With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similar challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human recognition performance on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space, to reach human performance levels.
Tasks Speech Recognition
Published 2019-04-30
URL http://arxiv.org/abs/1904.13258v1
PDF http://arxiv.org/pdf/1904.13258v1.pdf
PWC https://paperswithcode.com/paper/english-broadcast-news-speech-recognition-by
Repo
Framework
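
The 6.5%/5.9% (machine) and 3.6%/2.8% (human) figures are word error rates: the Levenshtein distance between hypothesis and reference word sequences (substitutions + insertions + deletions), divided by the reference length. A compact stdlib implementation:

```python
def wer(reference, hypothesis):
    """Word error rate via dynamic-programming edit distance."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# one substitution ("quik") + one insertion ("jumps") over 4 ref words
score = wer("the quick brown fox", "the quik brown fox jumps")
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is an error rate rather than an accuracy complement.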

Learning Hierarchical Teaching Policies for Cooperative Agents

Title Learning Hierarchical Teaching Policies for Cooperative Agents
Authors Dong-Ki Kim, Miao Liu, Shayegan Omidshafiei, Sebastian Lopez-Cot, Matthew Riemer, Golnaz Habibi, Gerald Tesauro, Sami Mourad, Murray Campbell, Jonathan P. How
Abstract Collective learning can be greatly enhanced when agents effectively exchange knowledge with their peers. In particular, recent work studying agents that learn to teach other teammates has demonstrated that action advising accelerates team-wide learning. However, the prior work has simplified the learning of advising policies by using simple function approximations and only considered advising with primitive (low-level) actions, limiting the scalability of learning and teaching to complex domains. This paper introduces a novel learning-to-teach framework, called hierarchical multiagent teaching (HMAT), that improves scalability to complex environments by using the deep representation for student policies and by advising with more expressive extended action sequences over multiple levels of temporal abstraction. Our empirical evaluations demonstrate that HMAT improves team-wide learning progress in large, complex domains where previous approaches fail. HMAT also learns teaching policies that can effectively transfer knowledge to different teammates with knowledge of different tasks, even when the teammates have heterogeneous action spaces.
Tasks Transfer Learning
Published 2019-03-07
URL https://arxiv.org/abs/1903.03216v5
PDF https://arxiv.org/pdf/1903.03216v5.pdf
PWC https://paperswithcode.com/paper/learning-hierarchical-teaching-in-cooperative
Repo
Framework

Adaptive Wind Driven Optimization Trained Artificial Neural Networks

Title Adaptive Wind Driven Optimization Trained Artificial Neural Networks
Authors Zikri Bayraktar
Abstract This paper presents the application of a newly developed nature-inspired metaheuristic optimization method, the Adaptive Wind Driven Optimization (AWDO), to the training of feedforward artificial neural networks (NNs), and discusses future research on implementing AWDO in Deep Learning (DL). An application example of digit classification on the MNIST dataset reveals interesting behavior of the derivative-free AWDO method compared to the steepest-descent method; results and future work on implementing AWDO in deep neural networks are discussed.
Tasks
Published 2019-11-20
URL https://arxiv.org/abs/1911.08942v1
PDF https://arxiv.org/pdf/1911.08942v1.pdf
PWC https://paperswithcode.com/paper/adaptive-wind-driven-optimization-trained
Repo
Framework
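
The derivative-free character of such training can be sketched with a heavily simplified, WDO-flavored population optimizer on a toy objective (this is not the AWDO update: real WDO has pressure-ranked gravitational, friction, and Coriolis-like terms, and AWDO adapts those coefficients online; only the population/velocity flavor is kept here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy stand-in for a NN training loss; no gradients are used."""
    return float((x ** 2).sum())

pop = rng.uniform(-5, 5, (20, 4))        # "air parcels" = candidate weights
vel = np.zeros_like(pop)
best = min(pop, key=sphere).copy()
initial_best = sphere(best)

for _ in range(200):
    pull = best - pop                    # drift toward the best parcel
    vel = 0.7 * vel + 0.3 * pull + 0.1 * rng.standard_normal(pop.shape)
    pop = np.clip(pop + vel, -5, 5)      # keep parcels inside the domain
    cand = min(pop, key=sphere)
    if sphere(cand) < sphere(best):
        best = cand.copy()

final_best = sphere(best)
```

Because only function evaluations are needed, such a method applies unchanged to non-differentiable losses, which is the property the paper contrasts against steepest descent.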