January 26, 2020

Paper Group ANR 1449

A Personalized Affective Memory Neural Model for Improving Emotion Recognition

Title A Personalized Affective Memory Neural Model for Improving Emotion Recognition
Authors Pablo Barros, German I. Parisi, Stefan Wermter
Abstract Recent models of emotion recognition strongly rely on supervised deep learning solutions for the distinction of general emotion expressions. However, they are not reliable when recognizing online and personalized facial expressions, e.g., for person-specific affective understanding. In this paper, we present a neural model based on a conditional adversarial autoencoder to learn how to represent and edit general emotion expressions. We then propose Grow-When-Required networks as personalized affective memories to learn individualized aspects of emotion expressions. Our model achieves state-of-the-art performance on emotion recognition when evaluated on in-the-wild datasets. Furthermore, our experiments include ablation studies and neural visualizations in order to explain the behavior of our model.
Tasks Emotion Recognition
Published 2019-04-23
URL http://arxiv.org/abs/1904.12632v1
PDF http://arxiv.org/pdf/1904.12632v1.pdf
PWC https://paperswithcode.com/paper/190412632
Repo
Framework

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Title Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes
Authors Yujia Bao, Zhengyi Deng, Yan Wang, Heeyoon Kim, Victor Diego Armengol, Francisco Acevedo, Nofal Ouardaoui, Cathy Wang, Giovanni Parmigiani, Regina Barzilay, Danielle Braun, Kevin S Hughes
Abstract PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM) which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
Tasks
Published 2019-04-24
URL http://arxiv.org/abs/1904.12617v1
PDF http://arxiv.org/pdf/1904.12617v1.pdf
PWC https://paperswithcode.com/paper/190412617
Repo
Framework

RNN-T For Latency Controlled ASR With Improved Beam Search

Title RNN-T For Latency Controlled ASR With Improved Beam Search
Authors Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer
Abstract Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency compared to a well-tuned hybrid ASR baseline.
Tasks Language Modelling, Speech Recognition
Published 2019-11-05
URL https://arxiv.org/abs/1911.01629v2
PDF https://arxiv.org/pdf/1911.01629v2.pdf
PWC https://paperswithcode.com/paper/rnn-t-for-latency-controlled-asr-with
Repo
Framework

Estimating the Density of States of Boolean Satisfiability Problems on Classical and Quantum Computing Platforms

Title Estimating the Density of States of Boolean Satisfiability Problems on Classical and Quantum Computing Platforms
Authors Tuhin Sahai, Anurag Mishra, Jose Miguel Pasini, Susmit Jha
Abstract Given a Boolean formula $\phi(x)$ in conjunctive normal form (CNF), the density of states counts the number of variable assignments that violate exactly $e$ clauses, for all values of $e$. Thus, the density of states is a histogram of the number of unsatisfied clauses over all possible assignments. This computation generalizes both maximum-satisfiability (MAX-SAT) and model counting problems and not only provides insight into the entire solution space, but also yields a measure for the hardness of the problem instance. Consequently, in real-world scenarios, this problem is typically infeasible even when using state-of-the-art algorithms. While finding an exact answer to this problem is a computationally intensive task, we propose a novel approach for estimating density of states based on the concentration of measure inequalities. The methodology results in a quadratic unconstrained binary optimization (QUBO), which is particularly amenable to quantum annealing-based solutions. We present the overall approach and compare results from the D-Wave quantum annealer against the best-known classical algorithms such as the Hamze-de Freitas-Selby (HFS) algorithm and satisfiability modulo theory (SMT) solvers.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13088v1
PDF https://arxiv.org/pdf/1910.13088v1.pdf
PWC https://paperswithcode.com/paper/estimating-the-density-of-states-of-boolean
Repo
Framework
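
To make the definition in the abstract above concrete, here is a minimal brute-force sketch that builds the density-of-states histogram by enumerating every assignment. It is only an illustration of the quantity being estimated, not the paper's concentration-of-measure or QUBO approach, and it scales as 2^n. The clause encoding (signed integers, DIMACS style), the function name `density_of_states`, and the toy formula are assumptions made for this example.

```python
from itertools import product


def density_of_states(clauses, num_vars):
    """Return a list g where g[e] is the number of assignments
    that violate exactly e clauses (brute force, O(2^n))."""
    g = [0] * (len(clauses) + 1)
    for bits in product([False, True], repeat=num_vars):
        violated = 0
        for clause in clauses:
            # Literal k > 0 means variable k is true; k < 0 means it is false.
            satisfied = any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            if not satisfied:
                violated += 1
        g[violated] += 1
    return g


# Hypothetical toy formula: (x1 or x2) and (not x1 or x2) and (not x2)
example_clauses = [[1, 2], [-1, 2], [-2]]
dos = density_of_states(example_clauses, num_vars=2)
```

In this histogram, `dos[0]` is the model count and the smallest index with a nonzero entry is the MAX-SAT optimum, which is how the density of states generalizes both problems.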

Towards Hardware Implementation of Neural Network-based Communication Algorithms

Title Towards Hardware Implementation of Neural Network-based Communication Algorithms
Authors Fayçal Ait Aoudia, Jakob Hoydis
Abstract There is a recent interest in neural network (NN)-based communication algorithms which have been shown to achieve (beyond) state-of-the-art performance for a variety of problems or lead to reduced implementation complexity. However, most work on this topic is simulation based and implementation on specialized hardware for fast inference, such as field-programmable gate arrays (FPGAs), is widely ignored. In particular for practical uses, NN weights should be quantized and inference carried out by a fixed-point instead of floating-point system, widely used in consumer class computers and graphics processing units (GPUs). Moving to such representations enables higher inference rates and complexity reductions, at the cost of precision loss. We demonstrate that it is possible to implement NN-based algorithms in fixed-point arithmetic with quantized weights at negligible performance loss and with hardware complexity compatible with practical systems, such as FPGAs and application-specific integrated circuits (ASICs).
Tasks
Published 2019-02-19
URL http://arxiv.org/abs/1902.06939v1
PDF http://arxiv.org/pdf/1902.06939v1.pdf
PWC https://paperswithcode.com/paper/towards-hardware-implementation-of-neural
Repo
Framework
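
The idea of quantized weights and fixed-point inference mentioned in the abstract above can be illustrated with a small sketch. This is a generic signed fixed-point scheme, not the authors' hardware design; the word length (16 bits), the number of fractional bits (8), and the helper names `to_fixed`, `fixed_dot`, and `to_float` are all assumptions chosen for the example.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer with saturation."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))


def fixed_dot(weights_q, inputs_q, frac_bits=8):
    """Integer-only dot product; the result keeps frac_bits fractional bits."""
    # Each product carries 2 * frac_bits fractional bits.
    acc = sum(w * x for w, x in zip(weights_q, inputs_q))
    # Shift back to frac_bits (arithmetic shift, rounds toward minus infinity).
    return acc >> frac_bits


def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, -4.0]
w_q = [to_fixed(w) for w in weights]
x_q = [to_fixed(x) for x in inputs]
y = to_float(fixed_dot(w_q, x_q))
```

The values here are exactly representable with 8 fractional bits, so the fixed-point result matches the floating-point dot product; for general weights the difference is the precision loss the abstract refers to.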

Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation

Title Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation
Authors Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, Ekin D. Cubuk
Abstract Deploying machine learning systems in the real world requires both high accuracy on clean data and robustness to naturally occurring corruptions. While architectural advances have led to improved accuracy, building robust models remains challenging. Prior work has argued that there is an inherent trade-off between robustness and accuracy, which is exemplified by standard data augmentation techniques such as Cutout, which improves clean accuracy but not robustness, and additive Gaussian noise, which improves robustness but hurts accuracy. To overcome this trade-off, we introduce Patch Gaussian, a simple augmentation scheme that adds noise to randomly selected patches in an input image. Models trained with Patch Gaussian achieve state of the art on the CIFAR-10 and ImageNet Common Corruptions benchmarks while also improving accuracy on clean data. We find that this augmentation leads to reduced sensitivity to high frequency noise (similar to Gaussian) while retaining the ability to take advantage of relevant high frequency information in the image (similar to Cutout). Finally, we show that Patch Gaussian can be used in conjunction with other regularization methods and data augmentation policies such as AutoAugment, and improves performance on the COCO object detection benchmark.
Tasks Data Augmentation, Object Detection
Published 2019-06-06
URL https://arxiv.org/abs/1906.02611v1
PDF https://arxiv.org/pdf/1906.02611v1.pdf
PWC https://paperswithcode.com/paper/improving-robustness-without-sacrificing
Repo
Framework
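
The augmentation described in the abstract above, adding noise to a randomly selected patch of the input image, might look roughly like the NumPy sketch below. It is a simplified reading of the abstract rather than the authors' implementation: the fixed `patch_size` and `sigma`, the choice of one square patch per image, and the clipping to [0, 1] are assumptions for illustration.

```python
import numpy as np


def patch_gaussian(image, patch_size=16, sigma=0.3, rng=None):
    """Add Gaussian noise inside one randomly placed square patch.

    image: float array in [0, 1] with shape (H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Sample the patch centre; the patch may be cut off at the border.
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    half = patch_size // 2
    y0, y1 = max(0, cy - half), min(h, cy + half)
    x0, x1 = max(0, cx - half), min(w, cx + half)

    out = image.copy()  # leave the caller's array untouched
    noise = rng.normal(0.0, sigma, size=out[y0:y1, x0:x1].shape)
    out[y0:y1, x0:x1] = np.clip(out[y0:y1, x0:x1] + noise, 0.0, 1.0)
    return out


img = np.full((32, 32, 3), 0.5)
aug = patch_gaussian(img, rng=np.random.default_rng(0))
```

Pixels outside the patch are left clean, which is the intuition given in the abstract for keeping Cutout-like accuracy while gaining Gaussian-like robustness.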

Making Bayesian Predictive Models Interpretable: A Decision Theoretic Approach

Title Making Bayesian Predictive Models Interpretable: A Decision Theoretic Approach
Authors Homayun Afrabandpey, Tomi Peltola, Juho Piironen, Aki Vehtari, Samuel Kaski
Abstract A salient approach to interpretable machine learning is to restrict modeling to simple and hence understandable models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users’ preferences, not the data generation mechanism: it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model compromising no accuracy, is constructed and fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model agnostic - neither the interpretable model nor the reference model is restricted to be from a certain class of models - and the optimization problem can be solved using standard tools in the chosen model family. Through experiments on real-world data sets using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the earlier alternative of restricting the prior. We also propose a systematic way to measure stabilities of interpretable models constructed by different interpretability approaches and show that our proposed approach generates more stable models.
Tasks Interpretable Machine Learning
Published 2019-10-21
URL https://arxiv.org/abs/1910.09358v1
PDF https://arxiv.org/pdf/1910.09358v1.pdf
PWC https://paperswithcode.com/paper/making-bayesian-predictive-models
Repo
Framework

Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks

Title Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks
Authors Steffen Eger, Paul Youssef, Iryna Gurevych
Abstract Activation functions play a crucial role in neural networks because they are the nonlinearities which have been attributed to the success story of deep learning. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or ‘discovered’, including LReLU functions and swish. While most works compare newly proposed activation functions on few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first large-scale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function performs most stably across all tasks, the so-called penalized tanh function. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.
Tasks Image Classification
Published 2019-01-09
URL http://arxiv.org/abs/1901.02671v1
PDF http://arxiv.org/pdf/1901.02671v1.pdf
PWC https://paperswithcode.com/paper/is-it-time-to-swish-comparing-deep-learning
Repo
Framework
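
For reference, a few of the activation functions named in the abstract above can be written out as scalar Python functions. The definitions of ReLU, leaky ReLU, and swish are the commonly used ones; the penalized tanh is given in its usual form with negative-side scaling `a = 0.25`, which is an assumption here since the abstract does not state the constant.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # Small nonzero slope on the negative side instead of a hard zero.
    return x if x > 0 else slope * x


def swish(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x / (1.0 + math.exp(-beta * x))


def penalized_tanh(x, a=0.25):
    # tanh on the positive side, a scaled-down tanh on the negative side.
    t = math.tanh(x)
    return t if x > 0 else a * t
```

Unlike ReLU, the penalized tanh is bounded on both sides and saturates for large inputs, which is one structural difference from the ReLU family that the comparison in the paper covers.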

Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

Title Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!
Authors Niels Bruun Ipsen, Lars Kai Hansen
Abstract How does missing data affect our ability to learn signal structures? It has been shown that learning signal structure in terms of principal components is dependent on the ratio of sample size and dimensionality and that a critical number of observations is needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that the effect of missing data is to effectively reduce signal-to-noise ratio rather than - as generally believed - to reduce sample size. The theory predicts a phase transition in the learning curves and this is indeed found both in simulation data and in real datasets.
Tasks
Published 2019-05-02
URL https://arxiv.org/abs/1905.00709v1
PDF https://arxiv.org/pdf/1905.00709v1.pdf
PWC https://paperswithcode.com/paper/phase-transition-in-pca-with-missing-data
Repo
Framework

Distributed Machine Learning through Heterogeneous Edge Systems

Title Distributed Machine Learning through Heterogeneous Edge Systems
Authors Hanpeng Hu, Dan Wang, Chuan Wu
Abstract Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization scheme for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP outperforms existing parameter synchronization models significantly in terms of ML model convergence time, scalability and adaptability to large heterogeneity.
Tasks
Published 2019-11-16
URL https://arxiv.org/abs/1911.06949v1
PDF https://arxiv.org/pdf/1911.06949v1.pdf
PWC https://paperswithcode.com/paper/distributed-machine-learning-through
Repo
Framework

On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator

Title On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator
Authors Qi Cai, Mingyi Hong, Yongxin Chen, Zhaoran Wang
Abstract We study the global convergence of generative adversarial imitation learning for linear quadratic regulators, which is posed as minimax optimization. To address the challenges arising from non-convex-concave geometry, we analyze the alternating gradient algorithm and establish its Q-linear rate of convergence to a unique saddle point, which simultaneously recovers the globally optimal policy and reward function. We hope our results may serve as a small step towards understanding and taming the instability in imitation learning as well as in more general non-convex-concave alternating minimax optimization that arises from reinforcement learning and generative adversarial learning.
Tasks Imitation Learning
Published 2019-01-11
URL http://arxiv.org/abs/1901.03674v1
PDF http://arxiv.org/pdf/1901.03674v1.pdf
PWC https://paperswithcode.com/paper/on-the-global-convergence-of-imitation
Repo
Framework

Coordination of PV Smart Inverters Using Deep Reinforcement Learning for Grid Voltage Regulation

Title Coordination of PV Smart Inverters Using Deep Reinforcement Learning for Grid Voltage Regulation
Authors Changfu Li, Chenrui Jin, Ratnesh Sharma
Abstract Increasing adoption of solar photovoltaic (PV) presents new challenges to the modern power grid due to its variable and intermittent nature. Fluctuating outputs from PV generation can cause the grid to violate voltage operation limits. PV smart inverters (SIs) provide a fast-response method to regulate voltage by modulating real and/or reactive power at the connection point. Yet the existing local autonomous control scheme of SIs is based on local information without coordination, which can lead to suboptimal performance. In this paper, a deep reinforcement learning (DRL) based algorithm is developed and implemented for coordinating multiple SIs. The reward scheme of the DRL is carefully designed to ensure voltage operation limits of the grid are met with more effective utilization of SI reactive power. The proposed DRL agent for voltage control can learn its policy through interaction with massive offline simulations, and adapts to load and solar variations. The performance of the DRL agent is compared against the local autonomous control on the IEEE 37 node system with thousands of scenarios. The results show a properly trained DRL agent can intelligently coordinate different SIs for maintaining grid voltage within allowable ranges, achieving reduction of PV production curtailment, and decreasing system losses.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.05907v1
PDF https://arxiv.org/pdf/1910.05907v1.pdf
PWC https://paperswithcode.com/paper/coordination-of-pv-smart-inverters-using-deep
Repo
Framework

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

Title End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking
Authors Xingjian Du, Mengyao Zhu, Xuan Shi, Xinpeng Zhang, Wen Zhang, Jingdong Chen
Abstract Recently, phase processing has been attracting increasing interest in the speech enhancement community. Some researchers integrate a phase estimation module into speech enhancement models by using complex-valued short-time Fourier transform (STFT) spectrogram based training targets, e.g. Complex Ratio Mask (cRM) [1]. However, masking on the spectrogram would violate its consistency constraints. In this work, we prove that the inconsistency problem enlarges the solution space of the speech enhancement model and causes unintended artifacts. Consistency Spectrogram Masking (CSM) is proposed to estimate the complex spectrogram of a signal with the consistency constraint in a simple but not trivial way. The experiments comparing our CSM based end-to-end model with other methods are conducted to confirm that the CSM accelerates the model training and yields significant improvements in speech quality. From our experimental results, we assured that our method could enha
Tasks Speech Enhancement
Published 2019-01-02
URL http://arxiv.org/abs/1901.00295v1
PDF http://arxiv.org/pdf/1901.00295v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-model-for-speech-enhancement-by
Repo
Framework

Investigating Correlations of Inter-coder Agreement and Machine Annotation Performance for Historical Video Data

Title Investigating Correlations of Inter-coder Agreement and Machine Annotation Performance for Historical Video Data
Authors Kader Pustu-Iren, Markus Mühling, Nikolaus Korfhage, Joanna Bars, Sabrina Bernhöft, Angelika Hörth, Bernd Freisleben, Ralph Ewerth
Abstract Video indexing approaches such as visual concept classification and person recognition are essential to enable fine-grained semantic search in large-scale video archives such as the historical video collection of the former German Democratic Republic (GDR) maintained by the German Broadcasting Archive (DRA). Typically, a lexicon of visual concepts has to be defined for semantic search. However, the definition of visual concepts can be more or less subjective due to individually differing judgments of annotators, which may have an impact on annotation quality and subsequently training of supervised machine learning methods. In this paper, we analyze the inter-coder agreement for historical TV data of the former GDR for visual concept classification and person recognition. The inter-coder agreement is evaluated for a group of expert as well as non-expert annotators in order to determine differences in annotation homogeneity. Furthermore, correlations between visual recognition performance and inter-annotator agreement are measured. In this context, information about image quantity and agreement is used to predict average precision for concept classification. Finally, the expert and non-expert annotations acquired in the study are used to evaluate their influence on person recognition.
Tasks Person Recognition
Published 2019-07-24
URL https://arxiv.org/abs/1907.10450v1
PDF https://arxiv.org/pdf/1907.10450v1.pdf
PWC https://paperswithcode.com/paper/investigating-correlations-of-inter-coder
Repo
Framework

Nonnegative Matrix Factorization with Local Similarity Learning

Title Nonnegative Matrix Factorization with Local Similarity Learning
Authors Chong Peng, Zhao Kang, Chenglizhao Chen, Qiang Cheng
Abstract Existing nonnegative matrix factorization methods focus on learning global structure of the data to construct basis and coefficient matrices, which ignores the local structure that commonly exists among data. In this paper, we propose a new type of nonnegative matrix factorization method, which learns local similarity and clustering in a mutually enhancing way. The learned new representation is more representative in that it better reveals inherent geometric property of the data. Nonlinear expansion is given and efficient multiplicative updates are developed with theoretical convergence guarantees. Extensive experimental results have confirmed the effectiveness of the proposed model.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04150v1
PDF https://arxiv.org/pdf/1907.04150v1.pdf
PWC https://paperswithcode.com/paper/nonnegative-matrix-factorization-with-local
Repo
Framework
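
As background for the multiplicative updates mentioned in the abstract above, the sketch below shows the classic Lee-Seung updates for plain nonnegative matrix factorization under the Frobenius-norm objective. This is the standard baseline, not the local-similarity method the paper proposes; the function name `nmf`, the random initialization, and the small constant `eps` guarding against division by zero are assumptions for this example.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X = np.random.default_rng(1).random((6, 5))
W, H = nmf(X, rank=2)
err = np.linalg.norm(X - W @ H)
```

Because each update multiplies by a nonnegative ratio, no projection step is needed, and the reconstruction error does not increase from one iteration to the next for this objective.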