January 29, 2020

Paper Group ANR 758

Deep Spiking Neural Network with Spike Count based Learning Rule. Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods. Multimodal Age and Gender Classification Using Ear and Profile Face Images. Synthetic Datasets for Neural Program Synthesis. PDH : Probabilistic deep hashing based on MAP estimation of Hamming dista …

Deep Spiking Neural Network with Spike Count based Learning Rule

Title Deep Spiking Neural Network with Spike Count based Learning Rule
Authors Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, Haizhou Li
Abstract Deep spiking neural networks (SNNs) support asynchronous event-driven computation and massive parallelism, and demonstrate great potential for better energy efficiency than their synchronous analog counterparts. However, insufficient attention has been paid to neural encoding when designing SNN learning rules. Remarkably, temporal credit assignment has been performed on rate-coded spiking inputs, leading to poor learning efficiency. In this paper, we introduce a novel spike-based learning rule for rate-coded deep SNNs, whereby the spike count of each neuron is used as a surrogate for gradient backpropagation. We evaluate the proposed learning rule by training a deep spiking multi-layer perceptron (MLP) and a spiking convolutional neural network (CNN) on the UCI machine learning and MNIST handwritten digit datasets. We show that the proposed learning rule achieves state-of-the-art accuracies on all benchmark datasets. The proposed learning rule allows latency, spike rate and hardware constraints to be introduced into SNN learning, which is superior to the indirect approach in which conventional artificial neural networks are first trained and then converted to SNNs. Hence, it allows direct deployment to neuromorphic hardware and supports efficient inference. Notably, a test accuracy of 98.40% was achieved on the MNIST dataset in our experiments with only 10 simulation time steps, when the same latency constraint is imposed during training.
Tasks
Published 2019-02-15
URL http://arxiv.org/abs/1902.05705v1
PDF http://arxiv.org/pdf/1902.05705v1.pdf
PWC https://paperswithcode.com/paper/deep-spiking-neural-network-with-spike-count
Repo
Framework
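The abstract does not spell out the neuron model, so the sketch below rests on an assumption: a non-leaky integrate-and-fire neuron with reset-by-subtraction, simulated for T = 10 time steps as in the MNIST result. It only illustrates why a spike count can stand in for an activation value; it is not the authors' learning rule, and `if_spike_count` and its parameters are hypothetical.

```python
def if_spike_count(input_current: float, threshold: float = 1.0, time_steps: int = 10) -> int:
    """Count spikes from a non-leaky integrate-and-fire neuron (hypothetical helper).

    The neuron receives the same input current at every time step and can
    emit at most one spike per step.
    """
    membrane = 0.0
    spikes = 0
    for _ in range(time_steps):
        membrane += input_current      # integrate the input current
        if membrane >= threshold:      # fire once the threshold is reached
            spikes += 1
            membrane -= threshold      # reset by subtraction keeps the residual
    return spikes


# Spike counts over 10 steps for a few constant inputs (threshold 1.0).
for current in (0.0, 0.25, 0.5, 1.0):
    print(current, if_spike_count(current))
```

For inputs 0.0, 0.25, 0.5 and 1.0 the loop prints counts of 0, 2, 5 and 10: roughly T times the input, capped at T, which is the kind of monotone, rate-like relation that makes a spike count usable as a surrogate activation.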

Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods

Title Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods
Authors Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich
Abstract Automatically measuring the degree of a speaker's sincerity is a novel research problem in computational paralinguistics. This paper proposes covariance-based feature vectors to model speech and ensembles of support vector regressors to estimate the degree of sincerity of a speaker. The elements of each covariance vector are pairwise statistics between the short-term feature components. These features are used alone as well as in combination with the ComParE acoustic feature set. Experimental results on the development set of the Sincerity Speech Corpus, using a cross-validation procedure, show an 8.1% relative improvement in Spearman’s correlation coefficient over the baseline system.
Tasks
Published 2019-04-26
URL http://arxiv.org/abs/1904.11641v1
PDF http://arxiv.org/pdf/1904.11641v1.pdf
PWC https://paperswithcode.com/paper/speaker-sincerity-detection-based-on
Repo
Framework
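As a hedged illustration of "pairwise statistics between the short-term feature components", the sketch below assumes those statistics are sample covariances: each utterance is a sequence of frames holding D short-term features, and the utterance-level vector is the upper triangle (with diagonal) of their D x D covariance matrix, giving D(D+1)/2 elements. The function name and input layout are hypothetical, and the paper may use a different statistic or normalization.

```python
from typing import List, Sequence


def covariance_feature_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle (with diagonal) of the sample covariance matrix
    of short-term feature frames into a single utterance-level vector.

    frames: N frames, each holding the same D short-term features.
    Returns D * (D + 1) / 2 values, one per pair (i, j) with i <= j.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    vector = []
    for i in range(dim):
        for j in range(i, dim):
            # Unbiased sample covariance between feature components i and j.
            cov = sum((f[i] - means[i]) * (f[j] - means[j]) for f in frames) / (n - 1)
            vector.append(cov)
    return vector
```

Because the vector length depends only on D, utterances of any duration map to fixed-length inputs, which is what a regressor such as the support vector regressors mentioned above needs.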

Multimodal Age and Gender Classification Using Ear and Profile Face Images

Title Multimodal Age and Gender Classification Using Ear and Profile Face Images
Authors Dogucan Yaman, Fevziye Irem Eyiokur, Hazım Kemal Ekenel
Abstract In this paper, we present multimodal deep neural network frameworks for age and gender classification, which take as input a profile face image as well as an ear image. Our main objective is to enhance the accuracy of soft biometric trait extraction from profile face images by additionally utilizing a promising biometric modality: ear appearance. For this purpose, we proposed end-to-end multimodal deep learning frameworks. We explored different multimodal strategies by employing data-, feature-, and score-level fusion. To increase the representation and discrimination capability of the deep neural networks, we benefited from domain adaptation and employed center loss in addition to softmax loss. We conducted extensive experiments on the UND-F, UND-J2, and FERET datasets. Experimental results indicated that profile face images contain a rich source of information for age and gender classification. We found that the presented multimodal system achieves very high age and gender classification accuracies. Moreover, we attained superior results compared to the state-of-the-art profile face image or ear image-based age and gender classification methods.
Tasks Age And Gender Classification, Domain Adaptation
Published 2019-07-23
URL https://arxiv.org/abs/1907.10081v1
PDF https://arxiv.org/pdf/1907.10081v1.pdf
PWC https://paperswithcode.com/paper/multimodal-age-and-gender-classification
Repo
Framework

Synthetic Datasets for Neural Program Synthesis

Title Synthetic Datasets for Neural Program Synthesis
Authors Richard Shin, Neel Kant, Kavi Gupta, Christopher Bender, Brandon Trabucco, Rishabh Singh, Dawn Song
Abstract The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
Tasks Program Synthesis
Published 2019-12-27
URL https://arxiv.org/abs/1912.12345v1
PDF https://arxiv.org/pdf/1912.12345v1.pdf
PWC https://paperswithcode.com/paper/synthetic-datasets-for-neural-program-1
Repo
Framework

PDH : Probabilistic deep hashing based on MAP estimation of Hamming distance

Title PDH : Probabilistic deep hashing based on MAP estimation of Hamming distance
Authors Yosuke Kaga, Masakazu Fujio, Kenta Takahashi, Tetsushi Ohki, Masakatsu Nishigaki
Abstract With the growth of images on the web, hashing, which enables high-speed image retrieval, has been actively studied. In recent years, various hashing methods based on deep neural networks have been proposed and have achieved higher precision than other hashing methods. In these methods, multiple losses for the hash codes and the parameters of the neural networks are defined, and the networks generate hash codes that minimize the weighted sum of the losses. Therefore, an expert has to tune the weights of the losses heuristically, and the probabilistic optimality of the loss function cannot be explained. In order to generate explainable hash codes without weight tuning, we theoretically derive a single loss function with no hyperparameters for the hash code from the probability distribution of the images. By generating hash codes that minimize this loss function, highly accurate image retrieval with probabilistic optimality is performed. We evaluate the performance of hashing using MNIST, CIFAR-10 and SVHN, and show that the proposed method outperforms state-of-the-art hashing methods.
Tasks Image Retrieval
Published 2019-05-21
URL https://arxiv.org/abs/1905.08501v1
PDF https://arxiv.org/pdf/1905.08501v1.pdf
PWC https://paperswithcode.com/paper/pdh-probabilistic-deep-hashing-based-on-map
Repo
Framework
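The loss derived in PDH is not reproduced here. The sketch below only shows the retrieval step that binary hash codes serve in general: codes are stored as non-negative integers, and database items are ranked by Hamming distance (the number of differing bits) to the query code. The function names and the integer encoding are assumptions made for illustration.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query code
    (ties broken by index)."""
    return sorted(range(len(database)),
                  key=lambda i: (hamming_distance(query, database[i]), i))


codes = [0b1010, 0b0101, 0b1000]
print(rank_by_hamming(0b1010, codes))  # [0, 2, 1]
```

XOR followed by a bit count is what makes hash-based retrieval fast: comparing two codes costs a couple of machine operations instead of a floating-point distance over a long feature vector.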

Neural Large Neighborhood Search for the Capacitated Vehicle Routing Problem

Title Neural Large Neighborhood Search for the Capacitated Vehicle Routing Problem
Authors André Hottung, Kevin Tierney
Abstract Learning how to automatically solve optimization problems has the potential to provide the next big leap in optimization technology. The performance of automatically learned heuristics on routing problems has been steadily improving in recent years, but approaches based purely on machine learning are still outperformed by state-of-the-art optimization methods. To close this performance gap, we propose a novel large neighborhood search (LNS) framework for vehicle routing that integrates learned heuristics for generating new solutions. The learning mechanism is based on a deep neural network with an attention mechanism and has been especially designed to be integrated into an LNS search setting. We evaluate our approach on the capacitated vehicle routing problem (CVRP) and the split delivery vehicle routing problem (SDVRP). On CVRP instances with up to 297 customers our approach significantly outperforms an LNS that uses only handcrafted heuristics and a well-known heuristic from the literature. Furthermore, we show for the CVRP and the SDVRP that our approach surpasses the performance of existing machine learning approaches and comes close to the performance of state-of-the-art optimization approaches.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09539v1
PDF https://arxiv.org/pdf/1911.09539v1.pdf
PWC https://paperswithcode.com/paper/neural-large-neighborhood-search-for-the
Repo
Framework

Multi-Armed Bandit for Energy-Efficient and Delay-Sensitive Edge Computing in Dynamic Networks with Uncertainty

Title Multi-Armed Bandit for Energy-Efficient and Delay-Sensitive Edge Computing in Dynamic Networks with Uncertainty
Authors Saeed Ghoorchian, Setareh Maghsudi
Abstract In the edge computing paradigm, mobile devices offload computational tasks to an edge server by routing the required data over the wireless network. The full potential of edge computing is realized only if a smart device selects the most appropriate server, in terms of latency and energy consumption, among the many available. The server selection problem is challenging due to the randomness of the environment and the lack of prior information about it. Therefore, a smart device that sequentially chooses a server under uncertainty aims to improve its decisions based on the historical time and energy consumption. The problem becomes more complicated in a dynamic environment, where key variables might undergo abrupt changes. To deal with this problem, we first analyze the time and energy required for data transmission and processing. We then use the analysis to cast the problem as a budget-limited multi-armed bandit problem, where each arm is associated with a reward and a cost with time-variant statistical characteristics. We propose a policy to solve the formulated problem and prove a regret bound. Numerical results demonstrate the superiority of the proposed method compared to a number of existing solutions.
Tasks
Published 2019-04-12
URL https://arxiv.org/abs/1904.06258v2
PDF https://arxiv.org/pdf/1904.06258v2.pdf
PWC https://paperswithcode.com/paper/multi-armed-bandit-for-energy-efficient-and
Repo
Framework

Learning Patient Engagement in Care Management: Performance vs. Interpretability

Title Learning Patient Engagement in Care Management: Performance vs. Interpretability
Authors Subhro Das, Chandramouli Maduri, Ching-Hua Chen, Pei-Yun S. Hsueh
Abstract The health outcomes of high-need patients can be substantially influenced by the degree of patient engagement in their own care. The role of care managers includes enrolling patients into care programs and keeping them sufficiently engaged in the program, so that patients can attain various goals. The attainment of these goals is expected to improve the patients’ health outcomes. In this paper, we present a real-world, data-driven method and a behavioral engagement scoring pipeline for scoring the engagement level of a patient in two regards: (1) their interest in enrolling into a relevant care program, and (2) their interest in and commitment to program goals. We use this score to predict a patient’s propensity to respond (i.e., to a call for enrollment into a program, or to an assigned program goal). Using real-world care management data, we show that our scoring method successfully predicts patient engagement. We also show that we are able to provide interpretable insights to care managers, using prototypical patients as a point of reference, without sacrificing prediction performance.
Tasks
Published 2019-06-19
URL https://arxiv.org/abs/1906.08339v1
PDF https://arxiv.org/pdf/1906.08339v1.pdf
PWC https://paperswithcode.com/paper/learning-patient-engagement-in-care
Repo
Framework

Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition

Title Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition
Authors Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda
Abstract In this paper, we describe the Maximum Uniformity of Distribution (MUD) algorithm with the power-law nonlinearity. In this approach, we hypothesize that neural network training will become more stable if the feature distribution is not too skewed. We propose two different types of MUD approaches: power function-based MUD and histogram-based MUD. In these approaches, we first obtain the mel filterbank coefficients and apply a nonlinearity function to each filterbank channel. With power function-based MUD, we apply a power function-based nonlinearity whose coefficients are chosen to maximize the likelihood under the assumption that the nonlinearity outputs follow a uniform distribution. With histogram-based MUD, the empirical Cumulative Distribution Function (CDF) from the training database is employed to transform the original distribution into a uniform distribution. In MUD processing, we do not use any prior knowledge (e.g., a logarithmic relation) about the relation between the energy of the incoming signal and the intensity perceived by a human. Experimental results using an end-to-end speech recognition system demonstrate that power function-based MUD gives better results than conventional Mel-Frequency Cepstral Coefficients (MFCCs). On the LibriSpeech database, we achieved 4.02% WER on test-clean and 13.34% WER on test-other without using any language models (LMs). The major contribution of this work is a new algorithm for designing the compressive nonlinearity in a data-driven way, which is much more flexible than previous approaches and may be extended to other domains as well.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2019-12-22
URL https://arxiv.org/abs/1912.11041v1
PDF https://arxiv.org/pdf/1912.11041v1.pdf
PWC https://paperswithcode.com/paper/power-law-nonlinearity-with-maximally-uniform
Repo
Framework
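A minimal sketch of the histogram-based idea, assuming one mapping per filterbank channel: the empirical CDF of the training values maps each new value to the fraction of training values less than or equal to it, so values drawn from the training distribution come out approximately uniform on [0, 1]. `fit_empirical_cdf` is a hypothetical name, and the abstract does not describe how the paper handles histogram bins or interpolation.

```python
import bisect
from typing import Callable, Sequence


def fit_empirical_cdf(training_values: Sequence[float]) -> Callable[[float], float]:
    """Build a mapping from raw values to [0, 1] using the empirical CDF of
    training data for one channel (hypothetical helper)."""
    sorted_values = sorted(training_values)
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one training value")

    def transform(x: float) -> float:
        # Fraction of training values that are <= x.
        return bisect.bisect_right(sorted_values, x) / n

    return transform


# One transform per filterbank channel; the values here are made up.
channel_transform = fit_empirical_cdf([0.1, 0.4, 0.4, 2.0, 7.5])
print([channel_transform(v) for v in (0.0, 0.4, 1.0, 10.0)])
```

This step function is the simplest possible estimate; a practical system would likely interpolate between training points so that unseen values between them do not collapse onto the same output.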

Neural Graph Matching Network: Learning Lawler’s Quadratic Assignment Problem with Extension to Hypergraph and Multiple-graph Matching

Title Neural Graph Matching Network: Learning Lawler’s Quadratic Assignment Problem with Extension to Hypergraph and Multiple-graph Matching
Authors Runzhong Wang, Junchi Yan, Xiaokang Yang
Abstract Graph matching involves combinatorial optimization based on an edge-to-edge affinity matrix, which can generally be formulated as Lawler’s Quadratic Assignment Problem (QAP). This paper presents a QAP network that learns directly with the affinity matrix (equivalently, the association graph), whereby the matching problem is translated into a vertex classification task. The association graph is learned by an embedding network for vertex classification, followed by Sinkhorn normalization and a cross-entropy loss for end-to-end learning. We further improve the embedding model on the association graph by introducing a Sinkhorn-based matching-aware constraint, as well as dummy nodes to deal with graphs of unequal sizes. To the best of our knowledge, this is the first network to learn directly with the general Lawler’s QAP. In contrast, recent deep matching methods focus on learning node and edge features in the two graphs separately. We also show how to extend our network to hypergraph matching and to the matching of multiple graphs. Experimental results on both synthetic graphs and real-world images show its effectiveness. For pure QAP tasks on synthetic data and the QAPLIB benchmark, our method performs competitively with, and even surpasses, state-of-the-art graph matching and QAP solvers at notably lower time cost. Source code will be made public at https://github.com/Thinklab-SJTU/
Tasks Combinatorial Optimization, Graph Matching, Hypergraph Matching
Published 2019-11-26
URL https://arxiv.org/abs/1911.11308v2
PDF https://arxiv.org/pdf/1911.11308v2.pdf
PWC https://paperswithcode.com/paper/neural-graph-matching-network-learning
Repo
Framework
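Sinkhorn normalization, named in the abstract, alternately rescales the rows and columns of a square matrix with positive entries until it is approximately doubly stochastic (every row and every column sums to 1). The sketch below is the plain iterative version on a list-of-lists matrix; the iteration count is an arbitrary assumption, and in the paper the normalization operates on the embedding network's outputs, which are not modeled here.

```python
from typing import List


def sinkhorn(matrix: List[List[float]], iterations: int = 50) -> List[List[float]]:
    """Alternately normalize rows and columns of a square matrix with strictly
    positive entries so that it approaches a doubly stochastic matrix."""
    m = [row[:] for row in matrix]  # work on a copy
    size = len(m)
    for _ in range(iterations):
        for row in m:                       # make every row sum to 1
            total = sum(row)
            for j in range(size):
                row[j] /= total
        for j in range(size):               # make every column sum to 1
            total = sum(row[j] for row in m)
            for row in m:
                row[j] /= total
    return m


scores = [[1.0, 2.0], [3.0, 4.0]]
for row in sinkhorn(scores, iterations=200):
    print([round(x, 4) for x in row])
```

The result is a soft assignment: each entry can be read as how strongly one vertex is matched to another, and it is differentiable with respect to the input scores, which is what allows a cross-entropy loss on top of it for end-to-end training.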

Implementation of an Index Optimize Technology for Highly Specialized Terms based on the Phonetic Algorithm Metaphone

Title Implementation of an Index Optimize Technology for Highly Specialized Terms based on the Phonetic Algorithm Metaphone
Authors V. Buriachok, M. Hadzhyiev, V. Sokolov, P. Skladannyi, L. Kuzmenko
Abstract When compiling databases, for example to meet the needs of healthcare establishments, a common problem arises with the entry and further processing of the first and last names of doctors and patients that are highly specialized in terms of both pronunciation and spelling. This is because people's first and last names are not necessarily unique, their spelling is not subject to any rules of phonetics, and their length may differ between languages. With the advent of the Internet, this situation has become critical and can lead to multiple copies of e-mails being sent to one address. This problem can be solved by using phonetic algorithms for comparing words (Daitch-Mokotoff, Soundex, NYSIIS, Polyphone, and Metaphone), as well as the Levenshtein and Jaro algorithms and Q-gram-based algorithms, which make it possible to compute distances between words. The most widespread among them are the Soundex and Metaphone algorithms, which are designed to index words based on their sound, taking into consideration the rules of pronunciation. By applying the Metaphone algorithm, an attempt has been made to optimize phonetic search for fuzzy-matching tasks, for example data deduplication in various databases and registries, in order to reduce the number of errors caused by incorrectly entered last names. An analysis of the most common last names reveals that some of them are of Ukrainian or Russian origin. At the same time, the rules by which names are pronounced and written, for example in Ukrainian, differ radically from the basic algorithms for English and differ quite significantly from those for Russian. That is why a phonetic algorithm should first of all take into consideration the peculiarities in the formation of Ukrainian last names, which is of special relevance now.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1911.00152v1
PDF https://arxiv.org/pdf/1911.00152v1.pdf
PWC https://paperswithcode.com/paper/implementation-of-an-index-optimize
Repo
Framework
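The abstract centers on Metaphone, whose rule set is too long for a short example. As a simpler stand-in from the same family, the sketch below implements American Soundex, which the abstract also names: it keeps the first letter, maps consonants to digit classes, collapses adjacent duplicates (letters separated only by h or w count as adjacent), and pads or truncates the key to four characters. It assigns digits only to Latin letters, which is exactly the kind of limitation the authors raise for Ukrainian surnames; it is not their adapted algorithm.

```python
# Digit classes of American Soundex; vowels, y, h and w have no digit.
SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex key of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_DIGITS.get(letter, "")
        if digit:
            if digit != previous:   # skip a repeat of the preceding code
                key += digit
            previous = digit
        elif letter not in "hw":
            previous = ""           # a vowel (or y) separates equal codes
        # h and w are skipped and do not reset `previous`
        if len(key) == 4:
            break
    return (key + "000")[:4]


for surname in ("Robert", "Rupert", "Tymczak", "Ashcraft"):
    print(surname, soundex(surname))
```

Names that sound alike in English, such as Robert and Rupert, share the key R163, so a deduplication pass can group records by key before applying a finer distance such as Levenshtein.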

Fusion of Heterogeneous Earth Observation Data for the Classification of Local Climate Zones

Title Fusion of Heterogeneous Earth Observation Data for the Classification of Local Climate Zones
Authors Guichen Zhang, Pedram Ghamisi, Xiao Xiang Zhu
Abstract This paper proposes a novel framework for fusing multi-temporal, multispectral satellite images and OpenStreetMap (OSM) data for the classification of local climate zones (LCZs). Feature stacking is the most commonly used method of data fusion, but its main drawback is that it does not consider the heterogeneity of multimodal optical images and OSM data. The proposed framework processes the two data sources separately and then combines them at the model level through two fusion models (the landuse fusion model and the building fusion model), which fuse optical images with the landuse and building layers of OSM data, respectively. In addition, a new approach to detecting building incompleteness in OSM data is proposed. The proposed framework was trained and tested using data from the 2017 IEEE GRSS Data Fusion Contest, and further validated on an additional test set of manually labeled samples from Munich and New York. Experimental results indicate that, compared to the feature stacking-based baseline framework, the proposed framework is effective in fusing optical images with OSM data for the classification of LCZs, with high generalization capability on a large scale. The classification accuracy of the proposed framework exceeds that of the baseline framework by more than 6% and 2% on the test set of the 2017 IEEE GRSS Data Fusion Contest and the additional test set, respectively. In addition, the proposed framework is less sensitive to the spectral diversity of optical satellite images and thus achieves more stable classification performance than state-of-the-art frameworks.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12305v1
PDF https://arxiv.org/pdf/1905.12305v1.pdf
PWC https://paperswithcode.com/paper/fusion-of-heterogeneous-earth-observation
Repo
Framework

Statistical Testing on ASR Performance via Blockwise Bootstrap

Title Statistical Testing on ASR Performance via Blockwise Bootstrap
Authors Zhe Liu
Abstract A common question raised in automatic speech recognition (ASR) evaluations is how reliable an observed word error rate (WER) improvement between two ASR systems is; statistical hypothesis testing and confidence intervals can be used to tell whether the improvement is real or due only to random chance. The bootstrap resampling method, which is intuitive and easy to use, has been popular for such significance analysis. However, this method fails to deal with dependent data, which is prevalent in the speech world; for example, ASR performance on utterances from the same speaker could be correlated. In this paper we present a blockwise bootstrap approach: by dividing evaluation utterances into nonoverlapping blocks, this method resamples these blocks instead of the original data. We show that the resulting variance estimator of the absolute WER difference between two ASR systems is consistent under mild conditions. We also demonstrate the validity of the blockwise bootstrap method on both synthetic and real-world speech data.
Tasks Speech Recognition
Published 2019-12-19
URL https://arxiv.org/abs/1912.09508v1
PDF https://arxiv.org/pdf/1912.09508v1.pdf
PWC https://paperswithcode.com/paper/statistical-testing-on-asr-performance-via
Repo
Framework
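A minimal sketch of the blockwise bootstrap under stated assumptions: utterances have already been grouped into non-overlapping blocks (here, hypothetically, one block per speaker), each summarized as (errors of system A, errors of system B, reference word count). Whole blocks are resampled with replacement, the absolute WER difference is recomputed for each resample, and the spread of those differences gives a standard error and a percentile interval. The function name, the 1,000 resamples and the 95% percentile interval are illustrative choices, not details from the paper.

```python
import random
import statistics
from typing import List, Tuple

# One block = (errors of system A, errors of system B, reference words),
# summed over the utterances in that block (e.g. one speaker).
Block = Tuple[int, int, int]


def blockwise_bootstrap_wer_diff(blocks: List[Block], num_resamples: int = 1000,
                                 seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Resample whole blocks with replacement and return the standard error of
    WER(A) - WER(B) plus a 95% percentile interval."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(num_resamples):
        sample = [rng.choice(blocks) for _ in range(len(blocks))]
        errors_a = sum(b[0] for b in sample)
        errors_b = sum(b[1] for b in sample)
        words = sum(b[2] for b in sample)
        deltas.append((errors_a - errors_b) / words)
    deltas.sort()
    lower = deltas[round(0.025 * (num_resamples - 1))]
    upper = deltas[round(0.975 * (num_resamples - 1))]
    return statistics.stdev(deltas), (lower, upper)


# Made-up per-speaker totals, for illustration only.
speaker_blocks = [(12, 10, 150), (30, 22, 310), (7, 8, 95), (18, 11, 240)]
se, (lo, hi) = blockwise_bootstrap_wer_diff(speaker_blocks)
print(f"WER difference SE = {se:.4f}, 95% interval = [{lo:.4f}, {hi:.4f}]")
```

Resampling whole blocks keeps correlated utterances together, so within-speaker dependence is carried into every resample instead of being broken up as it would be by an utterance-level bootstrap. Whether an interval that excludes zero counts as significant depends on the test one adopts; the paper's own procedure may differ from this percentile reading.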

Updating Variational Bayes: Fast sequential posterior inference

Title Updating Variational Bayes: Fast sequential posterior inference
Authors Nathaniel Tomasetti, Catherine S. Forbes, Anastasios Panagiotelis
Abstract Variational Bayesian (VB) methods produce posterior inference in a time frame considerably shorter than traditional Markov Chain Monte Carlo approaches. Although the VB posterior is an approximation, it has been shown to produce good parameter estimates and predicted values when rich classes of approximating distributions are considered. In this paper we propose Updating VB (UVB), a recursive algorithm used to update a sequence of VB posterior approximations in an online setting, with the computation of each posterior update requiring only the data observed since the previous update. An extension to the proposed algorithm, named UVB-IS, allows the user to trade accuracy for a substantial increase in computational speed through the use of importance sampling. The two methods and their properties are detailed in two separate simulation studies. Two empirical illustrations of the proposed UVB methods are provided, including one where a Dirichlet Process Mixture model with a novel posterior dependence structure is repeatedly updated in the context of predicting the future behaviour of vehicles on a stretch of the US Highway 101.
Tasks
Published 2019-08-01
URL https://arxiv.org/abs/1908.00225v1
PDF https://arxiv.org/pdf/1908.00225v1.pdf
PWC https://paperswithcode.com/paper/updating-variational-bayes-fast-sequential
Repo
Framework
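UVB itself is not reproduced here. The sketch below shows, in the simplest setting where it can be done exactly, the recursive structure the abstract describes: the posterior after one batch becomes the prior for the next, and each update uses only the newly observed data. The model (a Normal prior on an unknown mean, with Normal observations of known variance) is an assumption chosen because it is conjugate, so no variational approximation is needed; UVB instead updates a VB approximation at each step, which matters when no such closed form exists. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NormalPosterior:
    mean: float
    precision: float  # inverse variance of the belief about the unknown mean


def update(posterior: NormalPosterior, batch: Sequence[float],
           noise_var: float) -> NormalPosterior:
    """Exact conjugate update for an unknown mean with known observation noise.
    Only the new batch is used; the old data are summarized by `posterior`."""
    precision = posterior.precision + len(batch) / noise_var
    mean = (posterior.mean * posterior.precision + sum(batch) / noise_var) / precision
    return NormalPosterior(mean, precision)


prior = NormalPosterior(mean=0.0, precision=1.0)
after_first = update(prior, [1.0, 2.0], noise_var=1.0)
after_second = update(after_first, [3.0, 4.0], noise_var=1.0)
print(after_second)  # same as a single update with all four observations
```

In the conjugate case two sequential updates reproduce the batch posterior exactly; with an approximate family, each UVB step can only approximate it, which is the accuracy and speed trade-off the abstract's simulation studies examine.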

Cost-Based Goal Recognition Meets Deep Learning

Title Cost-Based Goal Recognition Meets Deep Learning
Authors Mariane Maynard, Thibault Duhamel, Froduald Kabanza
Abstract The ability to observe the effects of actions performed by others and to infer their intent, most likely goals, or course of action is known as the cognitive capability of plan or intention recognition, and has long been one of the fundamental research challenges in AI. Deep learning has recently been making significant inroads on various pattern recognition problems, except for intention recognition. While extensively explored since the seventies, the problem remains unsolved for most interesting cases in various areas, ranging from natural language understanding to human behavior understanding based on video feeds. This paper compares symbolic inverse planning, one of the most investigated approaches to goal recognition, to deep learning using CNN and LSTM neural network architectures, on five synthetic benchmarks often used in the literature. The results show that the deep learning approach achieves better goal-prediction accuracy and timeliness than the symbolic cost-based plan recognizer in these domains. Although preliminary, these results point to interesting future research avenues.
Tasks Intent Detection
Published 2019-11-22
URL https://arxiv.org/abs/1911.10074v1
PDF https://arxiv.org/pdf/1911.10074v1.pdf
PWC https://paperswithcode.com/paper/cost-based-goal-recognition-meets-deep
Repo
Framework