October 19, 2019

Paper Group ANR 239

Persian Vowel recognition with MFCC and ANN on PCVC speech dataset

Title Persian Vowel recognition with MFCC and ANN on PCVC speech dataset
Authors Saber Malekzadeh, Mohammad Hossein Gholizadeh, Seyed Naser Razavi
Abstract This paper proposes a new method for recognizing consonant-vowel phoneme combinations on a new Persian speech dataset, PCVC (Persian Consonant-Vowel Combination). The PCVC dataset contains 20 sets of audio samples from 10 speakers, covering combinations of 23 consonant and 6 vowel phonemes of the Persian language. Each sample combines one consonant and one vowel: the consonant phoneme is pronounced first, followed immediately by the vowel phoneme. Each sound sample is a 2-second audio frame containing, on average, 0.5 seconds of speech; the rest is silence. The proposed method computes MFCCs (Mel Frequency Cepstral Coefficients) on every partitioned sound sample. Each training sample's MFCC vector is then fed to a multilayer perceptron feed-forward ANN (Artificial Neural Network) for training. Finally, the test samples are evaluated on the trained ANN model for phoneme recognition. Results are reported for vowel recognition, and the average recognition rate for vowel phonemes is computed.
Tasks
Published 2018-12-17
URL http://arxiv.org/abs/1812.06953v1
PDF http://arxiv.org/pdf/1812.06953v1.pdf
PWC https://paperswithcode.com/paper/persian-vowel-recognition-with-mfcc-and-ann
Repo
Framework
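
The abstract names MFCC features but does not give the filterbank settings. As a minimal sketch of one ingredient, the snippet below places triangular-filter centers evenly on the mel scale, using the common 2595 * log10(1 + f/700) variant. The filter count (26) and frequency range (0 to 8000 Hz) are assumptions chosen for illustration, not values from the paper.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) variant."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_filters filters spaced evenly in mel."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


# Assumed settings: 26 filters between 0 Hz and 8 kHz (not taken from the paper).
centers = mel_filter_centers(n_filters=26, f_min=0.0, f_max=8000.0)
```

The centers are dense at low frequencies and sparse at high ones, which is the property that makes mel-spaced features a common choice for vowel discrimination.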

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

Title Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
Authors Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik
Abstract How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework–see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills. Therefore, we refine our findings into an efficient max-coverage feature set that can be adopted in lieu of raw images. We perform our study in completely separate buildings for training and testing and compare against visually blind baseline policies and state-of-the-art feature learning methods.
Tasks Object Detection
Published 2018-12-31
URL http://arxiv.org/abs/1812.11971v3
PDF http://arxiv.org/pdf/1812.11971v3.pdf
PWC https://paperswithcode.com/paper/mid-level-visual-representations-improve
Repo
Framework

Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders

Title Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
Authors Ally Salim Jr
Abstract Artificial Intelligence in healthcare is a new and exciting frontier and the possibilities are endless. With deep learning approaches beating human performance in many areas, the logical next step is to attempt their application in the health space. For these and other Machine Learning approaches to produce good results and realize their potential, large amounts of accurate data are essential. This is a challenge faced by many industries, and even more so in healthcare. We present an approach that uses Variational Autoencoders (VAEs) to generate more data for training deeper networks, as well as to uncover underlying patterns in diagnoses and the patients suffering from them. A VAE trained on the available data was able to learn the latent distribution of the patient features given the diagnosis. After training, it is then possible to sample from the learnt latent distribution to generate new accurate patient records given the patient diagnosis.
Tasks
Published 2018-08-20
URL http://arxiv.org/abs/1808.06444v1
PDF http://arxiv.org/pdf/1808.06444v1.pdf
PWC https://paperswithcode.com/paper/synthetic-patient-generation-a-deep-learning
Repo
Framework
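
The two generic VAE pieces the abstract relies on can be sketched in a few lines: the reparameterized latent draw used during training, and sampling from the prior after training. This is a sketch under assumptions: a diagonal Gaussian posterior and a standard normal prior are assumed, the encoder and decoder networks are omitted, and conditioning on the diagnosis is not shown. All numbers are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), the regularizer in the VAE loss."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_prior(latent_dim, rng):
    """After training, draw a latent from N(0, I) to feed a (not shown) decoder."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


rng = random.Random(0)
# Hypothetical encoder outputs for one patient record.
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
z_new = sample_prior(latent_dim=2, rng=rng)
```

The KL term is zero only when the posterior matches the prior, so it penalizes latents that drift away from the distribution later used for sampling.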

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

Title Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation
Authors Hamid Reza Maei
Abstract We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse of dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-action value functions (Q functions). Using state-value functions helps to lift the curse and, as a result, naturally turns our policy-gradient solution into a classical Actor-Critic architecture whose Actor uses the state-value function for the update. Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived based on the exact gradient of the averaged state-value function objective and thus are guaranteed to converge to its optimal solution, while maintaining all the desirable properties of classical Actor-Critic methods with no additional hyper-parameters. To our knowledge, this is the first time that convergent off-policy learning methods have been extended to classical Actor-Critic methods with function approximation.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07842v1
PDF http://arxiv.org/pdf/1802.07842v1.pdf
PWC https://paperswithcode.com/paper/convergent-actor-critic-algorithms-under-off
Repo
Framework

VommaNet: an End-to-End Network for Disparity Estimation from Reflective and Texture-less Light Field Images

Title VommaNet: an End-to-End Network for Disparity Estimation from Reflective and Texture-less Light Field Images
Authors Haoxin Ma, Haotian Li, Zhiwen Qian, Shengxian Shi, Tingting Mu
Abstract The precise combination of image sensor and micro-lens array enables lenslet light field cameras to record both angular and spatial information of incoming light, so one can calculate disparity and depth from light field images. In turn, 3D models of the recorded objects can be recovered, which is a great advantage over other imaging systems. However, reflective and texture-less areas in light field images have complicated conditions, making it hard to correctly calculate disparity with existing algorithms. To tackle this problem, we introduce a novel end-to-end network VommaNet to retrieve multi-scale features from reflective and texture-less regions for accurate disparity estimation. Meanwhile, our network has achieved similar or better performance in other regions for both synthetic light field images and real-world data compared to the state-of-the-art algorithms. Currently, we achieve the best score for mean squared error (MSE) on the HCI 4D Light Field Benchmark.
Tasks Disparity Estimation
Published 2018-11-17
URL http://arxiv.org/abs/1811.07124v1
PDF http://arxiv.org/pdf/1811.07124v1.pdf
PWC https://paperswithcode.com/paper/vommanet-an-end-to-end-network-for-disparity
Repo
Framework

Compact Personalized Models for Neural Machine Translation

Title Compact Personalized Models for Neural Machine Translation
Authors Joern Wuebker, Patrick Simianer, John DeNero
Abstract We propose and compare methods for gradient-based domain adaptation of self-attentive neural machine translation models. We demonstrate that a large proportion of model parameters can be frozen during adaptation with minimal or no reduction in translation quality by encouraging structured sparsity in the set of offset tensors during learning via group lasso regularization. We evaluate this technique for both batch and incremental adaptation across multiple data sets and language pairs. Our system architecture - combining a state-of-the-art self-attentive model with compact domain adaptation - provides high quality personalized machine translation that is both space and time efficient.
Tasks Domain Adaptation, Machine Translation
Published 2018-11-05
URL http://arxiv.org/abs/1811.01990v1
PDF http://arxiv.org/pdf/1811.01990v1.pdf
PWC https://paperswithcode.com/paper/compact-personalized-models-for-neural
Repo
Framework
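
The abstract's idea of freezing parameters through group lasso on offset tensors can be illustrated with a small sketch. Here each offset tensor (adapted weight minus base weight) is treated as one group, and a proximal step shrinks each group's norm, zeroing groups that are small enough. The grouping, the tensor names, and the proximal update are assumptions for illustration; the paper's optimizer and group definition may differ.

```python
import math
from typing import Dict, List


def group_lasso_penalty(offsets: Dict[str, List[float]], lam: float) -> float:
    """lam times the sum over groups of each group's L2 norm."""
    return lam * sum(math.sqrt(sum(v * v for v in g)) for g in offsets.values())


def proximal_step(offsets: Dict[str, List[float]], lam: float, lr: float) -> Dict[str, List[float]]:
    """Block soft-thresholding: reduce each group's norm by lr * lam, floor at zero."""
    updated = {}
    for name, group in offsets.items():
        norm = math.sqrt(sum(v * v for v in group))
        scale = max(0.0, 1.0 - lr * lam / norm) if norm > 0.0 else 0.0
        updated[name] = [scale * v for v in group]
    return updated


# Hypothetical offsets, one group per tensor (names are made up).
offsets = {"encoder.layer0": [0.30, -0.40], "decoder.layer0": [0.01, 0.02]}
shrunk = proximal_step(offsets, lam=0.5, lr=0.1)

# Groups whose offsets are exactly zero need no storage and can stay frozen.
frozen = [name for name, group in shrunk.items() if all(v == 0.0 for v in group)]
```

Because the penalty acts on whole groups rather than single weights, entire tensors drop to zero together, which is what makes the personalized model compact.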

Satisficing in Time-Sensitive Bandit Learning

Title Satisficing in Time-Sensitive Bandit Learning
Authors Daniel Russo, Benjamin Van Roy
Abstract Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an optimal action requires much more information than near-optimal ones. Indeed, popular approaches such as upper-confidence-bound methods and Thompson sampling can fare poorly in such situations. We consider instead learning a satisficing action, which is near-optimal while requiring less information, and propose satisficing Thompson sampling, an algorithm that serves this purpose. We establish a general bound on expected discounted regret and study the application of satisficing Thompson sampling to linear and infinite-armed bandits, demonstrating arbitrarily large benefits over Thompson sampling. We also discuss the relation between the notion of satisficing and the theory of rate distortion, which offers guidance on the selection of satisficing actions.
Tasks
Published 2018-03-07
URL https://arxiv.org/abs/1803.02855v2
PDF https://arxiv.org/pdf/1803.02855v2.pdf
PWC https://paperswithcode.com/paper/satisficing-in-time-sensitive-bandit-learning
Repo
Framework
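
A rough sketch of the satisficing idea for a Bernoulli bandit: draw one posterior sample per arm, as in Thompson sampling, but instead of playing the sampled best arm, play the lowest-index arm whose sampled mean is within epsilon of the sampled best. The lowest-index rule, the Beta(1, 1) prior, and the counts below are assumptions for illustration; the paper's formal definition of a satisficing action may differ.

```python
import random
from typing import List


def satisficing_action(sampled_means: List[float], epsilon: float) -> int:
    """Lowest-index arm whose sampled mean is within epsilon of the sampled best."""
    best = max(sampled_means)
    return next(arm for arm, mean in enumerate(sampled_means) if mean >= best - epsilon)


def sts_step(successes: List[int], failures: List[int], epsilon: float, rng: random.Random) -> int:
    """One round: sample each arm's Beta posterior, then satisfice on the sample."""
    sample = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return satisficing_action(sample, epsilon)


rng = random.Random(0)
# Hypothetical success/failure counts for three arms.
arm = sts_step(successes=[8, 9, 1], failures=[2, 1, 9], epsilon=0.1, rng=rng)
```

With epsilon set to 0 the rule reduces to ordinary Thompson sampling (up to tie-breaking); larger values stop the agent from spending rounds separating arms that are nearly equally good.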

Multi-view Factorization AutoEncoder with Network Constraints for Multi-omic Integrative Analysis

Title Multi-view Factorization AutoEncoder with Network Constraints for Multi-omic Integrative Analysis
Authors Tianle Ma, Aidong Zhang
Abstract Multi-omic data provides multiple views of the same patients. Integrative analysis of multi-omic data is crucial to elucidate the molecular underpinning of disease etiology. However, multi-omic data has the “big p, small N” problem (the number of features is large, but the number of samples is small), so it is challenging to train a complicated machine learning model from the multi-omic data alone and make it generalize well. Here we propose a framework termed Multi-view Factorization AutoEncoder with network constraints to integrate multi-omic data with domain knowledge (biological interaction networks). Our framework employs deep representation learning to learn feature embeddings and patient embeddings simultaneously, enabling us to integrate feature interaction network and patient view similarity network constraints into the training objective. The whole framework is end-to-end differentiable. We applied our approach to the TCGA Pan-cancer dataset and achieved satisfactory results to predict disease progression-free interval (PFI) and patient overall survival (OS) events. Code will be made publicly available.
Tasks Representation Learning
Published 2018-09-06
URL http://arxiv.org/abs/1809.01772v1
PDF http://arxiv.org/pdf/1809.01772v1.pdf
PWC https://paperswithcode.com/paper/multi-view-factorization-autoencoder-with
Repo
Framework

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

Title On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation
Authors Tamer Alkhouli, Gabriel Bretschner, Hermann Ney
Abstract This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting an additional alignment head to the multi-head source-to-target attention component. This is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to 3.8% BLEU improvement when using the dictionary, in comparison to 2.4% BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of 1.8 without loss in translation performance. We carry out experiments on the shared WMT 2016 English→Romanian news task and the BOLT Chinese→English discussion forum task.
Tasks Machine Translation
Published 2018-09-11
URL http://arxiv.org/abs/1809.03985v1
PDF http://arxiv.org/pdf/1809.03985v1.pdf
PWC https://paperswithcode.com/paper/on-the-alignment-problem-in-multi-head
Repo
Framework

Compare and Contrast: Learning Prominent Visual Differences

Title Compare and Contrast: Learning Prominent Visual Differences
Authors Steven Chen, Kristen Grauman
Abstract Relative attribute models can compare images in terms of all detected properties or attributes, exhaustively predicting which image is fancier, more natural, and so on without any regard to ordering. However, when humans compare images, certain differences will naturally stick out and come to mind first. These most noticeable differences, or prominent differences, are likely to be described first. In addition, many differences, although present, may not be mentioned at all. In this work, we introduce and model prominent differences, a rich new functionality for comparing images. We collect instance-level annotations of most noticeable differences, and build a model trained on relative attribute features that predicts prominent differences for unseen pairs. We test our model on the challenging UT-Zap50K shoes and LFW10 faces datasets, and outperform an array of baseline methods. We then demonstrate how our prominence model improves two vision tasks, image search and description generation, enabling more natural communication between people and vision systems.
Tasks Image Retrieval
Published 2018-03-31
URL http://arxiv.org/abs/1804.00112v2
PDF http://arxiv.org/pdf/1804.00112v2.pdf
PWC https://paperswithcode.com/paper/compare-and-contrast-learning-prominent
Repo
Framework

On representation power of neural network-based graph embedding and beyond

Title On representation power of neural network-based graph embedding and beyond
Authors Akifumi Okuno, Hidetoshi Shimodaira
Abstract We consider the representation power of siamese-style similarity functions used in neural network-based graph embedding. The inner product similarity (IPS) with feature vectors computed via neural networks is commonly used for representing the strength of association between two nodes. However, only a little work has been done on the representation capability of IPS. A very recent work sheds light on the nature of IPS and reveals that IPS has the capability of approximating any positive definite (PD) similarities. However, a simple example demonstrates the fundamental limitation of IPS to approximate non-PD similarities. We then propose a novel model named Shifted IPS (SIPS) that approximates any Conditionally PD (CPD) similarities arbitrarily well. CPD is a generalization of PD with many examples such as negative Poincaré distance and negative Wasserstein distance, thus SIPS has a potential impact to significantly improve the applicability of graph embedding without taking great care in configuring the similarity function. Our numerical experiments demonstrate SIPS’s superiority over IPS. In theory, we further extend SIPS beyond CPD by considering the inner product in Minkowski space so that it approximates more general similarities.
Tasks Graph Embedding
Published 2018-05-31
URL http://arxiv.org/abs/1805.12332v2
PDF http://arxiv.org/pdf/1805.12332v2.pdf
PWC https://paperswithcode.com/paper/on-representation-power-of-neural-network
Repo
Framework
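
A small worked example of why a shift helps, assuming SIPS takes the form of an inner product plus a scalar shift per node, h(x, y) = <f(x), f(y)> + u(x) + u(y). The abstract does not state this formula, so treat it as an assumption. The target used here, negative squared Euclidean distance, is a standard CPD similarity chosen for illustration; it is not one of the examples the abstract lists. In this sketch f and u are fixed closed-form maps rather than trained networks.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ips(f: Callable[[Vector], Vector], x: Vector, y: Vector) -> float:
    """Inner product similarity between the embedded points."""
    return dot(f(x), f(y))


def sips(f: Callable[[Vector], Vector], u: Callable[[Vector], float], x: Vector, y: Vector) -> float:
    """Assumed shifted form: inner product plus one scalar shift per argument."""
    return dot(f(x), f(y)) + u(x) + u(y)


# -||x - y||^2 = 2<x, y> - ||x||^2 - ||y||^2, so f(x) = sqrt(2) x and u(x) = -||x||^2.
def f(x: Vector) -> Vector:
    return [math.sqrt(2.0) * v for v in x]


def u(x: Vector) -> float:
    return -dot(x, x)


x, y = [1.0, 2.0], [3.0, -1.0]
target = -sum((a - b) ** 2 for a, b in zip(x, y))
value = sips(f, u, x, y)
```

Plain IPS cannot reproduce this target: its self-similarity <f(x), f(x)> is never negative, and forcing it to zero for every x forces f to be zero everywhere, so all off-diagonal similarities vanish too.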

Big Data Meet Cyber-Physical Systems: A Panoramic Survey

Title Big Data Meet Cyber-Physical Systems: A Panoramic Survey
Authors Rachad Atat, Lingjia Liu, Jinsong Wu, Guangyu Li, Chunxuan Ye, Yang Yi
Abstract The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world via creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensors deployments that are anticipated in the near future. It is expected to outstandingly increase the growth rate of raw sensed data. In this paper, we present the CPS taxonomy via providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which become a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the contexts of CPS.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.12399v1
PDF http://arxiv.org/pdf/1810.12399v1.pdf
PWC https://paperswithcode.com/paper/big-data-meet-cyber-physical-systems-a
Repo
Framework

Smart energy models for atomistic simulations using a DFT-driven multifidelity approach

Title Smart energy models for atomistic simulations using a DFT-driven multifidelity approach
Authors Luca Messina, Alessio Quaglino, Alexandra Goryaeva, Mihai-Cosmin Marinica, Christophe Domain, Nicolas Castin, Giovanni Bonny, Rolf Krause
Abstract The reliability of atomistic simulations depends on the quality of the underlying energy models providing the source of physical information, for instance for the calculation of migration barriers in atomistic Kinetic Monte Carlo simulations. Accurate (high-fidelity) methods are often available, but since they are usually computationally expensive, they must be replaced by less accurate (low-fidelity) models that introduce some degrees of approximation. Machine-learning techniques such as artificial neural networks are usually employed to work around this limitation and extract the needed parameters from large databases of high-fidelity data, but the latter are often computationally expensive to produce. This work introduces an alternative method based on the multifidelity approach, where correlations between high-fidelity and low-fidelity outputs are exploited to make an educated guess of the high-fidelity outcome based only on quick low-fidelity estimations, hence without the need of running full expensive high-fidelity calculations. With respect to neural networks, this approach is expected to require less training data because of the lower amount of fitting parameters involved. The method is tested on the prediction of ab initio formation and migration energies of vacancy diffusion in iron-copper alloys, and compared with the neural networks trained on the same database.
Tasks
Published 2018-08-21
URL http://arxiv.org/abs/1808.06935v2
PDF http://arxiv.org/pdf/1808.06935v2.pdf
PWC https://paperswithcode.com/paper/smart-energy-models-for-atomistic-simulations
Repo
Framework
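
The core multifidelity idea, predicting an expensive high-fidelity value from a cheap low-fidelity one through their correlation, can be shown with the simplest possible stand-in: a least-squares line fitted on a few paired samples. This linear correction is an assumption for illustration, not the paper's method, and the energies below are made-up toy numbers, not data from the paper.

```python
from typing import List, Tuple


def fit_linear(low: List[float], high: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for high ~ slope * low + intercept."""
    n = len(low)
    mean_low = sum(low) / n
    mean_high = sum(high) / n
    cov = sum((l - mean_low) * (h - mean_high) for l, h in zip(low, high))
    var = sum((l - mean_low) ** 2 for l in low)
    slope = cov / var
    return slope, mean_high - slope * mean_low


# Made-up paired energies (arbitrary units): cheap model vs. expensive reference.
low_fidelity = [0.50, 0.60, 0.70, 0.80]
high_fidelity = [0.62, 0.71, 0.83, 0.92]

slope, intercept = fit_linear(low_fidelity, high_fidelity)

# Estimate the expensive value for a new configuration from its cheap value only.
estimate = slope * 0.65 + intercept
```

Only two parameters are fitted here, which mirrors the abstract's argument that a correlation-based model needs less training data than a neural network with many fitting parameters.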

Finding Correspondences for Optical Flow and Disparity Estimations using a Sub-pixel Convolution-based Encoder-Decoder Network

Title Finding Correspondences for Optical Flow and Disparity Estimations using a Sub-pixel Convolution-based Encoder-Decoder Network
Authors Juan Luis Gonzalez, Muhammad Sarmad, Hyunjoo J. Lee, Munchurl Kim
Abstract Deep convolutional neural networks (DCNN) have recently shown promising results in low-level computer vision problems such as optical flow and disparity estimation, but still have much room to further improve their performance. In this paper, we propose a novel sub-pixel convolution-based encoder-decoder network for optical flow and disparity estimations, which can extend FlowNetS and DispNet by replacing the deconvolution layers with sub-pixel convolution blocks. By using sub-pixel refinement and estimation on the decoder stages instead of deconvolution, we can significantly improve the estimation accuracy for optical flow and disparity, even with reduced numbers of parameters. We show a supervised end-to-end training of our proposed networks for optical flow and disparity estimations, and an unsupervised end-to-end training for monocular depth and pose estimations. In order to verify the effectiveness of our proposed networks, we perform intensive experiments for (i) optical flow and disparity estimations, and (ii) monocular depth and pose estimations. Throughout the extensive experiments, our proposed networks outperform the baselines such as FlowNetS and DispNet in terms of estimation accuracy and training times.
Tasks Disparity Estimation, Optical Flow Estimation
Published 2018-10-07
URL http://arxiv.org/abs/1810.03155v1
PDF http://arxiv.org/pdf/1810.03155v1.pdf
PWC https://paperswithcode.com/paper/finding-correspondences-for-optical-flow-and
Repo
Framework
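
The upsampling step inside a sub-pixel convolution block is a fixed rearrangement, often called pixel shuffle: r * r low-resolution channels are interleaved into one channel that is r times larger in each spatial dimension. The sketch below shows only that rearrangement on nested lists, using the common channel ordering; the convolutions that precede it and the paper's exact block design are not shown and are not assumed here.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Rearrange a (C * r * r, H, W) tensor into (C, H * r, W * r)."""
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    channels_out = channels_in // (r * r)
    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(channels_out)]
    for c in range(channels_out):
        for h in range(height):
            for w in range(width):
                for i in range(r):
                    for j in range(r):
                        # Channel c*r*r + i*r + j supplies sub-pixel (i, j) of output pixel (h, w).
                        out[c][h * r + i][w * r + j] = x[c * r * r + i * r + j][h][w]
    return out


# Four 1x2 feature maps become one 2x4 map with upscale factor r = 2.
features = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
upscaled = pixel_shuffle(features, r=2)
```

Because the rearrangement has no weights, all learnable parameters sit in the convolution before it, which runs at low resolution.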

Explanations for Temporal Recommendations

Title Explanations for Temporal Recommendations
Authors Homanga Bharadhwaj, Shruti Joshi
Abstract Recommendation systems are an integral part of Artificial Intelligence (AI) and have become increasingly important in the growing age of commercialization in AI. Deep learning (DL) techniques for recommendation systems (RS) provide powerful latent-feature models for effective recommendation but suffer from the major drawback of being non-interpretable. In this paper we describe a framework for explainable temporal recommendations in a DL model. We consider an LSTM based Recurrent Neural Network (RNN) architecture for recommendation and a neighbourhood-based scheme for generating explanations in the model. We demonstrate the effectiveness of our approach through experiments on the Netflix dataset by jointly optimizing for both prediction accuracy and explainability.
Tasks Recommendation Systems
Published 2018-07-17
URL http://arxiv.org/abs/1807.06161v1
PDF http://arxiv.org/pdf/1807.06161v1.pdf
PWC https://paperswithcode.com/paper/explanations-for-temporal-recommendations
Repo
Framework