July 27, 2019

Paper Group ANR 499

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods. Open-World Visual Recognition Using Knowledge Graphs. Building competitive direct acoustics-to-word models for English conversational speech recognition. Learning to Play Othello with Deep Neural Networks. Direct Acoustics-to-Word Models for English Conversational Speec …

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

Title Experimental results : Reinforcement Learning of POMDPs using Spectral Methods
Authors Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
Abstract We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm that runs through epochs; in each epoch, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy, which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.
Tasks Latent Variable Models
Published 2017-05-07
URL http://arxiv.org/abs/1705.02553v1
PDF http://arxiv.org/pdf/1705.02553v1.pdf
PWC https://paperswithcode.com/paper/experimental-results-reinforcement-learning
Repo
Framework

Open-World Visual Recognition Using Knowledge Graphs

Title Open-World Visual Recognition Using Knowledge Graphs
Authors Vincent P. A. Lonij, Ambrish Rawat, Maria-Irina Nicolae
Abstract In a real-world setting, visual recognition systems may be asked to make predictions for images belonging to previously unknown class labels. In order to make semantically meaningful predictions for such inputs, we propose a two-step approach that utilizes information from knowledge graphs. First, a knowledge-graph representation is learned to embed a large set of entities into a semantic space. Second, an image representation is learned to embed images into the same space. Under this setup, we are able to predict structured properties in the form of relationship triples for any open-world image. This is true even when a set of labels has been omitted from the training protocols of both the knowledge graph and image embeddings. Furthermore, we augment this learning framework with appropriate smoothness constraints and show how prior knowledge can be incorporated into the model. Both these improvements combined increase performance for visual recognition by a factor of six compared to our baseline. Finally, we propose a new, extended dataset which we use for experiments.
Tasks Knowledge Graphs
Published 2017-08-28
URL http://arxiv.org/abs/1708.08310v1
PDF http://arxiv.org/pdf/1708.08310v1.pdf
PWC https://paperswithcode.com/paper/open-world-visual-recognition-using-knowledge
Repo
Framework

Building competitive direct acoustics-to-word models for English conversational speech recognition

Title Building competitive direct acoustics-to-word models for English conversational speech recognition
Authors Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny
Abstract Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional sub-word based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training.
Tasks English Conversational Speech Recognition, Language Modelling, Speech Recognition
Published 2017-12-08
URL http://arxiv.org/abs/1712.03133v1
PDF http://arxiv.org/pdf/1712.03133v1.pdf
PWC https://paperswithcode.com/paper/building-competitive-direct-acoustics-to-word
Repo
Framework
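
The word error rates quoted in the abstract above are conventionally computed as the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below illustrates that definition only; it assumes plain whitespace tokenization, and the function name `word_error_rate` is hypothetical. The scoring pipeline actually used for the Hub5-2000 results is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.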

Learning to Play Othello with Deep Neural Networks

Title Learning to Play Othello with Deep Neural Networks
Authors Paweł Liskowski, Wojciech Jaśkowski, Krzysztof Krawiec
Abstract Achieving superhuman playing level by AlphaGo corroborated the capabilities of convolutional neural architectures (CNNs) for capturing complex spatial patterns. This result was to a great extent due to several analogies between Go board states and 2D images CNNs have been designed for, in particular translational invariance and a relatively large board. In this paper, we verify whether CNN-based move predictors prove effective for Othello, a game with significantly different characteristics, including a much smaller board size and complete lack of translational invariance. We compare several CNN architectures and board encodings, augment them with state-of-the-art extensions, train on an extensive database of experts’ moves, and examine them with respect to move prediction accuracy and playing strength. The empirical evaluation confirms high capabilities of neural move predictors and suggests a strong correlation between prediction accuracy and playing strength. The best CNNs not only surpass all other 1-ply Othello players proposed to date but defeat (2-ply) Edax, the best open-source Othello player.
Tasks
Published 2017-11-17
URL http://arxiv.org/abs/1711.06583v1
PDF http://arxiv.org/pdf/1711.06583v1.pdf
PWC https://paperswithcode.com/paper/learning-to-play-othello-with-deep-neural
Repo
Framework

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Title Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Authors Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
Abstract Recent work on end-to-end automatic speech recognition (ASR) has shown that the connectionist temporal classification (CTC) loss can be used to convert acoustics to phone or character sequences. Such systems are used with a dictionary and separately-trained Language Model (LM) to produce word sequences. However, they are not truly end-to-end in the sense of mapping acoustics directly to words without an intermediate phone representation. In this paper, we present the first results employing direct acoustics-to-word CTC models on two well-known public benchmark tasks: Switchboard and CallHome. These models do not require an LM or even a decoder at run-time and hence recognize speech with minimal complexity. However, due to the large number of word output units, CTC word models require orders of magnitude more data to train reliably compared to traditional systems. We present some techniques to mitigate this issue. Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also present rescoring results on CTC word model lattices to quantify the performance benefits of a LM, and contrast the performance of word and phone CTC models.
Tasks English Conversational Speech Recognition, Language Modelling, Speech Recognition
Published 2017-03-22
URL http://arxiv.org/abs/1703.07754v1
PDF http://arxiv.org/pdf/1703.07754v1.pdf
PWC https://paperswithcode.com/paper/direct-acoustics-to-word-models-for-english
Repo
Framework
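
The abstract above notes that a word-level CTC model needs no decoder or LM at run time. One common way to read words off such a model is greedy (best-path) decoding: take the top-scoring unit per frame, merge consecutive repeats, and drop blanks. The sketch below shows that generic procedure under stated assumptions: the blank index `0`, the toy vocabulary, and the function name `ctc_greedy_decode` are all hypothetical, and the abstract does not say which read-out the authors use.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring output unit for every acoustic frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of the same index, then remove the blank symbol.
    collapsed = [index for index, _ in groupby(best_path)]
    return [vocab[index] for index in collapsed if index != blank]


vocab = ["<blank>", "hello", "world"]  # toy word vocabulary (illustrative)
frame_scores = [
    [0.10, 0.80, 0.10],  # "hello"
    [0.20, 0.70, 0.10],  # "hello" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "world"
]
words = ctc_greedy_decode(frame_scores, vocab)
```

A blank between two identical indices keeps them as separate words, which is how CTC represents a genuinely repeated word.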

Approximate Bayes learning of stochastic differential equations

Title Approximate Bayes learning of stochastic differential equations
Authors Philipp Batz, Andreas Ruttor, Manfred Opper
Abstract We introduce a nonparametric approach for estimating drift and diffusion functions in systems of stochastic differential equations from observations of the state vector. Gaussian processes are used as flexible models for these functions and estimates are calculated directly from dense data sets using Gaussian process regression. We also develop an approximate expectation maximization algorithm to deal with the unobserved, latent dynamics between sparse observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type and the maximum a posteriori estimation of the drift is facilitated by a sparse Gaussian process approximation.
Tasks Gaussian Processes
Published 2017-02-17
URL http://arxiv.org/abs/1702.05390v1
PDF http://arxiv.org/pdf/1702.05390v1.pdf
PWC https://paperswithcode.com/paper/approximate-bayes-learning-of-stochastic
Repo
Framework

Transfer entropy-based feedback improves performance in artificial neural networks

Title Transfer entropy-based feedback improves performance in artificial neural networks
Authors Sebastian Herzog, Christian Tetzlaff, Florentin Wörgötter
Abstract The structure of the majority of modern deep neural networks is characterized by unidirectional feed-forward connectivity across a very large number of layers. By contrast, the architecture of the cortex of vertebrates contains fewer hierarchical levels but many recurrent and feedback connections. Here we show that a small, few-layer artificial neural network that employs feedback will reach top level performance on a standard benchmark task, otherwise only obtained by large feed-forward structures. To achieve this we use feed-forward transfer entropy between neurons to structure feedback connectivity. Transfer entropy can here intuitively be understood as a measure for the relevance of certain pathways in the network, which are then amplified by feedback. Feedback may therefore be key for high network performance in small brain-like architectures.
Tasks
Published 2017-06-13
URL http://arxiv.org/abs/1706.04265v2
PDF http://arxiv.org/pdf/1706.04265v2.pdf
PWC https://paperswithcode.com/paper/transfer-entropy-based-feedback-improves
Repo
Framework
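
For reference, the standard definition of transfer entropy from a source process Y to a target process X, with history length 1, is given below. This is the textbook form; the estimator the authors apply to neuron activations, and their choice of history length, are not stated in the abstract.

```latex
T_{Y \to X}
  = \sum_{x_{t+1},\, x_t,\, y_t}
    p(x_{t+1}, x_t, y_t)\,
    \log \frac{p(x_{t+1} \mid x_t, y_t)}{p(x_{t+1} \mid x_t)}
```

The quantity is zero when the past of Y adds no information about the next state of X beyond what the past of X already provides, and it is asymmetric in X and Y.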

Conversion Rate Optimization through Evolutionary Computation

Title Conversion Rate Optimization through Evolutionary Computation
Authors Risto Miikkulainen, Neil Iscoe, Aaron Shagrin, Ron Cordell, Sam Nazari, Cory Schoolland, Myles Brundage, Jonathan Epstein, Randy Dean, Gurmeet Lamba
Abstract Conversion optimization means designing a web interface so that as many users as possible take a desired action on it, such as register or purchase. Such design is usually done by hand, testing one change at a time through A/B testing, or a limited number of combinations through multivariate testing, making it possible to evaluate only a small fraction of designs in a vast design space. This paper describes Sentient Ascend, an automatic conversion optimization system that uses evolutionary optimization to create effective web interface designs. Ascend makes it possible to discover and utilize interactions between the design elements that are difficult to identify otherwise. Moreover, evaluation of design candidates is done in parallel online, i.e. with a large number of real users interacting with the system. A case study on an existing media site shows that significant improvements (i.e. over 43%) are possible beyond human design. Ascend can therefore be seen as an approach to massively multivariate conversion optimization, based on a massively parallel interactive evolution.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00556v4
PDF http://arxiv.org/pdf/1703.00556v4.pdf
PWC https://paperswithcode.com/paper/conversion-rate-optimization-through
Repo
Framework

High Dimensional Multi-Level Covariance Estimation and Kriging

Title High Dimensional Multi-Level Covariance Estimation and Kriging
Authors Julio E. Castrillon-Candas
Abstract With the advent of big data sets much of the computational science and engineering communities have been moving toward data-driven approaches to regression and classification. However, they present a significant challenge due to the increasing size, complexity and dimensionality of the problems. In this paper a multi-level kriging method that scales well with dimensions is developed. A multi-level basis is constructed that is adapted to a random projection tree (or kD-tree) partitioning of the observations and a sparse grid approximation. This approach identifies the high dimensional underlying phenomena from the noise in an accurate and numerically stable manner. Furthermore, numerically unstable covariance matrices are transformed into well conditioned multi-level matrices without compromising accuracy. A-posteriori error estimates are derived, such as the sub-exponential decay of the coefficients of the multi-level covariance matrix. The multi-level method is tested on numerically unstable problems of up to 50 dimensions. Accurate solutions with feasible computational cost are obtained.
Tasks
Published 2017-01-01
URL http://arxiv.org/abs/1701.00285v1
PDF http://arxiv.org/pdf/1701.00285v1.pdf
PWC https://paperswithcode.com/paper/high-dimensional-multi-level-covariance
Repo
Framework

Towards Monetary Incentives in Social Q&A Services

Title Towards Monetary Incentives in Social Q&A Services
Authors Steve T. K. Jan, Chun Wang, Qing Zhang, Gang Wang
Abstract Community-based question answering (CQA) services are facing key challenges to motivate domain experts to provide timely answers. Recently, CQA services are exploring new incentive models to engage experts and celebrities by allowing them to set a price on their answers. In this paper, we perform a data-driven analysis on two emerging payment-based CQA systems: Fenda (China) and Whale (US). By analyzing a large dataset of 220K questions (worth 1 million USD collectively), we examine how monetary incentives affect different players in the system. We find that, while monetary incentive enables quick answers from experts, it also drives certain users to aggressively game the system for profits. In addition, in this supplier-driven marketplace, users need to proactively adjust their price to make profits. Famous people are unwilling to lower their price, which in turn hurts their income and engagement over time. Finally, we discuss the key implications to future CQA design.
Tasks Question Answering
Published 2017-03-03
URL http://arxiv.org/abs/1703.01333v2
PDF http://arxiv.org/pdf/1703.01333v2.pdf
PWC https://paperswithcode.com/paper/towards-monetary-incentives-in-social-qa
Repo
Framework

Deep Learning in Multiple Multistep Time Series Prediction

Title Deep Learning in Multiple Multistep Time Series Prediction
Authors Chuanyun Zang
Abstract This project investigates combining deep learning, specifically Long Short-Term Memory (LSTM) networks, with basic statistics in multiple multistep time series prediction. An LSTM can draw on all the pages and learn the general trends of variation at a large scale, while well-selected medians for each page preserve the particular seasonality of different pages, so that the predicted trend does not deviate too much from reality. A recent Kaggle competition on 145K Web Traffic Time Series Forecasting [1] is used to thoroughly illustrate and test this idea.
Tasks Time Series, Time Series Forecasting, Time Series Prediction
Published 2017-10-12
URL http://arxiv.org/abs/1710.04373v1
PDF http://arxiv.org/pdf/1710.04373v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-in-multiple-multistep-time
Repo
Framework
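
To make the "medians for each page" idea concrete, the sketch below builds a seasonal baseline for a single page by repeating per-weekday medians of recent traffic. It covers only the statistical half of the approach and is an assumption-laden illustration: the window of 4 weeks, the 7-day period, and the function name `weekday_median_forecast` are hypothetical, and the abstract does not say how the medians are selected or how they are combined with the LSTM.

```python
from statistics import median


def weekday_median_forecast(history, horizon, weeks=4, period=7):
    """Forecast `horizon` steps by repeating per-phase medians of recent history."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    recent = history[-weeks * period:]
    # Trim from the front so `recent` holds whole periods ending at the
    # most recent observation.
    recent = recent[len(recent) % period:]
    # Phase k collects every value that sits k steps into a period.
    phase_medians = [median(recent[k::period]) for k in range(period)]
    # The first forecast step follows a complete period, so it has phase 0.
    return [phase_medians[step % period] for step in range(horizon)]


# Three weeks of daily visits for one page; 90 is a one-off spike.
history = [
    10, 12, 11, 13, 12, 30, 28,
    11, 13, 12, 14, 13, 32, 90,
    12, 14, 13, 15, 14, 34, 30,
]
forecast = weekday_median_forecast(history, horizon=9)
```

The median keeps the weekend peak in the forecast while ignoring the isolated spike, which is the robustness property that makes it a useful anchor for a learned model.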

Multimodal Affect Analysis for Product Feedback Assessment

Title Multimodal Affect Analysis for Product Feedback Assessment
Authors Amol S Patwardhan, Gerald M Knapp
Abstract Consumers often react expressively to products such as food samples, perfume, jewelry, sunglasses, and clothing accessories. This research discusses a multimodal affect recognition system developed to classify whether a consumer likes or dislikes a product tested at a counter or kiosk, by analyzing the consumer’s facial expression, body posture, hand gestures, and voice after testing the product. A depth-capable camera and microphone system - Kinect for Windows - is utilized. An emotion identification engine has been developed to analyze the images and voice to determine affective state of the customer. The image is segmented using skin color and adaptive threshold. Face, body and hands are detected using the Haar cascade classifier. Canny edges are identified and the lip, body and hand contours are extracted using spatial filtering. Edge count and orientation around the mouth, cheeks, eyes, shoulders, fingers and the location of the edges are used as features. Classification is done by an emotion template mapping algorithm and training a classifier using support vector machines. The real-time performance, accuracy and feasibility for multimodal affect recognition in feedback assessment are evaluated.
Tasks
Published 2017-05-07
URL http://arxiv.org/abs/1705.02694v1
PDF http://arxiv.org/pdf/1705.02694v1.pdf
PWC https://paperswithcode.com/paper/multimodal-affect-analysis-for-product
Repo
Framework

Empirical Bayes Matrix Completion

Title Empirical Bayes Matrix Completion
Authors Takeru Matsuda, Fumiyasu Komaki
Abstract We develop an empirical Bayes (EB) algorithm for matrix completion problems. The EB algorithm is motivated by the singular value shrinkage estimator for matrix means by Efron and Morris (1972). Since the EB algorithm is essentially the EM algorithm applied to a simple model, it does not require heuristic parameter tuning other than tolerance. Numerical results demonstrate that the EB algorithm achieves a good trade-off between accuracy and efficiency compared to existing algorithms and that it works particularly well when the difference between the number of rows and columns is large. Application to real data also shows the practical utility of the EB algorithm.
Tasks Matrix Completion
Published 2017-06-05
URL http://arxiv.org/abs/1706.01252v2
PDF http://arxiv.org/pdf/1706.01252v2.pdf
PWC https://paperswithcode.com/paper/empirical-bayes-matrix-completion
Repo
Framework
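
The singular value shrinkage estimator of Efron and Morris mentioned in the abstract above is usually stated as follows, for a fully observed p-by-q matrix X with independent unit-variance Gaussian noise around a mean matrix M and p >= q + 2. This is the classical estimator for matrix means, given here as background; it is not the EB matrix completion algorithm itself, whose details are not in the abstract.

```latex
\hat{M}_{\mathrm{EM}}
  = X \left( I_q - (p - q - 1)\,(X^{\top} X)^{-1} \right)

% Equivalently, with the singular value decomposition
% X = U \,\mathrm{diag}(\sigma_1, \dots, \sigma_q)\, V^{\top}:
\hat{M}_{\mathrm{EM}}
  = U \,\mathrm{diag}\!\left( \sigma_i - \frac{p - q - 1}{\sigma_i} \right) V^{\top}
```

The second form follows from X (X^T X)^{-1} = U diag(1/sigma_i) V^T and shows why this is called singular value shrinkage: each singular value is reduced, and small ones are reduced proportionally more.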

Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems

Title Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems
Authors Julianne Chung, Matthias Chung, J. Tanner Slagel, Luis Tenorio
Abstract We describe stochastic Newton and stochastic quasi-Newton approaches to efficiently solve large linear least-squares problems where the very large data sets present a significant computational burden (e.g., the size may exceed computer memory or data are collected in real-time). In our proposed framework, stochasticity is introduced in two different frameworks as a means to overcome these computational limitations, and probability distributions that can exploit structure and/or sparsity are considered. Theoretical results on consistency of the approximations for both the stochastic Newton and the stochastic quasi-Newton methods are provided. The results show, in particular, that stochastic Newton iterates, in contrast to stochastic quasi-Newton iterates, may not converge to the desired least-squares solution. Numerical examples, including an example from extreme learning machines, demonstrate the potential applications of these methods.
Tasks
Published 2017-02-23
URL http://arxiv.org/abs/1702.07367v1
PDF http://arxiv.org/pdf/1702.07367v1.pdf
PWC https://paperswithcode.com/paper/stochastic-newton-and-quasi-newton-methods
Repo
Framework

Visualizing and Improving Scattering Networks

Title Visualizing and Improving Scattering Networks
Authors Fergal Cotter, Nick Kingsbury
Abstract Scattering Transforms (or ScatterNets) introduced by Mallat are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) - the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.
Tasks Image Classification, Texture Classification
Published 2017-09-05
URL http://arxiv.org/abs/1709.01355v1
PDF http://arxiv.org/pdf/1709.01355v1.pdf
PWC https://paperswithcode.com/paper/visualizing-and-improving-scattering-networks
Repo
Framework