May 6, 2019

Paper Group ANR 414

Towards Score Following in Sheet Music Images


Title	Towards Score Following in Sheet Music Images
Authors	Matthias Dorfer, Andreas Arzt, Gerhard Widmer
Abstract	This paper addresses the matching of short music audio snippets to the corresponding pixel location in images of sheet music. A system is presented that simultaneously learns to read notes, listen to music and match the currently played music to its corresponding notes in the sheet. It consists of an end-to-end multi-modal convolutional neural network that takes as input images of sheet music and spectrograms of the respective audio snippets. It learns to predict, for a given unseen audio snippet (covering approximately one bar of music), the corresponding position in the respective score line. Our results suggest that with the use of (deep) neural networks – which have proven to be powerful image processing models – working with sheet music becomes feasible and a promising future research direction.
Tasks
Published	2016-12-15
URL	http://arxiv.org/abs/1612.05050v1
PDF	http://arxiv.org/pdf/1612.05050v1.pdf
PWC	https://paperswithcode.com/paper/towards-score-following-in-sheet-music-images
Repo
Framework

Sample-efficient Deep Reinforcement Learning for Dialog Control


Title	Sample-efficient Deep Reinforcement Learning for Dialog Control
Authors	Kavosh Asadi, Jason D. Williams
Abstract	Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN which predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs/episodes required by about a third, vs. standard policy gradient methods.
Tasks	Policy Gradient Methods
Published	2016-12-18
URL	http://arxiv.org/abs/1612.06000v1
PDF	http://arxiv.org/pdf/1612.06000v1.pdf
PWC	https://paperswithcode.com/paper/sample-efficient-deep-reinforcement-learning-1
Repo
Framework
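
The abstract above combines a policy gradient with a second network that predicts the value of the current policy, plus experience replay. A minimal sketch of the bookkeeping this implies follows, under loud assumptions: the names `discounted_returns`, `advantages` and `ReplayBuffer` are hypothetical, the value predictions are passed in as plain numbers rather than produced by an RNN, and nothing here is taken from the authors' implementation. A faithful version would probably also need to account for the policy having changed since a replayed dialog was collected.

```python
import random


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one dialog."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Subtract the value baseline; policy gradient terms are weighted by this."""
    return [g - v for g, v in zip(returns, values)]


class ReplayBuffer:
    """Fixed-capacity store of past dialogs (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def add(self, dialog):
        self.items.append(dialog)
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest dialog

    def sample(self, k):
        # Sample without replacement, never more than what is stored.
        return self.rng.sample(self.items, min(k, len(self.items)))


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]          # reward only at the end of the dialog
    values = [0.25, 0.25, 0.5]         # stand-in for the value network's output
    g = discounted_returns(rewards, gamma=0.5)
    print(g, advantages(g, values))
```

Subtracting a learned baseline leaves the expected gradient unchanged while reducing its variance, which is one common route to needing fewer episodes.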

Matching Handwritten Document Images


Title	Matching Handwritten Document Images
Authors	Praveen Krishnan, C. V. Jawahar
Abstract	We address the problem of predicting similarity between a pair of handwritten document images written by different individuals. This has applications related to matching and mining in image collections containing handwritten content. A similarity score is computed by detecting patterns of text re-usages between document images irrespective of the minor variations in word morphology, word ordering, layout and paraphrasing of the content. Our method does not depend on an accurate segmentation of words and lines. We formulate the document matching problem as a structured comparison of the word distributions across two document images. To match two word images, we propose a convolutional neural network (CNN) based feature descriptor. Performance of this representation surpasses the state-of-the-art on handwritten word spotting. Finally, we demonstrate the applicability of our method on a practical problem of matching handwritten assignments.
Tasks
Published	2016-05-19
URL	http://arxiv.org/abs/1605.05923v1
PDF	http://arxiv.org/pdf/1605.05923v1.pdf
PWC	https://paperswithcode.com/paper/matching-handwritten-document-images
Repo
Framework

Risk-Sensitive Learning and Pricing for Demand Response


Title	Risk-Sensitive Learning and Pricing for Demand Response
Authors	Kia Khezeli, Eilyan Bitar
Abstract	We consider the setting in which an electric power utility seeks to curtail its peak electricity demand by offering a fixed group of customers a uniform price for reductions in consumption relative to their predetermined baselines. The underlying demand curve, which describes the aggregate reduction in consumption in response to the offered price, is assumed to be affine and subject to unobservable random shocks. Assuming that both the parameters of the demand curve and the distribution of the random shocks are initially unknown to the utility, we investigate the extent to which the utility might dynamically adjust its offered prices to maximize its cumulative risk-sensitive payoff over a finite number of $T$ days. In order to do so effectively, the utility must design its pricing policy to balance the tradeoff between the need to learn the unknown demand model (exploration) and maximize its payoff (exploitation) over time. In this paper, we propose such a pricing policy, which is shown to exhibit an expected payoff loss over $T$ days that is at most $O(\sqrt{T}\log(T))$, relative to an oracle pricing policy that knows the underlying demand model. Moreover, the proposed pricing policy is shown to yield a sequence of prices that converge to the oracle optimal prices in the mean square sense.
Tasks
Published	2016-11-21
URL	http://arxiv.org/abs/1611.07098v3
PDF	http://arxiv.org/pdf/1611.07098v3.pdf
PWC	https://paperswithcode.com/paper/risk-sensitive-learning-and-pricing-for
Repo
Framework

The Symbolic Interior Point Method


Title	The Symbolic Interior Point Method
Authors	Martin Mladenov, Vaishak Belle, Kristian Kersting
Abstract	A recent trend in probabilistic inference emphasizes the codification of models in a formal syntax, with suitable high-level features such as individuals, relations, and connectives, enabling descriptive clarity, succinctness and circumventing the need for the modeler to engineer a custom solver. Unfortunately, bringing these linguistic and pragmatic benefits to numerical optimization has proven surprisingly challenging. In this paper, we turn to these challenges: we introduce a rich modeling language, for which an interior-point method computes approximate solutions in a generic way. While logical features easily complicate the underlying model, often yielding intricate dependencies, we exploit and cache local structure using algebraic decision diagrams (ADDs). Indeed, standard matrix-vector algebra is efficiently realizable in ADDs, but we argue and show that well-known optimization methods are not ideal for ADDs. Our engine, therefore, invokes a sophisticated matrix-free approach. We demonstrate the flexibility of the resulting symbolic-numeric optimizer on decision making and compressed sensing tasks with millions of non-zero entries.
Tasks	Decision Making
Published	2016-05-26
URL	http://arxiv.org/abs/1605.08187v3
PDF	http://arxiv.org/pdf/1605.08187v3.pdf
PWC	https://paperswithcode.com/paper/the-symbolic-interior-point-method
Repo
Framework

Technical Report: A Generalized Matching Pursuit Approach for Graph-Structured Sparsity


Title	Technical Report: A Generalized Matching Pursuit Approach for Graph-Structured Sparsity
Authors	Feng Chen, Baojian Zhou
Abstract	Sparsity-constrained optimization is an important and challenging problem that has wide applicability in data mining, machine learning, and statistics. In this paper, we focus on sparsity-constrained optimization in cases where the cost function is a general nonlinear function and, in particular, the sparsity constraint is defined by a graph-structured sparsity model. Existing methods explore this problem in the context of sparse estimation in linear models. To the best of our knowledge, this is the first work to present an efficient approximation algorithm, namely, Graph-structured Matching Pursuit (Graph-Mp), to optimize a general nonlinear function subject to graph-structured constraints. We prove that our algorithm enjoys strong guarantees analogous to those designed for linear models in terms of convergence rate and approximation accuracy. As a case study, we specialize Graph-Mp to optimize a number of well-known graph scan statistic models for the connected subgraph detection task, and empirical evidence demonstrates that our general algorithm outperforms state-of-the-art methods that are designed specifically for the task of connected subgraph detection.
Tasks
Published	2016-12-11
URL	http://arxiv.org/abs/1612.03364v1
PDF	http://arxiv.org/pdf/1612.03364v1.pdf
PWC	https://paperswithcode.com/paper/technical-report-a-generalized-matching
Repo
Framework

Stationary time-vertex signal processing


Title	Stationary time-vertex signal processing
Authors	Andreas Loukas, Nathanaël Perraudin
Abstract	This paper considers regression tasks involving high-dimensional multivariate processes whose structure is dependent on some known graph topology. We put forth a new definition of time-vertex wide-sense stationarity, or joint stationarity for short, that goes beyond product graphs. Joint stationarity helps by reducing the estimation variance and recovery complexity. In particular, for any jointly stationary process (a) one reliably learns the covariance structure from as little as a single realization of the process, and (b) solves MMSE recovery problems, such as interpolation and denoising, in computational time nearly linear in the number of edges and timesteps. Experiments with three datasets suggest that joint stationarity can yield accuracy improvements in the recovery of high-dimensional processes evolving over a graph, even when the latter is only approximately known, or the process is not strictly stationary.
Tasks	Denoising
Published	2016-11-01
URL	https://arxiv.org/abs/1611.00255v3
PDF	https://arxiv.org/pdf/1611.00255v3.pdf
PWC	https://paperswithcode.com/paper/stationary-time-vertex-signal-processing
Repo
Framework

Mesh Denoising based on Normal Voting Tensor and Binary Optimization


Title	Mesh Denoising based on Normal Voting Tensor and Binary Optimization
Authors	S. K. Yadav, U. Reitebuch, K. Polthier
Abstract	This paper presents a tensor multiplication based smoothing algorithm that follows a two step denoising method. Unlike other traditional averaging approaches, our approach uses an element based normal voting tensor to compute smooth surfaces. By introducing a binary optimization on the proposed tensor together with a local binary neighborhood concept, our algorithm better retains sharp features and produces smoother umbilical regions than previous approaches. On top of that, we provide a stochastic analysis on the different kinds of noise based on the average edge length. The quantitative and visual results demonstrate that our method performs better than state-of-the-art smoothing approaches.
Tasks	Denoising
Published	2016-07-20
URL	http://arxiv.org/abs/1607.07427v2
PDF	http://arxiv.org/pdf/1607.07427v2.pdf
PWC	https://paperswithcode.com/paper/mesh-denoising-based-on-normal-voting-tensor
Repo
Framework

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server


Title	Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server
Authors	Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson
Abstract	This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish a linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.
Tasks
Published	2016-10-18
URL	http://arxiv.org/abs/1610.05507v1
PDF	http://arxiv.org/pdf/1610.05507v1.pdf
PWC	https://paperswithcode.com/paper/analysis-and-implementation-of-an
Repo
Framework

Automatic Discoveries of Physical and Semantic Concepts via Association Priors of Neuron Groups


Title	Automatic Discoveries of Physical and Semantic Concepts via Association Priors of Neuron Groups
Authors	Shuai Li, Kui Jia, Xiaogang Wang
Abstract	The recent successful deep neural networks are largely trained in a supervised manner. It associates complex patterns of input samples with neurons in the last layer, which form representations of concepts. In spite of their successes, the properties of complex patterns associated with a learned concept remain elusive. In this work, by analyzing how neurons are associated with concepts in supervised networks, we hypothesize that with proper priors to regulate learning, neural networks can automatically associate neurons in the intermediate layers with concepts that are aligned with real world concepts, when trained only with labels that associate concepts with top level neurons, which is a plausible way for unsupervised learning. We develop a prior to verify the hypothesis and experimentally find the proposed prior helps neural networks automatically learn both basic physical concepts at the lower layers, e.g., rotation of filters, and highly semantic concepts at the higher layers, e.g., fine-grained categories of an entry-level category.
Tasks
Published	2016-12-30
URL	http://arxiv.org/abs/1612.09438v2
PDF	http://arxiv.org/pdf/1612.09438v2.pdf
PWC	https://paperswithcode.com/paper/automatic-discoveries-of-physical-and
Repo
Framework

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices


Title	Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Authors	Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak
Abstract	Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization, multi-frame inference, and robust inference using an $\epsilon$-contaminated Gaussian loss function. Experimental results in subjective listening tests show that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based SPSS in runtime speed while maintaining naturalness. Evaluations between LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also presented.
Tasks	Quantization, Speech Synthesis
Published	2016-06-20
URL	http://arxiv.org/abs/1606.06061v2
PDF	http://arxiv.org/pdf/1606.06061v2.pdf
PWC	https://paperswithcode.com/paper/fast-compact-and-high-quality-lstm-rnn-based
Repo
Framework
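
Weight quantization, one of the optimizations named in the abstract above, can be illustrated with a generic uniform 8-bit scheme. This is only a sketch under assumptions: the abstract does not say which quantization scheme the authors use, and the helper names `quantize` and `dequantize` are hypothetical. It stores each weight as an integer code plus a shared offset and scale, which cuts storage relative to 32-bit floats at the cost of a bounded rounding error.

```python
def quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] with a shared offset and scale."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Guard against a constant weight vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
    codes, lo, scale = quantize(weights)
    restored = dequantize(codes, lo, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, worst)
```

With uniform rounding, the reconstruction error per weight is at most half a quantization step, i.e. `scale / 2`.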

3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information


Title	3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information
Authors	Sungheon Park, Jihye Hwang, Nojun Kwak
Abstract	While there has been success in 2D human pose estimation with convolutional neural networks (CNNs), 3D human pose estimation has not been thoroughly studied. In this paper, we tackle the 3D human pose estimation task with end-to-end learning using CNNs. Relative 3D positions between one joint and the other joints are learned via CNNs. The proposed method improves the performance of CNN with two novel ideas. First, we added 2D pose information to estimate a 3D pose from an image by concatenating the 2D pose estimation result with the features from an image. Second, we have found that more accurate 3D poses are obtained by combining information on relative positions with respect to multiple joints, instead of just one root joint. Experimental results show that the proposed method achieves comparable performance to the state-of-the-art methods on the Human3.6M dataset.
Tasks	3D Human Pose Estimation, Pose Estimation
Published	2016-08-10
URL	http://arxiv.org/abs/1608.03075v2
PDF	http://arxiv.org/pdf/1608.03075v2.pdf
PWC	https://paperswithcode.com/paper/3d-human-pose-estimation-using-convolutional
Repo
Framework

Deep Feature Based Contextual Model for Object Detection


Title	Deep Feature Based Contextual Model for Object Detection
Authors	Wenqing Chu, Deng Cai
Abstract	Object detection is one of the most active areas in computer vision, which has made significant improvement in recent years. Current state-of-the-art object detection methods mostly adhere to the framework of regions with convolutional neural network (R-CNN) and only use local appearance features inside object bounding boxes. Since these approaches ignore the contextual information around the object proposals, the outcome of these detectors may generate a semantically incoherent interpretation of the input image. In this paper, we propose an ensemble object detection system which incorporates the local appearance, the contextual information in terms of relationships among objects and the global scene based contextual feature generated by a convolutional neural network. The system is formulated as a fully connected conditional random field (CRF) defined on object proposals and the contextual constraints among object proposals are modeled as edges naturally. Furthermore, a fast mean field approximation method is used to perform inference in this CRF model efficiently. The experimental results demonstrate that our approach achieves a higher mean average precision (mAP) on the PASCAL VOC 2007 dataset compared to the baseline algorithm Faster R-CNN.
Tasks	Object Detection
Published	2016-04-14
URL	http://arxiv.org/abs/1604.04048v1
PDF	http://arxiv.org/pdf/1604.04048v1.pdf
PWC	https://paperswithcode.com/paper/deep-feature-based-contextual-model-for
Repo
Framework

Optimal Belief Approximation


Title	Optimal Belief Approximation
Authors	Reimar H. Leike, Torsten A. Enßlin
Abstract	In Bayesian statistics, probability distributions express beliefs. However, for many problems the beliefs cannot be computed analytically and approximations of beliefs are needed. We seek a loss function that quantifies how “embarrassing” it is to communicate a given approximation. We reproduce and discuss an old proof showing that there is only one ranking under the requirements that (1) the best ranked approximation is the non-approximated belief and (2) that the ranking judges approximations only by their predictions for actual outcomes. The loss function that is obtained in the derivation is equal to the Kullback-Leibler divergence when normalized. This loss function is frequently used in the literature. However, there seems to be confusion about the correct order in which its functional arguments, the approximated and non-approximated beliefs, should be used. The correct order ensures that the recipient of a communication is only deprived of the minimal amount of information. We hope that the elementary derivation settles the apparent confusion. For example, when approximating beliefs with Gaussian distributions, the optimal approximation is given by moment matching. This is in contrast to many suggested computational schemes.
Tasks
Published	2016-10-27
URL	http://arxiv.org/abs/1610.09018v6
PDF	http://arxiv.org/pdf/1610.09018v6.pdf
PWC	https://paperswithcode.com/paper/optimal-belief-approximation
Repo
Framework
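
The moment-matching claim in the abstract above can be checked numerically. The sketch below assumes the argument order KL(exact belief || approximation), which is the ordering for which a Gaussian approximation is optimal when its mean and variance equal those of the exact belief. The two-component Gaussian mixture used as the "exact belief", the integration grid, and all function names are hypothetical choices for illustration, not taken from the paper.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, comps):
    """Density of a Gaussian mixture given (weight, mean, variance) triples."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in comps)


def moment_match(comps):
    """Mean and variance of the mixture, i.e. the moment-matched Gaussian."""
    mean = sum(w * m for w, m, v in comps)
    second = sum(w * (v + m * m) for w, m, v in comps)
    return mean, second - mean * mean


def kl_to_gaussian(comps, mean, var, lo=-15.0, hi=15.0, n=6000):
    """Midpoint-rule estimate of KL(p || q), p the mixture and q a Gaussian."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = mixture_pdf(x, comps)
        if p > 0.0:
            # log q is evaluated in closed form to avoid underflow in the tails.
            log_q = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            total += p * (math.log(p) - log_q) * dx
    return total


if __name__ == "__main__":
    belief = [(0.3, -2.0, 1.0), (0.7, 1.5, 0.5)]   # hypothetical exact belief
    m, v = moment_match(belief)
    print("matched:", kl_to_gaussian(belief, m, v))
    print("shifted mean:", kl_to_gaussian(belief, m + 0.3, v))
    print("wider variance:", kl_to_gaussian(belief, m, 1.5 * v))
```

Perturbing either the mean or the variance away from the matched values increases the divergence, consistent with moment matching being the minimizer under this ordering.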

Processing Natural Language About Ongoing Actions


Title	Processing Natural Language About Ongoing Actions
Authors	Steve Doubleday, Sean Trott, Jerome Feldman
Abstract	Actions may not proceed as planned; they may be interrupted, resumed or overridden. This is a challenge to handle in a natural language understanding system. We describe extensions to an existing implementation for the control of autonomous systems by natural language, to enable such systems to handle incoming language requests regarding actions. Language Communication with Autonomous Systems (LCAS) has been extended with support for X-nets, parameterized executable schemas representing actions. X-nets enable the system to control actions at a desired level of granularity, while providing a mechanism for language requests to be processed asynchronously. Standard semantics supported include requests to stop, continue, or override the existing action. The specific domain demonstrated is the control of motion of a simulated robot, but the approach is general, and could be applied to other domains.
Tasks
Published	2016-07-23
URL	http://arxiv.org/abs/1607.06875v2
PDF	http://arxiv.org/pdf/1607.06875v2.pdf
PWC	https://paperswithcode.com/paper/processing-natural-language-about-ongoing
Repo
Framework