July 28, 2019

2817 words 14 mins read

Paper Group ANR 210

Mining relevant interval rules. Alternating minimization for dictionary learning: Local Convergence Guarantees. Transforming Musical Signals through a Genre Classifying Convolutional Neural Network. Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks. Augmenting End-to-End Dialog Systems with Commonsense Knowledge …

Mining relevant interval rules


Title	Mining relevant interval rules
Authors	Thomas Guyet, René Quiniou, Véronique Masson
Abstract	This article extends the method of Garriga et al. for mining relevant rules to numerical attributes by extracting interval-based pattern rules. We propose an algorithm that extracts such rules from numerical datasets using the interval-pattern approach from Kaytoue et al. This algorithm has been implemented and evaluated on real datasets.
Tasks
Published	2017-09-11
URL	http://arxiv.org/abs/1709.03267v1
PDF	http://arxiv.org/pdf/1709.03267v1.pdf
PWC	https://paperswithcode.com/paper/mining-relevant-interval-rules
Repo
Framework

Alternating minimization for dictionary learning: Local Convergence Guarantees


Title	Alternating minimization for dictionary learning: Local Convergence Guarantees
Authors	Niladri S. Chatterji, Peter L. Bartlett
Abstract	We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem. The dictionary learning problem is to factorize vector samples $y^{1},y^{2},\ldots, y^{n}$ into an appropriate basis (dictionary) $A^$ and sparse vectors $x^{1},\ldots,x^{n}$. Our algorithm is a simple alternating minimization procedure that switches between $\ell_1$ minimization and gradient descent in alternate steps. Dictionary learning and specifically alternating minimization algorithms for dictionary learning are well studied both theoretically and empirically. However, in contrast to previous theoretical analyses for this problem, we replace a condition on the operator norm (that is, the largest magnitude singular value) of the true underlying dictionary $A^$ with a condition on the matrix infinity norm (that is, the largest magnitude term). Our guarantees are under a reasonable generative model that allows for dictionaries with growing operator norms, and can handle an arbitrary level of overcompleteness, while having sparsity that is information theoretically optimal. We also establish upper bounds on the sample complexity of our algorithm.
Tasks	Dictionary Learning
Published	2017-11-09
URL	https://arxiv.org/abs/1711.03634v4
PDF	https://arxiv.org/pdf/1711.03634v4.pdf
PWC	https://paperswithcode.com/paper/alternating-minimization-for-dictionary
Repo
Framework

Transforming Musical Signals through a Genre Classifying Convolutional Neural Network


Title	Transforming Musical Signals through a Genre Classifying Convolutional Neural Network
Authors	S. Geng, G. Ren, M. Ogihara
Abstract	Convolutional neural networks (CNNs) have been successfully applied on both discriminative and generative modeling for music-related tasks. For a particular task, the trained CNN contains information representing the decision making or the abstracting process. One can hope to manipulate existing music based on this ‘informed’ network and create music with new features corresponding to the knowledge obtained by the network. In this paper, we propose a method to utilize the stored information from a CNN trained on musical genre classification task. The network was composed of three convolutional layers, and was trained to classify five-second song clips into five different genres. After training, randomly selected clips were modified by maximizing the sum of outputs from the network layers. In addition to the potential of such CNNs to produce interesting audio transformation, more information about the network and the original music could be obtained from the analysis of the generated features since these features indicate how the network ‘understands’ the music.
Tasks	Decision Making
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09553v1
PDF	http://arxiv.org/pdf/1706.09553v1.pdf
PWC	https://paperswithcode.com/paper/transforming-musical-signals-through-a-genre
Repo
Framework

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks


Title	Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
Authors	Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao
Abstract	Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
Tasks
Published	2017-11-06
URL	http://arxiv.org/abs/1711.02213v2
PDF	http://arxiv.org/pdf/1711.02213v2.pdf
PWC	https://paperswithcode.com/paper/flexpoint-an-adaptive-numerical-format-for
Repo
Framework

Augmenting End-to-End Dialog Systems with Commonsense Knowledge


Title	Augmenting End-to-End Dialog Systems with Commonsense Knowledge
Authors	Tom Young, Erik Cambria, Iti Chaturvedi, Minlie Huang, Hao Zhou, Subham Biswas
Abstract	Building dialog agents that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human responses in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialog. Our model represents the first attempt to integrating a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose the Tri-LSTM model to jointly take into account message and commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts in automatic evaluation.
Tasks
Published	2017-09-16
URL	http://arxiv.org/abs/1709.05453v3
PDF	http://arxiv.org/pdf/1709.05453v3.pdf
PWC	https://paperswithcode.com/paper/augmenting-end-to-end-dialog-systems-with
Repo
Framework

Optimal and Myopic Information Acquisition


Title	Optimal and Myopic Information Acquisition
Authors	Annie Liang, Xiaosheng Mu, Vasilis Syrgkanis
Abstract	We consider the problem of optimal dynamic information acquisition from many correlated information sources. Each period, the decision-maker jointly takes an action and allocates a fixed number of observations across the available sources. His payoff depends on the actions taken and on an unknown state. In the canonical setting of jointly normal information sources, we show that the optimal dynamic information acquisition rule proceeds myopically after finitely many periods. If signals are acquired in large blocks each period, then the optimal rule turns out to be myopic from period 1. These results demonstrate the possibility of robust and “simple” optimal information acquisition, and simplify the analysis of dynamic information acquisition in a widely used informational environment.
Tasks
Published	2017-03-18
URL	http://arxiv.org/abs/1703.06367v4
PDF	http://arxiv.org/pdf/1703.06367v4.pdf
PWC	https://paperswithcode.com/paper/optimal-and-myopic-information-acquisition
Repo
Framework

Important New Developments in Arabographic Optical Character Recognition (OCR)


Title	Important New Developments in Arabographic Optical Character Recognition (OCR)
Authors	Maxim Romanov, Matthew Thomas Miller, Sarah Bowen Savant, Benjamin Kiessling
Abstract	The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines. These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts, but, equally important, they are produced using an open-source OCR software, thus enabling us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.
Tasks	Optical Character Recognition
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09550v1
PDF	http://arxiv.org/pdf/1703.09550v1.pdf
PWC	https://paperswithcode.com/paper/important-new-developments-in-arabographic
Repo
Framework

Weakly Supervised Classification in High Energy Physics


Title	Weakly Supervised Classification in High Energy Physics
Authors	Lucio Mwinmaarong Dery, Benjamin Nachman, Francesco Rubbo, Ariel Schwartzman
Abstract	As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. This paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics - quark versus gluon tagging - we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.
Tasks
Published	2017-02-01
URL	http://arxiv.org/abs/1702.00414v4
PDF	http://arxiv.org/pdf/1702.00414v4.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-classification-in-high
Repo
Framework

PixelGAN Autoencoders


Title	PixelGAN Autoencoders
Authors	Alireza Makhzani, Brendan Frey
Abstract	In this paper, we describe the “PixelGAN autoencoder”, a generative autoencoder in which the generative path is a convolutional autoregressive neural network on pixels (PixelCNN) that is conditioned on a latent code, and the recognition path uses a generative adversarial network (GAN) to impose a prior distribution on the latent code. We show that different priors result in different decompositions of information between the latent code and the autoregressive decoder. For example, by imposing a Gaussian distribution as the prior, we can achieve a global vs. local decomposition, or by imposing a categorical distribution as the prior, we can disentangle the style and content information of images in an unsupervised fashion. We further show how the PixelGAN autoencoder with a categorical prior can be directly used in semi-supervised settings and achieve competitive semi-supervised classification results on the MNIST, SVHN and NORB datasets.
Tasks	Unsupervised Image Classification, Unsupervised MNIST
Published	2017-06-02
URL	http://arxiv.org/abs/1706.00531v1
PDF	http://arxiv.org/pdf/1706.00531v1.pdf
PWC	https://paperswithcode.com/paper/pixelgan-autoencoders
Repo
Framework

On learning the structure of Bayesian Networks and submodular function maximization


Title	On learning the structure of Bayesian Networks and submodular function maximization
Authors	Giulio Caravagna, Daniele Ramazzotti, Guido Sanguinetti
Abstract	Learning the structure of dependencies among multiple random variables is a problem of considerable theoretical and practical interest. In practice, score optimisation with multiple restarts provides a practical and surprisingly successful solution, yet the conditions under which this may be a well founded strategy are poorly understood. In this paper, we prove that the problem of identifying the structure of a Bayesian Network via regularised score optimisation can be recast, in expectation, as a submodular optimisation problem, thus guaranteeing optimality with high probability. This result both explains the practical success of optimisation heuristics, and suggests a way to improve on such algorithms by artificially simulating multiple data sets via a bootstrap procedure. We show on several synthetic data sets that the resulting algorithm yields better recovery performance than the state of the art, and illustrate in a real cancer genomic study how such an approach can lead to valuable practical insights.
Tasks
Published	2017-06-07
URL	http://arxiv.org/abs/1706.02386v1
PDF	http://arxiv.org/pdf/1706.02386v1.pdf
PWC	https://paperswithcode.com/paper/on-learning-the-structure-of-bayesian
Repo
Framework

Multi-Armed Bandits with Metric Movement Costs


Title	Multi-Armed Bandits with Metric Movement Costs
Authors	Tomer Koren, Roi Livni, Yishay Mansour
Abstract	We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}})$, and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max{k^{1/3}T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max{T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(1)$. For infinite metrics spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
Tasks	Multi-Armed Bandits
Published	2017-10-24
URL	http://arxiv.org/abs/1710.08997v1
PDF	http://arxiv.org/pdf/1710.08997v1.pdf
PWC	https://paperswithcode.com/paper/multi-armed-bandits-with-metric-movement
Repo
Framework

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities


Title	Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities
Authors	Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
Abstract	The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved. In this paper we ask whether there are other “lucky” settings when FTL achieves sublinear, “small” regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polytope domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.
Tasks
Published	2017-02-10
URL	http://arxiv.org/abs/1702.03040v1
PDF	http://arxiv.org/pdf/1702.03040v1.pdf
PWC	https://paperswithcode.com/paper/following-the-leader-and-fast-rates-in-linear
Repo
Framework

Provable benefits of representation learning


Title	Provable benefits of representation learning
Authors	Sanjeev Arora, Andrej Risteski
Abstract	There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it’s essential to formalize the definition and goals of representation learning, so that they are all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings – linear mixture models and loglinear models, where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning) and (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data (iii) it is more powerful than manifold learning methods.
Tasks	Representation Learning, Transfer Learning
Published	2017-06-14
URL	http://arxiv.org/abs/1706.04601v1
PDF	http://arxiv.org/pdf/1706.04601v1.pdf
PWC	https://paperswithcode.com/paper/provable-benefits-of-representation-learning
Repo
Framework

Neural Decision Trees


Title	Neural Decision Trees
Authors	Randall Balestriero
Abstract	In this paper we propose a synergistic melting of neural networks and decision trees (DT) we call neural decision trees (NDT). NDT is an architecture a la decision tree where each splitting node is an independent multilayer perceptron allowing oblique decision functions or arbritrary nonlinear decision function if more than one layer is used. This way, each MLP can be seen as a node of the tree. We then show that with the weight sharing asumption among those units, we end up with a Hashing Neural Network (HNN) which is a multilayer perceptron with sigmoid activation function for the last layer as opposed to the standard softmax. The output units then jointly represent the probability to be in a particular region. The proposed framework allows for global optimization as opposed to greedy in DT and differentiability w.r.t. all parameters and the input, allowing easy integration in any learnable pipeline, for example after CNNs for computer vision tasks. We also demonstrate the modeling power of HNN allowing to learn union of disjoint regions for final clustering or classification making it more general and powerful than standard softmax MLP requiring linear separability thus reducing the need on the inner layer to perform complex data transformations. We finally show experiments for supervised, semi-suppervised and unsupervised tasks and compare results with standard DTs and MLPs.
Tasks
Published	2017-02-23
URL	http://arxiv.org/abs/1702.07360v2
PDF	http://arxiv.org/pdf/1702.07360v2.pdf
PWC	https://paperswithcode.com/paper/neural-decision-trees
Repo
Framework

Joint Discovery of Object States and Manipulation Actions


Title	Joint Discovery of Object States and Manipulation Actions
Authors	Jean-Baptiste Alayrac, Josev Sivic, Ivan Laptev, Simon Lacoste-Julien
Abstract	Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of seven manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations. We show that our joint formulation results in an improvement of object state discovery by action recognition and vice versa.
Tasks	Temporal Action Localization
Published	2017-02-09
URL	http://arxiv.org/abs/1702.02738v3
PDF	http://arxiv.org/pdf/1702.02738v3.pdf
PWC	https://paperswithcode.com/paper/joint-discovery-of-object-states-and
Repo
Framework