January 25, 2020

3097 words 15 mins read

Paper Group ANR 1628

Disentangled Skill Embeddings for Reinforcement Learning

Title Disentangled Skill Embeddings for Reinforcement Learning
Authors Janith C. Petangoda, Sergio Pascual-Diaz, Vincent Adam, Peter Vrancx, Jordi Grau-Moya
Abstract We propose a novel framework for multi-task reinforcement learning (MTRL). Using a variational inference formulation, we learn policies that generalize across both changing dynamics and goals. The resulting policies are parametrized by shared parameters that allow for transfer between different dynamics and goal conditions, and by task-specific latent-space embeddings that allow for specialization to particular tasks. We show how the latent spaces enable generalization to unseen dynamics and goal conditions. Additionally, policies equipped with such embeddings serve as a space of skills (or options) for hierarchical reinforcement learning. Since we can change task dynamics and goals independently, we name our framework Disentangled Skill Embeddings (DSE).
Tasks Hierarchical Reinforcement Learning
Published 2019-06-21
URL https://arxiv.org/abs/1906.09223v1
PDF https://arxiv.org/pdf/1906.09223v1.pdf
PWC https://paperswithcode.com/paper/disentangled-skill-embeddings-for
Repo
Framework
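
The abstract above describes policies built from shared weights plus task-specific latent embeddings for dynamics and goals. A minimal PyTorch sketch of such a disentangled-embedding policy head follows; all names, dimensions, and the deterministic embedding lookup are illustrative assumptions (the paper learns the embeddings with variational inference), not the authors' code.

```python
import torch
import torch.nn as nn

class DisentangledPolicy(nn.Module):
    """Policy conditioned on separate dynamics and goal embeddings."""
    def __init__(self, state_dim, action_dim, n_dynamics, n_goals, emb_dim=8):
        super().__init__()
        # Task-specific latent embeddings, one per dynamics/goal condition.
        self.dyn_emb = nn.Embedding(n_dynamics, emb_dim)
        self.goal_emb = nn.Embedding(n_goals, emb_dim)
        # Shared parameters, reused across all (dynamics, goal) combinations.
        self.shared = nn.Sequential(
            nn.Linear(state_dim + 2 * emb_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, dyn_id, goal_id):
        z = torch.cat([state, self.dyn_emb(dyn_id), self.goal_emb(goal_id)], dim=-1)
        return self.shared(z)

policy = DisentangledPolicy(state_dim=4, action_dim=2, n_dynamics=3, n_goals=5)
action = policy(torch.randn(1, 4), torch.tensor([0]), torch.tensor([2]))
```

Because the two embedding tables are independent, a new (dynamics, goal) pair can reuse embeddings learned separately for each factor, which is what makes the skills composable.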

A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics

Title A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics
Authors Anirudha Majumdar, Georgina Hall, Amir Ali Ahmadi
Abstract Historically, scalability has been a major challenge to the successful application of semidefinite programming in fields such as machine learning, control, and robotics. In this paper, we survey recent approaches for addressing this challenge including (i) approaches for exploiting structure (e.g., sparsity and symmetry) in a problem, (ii) approaches that produce low-rank approximate solutions to semidefinite programs, (iii) more scalable algorithms that rely on augmented Lagrangian techniques and the alternating direction method of multipliers, and (iv) approaches that trade off scalability with conservatism (e.g., by approximating semidefinite programs with linear and second-order cone programs). For each class of approaches we provide a high-level exposition, an entry-point to the corresponding literature, and examples drawn from machine learning, control, or robotics. We also present a list of software packages that implement many of the techniques discussed in the paper. Our hope is that this paper will serve as a gateway to the rich and exciting literature on scalable semidefinite programming for both theorists and practitioners.
Tasks
Published 2019-08-14
URL https://arxiv.org/abs/1908.05209v3
PDF https://arxiv.org/pdf/1908.05209v3.pdf
PWC https://paperswithcode.com/paper/a-survey-of-recent-scalability-improvements
Repo
Framework
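
For readers new to semidefinite programming, here is a toy SDP in CVXPY of the kind whose scalability the survey addresses; the problem and data are illustrative, not drawn from the paper.

```python
import cvxpy as cp
import numpy as np

# Minimize trace(C @ X) over PSD matrices X with unit diagonal.
n = 4
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2  # symmetrize the cost matrix

X = cp.Variable((n, n), PSD=True)  # the semidefinite constraint X >= 0
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)), [cp.diag(X) == 1])
problem.solve()
print(problem.value)
```

Interior-point solvers handle such small instances easily; the approaches surveyed (structure exploitation, low-rank methods, ADMM, LP/SOCP approximations) become relevant when n reaches the thousands.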

Quantifying the Semantic Core of Gender Systems

Title Quantifying the Semantic Core of Gender Systems
Authors Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach
Abstract Many of the world’s languages employ grammatical gender on the lexeme. For example, in Spanish, the word for ‘house’ (casa) is feminine, whereas the word for ‘paper’ (papel) is masculine. To a speaker of a genderless language, this assignment seems to exist with neither rhyme nor reason. But is the assignment of inanimate nouns to grammatical genders truly arbitrary? We present the first large-scale investigation of the arbitrariness of noun-gender assignments. To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics. We find that 18 languages exhibit a significant correlation between grammatical gender and lexical semantics.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13497v1
PDF https://arxiv.org/pdf/1910.13497v1.pdf
PWC https://paperswithcode.com/paper/quantifying-the-semantic-core-of-gender
Repo
Framework
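
A minimal sketch of the kind of analysis the abstract describes, using scikit-learn's CCA to correlate word-embedding semantics with one-hot grammatical gender; the data here are random stand-ins, not the paper's.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_nouns, emb_dim, n_genders = 200, 50, 3

# Stand-ins for externally grounded lexical semantics (e.g. word vectors)
# and one-hot grammatical gender labels for the same inanimate nouns.
semantics = rng.standard_normal((n_nouns, emb_dim))
gender = np.eye(n_genders)[rng.integers(0, n_genders, n_nouns)]

cca = CCA(n_components=2)
sem_c, gen_c = cca.fit_transform(semantics, gender)
for i in range(2):
    # Canonical correlations; significance would need a permutation test.
    print(np.corrcoef(sem_c[:, i], gen_c[:, i])[0, 1])
```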

Computing Valid p-values for Image Segmentation by Selective Inference

Title Computing Valid p-values for Image Segmentation by Selective Inference
Authors Kosuke Tanizaki, Noriaki Hashimoto, Yu Inatsu, Hidekata Hontani, Ichiro Takeuchi
Abstract Image segmentation is one of the most fundamental tasks of computer vision. In many practical applications, it is essential to properly evaluate the reliability of individual segmentation results. In this study, we propose a novel framework to provide the statistical significance of segmentation results in the form of p-values. Specifically, we consider a statistical hypothesis test for determining the difference between the object and the background regions. This problem is challenging because the difference can be deceptively large (called segmentation bias) due to the adaptation of the segmentation algorithm to the data. To overcome this difficulty, we introduce a statistical approach called selective inference, and develop a framework to compute valid p-values in which the segmentation bias is properly accounted for. Although the proposed framework is potentially applicable to various segmentation algorithms, we focus in this paper on graph cut-based and threshold-based segmentation algorithms, and develop two specific methods to compute valid p-values for the segmentation results obtained by these algorithms. We prove the theoretical validity of these two methods and demonstrate their practicality by applying them to segmentation problems for medical images.
Tasks Semantic Segmentation
Published 2019-06-03
URL https://arxiv.org/abs/1906.00629v2
PDF https://arxiv.org/pdf/1906.00629v2.pdf
PWC https://paperswithcode.com/paper/190600629
Repo
Framework
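
To make the segmentation bias concrete, the toy sketch below shows why a naive test is invalid after threshold-based segmentation; the paper's contribution, not reproduced here, is to compute p-values conditional on the selection event instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pixels = rng.normal(0.0, 1.0, size=1000)  # pure-noise "image": no true object

# Threshold-based segmentation adapts to the data ...
obj = pixels[pixels > 0.5]
bkg = pixels[pixels <= 0.5]

# ... so the naive two-sample test is wildly anti-conservative: it rejects
# even though object and background come from the same distribution.
t_stat, naive_p = stats.ttest_ind(obj, bkg)
print(naive_p)  # essentially zero despite there being no real signal

# Selective inference instead derives the null law of the statistic
# conditional on the thresholding event (a truncated distribution),
# which restores the validity of the resulting p-values.
```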

Streetscape augmentation using generative adversarial networks: insights related to health and wellbeing

Title Streetscape augmentation using generative adversarial networks: insights related to health and wellbeing
Authors Jasper S. Wijnands, Kerry A. Nice, Jason Thompson, Haifeng Zhao, Mark Stevenson
Abstract Deep learning using neural networks has provided advances in image style transfer, merging the content of one image (e.g., a photo) with the style of another (e.g., a painting). Our research shows this concept can be extended to analyse the design of streetscapes in relation to health and wellbeing outcomes. An Australian population health survey (n=34,000) was used to identify the spatial distribution of health and wellbeing outcomes, including general health and social capital. For each outcome, the most and least desirable locations formed two domains. Streetscape design was sampled using around 80,000 Google Street View images per domain. Generative adversarial networks translated these images from one domain to the other, preserving the main structure of the input image, but transforming the ‘style’ from locations where self-reported health was bad to locations where it was good. These translations indicate that areas in Melbourne with good general health are characterised by sufficient green space and compactness of the urban environment, whilst streetscape imagery related to high social capital contained more and wider footpaths, fewer fences and more grass. Beyond identifying relationships, the method is a first step towards computer-generated design interventions that have the potential to improve population health and wellbeing.
Tasks Style Transfer
Published 2019-05-14
URL https://arxiv.org/abs/1905.06464v1
PDF https://arxiv.org/pdf/1905.06464v1.pdf
PWC https://paperswithcode.com/paper/streetscape-augmentation-using-generative
Repo
Framework
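
The core technical step is unpaired image-to-image translation with a cycle-consistency constraint, which is what preserves the input's structure while changing domain style. A compact PyTorch sketch of that loss structure follows; the linear stand-in networks and the weight 10.0 are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

# Stand-ins for convolutional generators/discriminators in unpaired
# translation between domains A ("poor outcome") and B ("good outcome").
G_ab = nn.Linear(16, 16)   # translate A -> B
G_ba = nn.Linear(16, 16)   # translate B -> A
D_b = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

bce, l1 = nn.BCELoss(), nn.L1Loss()
real_a = torch.randn(8, 16)  # a batch of domain-A "images"

fake_b = G_ab(real_a)
# Adversarial term: translated images should fool the domain-B critic ...
adv = bce(D_b(fake_b), torch.ones(8, 1))
# ... while cycle consistency keeps the main structure of the input,
# so only the domain "style" changes.
cyc = l1(G_ba(fake_b), real_a)
(adv + 10.0 * cyc).backward()
```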

Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization

Title Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization
Authors Hung Tran-The, Sunil Gupta, Santu Rana, Svetha Venkatesh
Abstract Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on function structure. The main reason is that at each iteration, BO requires finding the global maximiser of the acquisition function, which is itself a non-convex optimization problem in the original search space. With growing dimensions, the computational budget for this maximisation becomes increasingly insufficient, leading to inaccurate solutions of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low dimensional subspaces embedded in the original high-dimensional search space. Unlike many recent high-dimensional BO methods, our method is free of any low dimensional structure assumption on the function. Optimising the acquisition function in low dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm grows only sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate against the number of subspaces used in the optimisation. Finally, when the number of subspaces is “sufficiently large”, our algorithm’s cumulative regret is at most $\mathcal{O}^{*}(\sqrt{T\gamma_T})$ as opposed to $\mathcal{O}^{*}(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), reducing a crucial factor of $\sqrt{D}$, where $D$ is the dimension of the input space.
Tasks Bayesian Optimisation
Published 2019-11-27
URL https://arxiv.org/abs/1911.11950v1
PDF https://arxiv.org/pdf/1911.11950v1.pdf
PWC https://paperswithcode.com/paper/trading-convergence-rate-with-computational
Repo
Framework
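
A rough numpy sketch of the central trick: maximize the acquisition function over a few random low-dimensional subspaces instead of the full $D$-dimensional space. The quadratic acquisition below is a stand-in for a GP-based one such as UCB; nothing here is the paper's actual algorithm.

```python
import numpy as np

def acquisition(x):
    # Stand-in for a GP acquisition function (e.g. UCB) over D dims.
    return -np.sum((x - 0.3) ** 2)

D, d, n_subspaces, n_candidates = 50, 2, 5, 200
rng = np.random.default_rng(0)

best_x, best_val = None, -np.inf
for _ in range(n_subspaces):
    # A random d-dimensional subspace embedded in the D-dim search space.
    basis = rng.standard_normal((D, d))
    low = rng.uniform(-1, 1, size=(n_candidates, d))  # cheap d-dim search
    candidates = low @ basis.T                        # lift back to D dims
    vals = np.array([acquisition(x) for x in candidates])
    if vals.max() > best_val:
        best_val, best_x = vals.max(), candidates[vals.argmax()]
print(best_val)
```

More subspaces improve the regret bound at the cost of more acquisition maximisations, which is exactly the trade-off the title refers to.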

Robust Matrix Completion State Estimation in Distribution Systems

Title Robust Matrix Completion State Estimation in Distribution Systems
Authors Bo Liu, Hongyu Wu, Yingchen Zhang, Rui Yang, Andrey Bernstein
Abstract Due to insufficient measurements in distribution system state estimation (DSSE), full observability and redundant measurements are difficult to achieve without using pseudo measurements. Matrix completion state estimation (MCSE) combines matrix completion with the power system model to estimate voltages by exploiting the low-rank characteristics of the matrix. This paper proposes a robust matrix completion state estimation (RMCSE) method to estimate the voltage in a distribution system under a low-observability condition. The traditional weighted least squares (WLS) state estimation method requires full observability to calculate the states and redundant measurements to perform bad data detection. The proposed method improves the robustness of MCSE to bad data by jointly minimizing the rank of the matrix and the measurement residuals with different weights. It can estimate the system state in a low-observability system and produces robust estimates without a bad data detection process, even in the presence of multiple bad data points. The method is numerically evaluated on the IEEE 33-node radial distribution system. The estimation performance and robustness of RMCSE are compared with those of WLS with largest normalized residual bad data identification (WLS-LNR) and MCSE.
Tasks Matrix Completion
Published 2019-02-06
URL https://arxiv.org/abs/1902.02009v4
PDF https://arxiv.org/pdf/1902.02009v4.pdf
PWC https://paperswithcode.com/paper/robust-matrix-completion-state-estimation-in
Repo
Framework
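
A hedged CVXPY sketch of the kind of objective described above: a nuclear-norm surrogate for rank plus a weighted robust residual, so one corrupted measurement does not spoil the estimate. Sizes, weights, and the L1 residual choice are illustrative assumptions; the paper's exact formulation may differ.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
true = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))  # low rank
mask = (rng.random((8, 8)) < 0.5).astype(float)  # low observability
meas = true + 0.05 * rng.standard_normal((8, 8))
meas[2, 2] += 5.0                                # inject one bad data point

X = cp.Variable((8, 8))
# Nuclear norm stands in for rank; an L1 residual keeps the fit robust
# to the bad measurement without a separate bad data detection step.
residual = cp.sum(cp.abs(cp.multiply(mask, X - meas)))
problem = cp.Problem(cp.Minimize(cp.normNuc(X) + 0.1 * residual))
problem.solve()
print(np.abs(X.value - true).max())
```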

Music Style Classification with Compared Methods in XGB and BPNN

Title Music Style Classification with Compared Methods in XGB and BPNN
Authors Lifeng Tan, Cong Jin, Zhiyuan Cheng, Xin Lv, Leiyu Song
Abstract Scientists have applied many different classification methods to the problem of music classification, with varying efficiency. In this paper, we compare two methods on the task of music style classification. More specifically, feature extraction for representing timbral texture, rhythmic content and pitch content is proposed. Comparative evaluations of the performance of the two classifiers were conducted for music classification across different styles. The results show that XGB is better suited to small datasets than BPNN.
Tasks Music Classification
Published 2019-12-03
URL https://arxiv.org/abs/1912.01203v1
PDF https://arxiv.org/pdf/1912.01203v1.pdf
PWC https://paperswithcode.com/paper/music-style-classification-with-compared
Repo
Framework
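
A minimal sketch of such a comparison using the xgboost and scikit-learn packages; the random features below are placeholders for the timbral, rhythmic and pitch features the paper extracts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier  # a simple BP neural network
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 30))   # placeholder audio features
y = rng.integers(0, 4, 300)          # placeholder style labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (XGBClassifier(n_estimators=100), MLPClassifier(max_iter=500)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))
```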

On the complexity of logistic regression models

Title On the complexity of logistic regression models
Authors Nicola Bulso, Matteo Marsili, Yasser Roudi
Abstract We investigate the complexity of logistic regression models, defined by counting the number of indistinguishable distributions that the model can represent (Balasubramanian, 1997). We find that the complexity of logistic models with binary inputs depends not only on the number of parameters but also, in a non-trivial way, on the distribution of inputs, which standard treatments of complexity do not address. In particular, we observe that correlations among inputs induce effective dependencies among parameters, thus constraining the model and, consequently, reducing its complexity. We derive simple relations for the upper and lower bounds of the complexity. Furthermore, we show analytically that defining the model parameters on a finite support rather than the entire axis decreases the complexity in a manner that critically depends on the size of the domain. Based on our findings, we propose a novel model selection criterion which takes into account the entropy of the input distribution. We test our proposal on the problem of selecting the input variables of a logistic regression model in a Bayesian model selection framework. In our numerical tests, we find that, while the reconstruction errors of standard model selection approaches (AIC, BIC, $\ell_1$ regularization) strongly depend on the sparsity of the ground truth, the reconstruction error of our method is always close to the minimum across all conditions of sparsity, data size and strength of input correlations. Finally, we observe that, when considering categorical instead of binary inputs, in a simple and mathematically tractable case, the contribution of the alphabet size to the complexity is very small compared to that of the parameter space dimension. We further explore the issue by analysing the dataset of the “13 keys to the White House”, a method for forecasting the outcomes of US presidential elections.
Tasks Model Selection
Published 2019-03-01
URL http://arxiv.org/abs/1903.00386v1
PDF http://arxiv.org/pdf/1903.00386v1.pdf
PWC https://paperswithcode.com/paper/on-the-complexity-of-logistic-regression
Repo
Framework
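
For orientation, the standard asymptotic form of the complexity in Balasubramanian (1997), reproduced here from the general literature rather than from this paper, is

$$\mathrm{COMP} \;=\; \frac{d}{2}\,\ln\frac{N}{2\pi} \;+\; \ln \int d\theta \,\sqrt{\det g(\theta)},$$

where $d$ is the number of parameters, $N$ the number of data points, and $g(\theta)$ the Fisher information metric. The abstract's finding can be read as input correlations shrinking the effective volume $\int \sqrt{\det g(\theta)}\,d\theta$, and hence the complexity.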

Tutorial on Implied Posterior Probability for SVMs

Title Tutorial on Implied Posterior Probability for SVMs
Authors Georgi Nalbantov, Svetoslav Ivanov
Abstract The implied posterior probability of a given model (say, Support Vector Machines (SVM)) at a point $\bf{x}$ is an estimate of the class posterior probability pertaining to the class of functions of the model applied to a given dataset. It can be regarded as a score (or estimate) for the true posterior probability, which can then be calibrated, i.e., mapped onto the posterior probability implied by the underlying functions that generated the data rather than by the model. In this tutorial we discuss how to compute implied posterior probabilities of SVMs for the binary classification case, as well as how to calibrate them via the standard method of isotonic regression.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1910.00062v1
PDF https://arxiv.org/pdf/1910.00062v1.pdf
PWC https://paperswithcode.com/paper/tutorial-on-implied-posterior-probability-for
Repo
Framework
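
A short sketch of the calibration step with scikit-learn: SVM decision scores act as the implied estimates, and isotonic regression maps them to probabilities. The dataset, kernel, and split are illustrative, not from the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
svm = SVC(kernel="rbf").fit(X[:200], y[:200])

# Decision scores are monotone in the implied posterior probability;
# isotonic regression calibrates them on held-out labelled data.
scores = svm.decision_function(X[200:])
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y[200:])
probs = iso.predict(scores)  # calibrated posterior probability estimates
```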

RandNet: deep learning with compressed measurements of images

Title RandNet: deep learning with compressed measurements of images
Authors Thomas Chang, Bahareh Tolooshams, Demba Ba
Abstract Principal component analysis, dictionary learning, and auto-encoders are all unsupervised methods for learning representations from a large amount of training data. In all these methods, the higher the dimensions of the input data, the longer it takes to learn. We introduce a class of neural networks, termed RandNet, for learning representations using compressed random measurements of data of interest, such as images. RandNet extends the convolutional recurrent sparse auto-encoder architecture to dense networks and, more importantly, to the case when the input data are compressed random measurements of the original data. Compressing the input data makes it possible to fit a larger number of batches in memory during training. Moreover, in the case of sparse measurements, training is computationally more efficient. We demonstrate that, in unsupervised settings, RandNet performs dictionary learning using compressed data. In supervised settings, we show that RandNet can classify MNIST images with minimal loss in accuracy, despite being trained with random projections of the images that result in a 50% reduction in size. Overall, our results provide a general principled framework for training neural networks using compressed data.
Tasks Dictionary Learning
Published 2019-08-25
URL https://arxiv.org/abs/1908.09258v1
PDF https://arxiv.org/pdf/1908.09258v1.pdf
PWC https://paperswithcode.com/paper/randnet-deep-learning-with-compressed
Repo
Framework
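
A sketch of the compressed-measurement idea on a small stand-in for MNIST: project images to half their original dimension with a random matrix, then train a classifier on the projections. The pipeline below uses plain logistic regression rather than the RandNet architecture, purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # 8x8 digit images, 64 features each
rng = np.random.default_rng(0)
P = rng.standard_normal((X.shape[1], X.shape[1] // 2))  # 50% size reduction

Z = X @ P  # compressed random measurements of the images
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(Z_tr, y_tr)
print(clf.score(Z_te, y_te))  # typically close to accuracy on raw pixels
```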

Natural Language Generation for Non-Expert Users

Title Natural Language Generation for Non-Expert Users
Authors Van Duc Nguyen, Tran Cao Son, Enrico Pontelli
Abstract Motivated by the difficulty of presenting computational results to users who are not proficient in computer programming and/or the logical representation of the results, especially when the results are a collection of atoms in a logical language, we propose a system for the automatic generation of natural language descriptions for applications targeting mainstream users. Unlike many earlier systems with the same aim, the proposed system does not employ templates for the generation task. It assumes that some natural language sentences exist in the application domain and uses this repository for the natural language description. It does not, however, require a large corpus, as is often required in machine learning approaches. The system consists of two main components. The first analyzes the given sentences and constructs a Grammatical Framework (GF) grammar for them; it is implemented using the Stanford parser and an answer set program. The second component constructs sentences and relies on the GF library. The paper includes two use cases to demonstrate the capability of the system. As the sentence construction is done via GF, the paper includes a use case evaluation showing that the proposed system could also be utilized in addressing the challenge of creating an abstract Wikipedia, recently discussed at the BlueSky session of the 2018 International Semantic Web Conference.
Tasks Text Generation
Published 2019-09-18
URL https://arxiv.org/abs/1909.08250v1
PDF https://arxiv.org/pdf/1909.08250v1.pdf
PWC https://paperswithcode.com/paper/natural-language-generation-for-non-expert
Repo
Framework

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Title Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Authors Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
Abstract Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. We find that the most important and confident heads play consistent and often linguistically-interpretable roles. When pruning heads using a method based on stochastic gates and a differentiable relaxation of the L0 penalty, we observe that specialized heads are last to be pruned. Our novel pruning method removes the vast majority of heads without seriously affecting performance. For example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads results in a drop of only 0.15 BLEU.
Tasks Machine Translation
Published 2019-05-23
URL https://arxiv.org/abs/1905.09418v2
PDF https://arxiv.org/pdf/1905.09418v2.pdf
PWC https://paperswithcode.com/paper/analyzing-multi-head-self-attention
Repo
Framework
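
A sketch of a stochastic gate with a differentiable relaxation of the L0 penalty, one gate per attention head. This follows the hard-concrete construction of Louizos et al. (2018), which the pruning described above builds on; the hyperparameters are the commonly used ones, not necessarily the paper's.

```python
import math
import torch

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Sample a gate in [0, 1]; gradients flow through the relaxation."""
    u = torch.rand_like(log_alpha)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    return torch.clamp(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Expected number of non-zero gates: the penalty added to the loss."""
    return torch.sigmoid(log_alpha - beta * math.log(-gamma / zeta)).sum()

log_alpha = torch.zeros(48, requires_grad=True)  # one gate per encoder head
gates = hard_concrete_gate(log_alpha)  # multiplies each head's output
penalty = expected_l0(log_alpha)       # pushes unneeded heads' gates to 0
```

Training with the penalty drives most gates to exactly zero, and the heads whose gates survive longest are the specialized ones the abstract highlights.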

Neural Sequential Phrase Grounding (SeqGROUND)

Title Neural Sequential Phrase Grounding (SeqGROUND)
Authors Pelin Dogan, Leonid Sigal, Markus Gross
Abstract We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with so-far grounded phrase-region pairs. These LSTM stacks collectively capture context for grounding of the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture.
Tasks Phrase Grounding
Published 2019-03-18
URL http://arxiv.org/abs/1903.07669v1
PDF http://arxiv.org/pdf/1903.07669v1.pdf
PWC https://paperswithcode.com/paper/neural-sequential-phrase-grounding-seqground
Repo
Framework
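
A rough PyTorch sketch of the sequential-grounding idea: encode regions and phrases with two LSTMs and score phrase-region pairs in sentence order, so earlier groundings provide context for later ones. All dimensions and the bilinear scorer are assumptions for illustration, not the SeqGROUND architecture.

```python
import torch
import torch.nn as nn

# Two LSTM "stacks": one over region proposals, one over phrases.
region_lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
phrase_lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
score = nn.Bilinear(64, 64, 1)  # phrase-region matching score

regions = torch.randn(1, 10, 32)  # features of 10 region proposals
phrases = torch.randn(1, 5, 32)   # 5 phrase encodings, in sentence order

r_enc, _ = region_lstm(regions)
p_enc, _ = phrase_lstm(phrases)
for t in range(p_enc.size(1)):
    # Context from earlier phrases is carried in the LSTM state; the same
    # region may win for several phrases (many-to-many matching).
    p_t = p_enc[:, t:t + 1].expand(-1, 10, -1).contiguous()
    print(score(p_t, r_enc.contiguous()).squeeze(-1).softmax(dim=-1))
```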

Transfer Entropy: where Shannon meets Turing

Title Transfer Entropy: where Shannon meets Turing
Authors David Sigtermans
Abstract Transfer entropy is capable of capturing nonlinear source-destination relations between multivariate time series. It is a measure of association between source data that are transformed into destination data via a set of linear transformations between their probability mass functions. The resulting tensor formalism is used to show that in specific cases, e.g., when the system consists of three stochastic processes, bivariate analysis suffices to distinguish true relations from false relations. This allows us to determine the causal structure as far as it is encoded in the probability mass functions of the noisy data. The tensor formalism is also used to derive the Data Processing Inequality for transfer entropy.
Tasks Time Series
Published 2019-04-19
URL https://arxiv.org/abs/1904.09163v3
PDF https://arxiv.org/pdf/1904.09163v3.pdf
PWC https://paperswithcode.com/paper/transfer-entropy-where-shannon-meets-turing
Repo
Framework
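
For orientation, the standard definition of transfer entropy from a source process $X$ to a destination process $Y$ (with history length one; this is the common textbook form, not necessarily the paper's notation) is

$$T_{X \to Y} \;=\; \sum_{y_{t+1},\, y_t,\, x_t} p(y_{t+1}, y_t, x_t)\, \log \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)},$$

i.e., the reduction in uncertainty about $Y$'s next value obtained from $X$'s past beyond what $Y$'s own past already provides.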