Paper Group ANR 1628
Disentangled Skill Embeddings for Reinforcement Learning. A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics. Quantifying the Semantic Core of Gender Systems. Computing Valid p-values for Image Segmentation by Selective Inference. Streetscape augmentation using gener …
Disentangled Skill Embeddings for Reinforcement Learning
Title | Disentangled Skill Embeddings for Reinforcement Learning |
Authors | Janith C. Petangoda, Sergio Pascual-Diaz, Vincent Adam, Peter Vrancx, Jordi Grau-Moya |
Abstract | We propose a novel framework for multi-task reinforcement learning (MTRL). Using a variational inference formulation, we learn policies that generalize across both changing dynamics and goals. The resulting policies are parametrized by shared parameters that allow for transfer between different dynamics and goal conditions, and by task-specific latent-space embeddings that allow for specialization to particular tasks. We show how the latent spaces enable generalization to unseen dynamics and goal conditions. Additionally, policies equipped with such embeddings serve as a space of skills (or options) for hierarchical reinforcement learning. Since we can change task dynamics and goals independently, we name our framework Disentangled Skill Embeddings (DSE). |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09223v1 |
PDF | https://arxiv.org/pdf/1906.09223v1.pdf |
PWC | https://paperswithcode.com/paper/disentangled-skill-embeddings-for |
Repo | |
Framework | |
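To make the parametrization described in the abstract concrete, here is a minimal sketch of a policy conditioned on a per-task Gaussian latent embedding learned by variational inference; the names, sizes, and network architecture are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a DSE-style policy: shared parameters plus a
# task-specific latent embedding (all sizes are illustrative).
import torch
import torch.nn as nn

class EmbeddingConditionedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_tasks, latent_dim=8):
        super().__init__()
        # One Gaussian variational posterior q(z | task) per task.
        self.mu = nn.Parameter(torch.zeros(n_tasks, latent_dim))
        self.log_std = nn.Parameter(torch.zeros(n_tasks, latent_dim))
        # Shared policy network consumes observation and sampled embedding.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, task_id):
        # Reparametrised sample of the task embedding.
        z = self.mu[task_id] + self.log_std[task_id].exp() * torch.randn_like(self.mu[task_id])
        return self.net(torch.cat([obs, z], dim=-1))

policy = EmbeddingConditionedPolicy(obs_dim=4, act_dim=2, n_tasks=3)
action = policy(torch.randn(4), task_id=0)
```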
A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics
Title | A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics |
Authors | Anirudha Majumdar, Georgina Hall, Amir Ali Ahmadi |
Abstract | Historically, scalability has been a major challenge to the successful application of semidefinite programming in fields such as machine learning, control, and robotics. In this paper, we survey recent approaches for addressing this challenge including (i) approaches for exploiting structure (e.g., sparsity and symmetry) in a problem, (ii) approaches that produce low-rank approximate solutions to semidefinite programs, (iii) more scalable algorithms that rely on augmented Lagrangian techniques and the alternating direction method of multipliers, and (iv) approaches that trade off scalability with conservatism (e.g., by approximating semidefinite programs with linear and second-order cone programs). For each class of approaches we provide a high-level exposition, an entry-point to the corresponding literature, and examples drawn from machine learning, control, or robotics. We also present a list of software packages that implement many of the techniques discussed in the paper. Our hope is that this paper will serve as a gateway to the rich and exciting literature on scalable semidefinite programming for both theorists and practitioners. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05209v3 |
PDF | https://arxiv.org/pdf/1908.05209v3.pdf |
PWC | https://paperswithcode.com/paper/a-survey-of-recent-scalability-improvements |
Repo | |
Framework | |
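For readers new to the problem class the survey addresses, the following is a minimal semidefinite program in CVXPY; the cost and constraint matrices here are random placeholders, not an example from the paper.

```python
# A minimal SDP: minimise a linear cost over the PSD cone subject to
# one affine constraint. Problem data is random, purely for illustration.
import cvxpy as cp
import numpy as np

n = 4
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n)); C = (C + C.T) / 2  # symmetric cost
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(A @ X) == 1]  # X PSD, one affine constraint
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print("optimal value:", prob.value)
```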
Quantifying the Semantic Core of Gender Systems
Title | Quantifying the Semantic Core of Gender Systems |
Authors | Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach |
Abstract | Many of the world’s languages employ grammatical gender on the lexeme. For example, in Spanish, the word for ‘house’ (casa) is feminine, whereas the word for ‘paper’ (papel) is masculine. To a speaker of a genderless language, this assignment seems to exist with neither rhyme nor reason. But is the assignment of inanimate nouns to grammatical genders truly arbitrary? We present the first large-scale investigation of the arbitrariness of noun-gender assignments. To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics. We find that 18 languages exhibit a significant correlation between grammatical gender and lexical semantics. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13497v1 |
PDF | https://arxiv.org/pdf/1910.13497v1.pdf |
PWC | https://paperswithcode.com/paper/quantifying-the-semantic-core-of-gender |
Repo | |
Framework | |
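The study's core tool is canonical correlation analysis between gender assignments and grounded lexical semantics; the sketch below reproduces that setup on synthetic data, with random vectors standing in for the paper's real embeddings and gender lexicons.

```python
# Hedged sketch: CCA between noun semantics (embeddings) and
# grammatical gender (one-hot). Data is synthetic, not the study's.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_nouns, emb_dim, n_genders = 500, 50, 3
semantics = rng.standard_normal((n_nouns, emb_dim))              # stand-in embeddings
gender = np.eye(n_genders)[rng.integers(0, n_genders, n_nouns)]  # one-hot gender

cca = CCA(n_components=2)
U, V = cca.fit_transform(semantics, gender)
# Correlation of the first canonical pair; a real analysis would
# assess significance, e.g. with a permutation test.
print(np.corrcoef(U[:, 0], V[:, 0])[0, 1])
```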
Computing Valid p-values for Image Segmentation by Selective Inference
Title | Computing Valid p-values for Image Segmentation by Selective Inference |
Authors | Kosuke Tanizaki, Noriaki Hashimoto, Yu Inatsu, Hidekata Hontani, Ichiro Takeuchi |
Abstract | Image segmentation is one of the most fundamental tasks of computer vision. In many practical applications, it is essential to properly evaluate the reliability of individual segmentation results. In this study, we propose a novel framework to provide the statistical significance of segmentation results in the form of p-values. Specifically, we consider a statistical hypothesis test for determining the difference between the object and the background regions. This problem is challenging because the difference can be deceptively large (called segmentation bias) due to the adaptation of the segmentation algorithm to the data. To overcome this difficulty, we introduce a statistical approach called selective inference, and develop a framework to compute valid p-values in which the segmentation bias is properly accounted for. Although the proposed framework is potentially applicable to various segmentation algorithms, we focus in this paper on graph cut-based and threshold-based segmentation algorithms, and develop two specific methods to compute valid p-values for the segmentation results obtained by these algorithms. We prove the theoretical validity of these two methods and demonstrate their practicality by applying them to segmentation problems for medical images. |
Tasks | Semantic Segmentation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00629v2 |
PDF | https://arxiv.org/pdf/1906.00629v2.pdf |
PWC | https://paperswithcode.com/paper/190600629 |
Repo | |
Framework | |
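The "segmentation bias" the paper corrects can be seen in a few lines: threshold pure noise into "object" and "background" and a naive two-sample test reports spurious significance. The toy sketch below only illustrates the problem, not the paper's selective-inference computation.

```python
# Thresholding pure-noise pixels into object/background and then
# testing the resulting difference naively inflates significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = []
for _ in range(1000):
    pixels = rng.standard_normal(200)                      # no real object anywhere
    obj, bg = pixels[pixels > 0.5], pixels[pixels <= 0.5]  # threshold "segmentation"
    pvals.append(stats.ttest_ind(obj, bg).pvalue)
# Far more than 5% of naive p-values fall below 0.05; selective
# inference conditions on the selection event to restore validity.
print("false positive rate:", np.mean(np.array(pvals) < 0.05))
```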
Streetscape augmentation using generative adversarial networks: insights related to health and wellbeing
Title | Streetscape augmentation using generative adversarial networks: insights related to health and wellbeing |
Authors | Jasper S. Wijnands, Kerry A. Nice, Jason Thompson, Haifeng Zhao, Mark Stevenson |
Abstract | Deep learning using neural networks has provided advances in image style transfer, merging the content of one image (e.g., a photo) with the style of another (e.g., a painting). Our research shows this concept can be extended to analyse the design of streetscapes in relation to health and wellbeing outcomes. An Australian population health survey (n=34,000) was used to identify the spatial distribution of health and wellbeing outcomes, including general health and social capital. For each outcome, the most and least desirable locations formed two domains. Streetscape design was sampled using around 80,000 Google Street View images per domain. Generative adversarial networks translated these images from one domain to the other, preserving the main structure of the input image, but transforming the ‘style’ from locations where self-reported health was bad to locations where it was good. These translations indicate that areas in Melbourne with good general health are characterised by sufficient green space and compactness of the urban environment, whilst streetscape imagery related to high social capital contained more and wider footpaths, fewer fences and more grass. Beyond identifying relationships, the method is a first step towards computer-generated design interventions that have the potential to improve population health and wellbeing. |
Tasks | Style Transfer |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.06464v1 |
PDF | https://arxiv.org/pdf/1905.06464v1.pdf |
PWC | https://paperswithcode.com/paper/streetscape-augmentation-using-generative |
Repo | |
Framework | |
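The domain translation described above is an unpaired image-to-image setup; a hedged sketch of the cycle-consistency term that preserves the input's main structure follows, with toy single-layer generators standing in for the paper's networks.

```python
# CycleGAN-style cycle-consistency loss: translating to the other
# domain and back should recover the input (generators are toy stand-ins).
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, 3, padding=1)      # "low-health" -> "high-health" streetscape
F_inv = nn.Conv2d(3, 3, 3, padding=1)  # inverse translation

x = torch.randn(1, 3, 64, 64)          # a street-view image tensor
cycle_loss = nn.functional.l1_loss(F_inv(G(x)), x)  # content preservation
cycle_loss.backward()                  # trained jointly with adversarial losses
```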
Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization
Title | Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization |
Authors | Hung Tran-The, Sunil Gupta, Santu Rana, Svetha Venkatesh |
Abstract | Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on the function structure. The main reason is that at each iteration, BO requires the global maximisation of an acquisition function, which is itself a non-convex optimisation problem in the original search space. With growing dimension, the computational budget for this maximisation becomes increasingly short, leading to inaccurate solutions. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low-dimensional subspaces embedded in the original high-dimensional search space. Unlike many recent high-dimensional BO methods, our method is free of any low-dimensional structure assumption on the function. Optimising the acquisition function in low-dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm grows only sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate against the number of subspaces used in the optimisation. Finally, when the number of subspaces is “sufficiently large”, our algorithm’s cumulative regret is at most $\mathcal{O}^*(\sqrt{T\gamma_T})$, as opposed to $\mathcal{O}^*(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), removing a crucial factor of $\sqrt{D}$, where $D$ is the dimension of the input space. |
Tasks | Bayesian Optimisation |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.11950v1 |
PDF | https://arxiv.org/pdf/1911.11950v1.pdf |
PWC | https://paperswithcode.com/paper/trading-convergence-rate-with-computational |
Repo | |
Framework | |
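A minimal sketch of the central idea, maximising a GP-UCB acquisition only on random low-dimensional subspaces, is shown below; the GP model, acquisition constant, subspace count, and toy objective are illustrative assumptions.

```python
# Acquisition maximisation restricted to random d-dim subspaces of R^D.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
D, d, n_subspaces = 20, 2, 5
X_obs = rng.uniform(-1, 1, (10, D))
y_obs = -np.sum(X_obs**2, axis=1)             # toy objective values
gp = GaussianProcessRegressor().fit(X_obs, y_obs)

best_x, best_ucb = None, -np.inf
for _ in range(n_subspaces):
    A = rng.standard_normal((D, d))           # random embedding R^d -> R^D
    Z = rng.uniform(-1, 1, (200, d))          # cheap search in d dims only
    X_cand = np.clip(Z @ A.T, -1, 1)
    mu, sigma = gp.predict(X_cand, return_std=True)
    ucb = mu + 2.0 * sigma                    # GP-UCB acquisition
    i = np.argmax(ucb)
    if ucb[i] > best_ucb:
        best_x, best_ucb = X_cand[i], ucb[i]
# best_x is the next point to evaluate on the true objective.
```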
Robust Matrix Completion State Estimation in Distribution Systems
Title | Robust Matrix Completion State Estimation in Distribution Systems |
Authors | Bo Liu, Hongyu Wu, Yingchen Zhang, Rui Yang, Andrey Bernstein |
Abstract | Due to insufficient measurements in distribution system state estimation (DSSE), full observability and redundant measurements are difficult to achieve without using pseudo measurements. The matrix completion state estimation (MCSE) combines matrix completion with the power system model to estimate voltages by exploiting the low-rank characteristics of the matrix. This paper proposes a robust matrix completion state estimation (RMCSE) to estimate the voltage in a distribution system under a low-observability condition. The traditional weighted least squares (WLS) state estimation method requires full observability to calculate the states and redundant measurements to perform bad data detection. The proposed method improves the robustness of MCSE to bad data by minimizing the rank of the matrix and the measurement residuals with different weights. It can estimate the system state in a low-observability system and produces robust estimates, without a bad data detection process, in the face of multiple bad data points. The method is numerically evaluated on the IEEE 33-node radial distribution system. The estimation performance and robustness of RMCSE are compared with WLS with largest-normalized-residual bad data identification (WLS-LNR) and with MCSE. |
Tasks | Matrix Completion |
Published | 2019-02-06 |
URL | https://arxiv.org/abs/1902.02009v4 |
PDF | https://arxiv.org/pdf/1902.02009v4.pdf |
PWC | https://paperswithcode.com/paper/robust-matrix-completion-state-estimation-in |
Repo | |
Framework | |
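A convex sketch of the robust objective, nuclear norm as the rank surrogate plus an L1-penalised residual that tolerates gross bad data, could look as follows in CVXPY; the weight and problem sizes are placeholders, not the paper's tuned values.

```python
# Robust matrix completion: min ||X||_* + lambda * ||mask ∘ (X - M_obs)||_1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 8
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))  # low-rank truth
mask = (rng.random((n, m)) < 0.5).astype(float)  # observed entries
M_obs = M.copy()
M_obs[3, 2] += 10.0                              # one gross "bad data" point

X = cp.Variable((n, m))
residual = cp.multiply(mask, X - M_obs)
prob = cp.Problem(cp.Minimize(cp.normNuc(X) + 5.0 * cp.norm1(residual)))
prob.solve()
```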
Music Style Classification with Compared Methods in XGB and BPNN
Title | Music Style Classification with Compared Methods in XGB and BPNN |
Authors | Lifeng Tan, Cong Jin, Zhiyuan Cheng, Xin Lv, Leiyu Song |
Abstract | Scientists have used many different classification methods to solve the problem of music classification, but the efficiency of each method differs. In this paper, we compare two methods on the task of music style classification. More specifically, we propose feature extraction for representing timbral texture, rhythmic content, and pitch content. Comparative evaluations of the two classifiers’ performance were conducted for music classification across different styles. The results show that XGB is better suited to small datasets than BPNN. |
Tasks | Music Classification |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01203v1 |
PDF | https://arxiv.org/pdf/1912.01203v1.pdf |
PWC | https://paperswithcode.com/paper/music-style-classification-with-compared |
Repo | |
Framework | |
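A miniature version of the comparison, assuming the xgboost package is installed, with XGBClassifier standing in for XGB and scikit-learn's MLPClassifier for the BPNN; the features and labels below are synthetic stand-ins for timbre/rhythm/pitch features.

```python
# Side-by-side accuracy of a gradient-boosted tree model vs an MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 30))   # timbre/rhythm/pitch feature vectors (toy)
y = rng.integers(0, 4, 300)          # four music styles (toy labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (XGBClassifier(), MLPClassifier(max_iter=500)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```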
On the complexity of logistic regression models
Title | On the complexity of logistic regression models |
Authors | Nicola Bulso, Matteo Marsili, Yasser Roudi |
Abstract | We investigate the complexity of logistic regression models, defined by counting the number of indistinguishable distributions that the model can represent (Balasubramanian, 1997). We find that the complexity of logistic models with binary inputs depends not only on the number of parameters but also on the distribution of inputs, in a non-trivial way which standard treatments of complexity do not address. In particular, we observe that correlations among inputs induce effective dependencies among parameters, thus constraining the model and, consequently, reducing its complexity. We derive simple relations for the upper and lower bounds of the complexity. Furthermore, we show analytically that defining the model parameters on a finite support rather than on the entire axis decreases the complexity in a manner that critically depends on the size of the domain. Based on our findings, we propose a novel model selection criterion which takes into account the entropy of the input distribution. We test our proposal on the problem of selecting the input variables of a logistic regression model in a Bayesian Model Selection framework. In our numerical tests, we find that, while the reconstruction errors of standard model selection approaches (AIC, BIC, $\ell_1$ regularization) strongly depend on the sparsity of the ground truth, the reconstruction error of our method is always close to the minimum in all conditions of sparsity, data size and strength of input correlations. Finally, we observe that, when considering categorical instead of binary inputs, in a simple and mathematically tractable case, the contribution of the alphabet size to the complexity is very small compared to that of the parameter space dimension. We further explore the issue by analysing the dataset of the “13 keys to the White House”, which is a method for forecasting the outcomes of US presidential elections. |
Tasks | Model Selection |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00386v1 |
PDF | http://arxiv.org/pdf/1903.00386v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-logistic-regression |
Repo | |
Framework | |
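For reference, the standard complexity-penalised criteria the paper benchmarks against (AIC and BIC) can be computed directly from the fitted likelihood, as sketched below on synthetic binary inputs; the paper's proposed criterion additionally involves the entropy of the input distribution, which this sketch does not implement.

```python
# AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L for a fitted logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, (200, 5)).astype(float)   # binary inputs
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[np.arange(len(y)), y]  # probability of true class
log_lik = np.sum(np.log(p))
k, n = X.shape[1] + 1, len(y)                     # parameters incl. intercept
print("AIC:", 2 * k - 2 * log_lik, "BIC:", k * np.log(n) - 2 * log_lik)
```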
Tutorial on Implied Posterior Probability for SVMs
Title | Tutorial on Implied Posterior Probability for SVMs |
Authors | Georgi Nalbantov, Svetoslav Ivanov |
Abstract | The implied posterior probability of a given model (say, a Support Vector Machine (SVM)) at a point $\mathbf{x}$ is an estimate of the class posterior probability pertaining to the class of functions of the model applied to a given dataset. It can be regarded as a score (or estimate) for the true posterior probability, which can then be calibrated/mapped onto the posterior probability implied by the underlying functions that generated the data (rather than by the model). In this tutorial we discuss how to compute implied posterior probabilities of SVMs for the binary classification case, as well as how to calibrate them via the standard method of isotonic regression. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1910.00062v1 |
PDF | https://arxiv.org/pdf/1910.00062v1.pdf |
PWC | https://paperswithcode.com/paper/tutorial-on-implied-posterior-probability-for |
Repo | |
Framework | |
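The calibration step of the tutorial maps raw SVM scores to posterior probabilities with isotonic regression; scikit-learn ships this combination, shown below on synthetic data.

```python
# Isotonic calibration of SVM decision scores into class posteriors.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
svm = LinearSVC()                     # raw decision scores, no probabilities
calibrated = CalibratedClassifierCV(svm, method="isotonic", cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))  # calibrated class posteriors
```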
RandNet: deep learning with compressed measurements of images
Title | RandNet: deep learning with compressed measurements of images |
Authors | Thomas Chang, Bahareh Tolooshams, Demba Ba |
Abstract | Principal component analysis, dictionary learning, and auto-encoders are all unsupervised methods for learning representations from a large amount of training data. In all these methods, the higher the dimensions of the input data, the longer it takes to learn. We introduce a class of neural networks, termed RandNet, for learning representations using compressed random measurements of data of interest, such as images. RandNet extends the convolutional recurrent sparse auto-encoder architecture to dense networks and, more importantly, to the case when the input data are compressed random measurements of the original data. Compressing the input data makes it possible to fit a larger number of batches in memory during training. Moreover, in the case of sparse measurements, training is more efficient computationally. We demonstrate that, in unsupervised settings, RandNet performs dictionary learning using compressed data. In supervised settings, we show that RandNet can classify MNIST images with minimal loss in accuracy, despite being trained with random projections of the images that result in a 50% reduction in size. Overall, our results provide a general principled framework for training neural networks using compressed data. |
Tasks | Dictionary Learning |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09258v1 |
PDF | https://arxiv.org/pdf/1908.09258v1.pdf |
PWC | https://paperswithcode.com/paper/randnet-deep-learning-with-compressed |
Repo | |
Framework | |
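RandNet's core trick, training on compressed random measurements rather than the raw inputs, shrinks to a few lines; the data, projection, and classifier below are toy stand-ins (on real MNIST the paper reports minimal accuracy loss at 50% compression).

```python
# Train a classifier on random projections at 50% of the input dimension.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 784))            # MNIST-sized inputs (toy data)
y = rng.integers(0, 10, 1000)                   # toy labels
Phi = rng.standard_normal((784, 392)) / np.sqrt(392)  # 50% compression
X_compressed = X @ Phi                          # random measurements

X_tr, X_te, y_tr, y_te = train_test_split(X_compressed, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```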
Natural Language Generation for Non-Expert Users
Title | Natural Language Generation for Non-Expert Users |
Authors | Van Duc Nguyen, Tran Cao Son, Enrico Pontelli |
Abstract | Motivated by the difficulty of presenting computational results, especially when the results are a collection of atoms in a logical language, to users who are not proficient in computer programming and/or the logical representation of the results, we propose a system for the automatic generation of natural language descriptions for applications targeting mainstream users. Unlike many earlier systems with the same aim, the proposed system does not employ templates for the generation task. It assumes that some natural language sentences exist in the application domain and uses this repository for the natural language description. It does not, however, require a large corpus, as is often required in machine learning approaches. The system consists of two main components. The first analyzes the given sentences and constructs a Grammatical Framework (GF) for them; it is implemented using the Stanford parser and an answer set program. The second component constructs sentences and relies on the GF library. The paper includes two use cases to demonstrate the capability of the system. As sentence construction is done via GF, the paper includes a use-case evaluation showing that the proposed system could also be utilized in addressing the challenge of creating an abstract Wikipedia, recently discussed in the BlueSky session of the 2018 International Semantic Web Conference. |
Tasks | Text Generation |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08250v1 |
PDF | https://arxiv.org/pdf/1909.08250v1.pdf |
PWC | https://paperswithcode.com/paper/natural-language-generation-for-non-expert |
Repo | |
Framework | |
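GF itself is beyond a short snippet, so the toy context-free generator below only illustrates the grammar-driven (non-template) realisation the system relies on; the grammar, symbols, and sentences are invented for illustration and are not the paper's GF grammars.

```python
# Toy grammar-driven sentence realisation (stand-in for GF generation).
import random

grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "robot"], ["the", "user"]],
    "VP": [["selected", "NP"], ["moved"]],
}

def generate(symbol="S", rng=random.Random(0)):
    if symbol not in grammar:          # terminal word
        return [symbol]
    expansion = rng.choice(grammar[symbol])
    return [w for part in expansion for w in generate(part, rng)]

print(" ".join(generate()))
```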
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Title | Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned |
Authors | Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov |
Abstract | Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. We find that the most important and confident heads play consistent and often linguistically-interpretable roles. When pruning heads using a method based on stochastic gates and a differentiable relaxation of the L0 penalty, we observe that specialized heads are last to be pruned. Our novel pruning method removes the vast majority of heads without seriously affecting performance. For example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads results in a drop of only 0.15 BLEU. |
Tasks | Machine Translation |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09418v2 |
PDF | https://arxiv.org/pdf/1905.09418v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-multi-head-self-attention |
Repo | |
Framework | |
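The stochastic gates referenced in the abstract follow the hard-concrete relaxation of a Bernoulli gate with a differentiable L0 penalty (Louizos et al., 2018); a sketch of one gate per attention head is below, with the stretch parameters set to values commonly used in that line of work.

```python
# Hard-concrete gates: nearly-binary multiplicative masks per head,
# plus the expected-L0 penalty that drives unneeded heads to zero.
import torch

log_alpha = torch.zeros(8, requires_grad=True)   # one gate per head
beta, gamma, zeta = 2.0 / 3.0, -0.1, 1.1         # temperature and stretch

u = torch.rand(8).clamp(1e-6, 1 - 1e-6)
s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
gate = torch.clamp(s * (zeta - gamma) + gamma, 0.0, 1.0)  # often exactly 0 or 1

# Expected L0 norm: probability each gate is non-zero, added to the loss.
l0_penalty = torch.sigmoid(log_alpha - beta * torch.log(torch.tensor(-gamma / zeta))).sum()
# Head outputs would be multiplied by `gate` before backprop through
# the task loss plus l0_penalty.
```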
Neural Sequential Phrase Grounding (SeqGROUND)
Title | Neural Sequential Phrase Grounding (SeqGROUND) |
Authors | Pelin Dogan, Leonid Sigal, Markus Gross |
Abstract | We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with so-far grounded phrase-region pairs. These LSTM stacks collectively capture context for grounding of the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture. |
Tasks | Phrase Grounding |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07669v1 |
PDF | http://arxiv.org/pdf/1903.07669v1.pdf |
PWC | https://paperswithcode.com/paper/neural-sequential-phrase-grounding-seqground |
Repo | |
Framework | |
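A schematic of the sequential scoring step, two LSTM encoders producing a context that ranks region proposals for the next phrase, might look as follows; the dimensions, the bilinear scorer, and the data are illustrative assumptions, not the paper's architecture.

```python
# Schematic of sequential grounding: score each region against the
# joint state of the phrase stack and the grounding history.
import torch
import torch.nn as nn

d = 64
phrase_lstm = nn.LSTM(d, d, batch_first=True)        # stack over phrases
history_lstm = nn.LSTM(2 * d, d, batch_first=True)   # grounded phrase-region pairs
scorer = nn.Bilinear(2 * d, d, 1)

phrases = torch.randn(1, 5, d)        # 5 phrase embeddings
regions = torch.randn(10, d)          # 10 region-proposal features
history = torch.randn(1, 2, 2 * d)    # 2 already-grounded pairs

_, (p_state, _) = phrase_lstm(phrases)
_, (h_state, _) = history_lstm(history)
context = torch.cat([p_state[0], h_state[0]], dim=-1).expand(10, -1)
scores = scorer(context, regions).squeeze(-1)    # one score per region
print(scores.argmax().item())                    # region chosen for next phrase
```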
Transfer Entropy: where Shannon meets Turing
Title | Transfer Entropy: where Shannon meets Turing |
Authors | David Sigtermans |
Abstract | Transfer entropy is capable of capturing nonlinear source-destination relations between multi-variate time series. It is a measure of association between source data that are transformed into destination data via a set of linear transformations between their probability mass functions. The resulting tensor formalism is used to show that in specific cases, e.g., in the case the system consists of three stochastic processes, bivariate analysis suffices to distinguish true relations from false relations. This allows us to determine the causal structure as far as encoded in the probability mass functions of noisy data. The tensor formalism was also used to derive the Data Processing Inequality for transfer entropy. |
Tasks | Time Series |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09163v3 |
PDF | https://arxiv.org/pdf/1904.09163v3.pdf |
PWC | https://paperswithcode.com/paper/transfer-entropy-where-shannon-meets-turing |
Repo | |
Framework | |
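Transfer entropy can be estimated directly from the empirical probability mass functions of discretised series; the sketch below is a plain histogram estimator of TE(X → Y), not the paper's tensor formalism.

```python
# TE(X -> Y) = sum p(y', y, x) * log2( p(y' | y, x) / p(y' | y) ).
import numpy as np

def transfer_entropy(x, y, bins=4):
    x = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    y = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    y_next, y_now, x_now = y[1:], y[:-1], x[:-1]
    te = 0.0
    for yn in np.unique(y_next):
        for yc in np.unique(y_now):
            for xc in np.unique(x_now):
                p_joint = np.mean((y_next == yn) & (y_now == yc) & (x_now == xc))
                if p_joint == 0:
                    continue
                p_cond_full = p_joint / np.mean((y_now == yc) & (x_now == xc))
                p_cond_self = (np.mean((y_next == yn) & (y_now == yc))
                               / np.mean(y_now == yc))
                te += p_joint * np.log2(p_cond_full / p_cond_self)
    return te

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(5000)  # y driven by past x
print(transfer_entropy(x, y))   # positive; near zero in the reverse direction
```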