Paper Group NANR 54
Theory and Evaluation Metrics for Learning Disentangled Representations
Title | Theory and Evaluation Metrics for Learning Disentangled Representations |
Authors | Anonymous |
Abstract | We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. First, we characterize the concept of “disentangled representations” used in supervised and unsupervised methods along three dimensions (informativeness, separability and interpretability) which can be expressed and quantified explicitly using information-theoretic constructs. This helps explain the behaviors of several well-known disentanglement learning models. We then propose robust metrics for measuring informativeness, separability and interpretability. Through a comprehensive suite of experiments, we show that our metrics correctly characterize the representations learned by different methods and are consistent with qualitative (visual) results. Thus, the metrics allow disentanglement learning methods to be compared on equal footing. We also empirically uncover interesting new properties of VAE-based methods and interpret them with our formulation. These findings are promising and will hopefully encourage the design of more theoretically driven models for learning disentangled representations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgK0h4Ywr |
PDF | https://openreview.net/pdf?id=HJgK0h4Ywr |
PWC | https://paperswithcode.com/paper/theory-and-evaluation-metrics-for-learning-1 |
Repo | |
Framework | |
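The information-theoretic framing above lends itself to a quick illustration. Below is a minimal sketch of an informativeness-style score, assuming discretized latents and toy synthetic factors; the binning and the aggregation into per-factor scores are illustrative assumptions, not the paper's estimators.

```python
# Hedged sketch: mutual-information-based disentanglement scores.
# The binning and aggregation here are illustrative assumptions,
# not the paper's exact estimators.
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_matrix(latents, factors, n_bins=20):
    """I(z_i; y_k) for each latent dimension i and generative factor k."""
    n, d = latents.shape
    _, k = factors.shape
    mi = np.zeros((d, k))
    for i in range(d):
        z_disc = np.digitize(latents[:, i],
                             np.histogram_bin_edges(latents[:, i], n_bins))
        for j in range(k):
            mi[i, j] = mutual_info_score(z_disc, factors[:, j])
    return mi

rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=(1000, 3))          # discrete ground-truth factors
z = y @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(1000, 6))  # toy latents
mi = mi_matrix(z, y)
informativeness = mi.max(axis=0)                # best latent per factor
separability = mi.max(axis=0) - np.sort(mi, axis=0)[-2]  # gap to runner-up latent
print(informativeness, separability)
```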
LSTOD: Latent Spatial-Temporal Origin-Destination prediction model and its applications in ride-sharing platforms
Title | LSTOD: Latent Spatial-Temporal Origin-Destination prediction model and its applications in ride-sharing platforms |
Authors | Anonymous |
Abstract | Origin-Destination (OD) flow data is an important instrument in transportation studies. Precise prediction of customer demand from each origin location to a destination, given a series of previous snapshots, helps ride-sharing platforms better understand their market mechanism. However, most existing prediction methods ignore the network structure of OD flow data and fail to utilize the topological dependencies among related OD pairs. In this paper, we propose a latent spatial-temporal origin-destination (LSTOD) model, with a novel convolutional neural network (CNN) filter to learn the spatial features of OD pairs from a graph perspective and an attention structure to capture their long-term periodicity. Experiments on a real customer request dataset with available OD information from a ride-sharing platform demonstrate the advantage of LSTOD, achieving at least a 6.5% improvement in prediction accuracy over the second-best model. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgZxpVFvH |
PDF | https://openreview.net/pdf?id=BkgZxpVFvH |
PWC | https://paperswithcode.com/paper/lstod-latent-spatial-temporal-origin |
Repo | |
Framework | |
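The two architectural ingredients named in the abstract (a graph-style convolution over OD pairs and attention over past snapshots) can be illustrated with a hedged NumPy sketch; the OD-pair adjacency, feature sizes, and random weights below are all stand-in assumptions, not the paper's architecture.

```python
# Hedged sketch of the two ingredients named in the abstract: a graph
# convolution over OD pairs and attention over past snapshots. The
# adjacency construction and sizes are illustrative assumptions.
import numpy as np

def graph_conv(X, A, W):
    """One graph-convolution step: row-normalized adjacency, then linear map."""
    deg = A.sum(1)
    A_norm = A / np.maximum(deg, 1)[:, None]
    return np.tanh(A_norm @ X @ W)

def temporal_attention(H):
    """Dot-product attention of the latest snapshot over the history."""
    q = H[-1]                                 # (nodes, feat) query: latest step
    scores = np.einsum('nf,tnf->t', q, H)     # similarity to each past snapshot
    w = np.exp(scores - scores.max()); w /= w.sum()
    return np.einsum('t,tnf->nf', w, H)       # history summary per OD pair

n_pairs, feat, T = 50, 8, 12
A = (np.random.rand(n_pairs, n_pairs) < 0.1).astype(float)  # OD pairs sharing a zone
W = np.random.randn(feat, feat) * 0.1
flows = np.random.rand(T, n_pairs, feat)      # past OD-flow snapshots
H = np.stack([graph_conv(x, A, W) for x in flows])
context = temporal_attention(H)               # features for predicting next flows
print(context.shape)                          # (50, 8)
```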
TransINT: Embedding Implication Rules in Knowledge Graphs with Isomorphic Intersections of Linear Subspaces
Title | TransINT: Embedding Implication Rules in Knowledge Graphs with Isomorphic Intersections of Linear Subspaces |
Authors | Anonymous |
Abstract | Knowledge Graphs (KG), composed of entities and relations, provide a structured representation of knowledge. For easy access to statistical approaches on relational data, multiple methods to embed a KG as components of R^d have been introduced. We propose TransINT, a novel and interpretable KG embedding method that isomorphically preserves the implication ordering among relations in the embedding space. TransINT maps sets of entities (tied by a relation) to continuous sets of vectors that are inclusion-ordered isomorphically to relation implications. With a novel parameter-sharing scheme, TransINT enables automatic training on missing but implied facts without rule grounding. We achieve new state-of-the-art performance with significant margins in Link Prediction and Triple Classification on the FB122 dataset, with boosted performance even on test instances that cannot be inferred by logical rules. The angles between the continuous sets embedded by TransINT provide an interpretable way to mine semantic relatedness and implication rules among relations. |
Tasks | Knowledge Graphs, Link Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lxvxBtvr |
PDF | https://openreview.net/pdf?id=r1lxvxBtvr |
PWC | https://paperswithcode.com/paper/transint-embedding-implication-rules-in |
Repo | |
Framework | |
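A hedged sketch of the inclusion ordering that TransINT's guarantee rests on: if each relation's embedded set is the solution set of linear constraints, then one relation implies another exactly when the second set contains the first, which reduces to a rank test. The relation names and constraints below are illustrative, not the paper's learned parameters.

```python
# Hedged sketch of the inclusion ordering TransINT relies on: each relation's
# feasible set is {x : C x = b}; set(r1) is contained in set(r2) iff every
# constraint of r2 is a linear combination of the constraints of r1
# (assuming set(r1) is nonempty).
import numpy as np

def subset_of(C1, b1, C2, b2, tol=1e-8):
    """Is {x: C1 x = b1} contained in {x: C2 x = b2}?"""
    aug1 = np.hstack([C1, b1[:, None]])
    aug2 = np.hstack([C2, b2[:, None]])
    stacked = np.vstack([aug1, aug2])
    # Containment <=> r2's augmented rows add no new constraints beyond r1's.
    return np.linalg.matrix_rank(stacked, tol) == np.linalg.matrix_rank(aug1, tol)

d = 4
# Toy example: "is_father_of" constrains two coordinates, "is_parent_of" one.
C_father = np.eye(d)[:2]; b_father = np.array([1.0, 2.0])
C_parent = np.eye(d)[:1]; b_parent = np.array([1.0])
print(subset_of(C_father, b_father, C_parent, b_parent))  # True: father => parent
print(subset_of(C_parent, b_parent, C_father, b_father))  # False
```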
Unsupervised Model Selection for Variational Disentangled Representation Learning
Title | Unsupervised Model Selection for Variational Disentangled Representation Learning |
Authors | Anonymous |
Abstract | Disentangled representations have recently been shown to improve fairness, data efficiency and generalisation in simple supervised and reinforcement learning tasks. To extend the benefits of disentangled representations to more complex domains and practical applications, it is important to enable hyperparameter tuning and model selection of existing unsupervised approaches without requiring access to ground-truth attribute labels, which are not available for most datasets. This paper addresses this problem by introducing a simple yet robust and reliable method for unsupervised disentangled model selection. We show that our approach performs comparably to the existing supervised alternatives across 5400 models from six state-of-the-art unsupervised disentangled representation learning model classes. Furthermore, we show that the ranking produced by our approach correlates well with the final task performance on two different domains. |
Tasks | Model Selection, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxL2TNtvr |
PDF | https://openreview.net/pdf?id=SyxL2TNtvr |
PWC | https://paperswithcode.com/paper/unsupervised-model-selection-for-variational |
Repo | |
Framework | |
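One way to illustrate the intuition behind label-free model selection: representations from differently-seeded runs of a well-disentangled model tend to agree dimension-for-dimension (up to permutation and sign), while entangled runs do not. The scoring and aggregation below are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of seed-pair agreement scoring: disentangled models trained
# with different seeds tend to match dimension-for-dimension (up to permutation
# and sign), while entangled ones do not. The aggregation is an assumption.
import numpy as np

def pairwise_agreement(z_a, z_b):
    """Agreement score between latent dims of two models on the same data."""
    za = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    zb = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    R = np.abs(za.T @ zb) / len(za)           # |correlation| between dimensions
    # Disentangled pair: each row/column has one dominant entry.
    return 0.5 * (R.max(1).mean() + R.max(0).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 6))
z_seed1 = x[:, ::-1] * np.array([1, -1, 1, -1, 1, -1])  # permuted, sign-flipped
z_seed2 = x
z_entangled = x @ rng.normal(size=(6, 6))               # mixed dimensions
print(pairwise_agreement(z_seed1, z_seed2))      # high: ~1.0
print(pairwise_agreement(z_entangled, z_seed2))  # lower
```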
Optimizing Loss Landscape Connectivity via Neuron Alignment
Title | Optimizing Loss Landscape Connectivity via Neuron Alignment |
Authors | Anonymous |
Abstract | The loss landscapes of deep neural networks are poorly understood due to their high nonconvexity. Empirically, the local optima of these loss functions can be connected by a simple curve in model space, along which the loss remains fairly constant. Yet, current path-finding algorithms do not consider the influence of symmetry in the loss surface caused by weight permutations of the networks corresponding to the minima. We propose a framework to investigate the effect of symmetry on the landscape connectivity by directly optimizing the weight permutations of the networks being connected. By utilizing an existing neuron alignment technique, we derive an initialization for the weight permutations. Empirically, this initialization is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Additionally, we introduce a proximal alternating minimization scheme to address whether an optimal permutation can be learned, with some provable convergence guarantees. We find that the learned parameterized curve is still a low-loss curve after permuting the weights of the endpoint models, for a subset of permutations. We also show that there is a small but steady gain in the performance of ensembles constructed from the learned curve, when considering weight space symmetry. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1erJJrYPH |
PDF | https://openreview.net/pdf?id=B1erJJrYPH |
PWC | https://paperswithcode.com/paper/optimizing-loss-landscape-connectivity-via |
Repo | |
Framework | |
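The "existing neuron alignment technique" can be instantiated, for illustration, as correlation matching of hidden activations solved as an assignment problem; this sketch shows that standard recipe, not the paper's exact code.

```python
# Hedged sketch of a neuron-alignment initialization: match hidden units of two
# networks by activation correlation, solved as an assignment problem. This is
# one standard instantiation of the alignment idea, not the paper's exact code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_neurons(acts_a, acts_b):
    """Permutation of model B's units that best matches model A's units."""
    za = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    zb = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = za.T @ zb / len(za)               # unit-by-unit correlation
    row, col = linear_sum_assignment(-corr)  # maximize total correlation
    return col                               # B's unit col[i] matches A's unit i

rng = np.random.default_rng(0)
h_a = rng.normal(size=(512, 16))             # hidden activations, model A
true_perm = rng.permutation(16)
h_b = h_a[:, true_perm] + 0.05 * rng.normal(size=(512, 16))  # B = permuted A + noise
perm = align_neurons(h_a, h_b)
print(np.array_equal(perm, np.argsort(true_perm)))  # True: recovers the matching
```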
All Simulations Are Not Equal: Simulation Reweighing for Imperfect Information Games
Title | All Simulations Are Not Equal: Simulation Reweighing for Imperfect Information Games |
Authors | Anonymous |
Abstract | Imperfect information games are challenging benchmarks for artificial intelligence systems. Reasoning and planning under uncertainty is key to general AI. Traditionally, large numbers of simulations are used in imperfect information games, and they sometimes perform sub-optimally due to large state and action spaces. In this work, we propose a simulation reweighing mechanism using neural networks. It performs backward verification of previous public actions and assigns proper belief weights to the simulations from the information set of the current observation, using an incomplete state solver network (ISSN). We use simulation reweighing in the playing phase of the game contract bridge, and show that it outperforms previous state-of-the-art Monte Carlo simulation based methods, achieving better play per decision. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlyLgrFvB |
PDF | https://openreview.net/pdf?id=HJlyLgrFvB |
PWC | https://paperswithcode.com/paper/all-simulations-are-not-equal-simulation |
Repo | |
Framework | |
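A hedged sketch of the reweighing step: each sampled hidden state is scored for consistency with the public action history, and simulation outcomes are averaged under the resulting belief weights. The scorer below is a random stand-in for the learned ISSN.

```python
# Hedged sketch of simulation reweighing: score each sampled hidden state by
# how consistent it is with the public action history, then average simulation
# outcomes under those belief weights. The consistency scorer stands in for
# the paper's learned ISSN.
import numpy as np

def reweigh(values, consistency_logits):
    """Belief-weighted value of each candidate action."""
    w = np.exp(consistency_logits - consistency_logits.max())
    w /= w.sum()                              # belief over sampled hidden states
    return (w[:, None] * values).sum(0)       # (n_actions,) weighted values

rng = np.random.default_rng(0)
n_sims, n_actions = 100, 4
values = rng.normal(size=(n_sims, n_actions))  # outcome of each action per sim
# Stand-in ISSN output: log-likelihood of the observed public actions per sim.
logits = rng.normal(size=n_sims)
uniform = values.mean(0)                       # classic Monte Carlo average
weighted = reweigh(values, logits)             # belief-weighted average
print(uniform.argmax(), weighted.argmax())
```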
First-Order Preconditioning via Hypergradient Descent
Title | First-Order Preconditioning via Hypergradient Descent |
Authors | Anonymous |
Abstract | Standard gradient-descent methods are susceptible to a range of issues that can impede training, such as high correlations and different scaling in parameter space. These difficulties can be addressed by second-order approaches that apply a preconditioning matrix to the gradient to improve convergence. Unfortunately, such algorithms typically struggle to scale to high-dimensional problems, in part because the calculation of specific preconditioners such as the inverse Hessian or Fisher information matrix is highly expensive. We introduce first-order preconditioning (FOP), a fast, scalable approach that generalizes previous work on hypergradient descent (Almeida et al., 1998; Maclaurin et al., 2015; Baydin et al., 2017) to learn a preconditioning matrix that only makes use of first-order information. Experiments show that FOP is able to improve the performance of standard deep learning optimizers on several visual classification tasks with minimal computational overhead. We also investigate the properties of the learned preconditioning matrices and perform a preliminary theoretical analysis of the algorithm. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skg3104FDS |
PDF | https://openreview.net/pdf?id=Skg3104FDS |
PWC | https://paperswithcode.com/paper/first-order-preconditioning-via-hypergradient-1 |
Repo | |
Framework | |
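FOP generalizes scalar hypergradient descent (Baydin et al., 2017) to a learned preconditioning matrix. Below is a minimal sketch of the scalar base technique it builds on, using the hypergradient dL(θ_t)/dα = −g_t · g_{t−1} on a toy quadratic; the problem and constants are illustrative, not FOP's parameterization.

```python
# Hedged sketch of the hypergradient idea FOP builds on (Baydin et al., 2017):
# adapt the step size online using dL(theta_t)/dalpha = -g_t . g_{t-1};
# FOP generalizes this scalar to a learned preconditioning matrix using only
# first-order information. Problem and constants here are illustrative.
import numpy as np

def loss_and_grad(theta, H):
    return 0.5 * theta @ H @ theta, H @ theta

d = 10
H = np.diag(np.linspace(1.0, 20.0, d))   # badly scaled quadratic
theta = np.ones(d)
alpha, beta = 1e-3, 1e-6                 # step size and hyper-step size
g_prev = np.zeros(d)
for t in range(200):
    loss, g = loss_and_grad(theta, H)
    alpha += beta * (g @ g_prev)         # hypergradient descent on alpha
    theta -= alpha * g
    g_prev = g
print(f"final loss {loss:.4f}, adapted alpha {alpha:.4f}")
```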
Learning to Represent Programs with Property Signatures
Title | Learning to Represent Programs with Property Signatures |
Authors | Anonymous |
Abstract | We introduce the notion of property signatures, a representation for programs and program specifications meant for consumption by machine learning algorithms. Given a function with input type τ_in and output type τ_out, a property is a function of type: (τ_in, τ_out) → Bool that (informally) describes some simple property of the function under consideration. For instance, if τ_in and τ_out are both lists of the same type, one property might ask ‘is the input list the same length as the output list?’. If we have a list of such properties, we can evaluate them all for our function to get a list of outputs that we will call the property signature. Crucially, we can ‘guess’ the property signature for a function given only a set of input/output pairs meant to specify that function. We discuss several potential applications of property signatures and show experimentally that they can be used to improve over a baseline synthesizer so that it emits twice as many programs in less than one-tenth of the time. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylHspEKPr |
PDF | https://openreview.net/pdf?id=rylHspEKPr |
PWC | https://paperswithcode.com/paper/learning-to-represent-programs-with-property |
Repo | |
Framework | |
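The property-signature construction translates almost directly into code. A minimal sketch follows, with a hand-picked property list and toy input/output examples; both are illustrative choices, not the paper's DSL.

```python
# Minimal sketch of property signatures as described above: evaluate a list of
# (input, output) -> bool properties on a function's input/output examples.
# The properties and examples are illustrative choices.
from typing import Callable, List, Tuple

Property = Callable[[list, list], bool]

properties: List[Property] = [
    lambda i, o: len(i) == len(o),          # same length?
    lambda i, o: all(x in i for x in o),    # output elements drawn from input?
    lambda i, o: o == sorted(o),            # output sorted?
    lambda i, o: o == i[::-1],              # output is reversed input?
]

def signature(examples: List[Tuple[list, list]]) -> Tuple[bool, ...]:
    """Property signature guessed from input/output pairs alone: a property is
    taken to hold for the (unknown) function if it holds on every example."""
    return tuple(all(p(i, o) for i, o in examples) for p in properties)

sort_examples = [([3, 1, 2], [1, 2, 3]), ([5, 4], [4, 5])]
rev_examples = [([3, 1, 2], [2, 1, 3]), ([5, 4], [4, 5])]
print(signature(sort_examples))  # (True, True, True, False)
print(signature(rev_examples))   # (True, True, False, True)
```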
Deep geometric matrix completion: Are we doing it right?
Title | Deep geometric matrix completion: Are we doing it right? |
Authors | Anonymous |
Abstract | We address the problem of reconstructing a matrix from a subset of its entries. Current methods, branded as geometric matrix completion, augment classical rank regularization techniques by incorporating geometric information into the solution. This information is usually provided as graphs encoding relations between rows/columns. In this work we propose a simple spectral approach for solving the matrix completion problem, via the framework of functional maps. We introduce the zoomout loss, a multiresolution spectral geometric loss inspired by recent advances in shape correspondence, whose minimization leads to state-of-the-art results on various recommender systems datasets. Surprisingly, for some datasets we were able to achieve comparable results even without incorporating geometric information. This puts into question both the quality of such information and current methods’ ability to use it in a meaningful and efficient way. |
Tasks | Matrix Completion, Recommendation Systems |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxyzxrYPH |
PDF | https://openreview.net/pdf?id=BJxyzxrYPH |
PWC | https://paperswithcode.com/paper/deep-geometric-matrix-completion-are-we-doing |
Repo | |
Framework | |
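For context, the "classical rank regularization techniques" the abstract refers to can be illustrated with a Soft-Impute-style baseline (iterative soft-thresholded SVD); this is the baseline being augmented, not the paper's spectral zoomout method.

```python
# Hedged sketch of the classical rank-regularization baseline the abstract
# refers to (Soft-Impute-style: iterative soft-thresholded SVD), not the
# paper's spectral zoomout approach.
import numpy as np

def soft_impute(M, mask, lam=1.0, iters=100):
    """Complete M (observed where mask==1) by nuclear-norm shrinkage."""
    X = np.where(mask, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - lam, 0.0)          # shrink singular values
        X_low = (U * s) @ Vt
        X = np.where(mask, M, X_low)          # keep observed entries fixed
    return X_low

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 30))  # rank-4 ground truth
mask = rng.random(A.shape) < 0.5                          # observe half the entries
A_hat = soft_impute(A, mask, lam=0.5)
err = np.abs(A_hat - A)[~mask].mean()
print(f"mean abs error on unobserved entries: {err:.3f}")
```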
BRIDGING ADVERSARIAL SAMPLES AND ADVERSARIAL NETWORKS
Title | BRIDGING ADVERSARIAL SAMPLES AND ADVERSARIAL NETWORKS |
Authors | Anonymous |
Abstract | Generative adversarial networks have achieved remarkable performance on various tasks but suffer from sensitivity to hyper-parameters, training instability, and mode collapse. We find that this is partly due to the gradient given by a non-robust discriminator containing non-informative adversarial noise, which can hinder the generator from capturing the pattern of real samples. Inspired by defenses against adversarial samples, we introduce adversarial training of the discriminator on real samples, which does not exist in the classic GAN framework, to make adversarial training symmetric; this balances the min-max game and makes the discriminator more robust. A robust discriminator gives a more informative gradient with less adversarial noise, which can stabilize training and accelerate convergence. We quantitatively validate the proposed method on image generation tasks with varied network architectures. Experiments show that training stability, perceptual quality, and diversity of generated samples are consistently improved with a small additional training computation cost. |
Tasks | Image Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklPITVKvS |
PDF | https://openreview.net/pdf?id=rklPITVKvS |
PWC | https://paperswithcode.com/paper/bridging-adversarial-samples-and-adversarial |
Repo | |
Framework | |
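A hedged sketch of the proposed symmetric training: perturb the real batch adversarially against the discriminator before its update. A single FGSM step is used here as one concrete instantiation, and the tiny discriminator is a placeholder.

```python
# Hedged sketch of the idea above: adversarially perturb *real* samples
# (here with a single FGSM step) before the discriminator update, making the
# min-max game symmetric. FGSM is one concrete instantiation, and the tiny
# discriminator D is a placeholder.
import torch
import torch.nn.functional as F

D = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                        torch.nn.Linear(128, 1))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
eps = 0.05                                       # perturbation budget

def d_step(real, fake):
    # FGSM on real samples against the "real" label.
    real_adv = real.detach().clone().requires_grad_(True)
    loss_real = F.binary_cross_entropy_with_logits(
        D(real_adv), torch.ones(len(real_adv), 1))
    grad, = torch.autograd.grad(loss_real, real_adv)
    real_adv = (real_adv + eps * grad.sign()).detach()
    # Standard discriminator update, but on the perturbed real batch.
    loss = (F.binary_cross_entropy_with_logits(D(real_adv),
                                               torch.ones(len(real_adv), 1))
            + F.binary_cross_entropy_with_logits(D(fake.detach()),
                                                 torch.zeros(len(fake), 1)))
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    return loss.item()

real = torch.randn(32, 64)                       # stand-in for real data
fake = torch.randn(32, 64)                       # stand-in for generator output
print(d_step(real, fake))
```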
Hallucinative Topological Memory for Zero-Shot Visual Planning
Title | Hallucinative Topological Memory for Zero-Shot Visual Planning |
Authors | Anonymous |
Abstract | In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction. VP algorithms essentially combine data-driven perception and planning, and are important for robotic manipulation and navigation domains, among others. A recent and promising approach to VP is the semi-parametric topological memory (SPTM) method, where image samples are treated as nodes in a graph, and the connectivity in the graph is learned using deep image classification. Thus, the learned graph represents the topological connectivity of the data, and planning can be performed using conventional graph search methods. However, training SPTM necessitates a suitable loss function for the connectivity classifier, which requires non-trivial manual tuning. More importantly, SPTM is constrained in its ability to generalize to changes in the domain, as its graph is constructed from direct observations and thus requires collecting new samples for planning. In this paper, we propose Hallucinative Topological Memory (HTM), which overcomes these shortcomings. In HTM, instead of training a discriminative classifier we train an energy function using contrastive predictive coding. In addition, we learn a conditional VAE model that generates samples given a context image of the domain, and use these hallucinated samples for building the connectivity graph, allowing for zero-shot generalization to domain changes. In simulated domains, HTM outperforms conventional SPTM and visual foresight methods in terms of both plan quality and success in long-horizon planning. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgF4kSFPB |
PDF | https://openreview.net/pdf?id=BkgF4kSFPB |
PWC | https://paperswithcode.com/paper/hallucinative-topological-memory-for-zero |
Repo | |
Framework | |
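A hedged sketch of HTM's planning stage: score pairwise connectivity of (hallucinated) samples with an energy function, keep high-scoring edges, and plan with graph search. The distance-based energy and the toy samples stand in for the CPC-trained energy and the CVAE hallucinations.

```python
# Hedged sketch of HTM's planning stage: build a connectivity graph over
# (hallucinated) samples using an energy score, then plan with graph search.
# The energy function here is a stand-in for the CPC-trained one, and the
# samples stand in for conditional-VAE hallucinations.
import numpy as np
import networkx as nx

def energy(a, b):
    """Stand-in connectivity score: high when states are close."""
    return -np.linalg.norm(a - b)

def build_graph(samples, threshold=-0.5):
    g = nx.Graph()
    g.add_nodes_from(range(len(samples)))
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            e = energy(samples[i], samples[j])
            if e > threshold:                # plausible one-step transition
                g.add_edge(i, j, weight=-e)
    return g

rng = np.random.default_rng(0)
samples = np.cumsum(rng.normal(scale=0.1, size=(50, 2)), axis=0)  # toy states
g = build_graph(samples)
plan = nx.shortest_path(g, 0, 49, weight="weight")  # start -> goal node indices
print(plan)
```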
Policy Tree Network
Title | Policy Tree Network |
Authors | Anonymous |
Abstract | Decision-time planning policies with implicit dynamics models have been shown to work in discrete action spaces with Q-learning. However, decision-time planning with implicit dynamics models in continuous action spaces has proven to be a difficult problem. Recent work in Reinforcement Learning has allowed implicit model-based approaches to be extended to Policy Gradient methods. In this work we propose Policy Tree Network (PTN). Policy Tree Network lies at the intersection of Model-Based Reinforcement Learning and Model-Free Reinforcement Learning. Policy Tree Network is a novel approach which, for the first time, demonstrates how to leverage an implicit model to perform decision-time planning with Policy Gradient methods in continuous action spaces. We evaluate our approach empirically on 8 standard MuJoCo environments so that it can easily be compared with similar work in this area. Additionally, we offer a lower bound on the worst-case change in the mean of the policy when tree planning is used and theoretically justify our design choices. |
Tasks | Policy Gradient Methods, Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlrS1rYwH |
PDF | https://openreview.net/pdf?id=HJlrS1rYwH |
PWC | https://paperswithcode.com/paper/policy-tree-network |
Repo | |
Framework | |
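A hedged sketch of decision-time tree planning with an implicit model: sample continuous actions from the policy, advance a latent state with a learned dynamics function (no observation reconstruction), back up value estimates, and act on the best first action. All learned components below are random stand-ins, not PTN's architecture.

```python
# Hedged sketch of decision-time tree planning with an implicit model:
# expand a small tree by sampling actions from the policy, advance a *latent*
# state with a learned dynamics function (no observation reconstruction),
# back up value estimates, and pick the best first action. All three learned
# functions are random stand-ins here.
import numpy as np

rng = np.random.default_rng(0)
W_dyn = rng.normal(size=(8 + 2, 8)) * 0.3          # latent dynamics g([s, a]) -> s'
W_val = rng.normal(size=8)                         # value head V(s)
def g(s, a): return np.tanh(np.concatenate([s, a]) @ W_dyn)
def V(s): return s @ W_val
def policy(s, n): return rng.normal(size=(n, 2))   # n sampled continuous actions

def plan(s, depth=3, branch=4, gamma=0.99):
    """Backed-up value of latent state s under a sampled action tree."""
    if depth == 0:
        return V(s)
    vals = [gamma * plan(g(s, a), depth - 1, branch) for a in policy(s, branch)]
    return max(vals)                               # optimistic backup over samples

s0 = rng.normal(size=8)
first_actions = policy(s0, 8)
best = max(first_actions, key=lambda a: plan(g(s0, a)))
print(best)                                        # action chosen at decision time
```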
Self-Supervised Learning of Appliance Usage
Title | Self-Supervised Learning of Appliance Usage |
Authors | Anonymous |
Abstract | Learning home appliance usage is important for understanding people’s activities and optimizing energy consumption. The problem is modeled as an event detection task, where the objective is to learn when a user turns an appliance on, and which appliance it is (microwave, hair dryer, etc.). Ideally, we would like to solve the problem in an unsupervised way so that the method can be applied to new homes and new appliances without any labels. To this end, we introduce a new deep learning model that takes input from two home sensors: 1) a smart electricity meter that outputs the total energy consumed by the home as a function of time, and 2) a motion sensor that outputs the locations of the residents over time. The model learns the distribution of the residents’ locations conditioned on the home energy signal. We show that this cross-modal prediction task allows us to detect when a particular appliance is used, and the location of the appliance in the home, all in a self-supervised manner, without any labeled data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lJzyStvS |
PDF | https://openreview.net/pdf?id=B1lJzyStvS |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-appliance-usage |
Repo | |
Framework | |
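A hedged sketch of the cross-modal pretext task: a small network predicts the distribution over resident locations from a window of the whole-home energy signal, trained with cross-entropy and no appliance labels. The sizes and the tiny model are assumptions.

```python
# Hedged sketch of the cross-modal pretext task: predict the distribution of
# resident locations from a window of the whole-home energy signal. No
# appliance labels are used; sizes and the tiny model are assumptions.
import torch
import torch.nn.functional as F

n_locations, window = 10, 64
model = torch.nn.Sequential(torch.nn.Linear(window, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, n_locations))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy data: energy windows and the discretized location of a resident.
energy = torch.randn(256, window)                  # smart-meter windows
location = torch.randint(0, n_locations, (256,))   # motion-sensor locations

for epoch in range(5):
    logits = model(energy)
    loss = F.cross_entropy(logits, location)       # p(location | energy signal)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```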
Gradient-Based Neural DAG Learning
Title | Gradient-Based Neural DAG Learning |
Authors | Anonymous |
Abstract | We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension allows us to model complex interactions while avoiding the combinatorial nature of the problem. In addition to comparing our method to existing continuous optimization methods, we provide missing empirical comparisons to nonlinear greedy search methods. On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks while being competitive with existing greedy search methods on important metrics for causal inference. |
Tasks | Causal Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklbKA4YDS |
PDF | https://openreview.net/pdf?id=rklbKA4YDS |
PWC | https://paperswithcode.com/paper/gradient-based-neural-dag-learning-1 |
Repo | |
Framework | |
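The "recently proposed continuous constrained optimization formulation" adapted here is the NOTEARS acyclicity constraint (Zheng et al., 2018), which turns the combinatorial DAG constraint into a smooth equality h(A) = tr(e^{A∘A}) − d = 0. A sketch of that constraint:

```python
# Sketch of the continuous acyclicity constraint this line of work builds on
# (NOTEARS, Zheng et al., 2018): h(A) = tr(exp(A * A)) - d equals zero iff the
# weighted adjacency matrix A encodes a DAG; h is differentiable, so the
# combinatorial DAG constraint becomes a smooth equality constraint.
import numpy as np
from scipy.linalg import expm

def acyclicity(A):
    d = A.shape[0]
    return np.trace(expm(A * A)) - d      # 0 <=> no directed cycles

dag = np.array([[0., 1., 1.],
                [0., 0., 1.],
                [0., 0., 0.]])            # upper triangular: acyclic
cyc = np.array([[0., 1., 0.],
                [0., 0., 1.],
                [1., 0., 0.]])            # 3-cycle
print(acyclicity(dag))                    # ~0.0
print(acyclicity(cyc))                    # > 0
```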
Understanding the Limitations of Conditional Generative Models
Title | Understanding the Limitations of Conditional Generative Models |
Authors | Anonymous |
Abstract | Class-conditional generative models hold promise to overcome the shortcomings of their discriminative counterparts. They are a natural choice to solve discriminative tasks in a robust manner as they jointly optimize for predictive performance and accurate modeling of the input distribution. In this work, we investigate robust classification with likelihood-based generative models from a theoretical and practical perspective to determine whether they can deliver on their promises. Our analysis focuses on a spectrum of robustness properties: (1) Detection of worst-case outliers in the form of adversarial examples; (2) Detection of average-case outliers in the form of ambiguous inputs and (3) Detection of incorrectly labeled in-distribution inputs. Our theoretical result reveals that it is impossible to guarantee detectability of adversarially-perturbed inputs even for near-optimal generative classifiers. Experimentally, we find that while we are able to train robust models for MNIST, robustness completely breaks down on CIFAR10. We relate this failure to various undesirable model properties that can be traced to the maximum likelihood training objective. Despite being a common choice in the literature, our results indicate that likelihood-based conditional generative models may be surprisingly ineffective for robust classification. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lPleBFvH |
PDF | https://openreview.net/pdf?id=r1lPleBFvH |
PWC | https://paperswithcode.com/paper/understanding-the-limitations-of-conditional |
Repo | |
Framework | |
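A hedged sketch of the object under study: a class-conditional likelihood model used as a classifier via Bayes' rule, argmax_y log p(x|y) + log p(y). Gaussian class conditionals stand in here for the deep likelihood models the paper analyzes.

```python
# Hedged sketch of the object under study: a class-conditional generative
# model used as a classifier via Bayes' rule, argmax_y log p(x|y) + log p(y).
# Gaussian class conditionals stand in for the deep likelihood models analyzed.
import numpy as np
from scipy.stats import multivariate_normal

class GenerativeClassifier:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = {c: np.mean(y == c) for c in self.classes}
        self.dists = {c: multivariate_normal(
                            X[y == c].mean(0),
                            np.cov(X[y == c].T) + 1e-6 * np.eye(X.shape[1]))
                      for c in self.classes}
        return self

    def log_joint(self, X):
        """log p(x|y) + log p(y) for every class."""
        return np.stack([self.dists[c].logpdf(X) + np.log(self.priors[c])
                         for c in self.classes], axis=-1)

    def predict(self, X):
        return self.classes[self.log_joint(X).argmax(-1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.repeat([0, 1], 200)
clf = GenerativeClassifier().fit(X, y)
print((clf.predict(X) == y).mean())              # high accuracy on this toy task
# The joint density also flags low-likelihood (outlier) inputs, the property
# whose limits the paper probes.
print(clf.log_joint(np.array([[30.0, -30.0]])))  # very negative log density
```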