January 30, 2020


Paper Group ANR 324

Deep Learning Solutions for TanDEM-X-based Forest Classification. Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems. Data-generating models under which the random forest algorithm performs badly. ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots. Variational Tracking and Prediction with Generati …

Deep Learning Solutions for TanDEM-X-based Forest Classification

Title Deep Learning Solutions for TanDEM-X-based Forest Classification
Authors Antonio Mazza, Francescopaolo Sica
Abstract In the last few years, deep learning (DL) has been successfully and massively employed in computer vision for discriminative tasks, such as image classification or object detection. Problems of this kind are core to many remote sensing (RS) applications as well, though with domain-specific peculiarities. Therefore, there is a growing interest in the use of DL methods for RS tasks. Here, we consider the forest/non-forest classification problem with TanDEM-X data, and test two state-of-the-art DL models, suitably adapting them to the specific task. Our experiments confirm the great potential of DL methods for RS applications.
Tasks Image Classification, Object Detection
Published 2019-02-01
URL http://arxiv.org/abs/1902.00274v1
PDF http://arxiv.org/pdf/1902.00274v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-solutions-for-tandem-x-based
Repo
Framework

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

Title Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Authors Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang
Abstract First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost of calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression with the neural tangent kernel (NTK). Unlike typical second-order methods, which have a heavy computational cost in each iteration, GGN has only minor overhead compared to first-order methods such as SGD. We also give theoretical results to show that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide a convergence guarantee for the mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11675v2
PDF https://arxiv.org/pdf/1905.11675v2.pdf
PWC https://paperswithcode.com/paper/a-gram-gauss-newton-method-learning
Repo
Framework
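
The abstract above does not spell out the update rule, so the following is only a minimal sketch of a Gauss-Newton step written in Gram-matrix form, under stated assumptions: the model is linear (so the Jacobian rows are simply the inputs), the loss is the square loss, and the names `solve`, `ggn_step`, and `damping` are hypothetical. The point it illustrates is that the linear system being solved has the size of the batch (n x n), not the size of the parameter vector, which is why such a step can stay cheap when the model is overparameterized.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ggn_step(w, X, y, damping=0.0):
    """One Gram-form Gauss-Newton step for the linear model f(x) = w . x.

    Assumption: for a linear model the Jacobian J equals X (n samples, m weights).
    Update: w <- w - J^T (J J^T + damping * I)^{-1} (f(X) - y).
    """
    n = len(X)
    residual = [sum(wj * xj for wj, xj in zip(w, x)) - yi for x, yi in zip(X, y)]
    # Gram matrix J J^T is n x n, independent of the number of parameters.
    gram = [
        [sum(a * b for a, b in zip(X[i], X[j])) + (damping if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(gram, residual)
    return [wj - sum(alpha[i] * X[i][j] for i in range(n)) for j, wj in enumerate(w)]
```

With two samples and three weights, `ggn_step([0.0, 0.0, 0.0], [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [2.0, 3.0])` fits both targets in a single step, because the model is linear; a neural network would need the Jacobian recomputed at every iteration.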

Data-generating models under which the random forest algorithm performs badly

Title Data-generating models under which the random forest algorithm performs badly
Authors José A. Ferreira
Abstract Examples are given of data-generating models under which some versions of the random forest algorithm may fail to be consistent or at least may be extremely slow to converge to the optimal predictor. Evidence provided for these properties is based on partly intuitive and partly rigorous arguments and on numerical experiments. Although one can always choose a model under which random forests perform very badly, in each case simple methods based on statistics of 'variable use' and 'variable importance' can be used to construct a better predictor based on a sort of mixture of random forests.
Tasks
Published 2019-10-02
URL https://arxiv.org/abs/1910.00943v5
PDF https://arxiv.org/pdf/1910.00943v5.pdf
PWC https://paperswithcode.com/paper/a-note-on-the-consistency-of-the-random
Repo
Framework

ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots

Title ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots
Authors Michael Ahn, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, Vikash Kumar
Abstract ROBEL is an open-source platform of cost-effective robots designed for reinforcement learning in the real world. ROBEL introduces two robots, each aimed to accelerate reinforcement learning research in different task domains: D’Claw is a three-fingered hand robot that facilitates learning dexterous manipulation tasks, and D’Kitty is a four-legged robot that facilitates learning agile legged locomotion tasks. These low-cost, modular robots are easy to maintain and are robust enough to sustain on-hardware reinforcement learning from scratch with over 14000 training hours registered on them to date. To leverage this platform, we propose an extensible set of continuous control benchmark tasks for each robot. These tasks feature dense and sparse task objectives, and additionally introduce score metrics as hardware-safety. We provide benchmark scores on an initial set of tasks using a variety of learning-based methods. Furthermore, we show that these results can be replicated across copies of the robots located in different institutions. Code, documentation, design files, detailed assembly instructions, final policies, baseline details, task videos, and all supplementary materials required to reproduce the results are available at www.roboticsbenchmarks.org.
Tasks Continuous Control
Published 2019-09-25
URL https://arxiv.org/abs/1909.11639v3
PDF https://arxiv.org/pdf/1909.11639v3.pdf
PWC https://paperswithcode.com/paper/robel-robotics-benchmarks-for-learning-with
Repo
Framework

Variational Tracking and Prediction with Generative Disentangled State-Space Models

Title Variational Tracking and Prediction with Generative Disentangled State-Space Models
Authors Adnan Akhundov, Maximilian Soelch, Justin Bayer, Patrick van der Smagt
Abstract We address tracking and prediction of multiple moving objects in visual data streams as inference and sampling in a disentangled latent state-space model. By encoding objects separately and including explicit position information in the latent state space, we perform tracking via amortized variational Bayesian inference of the respective latent positions. Inference is implemented in a modular neural framework tailored towards our disentangled latent space. The generative and inference models are jointly learned from observations only. Compared to related prior work, we empirically show that our Markovian state-space assumption enables faithful and much improved long-term prediction well beyond the training horizon. Further, our inference model correctly decomposes frames into objects, even in the presence of occlusions. Tracking performance is increased significantly over prior art.
Tasks Bayesian Inference
Published 2019-10-14
URL https://arxiv.org/abs/1910.06205v1
PDF https://arxiv.org/pdf/1910.06205v1.pdf
PWC https://paperswithcode.com/paper/variational-tracking-and-prediction-with
Repo
Framework

Coin_flipper at eHealth-KD Challenge 2019: Voting LSTMs for Key Phrases and Semantic Relation Identification Applied to Spanish eHealth Texts

Title Coin_flipper at eHealth-KD Challenge 2019: Voting LSTMs for Key Phrases and Semantic Relation Identification Applied to Spanish eHealth Texts
Authors Neus Català, Mario Martin
Abstract This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic tools for text processing while, at the same time, using common optimization techniques from the field of data mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our approach are: (a) the use of a surrogate function of F1 as the loss function, to close the gap between the minimization function and the evaluation metric, and (b) the generation of an ensemble of models that produce predictions by majority vote. Our system ranked second in the main task with an F1 score of 62.18%, narrowly behind the winner, which scored 63.94%.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12339v1
PDF https://arxiv.org/pdf/1909.12339v1.pdf
PWC https://paperswithcode.com/paper/coin_flipper-at-ehealth-kd-challenge-2019
Repo
Framework
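
The two ingredients named in the abstract above, a differentiable surrogate of F1 and majority voting over an ensemble, can be sketched as follows. The abstract does not give the exact surrogate, so this uses one common "soft F1" formulation as an assumption: hard counts of true positives, false positives, and false negatives are replaced by sums of predicted probabilities. The function names `soft_f1_loss` and `majority_vote` are hypothetical.

```python
from collections import Counter


def soft_f1_loss(probs, labels, eps=1e-8):
    """Return 1 - soft F1 for binary labels (0/1) and predicted probabilities.

    Assumption: this is a generic surrogate, not necessarily the one used
    in the paper. It is smooth in the probabilities, unlike the F1 metric.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return 1.0 - (2 * tp) / (2 * tp + fp + fn + eps)


def majority_vote(predictions):
    """Combine per-model label sequences position by position.

    `predictions` is a list with one equal-length label sequence per model.
    Ties go to the label that appears first among the models' votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

For instance, three taggers voting `["B", "O", "O"]`, `["B", "B", "O"]`, and `["O", "B", "O"]` combine to `["B", "B", "O"]`. Minimizing `soft_f1_loss` pushes the model towards the evaluation metric directly instead of towards per-token accuracy.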

On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning

Title On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning
Authors Hoda Heidari, Vedant Nanda, Krishna P. Gummadi
Abstract Most existing notions of algorithmic fairness are one-shot: they ensure some form of allocative equality at the time of decision making, but do not account for the adverse impact of the algorithmic decisions today on the long-term welfare and prosperity of certain segments of the population. We take a broader perspective on algorithmic fairness. We propose an effort-based measure of fairness and present a data-driven framework for characterizing the long-term impact of algorithmic policies on reshaping the underlying population. Motivated by the psychological literature on social learning and the economic literature on equality of opportunity, we propose a micro-scale model of how individuals may respond to decision-making algorithms. We employ existing measures of segregation from sociology and economics to quantify the resulting macro-scale population-level change. Importantly, we observe that different models may shift the group-conditional distribution of qualifications in different directions. Our findings raise a number of important questions regarding the formalization of fairness for decision-making models.
Tasks Decision Making
Published 2019-03-04
URL https://arxiv.org/abs/1903.01209v2
PDF https://arxiv.org/pdf/1903.01209v2.pdf
PWC https://paperswithcode.com/paper/on-the-long-term-impact-of-algorithmic
Repo
Framework

Learning and Evaluating General Linguistic Intelligence

Title Learning and Evaluating General Linguistic Intelligence
Authors Dani Yogatama, Cyprien de Masson d’Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom
Abstract We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language’s lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.
Tasks Question Answering
Published 2019-01-31
URL http://arxiv.org/abs/1901.11373v1
PDF http://arxiv.org/pdf/1901.11373v1.pdf
PWC https://paperswithcode.com/paper/learning-and-evaluating-general-linguistic
Repo
Framework
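
The "online encoding" metric mentioned in the abstract above rewards models that learn a new task quickly: each test example is encoded with the model's current predictive probability before the model is updated on it, and the total code length is reported. A toy sketch of that idea follows. As an assumption for illustration, the learner here is only a Laplace-smoothed label-frequency counter, not a language model, and `online_codelength` is a hypothetical name.

```python
import math
from collections import Counter


def online_codelength(labels, num_classes):
    """Total bits needed to encode `labels` one at a time.

    Each label is encoded with the probability assigned by a predictor that
    has only seen the earlier labels; the predictor is then updated.
    Assumption: add-one (Laplace) smoothing over `num_classes` classes.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, label in enumerate(labels):
        prob = (counts[label] + 1) / (seen + num_classes)
        total_bits += -math.log2(prob)
        counts[label] += 1
    return total_bits
```

A learner that adapts faster assigns higher probability to later examples and therefore yields a shorter code, so lower is better. A repetitive stream such as `["a"] * 8` costs fewer bits than an alternating one of the same length.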

Equivalent and Approximate Transformations of Deep Neural Networks

Title Equivalent and Approximate Transformations of Deep Neural Networks
Authors Abhinav Kumar, Thiago Serra, Srikumar Ramalingam
Abstract Two networks are equivalent if they produce the same output for any given input. In this paper, we study the possibility of transforming a deep neural network to another network with a different number of units or layers, which can be either equivalent, a local exact approximation, or a global linear approximation of the original network. On the practical side, we show that certain rectified linear units (ReLUs) can be safely removed from a network if they are always active or inactive for any valid input. If we only need an equivalent network for a smaller domain, then more units can be removed and some layers collapsed. On the theoretical side, we constructively show that for any feed-forward ReLU network, there exists a global linear approximation to a 2-hidden-layer shallow network with a fixed number of units. This result is a balance between the increasing number of units for arbitrary approximation with a single layer and the known upper bound of $\lceil \log(n_0+1)\rceil + 1$ layers for exact representation, where $n_0$ is the input dimension. While the transformed network may require an exponential number of units to capture the activation patterns of the original network, we show that it can be made substantially smaller by only accounting for the patterns that define linear regions. Based on experiments with ReLU networks on the MNIST dataset, we found that $l_1$-regularization and adversarial training reduce the number of linear regions significantly as the number of stable units increases due to weight sparsity. Therefore, we can also intentionally train ReLU networks to allow for effective loss-less compression and approximation.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11428v1
PDF https://arxiv.org/pdf/1905.11428v1.pdf
PWC https://paperswithcode.com/paper/equivalent-and-approximate-transformations-of
Repo
Framework
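
The removal criterion in the abstract above (a ReLU that is always active or always inactive on the valid inputs) can be illustrated with a simple sufficient check. The sketch below assumes the valid inputs form a box with per-coordinate bounds and uses interval arithmetic to bound a unit's pre-activation; this is a conservative test chosen for illustration and may differ from how the paper certifies stability. The names `preactivation_bounds` and `classify_unit` are hypothetical.

```python
def preactivation_bounds(weights, bias, lower, upper):
    """Bound w . x + b over the box lower[i] <= x[i] <= upper[i]."""
    lo = bias
    hi = bias
    for w, l, u in zip(weights, lower, upper):
        if w >= 0:
            lo += w * l
            hi += w * u
        else:
            lo += w * u
            hi += w * l
    return lo, hi


def classify_unit(weights, bias, lower, upper):
    """Label a ReLU unit by its behaviour over the whole input box.

    'active':   pre-activation >= 0 everywhere, so ReLU acts as the identity
                and the unit can be folded into the next layer's weights.
    'inactive': pre-activation <= 0 everywhere, so the output is always 0
                and the unit can be dropped.
    'unstable': the bounds straddle 0, so this check cannot decide.
    """
    lo, hi = preactivation_bounds(weights, bias, lower, upper)
    if lo >= 0:
        return "active"
    if hi <= 0:
        return "inactive"
    return "unstable"
```

On the unit box `[0, 1]^2`, the unit with weights `[1, -1]` and bias `3` is classified as active (its pre-activation lies in `[2, 4]`), while the same weights with bias `0` give an unstable unit. For the first layer these interval bounds are tight; for deeper layers they only over-approximate, so some stable units may be reported as unstable.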

Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

Title Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections
Authors Boris Muzellec, Marco Cuturi
Abstract Computing optimal transport (OT) between measures in high dimensions is doomed by the curse of dimensionality. A popular approach to avoid this curse is to project input measures on lower-dimensional subspaces (1D lines in the case of sliced Wasserstein distances), solve the OT problem between these reduced measures, and settle for the Wasserstein distance between these reductions, rather than that between the original measures. This approach is however difficult to extend to the case in which one wants to compute an OT map (a Monge map) between the original measures. Since computations are carried out on lower-dimensional projections, classical map estimation techniques can only produce maps operating in these reduced dimensions. We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space. We prove that the best optimal transport plan that takes such “subspace detours” is a generalization of the Knothe-Rosenblatt transport. We show that these plans can be explicitly formulated when comparing Gaussian measures (between which the Wasserstein distance is commonly referred to as the Bures or Fréchet distance). We provide an algorithm to select optimal subspaces given pairs of Gaussian measures, and study scenarios in which that mediating subspace can be selected using prior information. We consider applications to semantic mediation between elliptic word embeddings and domain adaptation with Gaussian mixture models.
Tasks Domain Adaptation, Word Embeddings
Published 2019-05-24
URL https://arxiv.org/abs/1905.10099v4
PDF https://arxiv.org/pdf/1905.10099v4.pdf
PWC https://paperswithcode.com/paper/subspace-detours-building-transport-plans
Repo
Framework
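
The projection approach described at the start of the abstract above relies on the fact that optimal transport is easy in one dimension: for two empirical measures with the same number of equally weighted points, the optimal plan matches points in sorted order. The sketch below shows only that 1D building block, not the subspace-detour construction proposed in the paper; `monge_map_1d` and `w2_squared_1d` are hypothetical names.

```python
def monge_map_1d(xs, ys):
    """Optimal assignment between two equal-size 1D point sets.

    Returns a dict sending each source point to its target. Assumption:
    points are distinct and uniformly weighted, so sorting both sides and
    pairing by rank is optimal for any convex cost such as squared distance.
    """
    if len(xs) != len(ys):
        raise ValueError("point sets must have the same size")
    return dict(zip(sorted(xs), sorted(ys)))


def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between the two empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("point sets must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)
```

For `xs = [0.0, 2.0]` and `ys = [3.0, 1.0]`, the map sends `0.0` to `1.0` and `2.0` to `3.0`, and the squared distance is `1.0`. The difficulty the abstract points to is that such a map only acts on the projected coordinate, which is what motivates extending it to the full space.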

Integrating Graph Contextualized Knowledge into Pre-trained Language Models

Title Integrating Graph Contextualized Knowledge into Pre-trained Language Models
Authors Bin He, Di Zhou, Jinghui Xiao, Xin Jiang, Qun Liu, Nicholas Jing Yuan, Tong Xu
Abstract Complex node interactions are common in knowledge graphs, and these interactions also contain rich knowledge information. However, traditional methods usually treat a triple as a training unit during the knowledge representation learning (KRL) procedure, neglecting contextualized information of the nodes in knowledge graphs (KGs). We generalize the modeling object to a very general form, which theoretically supports any subgraph extracted from the knowledge graph, and these subgraphs are fed into a novel transformer-based model to learn the knowledge embeddings. To broaden usage scenarios of knowledge, pre-trained language models are utilized to build a model that incorporates the learned knowledge representations. Experimental results demonstrate that our model achieves state-of-the-art performance on several medical NLP tasks, and the improvement over TransE indicates that our KRL method captures the graph contextualized information effectively.
Tasks Knowledge Graphs, Representation Learning
Published 2019-11-30
URL https://arxiv.org/abs/1912.00147v2
PDF https://arxiv.org/pdf/1912.00147v2.pdf
PWC https://paperswithcode.com/paper/integrating-graph-contextualized-knowledge
Repo
Framework

Latent Universal Task-Specific BERT

Title Latent Universal Task-Specific BERT
Authors Alon Rozental, Zohar Kelrich, Daniel Fleischer
Abstract This paper describes a language representation model which combines the Bidirectional Encoder Representations from Transformers (BERT) learning mechanism described in Devlin et al. (2018) with a generalization of the Universal Transformer model described in Dehghani et al. (2018). We further improve this model by adding a latent variable that represents the persona and topics of interests of the writer for each training example. We also describe a simple method to improve the usefulness of our language representation for solving problems in a specific domain at the expense of its ability to generalize to other fields. Finally, we release a pre-trained language representation model for social texts that was trained on 100 million tweets.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.06638v1
PDF https://arxiv.org/pdf/1905.06638v1.pdf
PWC https://paperswithcode.com/paper/latent-universal-task-specific-bert
Repo
Framework

Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes

Title Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes
Authors Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang
Abstract Recent exploration of optimal individualized decision rules (IDRs) for patients in precision medicine has attracted a lot of attention due to the heterogeneous responses of patients to different treatments. In the existing literature of precision medicine, an optimal IDR is defined as a decision function mapping from the patients’ covariate space into the treatment space that maximizes the expected outcome of each individual. Motivated by the concept of Optimized Certainty Equivalent (OCE), introduced originally by Ben-Tal and Teboulle (1986), which includes the popular conditional value-at-risk (CVaR) of Rockafellar and Uryasev (2000), we propose a decision-rule based optimized covariate-dependent equivalent (CDE) for individualized decision making problems. Our proposed IDR-CDE broadens the existing expected-mean outcome framework in precision medicine and enriches the previous concept of the OCE. Numerical experiments demonstrate that our overall approach outperforms existing methods in estimating optimal IDRs under heavy-tail distributions of the data.
Tasks Decision Making
Published 2019-08-27
URL https://arxiv.org/abs/1908.10742v1
PDF https://arxiv.org/pdf/1908.10742v1.pdf
PWC https://paperswithcode.com/paper/estimation-of-individualized-decision-rules
Repo
Framework
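
The CVaR mentioned in the abstract above has a well-known variational form: for a loss X and level alpha, CVaR is the minimum over t of t + E[(X - t)^+] / (1 - alpha). The sketch below evaluates the empirical version of that formula on a sample of losses. It illustrates only the risk measure that the OCE generalizes, not the paper's IDR-CDE estimator, and `empirical_cvar` is a hypothetical name.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at level `alpha` (0 <= alpha < 1) of a sample of losses.

    Uses min over t of  t + mean(max(x - t, 0)) / (1 - alpha).
    The objective is convex and piecewise linear in t with kinks at the
    sample values, so it is enough to evaluate it at the sample points.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)

    def objective(t):
        excess = sum(max(x - t, 0.0) for x in losses) / n
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in losses)
```

For losses `[1.0, 2.0, 3.0, 4.0]` and `alpha = 0.5`, the result is `3.5`, the average of the worst half of the sample; with `alpha = 0.0` it reduces to the plain mean `2.5`. Replacing the mean by such a tail-sensitive criterion is what makes a decision rule less vulnerable to heavy-tailed outcomes.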

Learning to Infer Program Sketches

Title Learning to Infer Program Sketches
Authors Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama
Abstract Our goal is to build systems which write code automatically from the kinds of specifications humans can most easily provide, such as examples and natural language instruction. The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. We propose a method for dynamically integrating these types of information. Our novel intermediate representation and training algorithm allow a program synthesis system to learn, without direct supervision, when to rely on pattern recognition and when to perform symbolic search. Our model matches the memorization and generalization performance of neural synthesis and symbolic search, respectively, and achieves state-of-the-art performance on a dataset of simple English description-to-code programming problems.
Tasks Program Synthesis
Published 2019-02-17
URL https://arxiv.org/abs/1902.06349v2
PDF https://arxiv.org/pdf/1902.06349v2.pdf
PWC https://paperswithcode.com/paper/learning-to-infer-program-sketches
Repo
Framework

A Methodology to Select Topology Generators for WANET Simulations (Extended Version)

Title A Methodology to Select Topology Generators for WANET Simulations (Extended Version)
Authors Michael O’Sullivan, Leonardo Aniello, Vladimiro Sassone
Abstract Many academic and industrial research works on WANETs rely on simulations, at least in the first stages, to obtain preliminary results to be subsequently validated in real settings. Topology generators (TGs) are commonly used to generate the initial placement of nodes in artificial WANET topologies, where those simulations take place. The significance of these experiments heavily depends on the representativeness of artificial topologies. Indeed, if they were not drawn fairly, obtained results would apply only to a subset of possible configurations, hence they would lack the appropriate generality required to port them to the real world. Although using many TGs could mitigate this issue by generating topologies in several different ways, that would entail a significant additional effort. Hence, the problem arises of which TGs to choose, among a number of available generators, to maximise the representativeness of generated topologies and reduce the number of TGs to use. In this paper, we address that problem by investigating the presence of bias in the initial placement of nodes in artificial WANET topologies produced by different TGs. We propose a methodology to assess such bias and introduce two metrics to quantify the diversity of the topologies generated by a TG with respect to all the available TGs, which can be used to select which TGs to use. We carry out experiments on three well-known TGs, namely BRITE, NPART and GT-ITM. Obtained results show that using the artificial networks produced by a single TG can introduce bias.
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09577v1
PDF https://arxiv.org/pdf/1908.09577v1.pdf
PWC https://paperswithcode.com/paper/a-methodology-to-select-topology-generators
Repo
Framework