July 26, 2019

Paper Group ANR 774

Deep Generative Dual Memory Network for Continual Learning. Causal Inference through the Method of Direct Estimation. Causal Inference on Multivariate and Mixed-Type Data. Simple rules for complex decisions. On the Equivalence of Holographic and Complex Embeddings for Link Prediction. Block-Cyclic Stochastic Coordinate Descent for Deep Neural Netwo …

Deep Generative Dual Memory Network for Continual Learning

Title Deep Generative Dual Memory Network for Continual Learning
Authors Nitin Kamra, Umang Gupta, Yan Liu
Abstract Despite advances in deep learning, neural networks can only learn multiple tasks when trained on them jointly. When tasks arrive sequentially, they lose performance on previously learnt tasks. This phenomenon, called catastrophic forgetting, is a fundamental challenge to overcome before neural networks can learn continually from incoming data. In this work, we derive inspiration from human memory to develop an architecture capable of learning continuously from sequentially incoming tasks, while averting catastrophic forgetting. Specifically, our contributions are: (i) a dual memory architecture emulating the complementary learning systems (hippocampus and the neocortex) in the human brain, (ii) memory consolidation via generative replay of past experiences, (iii) demonstrating advantages of generative replay and dual memories via experiments, and (iv) improved performance retention on challenging tasks even for low capacity models. Our architecture displays many characteristics of the mammalian memory and provides insights on the connection between sleep and learning.
Tasks Continual Learning
Published 2017-10-28
URL http://arxiv.org/abs/1710.10368v2
PDF http://arxiv.org/pdf/1710.10368v2.pdf
PWC https://paperswithcode.com/paper/deep-generative-dual-memory-network-for
Repo
Framework

Causal Inference through the Method of Direct Estimation

Title Causal Inference through the Method of Direct Estimation
Authors Marc Ratkovic, Dustin Tingley
Abstract The intersection of causal inference and machine learning is a rapidly advancing field. We propose a new approach, the method of direct estimation, that draws on both traditions in order to obtain nonparametric estimates of treatment effects. The approach focuses on estimating the effect of fluctuations in a treatment variable on an outcome. A tensor-spline implementation enables rich interactions between functional bases allowing for the approach to capture treatment/covariate interactions. We show how new innovations in Bayesian sparse modeling readily handle the proposed framework, and then document its performance in simulation and applied examples. Furthermore we show how the method of direct estimation can easily extend to structural estimators commonly used in a variety of disciplines, like instrumental variables, mediation analysis, and sequential g-estimation.
Tasks Causal Inference
Published 2017-03-16
URL http://arxiv.org/abs/1703.05849v2
PDF http://arxiv.org/pdf/1703.05849v2.pdf
PWC https://paperswithcode.com/paper/causal-inference-through-the-method-of-direct
Repo
Framework

Causal Inference on Multivariate and Mixed-Type Data

Title Causal Inference on Multivariate and Mixed-Type Data
Authors Alexander Marx, Jilles Vreeken
Abstract Given data over the joint distribution of two random variables $X$ and $Y$, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. In particular, we consider the general case where both $X$ and $Y$ may be univariate or multivariate, and of the same or mixed data types. We take an information theoretic approach, based on Kolmogorov complexity, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. The ideal score is not computable, but can be approximated through the Minimum Description Length (MDL) principle. Based on MDL, we propose two scores, one for when both $X$ and $Y$ are of the same single data type, and one for when they are mixed-type. We model dependencies between $X$ and $Y$ using classification and regression trees. As inferring the optimal model is NP-hard, we propose Crack, a fast greedy algorithm to determine the most likely causal direction directly from the data. Empirical evaluation on a wide range of data shows that Crack reliably, and with high accuracy, infers the correct causal direction on both univariate and multivariate cause-effect pairs over both single and mixed-type data.
Tasks Causal Inference
Published 2017-02-21
URL http://arxiv.org/abs/1702.06385v2
PDF http://arxiv.org/pdf/1702.06385v2.pdf
PWC https://paperswithcode.com/paper/causal-inference-on-multivariate-and-mixed
Repo
Framework

Simple rules for complex decisions

Title Simple rules for complex decisions
Authors Jongbin Jung, Connor Concannon, Ravi Shroff, Sharad Goel, Daniel G. Goldstein
Abstract From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do.
Tasks Causal Inference, Decision Making
Published 2017-02-15
URL http://arxiv.org/abs/1702.04690v3
PDF http://arxiv.org/pdf/1702.04690v3.pdf
PWC https://paperswithcode.com/paper/simple-rules-for-complex-decisions
Repo
Framework
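
The abstract above names select-regress-and-round without spelling out its steps. The following is a minimal sketch, assuming the three stages are: select a few features, fit a regression on them, then rescale and round the coefficients to small integers. The selection criterion (absolute correlation), the gradient-descent logistic fit, the integer range `M`, and the toy data are all illustrative assumptions, not details taken from the paper.

```python
import math

def select(X, y, k):
    """Keep the k features most correlated (in absolute value) with the outcome.
    Assumed criterion; the paper may select features differently."""
    n = len(X)
    my = sum(y) / n

    def abs_corr(j):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col)
        vy = sum((b - my) ** 2 for b in y)
        return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

    ranked = sorted(range(len(X[0])), key=abs_corr, reverse=True)
    return sorted(ranked[:k])

def regress(X, y, cols, steps=2000, lr=0.1):
    """Fit a logistic regression on the selected columns by gradient descent."""
    w, b, n = [0.0] * len(cols), 0.0, len(X)
    for _ in range(steps):
        gw, gb = [0.0] * len(cols), 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * row[c] for wi, c in zip(w, cols))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            gb += err
            for i, c in enumerate(cols):
                gw[i] += err * row[c]
        b -= lr * gb / n
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
    return w

def round_weights(w, M):
    """Rescale so the largest coefficient has magnitude M, then round to integers."""
    top = max(abs(wi) for wi in w)
    if top == 0:
        return [0] * len(w)
    return [round(wi * M / top) for wi in w]

# Toy data (hypothetical): the outcome is 1 when feature 0 or feature 1 is set;
# feature 2 is uninformative.
X = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1),
     (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
y = [1, 1, 1, 1, 1, 1, 0, 0]

cols = select(X, y, k=2)
weights = round_weights(regress(X, y, cols), M=3)

def score(row):
    """Weighted checklist: add up the integer weights of the features present."""
    return sum(w * row[c] for w, c in zip(weights, cols))
```

The resulting rule is a short list of integer weights, which is the kind of checklist the abstract describes as simple enough to apply mentally.
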

On the Equivalence of Holographic and Complex Embeddings for Link Prediction

Title On the Equivalence of Holographic and Complex Embeddings for Link Prediction
Authors Katsuhiko Hayashi, Masashi Shimbo
Abstract We show the equivalence of two state-of-the-art link prediction/knowledge graph completion methods: Nickel et al.’s holographic embedding and Trouillon et al.’s complex embedding. We first consider a spectral version of the holographic embedding, exploiting the frequency domain in the Fourier transform for efficient computation. The analysis of the resulting method reveals that it can be viewed as an instance of the complex embedding with certain constraints cast on the initial vectors upon training. Conversely, any complex embedding can be converted to an equivalent holographic embedding.
Tasks Knowledge Graph Completion, Link Prediction
Published 2017-02-18
URL http://arxiv.org/abs/1702.05563v3
PDF http://arxiv.org/pdf/1702.05563v3.pdf
PWC https://paperswithcode.com/paper/on-the-equivalence-of-holographic-and-complex
Repo
Framework
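
The equivalence stated in the abstract above can be checked numerically in one direction. This is a minimal sketch, assuming the usual definitions: the holographic score is the dot product of a relation vector with the circular correlation of the two entity vectors, and the complex score is the real part of a trilinear product with the object vector conjugated. The function names and the example vectors are hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), enough for a small check."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def circular_correlation(a, b):
    """[a * b]_k = sum_i a_i * b_((i + k) mod n)."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]

def holographic_score(r, h, t):
    """Real-valued score: r . (h circularly correlated with t)."""
    return sum(rk * ck for rk, ck in zip(r, circular_correlation(h, t)))

def complex_score(w, es, eo):
    """Re( sum_f w_f * es_f * conj(eo_f) ) for complex vectors."""
    return sum(wf * sf * of.conjugate() for wf, sf, of in zip(w, es, eo)).real

# Hypothetical real-valued embeddings of dimension 4.
r = [0.5, -1.0, 2.0, 0.25]
h = [1.0, 0.0, -2.0, 3.0]
t = [0.5, 1.5, -1.0, 2.0]
n = len(r)

direct = holographic_score(r, h, t)
# Move everything to the frequency domain; the 1/n factor comes from Parseval's identity.
spectral = complex_score([v / n for v in dft(r)], dft(h), dft(t))
```

The two scores agree up to floating-point error. The complex vectors produced here are Fourier transforms of real vectors, so they are conjugate-symmetric, which is one way to read the "certain constraints" the abstract mentions.
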

Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks

Title Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
Authors Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
Abstract We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. Empirical tests on benchmark datasets show that our algorithm outperforms state-of-the-art optimization methods in both accuracy and convergence speed. The improvements are consistent across different architectures, and can be combined with other training techniques and regularization methods.
Tasks
Published 2017-11-20
URL http://arxiv.org/abs/1711.07190v1
PDF http://arxiv.org/pdf/1711.07190v1.pdf
PWC https://paperswithcode.com/paper/block-cyclic-stochastic-coordinate-descent
Repo
Framework

Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths

Title Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths
Authors Yanan Li, Donghui Wang, Huanhang Hu, Yuetan Lin, Yueting Zhuang
Abstract Zero-shot recognition aims to accurately recognize objects of unseen classes by using a shared visual-semantic mapping between the image feature space and the semantic embedding space. This mapping is learned on training data of seen classes and is expected to have transfer ability to unseen classes. In this paper, we tackle this problem by exploiting the intrinsic relationship between the semantic space manifold and the transfer ability of visual-semantic mapping. We formalize their connection and cast zero-shot recognition as a joint optimization problem. Motivated by this, we propose a novel framework for zero-shot recognition, which contains dual visual-semantic mapping paths. Our analysis shows this framework can not only apply prior semantic knowledge to infer underlying semantic manifold in the image feature space, but also generate optimized semantic embedding space, which can enhance the transfer ability of the visual-semantic mapping to unseen classes. The proposed method is evaluated for zero-shot recognition on four benchmark datasets, achieving outstanding results.
Tasks Zero-Shot Learning
Published 2017-03-15
URL http://arxiv.org/abs/1703.05002v2
PDF http://arxiv.org/pdf/1703.05002v2.pdf
PWC https://paperswithcode.com/paper/zero-shot-recognition-using-dual-visual
Repo
Framework

Spectrum Monitoring for Radar Bands using Deep Convolutional Neural Networks

Title Spectrum Monitoring for Radar Bands using Deep Convolutional Neural Networks
Authors Ahmed Selim, Francisco Paisana, Jerome A. Arokkiam, Yi Zhang, Linda Doyle, Luiz A. DaSilva
Abstract In this paper, we present a spectrum monitoring framework for the detection of radar signals in spectrum sharing scenarios. The core of our framework is a deep convolutional neural network (CNN) model that enables Measurement Capable Devices to identify the presence of radar signals in the radio spectrum, even when these signals are overlapped with other sources of interference, such as commercial LTE and WLAN. We collected a large dataset of RF measurements, which include the transmissions of multiple radar pulse waveforms, downlink LTE, WLAN, and thermal noise. We propose a pre-processing data representation that leverages the amplitude and phase shifts of the collected samples. This representation allows our CNN model to achieve a classification accuracy of 99.6% on our testing dataset. The trained CNN model is then tested under various SNR values, outperforming other models, such as spectrogram-based CNN models.
Tasks
Published 2017-05-01
URL http://arxiv.org/abs/1705.00462v1
PDF http://arxiv.org/pdf/1705.00462v1.pdf
PWC https://paperswithcode.com/paper/spectrum-monitoring-for-radar-bands-using
Repo
Framework

Faster Discovery of Faster System Configurations with Spectral Learning

Title Faster Discovery of Faster System Configurations with Spectral Learning
Authors Vivek Nair, Tim Menzies, Norbert Siegmund, Sven Apel
Abstract Despite the huge spread and economical importance of configurable software systems, there is unsatisfactory support in utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) large variances in their predictions. Both these problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system, to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors - less than 10% prediction error, with a standard deviation of less than 2%. When compared to the state of the art, WHAT (a) requires 2 to 10 times fewer samples to achieve similar prediction accuracies, and (b) its predictions are more stable (i.e., have lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.
Tasks Dimensionality Reduction
Published 2017-01-27
URL http://arxiv.org/abs/1701.08106v2
PDF http://arxiv.org/pdf/1701.08106v2.pdf
PWC https://paperswithcode.com/paper/faster-discovery-of-faster-system
Repo
Framework

Lifelong Metric Learning

Title Lifelong Metric Learning
Authors Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu
Abstract The state-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider the lifelong learning problem to mimic “human learning”, i.e., endowing a new capability to the learned metric for a new task from new online samples and incorporating previous experiences and knowledge. Therefore, we propose a new metric learning framework: lifelong metric learning (LML), which only utilizes the data of the new task to train the metric model while preserving the original capabilities. More specifically, the proposed LML maintains a common subspace for all learned metrics, named lifelong dictionary, transfers knowledge from the common subspace to each new metric task with task-specific idiosyncrasy, and redefines the common subspace over time to maximize performance across all metric tasks. For model optimization, we apply an online passive-aggressive optimization algorithm to solve the proposed LML framework, where the lifelong dictionary and task-specific partition are optimized alternately and consecutively. Finally, we evaluate our approach by analyzing several multi-task metric learning datasets. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed framework.
Tasks Metric Learning
Published 2017-05-03
URL http://arxiv.org/abs/1705.01209v2
PDF http://arxiv.org/pdf/1705.01209v2.pdf
PWC https://paperswithcode.com/paper/lifelong-metric-learning
Repo
Framework

Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images

Title Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images
Authors Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.
Tasks Domain Adaptation, Transfer Learning
Published 2017-06-19
URL http://arxiv.org/abs/1706.05952v1
PDF http://arxiv.org/pdf/1706.05952v1.pdf
PWC https://paperswithcode.com/paper/bayesian-joint-modelling-for-object
Repo
Framework

Online Convex Optimization with Unconstrained Domains and Losses

Title Online Convex Optimization with Unconstrained Domains and Losses
Authors Ashok Cutkosky, Kwabena Boahen
Abstract We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions. We prove a lower bound showing an exponential separation between the regret of existing algorithms that require a known bound on the loss functions and any algorithm that does not require such knowledge. RescaledExp matches this lower bound asymptotically in the number of iterations. RescaledExp is naturally hyperparameter-free and we demonstrate empirically that it matches prior optimization algorithms that require hyperparameter optimization.
Tasks Hyperparameter Optimization
Published 2017-03-07
URL http://arxiv.org/abs/1703.02622v1
PDF http://arxiv.org/pdf/1703.02622v1.pdf
PWC https://paperswithcode.com/paper/online-convex-optimization-with-unconstrained
Repo
Framework

On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example

Title On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example
Authors Yen-Chi Chen, Y. Samuel Wang, Elena A. Erosheva
Abstract Variational inference is a general approach for approximating complex density functions, such as those arising in latent variable models, popular in machine learning. It has been applied to approximate the maximum likelihood estimator and to carry out Bayesian inference; however, quantification of uncertainty with variational inference remains challenging from both theoretical and practical perspectives. This paper is concerned with developing uncertainty measures for variational inference by using bootstrap procedures. We first develop two general bootstrap approaches for assessing the uncertainty of a variational estimate and then study the underlying bootstrap theory in both fixed- and increasing-dimension settings. We then use the bootstrap approach and our theoretical results in the context of mixed membership modeling with multivariate binary data on functional disability from the National Long Term Care Survey. We carry out a two-sample approach to test for changes in the repeated measures of functional disability for the subset of individuals present in 1989 and 1994 waves.
Tasks Bayesian Inference, Latent Variable Models
Published 2017-11-29
URL http://arxiv.org/abs/1711.11057v2
PDF http://arxiv.org/pdf/1711.11057v2.pdf
PWC https://paperswithcode.com/paper/on-the-use-of-bootstrap-with-variational
Repo
Framework
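
The abstract above describes bootstrap procedures for attaching uncertainty to a variational estimate. The following is a minimal sketch of a generic nonparametric bootstrap: resample the data with replacement, re-run the estimator, and summarize the spread of the replicates. The sample mean stands in for a variational estimator here, and the function name, the percentile interval, and the data are all assumptions for illustration; the paper's own two procedures may differ.

```python
import random
import statistics

def bootstrap(data, estimator, n_boot=500, seed=0):
    """Return (standard error, approximate 95% percentile interval) of an estimator."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    se = statistics.stdev(replicates)
    lo = replicates[int(0.025 * n_boot)]
    hi = replicates[int(0.975 * n_boot) - 1]
    return se, (lo, hi)

# Hypothetical data; in the paper's setting the estimator would be a
# variational fit rather than a plain mean.
data = [float(v) for v in range(50)]
estimate = statistics.mean(data)
se, (lo, hi) = bootstrap(data, statistics.mean)
```

Swapping in a different `estimator` callable is the only change needed to reuse the sketch, at the cost of refitting the model once per bootstrap replicate.
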

Using Context Events in Neural Network Models for Event Temporal Status Identification

Title Using Context Events in Neural Network Models for Event Temporal Status Identification
Authors Zeyu Dai, Wenlin Yao, Ruihong Huang
Abstract Focusing on the task of identifying event temporal status, we find that events directly or indirectly governing the target event in a dependency tree are the most important contexts. Therefore, we extract dependency chains containing context events and use them as input in neural network models, which consistently outperform previous models using local context words as input. Visualization verifies that the dependency chain representation can effectively capture the context events which are closely related to the target event and play key roles in predicting event temporal status.
Tasks
Published 2017-10-12
URL http://arxiv.org/abs/1710.04344v1
PDF http://arxiv.org/pdf/1710.04344v1.pdf
PWC https://paperswithcode.com/paper/using-context-events-in-neural-network-models
Repo
Framework

Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images

Title Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images
Authors Babak Ehteshami Bejnordi, Jimmy Linz, Ben Glass, Maeve Mullooly, Gretchen L Gierach, Mark E Sherman, Nico Karssemeijer, Jeroen van der Laak, Andrew H Beck
Abstract Diagnosis of breast carcinomas has so far been limited to the morphological interpretation of epithelial cells and the assessment of epithelial tissue architecture. Consequently, most of the automated systems have focused on characterizing the epithelial regions of the breast to detect cancer. In this paper, we propose a system for classification of hematoxylin and eosin (H&E) stained breast specimens based on convolutional neural networks that primarily targets the assessment of tumor-associated stroma to diagnose breast cancer patients. We evaluate the performance of our proposed system using a large cohort containing 646 breast tissue biopsies. Our evaluations show that the proposed system achieves an area under ROC of 0.92, demonstrating the discriminative power of previously neglected tumor-associated stroma as a diagnostic biomarker.
Tasks
Published 2017-02-19
URL http://arxiv.org/abs/1702.05803v1
PDF http://arxiv.org/pdf/1702.05803v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-assessment-of-tumor
Repo
Framework