Paper Group ANR 774
Deep Generative Dual Memory Network for Continual Learning
Title | Deep Generative Dual Memory Network for Continual Learning |
Authors | Nitin Kamra, Umang Gupta, Yan Liu |
Abstract | Despite advances in deep learning, neural networks can only learn multiple tasks when trained on them jointly. When tasks arrive sequentially, they lose performance on previously learnt tasks. This phenomenon, called catastrophic forgetting, is a fundamental challenge to overcome before neural networks can learn continually from incoming data. In this work, we derive inspiration from human memory to develop an architecture capable of learning continuously from sequentially incoming tasks, while averting catastrophic forgetting. Specifically, our contributions are: (i) a dual memory architecture emulating the complementary learning systems (hippocampus and the neocortex) in the human brain, (ii) memory consolidation via generative replay of past experiences, (iii) demonstrating advantages of generative replay and dual memories via experiments, and (iv) improved performance retention on challenging tasks even for low capacity models. Our architecture displays many characteristics of the mammalian memory and provides insights on the connection between sleep and learning. |
Tasks | Continual Learning |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10368v2 (PDF: http://arxiv.org/pdf/1710.10368v2.pdf) |
PWC | https://paperswithcode.com/paper/deep-generative-dual-memory-network-for |
Repo | |
Framework | |
Causal Inference through the Method of Direct Estimation
Title | Causal Inference through the Method of Direct Estimation |
Authors | Marc Ratkovic, Dustin Tingley |
Abstract | The intersection of causal inference and machine learning is a rapidly advancing field. We propose a new approach, the method of direct estimation, that draws on both traditions in order to obtain nonparametric estimates of treatment effects. The approach focuses on estimating the effect of fluctuations in a treatment variable on an outcome. A tensor-spline implementation enables rich interactions between functional bases, allowing the approach to capture treatment/covariate interactions. We show how new innovations in Bayesian sparse modeling readily handle the proposed framework, and then document its performance in simulation and applied examples. Furthermore, we show how the method of direct estimation can easily extend to structural estimators commonly used in a variety of disciplines, like instrumental variables, mediation analysis, and sequential g-estimation. |
Tasks | Causal Inference |
Published | 2017-03-16 |
URL | http://arxiv.org/abs/1703.05849v2 (PDF: http://arxiv.org/pdf/1703.05849v2.pdf) |
PWC | https://paperswithcode.com/paper/causal-inference-through-the-method-of-direct |
Repo | |
Framework | |
Causal Inference on Multivariate and Mixed-Type Data
Title | Causal Inference on Multivariate and Mixed-Type Data |
Authors | Alexander Marx, Jilles Vreeken |
Abstract | Given data over the joint distribution of two random variables $X$ and $Y$, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. In particular, we consider the general case where both $X$ and $Y$ may be univariate or multivariate, and of the same or mixed data types. We take an information theoretic approach, based on Kolmogorov complexity, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. The ideal score is not computable, but can be approximated through the Minimum Description Length (MDL) principle. Based on MDL, we propose two scores, one for when both $X$ and $Y$ are of the same single data type, and one for when they are mixed-type. We model dependencies between $X$ and $Y$ using classification and regression trees. As inferring the optimal model is NP-hard, we propose Crack, a fast greedy algorithm to determine the most likely causal direction directly from the data. Empirical evaluation on a wide range of data shows that Crack reliably, and with high accuracy, infers the correct causal direction on both univariate and multivariate cause-effect pairs over both single and mixed-type data. |
Tasks | Causal Inference |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06385v2 (PDF: http://arxiv.org/pdf/1702.06385v2.pdf) |
PWC | https://paperswithcode.com/paper/causal-inference-on-multivariate-and-mixed |
Repo | |
Framework | |
Simple rules for complex decisions
Title | Simple rules for complex decisions |
Authors | Jongbin Jung, Connor Concannon, Ravi Shroff, Sharad Goel, Daniel G. Goldstein |
Abstract | From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do. |
Tasks | Causal Inference, Decision Making |
Published | 2017-02-15 |
URL | http://arxiv.org/abs/1702.04690v3 (PDF: http://arxiv.org/pdf/1702.04690v3.pdf) |
PWC | https://paperswithcode.com/paper/simple-rules-for-complex-decisions |
Repo | |
Framework | |
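The abstract of "Simple rules for complex decisions" names the select-regress-and-round method but does not spell out its steps. The sketch below is one plausible reading, not the authors' implementation. It assumes that "select" keeps the `k` features most correlated with the outcome, that "regress" fits a logistic regression on standardized features by plain gradient descent, and that "round" rescales the coefficients to the range [-M, M] and rounds them to integers. The function name `select_regress_round` and the parameters `k`, `M`, `steps`, and `lr` are hypothetical.

```python
import math

def select_regress_round(X, y, k=2, M=3, steps=2000, lr=0.1):
    """Return {feature_index: integer_weight} for a weighted checklist.

    Assumed reading of the three steps (not taken from the abstract):
    select top-k features, fit logistic regression, rescale and round.
    """
    n, d = len(X), len(X[0])
    # Standardize each feature so coefficients are comparable.
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

    # Select: keep the k features with the largest |covariance with y|.
    ybar = sum(y) / n
    def score(j):
        return abs(sum(Z[i][j] * (y[i] - ybar) for i in range(n)))
    selected = sorted(sorted(range(d), key=score, reverse=True)[:k])

    # Regress: logistic regression by batch gradient descent.
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * k, 0.0
        for i in range(n):
            z = b + sum(w[a] * Z[i][j] for a, j in enumerate(selected))
            err = 1.0 / (1.0 + math.exp(-z)) - y[i]
            gb += err
            for a, j in enumerate(selected):
                gw[a] += err * Z[i][j]
        b -= lr * gb / n
        w = [w[a] - lr * gw[a] / n for a in range(k)]

    # Round: rescale so the largest |coefficient| equals M, then round.
    wmax = max(abs(v) for v in w) or 1.0
    return {j: round(M * w[a] / wmax) for a, j in enumerate(selected)}

# Toy data (made up for illustration): three features, binary outcome.
X = [[0, 1, 5], [1, 0, 4], [2, 1, 5], [3, 0, 3],
     [4, 1, 2], [5, 0, 4], [6, 1, 1], [7, 0, 2]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
rules = select_regress_round(X, y, k=2, M=3)
```

The resulting dictionary can be read as a checklist: multiply each selected (standardized) feature by its small integer weight, add the results, and compare the total with a threshold chosen by the practitioner.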
On the Equivalence of Holographic and Complex Embeddings for Link Prediction
Title | On the Equivalence of Holographic and Complex Embeddings for Link Prediction |
Authors | Katsuhiko Hayashi, Masashi Shimbo |
Abstract | We show the equivalence of two state-of-the-art link prediction/knowledge graph completion methods: Nickel et al.’s holographic embedding and Trouillon et al.’s complex embedding. We first consider a spectral version of the holographic embedding, exploiting the frequency domain in the Fourier transform for efficient computation. The analysis of the resulting method reveals that it can be viewed as an instance of the complex embedding with certain constraints cast on the initial vectors upon training. Conversely, any complex embedding can be converted to an equivalent holographic embedding. |
Tasks | Knowledge Graph Completion, Link Prediction |
Published | 2017-02-18 |
URL | http://arxiv.org/abs/1702.05563v3 (PDF: http://arxiv.org/pdf/1702.05563v3.pdf) |
PWC | https://paperswithcode.com/paper/on-the-equivalence-of-holographic-and-complex |
Repo | |
Framework | |
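The link between the two scoring functions in "On the Equivalence of Holographic and Complex Embeddings for Link Prediction" can be checked numerically. The sketch below assumes the usual textbook forms, which the abstract itself does not state: a holographic score r · (h ⋆ t), where ⋆ is circular correlation, and a complex-embedding-style score Re(Σ_j R_j H_j conj(T_j)). It uses the standard identity that the discrete Fourier transform of h ⋆ t equals conj(F(h)) ⊙ F(t) for real vectors. All function names are hypothetical, and the naive O(n²) DFT is used only to keep the example dependency-free.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def circular_correlation(a, b):
    """[a * b]_k = sum_i a_i * b_{(i + k) mod n}."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]

def holographic_score(r, h, t):
    """Dot product of the relation vector with the correlation of h and t."""
    c = circular_correlation(h, t)
    return sum(rk * ck for rk, ck in zip(r, c))

def spectral_score(r, h, t):
    """Same score in the frequency domain: a trilinear product of complex
    vectors, with the conjugate taken on the tail entity."""
    R, H, T = dft(r), dft(h), dft(t)
    n = len(r)
    return sum((Rj * Hj * Tj.conjugate()).real
               for Rj, Hj, Tj in zip(R, H, T)) / n

r, h, t = [0.5, -1.0, 2.0, 0.25], [1.0, 2.0, 3.0, 4.0], [-1.0, 0.5, 0.0, 2.0]
difference = abs(holographic_score(r, h, t) - spectral_score(r, h, t))
```

Here `difference` is zero up to floating-point error. The DFT of a real vector is conjugate-symmetric, which is presumably the kind of constraint on the complex vectors that the abstract refers to.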
Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
Title | Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks |
Authors | Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong |
Abstract | We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. Empirical tests in benchmark datasets show that our algorithm outperforms state-of-the-art optimization methods in both accuracy and convergence speed. The improvements are consistent across different architectures, and can be combined with other training techniques and regularization methods. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07190v1 (PDF: http://arxiv.org/pdf/1711.07190v1.pdf) |
PWC | https://paperswithcode.com/paper/block-cyclic-stochastic-coordinate-descent |
Repo | |
Framework | |
Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths
Title | Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths |
Authors | Yanan Li, Donghui Wang, Huanhang Hu, Yuetan Lin, Yueting Zhuang |
Abstract | Zero-shot recognition aims to accurately recognize objects of unseen classes by using a shared visual-semantic mapping between the image feature space and the semantic embedding space. This mapping is learned on training data of seen classes and is expected to have transfer ability to unseen classes. In this paper, we tackle this problem by exploiting the intrinsic relationship between the semantic space manifold and the transfer ability of visual-semantic mapping. We formalize their connection and cast zero-shot recognition as a joint optimization problem. Motivated by this, we propose a novel framework for zero-shot recognition, which contains dual visual-semantic mapping paths. Our analysis shows this framework can not only apply prior semantic knowledge to infer the underlying semantic manifold in the image feature space, but also generate an optimized semantic embedding space, which can enhance the transfer ability of the visual-semantic mapping to unseen classes. The proposed method is evaluated for zero-shot recognition on four benchmark datasets, achieving outstanding results. |
Tasks | Zero-Shot Learning |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05002v2 (PDF: http://arxiv.org/pdf/1703.05002v2.pdf) |
PWC | https://paperswithcode.com/paper/zero-shot-recognition-using-dual-visual |
Repo | |
Framework | |
Spectrum Monitoring for Radar Bands using Deep Convolutional Neural Networks
Title | Spectrum Monitoring for Radar Bands using Deep Convolutional Neural Networks |
Authors | Ahmed Selim, Francisco Paisana, Jerome A. Arokkiam, Yi Zhang, Linda Doyle, Luiz A. DaSilva |
Abstract | In this paper, we present a spectrum monitoring framework for the detection of radar signals in spectrum sharing scenarios. The core of our framework is a deep convolutional neural network (CNN) model that enables Measurement Capable Devices to identify the presence of radar signals in the radio spectrum, even when these signals are overlapped with other sources of interference, such as commercial LTE and WLAN. We collected a large dataset of RF measurements, which include the transmissions of multiple radar pulse waveforms, downlink LTE, WLAN, and thermal noise. We propose a pre-processing data representation that leverages the amplitude and phase shifts of the collected samples. This representation allows our CNN model to achieve a classification accuracy of 99.6% on our testing dataset. The trained CNN model is then tested under various SNR values, outperforming other models, such as spectrogram-based CNN models. |
Tasks | |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00462v1 (PDF: http://arxiv.org/pdf/1705.00462v1.pdf) |
PWC | https://paperswithcode.com/paper/spectrum-monitoring-for-radar-bands-using |
Repo | |
Framework | |
Faster Discovery of Faster System Configurations with Spectral Learning
Title | Faster Discovery of Faster System Configurations with Spectral Learning |
Authors | Vivek Nair, Tim Menzies, Norbert Siegmund, Sven Apel |
Abstract | Despite the huge spread and economic importance of configurable software systems, there is unsatisfactory support in utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) large variances in their predictions. Both these problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system, to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors - less than 10% prediction error, with a standard deviation of less than 2%. When compared to the state of the art, WHAT (a) requires 2 to 10 times fewer samples to achieve similar prediction accuracies, and (b) its predictions are more stable (i.e., have lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance. |
Tasks | Dimensionality Reduction |
Published | 2017-01-27 |
URL | http://arxiv.org/abs/1701.08106v2 (PDF: http://arxiv.org/pdf/1701.08106v2.pdf) |
PWC | https://paperswithcode.com/paper/faster-discovery-of-faster-system |
Repo | |
Framework | |
Lifelong Metric Learning
Title | Lifelong Metric Learning |
Authors | Gan Sun, Yang Cong, Ji Liu, Xiaowei Xu |
Abstract | The state-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider the lifelong learning problem to mimic “human learning”, i.e., endowing a new capability to the learned metric for a new task from new online samples and incorporating previous experiences and knowledge. Therefore, we propose a new metric learning framework: lifelong metric learning (LML), which only utilizes the data of the new task to train the metric model while preserving the original capabilities. More specifically, the proposed LML maintains a common subspace for all learned metrics, named lifelong dictionary, transfers knowledge from the common subspace to each new metric task with task-specific idiosyncrasy, and redefines the common subspace over time to maximize performance across all metric tasks. For model optimization, we apply an online passive-aggressive optimization algorithm to solve the proposed LML framework, where the lifelong dictionary and task-specific partition are optimized alternately and consecutively. Finally, we evaluate our approach by analyzing several multi-task metric learning datasets. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed framework. |
Tasks | Metric Learning |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01209v2 (PDF: http://arxiv.org/pdf/1705.01209v2.pdf) |
PWC | https://paperswithcode.com/paper/lifelong-metric-learning |
Repo | |
Framework | |
Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images
Title | Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images |
Authors | Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang |
Abstract | We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.05952v1 (PDF: http://arxiv.org/pdf/1706.05952v1.pdf) |
PWC | https://paperswithcode.com/paper/bayesian-joint-modelling-for-object |
Repo | |
Framework | |
Online Convex Optimization with Unconstrained Domains and Losses
Title | Online Convex Optimization with Unconstrained Domains and Losses |
Authors | Ashok Cutkosky, Kwabena Boahen |
Abstract | We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions. We prove a lower bound showing an exponential separation between the regret of existing algorithms that require a known bound on the loss functions and any algorithm that does not require such knowledge. RescaledExp matches this lower bound asymptotically in the number of iterations. RescaledExp is naturally hyperparameter-free and we demonstrate empirically that it matches prior optimization algorithms that require hyperparameter optimization. |
Tasks | Hyperparameter Optimization |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02622v1 (PDF: http://arxiv.org/pdf/1703.02622v1.pdf) |
PWC | https://paperswithcode.com/paper/online-convex-optimization-with-unconstrained |
Repo | |
Framework | |
On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example
Title | On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example |
Authors | Yen-Chi Chen, Y. Samuel Wang, Elena A. Erosheva |
Abstract | Variational inference is a general approach for approximating complex density functions, such as those arising in latent variable models, popular in machine learning. It has been applied to approximate the maximum likelihood estimator and to carry out Bayesian inference; however, quantification of uncertainty with variational inference remains challenging from both theoretical and practical perspectives. This paper is concerned with developing uncertainty measures for variational inference by using bootstrap procedures. We first develop two general bootstrap approaches for assessing the uncertainty of a variational estimate and then study the underlying bootstrap theory in both fixed- and increasing-dimension settings. We then use the bootstrap approach and our theoretical results in the context of mixed membership modeling with multivariate binary data on functional disability from the National Long Term Care Survey. We carry out a two-sample approach to test for changes in the repeated measures of functional disability for the subset of individuals present in the 1989 and 1994 waves. |
Tasks | Bayesian Inference, Latent Variable Models |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.11057v2 (PDF: http://arxiv.org/pdf/1711.11057v2.pdf) |
PWC | https://paperswithcode.com/paper/on-the-use-of-bootstrap-with-variational |
Repo | |
Framework | |
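The abstract of "On the use of bootstrap with variational inference" does not specify its two bootstrap approaches, so the following is only a generic illustration of the underlying idea: refit an estimator on resampled data and use the spread of the refitted estimates as an uncertainty measure. It shows a plain nonparametric bootstrap standard error. The `estimator` argument stands in for whatever procedure produces the variational estimate, and the function name `bootstrap_se` and parameters `n_boot` and `seed` are hypothetical.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of `estimator` on `data`.

    `estimator` is any function mapping a list of observations to a number;
    here it is a placeholder for a (much costlier) variational fit.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # The spread of the re-estimates approximates the sampling variability.
    return statistics.pstdev(estimates)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
se = bootstrap_se(sample, statistics.mean)
```

In practice each call to `estimator` would rerun the variational optimization, so `n_boot` trades accuracy of the uncertainty estimate against computation.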
Using Context Events in Neural Network Models for Event Temporal Status Identification
Title | Using Context Events in Neural Network Models for Event Temporal Status Identification |
Authors | Zeyu Dai, Wenlin Yao, Ruihong Huang |
Abstract | Focusing on the task of identifying event temporal status, we find that events directly or indirectly governing the target event in a dependency tree are the most important contexts. Therefore, we extract dependency chains containing context events and use them as input in neural network models, which consistently outperform previous models using local context words as input. Visualization verifies that the dependency chain representation can effectively capture the context events which are closely related to the target event and play key roles in predicting event temporal status. |
Tasks | |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04344v1 (PDF: http://arxiv.org/pdf/1710.04344v1.pdf) |
PWC | https://paperswithcode.com/paper/using-context-events-in-neural-network-models |
Repo | |
Framework | |
Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images
Title | Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images |
Authors | Babak Ehteshami Bejnordi, Jimmy Linz, Ben Glass, Maeve Mullooly, Gretchen L Gierach, Mark E Sherman, Nico Karssemeijer, Jeroen van der Laak, Andrew H Beck |
Abstract | Diagnosis of breast carcinomas has so far been limited to the morphological interpretation of epithelial cells and the assessment of epithelial tissue architecture. Consequently, most of the automated systems have focused on characterizing the epithelial regions of the breast to detect cancer. In this paper, we propose a system for classification of hematoxylin and eosin (H&E) stained breast specimens based on convolutional neural networks that primarily targets the assessment of tumor-associated stroma to diagnose breast cancer patients. We evaluate the performance of our proposed system using a large cohort containing 646 breast tissue biopsies. Our evaluations show that the proposed system achieves an area under ROC of 0.92, demonstrating the discriminative power of previously neglected tumor-associated stroma as a diagnostic biomarker. |
Tasks | |
Published | 2017-02-19 |
URL | http://arxiv.org/abs/1702.05803v1 (PDF: http://arxiv.org/pdf/1702.05803v1.pdf) |
PWC | https://paperswithcode.com/paper/deep-learning-based-assessment-of-tumor |
Repo | |
Framework | |