Paper Group ANR 330
Majorization Minimization Methods to Distributed Pose Graph Optimization with Convergence Guarantees. A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs. Missing Data Imputation for Classification Problems. Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models. Dynamic Energy Dispatch in Isolated Microgrids Based on Deep Reinforcement Learning. An Overview of Perception and Decision-Making in Autonomous Systems in the Era of Learning. Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data. Quantum Private Information Retrieval from Coded and Colluding Servers. Reinforcement Learning through Active Inference. Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network. Amortised Learning by Wake-Sleep. An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation. Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks. Counterfactual fairness: removing direct effects through regularization. Neural Pose Transfer by Spatially Adaptive Instance Normalization.
Majorization Minimization Methods to Distributed Pose Graph Optimization with Convergence Guarantees
Title | Majorization Minimization Methods to Distributed Pose Graph Optimization with Convergence Guarantees |
Authors | Taosha Fan, Todd Murphey |
Abstract | In this paper, we consider the problem of distributed pose graph optimization (PGO) that has extensive applications in multi-robot simultaneous localization and mapping (SLAM). We propose majorization minimization methods to distributed PGO and show that our proposed methods are guaranteed to converge to first-order critical points under mild conditions. Furthermore, since our proposed methods rely on a proximal operator of distributed PGO, the convergence rate can be significantly accelerated with Nesterov’s method, and more importantly, the acceleration induces no compromise of theoretical guarantees. In addition, we also present accelerated majorization minimization methods to the distributed chordal initialization that have a quadratic convergence, which can be used to compute an initial guess for distributed PGO. The efficacy of this work is validated through applications on a number of 2D and 3D SLAM datasets and comparisons with existing state-of-the-art methods, which indicates that our proposed methods have faster convergence and result in better solutions to distributed PGO. |
Tasks | Simultaneous Localization and Mapping |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05353v1 |
https://arxiv.org/pdf/2003.05353v1.pdf | |
PWC | https://paperswithcode.com/paper/majorization-minimization-methods-to |
Repo | |
Framework | |
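The core loop in this family of methods is easy to see in miniature: build a surrogate that majorizes the cost at the current iterate, minimize the surrogate, and optionally accelerate the resulting fixed-point iteration with Nesterov momentum. The sketch below applies that template to a toy quadratic cost rather than the paper's distributed PGO objective; the surrogate, step size, and cost are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def mm_nesterov(grad, L, x0, iters=100):
    """Majorization minimization with a quadratic surrogate
    g(x | x_k) = f(x_k) + <grad f(x_k), x - x_k> + (L/2) ||x - x_k||^2,
    whose minimizer is the gradient step x_k - grad f(x_k) / L.
    Nesterov momentum accelerates the resulting fixed-point iteration."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L                        # minimize surrogate at y
        t_next = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_next + (t - 1) / t_next * (x_next - x)    # momentum step
        x, t = x_next, t_next
    return x

# Toy quadratic cost f(x) = 0.5 x^T A x - b^T x (Lipschitz constant ||A||_2).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = mm_nesterov(lambda x: A @ x - b, L=np.linalg.norm(A, 2), x0=np.zeros(2))
print(x_star, np.linalg.solve(A, b))  # the two should agree
```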
A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs
Title | A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs |
Authors | Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Robert Jenssen |
Abstract | A large fraction of the electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient’s health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel which is capable of exploiting both the information from the observed values as well as the information hidden in the missing patterns in multivariate time series (MTS) originating e.g. from EHRs. The kernel, called TCK$_{IM}$, is designed using an ensemble learning strategy in which the base models are novel mixed mode Bayesian mixture models which can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters and therefore TCK$_{IM}$ is particularly well suited if there is a lack of labels - a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel. |
Tasks | Imputation, Time Series |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12359v1 |
https://arxiv.org/pdf/2002.12359v1.pdf | |
PWC | https://paperswithcode.com/paper/a-kernel-to-exploit-informative-missingness |
Repo | |
Framework | |
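To make the ensemble idea concrete, here is a heavily simplified sketch: each base model clusters the series using both their (mean-imputed) values and their binary missingness masks, and the kernel counts how often two series land in the same cluster. A scikit-learn `GaussianMixture` stands in for the paper's mixed-mode Bayesian mixture models, so this is an assumption-laden toy, not TCK$_{IM}$ itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ensemble_missingness_kernel(X, mask, n_models=20, rng=0):
    """Toy ensemble kernel: each base model clusters series represented by
    mean-imputed values concatenated with the binary missingness mask, so
    the missing patterns themselves inform the clustering.  The kernel is
    the fraction of base models that co-assign two series."""
    rs = np.random.RandomState(rng)
    flat = np.nan_to_num(X).reshape(len(X), -1)
    feats = np.hstack([flat, mask.reshape(len(X), -1)])  # values + mask
    K = np.zeros((len(X), len(X)))
    for _ in range(n_models):
        gm = GaussianMixture(n_components=rs.randint(2, 6),
                             random_state=rs.randint(10**6)).fit(feats)
        z = gm.predict(feats)
        K += (z[:, None] == z[None, :])                  # co-assignment indicator
    return K / n_models

# 50 series, 10 time steps, 3 variables, ~30% missing.
X = np.random.randn(50, 10, 3)
mask = np.random.rand(*X.shape) < 0.3
X[mask] = np.nan
K = ensemble_missingness_kernel(X, mask.astype(float))
print(K.shape, K[0, 0])  # (50, 50); diagonal equals 1
```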
Missing Data Imputation for Classification Problems
Title | Missing Data Imputation for Classification Problems |
Authors | Arkopal Choudhury, Michael R. Kosorok |
Abstract | Imputation of missing data is a common application in various classification problems where the feature training matrix has missingness. A widely used solution to this imputation problem is based on a lazy learning technique, the $k$-nearest neighbor (kNN) approach. However, most of the previous work on missing data does not take into account the presence of the class label in the classification problem. Also, existing kNN imputation methods use variants of Minkowski distance as a measure of distance, which does not work well with heterogeneous data. In this paper, we propose a novel iterative kNN imputation technique based on class-weighted grey distance between the missing datum and all the training data. Grey distance works well in heterogeneous data with missing instances. The distance is weighted by mutual information (MI), a measure of feature relevance between the features and the class label. This ensures that the imputation of the training data is directed towards improving classification performance. This class-weighted grey kNN imputation algorithm demonstrates improved performance when compared to other kNN imputation algorithms, as well as standard imputation algorithms such as MICE and missForest, in imputation and classification problems. These problems are based on simulated scenarios and UCI datasets with various rates of missingness. |
Tasks | Imputation |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10709v1 |
https://arxiv.org/pdf/2002.10709v1.pdf | |
PWC | https://paperswithcode.com/paper/missing-data-imputation-for-classification |
Repo | |
Framework | |
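A minimal sketch of the main ingredients follows: grey relational coefficients over the observed features, weighted by the mutual information between each feature and the class label, rank candidate neighbours for imputation. It is a single-pass simplification (the paper's method iterates), and it assumes some fully observed rows and at least one observed feature per incomplete row.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def grey_knn_impute(X, y, k=5, rho=0.5):
    """Toy kNN imputation with an MI-weighted grey distance.  For each
    observed feature j of an incomplete row x0, the grey relational
    coefficient to a complete row xi is
        GRC_j = (d_min + rho * d_max) / (|x0_j - xi_j| + rho * d_max),
    and rows are ranked by the MI-weighted mean of the GRCs (larger =
    closer).  Feature weights are mutual information with the label."""
    X = X.copy()
    w = mutual_info_classif(np.nan_to_num(X), y) + 1e-12
    complete = np.where(~np.isnan(X).any(axis=1))[0]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs, miss = ~np.isnan(X[i]), np.isnan(X[i])
        d = np.abs(X[complete][:, obs] - X[i, obs])
        grc = (d.min() + rho * d.max()) / (d + rho * d.max())
        grade = grc @ w[obs] / w[obs].sum()    # class-weighted grey grade
        nn = complete[np.argsort(-grade)[:k]]  # k nearest by grey grade
        X[i, miss] = X[nn][:, miss].mean(axis=0)
    return X
```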
Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models
Title | Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models |
Authors | Wangchunshu Zhou, Ke Xu |
Abstract | Automated evaluation of open domain natural language generation (NLG) models remains a challenge and widely used metrics such as BLEU and Perplexity can be misleading in some cases. In our paper, we propose to evaluate natural language generation models by learning to compare a pair of generated sentences by fine-tuning BERT, which has been shown to have good natural language understanding ability. We also propose to evaluate the model-level quality of NLG models by aggregating sample-level comparison results with a skill rating system. While it can be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotation to better imitate human judgment. In addition to evaluating trained models, we propose to apply our model as a performance indicator during training for better hyperparameter tuning and early-stopping. We evaluate our approach on both story generation and chit-chat dialogue response generation. Experimental results show that our model correlates better with human preference compared with previous automated evaluation approaches. Training with the proposed metric yields better performance in human evaluation, which further demonstrates the effectiveness of the proposed model. |
Tasks | Text Generation |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05058v1 |
https://arxiv.org/pdf/2002.05058v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-compare-for-better-training-and |
Repo | |
Framework | |
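The skill rating component is essentially an Elo-style tournament driven by the learned comparator. The sketch below shows only that aggregation step; `compare` is a hypothetical wrapper around the pairwise model (the BERT fine-tuning itself is not reproduced here), and the win probabilities in the demo are invented.

```python
import random

def skill_ratings(models, compare, rounds=2000, k=16, base=1000.0):
    """Toy Elo-style skill rating over NLG models.  `compare(a, b)` is
    assumed to wrap the learned pairwise comparator (e.g. a fine-tuned
    BERT) and return 1.0 if a sample from model `a` is judged better,
    0.0 if worse, 0.5 for a tie.  Each round samples a pair, queries the
    comparator, and applies the standard Elo update."""
    rating = {m: base for m in models}
    for _ in range(rounds):
        a, b = random.sample(models, 2)
        s = compare(a, b)                                      # observed score
        e = 1.0 / (1 + 10 ** ((rating[b] - rating[a]) / 400))  # expected score
        rating[a] += k * (s - e)
        rating[b] += k * (e - s)
    return rating

# Hypothetical comparator: "gpt-large" wins ~70% of pairings.
truth = {"gpt-large": 0.7, "gpt-small": 0.3}
cmp = lambda a, b: float(random.random() < truth[a] / (truth[a] + truth[b]))
print(skill_ratings(list(truth), cmp))
```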
Dynamic Energy Dispatch in Isolated Microgrids Based on Deep Reinforcement Learning
Title | Dynamic Energy Dispatch in Isolated Microgrids Based on Deep Reinforcement Learning |
Authors | Lei Lei, Yue Tan, Glenn Dahlenburg, Wei Xiang, Kan Zheng |
Abstract | This paper focuses on deep reinforcement learning (DRL)-based energy dispatch for isolated microgrids (MGs) with diesel generators (DGs), photovoltaic (PV) panels, and a battery. A finite-horizon Partially Observable Markov Decision Process (POMDP) model is formulated and solved by learning from historical data to capture the uncertainty in future electricity consumption and renewable power generation. In order to deal with the instability problem of DRL algorithms and the unique characteristics of finite-horizon models, two novel DRL algorithms, namely FH-DDPG and FH-RDPG, are proposed to derive energy dispatch policies with and without fully observable state information. A case study using real isolated microgrid data is performed, where the performance of the proposed algorithms is compared with the myopic algorithm as well as other baseline DRL algorithms. Moreover, the impact of uncertainties on MG performance is decoupled into two levels and evaluated respectively. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02581v1 |
https://arxiv.org/pdf/2002.02581v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-energy-dispatch-in-isolated |
Repo | |
Framework | |
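To ground the setup, a toy version of the dispatch environment can be written in a few lines: the agent chooses diesel output, the battery buffers the mismatch between generation and demand, and the reward trades fuel cost against unserved energy. All constants below are invented, the state is fully observable (the paper's FH-RDPG variant handles the partially observable case), and none of the FH-DDPG/FH-RDPG machinery is shown.

```python
import numpy as np

class ToyMicrogrid:
    """Minimal finite-horizon microgrid sketch (all constants invented).
    The agent picks diesel output; the battery absorbs or supplies the
    residual between generation (diesel + PV) and load; the reward is
    negative fuel cost minus a penalty for unserved demand."""

    def __init__(self, horizon=24, capacity=100.0):
        self.h, self.cap = horizon, capacity

    def reset(self):
        self.t, self.soc = 0, 0.5 * self.cap
        return np.array([self.t, self.soc])

    def step(self, diesel_kw):
        pv = max(0.0, 40 * np.sin(np.pi * self.t / self.h))  # daytime PV bump
        load = 50 + 10 * np.random.randn()                   # noisy demand
        soc_raw = self.soc + diesel_kw + pv - load           # battery buffers
        unserved = max(0.0, -soc_raw)                        # demand not met
        self.soc = float(np.clip(soc_raw, 0.0, self.cap))
        reward = -(0.3 * diesel_kw + 5.0 * unserved)         # fuel + penalty
        self.t += 1
        return np.array([self.t, self.soc]), reward, self.t >= self.h

env, total, done = ToyMicrogrid(), 0.0, False
s = env.reset()
while not done:
    s, r, done = env.step(diesel_kw=30.0)   # constant policy as a baseline
    total += r
print(total)
```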
An Overview of Perception and Decision-Making in Autonomous Systems in the Era of Learning
Title | An Overview of Perception and Decision-Making in Autonomous Systems in the Era of Learning |
Authors | Yang Tang, Chaoqiang Zhao, Jianrui Wang, Chongzhen Zhang, Qiyu Sun, Weixing Zheng, Wenli Du, Feng Qian, Juergen Kurths |
Abstract | Autonomous systems possess the features of inferring their own ego-motion, autonomously understanding their surroundings, and planning trajectories. With the applications of deep learning and reinforcement learning, the perception and decision-making abilities of autonomous systems are being efficiently addressed, and many new learning-based algorithms have surfaced with respect to autonomous perception and decision-making. In this review, we focus on the applications of learning-based approaches in perception and decision-making in autonomous systems, which is different from previous reviews that discussed traditional methods. First, we delineate the existing classical simultaneous localization and mapping (SLAM) solutions and review the environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional SLAM frameworks. Second, we briefly summarize the existing motion planning techniques, such as path planning and trajectory planning methods, and discuss the navigation methods based on reinforcement learning. Finally, we examine several challenges and promising directions for future work discussed in related research, spanning computer science, automatic control, and robotics. |
Tasks | Decision Making, Depth Estimation, Image Enhancement, Monocular Depth Estimation, Motion Planning, motion prediction, Object Detection, Semantic Segmentation, Simultaneous Localization and Mapping |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02319v3 |
https://arxiv.org/pdf/2001.02319v3.pdf | |
PWC | https://paperswithcode.com/paper/perception-and-decision-making-of-autonomous |
Repo | |
Framework | |
Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data
Title | Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data |
Authors | Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet |
Abstract | Machine learning using behavioral and text data can result in highly accurate prediction models, but these are often very difficult to interpret. Linear models require investigating thousands of coefficients, while the opaqueness of nonlinear models makes things even worse. Rule-extraction techniques have been proposed to combine the desired predictive behaviour of complex “black-box” models with explainability. However, rule-extraction in the context of ultra-high-dimensional and sparse data can be challenging, and has thus far received scant attention. Because of the sparsity and massive dimensionality, rule-extraction might fail in its primary explainability goal, as the black-box model may need to be replaced by many rules, leaving the user again with an incomprehensible model. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse “metafeatures”. We empirically validate the quality of the rules in terms of fidelity, explanation stability and accuracy over a collection of data sets, and benchmark their performance against rules extracted using the original features. Our analysis points to key trade-offs between explainability, fidelity, accuracy, and stability that Machine Learning researchers and practitioners need to consider. Results indicate that the proposed metafeatures approach leads to better trade-offs between these, and is better able to mimic the black-box model. Using metafeatures instead of the original fine-grained features decreases the loss in fidelity, accuracy, and stability by 18.08%, 20.15% and 17.73% respectively, all statistically significant at a 5% significance level. Metafeatures thus improve a key “cost of explainability”, which we define as the loss in fidelity when replacing a black-box with an explainable model. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04792v1 |
https://arxiv.org/pdf/2003.04792v1.pdf | |
PWC | https://paperswithcode.com/paper/metafeatures-based-rule-extraction-for |
Repo | |
Framework | |
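The pipeline itself is straightforward to sketch: fit a black box on the sparse fine-grained features, build less-sparse metafeatures, fit a small rule model on the metafeatures to mimic the black-box predictions, and measure fidelity as agreement. Below, a truncated SVD stands in for the paper's metafeature construction and a shallow decision tree for the rule learner; both are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = (rng.rand(500, 2000) < 0.01).astype(float)   # sparse behavioral data
y = (X[:, :50].sum(axis=1) > X[:, :50].sum(axis=1).mean()).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
yhat = black_box.predict(X)                      # labels to be mimicked

# Metafeatures: low-rank components as a stand-in for the paper's
# higher-level, less-sparse feature groupings.
meta = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)
rules = DecisionTreeClassifier(max_depth=3, random_state=0).fit(meta, yhat)

fidelity = (rules.predict(meta) == yhat).mean()  # agreement with black box
print(f"fidelity of extracted rules: {fidelity:.2f}")
```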
Quantum Private Information Retrieval from Coded and Colluding Servers
Title | Quantum Private Information Retrieval from Coded and Colluding Servers |
Authors | Matteo Allaix, Lukas Holzbaur, Tefjol Pllaha, Camilla Hollanti |
Abstract | In the classical private information retrieval (PIR) setup, a user wants to retrieve a file from a database or a distributed storage system (DSS) without revealing the file identity to the servers holding the data. In the quantum PIR (QPIR) setting, a user privately retrieves a classical file by downloading quantum systems from the servers. The QPIR problem has been treated by Song et al. in the case of replicated servers, both without collusion and with all but one server colluding. In this paper, the QPIR setting is extended to account for MDS coded servers. The proposed protocol works for any $[n,k]$-MDS code and $t$-collusion with $t=n-k$. Similarly to the previous cases, the rates achieved are better than those known or conjectured in the classical counterparts. It is also demonstrated how the retrieval rates can be significantly improved by using locally repairable codes (LRCs) consisting of disjoint repair groups, each of which is an MDS code. Finally, numerical results based on an implementation on an IBM quantum computer are presented, showing that even the current stability of quantum computers gives satisfactory results. |
Tasks | Information Retrieval |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05883v2 |
https://arxiv.org/pdf/2001.05883v2.pdf | |
PWC | https://paperswithcode.com/paper/quantum-private-information-retrieval-from |
Repo | |
Framework | |
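Only the storage side is easy to show compactly: the paper's protocol assumes a file encoded across $n$ servers with an $[n,k]$ MDS code, so any $k$ servers can rebuild it. The classical toy below shows Reed-Solomon (an MDS code) encoding and reconstruction over a small prime field; the quantum retrieval protocol itself is not sketched, and the field size and file contents are illustrative.

```python
# Toy of the coded-storage model only: a file stored on n servers with an
# [n, k] Reed-Solomon (MDS) code over GF(P), so any k servers suffice to
# rebuild it.  (Requires Python 3.8+ for pow(x, -1, P).)
P = 11  # toy prime field GF(11)

def interp_eval(xs, ys, x):
    """Evaluate the unique degree < len(xs) polynomial through the points
    (xs, ys) at x, over GF(P), via Lagrange interpolation."""
    total = 0
    for i in range(len(xs)):
        num = den = 1
        for j in range(len(xs)):
            if i != j:
                num = num * (x - xs[j]) % P
                den = den * (xs[i] - xs[j]) % P
        total = (total + ys[i] * num * pow(den, -1, P)) % P
    return total

# Shares are evaluations at points 1..n; the first k coincide with the
# file itself (systematic encoding), the rest are parity shares.
k, n = 2, 5
file_symbols = [3, 7]                                  # the stored file
shares = [interp_eval([1, 2], file_symbols, x) for x in range(1, n + 1)]
# Any k of the n shares reconstruct the file, e.g. servers 4 and 5:
rec = [interp_eval([4, 5], [shares[3], shares[4]], x) for x in (1, 2)]
print(shares, rec)  # rec == file_symbols
```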
Reinforcement Learning through Active Inference
Title | Reinforcement Learning through Active Inference |
Authors | Alexander Tschantz, Beren Millidge, Anil K. Seth, Christopher L. Buckley |
Abstract | The central tenet of reinforcement learning (RL) is that agents seek to maximize the sum of cumulative rewards. In contrast, active inference, an emerging framework within cognitive and computational neuroscience, proposes that agents act to maximize the evidence for a biased generative model. Here, we illustrate how ideas from active inference can augment traditional RL approaches by (i) furnishing an inherent balance of exploration and exploitation, and (ii) providing a more flexible conceptualization of reward. Inspired by active inference, we develop and implement a novel objective for decision making, which we term the free energy of the expected future. We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards. |
Tasks | Decision Making |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12636v1 |
https://arxiv.org/pdf/2002.12636v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-through-active |
Repo | |
Framework | |
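The exploration-exploitation balance comes from scoring actions with two terms: an extrinsic term pushing predicted observations toward preferred outcomes, and an epistemic term rewarding expected information gain about hidden states. The sketch below computes that decomposition for a two-state categorical toy; it illustrates the general active-inference objective the paper builds on, not the authors' exact free energy of the expected future.

```python
import numpy as np

def negative_expected_free_energy(q_s, likelihood, prior_pref):
    """Toy active-inference action score: extrinsic value (predicted
    observations match preferences) plus epistemic value (expected
    information gain about the hidden state).  All distributions are
    categorical; `likelihood[o, s] = p(o | s, action)`."""
    q_o = likelihood @ q_s                        # predicted observations
    extrinsic = q_o @ np.log(prior_pref + 1e-12)  # E_q[log p~(o)]
    post = likelihood * q_s                       # unnormalized q(s, o)
    post = post / post.sum(axis=1, keepdims=True) # posterior q(s | o)
    # Epistemic value: E_q(o)[ KL( q(s | o) || q(s) ) ]
    epistemic = sum(q_o[o] * np.sum(post[o] * np.log(post[o] / q_s + 1e-12))
                    for o in range(len(q_o)))
    return extrinsic + epistemic

q_s = np.array([0.5, 0.5])                 # uniform belief over 2 states
look = np.array([[0.9, 0.1], [0.1, 0.9]])  # informative observation
stay = np.array([[0.5, 0.5], [0.5, 0.5]])  # uninformative observation
pref = np.array([0.8, 0.2])                # prefer observation 0
print(negative_expected_free_energy(q_s, look, pref),
      negative_expected_free_energy(q_s, stay, pref))
```

With a uniform belief, both actions predict the same observations, so the extrinsic terms tie and the informative "look" action wins purely on epistemic value.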
Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network
Title | Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network |
Authors | Wonchul Son, Youngbin Kim, Wonseok Song, Youngsu Moon, Wonjun Hwang |
Abstract | There is a need for on-the-fly computation on very low-performance systems such as systems-on-chip (SoC) and embedded devices. This paper presents pacemaker knowledge distillation, which uses an intermediate ensemble teacher to make convolutional neural networks usable on such systems. For an on-the-fly system, we consider a student model using 1xN-shaped on-the-fly filters and a teacher model using normal NxN-shaped filters. We note three problems in training the student model that are caused by the on-the-fly filter: first, compression to an unavoidably thin model of the same depth; second, a large capacity and parameter-size gap, because only the horizontal receptive field can be selected, not the vertical one; third, performance instability and degradation under direct distillation. To solve these problems, we propose an intermediate teacher, named the pacemaker, for the on-the-fly student, so that the student can be trained from the pacemaker and the original teacher step by step. Experiments show that the proposed method yields significant accuracy improvements: on CIFAR-100, a 5.39% increase with WRN-40-4 over conventional knowledge distillation, which performs even worse than the baseline. We also resolve the training instability that occurs when conventional knowledge distillation is applied without the proposed method, reducing the deviation range with pacemaker knowledge distillation. |
Tasks | Model Compression |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.03944v1 |
https://arxiv.org/pdf/2003.03944v1.pdf | |
PWC | https://paperswithcode.com/paper/pacemaker-intermediate-teacher-knowledge |
Repo | |
Framework | |
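The training schedule reduces to two applications of ordinary softened-softmax distillation: teacher to pacemaker, then pacemaker to student. The sketch below shows that schedule with the standard Hinton-style loss; the optimizer settings, temperature, and mixing weight are assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard softened-softmax distillation loss: KL between
    temperature-scaled teacher and student distributions, mixed with
    ordinary cross-entropy on the true labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def train_stage(student, teacher, loader, epochs=1):
    """One distillation stage; the two-stage pacemaker schedule calls
    this twice, avoiding the direct teacher-student capacity gap."""
    opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad(); loss.backward(); opt.step()

# train_stage(pacemaker, teacher, loader)   # stage 1: teacher -> pacemaker
# train_stage(student, pacemaker, loader)   # stage 2: pacemaker -> student
```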
Amortised Learning by Wake-Sleep
Title | Amortised Learning by Wake-Sleep |
Authors | Li K. Wenliang, Theodore Moskovitz, Heishiro Kanagawa, Maneesh Sahani |
Abstract | Models that employ latent variables to capture structure in observed data lie at the heart of many current unsupervised learning algorithms, but exact maximum-likelihood learning for powerful and flexible latent-variable models is almost always intractable. Thus, state-of-the-art approaches either abandon the maximum-likelihood framework entirely, or else rely on a variety of variational approximations to the posterior distribution over the latents. Here, we propose an alternative approach that we call amortised learning. Rather than computing an approximation to the posterior over latents, we use a wake-sleep Monte-Carlo strategy to learn a function that directly estimates the maximum-likelihood parameter updates. Amortised learning is possible whenever samples of latents and observations can be simulated from the generative model, treating the model as a “black box”. We demonstrate its effectiveness on a wide range of complex models, including those with latents that are discrete or supported on non-Euclidean spaces. |
Tasks | Latent Variable Models |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09737v1 |
https://arxiv.org/pdf/2002.09737v1.pdf | |
PWC | https://paperswithcode.com/paper/amortised-learning-by-wake-sleep |
Repo | |
Framework | |
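The core trick can be shown on a toy Gaussian latent-variable model: in the "sleep" phase, simulate (latent, observation) pairs from the current model and regress the joint-likelihood gradient onto the observation; L2 regression recovers the conditional expectation, which equals the marginal maximum-likelihood gradient. In the "wake" phase, the fitted regressor supplies parameter updates on real data. The model and the linear regressor below are illustrative choices, not the paper's.

```python
import numpy as np

# Toy model: z ~ N(0, 1), x | z ~ N(theta + z, sigma^2).
# Joint gradient: d/dtheta log p(x, z; theta) = (x - theta - z) / sigma^2.
# Its conditional expectation given x is the ML gradient, and an L2
# regressor fitted on simulated (x, joint-gradient) pairs recovers it.
sigma, theta, theta_true = 1.0, 0.0, 2.0
rng = np.random.RandomState(0)
data = theta_true + rng.randn(2000) + sigma * rng.randn(2000)

for step in range(50):
    # "Sleep": simulate from the current model, fit the gradient regressor
    z = rng.randn(5000)
    x = theta + z + sigma * rng.randn(5000)
    g = (x - theta - z) / sigma**2
    A = np.vstack([x, np.ones_like(x)]).T            # linear features
    coef, *_ = np.linalg.lstsq(A, g, rcond=None)
    # "Wake": apply the amortised gradient estimate to real data
    grad_hat = np.mean(data * coef[0] + coef[1])
    theta += 0.5 * grad_hat
print(theta)  # converges near theta_true = 2.0
```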
An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation
Title | An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation |
Authors | Makoto Takamoto, Yusuke Morishita, Hitoshi Imaoka |
Abstract | Compressing deep neural network (DNN) models has become a very important and necessary technique for real-world applications, such as deploying those models on mobile devices. Knowledge distillation is one of the most popular methods for model compression, and many studies have been made on developing this technique. However, those studies mainly focused on classification problems, and very few attempts have been made on regression problems, although there are many applications of DNNs to regression problems. In this paper, we propose a new formalism of knowledge distillation for regression problems. First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in training samples using teacher model predictions. Second, we consider a multi-task network with two outputs: one estimates the training labels, which are in general contaminated by noise, and the other estimates the teacher model’s output, which is expected to correct the noisy labels following the memorization effect. By considering the multi-task network, training of the feature extraction of student models becomes more effective, and it allows us to obtain a better student model than one trained from scratch. We performed a comprehensive evaluation with one simple toy model, a sinusoidal function, and two open datasets: MPIIGaze and Multi-PIE. Our results show consistent improvement in accuracy regardless of the annotation error level in the datasets. |
Tasks | Model Compression |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12597v1 |
https://arxiv.org/pdf/2002.12597v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-method-of-training-small-models |
Repo | |
Framework | |
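Both ingredients fit in a short loss function: a label head whose squared error is masked wherever the label disagrees with the teacher by more than a margin (a stand-in threshold rule for the teacher outlier rejection loss), plus a second head that regresses onto the teacher's output. The margin, weighting, and head shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def teacher_outlier_rejection_loss(pred_label, pred_teacher, y, t_out, m=2.0):
    """Toy two-headed regression-distillation loss: one head fits the
    (possibly noisy) labels, the other mimics the teacher.  Label terms
    where the label disagrees with the teacher by more than `m` are
    masked out as suspected outliers; this threshold rule is a
    simplification of the paper's teacher outlier rejection loss."""
    keep = (torch.abs(y - t_out) < m).float()        # reject suspected outliers
    label_term = (keep * (pred_label - y) ** 2).sum() / keep.sum().clamp(min=1)
    mimic_term = F.mse_loss(pred_teacher, t_out)     # teacher-mimicking head
    return label_term + mimic_term

# Example: a noisy label set; the teacher has averaged out some noise.
y = torch.tensor([1.0, 2.0, 9.0])        # third label is an outlier
t = torch.tensor([1.1, 1.9, 2.2])        # teacher predictions
p1 = torch.tensor([1.0, 2.0, 2.0])       # label-head output
p2 = torch.tensor([1.0, 2.0, 2.0])       # teacher-head output
print(teacher_outlier_rejection_loss(p1, p2, y, t))
```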
Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks
Title | Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks |
Authors | Sai Aparna Aketi, Sourjya Roy, Anand Raghunathan, Kaushik Roy |
Abstract | The enormous inference cost of deep neural networks can be scaled down by network compression. Pruning is one of the predominant approaches used for deep network compression. However, existing pruning techniques have one or more of the following limitations: 1) Additional energy cost on top of the compute-heavy training stage due to pruning and fine-tuning stages, 2) Layer-wise pruning based on the statistics of a particular layer, ignoring the effect of error propagation in the network, 3) Lack of an efficient estimate for determining the important channels globally, 4) Unstructured pruning requires specialized hardware for effective use. To address all the above issues, we present a simple-yet-effective gradual channel pruning while training methodology using a novel data-driven metric referred to as the feature relevance score. The proposed technique gets rid of the additional retraining cycles by pruning the least important channels in a structured fashion at fixed intervals during the actual training phase. Feature relevance scores help in efficiently evaluating the contribution of each channel towards the discriminative power of the network. We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet using datasets such as CIFAR-10, CIFAR-100 and ImageNet, and successfully achieve significant model compression while trading off less than $1\%$ accuracy. Notably, on the CIFAR-10 dataset trained on ResNet-110, our approach achieves $2.4\times$ compression and a $56\%$ reduction in FLOPs with an accuracy drop of $0.01\%$ compared to the unpruned network. |
Tasks | Model Compression |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.09958v1 |
https://arxiv.org/pdf/2002.09958v1.pdf | |
PWC | https://paperswithcode.com/paper/gradual-channel-pruning-while-training-using |
Repo | |
Framework | |
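Schematically, the method ranks channels by a relevance score and structurally removes the weakest ones at fixed intervals inside the normal training loop. In the sketch below, mean absolute activation serves as a stand-in proxy for the paper's feature relevance scores, and channels are zeroed rather than physically removed; both are simplifications.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_channels(conv, bn, acts, frac=0.05):
    """Toy gradual-pruning step: rank channels by a relevance proxy
    (mean absolute activation over a batch; a stand-in for the paper's
    feature relevance scores) and zero out the lowest-ranked ones in a
    structured fashion, so no separate retraining stage is needed."""
    score = acts.abs().mean(dim=(0, 2, 3))           # one score per channel
    n = max(1, int(frac * len(score)))
    idx = torch.argsort(score)[:n]                   # least relevant channels
    conv.weight[idx] = 0                             # structured zeroing
    if conv.bias is not None:
        conv.bias[idx] = 0
    bn.weight[idx] = 0
    bn.bias[idx] = 0

conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
x = torch.randn(8, 3, 32, 32)
acts = bn(conv(x))
prune_channels(conv, bn, acts)   # call at fixed intervals during training
print((conv.weight.sum(dim=(1, 2, 3)) == 0).sum().item(), "channels pruned")
```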
Counterfactual fairness: removing direct effects through regularization
Title | Counterfactual fairness: removing direct effects through regularization |
Authors | Pietro G. Di Stefano, James M. Hickey, Vlasios Vasileiou |
Abstract | Building machine learning models that are fair with respect to an unprivileged group is a topical problem. Modern fairness-aware algorithms often ignore causal effects and enforce fairness through modifications applicable to only a subset of machine learning models. In this work, we propose a new definition of fairness that incorporates causality through the Controlled Direct Effect (CDE). We develop regularizations to tackle classical fairness measures and present a causal regularization that satisfies our new fairness definition by removing the impact of unprivileged group variables on the model outcomes as measured by the CDE. These regularizations are applicable to any model trained by iteratively minimizing a loss through differentiation. We demonstrate our approaches using both gradient boosting and logistic regression on: a synthetic dataset, the UCI Adult (Census) Dataset, and a real-world credit-risk dataset. Our approaches were found to mitigate unfairness in the predictions with small reductions in model performance. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10774v2 |
https://arxiv.org/pdf/2002.10774v2.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-fairness-removing-direct |
Repo | |
Framework | |
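Because the regularizer only needs the model's outputs under interventions on the protected attribute, it can be attached to any differentiably trained model. The sketch below penalizes the mean prediction gap between force-setting the attribute to 1 and to 0 with all other features held fixed, a simplified proxy for the paper's CDE estimate; the model and data shapes are illustrative.

```python
import torch

def cde_regularized_loss(model, x, a, y, lam=1.0):
    """Toy controlled-direct-effect style regularizer: score each
    instance twice, with the protected attribute force-set to 1 and to 0
    while every other feature is held fixed, and penalize the mean
    difference of the two predictions.  This flip is a simplification of
    the paper's CDE estimate."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        model(torch.cat([x, a], dim=1)).squeeze(1), y)
    x1 = torch.cat([x, torch.ones_like(a)], dim=1)   # do(A = 1)
    x0 = torch.cat([x, torch.zeros_like(a)], dim=1)  # do(A = 0)
    cde = (torch.sigmoid(model(x1)) - torch.sigmoid(model(x0))).mean()
    return bce + lam * cde.abs()

# Usage with a plain logistic-regression model:
model = torch.nn.Linear(6, 1)            # 5 features + protected attribute
x, a = torch.randn(64, 5), torch.randint(0, 2, (64, 1)).float()
y = torch.randint(0, 2, (64,)).float()
loss = cde_regularized_loss(model, x, a, y)
loss.backward()
```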
Neural Pose Transfer by Spatially Adaptive Instance Normalization
Title | Neural Pose Transfer by Spatially Adaptive Instance Normalization |
Authors | Jiashun Wang, Chao Wen, Yanwei Fu, Haitao Lin, Tianyun Zou, Xiangyang Xue, Yinda Zhang |
Abstract | Pose transfer has been studied for decades, in which the pose of a source mesh is applied to a target mesh. Particularly in this paper, we are interested in transferring the pose of a source human mesh to deform the target human mesh, while the source and target meshes may have different identity information. Traditional studies assume that paired source and target meshes exist with point-wise correspondences of user-annotated landmarks/mesh points, which requires heavy labelling effort. On the other hand, the generalization ability of deep models is limited when the source and target meshes have different identities. To break this limitation, we propose the first neural pose transfer model that solves the pose transfer via the latest technique for image style transfer, leveraging the newly proposed component – spatially adaptive instance normalization. Our model does not require any correspondences between the source and target meshes. Extensive experiments show that the proposed model can effectively transfer deformation from source to target meshes, and has good generalization ability to deal with unseen identities or poses of meshes. Code is available at https://github.com/jiashunwang/Neural-Pose-Transfer . |
Tasks | Pose Transfer, Style Transfer |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07254v1 |
https://arxiv.org/pdf/2003.07254v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-pose-transfer-by-spatially-adaptive |
Repo | |
Framework | |
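The normalization layer at the heart of the model is compact: instance-normalize the target identity features across vertices, then modulate them vertex-by-vertex with a scale and shift predicted from the source pose features. The sketch below assumes per-vertex features in `(batch, channels, vertices)` layout and illustrative layer sizes; it is not the authors' exact architecture, for which see the linked repository.

```python
import torch
import torch.nn as nn

class SPAdaIN(nn.Module):
    """Sketch of spatially adaptive instance normalization on per-vertex
    mesh features (layer shapes are illustrative): the identity features
    are instance-normalized across vertices, then modulated vertex-by-
    vertex with a scale and shift predicted from the pose features."""

    def __init__(self, id_ch, pose_ch):
        super().__init__()
        self.norm = nn.InstanceNorm1d(id_ch, affine=False)
        self.scale = nn.Conv1d(pose_ch, id_ch, 1)   # per-vertex gamma
        self.shift = nn.Conv1d(pose_ch, id_ch, 1)   # per-vertex beta

    def forward(self, id_feat, pose_feat):
        # id_feat, pose_feat: (batch, channels, num_vertices)
        return self.norm(id_feat) * self.scale(pose_feat) + self.shift(pose_feat)

layer = SPAdaIN(id_ch=64, pose_ch=3)
id_feat = torch.randn(2, 64, 6890)   # target identity features
pose = torch.randn(2, 3, 6890)       # source pose vertices as features
print(layer(id_feat, pose).shape)    # torch.Size([2, 64, 6890])
```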