April 3, 2020


Paper Group ANR 5

Be Like Water: Robustness to Extraneous Variables Via Adaptive Feature Normalization

Title Be Like Water: Robustness to Extraneous Variables Via Adaptive Feature Normalization
Authors Aakash Kaku, Sreyas Mohan, Avinash Parnandi, Heidi Schambra, Carlos Fernandez-Granda
Abstract Extraneous variables are variables that are irrelevant for a certain task, but heavily affect the distribution of the available data. In this work, we show that the presence of such variables can degrade the performance of deep-learning models. We study three datasets where there is a strong influence of known extraneous variables: classification of upper-body movements in stroke patients, annotation of surgical activities, and recognition of corrupted images. Models trained with batch normalization learn features that are highly dependent on the extraneous variables. In batch normalization, the statistics used to normalize the features are learned from the training set and fixed at test time, which produces a mismatch in the presence of varying extraneous variables. We demonstrate that estimating the feature statistics adaptively during inference, as in instance normalization, addresses this issue, producing normalized features that are more robust to changes in the extraneous variables. This results in a significant gain in performance for different network architectures and choices of feature statistics.
Published 2020-02-10
URL https://arxiv.org/abs/2002.04019v2
PDF https://arxiv.org/pdf/2002.04019v2.pdf
PWC https://paperswithcode.com/paper/be-like-water-robustness-to-extraneous
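
Illustrative note: the contrast the abstract draws between fixed training-set statistics and statistics estimated during inference can be sketched on toy data. This is a minimal NumPy sketch, not the authors' code; the function names and the simulated "extraneous variable" (a gain and offset applied to the test inputs) are assumptions made for illustration.

```python
import numpy as np

def normalize_fixed(x, train_mean, train_std, eps=1e-5):
    """Batch-norm-style inference: statistics are fixed from the training set."""
    return (x - train_mean) / (train_std + eps)

def normalize_adaptive(x, eps=1e-5):
    """Instance-norm-style inference: statistics are estimated from each input."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)

# Toy "training features": zero mean, unit variance.
train = rng.normal(loc=0.0, scale=1.0, size=(256, 64))
train_mean, train_std = train.mean(), train.std()

# Hypothetical extraneous variable: test features get a different gain and offset.
test = 3.0 * rng.normal(size=(8, 64)) + 5.0

fixed = normalize_fixed(test, train_mean, train_std)
adaptive = normalize_adaptive(test)

print("fixed stats    -> mean of normalized features:", fixed.mean())
print("adaptive stats -> mean of normalized features:", adaptive.mean())
```

With fixed statistics the shifted test features stay far from zero mean, while the adaptive variant re-centres and re-scales each input regardless of the shift.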

BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation

Title BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation
Authors Subhra Mazumdar, Arindam Pal, Francesco Parisi, V. S. Subrahmanian
Abstract Past work on evacuation planning assumes that evacuees will follow instructions; however, there is ample evidence that this is not the case. While some people will follow instructions, others will follow their own desires. In this paper, we present a formal definition of a behavior-based evacuation problem (BBEP) in which a human behavior model is taken into account when planning an evacuation. We show that a specific form of constraints can be used to express such behaviors. We show that BBEPs can be solved exactly via an integer program called BB_IP, and inexactly by a much faster algorithm that we call BB_Evac. We conducted a detailed experimental evaluation of both algorithms applied to buildings (though in principle the algorithms can be applied to any graphs) and show that the latter is an order of magnitude faster than BB_IP while producing results that are almost as good on one real-world building graph as well as on several synthetically generated graphs.
Published 2020-02-19
URL https://arxiv.org/abs/2002.08114v1
PDF https://arxiv.org/pdf/2002.08114v1.pdf
PWC https://paperswithcode.com/paper/bb_evac-fast-location-sensitive-behavior

AE-OT-GAN: Training GANs from data specific latent distribution

Title AE-OT-GAN: Training GANs from data specific latent distribution
Authors Dongsheng An, Yang Guo, Min Zhang, Xin Qi, Na Lei, Shing-Tung Yau, Xianfeng Gu
Abstract Though generative adversarial networks (GANs) are prominent models for generating realistic and crisp images, they often encounter mode collapse and are hard to train, which comes from approximating the intrinsically discontinuous distribution transform map with continuous DNNs. The recently proposed AE-OT model addresses this problem by explicitly computing the discontinuous distribution transform map through solving a semi-discrete optimal transport (OT) map in the latent space of the autoencoder. However, the generated images are blurry. In this paper, we propose the AE-OT-GAN model to combine the advantages of both models: generating high-quality images while overcoming the mode collapse/mixture problems. Specifically, we first faithfully embed the low-dimensional image manifold into the latent space by training an autoencoder (AE). Then we compute the optimal transport (OT) map that pushes forward the uniform distribution to the latent distribution supported on the latent manifold. Finally, our GAN model is trained to generate high-quality images from the latent distribution, from which the distribution transform map to the empirical data distribution will be continuous. The paired data between the latent codes and the real images gives us a further constraint on the generator. Experiments on the simple MNIST dataset and on complex datasets like CIFAR-10 and CelebA show the efficacy and efficiency of our proposed method.
Published 2020-01-11
URL https://arxiv.org/abs/2001.03698v2
PDF https://arxiv.org/pdf/2001.03698v2.pdf
PWC https://paperswithcode.com/paper/ae-ot-gan-training-gans-from-data-specific

Optimal Confidence Regions for the Multinomial Parameter

Title Optimal Confidence Regions for the Multinomial Parameter
Authors Matthew L. Malloy, Ardhendu Tripathy, Robert D. Nowak
Abstract Construction of tight confidence regions and intervals is central to statistical inference and decision-making. Consider an empirical distribution $\widehat{\boldsymbol{p}}$ generated from $n$ iid realizations of a random variable that takes one of $k$ possible values according to an unknown distribution $\boldsymbol{p}$. This is analogous to a single draw from a multinomial distribution. A confidence region is a subset of the probability simplex that depends on $\widehat{\boldsymbol{p}}$ and contains the unknown $\boldsymbol{p}$ with a specified confidence. This paper shows how one can construct minimum average volume confidence regions, answering a long-standing question. We also show that the optimality of the regions directly translates to optimal confidence intervals of functionals, such as the mean, variance and median.
Tasks Decision Making
Published 2020-02-03
URL https://arxiv.org/abs/2002.01044v1
PDF https://arxiv.org/pdf/2002.01044v1.pdf
PWC https://paperswithcode.com/paper/optimal-confidence-regions-for-the

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

Title Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation
Authors Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes
Abstract Leveraging synthetically rendered data offers great potential to improve monocular depth estimation, but closing the synthetic-real domain gap is a non-trivial and important task. While much recent work has focused on unsupervised domain adaptation, we consider a more realistic scenario where a large amount of synthetic training data is supplemented by a small set of real images with ground-truth. In this setting we find that existing domain translation approaches are difficult to train and offer little advantage over simple baselines that use a mix of real and synthetic data. A key failure mode is that real-world images contain novel objects and clutter not present in synthetic training. This high-level domain shift isn’t handled by existing image translation models. Based on these observations, we develop an attentional module that learns to identify and remove (hard) out-of-domain regions in real images in order to improve depth prediction for a model trained primarily on synthetic data. We carry out extensive experiments to validate our attend-remove-complete approach (ARC) and find that it significantly outperforms state-of-the-art domain adaptation methods for depth prediction. Visualizing the removed regions provides interpretable insights into the synthetic-real domain gap.
Tasks Depth Estimation, Domain Adaptation, Monocular Depth Estimation, Unsupervised Domain Adaptation
Published 2020-02-27
URL https://arxiv.org/abs/2002.12114v1
PDF https://arxiv.org/pdf/2002.12114v1.pdf
PWC https://paperswithcode.com/paper/domain-decluttering-simplifying-images-to

Adaptive machine learning strategies for network calibration of IoT smart air quality monitoring devices

Title Adaptive machine learning strategies for network calibration of IoT smart air quality monitoring devices
Authors Saverio De Vito, Girolamo Di Francia, Elena Esposito, Sergio Ferlito, Fabrizio Formisano, Ettore Massera
Abstract Air Quality Multi-sensor Systems (AQMS) are IoT devices based on arrays of low-cost chemical microsensors that have recently been shown capable of providing relatively accurate quantitative estimates of air pollutants. Their availability makes it possible to deploy pervasive Air Quality Monitoring (AQM) networks that will solve the geographical sparseness issue affecting the current network of AQ Regulatory Monitoring Systems (AQRMS). Unfortunately, their accuracy has proven limited in long-term field deployments because of several technological issues, including sensor poisoning or ageing, non-target gas interference, and lack of fabrication repeatability. Seasonal changes in the probability distributions of priors, observables, and hidden context variables (i.e. non-observable interferents) challenge field-data-driven calibration models, whose short- to mid-term performance has recently come to the attention of urban authorities and monitoring agencies. In this work, we address this non-stationary framework with adaptive learning strategies in order to prolong the validity of multisensor calibration models, enabling continuous learning. The influence of relevant parameters is analyzed in different network and node-to-node recalibration scenarios. The results are useful for pervasive deployments aimed at permanent high-resolution AQ mapping in urban scenarios, as well as for the use of AQMS as AQRMS backup systems that provide data when AQRMS data are unavailable due to faults or scheduled maintenance.
Tasks Calibration
Published 2020-03-24
URL https://arxiv.org/abs/2003.12011v1
PDF https://arxiv.org/pdf/2003.12011v1.pdf
PWC https://paperswithcode.com/paper/adaptive-machine-learning-strategies-for

Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning

Title Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning
Authors Michiel van der Meer, Matteo Pirotta, Elia Bruni
Abstract In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier. Because of the need for explainable agents in automated decision processes, we attempt to interpret the latent space from an RL agent to identify its current objective in a complex language instruction. Results show that the classification process causes changes in the hidden states which makes them more easily interpretable, but also causes a shift in zero-shot performance to novel instructions. Lastly, we limit the supervisory signal on the classification, and observe a similar but less notable effect.
Published 2020-01-13
URL https://arxiv.org/abs/2001.04418v1
PDF https://arxiv.org/pdf/2001.04418v1.pdf
PWC https://paperswithcode.com/paper/exploiting-language-instructions-for

Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation

Title Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation
Authors Jian Liang, Dapeng Hu, Jiashi Feng
Abstract Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain. Prior UDA methods typically require access to the source data when learning to adapt the model, making them risky and inefficient for decentralized private data. In this work we tackle a novel setting where only a trained source model is available and investigate how we can effectively utilize such a model without source data to solve UDA problems. To this end, we propose a simple yet generic representation learning framework, named Source HypOthesis Transfer (SHOT). Specifically, SHOT freezes the classifier module (hypothesis) of the source model and learns the target-specific feature extraction module by exploiting both information maximization and self-supervised pseudo-labeling to implicitly align representations from the target domains to the source hypothesis. In this way, the learned target model can directly predict the labels of target data. We further investigate several techniques to refine the network architecture to parameterize the source model for better transfer performance. To verify its versatility, we evaluate SHOT in a variety of adaptation cases including closed-set, partial-set, and open-set domain adaptation. Experiments indicate that SHOT yields state-of-the-art results among multiple domain adaptation benchmarks.
Tasks Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation
Published 2020-02-20
URL https://arxiv.org/abs/2002.08546v1
PDF https://arxiv.org/pdf/2002.08546v1.pdf
PWC https://paperswithcode.com/paper/do-we-really-need-to-access-the-source-data
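
Illustrative note: the abstract names "information maximization" without spelling it out. A common formulation, assumed here and not confirmed by the abstract, minimizes the average entropy of each prediction (confident outputs) while maximizing the entropy of the mean prediction (outputs spread over classes). The sketch below uses NumPy on toy logits; the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def information_maximization_loss(logits, eps=1e-8):
    """Assumed form: mean per-sample entropy minus entropy of the mean prediction.

    Lower is better: each prediction should be confident, and predictions
    should be spread across classes over the batch.
    """
    p = softmax(logits)
    cond_entropy = -(p * np.log(p + eps)).sum(axis=1).mean()
    p_mean = p.mean(axis=0)
    marg_entropy = -(p_mean * np.log(p_mean + eps)).sum()
    return cond_entropy - marg_entropy

# Three toy batches of target-domain logits (3 samples, 3 classes).
confident_diverse = 10.0 * np.eye(3)                    # confident, one sample per class
collapsed = np.tile([10.0, 0.0, 0.0], (3, 1))           # confident, all in one class
uncertain = np.zeros((3, 3))                            # uniform predictions

for name, batch in [("confident+diverse", confident_diverse),
                    ("collapsed", collapsed),
                    ("uncertain", uncertain)]:
    print(name, information_maximization_loss(batch))
```

Under this assumed objective, only the confident and diverse batch scores well; collapsing to one class or staying uniform both give a loss near zero.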

Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation

Title Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation
Authors Siqi Yang, Lin Wu, Arnold Wiliem, Brian C. Lovell
Abstract We present a novel approach to unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training. Recent adversarial-training-based domain adaptation methods have shown their effectiveness in minimizing domain discrepancy via alignment of marginal feature distributions. However, aligning the marginal feature distributions does not guarantee the alignment of class conditional distributions. This limitation is more evident when adapting object detectors, as the domain discrepancy is larger than in the image classification task, e.g. a varying number of objects exists in one image and the majority of the content in an image is background. This motivates us to learn domain invariance for category-level semantics via gradient alignment. Intuitively, if the gradients of two domains point in similar directions, then the learning of one domain can improve that of another domain. To achieve gradient alignment, we propose Forward-Backward Cyclic Adaptation, which iteratively computes adaptation from source to target via backward hopping and from target to source via forward passing. In addition, we align low-level features for adapting holistic color/texture via adversarial training. However, a detector that performs well on both domains is not ideal for the target domain. As such, in each cycle, domain diversity is enforced by maximum entropy regularization on the source domain to penalize confident source-specific learning and minimum entropy regularization on the target domain to encourage target-specific learning. Theoretical analysis of the training process is provided, and extensive experiments on challenging cross-domain object detection datasets have shown the superiority of our approach over the state-of-the-art.
Tasks Domain Adaptation, Image Classification, Object Detection, Unsupervised Domain Adaptation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00575v1
PDF https://arxiv.org/pdf/2002.00575v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptive-object-detection
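
Illustrative note: the intuition that aligned gradients let one domain's update help the other can be checked on a toy problem. The sketch below is not the FBC procedure from the paper; it uses a linear regression loss on two hypothetical "domains" that differ only by an input shift, and measures the cosine similarity of their gradients.

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error of a linear model."""
    r = X @ w - y
    return float(r @ r) / len(y)

def mse_grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def cosine(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

# Hypothetical source and target domains: same task, shifted inputs.
X_src = rng.normal(size=(200, 3))
X_tgt = rng.normal(size=(200, 3)) + 0.5
y_src = X_src @ w_true
y_tgt = X_tgt @ w_true

w = np.zeros(3)
g_src = mse_grad(w, X_src, y_src)
g_tgt = mse_grad(w, X_tgt, y_tgt)
alignment = cosine(g_src, g_tgt)

# A small step along the source gradient, evaluated on the target loss.
lr = 0.01
loss_before = mse_loss(w, X_tgt, y_tgt)
loss_after = mse_loss(w - lr * g_src, X_tgt, y_tgt)

print("gradient cosine similarity:", alignment)
print("target loss before/after a source step:", loss_before, loss_after)
```

When the cosine similarity is positive, a sufficiently small step on the source loss also lowers the target loss, which is the property gradient alignment tries to encourage.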

NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain

Title NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain
Authors Ayush Jain, Meenachi Ganesamoorty
Abstract Significant advances have been made in recent years in Natural Language Processing, with machines surpassing human performance in many tasks, including but not limited to Question Answering. The majority of deep learning methods for Question Answering target domains with large datasets and highly matured literature. The area of Nuclear and Atomic energy has largely remained unexplored in exploiting non-annotated data for driving industry-viable applications. Because no dataset was available, a new one was created from 7000 research papers in the nuclear domain. This paper contributes to research in understanding nuclear domain knowledge, which is then evaluated on the Nuclear Question Answering Dataset (NQuAD) created by nuclear domain experts as part of this research. NQuAD contains 612 questions developed on 181 paragraphs randomly selected from the IGCAR research paper corpus. In this paper, the Nuclear Bidirectional Encoder Representational Transformers (NukeBERT) is proposed, which incorporates a novel technique for building the BERT vocabulary to make it suitable for tasks with less training data. The experiments evaluated on NQuAD revealed that NukeBERT was able to outperform BERT significantly, thus validating the adopted methodology. Training NukeBERT is computationally expensive, and hence we will be open-sourcing the NukeBERT pretrained weights and NQuAD to foster further research work in the nuclear domain.
Tasks Language Modelling, Question Answering
Published 2020-03-30
URL https://arxiv.org/abs/2003.13821v1
PDF https://arxiv.org/pdf/2003.13821v1.pdf
PWC https://paperswithcode.com/paper/nukebert-a-pre-trained-language-model-for-low

P $\approx$ NP, at least in Visual Question Answering

Title P $\approx$ NP, at least in Visual Question Answering
Authors Shailza Jolly, Sebastian Palacio, Joachim Folz, Federico Raue, Joern Hees, Andreas Dengel
Abstract In recent years, progress in the Visual Question Answering (VQA) field has largely been driven by public challenges and large datasets. One of the most widely used of these is the VQA 2.0 dataset, consisting of polar (“yes/no”) and non-polar questions. Looking at the question distribution over all answers, we find that the answers “yes” and “no” account for 38% of the questions, while the remaining 62% are spread over the more than 3000 remaining answers. While several sources of biases have already been investigated in the field, the effects of such an over-representation of polar vs. non-polar questions remain unclear. In this paper, we measure the potential confounding factors when polar and non-polar samples are used jointly to train a baseline VQA classifier, and compare it to an upper bound where the over-representation of polar questions is excluded from the training. Further, we perform cross-over experiments to analyze how well the feature spaces align. Contrary to expectations, we find no evidence of counterproductive effects in the joint training of unbalanced classes. In fact, by exploring the intermediate feature space of visual-text embeddings, we find that the feature space of polar questions already encodes sufficient structure to answer many non-polar questions. Our results indicate that the polar (P) and the non-polar (NP) feature spaces are strongly aligned, hence the expression P $\approx$ NP.
Tasks Question Answering, Visual Question Answering
Published 2020-03-26
URL https://arxiv.org/abs/2003.11844v2
PDF https://arxiv.org/pdf/2003.11844v2.pdf
PWC https://paperswithcode.com/paper/p-approx-np-at-least-in-visual-question

Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction

Title Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction
Authors Satya Narayan Shukla, Benjamin M. Marlin
Abstract Intensive Care Unit Electronic Health Records (ICU EHRs) store multimodal data about patients including clinical notes, sparse and irregularly sampled physiological time series, lab results, and more. To date, most methods designed to learn predictive models from ICU EHR data have focused on a single modality. In this paper, we leverage the recently proposed interpolation-prediction deep learning architecture (Shukla and Marlin 2019) as a basis for exploring how physiological time series data and clinical notes can be integrated into a unified mortality prediction model. We study both early and late fusion approaches and demonstrate how the relative predictive value of clinical text and physiological data changes over time. Our results show that a late fusion approach can provide a statistically significant improvement in mortality prediction performance over using individual modalities in isolation.
Tasks Mortality Prediction, Time Series
Published 2020-03-24
URL https://arxiv.org/abs/2003.11059v1
PDF https://arxiv.org/pdf/2003.11059v1.pdf
PWC https://paperswithcode.com/paper/integrating-physiological-time-series-and
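
Illustrative note: the generic difference between early and late fusion can be sketched with two toy feature matrices. This is one common reading of the two terms, assumed for illustration; the paper's actual fusion architecture may differ (for example, it may combine learned representations rather than averaged probabilities). All names, shapes and weights below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def early_fusion(ts_feat, text_feat, w, b):
    """Concatenate both modalities, then apply a single predictor."""
    joint = np.concatenate([ts_feat, text_feat], axis=1)
    return sigmoid(joint @ w + b)

def late_fusion(ts_feat, text_feat, w_ts, b_ts, w_text, b_text, alpha=0.5):
    """Predict from each modality separately, then combine the predictions."""
    p_ts = sigmoid(ts_feat @ w_ts + b_ts)
    p_text = sigmoid(text_feat @ w_text + b_text)
    return alpha * p_ts + (1.0 - alpha) * p_text

rng = np.random.default_rng(0)
ts_feat = rng.normal(size=(4, 5))     # toy time-series features for 4 patients
text_feat = rng.normal(size=(4, 3))   # toy clinical-note features for the same patients

# Random weights stand in for trained parameters.
p_early = early_fusion(ts_feat, text_feat, rng.normal(size=8), 0.0)
p_late = late_fusion(ts_feat, text_feat,
                     rng.normal(size=5), 0.0,
                     rng.normal(size=3), 0.0)

print("early fusion risk scores:", p_early)
print("late fusion risk scores: ", p_late)
```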

Testing Goodness of Fit of Conditional Density Models with Kernels

Title Testing Goodness of Fit of Conditional Density Models with Kernels
Authors Wittawat Jitkrittum, Heishiro Kanagawa, Bernhard Schölkopf
Abstract We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y \mid x)$ and a joint sample, decide whether the sample is drawn from $p(y \mid x) r_x(x)$ for some density $r_x$. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics can be estimated easily, requiring no density estimation as an intermediate step; and 3) our second test offers an interpretable test result providing insight on where the conditional model does not fit well in the domain of the covariate. We demonstrate the interpretability of our test on a task of modeling the distribution of New York City’s taxi drop-off location given a pick-up point. To our knowledge, our work is the first to propose such conditional goodness-of-fit tests that simultaneously have all these desirable properties.
Tasks Density Estimation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10271v1
PDF https://arxiv.org/pdf/2002.10271v1.pdf
PWC https://paperswithcode.com/paper/testing-goodness-of-fit-of-conditional

Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows

Title Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
Authors Ruizhi Deng, Bo Chang, Marcus A. Brubaker, Greg Mori, Andreas Lehrmann
Abstract Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the continuous-time Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.
Tasks Density Estimation, Irregular Time Series, Time Series
Published 2020-02-24
URL https://arxiv.org/abs/2002.10516v1
PDF https://arxiv.org/pdf/2002.10516v1.pdf
PWC https://paperswithcode.com/paper/modeling-continuous-stochastic-processes-with

Kernel interpolation with continuous volume sampling

Title Kernel interpolation with continuous volume sampling
Authors Ayoub Belhadji, Rémi Bardenet, Pierre Chainais
Abstract A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuous volume sampling (VS), the continuous counterpart – for choosing node locations – of a discrete distribution introduced in (Deshpande & Vempala, 2006). Our contribution is theoretical: we prove almost optimal bounds for interpolation and quadrature under VS. While similar bounds already exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds that apply to any Mercer kernel and depend on the spectrum of the associated integration operator. We emphasize that, unlike previous randomized approaches that rely on regularized leverage scores or determinantal point processes, evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS is thus naturally amenable to MCMC samplers.
Tasks Density Estimation, Point Processes
Published 2020-02-22
URL https://arxiv.org/abs/2002.09677v1
PDF https://arxiv.org/pdf/2002.09677v1.pdf
PWC https://paperswithcode.com/paper/kernel-interpolation-with-continuous-volume
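
Illustrative note: the "nodes and weights" task described in the abstract can be sketched for a one-dimensional Gaussian kernel. The nodes below are placed on a uniform grid, which is an assumption made for simplicity and is not continuous volume sampling; the lengthscale, the jitter term and the test function are also hypothetical choices.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.15):
    """Gaussian kernel matrix between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def fit_weights(nodes, values, jitter=1e-10):
    """Solve K w = f(nodes); a tiny jitter is added for numerical stability."""
    K = gaussian_kernel(nodes, nodes)
    return np.linalg.solve(K + jitter * np.eye(len(nodes)), values)

def interpolate(x, nodes, weights):
    """Weighted sum of kernel translates located at the nodes."""
    return gaussian_kernel(x, nodes) @ weights

def f(x):
    return np.sin(2.0 * np.pi * x)

nodes = np.linspace(0.0, 1.0, 8)        # uniform nodes, NOT volume sampling
weights = fit_weights(nodes, f(nodes))

grid = np.linspace(0.0, 1.0, 101)
max_error = float(np.max(np.abs(interpolate(grid, nodes, weights) - f(grid))))
print("max interpolation error on the grid:", max_error)
```

The interpolant reproduces the function at the nodes by construction; how well it does between them depends on where the nodes are placed, which is the question the paper's sampling scheme addresses.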