February 1, 2020

3064 words 15 mins read

Paper Group AWR 92

Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records. Curriculum semi-supervised segmentation. Agnostic Federated Learning. Deep Multi-View Learning via Task-Optimal CCA. Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. …

Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records

Title Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records
Authors Jingqing Zhang, Xiaoyu Zhang, Kai Sun, Xian Yang, Chengliang Dai, Yike Guo
Abstract The extraction of phenotype information which is naturally contained in electronic health records (EHRs) has been found to be useful in various clinical informatics applications such as disease diagnosis. However, due to imprecise descriptions, lack of gold standards and the demand for efficiency, annotating phenotypic abnormalities on millions of EHR narratives is still challenging. In this work, we propose a novel unsupervised deep learning framework to annotate the phenotypic abnormalities from EHRs via semantic latent representations. The proposed framework takes advantage of the Human Phenotype Ontology (HPO), a knowledge base of phenotypic abnormalities, to standardize the annotation results. Experiments have been conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative analyses show that the proposed framework achieves state-of-the-art annotation performance and computational efficiency compared with other methods.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03862v1
PDF https://arxiv.org/pdf/1911.03862v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-annotation-of-phenotypic
Repo https://github.com/JingqingZ/Semantic-HPO
Framework pytorch
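
To make the annotation step concrete, here is a generic sketch of ranking HPO concepts against a latent EHR representation by cosine similarity. The function and variable names (`annotate_by_similarity`, `ehr_embedding`, `hpo_embeddings`) are illustrative assumptions; the paper's unsupervised framework for learning the representations is considerably richer than this matching step.

```python
import numpy as np

def annotate_by_similarity(ehr_embedding, hpo_embeddings, hpo_ids, top_k=5):
    """Generic sketch: rank HPO concepts by cosine similarity to a latent
    EHR representation. Illustrates only the matching step, not the
    authors' architecture.

    ehr_embedding:  (d,) latent vector for one EHR narrative.
    hpo_embeddings: (n, d) vectors for n HPO concepts.
    hpo_ids:        list of n HPO identifiers (e.g. 'HP:0001250').
    """
    a = ehr_embedding / (np.linalg.norm(ehr_embedding) + 1e-12)
    b = hpo_embeddings / (np.linalg.norm(hpo_embeddings, axis=1, keepdims=True) + 1e-12)
    scores = b @ a                         # cosine similarities, shape (n,)
    top = np.argsort(-scores)[:top_k]      # indices of the best-matching concepts
    return [(hpo_ids[i], float(scores[i])) for i in top]
```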

Curriculum semi-supervised segmentation

Title Curriculum semi-supervised segmentation
Authors Hoel Kervadec, Jose Dolz, Eric Granger, Ismail Ben Ayed
Abstract This study investigates a curriculum-style strategy for semi-supervised CNN segmentation, which devises a regression network to learn image-level information such as the size of a target region. These regressions are used to effectively regularize the segmentation network, constraining the softmax predictions of the unlabeled images to match the inferred label distributions. Our framework is based on inequality constraints that tolerate uncertainties in the inferred knowledge, e.g., regressed region size, and can be employed for a large variety of region attributes. We evaluated our proposed strategy for left ventricle segmentation in magnetic resonance images (MRI) and compared it to standard proposal-based semi-supervision strategies. Our strategy leverages unlabeled data more efficiently and achieves very competitive results, approaching the performance of full supervision.
Tasks Semantic Segmentation, Semi-Supervised Semantic Segmentation
Published 2019-04-10
URL https://arxiv.org/abs/1904.05236v2
PDF https://arxiv.org/pdf/1904.05236v2.pdf
PWC https://paperswithcode.com/paper/curriculum-semi-supervised-segmentation
Repo https://github.com/LIVIAETS/semi_curriculum
Framework pytorch
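
The inequality-constrained size regularizer described in the abstract can be sketched roughly as a penalty on the soft region size of unlabeled predictions. The names below (`seg_logits`, `size_lo`, `size_hi`) are assumptions, and this is only a minimal stand-in for the authors' constrained formulation.

```python
import torch
import torch.nn.functional as F

def size_constraint_penalty(seg_logits, size_lo, size_hi):
    """Penalize softmax foreground mass that falls outside [size_lo, size_hi].

    seg_logits: (B, 2, H, W) logits for unlabeled images (background/foreground).
    size_lo, size_hi: (B,) bounds inferred by a size-regression network
    (hypothetical names; inequality bounds tolerate uncertainty in the regression).
    """
    probs = F.softmax(seg_logits, dim=1)[:, 1]          # foreground probabilities
    predicted_size = probs.sum(dim=(1, 2))              # soft region size per image
    too_small = F.relu(size_lo - predicted_size) ** 2   # penalty below the lower bound
    too_large = F.relu(predicted_size - size_hi) ** 2   # penalty above the upper bound
    return (too_small + too_large).mean()
```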

Agnostic Federated Learning

Title Agnostic Federated Learning
Authors Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh
Abstract A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients. We argue that, with the existing training and inference, federated models can be biased towards different clients. Instead, we propose a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions. We further show that this framework naturally yields a notion of fairness. We present data-dependent Rademacher complexity guarantees for learning with this objective, which guide the definition of an algorithm for agnostic federated learning. We also give a fast stochastic optimization algorithm for solving the corresponding optimization problem, for which we prove convergence bounds, assuming a convex loss function and hypothesis set. We further empirically demonstrate the benefits of our approach in several datasets. Beyond federated learning, our framework and algorithm can be of interest to other learning scenarios such as cloud computing, domain adaptation, drifting, and other contexts where the training and test distributions do not coincide.
Tasks Domain Adaptation, Stochastic Optimization
Published 2019-02-01
URL http://arxiv.org/abs/1902.00146v1
PDF http://arxiv.org/pdf/1902.00146v1.pdf
PWC https://paperswithcode.com/paper/agnostic-federated-learning
Repo https://github.com/litian96/fair_flearn
Framework none
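
A minimal sketch of the agnostic objective: the centralized model is evaluated against a worst-case mixture of client losses, with the mixture weights updated by ascent on the simplex. This is a generic illustration, not the paper's exact stochastic algorithm; `client_losses` and the exponentiated-gradient step are assumptions.

```python
import numpy as np

def agnostic_objective(client_losses, lam):
    """Mixture loss sum_k lam_k * L_k(model) for the current model.

    client_losses: per-client losses, shape (K,).
    lam: mixture weights on the simplex, shape (K,).
    """
    return float(np.dot(lam, client_losses))

def ascent_step_on_mixture(lam, client_losses, step=0.1):
    """One exponentiated-gradient ascent step on the mixture weights,
    pushing weight toward the worst-off clients (a generic sketch)."""
    lam = lam * np.exp(step * client_losses)   # multiplicative update
    return lam / lam.sum()                     # project back to the simplex
```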

Deep Multi-View Learning via Task-Optimal CCA

Title Deep Multi-View Learning via Task-Optimal CCA
Authors Heather D. Couture, Roland Kwitt, J. S. Marron, Melissa Troester, Charles M. Perou, Marc Niethammer
Abstract Canonical Correlation Analysis (CCA) is widely used for multimodal data analysis and, more recently, for discriminative tasks such as multi-view learning; however, it makes no use of class labels. Recent CCA methods have started to address this weakness but are limited in that they do not simultaneously optimize the CCA projection for discrimination and the CCA projection itself, or they are linear only. We address these deficiencies by simultaneously optimizing a CCA-based and a task objective in an end-to-end manner. Together, these two objectives learn a non-linear CCA projection to a shared latent space that is highly correlated and discriminative. Our method shows a significant improvement over previous state-of-the-art (including deep supervised approaches) for cross-view classification, regularization with a second view, and semi-supervised learning on real data.
Tasks Multi-View Learning
Published 2019-07-17
URL https://arxiv.org/abs/1907.07739v1
PDF https://arxiv.org/pdf/1907.07739v1.pdf
PWC https://paperswithcode.com/paper/deep-multi-view-learning-via-task-optimal-cca
Repo https://github.com/hdcouture/TOCCA
Framework tf
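
As a rough illustration of optimizing a CCA-style term and a task objective end-to-end, the sketch below adds a simple per-dimension correlation loss to a cross-entropy loss. The correlation term is a stand-in, not the paper's CCA formulation, and all names (`z1`, `z2`, `alpha`) are assumptions.

```python
import torch
import torch.nn.functional as F

def correlation_loss(z1, z2, eps=1e-8):
    """Negative mean per-dimension correlation between two projected views
    (a simple stand-in for the CCA objective, not the paper's exact term)."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)   # standardize each latent dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
    return -(z1 * z2).mean()

def task_optimal_cca_loss(z1, z2, logits, labels, alpha=1.0):
    """Joint objective: task (classification) loss plus a correlation term,
    optimized end-to-end through both view encoders."""
    return F.cross_entropy(logits, labels) + alpha * correlation_loss(z1, z2)
```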

Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots

Title Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots
Authors Jia-Chen Gu, Zhen-Hua Ling, Quan Liu
Abstract This paper proposes an utterance-to-utterance interactive matching network (U2U-IMN) for multi-turn response selection in retrieval-based chatbots. Different from previous methods following context-to-response matching or utterance-to-response matching frameworks, this model treats both contexts and responses as sequences of utterances when calculating the matching degrees between them. For a context-response pair, the U2U-IMN model first encodes each utterance separately using recurrent and self-attention layers. Then, a global and bidirectional interaction between the context and the response is conducted using the attention mechanism to collect the matching information between them. The distances between context and response utterances are employed as a prior component when calculating the attention weights. Finally, sentence-level aggregation and context-response-level aggregation are executed in turn to obtain the feature vector for matching degree prediction. Experiments on four public datasets showed that our proposed method outperformed baseline methods on all metrics, achieving a new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.
Tasks Conversational Response Selection
Published 2019-11-16
URL https://arxiv.org/abs/1911.06940v1
PDF https://arxiv.org/pdf/1911.06940v1.pdf
PWC https://paperswithcode.com/paper/utterance-to-utterance-interactive-matching
Repo https://github.com/JasonForJoy/U2U-IMN
Framework tf

Deep Gaussian Processes with Importance-Weighted Variational Inference

Title Deep Gaussian Processes with Importance-Weighted Variational Inference
Authors Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc Peter Deisenroth
Abstract Deep Gaussian processes (DGPs) can model complex marginal densities as well as complex mappings. Non-Gaussian marginals are essential for modelling real-world data and can be generated from the DGP by incorporating uncorrelated variables into the model. Previous work on DGP models has introduced noise additively and used variational inference with a combination of sparse Gaussian processes and mean-field Gaussians for the approximate posterior. Additive noise attenuates the signal, and the Gaussian form of the variational distribution may lead to an inaccurate posterior. We instead incorporate noisy variables as latent covariates, and propose a novel importance-weighted objective, which leverages analytic results and provides a mechanism to trade off computation for improved accuracy. Our results demonstrate that the importance-weighted objective works well in practice and consistently outperforms classical variational inference, especially for deeper models.
Tasks Gaussian Processes
Published 2019-05-14
URL https://arxiv.org/abs/1905.05435v1
PDF https://arxiv.org/pdf/1905.05435v1.pdf
PWC https://paperswithcode.com/paper/deep-gaussian-processes-with-importance
Repo https://github.com/hughsalimbeni/DGPs_with_IWVI
Framework tf
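
The importance-weighted objective can be sketched as an IWAE-style bound over K samples per data point. The `log_weights` input is an assumption standing in for log p(y, w) - log q(w) evaluated at samples from the variational distribution; the paper additionally exploits analytic results not shown here.

```python
import math
import torch

def importance_weighted_bound(log_weights):
    """IWAE-style lower bound from K importance samples per data point.

    log_weights: (K, N) tensor of log p(y, w) - log q(w) evaluated at
    samples w ~ q (names hypothetical).
    Returns the bound summed over the N data points.
    """
    K = log_weights.shape[0]
    # log( (1/K) * sum_k exp(log_weights[k]) ), computed stably per data point
    per_point = torch.logsumexp(log_weights, dim=0) - math.log(K)
    return per_point.sum()
```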

Deep Neural Architecture Search with Deep Graph Bayesian Optimization

Title Deep Neural Architecture Search with Deep Graph Bayesian Optimization
Authors Lizheng Ma, Jiaxu Cui, Bo Yang
Abstract Bayesian optimization (BO) is an effective method for finding the global optima of black-box functions. Recently, BO has been applied to neural architecture search and shows better performance than pure evolutionary strategies. These methods adopt Gaussian processes (GPs) as the surrogate function, with handcrafted similarity metrics as input. In this work, we propose a Bayesian graph neural network as a new surrogate, which can automatically extract features from deep neural architectures, and use such learned features to fit and characterize black-box objectives and their uncertainty. Based on the new surrogate, we then develop a graph Bayesian optimization framework to address the challenging task of deep neural architecture search. Experimental results show that our method significantly outperforms competing methods on benchmark tasks.
Tasks Gaussian Processes, Neural Architecture Search
Published 2019-05-14
URL https://arxiv.org/abs/1905.06159v1
PDF https://arxiv.org/pdf/1905.06159v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-architecture-search-with-deep
Repo https://github.com/0h-n0/tfdbonas
Framework tf
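
A generic Bayesian-optimization loop shows where a learned graph surrogate would plug in. The `surrogate.fit`/`surrogate.predict` interface, the UCB acquisition, and the candidate pool are all assumptions for illustration; the paper's contribution is the Bayesian graph neural network surrogate itself.

```python
import numpy as np

def bo_search(candidates, evaluate, surrogate, n_init=5, n_iter=20, kappa=1.0):
    """Generic BO loop over a pool of architecture encodings.

    evaluate(a) returns a validation score; surrogate.fit(X, y) and
    surrogate.predict(X) -> (mean, std) are assumed interfaces.
    For brevity, already-evaluated candidates are not excluded.
    """
    rng = np.random.default_rng(0)
    idx = list(rng.choice(len(candidates), size=n_init, replace=False))
    X = [candidates[i] for i in idx]        # evaluated architectures
    y = [evaluate(a) for a in X]            # their scores
    for _ in range(n_iter):
        surrogate.fit(X, y)
        mean, std = surrogate.predict(candidates)
        ucb = mean + kappa * std            # upper-confidence-bound acquisition
        best = int(np.argmax(ucb))
        X.append(candidates[best])
        y.append(evaluate(candidates[best]))
    return X[int(np.argmax(y))], max(y)
```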

ARAML: A Stable Adversarial Training Framework for Text Generation

Title ARAML: A Stable Adversarial Training Framework for Text Generation
Authors Pei Ke, Fei Huang, Minlie Huang, Xiaoyan Zhu
Abstract Most of the existing generative adversarial networks (GAN) for text generation suffer from the instability of reinforcement learning training algorithms such as policy gradient, leading to unstable performance. To tackle this problem, we propose a novel framework called Adversarial Reward Augmented Maximum Likelihood (ARAML). During adversarial training, the discriminator assigns rewards to samples which are acquired from a stationary distribution near the data rather than the generator’s distribution. The generator is optimized with maximum likelihood estimation augmented by the discriminator’s rewards instead of policy gradient. Experiments show that our model can outperform state-of-the-art text GANs with a more stable training process.
Tasks Text Generation
Published 2019-08-20
URL https://arxiv.org/abs/1908.07195v1
PDF https://arxiv.org/pdf/1908.07195v1.pdf
PWC https://paperswithcode.com/paper/araml-a-stable-adversarial-training-framework
Repo https://github.com/kepei1106/ARAML
Framework tf
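
A minimal sketch of the reward-augmented MLE idea: samples drawn near the data are weighted by the discriminator's rewards, and the generator minimizes the weighted negative log-likelihood. Shapes and names (`logits`, `sample_tokens`, `rewards`, `tau`) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def araml_style_generator_loss(logits, sample_tokens, rewards, tau=1.0):
    """Reward-augmented MLE sketch: per-sample NLL weighted by
    softmax-normalized discriminator rewards (temperature tau).

    logits:        (B, T, V) generator logits for B sampled sequences.
    sample_tokens: (B, T) token ids of samples drawn near the data.
    rewards:       (B,) discriminator rewards for those samples.
    """
    weights = F.softmax(rewards / tau, dim=0)                 # (B,)
    nll = F.cross_entropy(                                    # token-level NLL
        logits.reshape(-1, logits.size(-1)),
        sample_tokens.reshape(-1),
        reduction="none",
    ).reshape(sample_tokens.shape).sum(dim=1)                 # (B,) sequence NLL
    return (weights * nll).sum()
```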

On Exact Computation with an Infinitely Wide Neural Net

Title On Exact Computation with an Infinitely Wide Neural Net
Authors Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
Abstract How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width — namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers — is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for the performance of a pure kernel-based method on CIFAR-10, being 10% higher than the methods reported in [Novak et al., 2019], and only 6% lower than the performance of the corresponding finite deep net architecture (once batch normalization, etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK.
Tasks Gaussian Processes
Published 2019-04-26
URL https://arxiv.org/abs/1904.11955v2
PDF https://arxiv.org/pdf/1904.11955v2.pdf
PWC https://paperswithcode.com/paper/on-exact-computation-with-an-infinitely-wide
Repo https://github.com/ruosongwang/CNTK
Framework none
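
Once a CNTK Gram matrix has been computed (the expensive step the paper addresses), prediction reduces to ordinary kernel regression. The sketch below assumes precomputed kernel matrices; the ridge term and names are illustrative.

```python
import numpy as np

def kernel_regression_predict(K_train, K_test_train, y_train, ridge=0.0):
    """Kernel (ridge) regression predictor on top of a precomputed
    NTK/CNTK Gram matrix.

    K_train:       (n, n) kernel between training points.
    K_test_train:  (m, n) kernel between test and training points.
    y_train:       (n, C) one-hot (or real-valued) training targets.
    """
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + ridge * np.eye(n), y_train)  # dual coefficients
    return K_test_train @ alpha                                    # (m, C) predictions
```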

Non-linear aggregation of filters to improve image denoising

Title Non-linear aggregation of filters to improve image denoising
Authors Benjamin Guedj, Juliette Rengot
Abstract We introduce a novel aggregation method to efficiently perform image denoising. Preliminary filters are aggregated in a non-linear fashion, using a new metric of pixel proximity based on how the pool of filters reaches a consensus. We provide a theoretical bound to support our aggregation scheme, illustrate its numerical performance, and show that the aggregate significantly outperforms each of the preliminary filters.
Tasks Denoising, Image Denoising
Published 2019-04-01
URL https://arxiv.org/abs/1904.00865v2
PDF https://arxiv.org/pdf/1904.00865v2.pdf
PWC https://paperswithcode.com/paper/non-linear-aggregation-of-filters-to-improve
Repo https://github.com/rengotj/cobra_denoising
Framework none
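
A brute-force sketch of the consensus idea: a pixel's denoised value is an average over pixels whose preliminary filter outputs all agree, within a tolerance, with the filter outputs at that pixel. This is an O(N^2) illustration of COBRA-style aggregation, not the paper's implementation, and `eps` and the choice to average noisy values are assumptions.

```python
import numpy as np

def cobra_aggregate(filter_outputs, noisy, eps=0.05):
    """COBRA-style consensus aggregation sketch for denoising.

    filter_outputs: (M, H, W) outputs of M preliminary denoising filters.
    noisy:          (H, W) noisy input image.
    """
    M, H, W = filter_outputs.shape
    flat = filter_outputs.reshape(M, -1)          # (M, N) filter outputs per pixel
    noisy_flat = noisy.reshape(-1)                # (N,)
    out = np.empty_like(noisy_flat)
    for i in range(flat.shape[1]):
        # pixels where every filter agrees (within eps) with its value at pixel i
        agree = np.all(np.abs(flat - flat[:, i:i + 1]) <= eps, axis=0)
        out[i] = noisy_flat[agree].mean()         # consensus average (always includes i)
    return out.reshape(H, W)
```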

Color Image and Multispectral Image Denoising Using Block Diagonal Representation

Title Color Image and Multispectral Image Denoising Using Block Diagonal Representation
Authors Zhaoming Kong, Xiaowei Yang
Abstract Filtering images of more than one channel is challenging in terms of both efficiency and effectiveness. By grouping similar patches to utilize the self-similarity and sparse linear approximation of natural images, recent nonlocal and transform-domain methods have been widely used in color and multispectral image (MSI) denoising. Many related methods focus on modeling group-level correlation to enhance sparsity, which often resorts to a recursive strategy with a large number of similar patches, while the importance of the patch-level representation is understated. In this paper, we mainly investigate the influence and potential of representation at the patch level by considering a general formulation with a block diagonal matrix. We further show that by training a proper global patch basis, along with a local principal component analysis transform in the grouping dimension, a simple transform-threshold-inverse method can produce very competitive results. A fast implementation is also developed to reduce computational complexity. Extensive experiments on both simulated and real datasets demonstrate its robustness, effectiveness and efficiency.
Tasks Denoising, Image Denoising
Published 2019-02-11
URL http://arxiv.org/abs/1902.03954v1
PDF http://arxiv.org/pdf/1902.03954v1.pdf
PWC https://paperswithcode.com/paper/color-image-and-multispectral-image-denoising
Repo https://github.com/ZhaomingKong/color_image_denoising
Framework none
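
The transform-threshold-inverse step mentioned in the abstract can be sketched for a single group of similar patches, assuming a pre-trained orthonormal global patch basis. The local PCA along the grouping dimension is omitted, and `threshold` is an assumed parameter.

```python
import numpy as np

def transform_threshold_inverse(patches, basis, threshold):
    """Generic transform-threshold-inverse denoising of one patch group.

    patches: (n, d) flattened similar patches (one group).
    basis:   (d, d) orthonormal global patch basis (assumed pre-trained).
    Coefficients with magnitude below `threshold` are zeroed.
    """
    coeffs = patches @ basis                  # forward transform into the patch basis
    coeffs[np.abs(coeffs) < threshold] = 0.0  # hard thresholding of small coefficients
    return coeffs @ basis.T                   # inverse transform back to pixel space
```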

3d-SMRnet: Achieving a new quality of MPI system matrix recovery by deep learning

Title 3d-SMRnet: Achieving a new quality of MPI system matrix recovery by deep learning
Authors Ivo Matteo Baltruschat, Patryk Szwargulski, Florian Griese, Mirco Grosser, René Werner, Tobias Knopp
Abstract Magnetic particle imaging (MPI) data is commonly reconstructed using a system matrix acquired in a time-consuming calibration measurement. The calibration approach has the important advantage over model-based reconstruction that it takes the complex particle physics as well as system imperfections into account. This benefit comes at the cost that the system matrix needs to be re-calibrated whenever the scan parameters, particle types or even the particle environment (e.g. viscosity or temperature) changes. One route for reducing the calibration time is to sample the system matrix at a subset of the spatial positions of the intended field-of-view and employ system matrix recovery. Recent approaches used compressed sensing (CS) and achieved subsampling factors up to 28 that still allowed reconstructing MPI images of sufficient quality. In this work, we propose a novel framework with a 3d-System Matrix Recovery Network and demonstrate that it recovers a 3d system matrix with a subsampling factor of 64 in less than one minute and outperforms CS in terms of system matrix quality, reconstructed image quality, and processing time. The advantage of our method is demonstrated by reconstructing open access MPI datasets. The model is further shown to be capable of inferring system matrices for different particle types.
Tasks Calibration
Published 2019-05-08
URL https://arxiv.org/abs/1905.03026v1
PDF https://arxiv.org/pdf/1905.03026v1.pdf
PWC https://paperswithcode.com/paper/3d-smrnet-achieving-a-new-quality-of-mpi
Repo https://github.com/Ivo-B/3dSMRnet
Framework pytorch

diffGrad: An Optimization Method for Convolutional Neural Networks

Title diffGrad: An Optimization Method for Convolutional Neural Networks
Authors Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri
Abstract Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it changes all parameters by equally sized steps, irrespective of gradient behavior. Hence, an efficient way of optimizing deep networks is to make the step size adaptive for each parameter. Recently, several attempts have been made to improve gradient descent methods, such as AdaGrad, AdaDelta, RMSProp and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients and thus do not take advantage of local changes in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad optimization technique, the step size is adjusted for each parameter so that parameters with rapidly changing gradients take larger steps and parameters with slowly changing gradients take smaller steps. The convergence analysis is done using the regret-bound approach of the online learning framework. Rigorous analysis is carried out over three synthetic complex non-convex functions. Image categorization experiments are also conducted on the CIFAR10 and CIFAR100 datasets to observe the performance of diffGrad with respect to state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual-unit (ResNet) based Convolutional Neural Network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers. We also show that diffGrad performs uniformly well when training CNNs with different activation functions. The source code is made publicly available at https://github.com/shivram1987/diffGrad.
Tasks Image Categorization
Published 2019-09-12
URL https://arxiv.org/abs/1909.11015v3
PDF https://arxiv.org/pdf/1909.11015v3.pdf
PWC https://paperswithcode.com/paper/diffgrad-an-optimization-method-for
Repo https://github.com/jettify/pytorch-optimizer
Framework pytorch
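
A simplified single-tensor sketch of the diffGrad idea: the Adam-style step is scaled by a friction coefficient derived from the absolute difference between the current and the previous gradient. This is a hedged illustration of the described update, not the official optimizer; the state handling is deliberately minimal.

```python
import torch

def diffgrad_step(param, grad, prev_grad, m, v, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified diffGrad-style update for a single tensor.

    The Adam moments m, v are combined with a friction coefficient
    xi = sigmoid(|g_{t-1} - g_t|), which shrinks the step when the
    gradient is changing slowly.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                          # bias-corrected second moment
    xi = torch.sigmoid((prev_grad - grad).abs())          # friction coefficient
    param = param - lr * xi * m_hat / (v_hat.sqrt() + eps)
    return param, m, v
```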

An Adaptive and Momental Bound Method for Stochastic Learning

Title An Adaptive and Momental Bound Method for Stochastic Learning
Authors Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu Sun
Abstract Training deep neural networks requires intricate initialization and careful selection of learning rates. The emergence of stochastic gradient optimization methods that use adaptive learning rates based on squared past gradients, e.g., AdaGrad, AdaDelta, and Adam, eases the job slightly. However, such methods have also been proven problematic in recent studies, with their own pitfalls including non-convergence issues. Alternative variants have been proposed for enhancement, such as AMSGrad, AdaShift and AdaBound. In this work, we identify a new problem with adaptive learning rate methods that manifests at the beginning of learning, where Adam produces extremely large learning rates that inhibit the start of learning. We propose the Adaptive and Momental Bound (AdaMod) method to restrict the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks. Our experiments verify that AdaMod eliminates the extremely large learning rates throughout training and brings significant improvements, especially on complex networks such as DenseNet and Transformer, compared to Adam. Our implementation is available at: https://github.com/lancopku/AdaMod
Tasks Stochastic Optimization
Published 2019-10-27
URL https://arxiv.org/abs/1910.12249v1
PDF https://arxiv.org/pdf/1910.12249v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-and-momental-bound-method-for
Repo https://github.com/lancopku/AdaMod
Framework pytorch
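
A simplified single-tensor sketch of the AdaMod idea: the per-parameter Adam step size is capped by an exponential moving average of its own past values. Again a hedged illustration of the described mechanism, not the repository's implementation; `beta3` and the state handling are minimal assumptions.

```python
import torch

def adamod_step(param, grad, m, v, s, t,
                lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.999, eps=1e-8):
    """One simplified AdaMod-style update for a single tensor.

    The state s tracks an exponential moving average of the adaptive step
    sizes, which is used as a momental upper bound to smooth out the
    extremely large early learning rates mentioned in the abstract.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    step_size = lr / (v_hat.sqrt() + eps)        # Adam's per-parameter step size
    s = beta3 * s + (1 - beta3) * step_size      # moving average of step sizes
    step_size = torch.minimum(step_size, s)      # momental upper bound
    param = param - step_size * m_hat
    return param, m, v, s
```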

Learning to Learn Words from Visual Scenes

Title Learning to Learn Words from Visual Scenes
Authors Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
Abstract Language acquisition is the process of learning words from the surrounding scene. We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes. We leverage the natural compositional structure of language to create training episodes that cause a meta-learner to learn strong policies for language acquisition. Experiments on two datasets show that our approach is able to more rapidly acquire novel words as well as more robustly generalize to unseen compositions, significantly outperforming established baselines. A key advantage of our approach is that it is data efficient, allowing representations to be learned from scratch without language pre-training. Visualizations and analysis suggest visual information helps our approach learn a rich cross-modal representation from minimal examples. Project webpage is available at https://expert.cs.columbia.edu/
Tasks Language Acquisition, Language Modelling, Meta-Learning
Published 2019-11-25
URL https://arxiv.org/abs/1911.11237v2
PDF https://arxiv.org/pdf/1911.11237v2.pdf
PWC https://paperswithcode.com/paper/learning-to-learn-words-from-narrated-video
Repo https://github.com/cvlab-columbia/expert
Framework pytorch