January 27, 2020

3431 words 17 mins read

Paper Group ANR 1150

Pretrained Language Models for Document-Level Neural Machine Translation. Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Single Camera Training for Person Re-identification. Image Deconvolution with Deep Image and Kernel Priors. Liver segmentation and metastases detection in MR images using convolutional neural …

Pretrained Language Models for Document-Level Neural Machine Translation

Title Pretrained Language Models for Document-Level Neural Machine Translation
Authors Liangyou Li, Xin Jiang, Qun Liu
Abstract Previous work on document-level NMT usually focuses on limited contexts because of degraded performance on larger contexts. In this paper, we investigate using large contexts with three main contributions: (1) Unlike previous work, which pretrained models on large-scale sentence-level parallel corpora, we use pretrained language models, specifically BERT, which are trained on monolingual documents; (2) we propose context manipulation methods to control the influence of large contexts, which yield comparable results between systems using small and large contexts; (3) we introduce multi-task training for regularization to keep models from overfitting the training corpora, which further improves our systems together with a deeper encoder. Experiments are conducted on the widely used IWSLT data sets with three language pairs, i.e., Chinese–English, French–English and Spanish–English. Results show that our systems are significantly better than three previously reported document-level systems.
Tasks Machine Translation
Published 2019-11-08
URL https://arxiv.org/abs/1911.03110v1
PDF https://arxiv.org/pdf/1911.03110v1.pdf
PWC https://paperswithcode.com/paper/pretrained-language-models-for-document-level
Repo
Framework
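
Contribution (2), controlling the influence of large contexts, suggests a gated fusion of sentence states with BERT-encoded document context. Below is a minimal, hypothetical PyTorch sketch of such a context gate; the module name, dimensions, and fusion form are assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Gated fusion of sentence states with document-context states.

    A learned sigmoid gate decides, per dimension, how much of the
    BERT-encoded document context flows into the sentence encoding,
    one plausible form of "context manipulation".
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, sent: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # sent, ctx: (batch, seq_len, d_model)
        g = torch.sigmoid(self.gate(torch.cat([sent, ctx], dim=-1)))
        return sent + g * ctx  # gate limits the influence of large contexts

# Toy usage with random tensors standing in for encoder outputs.
fusion = ContextGate(d_model=512)
sent = torch.randn(2, 20, 512)   # sentence-level encoder states
ctx = torch.randn(2, 20, 512)    # aligned BERT document-context states
print(fusion(sent, ctx).shape)   # torch.Size([2, 20, 512])
```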

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

Title Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Authors Colin Wei, Tengyu Ma
Abstract Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to the previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network’s Jacobians during training and empirically demonstrate that this improves test performance.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03684v2
PDF https://arxiv.org/pdf/1905.03684v2.pdf
PWC https://paperswithcode.com/paper/190503684
Repo
Framework
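
The last sentence, directly regularizing the network's Jacobians during training, can be sketched with double backpropagation: penalize a Hutchinson-style estimate of the Jacobian's Frobenius norm. Which layers the authors regularize and the penalty weight are assumptions here.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
x = torch.randn(32, 10, requires_grad=True)
y = torch.randint(0, 3, (32,))

logits = net(x)
task_loss = nn.functional.cross_entropy(logits, y)

# Hutchinson-style estimate of ||J||_F^2, where J = d(logits)/d(x):
# for v ~ N(0, I), E[||v^T J||^2] equals the squared Frobenius norm,
# so one extra backward pass with create_graph=True suffices.
v = torch.randn_like(logits)
(jv,) = torch.autograd.grad(logits, x, grad_outputs=v, create_graph=True)
jac_penalty = jv.pow(2).sum(dim=1).mean()

loss = task_loss + 0.01 * jac_penalty  # 0.01 is an arbitrary coefficient
loss.backward()
```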

Single Camera Training for Person Re-identification

Title Single Camera Training for Person Re-identification
Authors Tianyu Zhang, Lingxi Xie, Longhui Wei, Yongfei Zhang, Bo Li, Qi Tian
Abstract Person re-identification (ReID) aims at finding the same person in different cameras. Training such systems usually requires a large amount of cross-camera pedestrians to be annotated from surveillance videos, which is labor-intensive, especially when the number of cameras is large. In contrast, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera. To the best of our knowledge, this setting has not been studied before. SCT enjoys the advantage of low-cost data collection and annotation, and thus makes it easier to train ReID systems in a brand-new environment. However, it raises major challenges due to the lack of cross-camera person occurrences, which conventional approaches rely on heavily to extract discriminative features. The key to dealing with the challenges of the SCT setting lies in designing an effective mechanism to complement cross-camera annotation. We start with a regular deep network for feature extraction, upon which we propose a novel loss function named multi-camera negative loss (MCNL). This is a probabilistically motivated metric learning loss, built on the observation that in a multi-camera system, one image is more likely to be closer to the most similar negative sample in other cameras than to the most similar negative sample in the same camera. In experiments, MCNL significantly boosts ReID accuracy in the SCT setting, which paves the way for fast deployment of ReID systems with good performance on new target scenes.
Tasks Metric Learning, Person Re-Identification
Published 2019-09-24
URL https://arxiv.org/abs/1909.10848v1
PDF https://arxiv.org/pdf/1909.10848v1.pdf
PWC https://paperswithcode.com/paper/single-camera-training-for-person-re
Repo
Framework
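
The abstract states MCNL's core inequality precisely enough to sketch: for each anchor, the most similar cross-camera negative should end up closer than the most similar same-camera negative. Below is a hypothetical batch-wise implementation; the distance metric, margin value, and mining details are assumptions, not the authors' exact recipe.

```python
import torch

def mcnl_loss(emb, labels, cams, margin=0.1):
    """Sketch of a multi-camera negative loss: hinge on the gap between
    the hardest cross-camera negative and the hardest same-camera negative."""
    dist = torch.cdist(emb, emb)                      # (N, N) pairwise L2
    neg = labels[:, None] != labels[None, :]          # negative pairs
    same_cam = cams[:, None] == cams[None, :]
    inf = torch.full_like(dist, float("inf"))
    d_same = torch.where(neg & same_cam, dist, inf).min(dim=1).values
    d_cross = torch.where(neg & ~same_cam, dist, inf).min(dim=1).values
    valid = torch.isfinite(d_same) & torch.isfinite(d_cross)
    # Encourage d_cross + margin <= d_same, per the abstract's observation.
    return torch.relu(d_cross - d_same + margin)[valid].mean()

emb = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)
labels = torch.randint(0, 4, (16,))
cams = torch.randint(0, 3, (16,))
print(mcnl_loss(emb, labels, cams))
```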

Image Deconvolution with Deep Image and Kernel Priors

Title Image Deconvolution with Deep Image and Kernel Priors
Authors Zhunxuan Wang, Zipei Wang, Qiqi Li, Hakan Bilen
Abstract Image deconvolution is the process of recovering images degraded by convolution, a hard inverse problem because it is mathematically ill-posed. Building on the success of the recently proposed deep image prior (DIP), we construct an image deconvolution model with deep image and kernel priors (DIKP). DIP is a learning-free representation that uses neural network structures to express image prior information, and it has shown great success in many energy-based models, e.g. denoising, super-resolution, and inpainting. Our DIKP model instead uses such priors in image deconvolution to model not only images but also kernels, combining the ideas of traditional learning-free deconvolution methods with neural networks. In this paper, we show that DIKP improves the performance of learning-free image deconvolution, and we demonstrate this experimentally on a standard benchmark of six test images in terms of PSNR and visual quality.
Tasks Denoising, Image Deconvolution, Super-Resolution
Published 2019-10-18
URL https://arxiv.org/abs/1910.08386v1
PDF https://arxiv.org/pdf/1910.08386v1.pdf
PWC https://paperswithcode.com/paper/image-deconvolution-with-deep-image-and
Repo
Framework
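
The DIP idea extended to kernels can be sketched as two small untrained networks mapping fixed random inputs to an image estimate and a blur-kernel estimate, both optimized so their convolution reproduces the degraded observation. Network sizes, kernel size, and optimizer settings below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
y = torch.rand(1, 1, 64, 64)              # degraded (blurred) observation

img_net = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
ker_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                        nn.Linear(64, 9 * 9))   # 9x9 blur kernel

z_img = torch.randn(1, 8, 64, 64)         # fixed noise inputs: the nets'
z_ker = torch.randn(1, 16)                # structure acts as the prior

opt = torch.optim.Adam([*img_net.parameters(), *ker_net.parameters()], lr=1e-3)
for step in range(200):
    x = img_net(z_img)                                     # image estimate
    k = F.softmax(ker_net(z_ker), dim=1).view(1, 1, 9, 9)  # kernel sums to 1
    y_hat = F.conv2d(x, k, padding=4)                      # re-blurred image
    loss = F.mse_loss(y_hat, y)                            # data-fit energy
    opt.zero_grad(); loss.backward(); opt.step()
```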

Liver segmentation and metastases detection in MR images using convolutional neural networks

Title Liver segmentation and metastases detection in MR images using convolutional neural networks
Authors Mariëlle J. A. Jansen, Hugo J. Kuijf, Maarten Niekel, Wouter B. Veldhuis, Frank J. Wessels, Max A. Viergever, Josien P. W. Pluim
Abstract Primary tumors have a high likelihood of developing metastases in the liver, and early detection of these metastases is crucial for patient outcome. We propose a method based on convolutional neural networks (CNNs) to detect liver metastases. First, the liver is automatically segmented using the six phases of abdominal dynamic contrast-enhanced (DCE) MR images. Next, DCE-MR and diffusion-weighted (DW) MR images are used for metastases detection within the liver mask. The liver segmentations have a median Dice similarity coefficient of 0.95 compared with manual annotations. The metastases detection method has a sensitivity of 99.8% with a median of two false positives per image. The combination of the two MR sequences in a dual-pathway network proves valuable for the detection of liver metastases. In conclusion, a high-quality liver segmentation can be obtained, within which we can successfully detect liver metastases.
Tasks Liver Segmentation
Published 2019-10-15
URL https://arxiv.org/abs/1910.06635v1
PDF https://arxiv.org/pdf/1910.06635v1.pdf
PWC https://paperswithcode.com/paper/liver-segmentation-and-metastases-detection
Repo
Framework
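
The "dual pathway network" combining the two MR sequences can be sketched as two encoders fused before a voxel-wise classification head; channel counts and depths below are assumptions.

```python
import torch
import torch.nn as nn

class DualPathwayNet(nn.Module):
    """Sketch of a dual-pathway detector: one encoder per MR sequence
    (DCE with six phases as channels, DW with one), fused before a
    pixel-wise metastasis/background head."""
    def __init__(self):
        super().__init__()
        def enc(in_ch):
            return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.dce = enc(6)                  # six DCE phases as channels
        self.dw = enc(1)                   # one diffusion-weighted image
        self.head = nn.Conv2d(64, 2, 1)    # metastasis vs background

    def forward(self, dce, dw):
        fused = torch.cat([self.dce(dce), self.dw(dw)], dim=1)
        return self.head(fused)

net = DualPathwayNet()
out = net(torch.randn(1, 6, 128, 128), torch.randn(1, 1, 128, 128))
print(out.shape)  # torch.Size([1, 2, 128, 128])
```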

Domain-Agnostic Learning with Anatomy-Consistent Embedding for Cross-Modality Liver Segmentation

Title Domain-Agnostic Learning with Anatomy-Consistent Embedding for Cross-Modality Liver Segmentation
Authors Junlin Yang, Nicha C. Dvornek, Fan Zhang, Juntang Zhuang, Julius Chapiro, MingDe Lin, James S. Duncan
Abstract Domain Adaptation (DA) has the potential to greatly help the generalization of deep learning models. However, the current literature usually assumes transferring knowledge from the source domain to one specific known target domain. Domain Agnostic Learning (DAL) proposes the new task of transferring knowledge from the source domain to data from multiple heterogeneous target domains. In this work, we propose the Domain-Agnostic Learning framework with Anatomy-Consistent Embedding (DALACE), which works on both domain transfer and task transfer to learn a disentangled representation that is not only invariant to different modalities but also preserves anatomical structures, for the DA and DAL tasks in cross-modality liver segmentation. We validated and compared our model with state-of-the-art methods, including CycleGAN, Task Driven Generative Adversarial Network (TD-GAN), and Domain Adaptation via Disentangled Representations (DADR). For the DA task, our DALACE model outperformed CycleGAN, TD-GAN, and DADR with a DSC of 0.847, compared to 0.721, 0.793, and 0.806, respectively. For the DAL task, our model improved the DSC to 0.794, from the 0.522, 0.719, and 0.742 achieved by CycleGAN, TD-GAN, and DADR. Further, we visualized the success of the disentanglement, which adds human interpretability to the learned representations. Through ablation analysis, we showed the concrete benefits of disentanglement for downstream tasks and the role of supervision in learning a better disentangled representation: the proposed Domain-Agnostic Module (DAM) uses segmentation consistency to enforce invariance to domains, and the proposed Anatomy-Preserving Module (APM) preserves anatomical information.
Tasks Domain Adaptation, Liver Segmentation
Published 2019-08-27
URL https://arxiv.org/abs/1908.10489v1
PDF https://arxiv.org/pdf/1908.10489v1.pdf
PWC https://paperswithcode.com/paper/domain-agnostic-learning-with-anatomy
Repo
Framework
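
A heavily simplified sketch of the core idea follows: a shared content encoder whose features are pushed to be modality-agnostic (a crude stand-in for the DAM) while a segmentation head trained on source labels consumes only content features (loosely playing the APM's anatomy-preserving role). All module names, sizes, and losses are assumptions.

```python
import torch
import torch.nn as nn

content_enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(32, 2, 1)                    # liver vs background
domain_clf = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(32, 1))      # which modality?

x_src = torch.randn(4, 1, 64, 64)                 # labeled source slices
x_tgt = torch.randn(4, 1, 64, 64)                 # unlabeled target slices
y_src = torch.randint(0, 2, (4, 64, 64))          # dummy source masks

c_src, c_tgt = content_enc(x_src), content_enc(x_tgt)
seg_loss = nn.functional.cross_entropy(seg_head(c_src), y_src)

# Push content features toward being uninformative about the domain
# (a one-line proxy for proper adversarial training).
logits = torch.cat([domain_clf(c_src), domain_clf(c_tgt)])
agnostic_loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.full_like(logits, 0.5))
(seg_loss + 0.1 * agnostic_loss).backward()
```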

AHINE: Adaptive Heterogeneous Information Network Embedding

Title AHINE: Adaptive Heterogeneous Information Network Embedding
Authors Yucheng Lin, Xiaoqing Yang, Zang Li, Jieping Ye
Abstract Network embedding is an effective way to solve network analytics problems such as node classification and link prediction. It represents network elements using low-dimensional vectors such that the graph's structural information and properties are maximally preserved. Many prior works focused on embeddings for networks with a single type of edge or vertex, while some tried to generate embeddings for heterogeneous networks using mechanisms like specially designed meta-paths. In this paper, we propose two novel algorithms, GHINE (General Heterogeneous Information Network Embedding) and AHINE (Adaptive Heterogeneous Information Network Embedding), to compute distributed representations for elements in heterogeneous networks. Specifically, AHINE uses an adaptive deep model to learn network embeddings that maximize the likelihood of preserving the relationship chains between non-adjacent nodes. We apply our embeddings to a large network of points of interest (POIs) and achieve superior accuracy on several prediction problems on a ride-hailing platform. In addition, we show that AHINE outperforms state-of-the-art methods on a set of learning tasks on public datasets, including node labelling and similarity ranking in bibliographic networks.
Tasks Link Prediction, Network Embedding, Node Classification
Published 2019-08-20
URL https://arxiv.org/abs/1909.01087v1
PDF https://arxiv.org/pdf/1909.01087v1.pdf
PWC https://paperswithcode.com/paper/ahine-adaptive-heterogeneous-information
Repo
Framework
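
AHINE's relationship-chain objective builds on the generic walk-based embedding signal. The sketch below shows only that generic skip-gram-with-negative-sampling ingredient, not AHINE's adaptive deep model; all sizes and the sampling of node pairs are placeholders.

```python
import torch
import torch.nn as nn

num_nodes, dim = 1000, 64
emb_in = nn.Embedding(num_nodes, dim)     # node-as-center embeddings
emb_out = nn.Embedding(num_nodes, dim)    # node-as-context embeddings
opt = torch.optim.Adam([*emb_in.parameters(), *emb_out.parameters()], lr=1e-2)

centers = torch.randint(0, num_nodes, (256,))    # nodes on a walk
contexts = torch.randint(0, num_nodes, (256,))   # their walk neighbours
negatives = torch.randint(0, num_nodes, (256, 5))

# Skip-gram with negative sampling: co-occurring nodes score high,
# random nodes score low.
pos = (emb_in(centers) * emb_out(contexts)).sum(-1)
neg = torch.bmm(emb_out(negatives), emb_in(centers).unsqueeze(-1)).squeeze(-1)
loss = -(nn.functional.logsigmoid(pos).mean()
         + nn.functional.logsigmoid(-neg).mean())
opt.zero_grad(); loss.backward(); opt.step()
```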

Optimal input configuration of dynamic contrast enhanced MRI in convolutional neural networks for liver segmentation

Title Optimal input configuration of dynamic contrast enhanced MRI in convolutional neural networks for liver segmentation
Authors Mariëlle J. A. Jansen, Hugo J. Kuijf, Josien P. W. Pluim
Abstract Most MRI liver segmentation methods use a structural 3D scan as input, such as a T1- or T2-weighted scan. Segmentation performance may be improved by utilizing both structural and functional information, as contained in dynamic contrast-enhanced (DCE) MR series. Dynamic information can be incorporated into a segmentation method based on convolutional neural networks (CNNs) in a number of ways. This study examines the optimal input configuration of DCE-MR images for CNNs, comparing three configurations on a liver segmentation task: I) one phase image of the DCE-MR series as the input image; II) the separate phases of the DCE-MR series as separate input images; and III) the separate phases of the DCE-MR series as channels of one input image. The three input configurations are fed into a dilated fully convolutional network and into a small U-net. The CNNs were trained using 19 annotated DCE-MR series and tested on another 19 annotated DCE-MR series. The performance of the three input configurations for both networks is evaluated against manual annotations. The results show that both neural networks perform better when the separate phases of the DCE-MR series are used as channels of one input image, compared to one phase as the input image or the separate phases as separate input images. No significant difference was found between the performances of the two network architectures when the separate phases were used as channels of one input image.
Tasks Liver Segmentation
Published 2019-08-22
URL https://arxiv.org/abs/1908.08251v1
PDF https://arxiv.org/pdf/1908.08251v1.pdf
PWC https://paperswithcode.com/paper/optimal-input-configuration-of-dynamic
Repo
Framework
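
The three configurations differ only in how the phase images are arranged into CNN input tensors, which can be made concrete with shapes (the phase count of six matches the related DCE-MR papers; the resolution is illustrative):

```python
import torch

phases = [torch.randn(1, 1, 256, 256) for _ in range(6)]  # six DCE phases

# I)  one phase as the input image
x_single = phases[0]                     # (1, 1, 256, 256)

# II) separate phases as separate input images
#     (processed independently, e.g. predictions merged afterwards)
x_separate = torch.cat(phases, dim=0)    # (6, 1, 256, 256)

# III) separate phases as channels of ONE input image -
#      the configuration that performed best in this study
x_channels = torch.cat(phases, dim=1)    # (1, 6, 256, 256)
```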

Unsupervised Domain Adaptation via Disentangled Representations: Application to Cross-Modality Liver Segmentation

Title Unsupervised Domain Adaptation via Disentangled Representations: Application to Cross-Modality Liver Segmentation
Authors Junlin Yang, Nicha C. Dvornek, Fan Zhang, Julius Chapiro, MingDe Lin, James S. Duncan
Abstract A deep learning model trained on labeled data from a certain source domain generally performs poorly on data from different target domains due to domain shift. Unsupervised domain adaptation methods address this problem by alleviating the domain shift between the labeled source data and the unlabeled target data. In this work, we achieve cross-modality domain adaptation, i.e. between CT and MRI images, via disentangled representations. Compared to learning a one-to-one mapping, as the state-of-the-art CycleGAN does, our model recovers a many-to-many mapping between domains to capture the complex cross-domain relations. It preserves semantic feature-level information by finding a shared content space instead of a direct pixel-wise style transfer. Domain adaptation is achieved in two steps. First, images from each domain are embedded into two spaces, a shared domain-invariant content space and a domain-specific style space. Next, the representation in the content space is extracted to perform a task. We validated our method on a cross-modality liver segmentation task, training a liver segmentation model on CT images that also performs well on MRI. Our method achieved a Dice Similarity Coefficient (DSC) of 0.81, outperforming a CycleGAN-based method at 0.72. Moreover, our model generalizes well to joint-domain learning, in which unpaired data from different modalities are learned jointly to improve the segmentation performance on each individual modality. Lastly, under a multi-modal target domain with significant diversity, our approach exhibited the potential for diverse image generation and remained effective, with a DSC of 0.74 on multi-phasic MRI, while the CycleGAN-based method performed poorly with a DSC of only 0.52.
Tasks Domain Adaptation, Image Generation, Liver Segmentation, Style Transfer, Unsupervised Domain Adaptation
Published 2019-07-31
URL https://arxiv.org/abs/1907.13590v2
PDF https://arxiv.org/pdf/1907.13590v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-via-1
Repo
Framework
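
For reference, the reported numbers use the standard Dice similarity coefficient over binary masks, computable as follows (the toy masks are illustrative):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient (DSC) for binary masks:
    2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8); gt[15:45, 15:45] = 1
print(round(dice(pred, gt), 3))   # 0.694 for these toy masks
```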

Target-Oriented Deformation of Visual-Semantic Embedding Space

Title Target-Oriented Deformation of Visual-Semantic Embedding Space
Authors Takashi Matsubara
Abstract Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and modality-specific information, it is insufficient to simply embed related entities close to each other. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby adjusting similarities between entities. Unlike methods based on cross-modal attention, TOD-Net is a post-process applied to an embedding space learned by an existing embedding system, and it improves retrieval performance. In particular, when combined with cutting-edge models, TOD-Net achieves state-of-the-art cross-modal retrieval results on the MSCOCO dataset. Qualitative analysis reveals that TOD-Net successfully emphasizes entity-specific concepts and retrieves diverse targets by handling higher levels of diversity than existing models.
Tasks Cross-Modal Retrieval
Published 2019-10-15
URL https://arxiv.org/abs/1910.06514v1
PDF https://arxiv.org/pdf/1910.06514v1.pdf
PWC https://paperswithcode.com/paper/target-oriented-deformation-of-visual
Repo
Framework
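
A post-process that deforms a learned embedding under a condition can be sketched as a residual MLP applied on top of a frozen visual-semantic model. The actual TOD-Net conditioning and training objective are richer; the names and sizes here are assumptions.

```python
import torch
import torch.nn as nn

class TODNet(nn.Module):
    """Sketch of a target-oriented deformation module: a residual MLP
    that deforms a pre-trained embedding conditioned on the query from
    the other modality."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, emb, condition):
        # Residual form keeps the deformation a small, continuous
        # adjustment of the original embedding space.
        return emb + self.mlp(torch.cat([emb, condition], dim=-1))

tod = TODNet()
img_emb = torch.randn(8, 512)     # from a frozen visual-semantic model
txt_query = torch.randn(8, 512)
scores = (tod(img_emb, txt_query) * txt_query).sum(-1)  # re-ranked similarity
```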

Are sample means in multi-armed bandits positively or negatively biased?

Title Are sample means in multi-armed bandits positively or negatively biased?
Authors Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
Abstract It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is typically not an unbiased estimator of its true mean. In this paper, we decouple three different sources of this selection bias: adaptive *sampling* of arms, adaptive *stopping* of the experiment, and adaptively *choosing* which arm to study. Through a new notion called "optimism" that captures certain natural monotonic behaviors of algorithms, we provide a clean and unified analysis of how optimistic rules affect the sign of the bias. The main takeaway message is that optimistic sampling induces a negative bias, but optimistic stopping and optimistic choosing both induce a positive bias. These results are derived in a general stochastic MAB setup that is entirely agnostic to the final aim of the experiment (regret minimization, best-arm identification, or anything else). We provide examples of optimistic rules of each type, demonstrate that simulations confirm our theoretical predictions, and pose some natural but hard open problems.
Tasks Multi-Armed Bandits
Published 2019-05-27
URL https://arxiv.org/abs/1905.11397v2
PDF https://arxiv.org/pdf/1905.11397v2.pdf
PWC https://paperswithcode.com/paper/the-bias-of-the-sample-mean-in-multi-armed
Repo
Framework
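
The claim that optimistic sampling induces a negative bias is easy to check by simulation, as the abstract notes. Below is a minimal Monte-Carlo sketch using a greedy sampler (one natural example of an optimistic sampling rule) on two arms of equal mean, with a fixed (non-adaptive) stopping time.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, T, reps = 0.0, 100, 5000
biases = []
for _ in range(reps):
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(T):
        if t < 2:
            a = t                              # pull each arm once
        else:
            a = int(np.argmax(sums / counts))  # greedy sampling
        sums[a] += rng.normal(mu, 1.0)
        counts[a] += 1
    biases.append(sums[0] / counts[0] - mu)    # bias of arm 0's sample mean
print(np.mean(biases))  # noticeably below 0, as the theory predicts
```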

Bayesian Structure Adaptation for Continual Learning

Title Bayesian Structure Adaptation for Continual Learning
Authors Abhishek Kumar, Sunabha Chatterjee, Piyush Rai
Abstract Continual learning is a paradigm in which learning systems are trained on sequential or streaming tasks. Two notable directions among the recent advances in continual learning with neural networks are (i) variational-Bayes-based regularization, learning priors from previous tasks, and (ii) learning the structure of deep networks to adapt to new tasks. So far, these two approaches have been orthogonal. We present a novel Bayesian approach to continual learning based on learning the structure of deep neural networks, addressing the shortcomings of both approaches. The proposed model learns the deep structure for each task by learning which weights to use, and supports inter-task transfer through the overlap of the different sparse subsets of weights learned by different tasks. Experimental results on supervised and unsupervised benchmarks show that our model performs comparably to or better than recent advances in continual learning.
Tasks Continual Learning
Published 2019-12-08
URL https://arxiv.org/abs/1912.03624v2
PDF https://arxiv.org/pdf/1912.03624v2.pdf
PWC https://paperswithcode.com/paper/nonparametric-bayesian-structure-adaptation
Repo
Framework
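
"Learning which weights to use" with overlapping sparse subsets suggests per-task masks over shared parameters. The paper's model is Bayesian; the snippet below is only a deterministic relaxation of that mechanism, with all names and sizes illustrative.

```python
import torch
import torch.nn as nn

shared_w = nn.Parameter(torch.randn(64, 32) * 0.1)   # weights shared by tasks
task_mask_logits = nn.ParameterList(
    [nn.Parameter(torch.zeros(64, 32)) for _ in range(3)])  # 3 tasks

def layer_forward(x, task_id):
    # Each task selects its own (soft) subset of the shared weights;
    # overlap between masks is what enables inter-task transfer.
    mask = torch.sigmoid(task_mask_logits[task_id])
    return x @ (shared_w * mask).t()

x = torch.randn(5, 32)
print(layer_forward(x, task_id=0).shape)   # torch.Size([5, 64])
```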

Architectural configurations, atlas granularity and functional connectivity with diagnostic value in Autism Spectrum Disorder

Title Architectural configurations, atlas granularity and functional connectivity with diagnostic value in Autism Spectrum Disorder
Authors Cooper J. Mellema, Alex Treacher, Kevin P. Nguyen, Albert Montillo
Abstract Currently, the diagnosis of Autism Spectrum Disorder (ASD) depends on a subjective, time-consuming evaluation of behavioral tests by an expert clinician. Non-invasive functional MRI (fMRI) characterizes brain connectivity and may be used to inform diagnoses and democratize medicine. However, successfully constructing deep learning models from fMRI requires addressing key choices about the model’s architecture, including the number of layers and the number of neurons per layer. Meanwhile, deriving functional connectivity (FC) features from fMRI requires choosing an atlas with an appropriate level of granularity. Once a model has been built, it is vital to determine which features are predictive of ASD and whether similar features are learned across atlas granularity levels. To identify well-suited architectural configurations, probability distributions of the configurations of high- versus low-performing models are compared. To determine the effect of atlas granularity, connectivity features are derived from atlases with three levels of granularity and important features are ranked with permutation feature importance. Results show the highest-performing models use 2-4 hidden layers and 16-64 neurons per layer, depending on granularity. Connectivity features identified as important across all three atlas granularity levels include FC to the supplementary motor gyrus and language association cortex, regions associated with deficits in social and sensory processing in ASD. Importantly, the cerebellum, often not included in functional analyses, is also identified as a region whose abnormal connectivity is highly predictive of ASD. The results of this study identify important regions to include in future studies of ASD, assist in the selection of network architectures, and help identify appropriate levels of granularity to facilitate the development of accurate diagnostic models of ASD.
Tasks Feature Importance
Published 2019-11-25
URL https://arxiv.org/abs/1911.11024v1
PDF https://arxiv.org/pdf/1911.11024v1.pdf
PWC https://paperswithcode.com/paper/architectural-configurations-atlas
Repo
Framework
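
The feature-ranking step uses permutation feature importance, which scikit-learn implements directly: shuffle one feature at a time and measure the drop in model score. The sketch below uses synthetic data in place of the FC features.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for connectivity features derived from an atlas.
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                    random_state=0).fit(X, y)

# Permute each feature n_repeats times and record the mean score drop.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
top = result.importances_mean.argsort()[::-1][:5]
print("most important feature indices:", top)
```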

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

Title Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Authors Shweta Mahajan, Teresa Botschen, Iryna Gurevych, Stefan Roth
Abstract One of the key challenges in learning joint embeddings of multiple modalities, e.g. of images and text, is to ensure coherent cross-modal semantics that generalize across datasets. We propose to address this through joint Gaussian regularization of the latent representations. Building on Wasserstein autoencoders (WAEs) to encode the input in each domain, we enforce the latent embeddings to be similar to a Gaussian prior that is shared across the two domains, ensuring compatible continuity of the encoded semantic representations of images and texts. Semantic alignment is achieved through supervision from matching image-text pairs. To show the benefits of our semi-supervised representation, we apply it to cross-modal retrieval and phrase localization. We not only achieve state-of-the-art accuracy, but also significantly better generalization across datasets, owing to the semantic continuity of the latent space.
Tasks Cross-Modal Retrieval
Published 2019-09-14
URL https://arxiv.org/abs/1909.06635v1
PDF https://arxiv.org/pdf/1909.06635v1.pdf
PWC https://paperswithcode.com/paper/joint-wasserstein-autoencoders-for-aligning
Repo
Framework
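
The two penalties the abstract names, a shared Gaussian prior on both latent spaces and supervision from matched pairs, can be sketched as follows. The moment-matching prior penalty is a crude stand-in for the divergence a WAE would use (e.g. MMD), and the encoders and weights are placeholders.

```python
import torch
import torch.nn as nn

img_enc = nn.Sequential(nn.Linear(2048, 256))  # image-feature encoder
txt_enc = nn.Sequential(nn.Linear(300, 256))   # text-feature encoder

img = torch.randn(32, 2048)                    # image features
txt = torch.randn(32, 300)                     # text features (paired rows)

z_i, z_t = img_enc(img), txt_enc(txt)

def gaussian_penalty(z):
    # Match the first two moments of N(0, I) - a simple proxy for the
    # MMD/adversarial divergence a WAE would minimize to the prior.
    return z.mean(0).pow(2).mean() + (z.var(0) - 1).pow(2).mean()

prior_loss = gaussian_penalty(z_i) + gaussian_penalty(z_t)  # SHARED prior
align_loss = (z_i - z_t).pow(2).mean()   # supervision from matched pairs
(prior_loss + align_loss).backward()
```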

An information theoretic approach to the autoencoder

Title An information theoretic approach to the autoencoder
Authors Vincenzo Crescimanna, Bruce Graham
Abstract We present a variation of the autoencoder (AE) that explicitly maximizes the mutual information between the input data and the hidden representation. The proposed model, the InfoMax Autoencoder (IMAE), is by construction able to learn a robust representation and good prototypes of the data. IMAE is compared both theoretically and computationally with state-of-the-art models: the Denoising and Contractive Autoencoders in the one-hidden-layer setting and the Variational Autoencoder in the multi-layer case. Computational experiments are performed on the MNIST and Fashion-MNIST datasets and demonstrate, in particular, the strong clustering performance of IMAE.
Tasks Denoising
Published 2019-01-23
URL http://arxiv.org/abs/1901.08019v1
PDF http://arxiv.org/pdf/1901.08019v1.pdf
PWC https://paperswithcode.com/paper/an-information-theoretic-approach-to-the
Repo
Framework
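
The abstract does not spell out the exact IMAE objective, so the sketch below only illustrates the general infomax principle it builds on: with a noisy code, improving reconstruction tightens a standard lower bound on the input-code mutual information. This is an assumption-laden stand-in, not the paper's training loss.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

x = torch.rand(64, 784)                  # stand-in for an MNIST batch
h = enc(x) + 0.1 * torch.randn(64, 32)   # noise keeps I(x; h) finite
# Minimizing reconstruction error maximizes a lower bound on I(x; h).
recon_loss = nn.functional.mse_loss(dec(h), x)
opt.zero_grad(); recon_loss.backward(); opt.step()
```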