February 1, 2020

3244 words 16 mins read

Paper Group AWR 222


SMART tracking: Simultaneous anatomical imaging and real-time passive device tracking for MR-guided interventions

Title SMART tracking: Simultaneous anatomical imaging and real-time passive device tracking for MR-guided interventions
Authors Frank Zijlstra, Max A. Viergever, Peter R. Seevinck
Abstract Purpose: This study demonstrates a proof of concept of a method for simultaneous anatomical imaging and real-time (SMART) passive device tracking for MR-guided interventions. Methods: Phase Correlation template matching was combined with a fast undersampled radial multi-echo acquisition using the white marker phenomenon after the first echo. In this way, the first echo provides anatomical contrast, whereas the other echoes provide white marker contrast to allow accurate device localization using fast simulations and template matching. This approach was tested on tracking of five 0.5 mm steel markers in an agarose phantom and on insertion of an MRI-compatible 20 Gauge titanium needle in ex vivo porcine tissue. The locations of the steel markers were quantitatively compared to the marker locations as found on a CT scan of the same phantom. Results: The average pairwise error between the MRI and CT locations was 0.30 mm for tracking of stationary steel spheres and 0.29 mm during motion. Qualitative evaluation of the tracking of needle insertions showed that tracked positions were stable throughout needle insertion and retraction. Conclusions: The proposed SMART tracking method provided accurate passive tracking of devices at high framerates, inclusion of real-time anatomical scanning, and the capability of automatic slice positioning. Furthermore, the method does not require specialized hardware and could therefore be applied to track any rigid metal device that causes appreciable magnetic field distortions.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10769v1
PDF https://arxiv.org/pdf/1908.10769v1.pdf
PWC https://paperswithcode.com/paper/smart-tracking-simultaneous-anatomical
Repo https://github.com/FrankZijlstra/SmartTracking
Framework none
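
The device localization above rests on phase-correlation template matching. Below is a minimal NumPy sketch of 2D phase correlation locating a small template inside an image; the synthetic data, template size, and integer-pixel peak (no sub-pixel refinement, no simulated white-marker templates) are illustrative assumptions, not the authors' released SmartTracking code.

```python
# Minimal sketch of phase-correlation template matching for locating a small
# marker template inside a 2D image. Illustrative only.
import numpy as np

def phase_correlation(image, template):
    """Return the (row, col) position at which `template` best matches `image`."""
    # Zero-pad the template to the image size so the FFTs are compatible.
    padded = np.zeros_like(image, dtype=float)
    padded[:template.shape[0], :template.shape[1]] = template

    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(padded)

    # Normalized cross-power spectrum; epsilon avoids division by zero.
    cross_power = F_img * np.conj(F_tpl)
    cross_power /= np.abs(cross_power) + 1e-12

    correlation = np.fft.ifft2(cross_power).real
    return np.unravel_index(np.argmax(correlation), correlation.shape)

# Example: locate a template cut from a synthetic image.
rng = np.random.default_rng(0)
image = rng.normal(size=(128, 128))
template = image[40:48, 60:68].copy()
print(phase_correlation(image, template))  # expected ~(40, 60)
```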

exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models

Title exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models
Authors Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann
Abstract Large language models can produce powerful contextual representations that lead to improvements across many NLP tasks. Since these models are typically guided by a sequence of learned self attention mechanisms and may comprise undesired inductive biases, it is paramount to be able to explore what the attention has learned. While static analyses of these models lead to targeted insights, interactive tools are more dynamic and can help humans better gain an intuition for the model-internal reasoning process. We present exBERT, an interactive tool named after the popular BERT language model, that provides insights into the meaning of the contextual representations by matching a human-specified input to similar contexts in a large annotated dataset. By aggregating the annotations of the matching similar contexts, exBERT helps intuitively explain what each attention-head has learned.
Tasks Language Modelling
Published 2019-10-11
URL https://arxiv.org/abs/1910.05276v1
PDF https://arxiv.org/pdf/1910.05276v1.pdf
PWC https://paperswithcode.com/paper/exbert-a-visual-analysis-tool-to-explore
Repo https://github.com/common-english/bert-all
Framework pytorch
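
The corpus-search idea in the abstract can be sketched as a nearest-neighbour lookup over contextual token embeddings, followed by aggregating the neighbours' annotations. The checkpoint, the three-sentence "corpus", and the POS tags below are placeholder assumptions, not the exBERT codebase.

```python
# Rough sketch of exBERT's core idea: embed a query token with BERT, find the most
# similar token contexts in an annotated corpus, and aggregate their annotations.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embedding(sentence, token_index, layer=-1):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[layer]
    return hidden[0, token_index]  # contextual vector for one word piece

# Tiny stand-in for a large annotated corpus: (sentence, token index, POS tag).
corpus = [
    ("the bank approved the loan", 2, "NOUN"),
    ("she sat by the river bank", 6, "NOUN"),
    ("they bank with a credit union", 2, "VERB"),
]
corpus_vecs = torch.stack([token_embedding(s, i) for s, i, _ in corpus])

query = token_embedding("he went to the bank to deposit cash", 5)  # "bank"
sims = torch.nn.functional.cosine_similarity(query[None], corpus_vecs)
best = sims.argsort(descending=True)[:2]
print([corpus[int(i)][2] for i in best])  # tags of the two nearest contexts
```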

Extracting Success from IBM’s 20-Qubit Machines Using Error-Aware Compilation

Title Extracting Success from IBM’s 20-Qubit Machines Using Error-Aware Compilation
Authors Shin Nishio, Yulu Pan, Takahiko Satoh, Hideharu Amano, Rodney Van Meter
Abstract NISQ (Noisy, Intermediate-Scale Quantum) computing requires error mitigation to achieve meaningful computation. Our compilation tool development focuses on the fact that the error rates of individual qubits are not equal, with a goal of maximizing the success probability of real-world subroutines such as an adder circuit. We begin by establishing a metric for choosing among possible paths and circuit alternatives for executing gates between variables placed far apart within the processor, and test our approach on two IBM 20-qubit systems named Tokyo and Poughkeepsie. We find that a single-number metric describing the fidelity of individual gates is a useful but imperfect guide. Our compiler uses this subsystem and maps complete circuits onto the machine using a beam search-based heuristic that will scale as processor and program sizes grow. To evaluate the whole compilation process, we compiled and executed adder circuits, then calculated the KL-divergence (a measure of the distance between two probability distributions). For a circuit within the capabilities of the hardware, our compilation increases estimated success probability and reduces KL-divergence relative to an error-oblivious placement.
Tasks
Published 2019-03-26
URL http://arxiv.org/abs/1903.10963v1
PDF http://arxiv.org/pdf/1903.10963v1.pdf
PWC https://paperswithcode.com/paper/extracting-success-from-ibms-20-qubit
Repo https://github.com/parton-quark/parton-quark.github.io
Framework none
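
Two pieces of the evaluation described above can be made concrete: scoring a candidate routing path by the product of its two-qubit gate success rates, and measuring the distance between the measured and ideal output distributions with KL-divergence. The calibration numbers, the 4-qubit line, and the direction of the KL comparison below are placeholder assumptions, not the authors' compiler.

```python
import math

def path_success(path, cx_error):
    """Estimated success probability of routing along `path` (a list of qubits),
    given per-edge CNOT error rates in the dict `cx_error`."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= 1.0 - cx_error[tuple(sorted((a, b)))]
    return prob

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over bitstring probabilities given as dicts."""
    return sum(pk * math.log(pk / max(q.get(k, 0.0), eps))
               for k, pk in p.items() if pk > 0)

# Hypothetical calibration data for a 4-qubit line 0-1-2-3.
cx_error = {(0, 1): 0.02, (1, 2): 0.05, (2, 3): 0.01}
print(path_success([0, 1, 2, 3], cx_error))   # ~0.92

ideal = {"00": 0.5, "11": 0.5}
measured = {"00": 0.46, "11": 0.44, "01": 0.06, "10": 0.04}
print(kl_divergence(ideal, measured))          # ~0.11
```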

Multi-step Cascaded Networks for Brain Tumor Segmentation

Title Multi-step Cascaded Networks for Brain Tumor Segmentation
Authors Xiangyu Li, Gongning Luo, Kuanquan Wang
Abstract Automatic brain tumor segmentation plays an extremely important role in the whole process of brain tumor diagnosis and treatment. In this paper, we propose a multi-step cascaded network which takes the hierarchical topology of the brain tumor substructures into consideration and segments the substructures from coarse to fine. During segmentation, the result of the former step is utilized as prior information for the next step to guide the finer segmentation process. The whole network is trained in an end-to-end fashion. Besides, to alleviate the gradient vanishing issue and reduce overfitting, we added several auxiliary outputs as a kind of deep supervision for each step and introduced several data augmentation strategies, which proved to be quite effective for brain tumor segmentation. Lastly, focal loss is utilized to address the marked imbalance between the tumor regions and the background. Our model is tested on the BraTS 2019 validation dataset; the preliminary mean dice coefficients are 0.886, 0.813 and 0.771 for the whole tumor, tumor core and enhancing tumor respectively. Code is available at https://github.com/JohnleeHIT/Brats2019
Tasks Brain Tumor Segmentation, Data Augmentation
Published 2019-08-16
URL https://arxiv.org/abs/1908.05887v3
PDF https://arxiv.org/pdf/1908.05887v3.pdf
PWC https://paperswithcode.com/paper/multi-step-cascaded-networks-for-brain-tumor
Repo https://github.com/JohnleeHIT/Brats2019
Framework tf
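
The focal loss mentioned in the abstract down-weights easy, well-classified voxels so the background does not dominate training. A minimal voxel-wise multi-class sketch follows; it uses PyTorch for consistency with the other examples in this post (the paper's released code is TensorFlow), and the gamma value and tensor shapes are placeholders.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """logits: (N, C, ...) raw scores; targets: (N, ...) integer class labels."""
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, targets, reduction="none")   # per-voxel cross-entropy
    p_t = torch.exp(-ce)                                 # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()            # down-weight easy voxels

logits = torch.randn(2, 4, 8, 8, 8)            # batch of 2, 4 classes, 8^3 voxels
targets = torch.randint(0, 4, (2, 8, 8, 8))
print(focal_loss(logits, targets, gamma=2.0))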

Similarity of Neural Network Representations Revisited

Title Similarity of Neural Network Representations Revisited
Authors Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton
Abstract Recent work has sought to understand the behavior of neural networks by comparing representations between layers and between different trained models. We examine methods for comparing neural network representations based on canonical correlation analysis (CCA). We show that CCA belongs to a family of statistics for measuring multivariate similarity, but that neither CCA nor any other statistic that is invariant to invertible linear transformation can measure meaningful similarities between representations of higher dimension than the number of data points. We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation. This similarity index is equivalent to centered kernel alignment (CKA) and is also closely connected to CCA. Unlike CCA, CKA can reliably identify correspondences between representations in networks trained from different initializations.
Tasks
Published 2019-05-01
URL https://arxiv.org/abs/1905.00414v4
PDF https://arxiv.org/pdf/1905.00414v4.pdf
PWC https://paperswithcode.com/paper/similarity-of-neural-network-representations
Repo https://github.com/yl2488/CKA-Centered-Kernel-Alignment
Framework none
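
The linear form of the CKA similarity index described above has a compact closed form. The sketch below implements it on synthetic data; the matrices are made up, but the formula is the centered kernel alignment with a linear kernel from the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n x d1) and Y (n x d2)."""
    # Center features so the Gram matrices correspond to centered kernels.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                 # 100 examples, 64-dim layer
W = rng.normal(size=(64, 64))                  # an (almost surely) invertible map
print(linear_cka(X, X))                        # 1.0
print(linear_cka(X, X @ W))                    # high but below 1: CKA is not invariant
                                               # to arbitrary invertible linear maps
print(linear_cka(X, rng.normal(size=(100, 32))))  # near 0 for unrelated features
```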

Poisson-Randomized Gamma Dynamical Systems

Title Poisson-Randomized Gamma Dynamical Systems
Authors Aaron Schein, Scott W. Linderman, Mingyuan Zhou, David M. Blei, Hanna Wallach
Abstract This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian latent variable modeling, an alternating chain of discrete Poisson and continuous gamma latent states that is analytically convenient and computationally tractable. This motif yields closed-form complete conditionals for all variables by way of the Bessel distribution and a novel discrete distribution that we call the shifted confluent hypergeometric distribution. We draw connections to closely related models and compare the PRGDS to these models in studies of real-world count data sets of text, international events, and neural spike trains. We find that a sparse variant of the PRGDS, which allows the continuous gamma latent states to take values of exactly zero, often obtains better predictive performance than other models and is uniquely capable of inferring latent structures that are highly localized in time.
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1910.12991v1
PDF https://arxiv.org/pdf/1910.12991v1.pdf
PWC https://paperswithcode.com/paper/poisson-randomized-gamma-dynamical-systems
Repo https://github.com/aschein/PRGDS
Framework none
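
The alternating discrete/continuous chain the abstract describes can be sketched generatively: Poisson latent counts randomize the shape of the next gamma state, and observed counts are Poisson given the current gamma state. The hyperparameters, dimensions, and factorization below are simplified placeholders, not the exact PRGDS specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, V = 20, 5, 30                        # time steps, latent components, vocabulary size
Pi = rng.dirichlet(np.ones(K), size=K)     # K x K latent transition weights
Phi = rng.dirichlet(np.ones(V), size=K)    # K x V emission factors
eps0, tau = 0.1, 1.0                       # a sparse variant would allow eps0 = 0

theta = rng.gamma(1.0, 1.0, size=K)        # initial continuous (gamma) state
Y = np.zeros((T, V), dtype=int)
for t in range(T):
    h = rng.poisson(tau * (Pi.T @ theta))        # discrete Poisson state
    theta = rng.gamma(eps0 + h, 1.0 / tau)       # gamma state with Poisson-randomized shape
    Y[t] = rng.poisson(Phi.T @ theta)            # observed counts at time t

print(Y.sum(axis=1))   # total counts per step; bursty when a latent component spikes
```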

Penalty Method for Inversion-Free Deep Bilevel Optimization

Title Penalty Method for Inversion-Free Deep Bilevel Optimization
Authors Akshay Mehra, Jihun Hamm
Abstract Bilevel optimization problems are at the center of several important machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, data poisoning. Different from simultaneous or multi-objective optimization, bilevel optimization requires computing the inverse of the Hessian of the lower-level cost function to obtain the exact descent direction for the upper-level cost. In this paper, we propose a new method for solving deep bilevel optimization problems using the penalty function which avoids computing the inverse. We prove convergence of our method under mild conditions and show that it computes the exact hypergradient asymptotically. Small space and time complexity of our method enables us to solve large-scale bilevel problems involving deep neural networks with several million parameters. We present results of our method for data denoising on MNIST/CIFAR10/SVHN datasets, for few-shot learning on Omniglot/Mini-Imagenet datasets and for training-data poisoning on MNIST/Imagenet datasets. In all experiments, our method outperforms or is comparable to previously proposed methods both in terms of accuracy and run-time.
Tasks bilevel optimization, data poisoning, Denoising, Few-Shot Learning, Omniglot
Published 2019-11-08
URL https://arxiv.org/abs/1911.03432v4
PDF https://arxiv.org/pdf/1911.03432v4.pdf
PWC https://paperswithcode.com/paper/penalty-method-for-inversion-free-deep
Repo https://github.com/jihunhamm/bilevel-penalty
Framework tf
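
The inversion-free idea can be illustrated on a toy problem: instead of differentiating through argmin_y g(x, y) (which requires Hessian inverses), jointly minimize the upper-level cost plus a penalty on the lower-level stationarity condition ||grad_y g(x, y)||^2, tightening the penalty over time. The quadratic toy problem, learning rate, and penalty schedule below are made-up assumptions, not the released bilevel-penalty code.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)   # upper-level variable
y = torch.tensor(0.0, requires_grad=True)   # lower-level variable

def f(y):          # upper-level cost
    return (y - 1.0) ** 2

def g(x, y):       # lower-level cost, minimized at y* = x
    return (y - x) ** 2

opt = torch.optim.SGD([x, y], lr=0.01)
gamma = 0.1
for step in range(3000):
    gy, = torch.autograd.grad(g(x, y), y, create_graph=True)
    loss = f(y) + 0.5 * gamma * gy ** 2      # penalize violation of grad_y g = 0
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step in (1000, 2000):
        gamma *= 10.0                        # gradually tighten the penalty

print(round(x.item(), 3), round(y.item(), 3))   # both approach the bilevel solution x = y = 1
```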

TAPER: Time-Aware Patient EHR Representation

Title TAPER: Time-Aware Patient EHR Representation
Authors Sajad Darabi, Mohammad Kachuee, Shayan Fazeli, Majid Sarrafzadeh
Abstract Effective representation learning of electronic health records is a challenging task and is becoming more important as the availability of such data is becoming pervasive. The data contained in these records are irregular and contain multiple modalities such as notes, and medical codes. They are preempted by medical conditions the patient may have, and are typically jotted down by medical staff. Accompanying codes are notes containing valuable information about patients beyond the structured information contained in electronic health records. We use transformer networks and the recently proposed BERT language model to embed these data streams into a unified vector representation. The presented approach effectively encodes a patient’s visit data into a single distributed representation, which can be used for downstream tasks. Our model demonstrates superior performance and generalization on mortality, readmission and length of stay tasks using the publicly available MIMIC-III ICU dataset.
Tasks Language Modelling, Representation Learning
Published 2019-08-11
URL https://arxiv.org/abs/1908.03971v3
PDF https://arxiv.org/pdf/1908.03971v3.pdf
PWC https://paperswithcode.com/paper/taper-time-aware-patient-ehr-representation
Repo https://github.com/sajaddarabi/TAPER
Framework pytorch
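
The visit representation the abstract describes can be sketched as two embedded streams, one for notes and one for medical codes, concatenated into a single vector. The checkpoint, embedding dimensions, pooling, and code vocabulary below are placeholder choices, not the exact TAPER architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")
code_embed = nn.EmbeddingBag(num_embeddings=5000, embedding_dim=128, mode="mean")

def visit_representation(note_text, code_ids):
    tokens = tokenizer(note_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        note_vec = text_encoder(**tokens).last_hidden_state[:, 0]   # [CLS] vector
    code_vec = code_embed(torch.tensor([code_ids]))                 # pooled code vector
    return torch.cat([note_vec, code_vec], dim=-1)                  # (1, 768 + 128)

rep = visit_representation("admitted with chest pain, troponin elevated",
                           [401, 428, 4019])    # hypothetical code-vocabulary ids
print(rep.shape)   # torch.Size([1, 896])
```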

CrypTFlow: Secure TensorFlow Inference

Title CrypTFlow: Secure TensorFlow Inference
Authors Nishant Kumar, Mayank Rathee, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma
Abstract We present CrypTFlow, a first-of-its-kind system that converts TensorFlow inference code into Secure Multi-party Computation (MPC) protocols at the push of a button. To do this, we build three components. Our first component, Athos, is an end-to-end compiler from TensorFlow to a variety of semi-honest MPC protocols. The second component, Porthos, is an improved semi-honest 3-party protocol that provides significant speedups for TensorFlow-like applications. Finally, to provide malicious secure MPC protocols, our third component, Aramis, is a novel technique that uses hardware with integrity guarantees to convert any semi-honest MPC protocol into an MPC protocol that provides malicious security. The malicious security of the protocols output by Aramis relies on integrity of the hardware and semi-honest security of MPC. Moreover, our system matches the inference accuracy of plaintext TensorFlow. We experimentally demonstrate the power of our system by showing the secure inference of real-world neural networks such as ResNet50 and DenseNet121 over the ImageNet dataset with running times of about 30 seconds for semi-honest security and under two minutes for malicious security. Prior work in the area of secure inference has been limited to semi-honest security of small networks over tiny datasets such as MNIST or CIFAR. Even on MNIST/CIFAR, CrypTFlow outperforms prior work.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.07814v2
PDF https://arxiv.org/pdf/1909.07814v2.pdf
PWC https://paperswithcode.com/paper/cryptflow-secure-tensorflow-inference
Repo https://github.com/mpc-msri/EzPC
Framework tf
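
CrypTFlow compiles whole TensorFlow graphs to MPC protocols; as a purely didactic illustration of the underlying idea, the sketch below shows 3-party additive secret sharing of an input vector and a public linear layer evaluated locally on the shares, so no single party sees the plaintext. This is not Athos, Porthos, or Aramis, and real protocols also need secure multiplication, truncation, and non-linear layers.

```python
import numpy as np

MOD = 2 ** 32            # arithmetic is done in a fixed-size ring
rng = np.random.default_rng(0)

def share(x, n_parties=3):
    """Split integer vector x into n additive shares that sum to x mod MOD."""
    shares = [rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
              for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# A public weight matrix applied locally to each share (linear ops commute with sharing).
W = rng.integers(0, 10, size=(2, 4)).astype(np.uint64)
x = np.array([3, 1, 4, 1], dtype=np.uint64)

x_shares = share(x)
y_shares = [(W @ s) % MOD for s in x_shares]   # each party computes on its share only
print(reconstruct(y_shares))                   # equals (W @ x) % MOD
print((W @ x) % MOD)
```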

HIDRA: Head Initialization across Dynamic targets for Robust Architectures

Title HIDRA: Head Initialization across Dynamic targets for Robust Architectures
Authors Rafael Rego Drumond, Lukas Brinkmeyer, Josif Grabocka, Lars Schmidt-Thieme
Abstract The performance of gradient-based optimization strategies depends heavily on the initial weights of the parametric model. Recent works show that there exist weight initializations from which optimization procedures can find the task-specific parameters faster than from uniformly random initializations and that such a weight initialization can be learned by optimizing a specific model architecture across similar tasks via MAML (Model-Agnostic Meta-Learning). Current methods are limited to populations of classification tasks that share the same number of classes due to the static model architectures used during meta-learning. In this paper, we present HIDRA, a meta-learning approach that enables training and evaluating across tasks with any number of target variables. We show that Model-Agnostic Meta-Learning trains a distribution for all the neurons in the output layer and a specific weight initialization for the ones in the hidden layers. HIDRA explores this by learning one master neuron, which is used to initialize any number of output neurons for a new task. Extensive experiments on the Miniimagenet and Omniglot data sets demonstrate that HIDRA improves over standard approaches while generalizing to tasks with any number of target variables. Moreover, our approach is shown to robustify low-capacity models in learning across complex tasks with a high number of classes for which regular MAML fails to learn any feasible initialization.
Tasks Meta-Learning, Omniglot
Published 2019-10-28
URL https://arxiv.org/abs/1910.12749v2
PDF https://arxiv.org/pdf/1910.12749v2.pdf
PWC https://paperswithcode.com/paper/hidra-head-initialization-across-dynamic
Repo https://github.com/radrumond/hidra
Framework tf
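
The head-initialization idea can be sketched in a few lines: a single meta-learned "master neuron" (one weight vector and bias over the shared feature space) is copied to create however many output neurons a new task needs, so the head size no longer has to be fixed at meta-training time. The names, dimensions, and missing MAML inner loop below are illustrative assumptions, not the HIDRA code.

```python
import torch
import torch.nn as nn

feature_dim = 64
master_weight = nn.Parameter(torch.randn(feature_dim) * 0.01)  # meta-learned
master_bias = nn.Parameter(torch.zeros(1))                     # meta-learned

def build_head(num_classes):
    """Initialize a task-specific output layer from the master neuron."""
    head = nn.Linear(feature_dim, num_classes)
    with torch.no_grad():
        head.weight.copy_(master_weight.expand(num_classes, feature_dim))
        head.bias.copy_(master_bias.expand(num_classes))
    return head

# A 5-way task and a 13-way task share the same initialization recipe;
# the inner-loop adaptation steps then differentiate the output neurons.
head5, head13 = build_head(5), build_head(13)
features = torch.randn(2, feature_dim)
print(head5(features).shape, head13(features).shape)   # (2, 5) and (2, 13)
```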

Looking Fast and Slow: Memory-Guided Mobile Video Object Detection

Title Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
Authors Mason Liu, Menglong Zhu, Marie White, Yinxiao Li, Dmitry Kalenichenko
Abstract Models and examples built with TensorFlow
Tasks Object Detection, Object Recognition, Video Object Detection
Published 2019-03-25
URL http://arxiv.org/abs/1903.10172v1
PDF http://arxiv.org/pdf/1903.10172v1.pdf
PWC https://paperswithcode.com/paper/looking-fast-and-slow-memory-guided-mobile
Repo https://github.com/vikrant7/pytorch-looking-fast-and-slow
Framework pytorch

Perceptual Evaluation of Adversarial Attacks for CNN-based Image Classification

Title Perceptual Evaluation of Adversarial Attacks for CNN-based Image Classification
Authors Sid Ahmed Fezza, Yassine Bakhti, Wassim Hamidouche, Olivier Déforges
Abstract Deep neural networks (DNNs) have recently achieved state-of-the-art performance and provide significant progress in many machine learning tasks, such as image classification, speech processing, natural language processing, etc. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. For instance, in the image classification domain, adding small imperceptible perturbations to the input image is sufficient to fool the DNN and to cause misclassification. The perturbed image, called an adversarial example, should be visually as close as possible to the original image. However, all the works proposed in the literature for generating adversarial examples have used the $L_{p}$ norms ($L_{0}$, $L_{2}$ and $L_{\infty}$) as distance metrics to quantify the similarity between the original image and the adversarial example. Nonetheless, the $L_{p}$ norms do not correlate with human judgment, making them not suitable to reliably assess the perceptual similarity/fidelity of adversarial examples. In this paper, we present a database for visual fidelity assessment of adversarial examples. We describe the creation of the database and evaluate the performance of fifteen state-of-the-art full-reference (FR) image fidelity assessment metrics that could substitute $L_{p}$ norms. The database as well as subjective scores are publicly available to help design new metrics for adversarial examples and to facilitate future research works.
Tasks Image Classification
Published 2019-06-01
URL https://arxiv.org/abs/1906.00204v1
PDF https://arxiv.org/pdf/1906.00204v1.pdf
PWC https://paperswithcode.com/paper/190600204
Repo https://github.com/safezza/IQA-CNN-Adversarial-Attacks
Framework none
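
The point about $L_{p}$ norms can be made concrete: two perturbations with roughly the same $L_{2}$ budget can look very different to a human, which is why full-reference fidelity metrics are evaluated as replacements. The sketch below compares $L_{2}$ distance with SSIM (just one example metric, not the paper's benchmark of fifteen) on synthetic, made-up images.

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
original = rng.random((64, 64)).astype(np.float32)

# Perturbation A: tiny noise spread over the whole image.
adv_a = np.clip(original + rng.normal(0, 0.01, original.shape), 0, 1).astype(np.float32)
# Perturbation B: roughly the same L2 energy concentrated in one visible patch.
adv_b = original.copy()
adv_b[20:28, 20:28] = np.clip(adv_b[20:28, 20:28] + 0.08, 0, 1)

for name, adv in [("diffuse", adv_a), ("patch", adv_b)]:
    l2 = np.linalg.norm((adv - original).ravel())
    ssim = structural_similarity(original, adv, data_range=1.0)
    print(f"{name}: L2 = {l2:.3f}, SSIM = {ssim:.4f}")
```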

PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search

Title PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search
Authors Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong
Abstract Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture. In this paper, we present a novel approach, namely, Partially-Connected DARTS, by sampling a small part of the super-network to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising the performance. In particular, we perform operation search in a subset of channels while bypassing the held-out part in a shortcut. This strategy may suffer from an undesired inconsistency in selecting the edges of the super-network caused by sampling different channels. We alleviate it using edge normalization, which adds a new set of edge-level parameters to reduce uncertainty in search. Thanks to the reduced memory cost, PC-DARTS can be trained with a larger batch size and, consequently, enjoys both faster speed and higher training stability. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search. Our code has been made available at: https://github.com/yuhuixu1993/PC-DARTS.
Tasks Neural Architecture Search
Published 2019-07-12
URL https://arxiv.org/abs/1907.05737v3
PDF https://arxiv.org/pdf/1907.05737v3.pdf
PWC https://paperswithcode.com/paper/pc-darts-partial-channel-connections-for
Repo https://github.com/xkp793003821/PC-DARTS-COOPER
Framework pytorch
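
The partial-channel idea can be sketched as a mixed operation that sends only a sampled 1/K of the channels through the candidate operations, lets the remaining channels bypass it unchanged, and shuffles channels so the split is not static. The two stand-in candidate ops and the fixed split below are simplifying assumptions, not the released PC-DARTS code (which also includes edge normalization across cell edges).

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        active = channels // k
        # Two stand-in candidate operations applied to the active channel slice.
        self.ops = nn.ModuleList([
            nn.Conv2d(active, active, 3, padding=1, bias=False),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))   # architecture weights

    def forward(self, x):
        c = x.size(1) // self.k
        x_active, x_bypass = x[:, :c], x[:, c:]
        weights = torch.softmax(self.alpha, dim=0)
        mixed = sum(w * op(x_active) for w, op in zip(weights, self.ops))
        out = torch.cat([mixed, x_bypass], dim=1)
        # Channel shuffle so different channels are sampled across steps.
        n, ch, h, w_ = out.shape
        return out.view(n, self.k, ch // self.k, h, w_).transpose(1, 2).reshape(n, ch, h, w_)

op = PartialChannelMixedOp(channels=16, k=4)
print(op(torch.randn(2, 16, 8, 8)).shape)   # torch.Size([2, 16, 8, 8])
```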

Frustratingly Easy Person Re-Identification: Generalizing Person Re-ID in Practice

Title Frustratingly Easy Person Re-Identification: Generalizing Person Re-ID in Practice
Authors Jieru Jia, Qiuqi Ruan, Timothy M. Hospedales
Abstract Contemporary person re-identification (re-id) methods usually require access to data from the deployment camera network during training in order to perform well. This is because contemporary re-id models trained on one dataset do not generalise to other camera networks due to the domain-shift between datasets. This requirement is often the bottleneck for deploying re-id systems in practical security or commercial applications, as it may be impossible to collect this data in advance or prohibitively costly to annotate it. This paper alleviates this issue by proposing a simple baseline for domain generalizable (DG) person re-identification. That is, to learn a re-id model from a set of source domains that is suitable for application to unseen datasets out-of-the-box, without any model updating. Specifically, we observe that the domain discrepancy in re-id is due to style and content variance across datasets and demonstrate that appropriate Instance and Feature Normalization alleviates much of the resulting domain-shift in deep re-id models. Instance Normalization (IN) in early layers filters out style statistic variations and Feature Normalization (FN) in deep layers is able to further eliminate disparity in content statistics. Compared to contemporary alternatives, this approach is extremely simple to implement, while being faster to train and test, thus making it an extremely valuable baseline for implementing re-id in practice. With a few lines of code, it increases the rank-1 re-id accuracy by 11.8%, 33.2%, 12.8% and 8.5% on the VIPeR, PRID, GRID, and i-LIDS benchmarks respectively. Source code is available at https://github.com/BJTUJia/person_reID_DualNorm.
Tasks Person Re-Identification
Published 2019-05-09
URL https://arxiv.org/abs/1905.03422v3
PDF https://arxiv.org/pdf/1905.03422v3.pdf
PWC https://paperswithcode.com/paper/190503422
Repo https://github.com/BJTUJia/person_reID_DualNorm
Framework pytorch
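
The "few lines of code" flavour of the baseline can be sketched as Instance Normalization inserted after the early convolutional stages (to wash out style statistics) plus a normalization of the deep feature embedding before the classifier. The backbone, the exact IN placement, and the choice of BatchNorm as the feature normalization below are assumptions for illustration, not the released DualNorm code.

```python
import torch
import torch.nn as nn
import torchvision

class ReIDBaseline(nn.Module):
    def __init__(self, num_ids=751):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Early stages followed by Instance Norm (style removal).
        self.early = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, nn.InstanceNorm2d(64, affine=True),
            backbone.layer2, nn.InstanceNorm2d(128, affine=True),
        )
        self.late = nn.Sequential(backbone.layer3, backbone.layer4, nn.AdaptiveAvgPool2d(1))
        self.feat_norm = nn.BatchNorm1d(512)      # normalize the deep feature embedding
        self.classifier = nn.Linear(512, num_ids)

    def forward(self, x):
        f = self.late(self.early(x)).flatten(1)
        f = self.feat_norm(f)                     # content-statistics normalization
        return f, self.classifier(f)              # feature for retrieval, logits for training

model = ReIDBaseline()
feat, logits = model(torch.randn(4, 3, 256, 128))
print(feat.shape, logits.shape)                   # (4, 512), (4, 751)
```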

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

Title Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Authors Biao Zhang, Ivan Titov, Rico Sennrich
Abstract The general trend in NLP is towards increasing model capacity and performance via deeper neural networks. However, simply stacking more layers of the popular Transformer architecture for machine translation results in poor convergence and high computational overhead. Our empirical analysis suggests that convergence is poor due to gradient vanishing caused by the interaction between residual connections and layer normalization. We propose depth-scaled initialization (DS-Init), which decreases parameter variance at the initialization stage, and reduces output variance of residual connections so as to ease gradient back-propagation through normalization layers. To address computational cost, we propose a merged attention sublayer (MAtt) which combines a simplified average-based self-attention sublayer and the encoder-decoder attention sublayer on the decoder side. Results on WMT and IWSLT translation tasks with five translation directions show that deep Transformers with DS-Init and MAtt can substantially outperform their base counterpart in terms of BLEU (+1.1 BLEU on average for 12-layer models), while matching the decoding speed of the baseline model thanks to the efficiency improvements of MAtt.
Tasks Machine Translation
Published 2019-08-29
URL https://arxiv.org/abs/1908.11365v1
PDF https://arxiv.org/pdf/1908.11365v1.pdf
PWC https://paperswithcode.com/paper/improving-deep-transformer-with-depth-scaled
Repo https://github.com/bzhangGo/zero
Framework tf
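
Depth-scaled initialization can be sketched as shrinking a standard (e.g. Xavier) initialization of the l-th layer's parameters by a factor proportional to 1/sqrt(l), so deeper residual branches start with smaller output variance. The scaling constant, the feed-forward-only toy stack, and the use of PyTorch below are assumptions for illustration, not the released zero toolkit.

```python
import torch
import torch.nn as nn

def ds_init_(module, depth, alpha=1.0):
    """Apply Xavier init scaled down by alpha / sqrt(depth) to all Linear weights."""
    scale = alpha / depth ** 0.5
    for m in module.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            m.weight.data.mul_(scale)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

d_model = 512
layers = nn.ModuleList([
    nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
    for _ in range(12)
])
for depth, layer in enumerate(layers, start=1):
    ds_init_(layer, depth)

# Deeper layers start with visibly smaller weight magnitudes.
print(layers[0][0].weight.std().item(), layers[11][0].weight.std().item())
```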