January 31, 2020


Paper Group AWR 439


Torus Graphs for Multivariate Phase Coupling Analysis

Title Torus Graphs for Multivariate Phase Coupling Analysis
Authors Natalie Klein, Josue Orellana, Scott Brincat, Earl K. Miller, Robert E. Kass
Abstract Angular measurements are often modeled as circular random variables, where there are natural circular analogues of moments, including correlation. Because a product of circles is a torus, a d-dimensional vector of circular random variables lies on a d-dimensional torus. For such vectors we present here a class of graphical models, which we call torus graphs, based on the full exponential family with pairwise interactions. The topological distinction between a torus and Euclidean space has several important consequences. Our development was motivated by the problem of identifying phase coupling among oscillatory signals recorded from multiple electrodes in the brain: oscillatory phases across electrodes might tend to advance or recede together, indicating coordination across brain areas. The data analyzed here consisted of 24 phase angles measured repeatedly across 840 experimental trials (replications) during a memory task, where the electrodes were in 4 distinct brain regions, all known to be active while memories are being stored or retrieved. In realistic numerical simulations, we found that a standard pairwise assessment, known as phase locking value, is unable to describe multivariate phase interactions, but that torus graphs can accurately identify conditional associations. Torus graphs generalize several more restrictive approaches that have appeared in various scientific literatures, and produced intuitive results in the data we analyzed. Torus graphs thus unify multivariate analysis of circular data and present fertile territory for future research.
Tasks
Published 2019-10-24
URL https://arxiv.org/abs/1910.11044v1
PDF https://arxiv.org/pdf/1910.11044v1.pdf
PWC https://paperswithcode.com/paper/torus-graphs-for-multivariate-phase-coupling
Repo https://github.com/natalieklein/torus-graphs
Framework none
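
For concreteness, here is a sketch of what the abstract's "full exponential family with pairwise interactions" looks like on the d-torus. The sufficient statistics shown follow our reading of the paper (marginal terms in cos x_j, sin x_j; pairwise rotational and reflectional terms) and should be checked against it:

```latex
% Torus graph density over phase angles x = (x_1, ..., x_d) in [0, 2*pi)^d,
% written in natural-parameter (exponential family) form. Node terms capture
% marginal concentration; pairwise terms capture phase coupling.
\[
p(\mathbf{x}; \boldsymbol{\phi}) \propto \exp\Bigg\{
  \sum_{j} \boldsymbol{\phi}_j^{\top}
    \begin{pmatrix} \cos x_j \\ \sin x_j \end{pmatrix}
  + \sum_{j<k} \boldsymbol{\phi}_{jk}^{\top}
    \begin{pmatrix}
      \cos(x_j - x_k) \\ \sin(x_j - x_k) \\
      \cos(x_j + x_k) \\ \sin(x_j + x_k)
    \end{pmatrix}
\Bigg\}
\]
```

An edge (j, k) is absent from the graph exactly when the pairwise parameter vector is zero, which is what lets conditional (rather than merely marginal) phase associations be read off the fitted model.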

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition

Title On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
Authors Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen
Abstract In conventional speech recognition, phoneme-based models outperform grapheme-based models for non-phonetic languages such as English. The performance gap between the two typically reduces as the amount of training data is increased. In this work, we examine the impact of the choice of modeling unit for attention-based encoder-decoder models. We conduct experiments on the LibriSpeech 100hr, 460hr, and 960hr tasks, using various target units (phoneme, grapheme, and word-piece); across all tasks, we find that grapheme or word-piece models consistently outperform phoneme-based models, even though they are evaluated without a lexicon or an external language model. We also investigate model complementarity: we find that we can improve WERs by up to 9% relative by rescoring N-best lists generated from a strong word-piece based baseline with either the phoneme or the grapheme model. Rescoring an N-best list generated by the phonemic system, however, provides limited improvements. Further analysis shows that the word-piece-based models produce more diverse N-best hypotheses, and thus lower oracle WERs, than phonemic models.
Tasks Language Modelling, Sequence-To-Sequence Speech Recognition, Speech Recognition
Published 2019-02-05
URL https://arxiv.org/abs/1902.01955v2
PDF https://arxiv.org/pdf/1902.01955v2.pdf
PWC https://paperswithcode.com/paper/model-unit-exploration-for-sequence-to
Repo https://github.com/colaprograms/speechify
Framework tf
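
The complementarity result is easy to picture as log-linear N-best rescoring. A minimal sketch, with hypothetical interfaces (`rescore_model`, the interpolation weight) that are stand-ins rather than the paper's actual setup:

```python
# Hypothetical sketch of N-best rescoring: combine a word-piece baseline's
# scores with a second model's scores via log-linear interpolation, then
# re-rank. `rescore_model` stands in for the phoneme or grapheme model.

def rescore_nbest(nbest, rescore_model, weight=0.3):
    """nbest: list of (hypothesis, baseline_log_prob) pairs."""
    rescored = []
    for hyp, base_lp in nbest:
        second_lp = rescore_model(hyp)  # log-prob under the second model
        rescored.append((hyp, (1 - weight) * base_lp + weight * second_lp))
    # Return hypotheses sorted by the combined score, best first.
    return sorted(rescored, key=lambda t: t[1], reverse=True)

# Example with a toy second model that prefers shorter hypotheses:
nbest = [("the cat sat", -4.2), ("the cats at", -4.5)]
best = rescore_nbest(nbest, rescore_model=lambda h: -len(h.split()))[0]
print(best)
```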

Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

Title Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking
Authors Eric Crawford, Joelle Pineau
Abstract The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e. without access to annotated training videos) since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call "unsupervised object tracking", has grown in prominence in recent years; however, most architectures that address it still struggle to deal with large scenes containing many objects. In the current work, we propose an architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme). In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it can generalize well to videos that are larger and/or contain more objects than videos encountered during training.
Tasks Object Tracking
Published 2019-11-20
URL https://arxiv.org/abs/1911.09033v1
PDF https://arxiv.org/pdf/1911.09033v1.pdf
PWC https://paperswithcode.com/paper/exploiting-spatial-invariance-for-scalable
Repo https://github.com/e2crawfo/silot
Framework tf
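
A minimal sketch of the "spatially local object specification scheme" idea: every cell of a convolutional feature map predicts presence, a cell-relative box, and appearance for at most one object, so the same weights apply at every location and scene size. The channel counts and the one-object-per-cell simplification are ours, not the paper's:

```python
# Minimal sketch (PyTorch) of a spatially invariant object layer: each
# spatial cell predicts a presence logit, a bounding box relative to that
# cell, and an appearance vector. Shapes are illustrative only.
import torch
import torch.nn as nn

class SpatialObjectLayer(nn.Module):
    def __init__(self, in_ch=64, what_dim=16):
        super().__init__()
        # 1 presence logit + 4 box params (cell-relative) + what_dim features
        self.head = nn.Conv2d(in_ch, 1 + 4 + what_dim, kernel_size=1)

    def forward(self, feats):                 # feats: (B, C, H, W)
        out = self.head(feats)
        pres = torch.sigmoid(out[:, :1])      # (B, 1, H, W) presence
        where = out[:, 1:5]                   # cell-relative box parameters
        what = out[:, 5:]                     # per-object appearance
        return pres, where, what

feats = torch.randn(2, 64, 8, 8)
pres, where, what = SpatialObjectLayer()(feats)
print(pres.shape, where.shape, what.shape)
```

Because every object is specified relative to its own cell, the same layer applies unchanged to feature maps from larger scenes with more objects, which is the generalization property the abstract highlights.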

Transfer Learning for Brain Tumor Segmentation

Title Transfer Learning for Brain Tumor Segmentation
Authors Jonas Wacker, Marcelo Ladeira, José Eduardo Vaz Nascimento
Abstract Gliomas are the most common malignant brain tumors that are treated with chemoradiotherapy and surgery. Magnetic Resonance Imaging (MRI) is used by radiotherapists to manually segment brain lesions and to observe their development throughout the therapy. The manual image segmentation process is time-consuming and results tend to vary among different human raters. Therefore, there is a substantial demand for automatic image segmentation algorithms that produce a reliable and accurate segmentation of various brain tissue types. Recent advances in deep learning have led to convolutional neural network architectures that excel at various visual recognition tasks. They have been successfully applied to the medical context including medical image segmentation. In particular, fully convolutional networks (FCNs) such as the U-Net produce state-of-the-art results in the automatic segmentation of brain tumors. MRI brain scans are volumetric and exist in various co-registered modalities that serve as input channels for these FCN architectures. Training algorithms for brain tumor segmentation on this complex input requires large amounts of computational resources and is prone to overfitting. In this work, we construct FCNs with pretrained convolutional encoders. We show that we can stabilize the training process this way and produce more robust predictions. We evaluate our methods on publicly available data as well as on a privately acquired clinical dataset. We also show that the impact of pretraining is even higher for predictions on the clinical data.
Tasks Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation, Transfer Learning
Published 2019-12-28
URL https://arxiv.org/abs/1912.12452v1
PDF https://arxiv.org/pdf/1912.12452v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-brain-tumor
Repo https://github.com/joneswack/brats-pretraining
Framework pytorch
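
A sketch of the pretrained-encoder idea, using a torchvision ResNet-18 as a stand-in encoder inside a small U-Net-style segmenter; the paper's actual encoder, decoder, and handling of volumetric multi-modal input will differ:

```python
# Sketch (PyTorch): a U-Net-style segmenter whose encoder is a pretrained
# torchvision ResNet-18 (weights download on first use). Decoder and skip
# wiring are illustrative, not the paper's architecture.
import torch
import torch.nn as nn
import torchvision

class PretrainedFCN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        r = torchvision.models.resnet18(pretrained=True)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)   # H/2,  64 ch
        self.enc1 = nn.Sequential(r.maxpool, r.layer1)      # H/4,  64 ch
        self.enc2 = r.layer2                                # H/8, 128 ch
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec2 = nn.Conv2d(128, 64, 3, padding=1)
        self.dec1 = nn.Conv2d(64 + 64, 64, 3, padding=1)    # with skip
        self.out = nn.Conv2d(64, n_classes, 1)

    def forward(self, x):
        s = self.stem(x)
        e1 = self.enc1(s)
        e2 = self.enc2(e1)
        d2 = torch.relu(self.dec2(self.up(e2)))             # back to H/4
        d1 = torch.relu(self.dec1(torch.cat([d2, e1], 1)))  # skip from e1
        return self.out(self.up(self.up(d1)))               # back to H, W

logits = PretrainedFCN()(torch.randn(1, 3, 128, 128))
print(logits.shape)  # (1, 4, 128, 128)
```

Freezing or slowly fine-tuning the pretrained encoder is what stabilizes training on small medical datasets, per the abstract's claim.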

Linkage Based Face Clustering via Graph Convolution Network

Title Linkage Based Face Clustering via Graph Convolution Network
Authors Zhongdao Wang, Liang Zheng, Yali Li, Shengjin Wang
Abstract In this paper, we present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we utilize the graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs. Experiments show that our method is more robust to the complex distribution of faces than conventional methods, yielding favorably comparable results to state-of-the-art methods on standard face clustering benchmarks, and is scalable to large datasets. Furthermore, we show that the proposed method does not need the number of clusters as a prior, is aware of noise and outliers, and can be extended to a multi-view version for higher clustering accuracy.
Tasks Link Prediction
Published 2019-03-27
URL http://arxiv.org/abs/1903.11306v3
PDF http://arxiv.org/pdf/1903.11306v3.pdf
PWC https://paperswithcode.com/paper/linkage-based-face-clustering-via-graph
Repo https://github.com/Zhongdao/gcn_clustering
Framework pytorch
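
A toy sketch of the pipeline's core step: build a sub-graph around a pivot face from its feature-space neighbors, propagate features with one symmetric-normalized GCN layer, and score each neighbor's linkage to the pivot. The random weights and the inner-product scorer are stand-ins for trained components:

```python
# Sketch (NumPy) of linkage prediction on an instance pivot sub-graph.
import numpy as np

def gcn_layer(A, X, W):
    # Symmetric-normalized propagation: D^-1/2 (A + I) D^-1/2 X W, ReLU.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 32))                 # pivot (row 0) + 7 neighbors
sims = X @ X.T
A = (sims > np.quantile(sims, 0.7)).astype(float)  # crude k-NN-ish graph
np.fill_diagonal(A, 0)

H = gcn_layer(A, X, rng.normal(size=(32, 16)))
# Linkage score between pivot and each neighbor: inner product of the
# propagated features (a trained classifier head would replace this).
scores = H[1:] @ H[0]
print(np.argsort(-scores))                   # neighbors ranked by linkage
```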

Extracting temporal features into a spatial domain using autoencoders for sperm video analysis

Title Extracting temporal features into a spatial domain using autoencoders for sperm video analysis
Authors Vajira Thambawita, Pål Halvorsen, Hugo Hammer, Michael Riegler, Trine B. Haugen
Abstract In this paper, we present a two-step deep learning method that is used to predict sperm motility and morphology based on video recordings of human spermatozoa. First, we use an autoencoder to extract temporal features from a given semen video and plot these into image-space, which we call feature-images. Second, these feature-images are used to perform transfer learning to predict the motility and morphology values of human sperm. The presented method shows its capability to extract temporal information into spatial-domain feature-images which can be used with traditional convolutional neural networks. Furthermore, the accuracy of the predicted motility of a given semen sample shows that a deep learning-based model can capture the temporal information of microscopic recordings of human semen.
Tasks Transfer Learning
Published 2019-11-08
URL https://arxiv.org/abs/1911.03100v1
PDF https://arxiv.org/pdf/1911.03100v1.pdf
PWC https://paperswithcode.com/paper/extracting-temporal-features-into-a-spatial
Repo https://github.com/vlbthambawita/MedicoTask_2019_paper_2
Framework pytorch
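
The two-step idea in miniature: encode each frame to a latent vector, then stack the vectors over time into a 2-D "feature-image" that an ordinary CNN can consume. The toy encoder below stands in for the trained autoencoder's encoder:

```python
# Sketch (NumPy): turn a video's per-frame latent codes into a single
# "feature-image" (time on one axis, latent dimensions on the other),
# ready to feed a standard 2-D CNN for transfer learning.
import numpy as np

def encode_frame(frame, W):
    # Toy stand-in encoder: flatten the frame and project to latent space.
    return np.tanh(frame.reshape(-1) @ W)

rng = np.random.default_rng(1)
video = rng.normal(size=(60, 16, 16))        # 60 frames of 16x16 pixels
W = rng.normal(size=(256, 32)) * 0.05        # untrained stand-in weights

feature_image = np.stack([encode_frame(f, W) for f in video])
print(feature_image.shape)                   # (60, 32): frames x latent dims
```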

Automatic Annotation of Hip Anatomy in Fluoroscopy for Robust and Efficient 2D/3D Registration

Title Automatic Annotation of Hip Anatomy in Fluoroscopy for Robust and Efficient 2D/3D Registration
Authors Robert Grupp, Mathias Unberath, Cong Gao, Rachel Hegeman, Ryan Murphy, Clayton Alexander, Yoshito Otake, Benjamin McArthur, Mehran Armand, Russell Taylor
Abstract Fluoroscopy is the standard imaging modality used to guide hip surgery and is therefore a natural sensor for computer-assisted navigation. In order to efficiently solve the complex registration problems presented during navigation, human-assisted annotations of the intraoperative image are typically required. This manual initialization interferes with the surgical workflow and diminishes any advantages gained from navigation. We propose a method for fully automatic registration using annotations produced by a neural network. Neural networks are trained to simultaneously segment anatomy and identify landmarks in fluoroscopy. Training data is obtained using an intraoperatively incompatible 2D/3D registration of hip anatomy. Ground truth 2D labels are established using projected 3D annotations. Intraoperative registration couples an intensity-based strategy with annotations inferred by the network and requires no human assistance. Ground truth labels were obtained in 366 fluoroscopic images across 6 cadaveric specimens. In a leave-one-subject-out experiment, networks obtained mean Dice coefficients of 0.86, 0.87, 0.90, and 0.84 for the left hemipelvis, right hemipelvis, left femur, and right femur, respectively. The mean 2D landmark error was 5.0 mm. The pelvis was registered within 1 degree for 86% of the images when using the proposed intraoperative approach, with an average runtime of 7 seconds. In comparison, an intensity-only approach without manual initialization registered the pelvis to 1 degree in 18% of images. We have created the first accurately annotated, non-synthetic dataset of hip fluoroscopy. By using these annotations as training data for neural networks, state-of-the-art performance in fluoroscopic segmentation and landmark localization was achieved. Integrating these annotations allows for a robust, fully automatic, and efficient intraoperative registration during fluoroscopic navigation of the hip.
Tasks
Published 2019-11-16
URL https://arxiv.org/abs/1911.07042v2
PDF https://arxiv.org/pdf/1911.07042v2.pdf
PWC https://paperswithcode.com/paper/automatic-annotation-of-hip-anatomy-in
Repo https://github.com/rg2/DeepFluoroLabeling-IPCAI2020
Framework pytorch
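
A sketch of the joint training signal implied by the abstract: one network head is scored with segmentation cross-entropy, another with landmark-heatmap regression. The loss weighting, head shapes, and landmark count are guesses, not the paper's values:

```python
# Sketch (PyTorch) of joint segmentation + landmark-localization training.
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, seg_labels, lm_heatmaps, lm_targets, w=1.0):
    seg = F.cross_entropy(seg_logits, seg_labels)   # anatomy segmentation
    lms = F.mse_loss(lm_heatmaps, lm_targets)       # landmark heatmaps
    return seg + w * lms

seg_logits = torch.randn(2, 5, 64, 64)              # 4 structures + background
seg_labels = torch.randint(0, 5, (2, 64, 64))
heat = torch.rand(2, 14, 64, 64)                    # e.g. 14 landmark channels
print(joint_loss(seg_logits, seg_labels, heat, torch.rand_like(heat)).item())
```

The trained network's segmentations and landmarks then initialize the intensity-based 2D/3D registration, removing the manual step.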

Attacking Vision-based Perception in End-to-End Autonomous Driving Models

Title Attacking Vision-based Perception in End-to-End Autonomous Driving Models
Authors Adith Boloor, Karthik Garimella, Xin He, Christopher Gill, Yevgeniy Vorobeychik, Xuan Zhang
Abstract Recent advances in machine learning, especially techniques such as deep neural networks, are enabling a range of emerging applications. One such example is autonomous driving, which often relies on deep learning for perception. However, deep learning-based perception has been shown to be vulnerable to a host of subtle adversarial manipulations of images. Nevertheless, the vast majority of such demonstrations focus on perception that is disembodied from end-to-end control. We present novel end-to-end attacks on autonomous driving in simulation, using simple physically realizable attacks: the painting of black lines on the road. These attacks target deep neural network models for end-to-end autonomous driving control. A systematic investigation shows that such attacks are easy to engineer, and we describe scenarios (e.g., right turns) in which they are highly effective. We define several objective functions that quantify the success of an attack and develop techniques based on Bayesian Optimization to efficiently traverse the search space of higher dimensional attacks. Additionally, we define a novel class of hijacking attacks, where painted lines on the road cause the driver-less car to follow a target path. Through the use of network deconvolution, we provide insights into the successful attacks, which appear to work by mimicking activations of entirely different scenarios. Our code is available at https://github.com/xz-group/AdverseDrive
Tasks Autonomous Driving
Published 2019-10-02
URL https://arxiv.org/abs/1910.01907v1
PDF https://arxiv.org/pdf/1910.01907v1.pdf
PWC https://paperswithcode.com/paper/attacking-vision-based-perception-in-end-to
Repo https://github.com/xz-group/AdverseDrive
Framework none
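
A sketch of the search loop: Bayesian optimization (here via scikit-optimize's gp_minimize) over the parameters of painted road lines, maximizing an attack objective. `deviation` is a toy stand-in for a simulator rollout; the paper's repo contains the real objectives:

```python
# Sketch: Bayesian optimization over painted-line parameters (position,
# angle, gap). gp_minimize minimizes, so we negate the attack objective.
from skopt import gp_minimize

def deviation(params):
    pos, angle, gap = params
    # Toy stand-in: pretend deviation peaks at one particular line layout.
    return 1.0 - ((pos - 0.6) ** 2 + (angle - 0.3) ** 2 + (gap - 0.5) ** 2)

res = gp_minimize(
    lambda p: -deviation(p),
    dimensions=[(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)],
    n_calls=25,
    random_state=0,
)
print(res.x, -res.fun)   # best line parameters and the deviation they cause
```

The point of the Gaussian-process surrogate is exactly what the abstract states: traversing a higher-dimensional attack space with far fewer expensive simulator rollouts than grid or random search.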

Optuna: A Next-generation Hyperparameter Optimization Framework

Title Optuna: A Next-generation Hyperparameter Optimization Framework
Authors Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama
Abstract The purpose of this study is to introduce new design criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) a define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) an easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiments conducted via an interactive interface. To prove our point, we introduce Optuna, an optimization framework that is the culmination of our effort to develop next-generation optimization software. As optimization software designed around the define-by-run principle, Optuna is, to our knowledge, the first of its kind. We present the design techniques that became necessary in the development of software that meets the above criteria, and demonstrate the power of our new design through experimental results and real-world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
Tasks Hyperparameter Optimization
Published 2019-07-25
URL https://arxiv.org/abs/1907.10902v1
PDF https://arxiv.org/pdf/1907.10902v1.pdf
PWC https://paperswithcode.com/paper/optuna-a-next-generation-hyperparameter
Repo https://github.com/pfnet/optuna
Framework tf
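
The define-by-run API is easiest to see in code. This uses Optuna's current public interface (the paper-era spelling was `suggest_uniform`); note that the search space, including the conditional parameter, is constructed while the objective runs:

```python
# Define-by-run hyperparameter search with Optuna's actual API.
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    # Conditional search space: `y` only exists on one branch.
    if trial.suggest_categorical("use_y", [True, False]):
        y = trial.suggest_float("y", -5.0, 5.0)
    else:
        y = 0.0
    return (x - 2.0) ** 2 + y ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```

Because the space is declared at call time rather than up front, branching spaces (choose an optimizer, then sample its specific hyperparameters) need no special machinery.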

Adaptive Weighting Multi-Field-of-View CNN for Semantic Segmentation in Pathology

Title Adaptive Weighting Multi-Field-of-View CNN for Semantic Segmentation in Pathology
Authors Hiroki Tokunaga, Yuki Teramoto, Akihiko Yoshizawa, Ryoma Bise
Abstract Automated digital histopathology image segmentation is an important task to help pathologists diagnose tumors and cancer subtypes. For pathological diagnosis of cancer subtypes, pathologists usually change the magnification of whole-slide images (WSI) viewers. A key assumption is that the importance of the magnifications depends on the characteristics of the input image, such as cancer subtypes. In this paper, we propose a novel semantic segmentation method, called Adaptive-Weighting-Multi-Field-of-View-CNN (AWMF-CNN), that can adaptively use image features from images with different magnifications to segment multiple cancer subtype regions in the input image. The proposed method aggregates several expert CNNs for images of different magnifications by adaptively changing the weight of each expert depending on the input image. It leverages information in the images with different magnifications that might be useful for identifying the subtypes. It outperformed other state-of-the-art methods in experiments.
Tasks Semantic Segmentation
Published 2019-04-12
URL http://arxiv.org/abs/1904.06040v1
PDF http://arxiv.org/pdf/1904.06040v1.pdf
PWC https://paperswithcode.com/paper/adaptive-weighting-multi-field-of-view-cnn
Repo https://github.com/t-hrk155/AWMF-CNN
Framework tf
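
A sketch of the adaptive aggregation: one expert per magnification plus a gating network that predicts input-dependent expert weights. The expert and gate architectures below are placeholders, and real multi-magnification inputs need the cropping/alignment the sketch omits:

```python
# Sketch (PyTorch) of adaptive weighting over per-magnification experts:
# expert outputs are combined with weights predicted from the input.
import torch
import torch.nn as nn

class AdaptiveExperts(nn.Module):
    def __init__(self, n_experts=3, n_classes=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Conv2d(3, n_classes, 3, padding=1) for _ in range(n_experts)]
        )
        # Gate pools one view's 3 channels and maps them to expert weights.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, n_experts)
        )

    def forward(self, views):  # views: list of aligned (B, 3, H, W) tensors
        w = torch.softmax(self.gate(views[0]), dim=1)        # (B, n_experts)
        outs = torch.stack([e(v) for e, v in zip(self.experts, views)], 1)
        return (w[:, :, None, None, None] * outs).sum(1)     # weighted sum

views = [torch.randn(2, 3, 32, 32) for _ in range(3)]
print(AdaptiveExperts()(views).shape)   # (2, 4, 32, 32)
```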

Understanding Composition of Word Embeddings via Tensor Decomposition

Title Understanding Composition of Word Embeddings via Tensor Decomposition
Authors Abraham Frandsen, Rong Ge
Abstract Word embedding is a powerful tool in natural language processing. In this paper we consider the problem of word embedding composition: given vector representations of two words, compute a vector for the entire phrase. We give a generative model that can capture specific syntactic relations between words. Under our model, we prove that the correlations between three words (measured by their PMI) form a tensor that has an approximate low rank Tucker decomposition. The result of the Tucker decomposition gives the word embeddings as well as a core tensor, which can be used to produce better compositions of the word embeddings. We also complement our theoretical results with experiments that verify our assumptions, and demonstrate the effectiveness of the new composition method.
Tasks Word Embeddings
Published 2019-02-02
URL http://arxiv.org/abs/1902.00613v1
PDF http://arxiv.org/pdf/1902.00613v1.pdf
PWC https://paperswithcode.com/paper/understanding-composition-of-word-embeddings
Repo https://github.com/abefrandsen/syntactic-rand-walk
Framework tf
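
A sketch of the composition rule as we read it: the phrase embedding is the additive baseline v_a + v_b plus a bilinear correction from the Tucker core tensor contracted with the two word vectors:

```python
# Sketch (NumPy) of tensor-based phrase composition. The core tensor T
# would come from the Tucker decomposition of the PMI tensor; here it is
# random purely to show the contraction.
import numpy as np

rng = np.random.default_rng(2)
d = 50
T = rng.normal(size=(d, d, d)) * 0.01   # core tensor (learned in practice)
v_a, v_b = rng.normal(size=d), rng.normal(size=d)

# Contract T with the two word vectors along its first two modes.
correction = np.einsum("ijk,i,j->k", T, v_a, v_b)
phrase = v_a + v_b + correction
print(phrase.shape)
```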

Explainable Anatomical Shape Analysis through Deep Hierarchical Generative Models

Title Explainable Anatomical Shape Analysis through Deep Hierarchical Generative Models
Authors Carlo Biffi, Juan J. Cerrolaza, Giacomo Tarroni, Wenjia Bai, Antonio de Marvao, Ozan Oktay, Christian Ledig, Loic Le Folgoc, Konstantinos Kamnitsas, Georgia Doumou, Jinming Duan, Sanjay K. Prasad, Stuart A. Cook, Declan P. O’Regan, Daniel Rueckert
Abstract Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer’s disease when tested on ADNI data. More importantly, it enabled the visualisation in three dimensions of both global and regional anatomical features which better discriminate between the conditions under examination. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1907.00058v2
PDF https://arxiv.org/pdf/1907.00058v2.pdf
PWC https://paperswithcode.com/paper/explainable-shape-analysis-through-deep
Repo https://github.com/UK-Digital-Heart-Project/lvae_mlp
Framework tf
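
A sketch of the idea at the top of the hierarchy: a 2-D latent trained jointly for reconstruction and for discriminating clinical conditions, so the classification space can be plotted directly. This collapses the paper's hierarchical variational model into a deterministic toy; layer sizes are illustrative:

```python
# Sketch (PyTorch): a 2-D latent optimised for both reconstruction of the
# segmentation and discrimination of clinical condition.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128),
                    nn.ReLU(), nn.Linear(128, 2))
dec = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 64 * 64))
clf = nn.Linear(2, 2)   # healthy vs. remodelled, read off the 2-D space

x = torch.rand(8, 1, 64, 64)            # toy binary segmentations
y = torch.randint(0, 2, (8,))
z = enc(x)                              # 2-D latent: directly plottable
recon_loss = nn.functional.mse_loss(dec(z), x.flatten(1))
disc_loss = nn.functional.cross_entropy(clf(z), y)
print((recon_loss + disc_loss).item())
```

Decoding points sampled along the classifier's decision boundary back into segmentation space is what makes the learned discrimination anatomically interpretable.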

Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding

Title Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding
Authors Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, Fei Wang
Abstract Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort studies and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is Electronic Health Records (EHRs), which are usually heterogeneous, longitudinal, and sparse. Though existing studies on learning patient similarity from EHRs have proven useful in solving real clinical problems, their applicability is limited due to the lack of medical interpretations. Moreover, most previous methods assume a vector-based representation for patients, which typically requires aggregation of medical events over a certain time period. As a consequence, temporal information will be lost. In this paper, we propose a patient similarity evaluation framework based on the temporal matching of longitudinal patient EHRs. Two efficient methods are presented, one unsupervised and one supervised, both of which preserve the temporal properties in EHRs. The supervised scheme takes a convolutional neural network architecture and learns an optimal representation of patient clinical records with medical concept embedding. The empirical results on real-world clinical data demonstrate substantial improvement over the baselines. We make our code and sample data available for further study.
Tasks
Published 2019-02-09
URL http://arxiv.org/abs/1902.03376v1
PDF http://arxiv.org/pdf/1902.03376v1.pdf
PWC https://paperswithcode.com/paper/measuring-patient-similarities-via-a-deep
Repo https://github.com/yinchangchang/patient_similarity
Framework tf
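
A sketch of the supervised scheme's ingredients: medical concept embeddings, a 1-D convolution over the visit sequence (so temporal structure is kept rather than aggregated away), pooling, and a similarity score. Vocabulary size, dimensions, and the cosine scorer are toy choices:

```python
# Sketch (PyTorch): embed concept codes, convolve over time, pool, compare.
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=1000, embedding_dim=32)   # concept codes
conv = nn.Conv1d(32, 64, kernel_size=3, padding=1)          # temporal conv

def represent(codes):                      # codes: (T,) one event per step
    e = emb(codes).T.unsqueeze(0)          # (1, 32, T)
    h = torch.relu(conv(e))                # (1, 64, T)
    return h.max(dim=2).values.squeeze(0)  # max-pool over time -> (64,)

p1 = represent(torch.randint(0, 1000, (20,)))
p2 = represent(torch.randint(0, 1000, (35,)))
print(torch.cosine_similarity(p1, p2, dim=0).item())
```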

Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks

Title Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks
Authors Ziang Yan, Yiwen Guo, Changshui Zhang
Abstract Unlike their white-box counterparts, which are widely studied and readily accessible, adversarial examples in black-box settings are generally harder to craft on account of the difficulty of estimating gradients. Many methods achieve the task by issuing numerous queries to target classification systems, which makes the whole procedure costly and suspicious to the systems. In this paper, we aim at reducing the query complexity of black-box attacks in this category. We propose to exploit gradients of a few reference models which arguably span some promising search subspaces. Experimental results show that, in comparison with the state of the art, our method can gain up to 2x and 4x reductions in the requisite mean and median numbers of queries with much lower failure rates, even if the reference models are trained on a small and inadequate dataset disjoint from the one for training the victim model. Code and models for reproducing our results will be made publicly available.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04392v1
PDF https://arxiv.org/pdf/1906.04392v1.pdf
PWC https://paperswithcode.com/paper/subspace-attack-exploiting-promising
Repo https://github.com/ZiangYan/subspace-attack.pytorch
Framework pytorch
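
A sketch of the core trick: probe the black-box loss along a reference model's gradient direction rather than a random one, estimating the directional derivative with two queries. `victim_loss` and `reference_grad` are stand-ins; the real method manages multiple reference models and a query budget:

```python
# Sketch (NumPy) of one prior-guided black-box attack step.
import numpy as np

def subspace_step(x, victim_loss, reference_grad, eps=1e-3, lr=0.01):
    u = reference_grad(x)
    u = u / (np.linalg.norm(u) + 1e-12)        # unit search direction
    # Two queries estimate the derivative of the victim's loss along u.
    g = (victim_loss(x + eps * u) - victim_loss(x - eps * u)) / (2 * eps)
    return x + lr * g * u                      # ascend the victim's loss

rng = np.random.default_rng(3)
w = rng.normal(size=16)                        # toy "victim": linear loss
x = rng.normal(size=16)
x_adv = subspace_step(
    x,
    victim_loss=lambda z: float(w @ z),
    reference_grad=lambda z: w + 0.1 * rng.normal(size=16),  # noisy proxy
)
print(float(w @ x_adv - w @ x) > 0)            # the loss increased
```

When the reference gradient is even loosely aligned with the victim's, each two-query probe yields far more progress than a random direction, which is where the reported query savings come from.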

Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader

Title Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader
Authors Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang
Abstract We propose a new end-to-end question answering model, which learns to aggregate answer evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets. Under the assumptions that the structured KB is easier to query and the acquired knowledge can help the understanding of unstructured text, our model first accumulates knowledge of entities from a question-related KB subgraph; then reformulates the question in the latent space and reads the texts with the accumulated entity knowledge at hand. The evidence from KB and texts are finally aggregated to predict answers. On the widely-used KBQA benchmark WebQSP, our model achieves consistent improvements across settings with different extents of KB incompleteness.
Tasks Question Answering
Published 2019-05-17
URL https://arxiv.org/abs/1905.07098v2
PDF https://arxiv.org/pdf/1905.07098v2.pdf
PWC https://paperswithcode.com/paper/improving-question-answering-over-incomplete
Repo https://github.com/dujiaxin/Knowledge-Aware-Reader
Framework pytorch
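
A toy sketch of the aggregation step as described: attend over embeddings of entities in the question-related KB subgraph and fold the attended knowledge back into the question representation before reading text. All vectors here are random stand-ins for learned encoders:

```python
# Sketch (NumPy): question-conditioned attention over KB subgraph entities,
# producing a knowledge-aware question vector for the text reader.
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(4)
q = rng.normal(size=64)                  # question encoding
E = rng.normal(size=(12, 64))            # 12 subgraph entity embeddings

attn = softmax(E @ q)                    # question-conditioned attention
kb_knowledge = attn @ E                  # accumulated entity knowledge
q_reformulated = q + kb_knowledge        # knowledge-aware question vector
print(q_reformulated.shape)
```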