October 20, 2019

3403 words 16 mins read

Paper Group ANR 63

GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification. Efficiently Learning Nonstationary Gaussian Processes for Real World Impact. The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation. Modeling Meaning Associated with Documental Entities: Introducing t …

GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification

Title GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification
Authors Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, Hayit Greenspan
Abstract Deep learning methods, and in particular convolutional neural networks (CNNs), have led to an enormous breakthrough in a wide range of computer vision tasks, primarily by using large-scale annotated datasets. However, obtaining such datasets in the medical domain remains a challenge. In this paper, we present methods for generating synthetic medical images using recently presented deep learning Generative Adversarial Networks (GANs). Furthermore, we show that generated medical images can be used for synthetic data augmentation and improve the performance of CNNs for medical image classification. Our novel method is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases, and 65 hemangiomas). We first exploit GAN architectures for synthesizing high-quality liver lesion ROIs. Then we present a novel scheme for liver lesion classification using a CNN. Finally, we train the CNN using classic data augmentation and our synthetic data augmentation and compare performance. In addition, we explore the quality of our synthesized examples using visualization and expert assessment. The classification performance using only classic data augmentation yielded 78.6% sensitivity and 88.4% specificity. By adding the synthetic data augmentation the results increased to 85.7% sensitivity and 92.4% specificity. We believe that this approach to synthetic data augmentation can generalize to other medical classification applications and thus support radiologists’ efforts to improve diagnosis.
Tasks Computed Tomography (CT), Data Augmentation, Image Augmentation, Image Classification
Published 2018-03-03
URL http://arxiv.org/abs/1803.01229v1
PDF http://arxiv.org/pdf/1803.01229v1.pdf
PWC https://paperswithcode.com/paper/gan-based-synthetic-medical-image
Repo
Framework
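A minimal sketch of the synthetic-augmentation step described in this entry: samples from a trained conditional GAN are concatenated with the real, classically augmented ROI dataset before the CNN is trained. The generator, latent size, and all names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: mixing GAN-synthesized liver-lesion ROIs into a CNN training set.
# Assumes a trained conditional generator `generator` (hypothetical) and a labeled,
# real ROI dataset `real_dataset` whose transform already applies classic augmentation.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def synthesize_rois(generator, n_per_class, n_classes, latent_dim=100, device="cpu"):
    """Sample synthetic ROIs for each lesion class from a conditional GAN."""
    generator.eval()
    images, labels = [], []
    with torch.no_grad():
        for c in range(n_classes):
            z = torch.randn(n_per_class, latent_dim, device=device)
            y = torch.full((n_per_class,), c, dtype=torch.long, device=device)
            images.append(generator(z, y).cpu())
            labels.append(y.cpu())
    return TensorDataset(torch.cat(images), torch.cat(labels))

# Synthetic augmentation then simply concatenates generated samples with the real data:
# synthetic = synthesize_rois(generator, n_per_class=1000, n_classes=3)
# train_loader = DataLoader(ConcatDataset([real_dataset, synthetic]),
#                           batch_size=64, shuffle=True)
```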

Efficiently Learning Nonstationary Gaussian Processes for Real World Impact

Title Efficiently Learning Nonstationary Gaussian Processes for Real World Impact
Authors Sahil Garg
Abstract Most real-world phenomena, such as sunlight distribution under a forest canopy, mineral concentrations, and stock valuations, exhibit nonstationary dynamics, i.e., the variation of the phenomenon changes depending on the locality. Nonstationary dynamics pose both theoretical and practical challenges to statistical machine learning algorithms that aim to accurately capture the complexities governing the evolution of such processes. Typically, nonstationary dynamics are modeled using nonstationary Gaussian Process models (NGPs) that employ a local latent dynamics parameterization to model the nonstationary real observable dynamics. Recently, an approach based on a most-likely induced latent dynamics representation attracted the research community’s attention. That approach cannot be employed for large-scale real-world applications because learning a most-likely latent dynamics representation involves maximizing the marginal likelihood of the observed real dynamics, which becomes intractable as the number of induced latent points grows with problem size. We establish a direct relationship between the informativeness of the induced latent dynamics and the marginal likelihood of the observed real dynamics. This opens up the possibility of maximizing the marginal likelihood of the observed real dynamics indirectly, by near-optimally maximizing entropy or mutual information gain on the induced latent dynamics using greedy algorithms. Therefore, for efficient yet accurate inference, we propose to build an induced latent dynamics representation using a novel algorithm, LISAL, that adaptively maximizes entropy or mutual information on the induced latent dynamics and the marginal likelihood of the observed real dynamics in an iterative manner. The relevance of LISAL is validated using real-world datasets.
Tasks Gaussian Processes
Published 2018-04-27
URL http://arxiv.org/abs/1804.10318v3
PDF http://arxiv.org/pdf/1804.10318v3.pdf
PWC https://paperswithcode.com/paper/efficiently-learning-nonstationary-gaussian
Repo
Framework
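To make the greedy information-gain idea concrete, here is a generic sketch of entropy-based selection of inducing inputs for a GP: at each step, pick the candidate with the largest posterior variance given the points already selected. This is the standard greedy entropy heuristic under an RBF kernel, not the paper's LISAL algorithm; kernel, data, and sizes are illustrative assumptions.

```python
# Greedy entropy-based selection of inducing inputs for a Gaussian process.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def greedy_entropy_selection(X, m, noise=1e-6):
    """Select m inducing inputs from candidates X by maximizing posterior variance."""
    selected = []
    for _ in range(m):
        if not selected:
            # With an isotropic kernel all prior variances are equal; start anywhere.
            selected.append(0)
            continue
        S = X[selected]
        K_ss = rbf_kernel(S, S) + noise * np.eye(len(selected))
        K_xs = rbf_kernel(X, S)
        # Posterior variance of each candidate given the already-selected set.
        var = rbf_kernel(X, X).diagonal() - np.einsum(
            "ij,jk,ik->i", K_xs, np.linalg.inv(K_ss), K_xs)
        var[selected] = -np.inf          # never re-select a chosen point
        selected.append(int(np.argmax(var)))
    return selected

X = np.random.rand(500, 2)               # candidate latent/input locations
print(greedy_entropy_selection(X, m=10))
```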

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

Title The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
Authors Ke Chen, Weilin Zhang, Shlomo Dubnov, Gus Xia, Wei Li
Abstract With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short-sequence generation, symbolic music generation remains a challenging problem because the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: an LSTM (a type of RNN) and WaveNet (a dilated temporal CNN). As far as we know, this is the first study applying WaveNet to symbolic music generation, as well as the first systematic comparison between temporal CNNs and RNNs for music generation. We conduct a survey to evaluate the generated music and apply the Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly with a stack of dilated convolution layers improves performance significantly, and that globally encoding the underlying chord progression into the generation procedure yields further gains.
Tasks Music Generation
Published 2018-11-20
URL http://arxiv.org/abs/1811.08380v3
PDF http://arxiv.org/pdf/1811.08380v3.pdf
PWC https://paperswithcode.com/paper/the-effect-of-explicit-structure-encoding-of
Repo
Framework
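A compact sketch of the WaveNet-style idea compared in this entry: a stack of causal, exponentially dilated 1-D convolutions over melody tokens, with the chord progression added as a global conditioning signal. Layer sizes, vocabularies, and the conditioning scheme are illustrative assumptions, not the configuration reported in the paper.

```python
# Chord-conditioned melody model built from dilated temporal convolutions.
import torch
import torch.nn as nn

class DilatedMelodyNet(nn.Module):
    def __init__(self, n_pitches=130, n_chords=24, channels=64, n_layers=6):
        super().__init__()
        self.pitch_embed = nn.Embedding(n_pitches, channels)
        self.chord_embed = nn.Embedding(n_chords, channels)
        # Exponentially growing dilation gives a large temporal receptive field.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(n_layers))
        self.out = nn.Conv1d(channels, n_pitches, kernel_size=1)

    def forward(self, melody_tokens, chord_tokens):
        # (batch, time) token indices -> (batch, channels, time)
        x = self.pitch_embed(melody_tokens).transpose(1, 2)
        c = self.chord_embed(chord_tokens).transpose(1, 2)
        x = x + c                                   # global chord conditioning
        for conv in self.convs:
            pad = (conv.dilation[0], 0)             # left-pad to keep the conv causal
            x = torch.relu(conv(nn.functional.pad(x, pad))) + x
        return self.out(x)                          # per-step pitch logits

model = DilatedMelodyNet()
logits = model(torch.randint(0, 130, (2, 64)), torch.randint(0, 24, (2, 64)))
print(logits.shape)  # torch.Size([2, 130, 64])
```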

Modeling Meaning Associated with Documental Entities: Introducing the Brussels Quantum Approach

Title Modeling Meaning Associated with Documental Entities: Introducing the Brussels Quantum Approach
Authors Diederik Aerts, Massimiliano Sassoli de Bianchi, Sandro Sozzo, Tomas Veloz
Abstract We show that the Brussels operational-realistic approach to quantum physics and quantum cognition offers a fundamental strategy for modeling the meaning associated with collections of documental entities. To do so, we take the World Wide Web as a paradigmatic example and emphasize the importance of distinguishing the Web, made of printed documents, from a more abstract meaning entity, which we call the Quantum Web, or QWeb, where the former is considered to be the collection of traces that can be left by the latter, in specific measurements, similarly to how a non-spatial quantum entity, like an electron, can leave localized traces of impact on a detection screen. The double-slit experiment is extensively used to illustrate the rationale of the modeling, which is guided by how physicists constructed quantum theory to describe the behavior of the microscopic entities. We also emphasize that the superposition principle and the associated interference effects are not sufficient to model all experimental probabilistic data, like those obtained by counting the relative number of documents containing certain words and co-occurrences of words. For this, additional effects, like context effects, must also be taken into consideration.
Tasks
Published 2018-08-03
URL http://arxiv.org/abs/1808.03677v1
PDF http://arxiv.org/pdf/1808.03677v1.pdf
PWC https://paperswithcode.com/paper/modeling-meaning-associated-with-documental
Repo
Framework
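The abstract's central point, that superposition produces interference effects a classical mixture cannot, can be illustrated numerically with the double-slit setup it cites. The amplitudes below are arbitrary toy values, not data from the paper.

```python
# Toy double-slit illustration: the quantum probability carries an interference term.
import numpy as np

psi_a = 0.6 * np.exp(1j * 0.0)       # amplitude through slit A
psi_b = 0.8 * np.exp(1j * 2.0)       # amplitude through slit B (different phase)

p_a, p_b = abs(psi_a) ** 2, abs(psi_b) ** 2
classical = 0.5 * (p_a + p_b)                     # mixture: average of the two slits
quantum = 0.5 * abs(psi_a + psi_b) ** 2           # superposition of the two amplitudes
interference = quantum - classical                # = |psi_a| |psi_b| cos(phase difference)

print(classical, quantum, interference)
print(abs(psi_a) * abs(psi_b) * np.cos(2.0))      # matches the interference term
```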

Toward Multimodal Interaction in Scalable Visual Digital Evidence Visualization Using Computer Vision Techniques and ISS

Title Toward Multimodal Interaction in Scalable Visual Digital Evidence Visualization Using Computer Vision Techniques and ISS
Authors Serguei A. Mokhov, Miao Song, Jashanjot Singh, Joey Paquet, Mourad Debbabi, Sudhir Mudur
Abstract Visualization requirements in Forensic Lucid have to do with different levels of case knowledge abstraction, representation, and aggregation, as well as the operational aspects, as the final long-term goal of this proposal. It encompasses anything from the finer-grained representation of hierarchical contexts to Forensic Lucid programs, to the documented evidence and its management, its linkage to programs, to evaluation, and to the management of GIPSY software networks. This includes the ability to arbitrarily switch between those views, combined with usable multimodal interaction. The purpose is to determine how the findings can be applied to Forensic Lucid and investigation case management. It is also natural to want convenient and usable evidence visualization, its semantic linkage, and the reasoning machinery for it. Thus, we propose scalable management, visualization, and evaluation of digital evidence using a modified interactive 3D documentary system, the Illimitable Space System (ISS), to represent, semantically link, and provide a usable interface for digital investigators that is navigable via different multimodal interaction techniques, using computer vision techniques including gestures, as well as eye gaze and audio.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00118v1
PDF http://arxiv.org/pdf/1808.00118v1.pdf
PWC https://paperswithcode.com/paper/toward-multimodal-interaction-in-scalable
Repo
Framework

Deep-neural-network based sinogram synthesis for sparse-view CT image reconstruction

Title Deep-neural-network based sinogram synthesis for sparse-view CT image reconstruction
Authors Hoyeon Lee, Jongha Lee, Hyeongseok Kim, Byungchul Cho, Seungryong Cho
Abstract Recently, a number of approaches to low-dose computed tomography (CT) have been developed and deployed in commercial CT scanners. Tube current reduction is perhaps the most actively explored technology, together with advanced image reconstruction algorithms. Sparse data sampling is another viable option for low-dose CT, and sparse-view CT has been of particular interest among researchers in the CT community. Since analytic image reconstruction algorithms would lead to severe image artifacts, various iterative algorithms have been developed for reconstructing images from sparsely view-sampled projection data. However, iterative algorithms take much longer to compute than analytic algorithms, and the resulting images are usually prone to different types of artifacts that depend heavily on the reconstruction parameters. Interpolation methods have also been used to fill in the missing data in the sinogram of sparse-view CT, thus providing synthetically full data for analytic image reconstruction. In this work, we introduce a deep-neural-network-enabled sinogram synthesis method for sparse-view CT and show that it outperforms the existing interpolation methods as well as the iterative image reconstruction approach.
Tasks Computed Tomography (CT), Image Reconstruction
Published 2018-03-02
URL http://arxiv.org/abs/1803.00694v2
PDF http://arxiv.org/pdf/1803.00694v2.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-based-sinogram-synthesis
Repo
Framework
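For reference, this is the simple baseline the paper's network replaces: filling in the missing projection views of a sparse-view sinogram by linear interpolation along the view (angle) axis. Array shapes and the sampling pattern are illustrative assumptions.

```python
# Baseline sinogram completion: linear interpolation across missing view angles.
import numpy as np

def interpolate_sinogram(sparse_sino, sparse_angles, full_angles):
    """sparse_sino: (n_sparse_views, n_detectors); returns (n_full_views, n_detectors)."""
    n_det = sparse_sino.shape[1]
    full = np.empty((len(full_angles), n_det))
    for d in range(n_det):
        full[:, d] = np.interp(full_angles, sparse_angles, sparse_sino[:, d])
    return full

full_angles = np.linspace(0, np.pi, 720, endpoint=False)
sparse_angles = full_angles[::12]                      # 60-view sparse acquisition
sparse_sino = np.random.rand(len(sparse_angles), 512)  # stand-in for measured data
print(interpolate_sinogram(sparse_sino, sparse_angles, full_angles).shape)  # (720, 512)
```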

Unconstrained Iris Segmentation using Convolutional Neural Networks

Title Unconstrained Iris Segmentation using Convolutional Neural Networks
Authors Sohaib Ahmad, Benjamin Fuller
Abstract The extraction of consistent and identifiable features from an image of the human iris is known as iris recognition. Identifying which pixels belong to the iris, known as segmentation, is the first stage of iris recognition, and errors in segmentation propagate to later stages. Current segmentation approaches are tuned to specific environments. We propose using a convolutional neural network for iris segmentation. Our algorithm is accurate when trained in a single environment and tested in multiple environments. Our network builds on the Mask R-CNN framework (He et al., ICCV 2017) and segments faster than previous approaches, including the Mask R-CNN network itself. The network remains accurate when trained on a single environment and tested with different sensors (either visible light or near-infrared), although its accuracy degrades when trained with a visible light sensor and tested with a near-infrared sensor (and vice versa). A small amount of retraining of the visible light model (using a few samples from a near-infrared dataset) yields a tuned network that is accurate in both settings. For training and testing, this work uses the CASIA v4 Interval, Notre Dame 0405, UBIRIS v2, and IITD datasets.
Tasks Iris Recognition, Iris Segmentation
Published 2018-12-19
URL http://arxiv.org/abs/1812.08245v1
PDF http://arxiv.org/pdf/1812.08245v1.pdf
PWC https://paperswithcode.com/paper/unconstrained-iris-segmentation-using
Repo
Framework
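Since the network builds on Mask R-CNN, here is the standard torchvision recipe for adapting that framework to a single "iris" foreground class. This is the generic fine-tuning pattern, not the authors' released code; the hidden-layer width is an assumption.

```python
# Adapting torchvision's Mask R-CNN to background + iris (2 classes).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def iris_maskrcnn(num_classes=2):
    # torchvision >= 0.13; older versions use pretrained=True instead of weights=.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box head for our two classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask head so it predicts an iris mask.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model

model = iris_maskrcnn()
model.eval()
```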

Speech recognition with quaternion neural networks

Title Speech recognition with quaternion neural networks
Authors Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato De Mori
Abstract Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies, along with their first- and second-order derivatives, to characterize the time frames that compose the signal sequence. Considering that these components describe three different views of the same element, neural networks have to learn both the internal relations that exist within these features and the external or global dependencies that exist between time frames. Quaternion-valued neural networks (QNNs) have recently received significant interest from researchers for processing and learning such relations in multidimensional spaces. Indeed, quaternion numbers and QNNs have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with up to four times fewer learnable parameters than real-valued models. We propose to investigate modern quaternion-valued models, such as convolutional and recurrent quaternion neural networks, in the context of speech recognition with the TIMIT dataset. The experiments show that QNNs consistently outperform equivalent real-valued models with far fewer free parameters, leading to a more efficient, compact, and expressive representation of the relevant information.
Tasks Speech Recognition
Published 2018-11-21
URL http://arxiv.org/abs/1811.09678v1
PDF http://arxiv.org/pdf/1811.09678v1.pdf
PWC https://paperswithcode.com/paper/speech-recognition-with-quaternion-neural
Repo
Framework
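The four-fold parameter saving comes from the Hamilton product: one weight quaternion is shared across the four components of each quaternion input. Below is a minimal quaternion linear layer sketch; the initialization and layout are simplified relative to the paper.

```python
# Quaternion-valued linear layer using the Hamilton product.
import torch
import torch.nn as nn

class QuaternionLinear(nn.Module):
    def __init__(self, in_features, out_features):
        assert in_features % 4 == 0 and out_features % 4 == 0
        super().__init__()
        shape = (out_features // 4, in_features // 4)
        self.r = nn.Parameter(torch.randn(shape) * 0.05)   # real part of the weight
        self.i = nn.Parameter(torch.randn(shape) * 0.05)
        self.j = nn.Parameter(torch.randn(shape) * 0.05)
        self.k = nn.Parameter(torch.randn(shape) * 0.05)

    def forward(self, x):
        r, i, j, k = self.r, self.i, self.j, self.k
        # Real matrix realizing the Hamilton product with the weight quaternion;
        # parameter count is out*in/4, i.e. 4x fewer than a real-valued layer.
        W = torch.cat([
            torch.cat([r, -i, -j, -k], dim=1),
            torch.cat([i,  r, -k,  j], dim=1),
            torch.cat([j,  k,  r, -i], dim=1),
            torch.cat([k, -j,  i,  r], dim=1),
        ], dim=0)
        return x @ W.t()

layer = QuaternionLinear(in_features=12, out_features=8)
print(layer(torch.randn(5, 12)).shape)   # torch.Size([5, 8])
```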

Distance Based Source Domain Selection for Sentiment Classification

Title Distance Based Source Domain Selection for Sentiment Classification
Authors Lex Razoux Schultz, Marco Loog, Peyman Mohajerin Esfahani
Abstract Automated sentiment classification (SC) on short text fragments has received increasing attention in recent years. Performing SC on unseen domains with few or no labeled samples can significantly affect classification performance due to differences in how sentiment is expressed in the source and target domains. In this study, we aim to mitigate this undesired impact by proposing a methodology based on a predictive measure that allows us to select an optimal source domain from a set of candidates. The proposed measure is a linear combination of well-known distance functions between probability distributions supported on the source and target domains (e.g., the Earth Mover’s distance and the Kullback-Leibler divergence). The performance of the proposed methodology is validated through an SC case study in which our numerical experiments suggest a significant improvement in cross-domain classification error compared with a randomly selected source domain, for both naive and adaptive learning settings. For more heterogeneous datasets, the predictive feature of the proposed model can be used to further select a subset of candidate domains, where the corresponding classifier outperforms the one trained on all available source domains. This observation reinforces the hypothesis that our proposed model may also be deployed as a means to filter out redundant information during the training phase of SC.
Tasks Sentiment Analysis
Published 2018-08-28
URL http://arxiv.org/abs/1808.09271v1
PDF http://arxiv.org/pdf/1808.09271v1.pdf
PWC https://paperswithcode.com/paper/distance-based-source-domain-selection-for
Repo
Framework
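A small sketch of the selection rule described in the abstract: score each candidate source domain with a weighted sum of Earth Mover's distance and KL divergence against the target distribution, then pick the closest domain. The features, histogram binning, and the weight alpha are illustrative assumptions, not the paper's fitted combination.

```python
# Distance-based source domain selection via a linear combination of EMD and KL.
import numpy as np
from scipy.stats import wasserstein_distance, entropy

def domain_distance(source_feats, target_feats, alpha=0.5, bins=50, eps=1e-12):
    """Weighted sum of Earth Mover's distance and KL divergence on 1-D features."""
    lo = min(source_feats.min(), target_feats.min())
    hi = max(source_feats.max(), target_feats.max())
    p, _ = np.histogram(source_feats, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(target_feats, bins=bins, range=(lo, hi), density=True)
    emd = wasserstein_distance(source_feats, target_feats)
    kl = entropy(q + eps, p + eps)            # KL(target || source) on histograms
    return alpha * emd + (1 - alpha) * kl

def select_source(candidates, target_feats, **kwargs):
    scores = {name: domain_distance(f, target_feats, **kwargs)
              for name, f in candidates.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
candidates = {"books": rng.normal(0, 1, 2000), "movies": rng.normal(0.2, 1, 2000)}
target = rng.normal(0.25, 1, 500)
print(select_source(candidates, target))
```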

Biomedical Question Answering via Weighted Neural Network Passage Retrieval

Title Biomedical Question Answering via Weighted Neural Network Passage Retrieval
Authors Ferenc Galkó, Carsten Eickhoff
Abstract The amount of publicly available biomedical literature has been growing rapidly in recent years, yet question answering systems still struggle to exploit the full potential of this source of data. In a preliminary processing step, many question answering systems rely on retrieval models for identifying relevant documents and passages. This paper proposes a weighted cosine distance retrieval scheme based on neural network word embeddings. Our experiments are based on publicly available data and tasks from the BioASQ biomedical question answering challenge and demonstrate significant performance gains over a wide range of state-of-the-art models.
Tasks Question Answering, Word Embeddings
Published 2018-01-09
URL http://arxiv.org/abs/1801.02832v1
PDF http://arxiv.org/pdf/1801.02832v1.pdf
PWC https://paperswithcode.com/paper/biomedical-question-answering-via-weighted
Repo
Framework
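A sketch of the kind of weighted cosine retrieval the abstract describes: represent question and passage as IDF-weighted averages of word vectors and rank passages by cosine similarity. The toy embeddings, tokenization, and weighting details are assumptions, not the paper's exact scheme.

```python
# Weighted cosine-similarity passage retrieval with word embeddings.
import numpy as np

def weighted_embedding(tokens, vectors, idf):
    vecs = [idf.get(t, 1.0) * vectors[t] for t in tokens if t in vectors]
    dim = next(iter(vectors.values())).shape
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def rank_passages(question, passages, vectors, idf):
    q = weighted_embedding(question.lower().split(), vectors, idf)
    ranked = []
    for p in passages:
        v = weighted_embedding(p.lower().split(), vectors, idf)
        denom = np.linalg.norm(q) * np.linalg.norm(v) + 1e-12
        ranked.append((float(q @ v / denom), p))          # cosine similarity score
    return sorted(ranked, reverse=True)

rng = np.random.default_rng(0)
vocab = ["protein", "binds", "receptor", "fever", "causes", "the"]
vectors = {w: rng.normal(size=50) for w in vocab}         # stand-in for trained embeddings
idf = {"the": 0.1, "protein": 2.0, "receptor": 2.5}       # rarer words weigh more
print(rank_passages("protein binds the receptor",
                    ["the receptor binds the protein", "fever causes the fever"],
                    vectors, idf))
```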

Snap Angle Prediction for 360$^{\circ}$ Panoramas

Title Snap Angle Prediction for 360$^{\circ}$ Panoramas
Authors Bo Xiong, Kristen Grauman
Abstract 360$^{\circ}$ panoramas are a rich medium, yet notoriously difficult to visualize in the 2D image plane. We explore how intelligent rotations of a spherical image may enable content-aware projection with fewer perceptible distortions. Whereas existing approaches assume the viewpoint is fixed, intuitively some viewing angles within the sphere preserve high-level objects better than others. To discover the relationship between these optimal snap angles and the spherical panorama’s content, we develop a reinforcement learning approach for the cubemap projection model. Implemented as a deep recurrent neural network, our method selects a sequence of rotation actions and receives reward for avoiding cube boundaries that overlap with important foreground objects. We show our approach creates more visually pleasing panoramas while using 5x less computation than the baseline.
Tasks
Published 2018-03-31
URL http://arxiv.org/abs/1804.00126v2
PDF http://arxiv.org/pdf/1804.00126v2.pdf
PWC https://paperswithcode.com/paper/snap-angle-prediction-for-360circ-panoramas
Repo
Framework
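The paper learns snap angles with reinforcement learning; the much simpler sketch below only illustrates the underlying objective, scoring candidate yaw rotations of an equirectangular panorama by how much foreground lands near the vertical boundaries of the cubemap's side faces. The boundary band, candidate grid, and mask are illustrative assumptions.

```python
# Score yaw rotations by foreground overlap with cubemap side-face boundaries
# (longitudes 45, 135, 225, 315 degrees for faces centered at 0/90/180/270).
import numpy as np

def boundary_cost(fg_mask, yaw_deg, band_deg=5):
    """fg_mask: (H, W) binary foreground mask of the equirectangular image."""
    h, w = fg_mask.shape
    lon = (np.arange(w) / w * 360.0 + yaw_deg) % 360.0      # longitude after rotation
    boundaries = np.array([45.0, 135.0, 225.0, 315.0])
    near = np.min(np.abs(lon[None, :] - boundaries[:, None]), axis=0) < band_deg
    return fg_mask[:, near].sum()

def best_snap_angle(fg_mask, candidates=range(0, 90, 5)):
    # Rotating by 90 degrees repeats the boundary pattern, so 0..89 suffices.
    return min(candidates, key=lambda a: boundary_cost(fg_mask, a))

mask = np.zeros((256, 512), dtype=int)
mask[100:160, 50:80] = 1                                    # a foreground object
print(best_snap_angle(mask))
```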

Genaue modellbasierte Identifikation von gynäkologischen Katheterpfaden für die MRT-bildgestützte Brachytherapie

Title Genaue modellbasierte Identifikation von gynäkologischen Katheterpfaden für die MRT-bildgestützte Brachytherapie
Authors Andre Mastmeyer
Abstract German text, English abstract (the title translates roughly as “Accurate model-based identification of gynecological catheter paths for MRI-guided brachytherapy”): Mortality in gynecologic cancers, including cervical, ovarian, vaginal and vulvar cancers, is more than 6% internationally [1]. In many countries, external radiotherapy is supplemented by brachytherapy with high locally administered doses as a standard of care. The superior ability of magnetic resonance imaging (MRI) to differentiate soft tissue has led to an increasing use of this imaging technique in the intraoperative planning and implementation of brachytherapy. A technical challenge associated with the use of MRI for brachytherapy, in contrast to computed tomography (CT) imaging, is the dark, diffuse appearance and thus difficult identification of the catheter paths in the resulting images. This problem is addressed by the precise method described here of tracing the catheters from the catheter tip. The average identification time for a single catheter path was three seconds on a standard PC. Segmentation time, accuracy and precision are promising indicators of the value of this method for the clinical application of image-guided gynecological brachytherapy. After surgery, the healthy tissue surrounding the tumor is usually irradiated. This reduces the risk of leaving behind residual cells that would likely cause a recurrence of the cancer or the formation of metastases, secondary tumors elsewhere in the body. In the case of a tumor of the cervix or prostate, the operation is minimally invasive, i.e., the removal of the cancer and the irradiation are performed cost-effectively and with reduced risk by keyhole surgery instead of open surgery.
Tasks Computed Tomography (CT)
Published 2018-02-27
URL http://arxiv.org/abs/1803.00492v2
PDF http://arxiv.org/pdf/1803.00492v2.pdf
PWC https://paperswithcode.com/paper/genaue-modellbasierte-identifikation-von
Repo
Framework

Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network

Title Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network
Authors Zizhao Zhang, Lin Yang, Yefeng Zheng
Abstract Synthesized medical images have several important applications, e.g., as an intermediary in cross-modality image registration and as supplementary training samples to boost the generalization capability of a classifier. In particular, synthesized computed tomography (CT) data can provide an X-ray attenuation map for radiation therapy planning. In this work, we propose a generic cross-modality synthesis approach with the following targets: 1) synthesizing realistic-looking 3D images using unpaired training data, 2) ensuring consistent anatomical structures, which could otherwise be changed by geometric distortion in cross-modality synthesis, and 3) improving volume segmentation by using synthetic data for modalities with limited training samples. We show that these goals can be achieved with an end-to-end 3D convolutional neural network (CNN) composed of mutually beneficial generators and segmentors for the image synthesis and segmentation tasks. The generators are trained with an adversarial loss, a cycle-consistency loss, and a shape-consistency loss, which is supervised by the segmentors to reduce geometric distortion. From the segmentation view, the segmentors are boosted by synthetic data from the generators in an online manner. Generators and segmentors prompt each other alternately in an end-to-end training fashion. With extensive experiments on a dataset including a total of 4,496 CT and magnetic resonance imaging (MRI) cardiovascular volumes, we show that the two tasks are beneficial to each other and that coupling them yields better performance than solving them separately.
Tasks Computed Tomography (CT), Image Generation, Image Registration
Published 2018-02-27
URL http://arxiv.org/abs/1802.09655v2
PDF http://arxiv.org/pdf/1802.09655v2.pdf
PWC https://paperswithcode.com/paper/translating-and-segmenting-multimodal-medical
Repo
Framework
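A compact sketch of how the three losses named in the abstract compose for the A-to-B direction: adversarial realism, cycle consistency back to A, and shape consistency enforced through a segmentor on the synthesized volume. The placeholder networks and loss weights below are illustrative assumptions, not the paper's architecture.

```python
# Composing adversarial, cycle-consistency, and shape-consistency losses (A -> B).
import torch
import torch.nn as nn

adv_loss = nn.MSELoss()          # least-squares GAN loss as a stand-in
cyc_loss = nn.L1Loss()
seg_loss = nn.CrossEntropyLoss()

def generator_objective(x_a, labels_a, G_ab, G_ba, D_b, S_b,
                        lambda_cyc=10.0, lambda_shape=1.0):
    fake_b = G_ab(x_a)                                    # translate A -> B
    pred = D_b(fake_b)
    l_adv = adv_loss(pred, torch.ones_like(pred))         # fool the B-domain critic
    l_cyc = cyc_loss(G_ba(fake_b), x_a)                   # B -> A should recover x_a
    l_shape = seg_loss(S_b(fake_b), labels_a)             # anatomy must be preserved
    return l_adv + lambda_cyc * l_cyc + lambda_shape * l_shape

# Tiny placeholder 3-D networks just to show the call pattern on volumes.
def make_g():
    return nn.Sequential(nn.Conv3d(1, 1, 3, padding=1), nn.Tanh())

G_ab, G_ba = make_g(), make_g()
D_b = nn.Conv3d(1, 1, 3, padding=1)
S_b = nn.Conv3d(1, 4, 3, padding=1)                       # 4 anatomical classes
x_a = torch.randn(2, 1, 8, 16, 16)
labels_a = torch.randint(0, 4, (2, 8, 16, 16))
print(generator_objective(x_a, labels_a, G_ab, G_ba, D_b, S_b))
```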

Deep Structure Inference Network for Facial Action Unit Recognition

Title Deep Structure Inference Network for Facial Action Unit Recognition
Authors Ciprian A. Corneanu, Meysam Madadi, Sergio Escalera
Abstract Facial expressions are combinations of basic components called Action Units (AUs). Recognizing AUs is key to developing general facial expression analysis. In recent years, most efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between Action Units. In this paper, we propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and, in later stages, replicating a message-passing algorithm between classes similar to graphical-model inference. We show that by training the model end-to-end with increased supervision we improve the state of the art by 5.3% and 8.2% on the BP4D and DISFA datasets, respectively.
Tasks Facial Action Unit Detection
Published 2018-03-15
URL http://arxiv.org/abs/1803.05873v2
PDF http://arxiv.org/pdf/1803.05873v2.pdf
PWC https://paperswithcode.com/paper/deep-structure-inference-network-for-facial
Repo
Framework
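To illustrate the message-passing idea on its own: per-AU logits are refined over a few iterations using a learned matrix of pairwise AU relations, mimicking graphical-model inference on top of CNN features. Feature sizes, the number of iterations, and the update rule are simplified assumptions, not the paper's exact structure inference network.

```python
# Iterative message passing between Action Unit classes on top of pooled features.
import torch
import torch.nn as nn

class AUMessagePassing(nn.Module):
    def __init__(self, feat_dim=256, n_aus=12, iters=3):
        super().__init__()
        self.unary = nn.Linear(feat_dim, n_aus)                    # initial per-AU scores
        self.relations = nn.Parameter(torch.zeros(n_aus, n_aus))   # learned AU-AU messages
        self.iters = iters

    def forward(self, features):
        logits = self.unary(features)
        for _ in range(self.iters):
            messages = torch.sigmoid(logits) @ self.relations
            logits = logits + messages                  # refine each AU using the others
        return torch.sigmoid(logits)                    # per-AU activation probabilities

model = AUMessagePassing()
print(model(torch.randn(4, 256)).shape)                 # torch.Size([4, 12])
```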

Adversarial Defense by Stratified Convolutional Sparse Coding

Title Adversarial Defense by Stratified Convolutional Sparse Coding
Authors Bo Sun, Nian-hsuan Tsai, Fangchen Liu, Ronald Yu, Hao Su
Abstract We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. Based on convolutional sparse coding, we construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations. We introduce a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into our quasi-natural image space. Our experiments show state-of-the-art performance of our method compared to other attack-agnostic adversarial defense methods in various adversarial settings.
Tasks Adversarial Defense
Published 2018-11-30
URL https://arxiv.org/abs/1812.00037v2
PDF https://arxiv.org/pdf/1812.00037v2.pdf
PWC https://paperswithcode.com/paper/adversarial-defense-by-stratified
Repo
Framework
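The core operation behind a sparse-coding projection like the Sparse Transformation Layer can be sketched with plain ISTA: solve an L1-regularized coding problem against a fixed dictionary and return the reconstruction, which discards components (including adversarial perturbations) outside the dictionary's span of sparse codes. The random dictionary below is a stand-in; the paper learns a convolutional, stratified one.

```python
# ISTA sparse coding: project an input onto its sparse reconstruction.
import numpy as np

def ista_sparse_code(x, D, lam=0.1, n_iters=200):
    """Solve min_z 0.5*||x - D z||^2 + lam*||z||_1 with ISTA; return (z, D @ z)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)
        z = z - step * grad
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return z, D @ z

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                      # unit-norm dictionary atoms
x = rng.normal(size=64)                             # stand-in for an image patch
z, x_hat = ista_sparse_code(x, D)
print(np.count_nonzero(z), np.linalg.norm(x - x_hat))
```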