January 26, 2020

Paper Group ANR 1360

Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model

Title Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model
Authors Mario O. Malavé, Corey A. Baron, Srivathsan P. Koundinyan, Christopher M. Sandino, Frank Ong, Joseph Y. Cheng, Dwight G. Nishimura
Abstract Purpose: To rapidly reconstruct undersampled 3D non-Cartesian image-based navigators (iNAVs) using an unrolled deep learning (DL) model for non-rigid motion correction in coronary magnetic resonance angiography (CMRA). Methods: An unrolled network is trained to reconstruct beat-to-beat 3D iNAVs acquired as part of a CMRA sequence. The unrolled model incorporates a non-uniform FFT operator to perform the data consistency operation, and the regularization term is learned by a convolutional neural network (CNN) based on the proximal gradient descent algorithm. The training set includes 6,000 3D iNAVs acquired from 7 different subjects and 11 scans using a variable-density (VD) cones trajectory. For testing, 3D iNAVs from 4 additional subjects are reconstructed using the unrolled model. To validate reconstruction accuracy, global and localized motion estimates from DL model-based 3D iNAVs are compared with those extracted from 3D iNAVs reconstructed with $\textit{l}_{1}$-ESPIRiT. Then, the high-resolution coronary MRA images motion corrected with autofocusing using the $\textit{l}_{1}$-ESPIRiT and DL model-based 3D iNAVs are assessed for differences. Results: 3D iNAVs reconstructed using the DL model-based approach and conventional $\textit{l}_{1}$-ESPIRiT generate similar global and localized motion estimates and provide equivalent coronary image quality. Reconstruction with the unrolled network completes in a fraction of the time compared to CPU and GPU implementations of $\textit{l}_{1}$-ESPIRiT (20x and 3x speed increases, respectively). Conclusion: We have developed a deep neural network architecture to reconstruct undersampled 3D non-Cartesian VD cones iNAVs. Our approach decreases reconstruction time for 3D iNAVs, while preserving the accuracy of non-rigid motion information offered by them for correction.
Tasks
Published 2019-10-24
URL https://arxiv.org/abs/1910.11414v1
PDF https://arxiv.org/pdf/1910.11414v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-of-undersampled-3d-non
Repo
Framework
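
The abstract above describes an unrolled proximal gradient descent scheme that alternates a data-consistency step with a learned regularizer. The following is only a minimal sketch of that general pattern, not the authors' model: it assumes a masked Cartesian FFT in place of the non-uniform FFT, and a fixed soft-thresholding step in place of the trained CNN. All function names and parameters (`forward`, `adjoint`, `soft_threshold`, `unrolled_pgd`, `step`, `lam`) are hypothetical.

```python
import numpy as np

def forward(x, mask):
    """Undersampled Cartesian FFT (assumed stand-in for the non-uniform FFT)."""
    return mask * np.fft.fft2(x, norm="ortho")

def adjoint(y, mask):
    """Adjoint of `forward` for a real-valued sampling mask."""
    return np.fft.ifft2(mask * y, norm="ortho")

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1 (assumed stand-in for the learned CNN)."""
    mag = np.abs(x)
    return x * np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)

def unrolled_pgd(y, mask, n_iters=10, step=1.0, lam=0.01):
    """Run a fixed number of proximal gradient iterations on measurements y."""
    x = adjoint(y, mask)  # zero-filled initial estimate
    for _ in range(n_iters):
        # Data-consistency step: gradient of 0.5 * ||A x - y||^2.
        grad = adjoint(forward(x, mask) - y, mask)
        # Regularization step: a trained network would replace this call.
        x = soft_threshold(x - step * grad, lam)
    return x
```

In a trained unrolled network the number of iterations is fixed in advance and each regularization step has learnable weights; here the iteration count is simply the `n_iters` argument.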

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

Title Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Authors Tao Yao, Xiangwei Kong, Lianshan Yan, Wenjing Tang, Qi Tian
Abstract Supervised cross-modal hashing has gained increasing research interest in large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, it still has some challenging issues to be further studied: 1) most methods fail to preserve the semantic correlations in hash codes well because of the large heterogeneous gap; 2) most of them relax the discrete constraint on hash codes, leading to large quantization error and consequently low performance; 3) most of them suffer from relatively high memory cost and computational complexity during the training procedure, which makes them unscalable. In this paper, to address the above issues, we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore, the feature-based similarities and semantic correlations can both be preserved in hash codes, which makes the learned hash codes more discriminative. Then an efficient discrete optimization algorithm is proposed to handle the scalability issue. Instead of learning hash codes bit by bit, the hash code matrix can be obtained directly, which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH produces superior performance in both accuracy and scalability over some existing cross-modal hashing methods.
Tasks Cross-Modal Retrieval, Quantization
Published 2019-05-03
URL https://arxiv.org/abs/1905.01304v1
PDF https://arxiv.org/pdf/1905.01304v1.pdf
PWC https://paperswithcode.com/paper/efficient-discrete-supervised-hashing-for
Repo
Framework
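
To illustrate why discrete hash codes make large-scale retrieval cheap, here is a minimal sketch of retrieval by Hamming distance over codes in {-1, +1}. It does not implement EDSH: the codes come from the sign of a linear projection, and the projection matrix, function names, and `k` parameter are all assumptions made for illustration.

```python
import numpy as np

def hash_codes(features, projection):
    """Binarize projected features into {-1, +1} codes (hypothetical hash function)."""
    return np.where(features @ projection >= 0, 1, -1).astype(np.int8)

def hamming_distance(query_code, db_codes):
    """Hamming distance between one code and every row of db_codes."""
    n_bits = db_codes.shape[1]
    # For +/-1 codes, distance = (n_bits - inner product) / 2.
    inner = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (n_bits - inner) // 2

def retrieve(query_code, db_codes, k=5):
    """Indices of the k database codes closest to the query in Hamming distance."""
    distances = hamming_distance(query_code, db_codes)
    return np.argsort(distances, kind="stable")[:k]
```

In a cross-modal setting, the query code would come from one modality (say, text) and the database codes from another (say, images), each with its own hash function mapping into the same code space.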

Variational Spectral Graph Convolutional Networks

Title Variational Spectral Graph Convolutional Networks
Authors Louis Tiao, Pantelis Elinas, Harrison Nguyen, Edwin V. Bonilla
Abstract We propose a Bayesian approach to spectral graph convolutional networks (GCNs) where the graph parameters are considered as random variables. We develop an inference algorithm to estimate the posterior over these parameters and use it to incorporate prior information that is not naturally considered by standard GCN. The key to our approach is to define a smooth posterior parameterization over the adjacency matrix characterizing the graph, which we estimate via stochastic variational inference. Our experiments show that we can outperform standard GCN methods in the task of semi-supervised classification in noisy-graph regimes.
Tasks
Published 2019-06-05
URL https://arxiv.org/abs/1906.01852v1
PDF https://arxiv.org/pdf/1906.01852v1.pdf
PWC https://paperswithcode.com/paper/variational-spectral-graph-convolutional
Repo
Framework
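
For context on what a "standard GCN" computes, the sketch below shows one deterministic graph convolution layer using the widely used symmetric normalization of the adjacency matrix with self-loops. This is an assumption about the baseline the abstract refers to; it does not show the paper's variational treatment, in which the adjacency matrix is a random variable. The function names are illustrative.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # degrees are >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ features @ weights, 0.0)
```

A Bayesian variant along the lines of the abstract would sample or parameterize `adj` from a posterior instead of treating it as fixed input.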

Deep Physiological State Space Model for Clinical Forecasting

Title Deep Physiological State Space Model for Clinical Forecasting
Authors Yuan Xue, Denny Zhou, Nan Du, Andrew Dai, Zhen Xu, Kun Zhang, Claire Cui
Abstract Clinical forecasting based on electronic medical records (EMR) can uncover the temporal correlations between patients’ conditions and outcomes from sequences of longitudinal clinical measurements. In this work, we propose an intervention-augmented deep state space generative model to capture the interactions among clinical measurements and interventions by explicitly modeling the dynamics of patients’ latent states. Based on this model, we are able to make a joint prediction of the trajectories of future observations and interventions. Empirical evaluations show that our proposed model compares favorably to several state-of-the-art methods on real EMR data.
Tasks
Published 2019-12-04
URL https://arxiv.org/abs/1912.01762v1
PDF https://arxiv.org/pdf/1912.01762v1.pdf
PWC https://paperswithcode.com/paper/deep-physiological-state-space-model-for
Repo
Framework

BAOD: Budget-Aware Object Detection

Title BAOD: Budget-Aware Object Detection
Authors Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem
Abstract We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques including random sampling, uncertainty sampling and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points.
Tasks Active Learning, Object Detection
Published 2019-04-10
URL http://arxiv.org/abs/1904.05443v1
PDF http://arxiv.org/pdf/1904.05443v1.pdf
PWC https://paperswithcode.com/paper/baod-budget-aware-object-detection
Repo
Framework

Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence

Title Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence
Authors Andrew Elkins, Felipe F. Freitas, Veronica Sanz
Abstract In this paper we present our work to improve access to diagnosis in remote areas where good quality medical services may be lacking. We develop new Machine Learning methodologies for deployment onto mobile devices to help the early diagnosis of a number of life-threatening conditions using X-ray images. By using the latest developments in fast and portable Artificial Intelligence environments, we develop a smartphone app using an Artificial Neural Network to assist physicians in their diagnosis.
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.11282v1
PDF https://arxiv.org/pdf/1906.11282v1.pdf
PWC https://paperswithcode.com/paper/developing-an-app-to-interpret-chest-x-rays
Repo
Framework

FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems

Title FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems
Authors An Yan, Bill Howe
Abstract Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities. Spatiotemporal demand prediction models for these new mobility regimes must therefore consider fairness as a first-class design requirement. We present FairST, a fairness-aware model for predicting demand for new mobility systems. Our approach utilizes 1D, 2D and 3D convolutions to integrate various urban features and learn the spatial-temporal dynamics of a mobility system, but we include fairness metrics as a form of regularization to make the predictions more equitable across demographic groups. We propose two novel spatiotemporal fairness metrics, a region-based fairness gap (RFG) and an individual-based fairness gap (IFG). Both quantify equity in a spatiotemporal context, but vary by whether demographics are labeled at the region level (RFG) or whether population distribution information is available (IFG). Experimental results on real bike share and ride share datasets demonstrate the effectiveness of the proposed model: FairST not only reduces the fairness gap by more than 80%, but can surprisingly achieve better accuracy than state-of-the-art yet fairness-oblivious methods including LSTMs, ConvLSTMs, and 3D CNN.
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1907.03827v1
PDF https://arxiv.org/pdf/1907.03827v1.pdf
PWC https://paperswithcode.com/paper/fairst-equitable-spatial-and-temporal-demand
Repo
Framework

Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images

Title Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images
Authors Alperen Elihos, Burak Balci, Bensu Alkan, Yusuf Artan
Abstract Smart automated traffic enforcement solutions have been gaining popularity in recent years. These solutions are ubiquitously used for seat-belt violation detection, red-light violation detection and speed violation detection purposes. Highly accurate license plate recognition is an indispensable part of these systems. However, general license plate recognition systems require high resolution images for high performance. In this study, we propose a novel license plate recognition method for general roadway surveillance cameras. The proposed segmentation-free license plate recognition algorithm utilizes deep learning based object detection techniques in the character detection and recognition process. The proposed method has been tested on 2000 images captured on a roadway.
Tasks License Plate Recognition, Object Detection
Published 2019-12-05
URL https://arxiv.org/abs/1912.02441v1
PDF https://arxiv.org/pdf/1912.02441v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-segmentation-free-license
Repo
Framework

3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation

Title 3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation
Authors Stylianos Moschoglou, Stylianos Ploumpis, Mihalis Nicolaou, Athanasios Papaioannou, Stefanos Zafeiriou
Abstract Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes). This can be primarily attributed to two facts, namely that (a) publicly available 3D face databases are scarce as well as limited in terms of sample size and variability (e.g., few subjects, little diversity in race and gender), and (b) mesh convolutions for deep networks present several challenges that are not entirely tackled in the literature, leading to operator approximations and model instability, often failing to preserve high-frequency components of the distribution. As a result, linear methods such as Principal Component Analysis (PCA) have been mainly utilized towards 3D shape analysis, despite being unable to capture non-linearities and high frequency details of the 3D face - such as eyelid and lip variations. In this work, we present 3DFaceGAN, the first GAN tailored towards modeling the distribution of 3D facial surfaces, while retaining the high frequency details of 3D face shapes. We conduct an extensive series of both qualitative and quantitative experiments, where the merits of 3DFaceGAN are clearly demonstrated against other, state-of-the-art methods in tasks such as 3D shape representation, generation, and translation.
Tasks 3D Shape Analysis, 3D Shape Representation, Image Generation, Imputation, Super-Resolution
Published 2019-05-01
URL https://arxiv.org/abs/1905.00307v2
PDF https://arxiv.org/pdf/1905.00307v2.pdf
PWC https://paperswithcode.com/paper/3dfacegan-adversarial-nets-for-3d-face
Repo
Framework

Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing

Title Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Authors Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos
Abstract Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., “spikes”) in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).
Tasks Object Classification
Published 2019-10-08
URL https://arxiv.org/abs/1910.03579v2
PDF https://arxiv.org/pdf/1910.03579v2.pdf
PWC https://paperswithcode.com/paper/graph-based-spatial-temporal-feature-learning
Repo
Framework

Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks

Title Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks
Authors Stefano Imoscopi, Volodya Grancharov, Sigurdur Sverrisson, Erlendur Karlsson, Harald Pobloth
Abstract This paper presents a study on discriminative artificial neural network classifiers in the context of open-set speaker identification. Both 2-class and multi-class architectures are tested against the conventional Gaussian mixture model based classifier on enrolled speaker sets of different sizes. The performance evaluation shows that the multi-class neural network system has superior performance for large population sizes.
Tasks Speaker Identification
Published 2019-04-02
URL http://arxiv.org/abs/1904.01269v1
PDF http://arxiv.org/pdf/1904.01269v1.pdf
PWC https://paperswithcode.com/paper/experiments-on-open-set-speaker
Repo
Framework
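
Open-set identification differs from closed-set identification in that the test speaker may not be enrolled at all, so the system needs a way to reject. A common way to express that decision is sketched below, under the assumption that a classifier has already produced one score per enrolled speaker; the function name, score dictionary, and threshold are hypothetical and not taken from the paper.

```python
def identify_open_set(scores, threshold):
    """Return the best-scoring enrolled speaker, or None if no score is high enough.

    scores: dict mapping enrolled speaker id to a similarity score.
    threshold: minimum score required to accept the best match.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Reject as "unknown speaker" when even the best match is too weak.
    return best if scores[best] >= threshold else None
```

The threshold trades false acceptances of unknown speakers against false rejections of enrolled ones, and in practice would be tuned on held-out data.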

Advanced Rich Transcription System for Estonian Speech

Title Advanced Rich Transcription System for Estonian Speech
Authors Tanel Alumäe, Ottokar Tilk, Asadullah
Abstract This paper describes the current TTÜ speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
Tasks Speaker Identification
Published 2019-01-11
URL http://arxiv.org/abs/1901.03601v1
PDF http://arxiv.org/pdf/1901.03601v1.pdf
PWC https://paperswithcode.com/paper/advanced-rich-transcription-system-for
Repo
Framework
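
The word error rate quoted in the abstract is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words. A minimal sketch follows, assuming whitespace tokenization and no text normalization; the paper's exact scoring setup is not stated here, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.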

Cross-Cutting Political Awareness through Diverse News Recommendations

Title Cross-Cutting Political Awareness through Diverse News Recommendations
Authors Bibek Paudel, Abraham Bernstein
Abstract The suggestions generated by most existing recommender systems are known to suffer from a lack of diversity, and other issues like popularity bias. As a result, they have been observed to promote well-known “blockbuster” items, and to present users with “more of the same” choices that entrench their existing beliefs and biases. This limits users’ exposure to diverse viewpoints and potentially increases political polarization. To promote the diversity of views, we developed a novel computational framework that can identify the political leanings of users and the news items they share on online social networks. Based on such information, our system can recommend news items that purposefully expose users to different viewpoints and increase the diversity of their information “diet.” Our research on recommendation diversity and political polarization helps us to develop algorithms that measure each user’s reaction to diverse viewpoints and adjust the recommendation accordingly. The result is an approach that exposes users to a variety of political views and will, hopefully, broaden their acceptance (not necessarily the agreement) of various opinions.
Tasks Recommendation Systems
Published 2019-09-03
URL https://arxiv.org/abs/1909.01495v1
PDF https://arxiv.org/pdf/1909.01495v1.pdf
PWC https://paperswithcode.com/paper/cross-cutting-political-awareness-through
Repo
Framework

Underwhelming Generalization Improvements From Controlling Feature Attribution

Title Underwhelming Generalization Improvements From Controlling Feature Attribution
Authors Joseph D. Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen
Abstract Overfitting is a common issue in machine learning, which can arise when the model learns to predict class membership using convenient but spuriously-correlated image features instead of the true image features that denote a class. These are typically visualized using saliency maps. In some object classification tasks such as for medical images, one may have some images with masks, indicating a region of interest, i.e., which part of the image contains the most relevant information for the classification. We describe a simple method for taking advantage of such auxiliary labels, by training networks to ignore the distracting features which may be extracted outside of the region of interest, on the training images for which such masks are available. This mask information is only used during training and has an impact on generalization accuracy in a dataset-dependent way. We observe an underwhelming relationship between controlling saliency maps and improving generalization performance.
Tasks Object Classification
Published 2019-10-01
URL https://arxiv.org/abs/1910.00199v1
PDF https://arxiv.org/pdf/1910.00199v1.pdf
PWC https://paperswithcode.com/paper/underwhelming-generalization-improvements
Repo
Framework

Figure Captioning with Reasoning and Sequence-Level Training

Title Figure Captioning with Reasoning and Sequence-Level Training
Authors Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu
Abstract Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure. While natural image captioning has been studied extensively, figure captioning has received relatively little attention and remains a challenging problem. First, we introduce a new dataset for figure captioning, FigCAP, based on FigureQA. Second, we propose two novel attention mechanisms. To achieve accurate generation of labels in figures, we propose Label Maps Attention. To model the relations between figure labels, we propose Relation Maps Attention. Third, we use sequence-level training with reinforcement learning in order to directly optimize evaluation metrics, which alleviates the exposure bias issue and further improves the models in generating long captions. Extensive experiments show that the proposed method outperforms the baselines, thus demonstrating a significant potential for the automatic captioning of vast repositories of figures.
Tasks Image Captioning
Published 2019-06-07
URL https://arxiv.org/abs/1906.02850v1
PDF https://arxiv.org/pdf/1906.02850v1.pdf
PWC https://paperswithcode.com/paper/figure-captioning-with-reasoning-and-sequence
Repo
Framework