January 26, 2020

Paper Group ANR 1360

Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model. Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval. Variational Spectral Graph Convolutional Networks. Deep Physiological State Space Model for Clinical Forecasting. BAOD: Budget-Aware Object Detect …

Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model


Title	Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model
Authors	Mario O. Malavé, Corey A. Baron, Srivathsan P. Koundinyan, Christopher M. Sandino, Frank Ong, Joseph Y. Cheng, Dwight G. Nishimura
Abstract	Purpose: To rapidly reconstruct undersampled 3D non-Cartesian image-based navigators (iNAVs) using an unrolled deep learning (DL) model for non-rigid motion correction in coronary magnetic resonance angiography (CMRA). Methods: An unrolled network is trained to reconstruct beat-to-beat 3D iNAVs acquired as part of a CMRA sequence. The unrolled model incorporates a non-uniform FFT operator to perform the data consistency operation, and the regularization term is learned by a convolutional neural network (CNN) based on the proximal gradient descent algorithm. The training set includes 6,000 3D iNAVs acquired from 7 different subjects and 11 scans using a variable-density (VD) cones trajectory. For testing, 3D iNAVs from 4 additional subjects are reconstructed using the unrolled model. To validate reconstruction accuracy, global and localized motion estimates from DL model-based 3D iNAVs are compared with those extracted from 3D iNAVs reconstructed with $\textit{l}_{1}$-ESPIRiT. Then, the high-resolution coronary MRA images motion corrected with autofocusing using the $\textit{l}_{1}$-ESPIRiT and DL model-based 3D iNAVs are assessed for differences. Results: 3D iNAVs reconstructed using the DL model-based approach and conventional $\textit{l}_{1}$-ESPIRiT generate similar global and localized motion estimates and provide equivalent coronary image quality. Reconstruction with the unrolled network completes in a fraction of the time compared to CPU and GPU implementations of $\textit{l}_{1}$-ESPIRiT (20x and 3x speed increases, respectively). Conclusion: We have developed a deep neural network architecture to reconstruct undersampled 3D non-Cartesian VD cones iNAVs. Our approach decreases reconstruction time for 3D iNAVs, while preserving the accuracy of non-rigid motion information offered by them for correction.
Tasks
Published	2019-10-24
URL	https://arxiv.org/abs/1910.11414v1
PDF	https://arxiv.org/pdf/1910.11414v1.pdf
PWC	https://paperswithcode.com/paper/reconstruction-of-undersampled-3d-non
Repo
Framework
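
The abstract above describes an unrolled model that alternates a data consistency step with a learned regularization step, following proximal gradient descent. The following is a minimal sketch of that alternation only, under loud assumptions: a small dense matrix `A` stands in for the non-uniform FFT operator, soft-thresholding stands in for the learned CNN, and all names and parameter values are hypothetical rather than taken from the paper.

```python
# Sketch of an unrolled proximal-gradient reconstruction (pure Python).
# Assumptions: a small dense matrix A replaces the non-uniform FFT operator,
# and soft-thresholding replaces the CNN that the paper learns from data.

def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (stand-in for a learned regularizer)."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in x]

def data_error(A, x, y):
    """Squared residual ||A x - y||^2."""
    return sum((p - q) ** 2 for p, q in zip(matvec(A, x), y))

def unrolled_recon(A, y, num_iters=200, step=0.2, lam=0.01):
    """Run a fixed number of (gradient step, proximal step) pairs."""
    x = [0.0] * len(A[0])
    for _ in range(num_iters):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [xi - step * g for xi, g in zip(x, grad)]  # data consistency
        x = soft_threshold(x, step * lam)              # regularization
    return x

# Toy example: 3 measurements of a 4-element sparse signal.
A = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.5, 0.0, 1.0, 0.0]]
y = matvec(A, [1.0, 0.0, 0.0, 0.0])
x_hat = unrolled_recon(A, y)
```

In an unrolled network the number of iterations is fixed and the proximal step is replaced by trainable layers; the step size here is chosen small enough for the toy matrix and is not a recommended value.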


Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval


Title	Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Authors	Tao Yao, Xiangwei Kong, Lianshan Yan, Wenjing Tang, Qi Tian
Abstract	Supervised cross-modal hashing has gained increasing research interest on large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, it still has some challenging issues to be further studied: 1) most methods fail to preserve the semantic correlations in hash codes well because of the large heterogeneous gap; 2) most of them relax the discrete constraint on hash codes, leading to large quantization error and consequent low performance; 3) most of them suffer from relatively high memory cost and computational complexity during the training procedure, which makes them unscalable. In this paper, to address the above issues, we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore, both the feature-based similarities and the semantic correlations can be preserved in hash codes, which makes the learned hash codes more discriminative. Then an efficient discrete optimal algorithm is proposed to handle the scalability issue. Instead of learning hash codes bit-by-bit, the hash code matrix can be obtained directly, which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH achieves superior performance in both accuracy and scalability over some existing cross-modal hashing methods.
Tasks	Cross-Modal Retrieval, Quantization
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01304v1
PDF	https://arxiv.org/pdf/1905.01304v1.pdf
PWC	https://paperswithcode.com/paper/efficient-discrete-supervised-hashing-for
Repo
Framework
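
To make two ideas from the abstract above concrete, the sketch below shows retrieval by Hamming ranking over binary hash codes and the quantization error that arises when a relaxed real-valued code is binarized afterwards. It is not the EDSH algorithm; the function names and the toy codes are hypothetical.

```python
# Illustration only: Hamming-ranking retrieval and sign-quantization error.
# This does not implement EDSH; it shows what hash codes are used for.

def binarize(vec):
    """Quantize a real-valued embedding to a {-1, +1} hash code."""
    return [1 if v >= 0 else -1 for v in vec]

def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def quantization_error(vec):
    """Squared distance between a relaxed code and its binarized version."""
    return sum((v - c) ** 2 for v, c in zip(vec, binarize(vec)))

def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k database codes closest in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:top_k]

# A text query code ranked against three image codes (toy values).
image_codes = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, -1, 1, 1]]
text_query = binarize([0.9, -0.4, 0.7, -0.1])
nearest = retrieve(text_query, image_codes)
```

A relaxed embedding such as `[0.9, -0.4, 0.7, -0.1]` loses information when rounded to signs, which is the quantization error that discrete optimization methods aim to avoid.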

Variational Spectral Graph Convolutional Networks


Title	Variational Spectral Graph Convolutional Networks
Authors	Louis Tiao, Pantelis Elinas, Harrison Nguyen, Edwin V. Bonilla
Abstract	We propose a Bayesian approach to spectral graph convolutional networks (GCNs) where the graph parameters are considered as random variables. We develop an inference algorithm to estimate the posterior over these parameters and use it to incorporate prior information that is not naturally considered by standard GCN. The key to our approach is to define a smooth posterior parameterization over the adjacency matrix characterizing the graph, which we estimate via stochastic variational inference. Our experiments show that we can outperform standard GCN methods in the task of semi-supervised classification in noisy-graph regimes.
Tasks
Published	2019-06-05
URL	https://arxiv.org/abs/1906.01852v1
PDF	https://arxiv.org/pdf/1906.01852v1.pdf
PWC	https://paperswithcode.com/paper/variational-spectral-graph-convolutional
Repo
Framework
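
For readers unfamiliar with the standard GCN that the abstract above builds on, here is a minimal sketch of one deterministic graph convolution layer using the widely used symmetric normalization of the adjacency matrix with self-loops. The variational treatment of the adjacency matrix is not shown; in that setting a sampled adjacency would presumably be passed in as `A`. All names are hypothetical.

```python
import math

# One standard GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).
# The Bayesian posterior over the adjacency matrix is NOT modeled here.

def normalize_adjacency(A):
    """Add self-loops and apply symmetric degree normalization."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(P, Q):
    """Dense matrix product for lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def gcn_layer(A, X, W):
    """Propagate node features X over graph A with weights W, then ReLU."""
    H = matmul(matmul(normalize_adjacency(A), X), W)
    return [[max(v, 0.0) for v in row] for row in H]

# Two connected nodes with one feature each.
A = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0], [3.0]]
W = [[1.0]]
H = gcn_layer(A, X, W)
```

With two connected nodes, each output feature is the average of the two input features, which shows how a noisy edge directly changes what a node aggregates.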

Deep Physiological State Space Model for Clinical Forecasting


Title	Deep Physiological State Space Model for Clinical Forecasting
Authors	Yuan Xue, Denny Zhou, Nan Du, Andrew Dai, Zhen Xu, Kun Zhang, Claire Cui
Abstract	Clinical forecasting based on electronic medical records (EMR) can uncover the temporal correlations between patients’ conditions and outcomes from sequences of longitudinal clinical measurements. In this work, we propose an intervention-augmented deep state space generative model to capture the interactions among clinical measurements and interventions by explicitly modeling the dynamics of patients’ latent states. Based on this model, we are able to make a joint prediction of the trajectories of future observations and interventions. Empirical evaluations show that our proposed model compares favorably to several state-of-the-art methods on real EMR data.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.01762v1
PDF	https://arxiv.org/pdf/1912.01762v1.pdf
PWC	https://paperswithcode.com/paper/deep-physiological-state-space-model-for
Repo
Framework

BAOD: Budget-Aware Object Detection


Title	BAOD: Budget-Aware Object Detection
Authors	Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem
Abstract	We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques including random sampling, uncertainty sampling and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points.
Tasks	Active Learning, Object Detection
Published	2019-04-10
URL	http://arxiv.org/abs/1904.05443v1
PDF	http://arxiv.org/pdf/1904.05443v1.pdf
PWC	https://paperswithcode.com/paper/baod-budget-aware-object-detection
Repo
Framework

Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence


Title	Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence
Authors	Andrew Elkins, Felipe F. Freitas, Veronica Sanz
Abstract	In this paper we present our work to improve access to diagnosis in remote areas where good quality medical services may be lacking. We develop new Machine Learning methodologies for deployment onto mobile devices to help the early diagnosis of a number of life-threatening conditions using X-ray images. By using the latest developments in fast and portable Artificial Intelligence environments, we develop a smartphone app using an Artificial Neural Network to assist physicians in their diagnosis.
Tasks
Published	2019-06-26
URL	https://arxiv.org/abs/1906.11282v1
PDF	https://arxiv.org/pdf/1906.11282v1.pdf
PWC	https://paperswithcode.com/paper/developing-an-app-to-interpret-chest-x-rays
Repo
Framework

FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems


Title	FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems
Authors	An Yan, Bill Howe
Abstract	Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities. Spatiotemporal demand prediction models for these new mobility regimes must therefore consider fairness as a first-class design requirement. We present FairST, a fairness-aware model for predicting demand for new mobility systems. Our approach utilizes 1D, 2D and 3D convolutions to integrate various urban features and learn the spatial-temporal dynamics of a mobility system, but we include fairness metrics as a form of regularization to make the predictions more equitable across demographic groups. We propose two novel spatiotemporal fairness metrics, a region-based fairness gap (RFG) and an individual-based fairness gap (IFG). Both quantify equity in a spatiotemporal context, but vary by whether demographics are labeled at the region level (RFG) or whether population distribution information is available (IFG). Experimental results on real bike share and ride share datasets demonstrate the effectiveness of the proposed model: FairST not only reduces the fairness gap by more than 80%, but can surprisingly achieve better accuracy than state-of-the-art yet fairness-oblivious methods including LSTMs, ConvLSTMs, and 3D CNN.
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1907.03827v1
PDF	https://arxiv.org/pdf/1907.03827v1.pdf
PWC	https://paperswithcode.com/paper/fairst-equitable-spatial-and-temporal-demand
Repo
Framework
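
The abstract above describes adding fairness metrics to the training objective as a form of regularization. The sketch below illustrates that general pattern only. The gap used here (absolute difference in mean per-capita predicted demand between two groups of regions) is an assumption made for illustration and is not the paper's RFG or IFG definition; the names and the weight `lam` are hypothetical.

```python
# Sketch: prediction loss plus a fairness-gap penalty.
# The gap definition below is an illustrative assumption, not RFG or IFG.

def mse(pred, target):
    """Mean squared error between predicted and observed demand."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def group_gap(pred, population, group):
    """Absolute difference in mean per-capita predicted demand
    between regions labeled group 0 and regions labeled group 1."""
    rates = {0: [], 1: []}
    for p, n, g in zip(pred, population, group):
        rates[g].append(p / n)
    mean0 = sum(rates[0]) / len(rates[0])
    mean1 = sum(rates[1]) / len(rates[1])
    return abs(mean0 - mean1)

def fair_loss(pred, target, population, group, lam=1.0):
    """Accuracy term plus lam times the fairness gap."""
    return mse(pred, target) + lam * group_gap(pred, population, group)

# Four regions, two per demographic group (toy values).
pred = [10.0, 20.0, 30.0, 40.0]
target = [10.0, 20.0, 30.0, 40.0]
population = [100.0, 100.0, 100.0, 100.0]
group = [0, 0, 1, 1]
loss = fair_loss(pred, target, population, group, lam=2.0)
```

In this toy case the predictions match the targets exactly, so the whole loss comes from the gap term; increasing `lam` trades accuracy for smaller disparities between groups.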

Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images


Title	Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images
Authors	Alperen Elihos, Burak Balci, Bensu Alkan, Yusuf Artan
Abstract	Smart automated traffic enforcement solutions have been gaining popularity in recent years. These solutions are ubiquitously used for seat-belt violation detection, red-light violation detection and speed violation detection purposes. Highly accurate license plate recognition is an indispensable part of these systems. However, general license plate recognition systems require high resolution images for high performance. In this study, we propose a novel license plate recognition method for general roadway surveillance cameras. The proposed segmentation-free license plate recognition algorithm utilizes deep learning based object detection techniques in the character detection and recognition process. The proposed method has been tested on 2000 images captured on a roadway.
Tasks	License Plate Recognition, Object Detection
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02441v1
PDF	https://arxiv.org/pdf/1912.02441v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-segmentation-free-license
Repo
Framework

3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation


Title	3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation
Authors	Stylianos Moschoglou, Stylianos Ploumpis, Mihalis Nicolaou, Athanasios Papaioannou, Stefanos Zafeiriou
Abstract	Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes). This can be primarily attributed to two facts, namely that (a) publicly available 3D face databases are scarce as well as limited in terms of sample size and variability (e.g., few subjects, little diversity in race and gender), and (b) mesh convolutions for deep networks present several challenges that are not entirely tackled in the literature, leading to operator approximations and model instability, often failing to preserve high-frequency components of the distribution. As a result, linear methods such as Principal Component Analysis (PCA) have been mainly utilized towards 3D shape analysis, despite being unable to capture non-linearities and high frequency details of the 3D face - such as eyelid and lip variations. In this work, we present 3DFaceGAN, the first GAN tailored towards modeling the distribution of 3D facial surfaces, while retaining the high frequency details of 3D face shapes. We conduct an extensive series of both qualitative and quantitative experiments, where the merits of 3DFaceGAN are clearly demonstrated against other, state-of-the-art methods in tasks such as 3D shape representation, generation, and translation.
Tasks	3D Shape Analysis, 3D Shape Representation, Image Generation, Imputation, Super-Resolution
Published	2019-05-01
URL	https://arxiv.org/abs/1905.00307v2
PDF	https://arxiv.org/pdf/1905.00307v2.pdf
PWC	https://paperswithcode.com/paper/3dfacegan-adversarial-nets-for-3d-face
Repo
Framework

Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing


Title	Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Authors	Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos
Abstract	Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., “spikes”) in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).
Tasks	Object Classification
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03579v2
PDF	https://arxiv.org/pdf/1910.03579v2.pdf
PWC	https://paperswithcode.com/paper/graph-based-spatial-temporal-feature-learning
Repo
Framework

Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks


Title	Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks
Authors	Stefano Imoscopi, Volodya Grancharov, Sigurdur Sverrisson, Erlendur Karlsson, Harald Pobloth
Abstract	This paper presents a study on discriminative artificial neural network classifiers in the context of open-set speaker identification. Both 2-class and multi-class architectures are tested against the conventional Gaussian mixture model based classifier on enrolled speaker sets of different sizes. The performance evaluation shows that the multi-class neural network system has superior performance for large population sizes.
Tasks	Speaker Identification
Published	2019-04-02
URL	http://arxiv.org/abs/1904.01269v1
PDF	http://arxiv.org/pdf/1904.01269v1.pdf
PWC	https://paperswithcode.com/paper/experiments-on-open-set-speaker
Repo
Framework
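
Open-set speaker identification, as studied in the abstract above, differs from the closed-set case in that a test utterance may come from a speaker who is not enrolled. A minimal sketch of the decision rule follows, assuming a classifier that already produces one score per enrolled speaker; the scores, speaker names, and threshold are hypothetical.

```python
# Sketch of an open-set decision: pick the best enrolled speaker,
# but reject the utterance as "unknown" if the best score is too low.

def identify(scores, threshold):
    """scores maps each enrolled speaker to a classifier score.
    Returns the best-scoring speaker, or None for an unknown speaker."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

enrolled_scores = {"alice": 0.82, "bob": 0.11, "carol": 0.07}
impostor_scores = {"alice": 0.38, "bob": 0.33, "carol": 0.29}

accepted = identify(enrolled_scores, threshold=0.6)
rejected = identify(impostor_scores, threshold=0.6)
```

The threshold controls the trade-off between falsely accepting impostors and falsely rejecting enrolled speakers, and that trade-off typically becomes harder as the enrolled set grows.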

Advanced Rich Transcription System for Estonian Speech


Title	Advanced Rich Transcription System for Estonian Speech
Authors	Tanel Alumäe, Ottokar Tilk, Asadullah
Abstract	This paper describes the current TTÜ speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and a FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
Tasks	Speaker Identification
Published	2019-01-11
URL	http://arxiv.org/abs/1901.03601v1
PDF	http://arxiv.org/pdf/1901.03601v1.pdf
PWC	https://paperswithcode.com/paper/advanced-rich-transcription-system-for
Repo
Framework
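
The abstract above reports a word error rate (WER) of 8.1%. As a reminder of what that metric measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a generic illustration, not the scoring tool used by the authors, and the example sentences are made up.

```python
# Word error rate: word-level Levenshtein distance / reference length.

def word_error_rate(reference, hypothesis):
    """Return WER of a hypothesis transcript against a reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)

wer = word_error_rate("the system handles broadcast speech",
                      "the system handles broadcasts")
```

Here one word is substituted and one is deleted out of five reference words, so the value is 0.4. Because insertions count as errors, WER can exceed 1.0 for very poor hypotheses.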

Cross-Cutting Political Awareness through Diverse News Recommendations


Title	Cross-Cutting Political Awareness through Diverse News Recommendations
Authors	Bibek Paudel, Abraham Bernstein
Abstract	The suggestions generated by most existing recommender systems are known to suffer from a lack of diversity, and other issues like popularity bias. As a result, they have been observed to promote well-known “blockbuster” items, and to present users with “more of the same” choices that entrench their existing beliefs and biases. This limits users’ exposure to diverse viewpoints and potentially increases political polarization. To promote the diversity of views, we developed a novel computational framework that can identify the political leanings of users and the news items they share on online social networks. Based on such information, our system can recommend news items that purposefully expose users to different viewpoints and increase the diversity of their information “diet.” Our research on recommendation diversity and political polarization helps us to develop algorithms that measure each user’s reaction to diverse viewpoints and adjust the recommendation accordingly. The result is an approach that exposes users to a variety of political views and will, hopefully, broaden their acceptance (not necessarily the agreement) of various opinions.
Tasks	Recommendation Systems
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01495v1
PDF	https://arxiv.org/pdf/1909.01495v1.pdf
PWC	https://paperswithcode.com/paper/cross-cutting-political-awareness-through
Repo
Framework

Underwhelming Generalization Improvements From Controlling Feature Attribution


Title	Underwhelming Generalization Improvements From Controlling Feature Attribution
Authors	Joseph D. Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen
Abstract	Overfitting is a common issue in machine learning, which can arise when the model learns to predict class membership using convenient but spuriously-correlated image features instead of the true image features that denote a class. These are typically visualized using saliency maps. In some object classification tasks such as for medical images, one may have some images with masks, indicating a region of interest, i.e., which part of the image contains the most relevant information for the classification. We describe a simple method for taking advantage of such auxiliary labels, by training networks to ignore the distracting features which may be extracted outside of the region of interest, on the training images for which such masks are available. This mask information is only used during training and has an impact on generalization accuracy in a dataset-dependent way. We observe an underwhelming relationship between controlling saliency maps and improving generalization performance.
Tasks	Object Classification
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00199v1
PDF	https://arxiv.org/pdf/1910.00199v1.pdf
PWC	https://paperswithcode.com/paper/underwhelming-generalization-improvements
Repo
Framework

Figure Captioning with Reasoning and Sequence-Level Training


Title	Figure Captioning with Reasoning and Sequence-Level Training
Authors	Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu
Abstract	Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure. While natural image captioning has been studied extensively, figure captioning has received relatively little attention and remains a challenging problem. First, we introduce a new dataset for figure captioning, FigCAP, based on FigureQA. Second, we propose two novel attention mechanisms. To achieve accurate generation of labels in figures, we propose Label Maps Attention. To model the relations between figure labels, we propose Relation Maps Attention. Third, we use sequence-level training with reinforcement learning in order to directly optimize evaluation metrics, which alleviates the exposure bias issue and further improves the models in generating long captions. Extensive experiments show that the proposed method outperforms the baselines, thus demonstrating a significant potential for the automatic captioning of vast repositories of figures.
Tasks	Image Captioning
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02850v1
PDF	https://arxiv.org/pdf/1906.02850v1.pdf
PWC	https://paperswithcode.com/paper/figure-captioning-with-reasoning-and-sequence
Repo
Framework