Paper Group ANR 1360
Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model
Title | Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model |
Authors | Mario O. Malavé, Corey A. Baron, Srivathsan P. Koundinyan, Christopher M. Sandino, Frank Ong, Joseph Y. Cheng, Dwight G. Nishimura |
Abstract | Purpose: To rapidly reconstruct undersampled 3D non-Cartesian image-based navigators (iNAVs) using an unrolled deep learning (DL) model for non-rigid motion correction in coronary magnetic resonance angiography (CMRA). Methods: An unrolled network is trained to reconstruct beat-to-beat 3D iNAVs acquired as part of a CMRA sequence. The unrolled model incorporates a non-uniform FFT operator to perform the data consistency operation, and the regularization term is learned by a convolutional neural network (CNN) based on the proximal gradient descent algorithm. The training set includes 6,000 3D iNAVs acquired from 7 different subjects and 11 scans using a variable-density (VD) cones trajectory. For testing, 3D iNAVs from 4 additional subjects are reconstructed using the unrolled model. To validate reconstruction accuracy, global and localized motion estimates from DL model-based 3D iNAVs are compared with those extracted from 3D iNAVs reconstructed with $\ell_1$-ESPIRiT. Then, the high-resolution coronary MRA images motion corrected with autofocusing using the $\ell_1$-ESPIRiT and DL model-based 3D iNAVs are assessed for differences. Results: 3D iNAVs reconstructed using the DL model-based approach and conventional $\ell_1$-ESPIRiT generate similar global and localized motion estimates and provide equivalent coronary image quality. Reconstruction with the unrolled network completes in a fraction of the time compared to CPU and GPU implementations of $\ell_1$-ESPIRiT (20x and 3x speed increases, respectively). Conclusion: We have developed a deep neural network architecture to reconstruct undersampled 3D non-Cartesian VD cones iNAVs. Our approach decreases reconstruction time for 3D iNAVs, while preserving the accuracy of non-rigid motion information offered by them for correction. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11414v1 |
https://arxiv.org/pdf/1910.11414v1.pdf | |
PWC | https://paperswithcode.com/paper/reconstruction-of-undersampled-3d-non |
Repo | |
Framework | |
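The proximal gradient structure mentioned in the abstract above can be illustrated with a small, generic sketch. This is not the authors' network: the forward operator `A` is a tiny dense real matrix standing in for the non-uniform FFT, and `soft_threshold` is a hand-crafted proximal step standing in for the learned CNN regularizer. All function names and parameters here are hypothetical.

```python
def matvec(A, x):
    # Forward model A x (stand-in for the non-uniform FFT operator).
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    # Adjoint A^T r for a real matrix A.
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1.
    # Assumption: used here in place of the learned CNN regularization step.
    return [max(abs(v) - tau, 0.0) * (1.0 if v >= 0 else -1.0) for v in x]


def unrolled_pgd(A, y, step, tau, n_iters):
    # A fixed number of iterations plays the role of the "unrolled" layers.
    x = [0.0] * len(A[0])
    for _ in range(n_iters):
        # Data consistency: gradient of 0.5 * ||A x - y||^2.
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [xi - step * g for xi, g in zip(x, grad)]
        # Regularization: proximal step.
        x = soft_threshold(x, tau)
    return x


# Two measurements of three unknowns (an undersampled toy problem).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1.0, 0.0]
x_hat = unrolled_pgd(A, y, step=1.0, tau=0.1, n_iters=5)
```

In an unrolled network, each loop iteration would typically become one layer with its own trainable regularizer, which is why the iteration count is fixed in advance rather than chosen by a convergence test.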
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Title | Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval |
Authors | Tao Yao, Xiangwei Kong, Lianshan Yan, Wenjing Tang, Qi Tian |
Abstract | Supervised cross-modal hashing has gained increasing research interest for large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, it still has some challenging issues to be further studied: 1) most methods fail to preserve the semantic correlations in hash codes well because of the large heterogeneous gap; 2) most of them relax the discrete constraint on hash codes, leading to large quantization error and consequently low performance; 3) most of them suffer from relatively high memory cost and computational complexity during the training procedure, which makes them unscalable. In this paper, to address the above issues, we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore, the feature-based similarities and semantic correlations can both be preserved in hash codes, which makes the learned hash codes more discriminative. Then an efficient discrete optimal algorithm is proposed to handle the scalability issue. Instead of learning hash codes bit-by-bit, the hash code matrix can be obtained directly, which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH produces superior performance in both accuracy and scalability over some existing cross-modal hashing methods. |
Tasks | Cross-Modal Retrieval, Quantization |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01304v1 |
https://arxiv.org/pdf/1905.01304v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-discrete-supervised-hashing-for |
Repo | |
Framework | |
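For context on how binary hash codes are typically used at retrieval time, the sketch below ranks database items by Hamming distance to a query code. The abstract above does not describe the retrieval step, so this is an assumption about the usual setup in hashing-based retrieval, not the EDSH algorithm itself; codes are packed into Python integers and both function names are hypothetical.

```python
def hamming_distance(a, b):
    # Number of differing bits between two codes packed as non-negative integers.
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    # Return database indices ordered from nearest to farthest code.
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# A 4-bit query code (e.g. from one modality) against three database codes
# (e.g. from another modality).
ranking = rank_by_hamming(0b1010, [0b0101, 0b1010, 0b1000])
```

Because the comparison reduces to an XOR and a bit count, ranking stays cheap even for large databases, which is the usual motivation for keeping codes strictly binary.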
Variational Spectral Graph Convolutional Networks
Title | Variational Spectral Graph Convolutional Networks |
Authors | Louis Tiao, Pantelis Elinas, Harrison Nguyen, Edwin V. Bonilla |
Abstract | We propose a Bayesian approach to spectral graph convolutional networks (GCNs) where the graph parameters are considered as random variables. We develop an inference algorithm to estimate the posterior over these parameters and use it to incorporate prior information that is not naturally considered by standard GCN. The key to our approach is to define a smooth posterior parameterization over the adjacency matrix characterizing the graph, which we estimate via stochastic variational inference. Our experiments show that we can outperform standard GCN methods in the task of semi-supervised classification in noisy-graph regimes. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01852v1 |
https://arxiv.org/pdf/1906.01852v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-spectral-graph-convolutional |
Repo | |
Framework | |
Deep Physiological State Space Model for Clinical Forecasting
Title | Deep Physiological State Space Model for Clinical Forecasting |
Authors | Yuan Xue, Denny Zhou, Nan Du, Andrew Dai, Zhen Xu, Kun Zhang, Claire Cui |
Abstract | Clinical forecasting based on electronic medical records (EMR) can uncover the temporal correlations between patients’ conditions and outcomes from sequences of longitudinal clinical measurements. In this work, we propose an intervention-augmented deep state space generative model to capture the interactions among clinical measurements and interventions by explicitly modeling the dynamics of patients’ latent states. Based on this model, we are able to make a joint prediction of the trajectories of future observations and interventions. Empirical evaluations show that our proposed model compares favorably to several state-of-the-art methods on real EMR data. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01762v1 |
https://arxiv.org/pdf/1912.01762v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-physiological-state-space-model-for |
Repo | |
Framework | |
BAOD: Budget-Aware Object Detection
Title | BAOD: Budget-Aware Object Detection |
Authors | Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem |
Abstract | We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques including random sampling, uncertainty sampling and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points. |
Tasks | Active Learning, Object Detection |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05443v1 |
http://arxiv.org/pdf/1904.05443v1.pdf | |
PWC | https://paperswithcode.com/paper/baod-budget-aware-object-detection |
Repo | |
Framework | |
Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence
Title | Developing an App to interpret Chest X-rays to support the diagnosis of respiratory pathology with Artificial Intelligence |
Authors | Andrew Elkins, Felipe F. Freitas, Veronica Sanz |
Abstract | In this paper we present our work to improve access to diagnosis in remote areas where good quality medical services may be lacking. We develop new Machine Learning methodologies for deployment onto mobile devices to help the early diagnosis of a number of life-threatening conditions using X-ray images. By using the latest developments in fast and portable Artificial Intelligence environments, we develop a smartphone app using an Artificial Neural Network to assist physicians in their diagnosis. |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11282v1 |
https://arxiv.org/pdf/1906.11282v1.pdf | |
PWC | https://paperswithcode.com/paper/developing-an-app-to-interpret-chest-x-rays |
Repo | |
Framework | |
FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems
Title | FairST: Equitable Spatial and Temporal Demand Prediction for New Mobility Systems |
Authors | An Yan, Bill Howe |
Abstract | Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities. Spatiotemporal demand prediction models for these new mobility regimes must therefore consider fairness as a first-class design requirement. We present FairST, a fairness-aware model for predicting demand for new mobility systems. Our approach utilizes 1D, 2D and 3D convolutions to integrate various urban features and learn the spatial-temporal dynamics of a mobility system, but we include fairness metrics as a form of regularization to make the predictions more equitable across demographic groups. We propose two novel spatiotemporal fairness metrics, a region-based fairness gap (RFG) and an individual-based fairness gap (IFG). Both quantify equity in a spatiotemporal context, but vary by whether demographics are labeled at the region level (RFG) or whether population distribution information is available (IFG). Experimental results on real bike share and ride share datasets demonstrate the effectiveness of the proposed model: FairST not only reduces the fairness gap by more than 80%, but can surprisingly achieve better accuracy than state-of-the-art yet fairness-oblivious methods including LSTMs, ConvLSTMs, and 3D CNN. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1907.03827v1 |
https://arxiv.org/pdf/1907.03827v1.pdf | |
PWC | https://paperswithcode.com/paper/fairst-equitable-spatial-and-temporal-demand |
Repo | |
Framework | |
Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images
Title | Deep Learning Based Segmentation Free License Plate Recognition Using Roadway Surveillance Camera Images |
Authors | Alperen Elihos, Burak Balci, Bensu Alkan, Yusuf Artan |
Abstract | Smart automated traffic enforcement solutions have been gaining popularity in recent years. These solutions are ubiquitously used for seat-belt violation detection, red-light violation detection and speed violation detection purposes. Highly accurate license plate recognition is an indispensable part of these systems. However, general license plate recognition systems require high resolution images for high performance. In this study, we propose a novel license plate recognition method for general roadway surveillance cameras. The proposed segmentation-free license plate recognition algorithm utilizes deep learning based object detection techniques in the character detection and recognition process. The proposed method has been tested on 2000 images captured on a roadway. |
Tasks | License Plate Recognition, Object Detection |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02441v1 |
https://arxiv.org/pdf/1912.02441v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-segmentation-free-license |
Repo | |
Framework | |
3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation
Title | 3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation |
Authors | Stylianos Moschoglou, Stylianos Ploumpis, Mihalis Nicolaou, Athanasios Papaioannou, Stefanos Zafeiriou |
Abstract | Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes). This can be primarily attributed to two facts, namely that (a) publicly available 3D face databases are scarce as well as limited in terms of sample size and variability (e.g., few subjects, little diversity in race and gender), and (b) mesh convolutions for deep networks present several challenges that are not entirely tackled in the literature, leading to operator approximations and model instability, often failing to preserve high-frequency components of the distribution. As a result, linear methods such as Principal Component Analysis (PCA) have been mainly utilized towards 3D shape analysis, despite being unable to capture non-linearities and high frequency details of the 3D face - such as eyelid and lip variations. In this work, we present 3DFaceGAN, the first GAN tailored towards modeling the distribution of 3D facial surfaces, while retaining the high frequency details of 3D face shapes. We conduct an extensive series of both qualitative and quantitative experiments, where the merits of 3DFaceGAN are clearly demonstrated against other, state-of-the-art methods in tasks such as 3D shape representation, generation, and translation. |
Tasks | 3D Shape Analysis, 3D Shape Representation, Image Generation, Imputation, Super-Resolution |
Published | 2019-05-01 |
URL | https://arxiv.org/abs/1905.00307v2 |
https://arxiv.org/pdf/1905.00307v2.pdf | |
PWC | https://paperswithcode.com/paper/3dfacegan-adversarial-nets-for-3d-face |
Repo | |
Framework | |
Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Title | Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing |
Authors | Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos |
Abstract | Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., “spikes”) in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS). |
Tasks | Object Classification |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03579v2 |
https://arxiv.org/pdf/1910.03579v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-spatial-temporal-feature-learning |
Repo | |
Framework | |
Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks
Title | Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks |
Authors | Stefano Imoscopi, Volodya Grancharov, Sigurdur Sverrisson, Erlendur Karlsson, Harald Pobloth |
Abstract | This paper presents a study on discriminative artificial neural network classifiers in the context of open-set speaker identification. Both 2-class and multi-class architectures are tested against the conventional Gaussian mixture model based classifier on enrolled speaker sets of different sizes. The performance evaluation shows that the multi-class neural network system has superior performance for large population sizes. |
Tasks | Speaker Identification |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01269v1 |
http://arxiv.org/pdf/1904.01269v1.pdf | |
PWC | https://paperswithcode.com/paper/experiments-on-open-set-speaker |
Repo | |
Framework | |
Advanced Rich Transcription System for Estonian Speech
Title | Advanced Rich Transcription System for Estonian Speech |
Authors | Tanel Alumäe, Ottokar Tilk, Asadullah |
Abstract | This paper describes the current TTÜ speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method. |
Tasks | Speaker Identification |
Published | 2019-01-11 |
URL | http://arxiv.org/abs/1901.03601v1 |
http://arxiv.org/pdf/1901.03601v1.pdf | |
PWC | https://paperswithcode.com/paper/advanced-rich-transcription-system-for |
Repo | |
Framework | |
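The word error rate quoted in the abstract above is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch of that standard definition follows; it is not the scoring tool the authors used, it assumes whitespace tokenization, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    # Assumption: words are separated by whitespace and compared exactly.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over four reference words.
wer = word_error_rate("a b c d", "a x c")
```

Since insertions count as errors but do not add to the denominator, the value can exceed 1.0 for very poor hypotheses.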
Cross-Cutting Political Awareness through Diverse News Recommendations
Title | Cross-Cutting Political Awareness through Diverse News Recommendations |
Authors | Bibek Paudel, Abraham Bernstein |
Abstract | The suggestions generated by most existing recommender systems are known to suffer from a lack of diversity, and other issues like popularity bias. As a result, they have been observed to promote well-known “blockbuster” items, and to present users with “more of the same” choices that entrench their existing beliefs and biases. This limits users’ exposure to diverse viewpoints and potentially increases political polarization. To promote the diversity of views, we developed a novel computational framework that can identify the political leanings of users and the news items they share on online social networks. Based on such information, our system can recommend news items that purposefully expose users to different viewpoints and increase the diversity of their information “diet.” Our research on recommendation diversity and political polarization helps us to develop algorithms that measure each user’s reaction to diverse viewpoints and adjust the recommendation accordingly. The result is an approach that exposes users to a variety of political views and will, hopefully, broaden their acceptance (not necessarily the agreement) of various opinions. |
Tasks | Recommendation Systems |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01495v1 |
https://arxiv.org/pdf/1909.01495v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-cutting-political-awareness-through |
Repo | |
Framework | |
Underwhelming Generalization Improvements From Controlling Feature Attribution
Title | Underwhelming Generalization Improvements From Controlling Feature Attribution |
Authors | Joseph D. Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen |
Abstract | Overfitting is a common issue in machine learning, which can arise when the model learns to predict class membership using convenient but spuriously-correlated image features instead of the true image features that denote a class. These are typically visualized using saliency maps. In some object classification tasks such as for medical images, one may have some images with masks, indicating a region of interest, i.e., which part of the image contains the most relevant information for the classification. We describe a simple method for taking advantage of such auxiliary labels, by training networks to ignore the distracting features which may be extracted outside of the region of interest, on the training images for which such masks are available. This mask information is only used during training and has an impact on generalization accuracy in a dataset-dependent way. We observe an underwhelming relationship between controlling saliency maps and improving generalization performance. |
Tasks | Object Classification |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00199v1 |
https://arxiv.org/pdf/1910.00199v1.pdf | |
PWC | https://paperswithcode.com/paper/underwhelming-generalization-improvements |
Repo | |
Framework | |
Figure Captioning with Reasoning and Sequence-Level Training
Title | Figure Captioning with Reasoning and Sequence-Level Training |
Authors | Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu |
Abstract | Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure. While natural image captioning has been studied extensively, figure captioning has received relatively little attention and remains a challenging problem. First, we introduce a new dataset for figure captioning, FigCAP, based on FigureQA. Second, we propose two novel attention mechanisms. To achieve accurate generation of labels in figures, we propose Label Maps Attention. To model the relations between figure labels, we propose Relation Maps Attention. Third, we use sequence-level training with reinforcement learning in order to directly optimize evaluation metrics, which alleviates the exposure bias issue and further improves the models in generating long captions. Extensive experiments show that the proposed method outperforms the baselines, thus demonstrating a significant potential for the automatic captioning of vast repositories of figures. |
Tasks | Image Captioning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02850v1 |
https://arxiv.org/pdf/1906.02850v1.pdf | |
PWC | https://paperswithcode.com/paper/figure-captioning-with-reasoning-and-sequence |
Repo | |
Framework | |