Paper Group ANR 1077
Towards Efficient Discrete Integration via Adaptive Quantile Queries
Title | Towards Efficient Discrete Integration via Adaptive Quantile Queries |
Authors | Fan Ding, Hanjing Wang, Ashish Sabharwal, Yexiang Xue |
Abstract | Discrete integration in a high dimensional space of n variables poses fundamental challenges. The WISH algorithm reduces the intractable discrete integration problem into n optimization queries subject to randomized constraints, obtaining a constant approximation guarantee. The optimization queries are expensive, which limits the applicability of WISH. We propose AdaWISH, which is able to obtain the same guarantee but accesses only a small subset of queries of WISH. For example, when the number of function values is bounded by a constant, AdaWISH issues only O(log n) queries. The key idea is to query adaptively, taking advantage of the shape of the weight function being integrated. In general, we prove that AdaWISH has a regret of only O(log n) relative to an idealistic oracle that issues queries at data-dependent optimal points. Experimentally, AdaWISH gives precise estimates for discrete integration problems, of the same quality as that of WISH and better than several competing approaches, on a variety of probabilistic inference benchmarks. At the same time, it saves substantially on the number of optimization queries compared to WISH. On a suite of UAI inference challenge benchmarks, it saves 81.5% of WISH queries while retaining the quality of results. |
Tasks | |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05811v2 |
https://arxiv.org/pdf/1910.05811v2.pdf | |
PWC | https://paperswithcode.com/paper/adawish-faster-discrete-integration-via |
Repo | |
Framework | |
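The reduction mentioned in the abstract above (discrete integration turned into optimization queries under randomized constraints) can be illustrated with a minimal brute-force sketch. It follows the WISH estimator as commonly stated in the original WISH publication, not anything spelled out in this abstract: for each level i, take the median over several trials of the maximum weight under i random parity constraints, then combine the medians as M_0 + sum_{i=0}^{n-1} M_{i+1} 2^i. The function name `wish_estimate`, the `trials` default, and the exhaustive enumeration standing in for a real optimization solver are assumptions for illustration. The adaptive query selection of AdaWISH is not shown.

```python
import itertools
import random
import statistics


def wish_estimate(weight, n, trials=7, seed=0):
    """Brute-force sketch of a WISH-style estimate of sum_x weight(x), x in {0,1}^n.

    Only usable for tiny n: the constrained maximum is found by enumeration,
    whereas a real implementation would call an optimization solver.
    """
    rng = random.Random(seed)
    points = list(itertools.product((0, 1), repeat=n))
    medians = []
    for i in range(n + 1):
        maxima = []
        for _ in range(trials):
            # Draw i random parity constraints A x = b (mod 2).
            rows = [[rng.randint(0, 1) for _ in range(n)] for _ in range(i)]
            rhs = [rng.randint(0, 1) for _ in range(i)]
            feasible = (
                x for x in points
                if all(sum(a * v for a, v in zip(row, x)) % 2 == b
                       for row, b in zip(rows, rhs))
            )
            # An infeasible constraint set contributes a maximum of 0.
            maxima.append(max((weight(x) for x in feasible), default=0.0))
        medians.append(statistics.median(maxima))
    # Combine the level medians: M_0 + sum_{i=0}^{n-1} M_{i+1} * 2^i.
    return medians[0] + sum(medians[i + 1] * 2 ** i for i in range(n))
```

The sketch issues one batch of queries per level i = 0, ..., n, which is the cost that an adaptive scheme would aim to reduce by skipping levels.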
Data Interpretation Support in Rescue Operations: Application for French Firefighters
Title | Data Interpretation Support in Rescue Operations: Application for French Firefighters |
Authors | Samer Chehade, Nada Matta, Jean-Baptiste Pothin, Rémi Cogranne |
Abstract | This work aims at developing a system that supports French firefighters in data interpretation during rescue operations. An application ontology is proposed based on existing crisis management ones and operational expertise collection. After that, a knowledge-based system will be developed and integrated in firefighters’ environment. Our first studies are shown in this paper. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10941v1 |
https://arxiv.org/pdf/1909.10941v1.pdf | |
PWC | https://paperswithcode.com/paper/data-interpretation-support-in-rescue |
Repo | |
Framework | |
A Bayesian Approach to Modelling Longitudinal Data in Electronic Health Records
Title | A Bayesian Approach to Modelling Longitudinal Data in Electronic Health Records |
Authors | Alexis Bellot, Mihaela van der Schaar |
Abstract | Analyzing electronic health records (EHR) poses significant challenges because often few samples are available describing a patient’s health and, when available, their information content is highly diverse. The problem we consider is how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status and fixed demographic information to produce estimated survival distributions updated through a patient’s follow up. We propose a nonparametric probabilistic model that generates survival trajectories from an ensemble of Bayesian trees that learns variable interactions over time without specifying beforehand the longitudinal process. We show performance improvements on Primary Biliary Cirrhosis patient data. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09086v1 |
https://arxiv.org/pdf/1912.09086v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-approach-to-modelling-longitudinal |
Repo | |
Framework | |
Deep Learning Approaches for Image Retrieval and Pattern Spotting in Ancient Documents
Title | Deep Learning Approaches for Image Retrieval and Pattern Spotting in Ancient Documents |
Authors | Kelly Lais Wiggers, Alceu de Souza Britto Junior, Alessandro Lameiras Koerich, Laurent Heutte, Luiz Eduardo Soares de Oliveira |
Abstract | This paper describes two approaches for content-based image retrieval and pattern spotting in document images using deep learning. To cope with the lack of training data, the first approach uses a pre-trained CNN model that is fine-tuned to achieve a compact yet discriminant representation of queries and image candidates. The second approach uses a Siamese Convolutional Neural Network trained on a previously prepared subset of image pairs from the ImageNet dataset to provide the similarity-based feature maps. In both methods, the learned representation scheme considers feature maps of different sizes which are evaluated in terms of retrieval performance. A robust experimental protocol using two public datasets (Tobacco-800 and DocExplore) has shown that the proposed methods compare favorably against state-of-the-art document image retrieval and pattern spotting methods. |
Tasks | Content-Based Image Retrieval, Image Retrieval |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09404v1 |
https://arxiv.org/pdf/1907.09404v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-approaches-for-image-retrieval |
Repo | |
Framework | |
ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval
Title | ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval |
Authors | Syed Sameed Husain, Eng-Jon Ong, Miroslaw Bober |
Abstract | We propose a novel CNN architecture called ACTNET for robust instance image retrieval from large-scale datasets. Our key innovation is a learnable activation layer designed to improve the signal-to-noise ratio (SNR) of deep convolutional feature maps. Further, we introduce a controlled multi-stream aggregation, where complementary deep features from different convolutional layers are optimally transformed and balanced using our novel activation layers, before aggregation into a global descriptor. Importantly, the learnable parameters of our activation blocks are explicitly trained, together with the CNN parameters, in an end-to-end manner minimising triplet loss. This means that our network jointly learns the CNN filters and their optimal activation and aggregation for retrieval tasks. To our knowledge, this is the first time parametric functions have been used to control and learn optimal aggregation. We conduct an in-depth experimental study on three non-linear activation functions: Sine-Hyperbolic, Exponential and modified Weibull, showing that while all bring significant gains the Weibull function performs best thanks to its ability to equalise strong activations. The results clearly demonstrate that our ACTNET architecture significantly enhances the discriminative power of deep features, improving significantly over the state-of-the-art retrieval results on all datasets. |
Tasks | Image Retrieval |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05794v2 |
https://arxiv.org/pdf/1907.05794v2.pdf | |
PWC | https://paperswithcode.com/paper/actnet-end-to-end-learning-of-feature |
Repo | |
Framework | |
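The abstract above states that the network is trained end-to-end by minimising a triplet loss. As a hedged illustration, here is the standard margin-based triplet loss on plain descriptor vectors. The squared Euclidean distance, the `margin` value, and the function names are assumptions for the sketch; the abstract does not specify the exact variant used in ACTNET.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Standard triplet loss: pull the positive closer than the negative by a margin.

    The loss is zero once d(anchor, positive) + margin <= d(anchor, negative).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

A triplet whose negative is already farther from the anchor than the positive by more than the margin contributes no gradient, which is why triplet selection usually matters in practice.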
Infinite Brain MR Images: PGGAN-based Data Augmentation for Tumor Detection
Title | Infinite Brain MR Images: PGGAN-based Data Augmentation for Tumor Detection |
Authors | Changhee Han, Leonardo Rundo, Ryosuke Araki, Yujiro Furukawa, Giancarlo Mauri, Hideki Nakayama, Hideaki Hayashi |
Abstract | Due to the lack of available annotated medical images, accurate computer-assisted diagnosis requires intensive Data Augmentation (DA) techniques, such as geometric/intensity transformations of original images; however, those transformed images intrinsically have a similar distribution to the original ones, leading to limited performance improvement. To fill the gaps in the real image distribution, we synthesize brain contrast-enhanced Magnetic Resonance (MR) images (realistic but completely different from the original ones) using Generative Adversarial Networks (GANs). This study exploits Progressive Growing of GANs (PGGANs), a multi-stage generative training method, to generate original-sized 256 × 256 MR images for Convolutional Neural Network-based brain tumor detection, which is challenging via conventional GANs; difficulties arise due to unstable GAN training with high resolution and a variety of tumors in size, location, shape, and contrast. Our preliminary results show that this novel PGGAN-based DA method can achieve promising performance improvement, when combined with classical DA, in tumor detection and also in other medical imaging tasks. |
Tasks | Data Augmentation |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12564v1 |
http://arxiv.org/pdf/1903.12564v1.pdf | |
PWC | https://paperswithcode.com/paper/infinite-brain-mr-images-pggan-based-data |
Repo | |
Framework | |
Part Segmentation for Highly Accurate Deformable Tracking in Occlusions via Fully Convolutional Neural Networks
Title | Part Segmentation for Highly Accurate Deformable Tracking in Occlusions via Fully Convolutional Neural Networks |
Authors | Weilin Wan, Aaron Walsman, Dieter Fox |
Abstract | Successfully tracking the human body is an important perceptual challenge for robots that must work around people. Existing methods fall into two broad categories: geometric tracking and direct pose estimation using machine learning. While recent work has shown direct estimation techniques can be quite powerful, geometric tracking methods using point clouds can provide a very high level of 3D accuracy which is necessary for many robotic applications. However, these approaches can have difficulty in clutter when large portions of the subject are occluded. To overcome this limitation, we propose a solution based on fully convolutional neural networks (FCN). We develop an optimized Fast-FCN network architecture for our application which allows us to filter observed point clouds and improve tracking accuracy while maintaining interactive frame rates. We also show that this model can be trained with a limited number of examples and almost no manual labelling by using an existing geometric tracker and data augmentation to automatically generate segmentation maps. We demonstrate the accuracy of our full system by comparing it against an existing geometric tracker, and show significant improvement in these challenging scenarios. |
Tasks | Data Augmentation, Pose Estimation |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01504v1 |
https://arxiv.org/pdf/1908.01504v1.pdf | |
PWC | https://paperswithcode.com/paper/part-segmentation-for-highly-accurate |
Repo | |
Framework | |
Incorporating human and learned domain knowledge into training deep neural networks: A differentiable dose volume histogram and adversarial inspired framework for generating Pareto optimal dose distributions in radiation therapy
Title | Incorporating human and learned domain knowledge into training deep neural networks: A differentiable dose volume histogram and adversarial inspired framework for generating Pareto optimal dose distributions in radiation therapy |
Authors | Dan Nguyen, Rafe McBeth, Azar Sadeghnejad Barkousaraie, Gyanendra Bohara, Chenyang Shen, Xun Jia, Steve Jiang |
Abstract | We propose a novel domain specific loss, which is a differentiable loss function based on the dose volume histogram, and combine it with an adversarial loss for the training of deep neural networks to generate Pareto optimal dose distributions. The mean squared error (MSE) loss, dose volume histogram (DVH) loss, and adversarial (ADV) loss were used to train 4 instances of the neural network model: 1) MSE, 2) MSE+ADV, 3) MSE+DVH, and 4) MSE+DVH+ADV. Data from 70 prostate patients were acquired, and the dose influence arrays were calculated for each patient. 1200 Pareto surface plans per patient were generated by pseudo-randomizing the tradeoff weights (84,000 plans total). We divided the data into 54 training, 6 validation, and 10 testing patients. Each model was trained for 100,000 iterations, with a batch size of 2. The prediction time of each model is 0.052 seconds. Quantitatively, the MSE+DVH+ADV model had the lowest prediction error of 0.038 (conformation), 0.026 (homogeneity), 0.298 (R50), 1.65% (D95), 2.14% (D98), 2.43% (D99). The MSE model had the worst prediction error of 0.134 (conformation), 0.041 (homogeneity), 0.520 (R50), 3.91% (D95), 4.33% (D98), 4.60% (D99). For both the mean dose PTV error and the max dose PTV, body, bladder, and rectum error, the MSE+DVH+ADV model outperformed all other models. All models’ predictions have an average mean and max dose error less than 2.8% and 4.2%, respectively. Expert human domain specific knowledge can be the largest driver in the performance improvement, and adversarial learning can be used to further capture nuanced features. The real-time prediction capabilities allow for a physician to quickly navigate the tradeoff space, and produce a dose distribution as a tangible endpoint for the dosimetrist to use for planning. This can considerably reduce the treatment planning time, allowing for clinicians to focus their efforts on challenging cases. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05874v2 |
https://arxiv.org/pdf/1908.05874v2.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-human-and-learned-domain |
Repo | |
Framework | |
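The abstract above relies on a differentiable loss based on the dose volume histogram. One common way to make a cumulative DVH differentiable, shown here as a sketch and not necessarily the authors' formulation, is to replace the hard indicator "voxel dose >= threshold" with a sigmoid. The names `soft_dvh` and `dvh_loss`, the smoothing width `beta`, and the mean squared error between curves are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_dvh(doses, thresholds, beta=0.1):
    """Smooth cumulative DVH: approximate fraction of voxels with dose >= t.

    A hard DVH counts voxels with an indicator function; the sigmoid with
    width beta is a differentiable stand-in (assumed, not from the abstract).
    """
    return [
        sum(sigmoid((d - t) / beta) for d in doses) / len(doses)
        for t in thresholds
    ]


def dvh_loss(predicted_doses, reference_doses, thresholds, beta=0.1):
    """Mean squared difference between the two smooth DVH curves."""
    pred = soft_dvh(predicted_doses, thresholds, beta)
    ref = soft_dvh(reference_doses, thresholds, beta)
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(thresholds)
```

Smaller values of `beta` bring the curve closer to the hard histogram at the cost of steeper, less informative gradients.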
Spectral partitioning of time-varying networks with unobserved edges
Title | Spectral partitioning of time-varying networks with unobserved edges |
Authors | Michael T. Schaub, Santiago Segarra, Hoi-To Wai |
Abstract | We discuss a variant of ‘blind’ community detection, in which we aim to partition an unobserved network from the observation of a (dynamical) graph signal defined on the network. We consider a scenario where our observed graph signals are obtained by filtering white noise input, and the underlying network is different for every observation. In this fashion, the filtered graph signals can be interpreted as defined on a time-varying network. We model each of the underlying network realizations as generated by an independent draw from a latent stochastic blockmodel (SBM). To infer the partition of the latent SBM, we propose a simple spectral algorithm for which we provide a theoretical analysis and establish consistency guarantees for the recovery. We illustrate our results using numerical experiments on synthetic and real data, highlighting the efficacy of our approach. |
Tasks | Community Detection |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11930v1 |
http://arxiv.org/pdf/1904.11930v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-partitioning-of-time-varying |
Repo | |
Framework | |
Neural Message Passing on Hybrid Spatio-Temporal Visual and Symbolic Graphs for Video Understanding
Title | Neural Message Passing on Hybrid Spatio-Temporal Visual and Symbolic Graphs for Video Understanding |
Authors | Effrosyni Mavroudi, Benjamín Béjar Haro, René Vidal |
Abstract | Many problems in video understanding require labeling multiple activities occurring concurrently in different parts of a video, including the objects and actors participating in such activities. However, state-of-the-art methods in computer vision focus primarily on tasks such as action classification, action detection, or action segmentation, where typically only one action label needs to be predicted. In this work, we propose a generic approach to classifying one or more nodes of a spatio-temporal graph grounded on spatially localized semantic entities in a video, such as actors and objects. In particular, we combine an attributed spatio-temporal visual graph, which captures visual context and interactions, with an attributed symbolic graph grounded on the semantic label space, which captures relationships between multiple labels. We further propose a neural message passing framework for jointly refining the representations of the nodes and edges of the hybrid visual-symbolic graph. Our framework features a) node-type and edge-type conditioned filters and adaptive graph connectivity, b) a soft-assignment module for connecting visual nodes to symbolic nodes and vice versa, c) a symbolic graph reasoning module that enforces semantic coherence and d) a pooling module for aggregating the refined node and edge representations for downstream classification tasks. We demonstrate the generality of our approach on a variety of tasks, such as temporal subactivity classification and object affordance classification on the CAD-120 dataset and multilabel temporal action localization on the large scale Charades dataset, where we outperform existing deep learning approaches, using only raw RGB frames. |
Tasks | Action Classification, Action Detection, Action Localization, action segmentation, Temporal Action Localization, Video Understanding |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07385v1 |
https://arxiv.org/pdf/1905.07385v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-message-passing-on-hybrid-spatio |
Repo | |
Framework | |
Spatio-Temporal Action Localization in a Weakly Supervised Setting
Title | Spatio-Temporal Action Localization in a Weakly Supervised Setting |
Authors | Kurt Degiorgio, Fabio Cuzzolin |
Abstract | Enabling computational systems with the ability to localize actions in video-based content has manifold applications. Traditionally, such a problem is approached in a fully-supervised setting where video-clips with complete frame-by-frame annotations around the actions of interest are provided for training. However, the data requirements needed to achieve adequate generalization in this setting are prohibitive. In this work, we circumvent this issue by casting the problem in a weakly supervised setting, i.e., by considering videos as labelled ‘sets’ of unlabelled video segments. Firstly, we apply unsupervised segmentation to take advantage of the elementary structure of each video. Subsequently, a convolutional neural network is used to extract RGB features from the resulting video segments. Finally, Multiple Instance Learning (MIL) is employed to predict labels at the video segment level, thus inherently performing spatio-temporal action detection. In contrast to previous work, we make use of a different MIL formulation in which the label of each video segment is continuous rather than discrete, making the resulting optimization function tractable. Additionally, we utilize a set splitting technique for regularization. Experimental results considering multiple performance indicators on the UCF-Sports dataset support the effectiveness of our approach. |
Tasks | Action Detection, Action Localization, Multiple Instance Learning, Spatio-Temporal Action Localization, Temporal Action Localization |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02171v1 |
https://arxiv.org/pdf/1905.02171v1.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-action-localization-in-a |
Repo | |
Framework | |
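The abstract above treats each video as a labelled set (bag) of unlabelled segments and uses Multiple Instance Learning to predict segment-level labels. The sketch below shows a generic max-pooling MIL objective with continuous segment scores. It is an illustrative baseline only, not the specific MIL formulation or set-splitting regularizer of the paper; the function names and the cross-entropy loss are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bag_probability(segment_scores):
    """A bag (video) is as positive as its most confident segment."""
    return max(sigmoid(s) for s in segment_scores)


def bag_loss(segment_scores, bag_label, eps=1e-12):
    """Binary cross-entropy between the pooled bag probability and the video label.

    Only the video-level label (0 or 1) is needed; per-segment probabilities
    are recovered afterwards by applying sigmoid to each segment score.
    """
    p = bag_probability(segment_scores)
    return -(bag_label * math.log(p + eps)
             + (1 - bag_label) * math.log(1.0 - p + eps))
```

Because the segment scores stay continuous, the same scores that drive the bag-level loss can be thresholded at test time to localize the action.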
LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge
Title | LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge |
Authors | Xiaoxiao Miao, Ian McLoughlin |
Abstract | This paper presents a novel Dialect Identification (DID) system developed for the Fifth Edition of the Multi-Genre Broadcast challenge, the task of Fine-grained Arabic Dialect Identification (MGB-5 ADI Challenge). The system improves upon traditional DNN x-vector performance by employing a Convolutional and Long Short Term Memory-Recurrent (CLSTM) architecture to combine the benefits of a convolutional neural network front-end for feature extraction and a recurrent neural network back-end to capture longer temporal dependencies. Furthermore, we investigate intensive augmentation of one low resource dialect in the highly unbalanced training set using time-scale modification (TSM). This converts an utterance to several time-stretched or time-compressed versions, subsequently used to train the CLSTM system without using any other corpus. In this paper, we also investigate speech augmentation using MUSAN and the RIR datasets to increase the quantity and diversity of the existing training data in the normal way. Results show firstly that the CLSTM architecture outperforms a traditional DNN x-vector implementation. Secondly, adopting TSM-based speed perturbation yields a small performance improvement for the unbalanced data. Finally, traditional data augmentation techniques yield further benefit, in line with evidence from related speaker and language recognition tasks. Our system achieved 2nd place ranking out of 15 entries in the MGB-5 ADI challenge, presented at ASRU 2019. |
Tasks | Data Augmentation |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09003v1 |
https://arxiv.org/pdf/1912.09003v1.pdf | |
PWC | https://paperswithcode.com/paper/lstm-tdnn-with-convolutional-front-end-for |
Repo | |
Framework | |
Diversifying Inference Path Selection: Moving-Mobile-Network for Landmark Recognition
Title | Diversifying Inference Path Selection: Moving-Mobile-Network for Landmark Recognition |
Authors | Biao Qian, Yang Wang, Zhao Zhang, Richang Hong, Meng Wang, Ling Shao |
Abstract | Deep convolutional neural networks have largely benefited computer vision tasks. However, the high computational complexity limits their real-world applications. To this end, many methods have been proposed for efficient network learning, and applications in portable mobile devices. In this paper, we propose a novel Moving-Mobile-Network, named M²Net, for landmark recognition, in which each landmark image is equipped with its geographic location information. We intuitively find that M²Net can essentially promote the diversity of the inference path (selected blocks subset) selection, so as to enhance the recognition accuracy. The above intuition is achieved by our proposed reward function with the input of geo-location and landmarks. We also find that the performance of other portable networks can be improved via our architecture. We construct two landmark image datasets, with each landmark associated with geographic information, over which we conduct extensive experiments to demonstrate that M²Net achieves improved recognition accuracy with comparable complexity. |
Tasks | |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00418v1 |
https://arxiv.org/pdf/1912.00418v1.pdf | |
PWC | https://paperswithcode.com/paper/diversifying-inference-path-selection-moving |
Repo | |
Framework | |
Stochastic Blockmodels with Edge Information
Title | Stochastic Blockmodels with Edge Information |
Authors | Guy W. Cole, Sinead A. Williamson |
Abstract | Stochastic blockmodels allow us to represent networks in terms of a latent community structure, often yielding intuitions about the underlying social structure. Typically, this structure is inferred based only on a binary network representing the presence or absence of interactions between nodes, which limits the amount of information that can be extracted from the data. In practice, many interaction networks contain much more information about the relationship between two nodes. For example, in an email network, the volume of communication between two users and the content of that communication can give us information about both the strength and the nature of their relationship. In this paper, we propose the Topic Blockmodel, a stochastic blockmodel that uses a count-based topic model to capture the interaction modalities within and between latent communities. By explicitly incorporating information sent between nodes in our network representation, we are able to address questions of interest in real-world situations, such as predicting recipients for an email message or inferring the content of an unopened email. Further, by considering topics associated with a pair of communities, we are better able to interpret the nature of each community and the manner in which it interacts with other communities. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02016v1 |
http://arxiv.org/pdf/1904.02016v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-blockmodels-with-edge-information |
Repo | |
Framework | |
Joint Learning of Generative Translator and Classifier for Visually Similar Classes
Title | Joint Learning of Generative Translator and Classifier for Visually Similar Classes |
Authors | ByungIn Yoo, Tristan Sylvain, Yoshua Bengio, Junmo Kim |
Abstract | In this paper, we propose a Generative Translation Classification Network (GTCN) for improving visual classification accuracy in settings where classes are visually similar and data is scarce. For this purpose, we propose joint learning to train a classifier and a generative stochastic translation network end-to-end. The translation network is used to perform on-line data augmentation across classes, whereas previous works have mostly involved domain adaptation. To help the model further benefit from this data augmentation, we introduce an adaptive fade-in loss and a quadruplet loss. We perform experiments on multiple datasets to demonstrate the proposed method’s performance in varied settings. Of particular interest, training on 40% of the dataset is enough for our model to surpass the performance of baselines trained on the full dataset. When our architecture is trained on the full dataset, we achieve comparable performance with state-of-the-art methods despite using a light-weight architecture. |
Tasks | Data Augmentation, Domain Adaptation |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.06994v1 |
https://arxiv.org/pdf/1912.06994v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-of-generative-translator-and |
Repo | |
Framework | |