January 28, 2020

Paper Group ANR 890

Joint Information Preservation for Heterogeneous Domain Adaptation

Title Joint Information Preservation for Heterogeneous Domain Adaptation
Authors Peng Xu, Zhaohong Deng, Kup-Sze Choi, Jun Wang, Shitong Wang
Abstract Domain adaptation aims to assist the modeling tasks of the target domain with knowledge of the source domain. The two domains often lie in different feature spaces due to diverse data collection methods, which leads to the more challenging task of heterogeneous domain adaptation (HDA). A core issue of HDA is how to preserve the information of the original data during adaptation. In this paper, we propose a joint information preservation method to deal with the problem. The method preserves the information of the original data from two aspects. On the one hand, although paired samples often exist between the two domains of the HDA, current algorithms do not utilize such information sufficiently. The proposed method preserves the paired information by maximizing the correlation of the paired samples in the shared subspace. On the other hand, the proposed method improves the strategy of preserving the structural information of the original data, where the local and global structural information are preserved simultaneously. Finally, the joint information preservation is integrated by distribution matching. Experimental results show the superiority of the proposed method over the state-of-the-art HDA algorithms.
Tasks Domain Adaptation
Published 2019-05-22
URL https://arxiv.org/abs/1905.08924v1
PDF https://arxiv.org/pdf/1905.08924v1.pdf
PWC https://paperswithcode.com/paper/joint-information-preservation-for
Repo
Framework

Context-endcoding for neural network based skull stripping in magnetic resonance imaging

Title Context-endcoding for neural network based skull stripping in magnetic resonance imaging
Authors Zhen Liu, Borui Xiao, Yuemeng Li, Yong Fan
Abstract Skull stripping is usually the first step in most brain analysis processes for magnetic resonance images. Many deep learning neural network based methods have been developed to achieve higher accuracy. Since 3D deep learning models suffer from high computational cost and are subject to GPU memory limits, a variety of 2D deep learning methods have been developed. However, existing 2D deep learning methods are not equipped to effectively capture the 3D semantic information that is needed to achieve higher accuracy. In this paper, we propose a context-encoding method to empower the 2D network to capture the 3D context information. For the context-encoding method, firstly we encode the 2D features of the original 2D network, secondly we encode the sub-volume of 3D MRI images, and finally we fuse the encoded 2D features and 3D features with a semantic encoding classification loss. For computational efficiency, we encode the sub-volume of 3D MRI images instead of building a 3D neural network; even so, extensive experiments on three benchmark datasets demonstrate that our method achieves accuracy superior to state-of-the-art alternative methods, with Dice scores of 99.6% on NFBS, 99.09% on LPBA40, and 99.17% on OASIS.
Tasks Skull Stripping
Published 2019-10-23
URL https://arxiv.org/abs/1910.10798v1
PDF https://arxiv.org/pdf/1910.10798v1.pdf
PWC https://paperswithcode.com/paper/context-endcoding-for-neural-network-based
Repo
Framework
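
The Dice score quoted in the abstract above measures the overlap between a predicted brain mask and a reference mask. A minimal sketch, assuming both masks are flattened lists of 0/1 voxel labels (the function name and the toy masks are illustrative, not taken from the paper):

```python
def dice_score(pred, truth):
    """Dice coefficient between two equal-length binary masks (flattened)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as brain in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]   # hypothetical predicted mask
truth = [1, 0, 0, 0, 1, 1]  # hypothetical reference mask
print(dice_score(pred, truth))  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly; the reported values above 99% therefore correspond to near-complete voxel agreement.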

FRNET: Flattened Residual Network for Infant MRI Skull Stripping

Title FRNET: Flattened Residual Network for Infant MRI Skull Stripping
Authors Qian Zhang, Li Wang, Xiaopeng Zong, Weili Lin, Gang Li, Dinggang Shen
Abstract Skull stripping for brain MR images is a basic segmentation task. Although many methods have been proposed, most of them focused mainly on adult MR images. Skull stripping for infant MR images is more challenging due to the small size and dynamic intensity changes of brain tissues during the early ages. In this paper, we propose a novel CNN based framework to robustly extract the brain region from infant MR images without any human assistance. Specifically, we propose a simplified but more robust flattened residual network architecture (FRnet). We also introduce a new boundary loss function to highlight ambiguous and low contrast regions between brain and non-brain regions. To make the whole framework more robust to MR images with different imaging quality, we further introduce an artifact simulator for data augmentation. We have trained and tested our proposed framework on a large dataset (N=343), covering newborns to 48-month-olds, and obtained performance better than the state-of-the-art methods in all age groups.
Tasks Data Augmentation, Skull Stripping
Published 2019-04-11
URL http://arxiv.org/abs/1904.05578v1
PDF http://arxiv.org/pdf/1904.05578v1.pdf
PWC https://paperswithcode.com/paper/frnet-flattened-residual-network-for-infant
Repo
Framework

Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Title Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR
Authors Felix Weninger, Jesús Andrés-Ferrer, Xinwei Li, Puming Zhan
Abstract Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performances while having clear advantages in terms of simplicity. However, comparisons are mostly done on speaker independent (SI) ASR systems, though speaker adapted conventional systems are commonly used in practice for improving robustness to speaker and environment variations. In this paper, we apply speaker adaptation to seq2seq models with the goal of matching the performance of conventional ASR adaptation. Specifically, we investigate Kullback-Leibler divergence (KLD) as well as Linear Hidden Network (LHN) based adaptation for seq2seq ASR, using different amounts (up to 20 hours) of adaptation data per speaker. Our SI models are trained on large amounts of dictation data and achieve state-of-the-art results. We obtained 25% relative word error rate (WER) improvement with KLD adaptation of the seq2seq model vs. 18.7% gain from acoustic model adaptation in the conventional system. We also show that the WER of the seq2seq model decreases log-linearly with the amount of adaptation data. Finally, we analyze adaptation based on the minimum WER criterion and adapting the language model (LM) for score fusion with the speaker adapted seq2seq model, which result in further improvements of the seq2seq system performance.
Tasks Language Modelling
Published 2019-07-08
URL https://arxiv.org/abs/1907.04916v1
PDF https://arxiv.org/pdf/1907.04916v1.pdf
PWC https://paperswithcode.com/paper/listen-attend-spell-and-adapt-speaker-adapted
Repo
Framework
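
KLD-based adaptation, as mentioned in the abstract above, is commonly formulated as cross-entropy against a target that interpolates between the true label and the speaker-independent (SI) model's output distribution. The sketch below assumes that common formulation; the paper's exact loss, the function name, and the weight `rho` are assumptions made here for illustration.

```python
import math


def kld_adaptation_loss(one_hot, p_si, p_adapted, rho):
    """Cross-entropy of the adapted model against a KLD-regularized target.

    one_hot   -- true label as a one-hot list
    p_si      -- output distribution of the frozen speaker-independent model
    p_adapted -- output distribution of the model being adapted (all entries > 0)
    rho       -- regularization weight in [0, 1]; 0 means plain fine-tuning
    """
    # Interpolated target: (1 - rho) * label + rho * SI posterior.
    target = [(1.0 - rho) * y + rho * q for y, q in zip(one_hot, p_si)]
    return -sum(t * math.log(p) for t, p in zip(target, p_adapted) if t > 0.0)


label = [0.0, 1.0, 0.0]
si_posterior = [0.1, 0.8, 0.1]       # hypothetical SI model output
adapted_posterior = [0.2, 0.7, 0.1]  # hypothetical adapted model output
print(kld_adaptation_loss(label, si_posterior, adapted_posterior, rho=0.5))
```

With `rho = 0` the loss reduces to ordinary cross-entropy on the adaptation data; larger values keep the adapted model closer to the SI model, which matters when little adaptation data is available per speaker.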

Vadere: An open-source simulation framework to promote interdisciplinary understanding

Title Vadere: An open-source simulation framework to promote interdisciplinary understanding
Authors Benedikt Kleinmeier, Benedikt Zönnchen, Marion Gödel, Gerta Köster
Abstract Pedestrian dynamics is an interdisciplinary field of research. Psychologists, sociologists, traffic engineers, physicists, mathematicians and computer scientists all strive to understand the dynamics of a moving crowd. In principle, computer simulations offer means to further this understanding. Yet, unlike for many classic dynamical systems in physics, there is no universally accepted locomotion model for crowd dynamics. On the contrary, a multitude of approaches, with very different characteristics, compete. Often only the experts in one special model type are able to assess the consequences these characteristics have on a simulation study. Therefore, scientists from all disciplines who wish to use simulations to analyze pedestrian dynamics need a tool to compare competing approaches. Developers, too, would profit from an easy way to get insight into an alternative modeling ansatz. Vadere meets this interdisciplinary demand by offering an open-source simulation framework that is lightweight in its approach and in its user interface while offering pre-implemented versions of the most widely spread models.
Tasks
Published 2019-07-16
URL https://arxiv.org/abs/1907.09520v1
PDF https://arxiv.org/pdf/1907.09520v1.pdf
PWC https://paperswithcode.com/paper/vadere-an-open-source-simulation-framework-to
Repo
Framework

Differentiable Deep Clustering with Cluster Size Constraints

Title Differentiable Deep Clustering with Cluster Size Constraints
Authors Aude Genevay, Gabriel Dulac-Arnold, Jean-Philippe Vert
Abstract Clustering is a fundamental unsupervised learning approach. Many clustering algorithms, such as $k$-means, rely on the Euclidean distance as a similarity measure, which is often not the most relevant metric for high dimensional data such as images. Learning a lower-dimensional embedding that can better reflect the geometry of the dataset is therefore instrumental for performance. We propose a new approach for this task where the embedding is performed by a differentiable model such as a deep neural network. By rewriting the $k$-means clustering algorithm as an optimal transport task, and adding an entropic regularization, we derive a fully differentiable loss function that can be minimized with respect to both the embedding parameters and the cluster parameters via stochastic gradient descent. We show that this new formulation generalizes a recently proposed state-of-the-art method based on soft-$k$-means by adding constraints on the cluster sizes. Empirical evaluations on image classification benchmarks suggest that compared to state-of-the-art methods, our optimal transport-based approach provides better unsupervised accuracy and does not require a pre-training phase.
Tasks Image Classification
Published 2019-10-20
URL https://arxiv.org/abs/1910.09036v1
PDF https://arxiv.org/pdf/1910.09036v1.pdf
PWC https://paperswithcode.com/paper/differentiable-deep-clustering-with-cluster
Repo
Framework
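
The idea in the abstract above of treating cluster assignment as entropy-regularized optimal transport with fixed cluster sizes can be illustrated with plain Sinkhorn iterations. This is a sketch under stated assumptions, not the authors' implementation: the function name, the value of `eps`, and the toy 1-D data are all hypothetical, and the embedding network and gradient step are left out.

```python
import math


def sinkhorn_assign(points, centroids, cluster_weights, eps=0.5, n_iter=200):
    """Soft point-to-cluster assignment whose column sums match cluster_weights."""
    n, k = len(points), len(centroids)
    # Squared Euclidean cost between every point and every centroid.
    cost = [[sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
            for x in points]
    # Gibbs kernel of the entropic regularization.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(k)] for i in range(n)]
    row_marginal = [1.0 / n] * n  # every point carries the same mass
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match both marginals.
        u = [row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [cluster_weights[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


points = [[0.0], [0.1], [0.9], [1.0]]
centroids = [[0.0], [1.0]]
plan = sinkhorn_assign(points, centroids, cluster_weights=[0.75, 0.25])
for row in plan:
    print([round(mass, 4) for mass in row])
```

Each row of the returned plan is a soft assignment of one point, and the column sums equal the requested cluster sizes. Because every step is a smooth function of the costs, the same computation can in principle be differentiated with respect to the embedding and the centroids.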

DELP-DAR System for License Plate Detection and Recognition

Title DELP-DAR System for License Plate Detection and Recognition
Authors Zied Selmi, Mohamed Ben Halima, Umapada Pal, M. Adel Alimi
Abstract Automatic License Plate detection and Recognition (ALPR) is a quite popular and active research topic in the field of computer vision, image processing and intelligent transport systems. ALPR is used to make detection and recognition processes more robust and efficient in highly complicated environments and backgrounds. Several research investigations are still necessary due to some constraints such as: completeness of numbering systems of countries, different colors, various languages, multiple sizes and varied fonts. For this, we present in this paper an automatic framework for License Plate (LP) detection and recognition from complex scenes. Our framework is based on mask region convolutional neural networks used for LP detection, segmentation and recognition. Although some studies have focused on LP detection, LP recognition, LP segmentation or just two of them, our study uses Mask R-CNN in all three stages. The evaluation of our framework is enhanced by four datasets for different countries and consequently with various languages. In fact, it was tested on four datasets including images captured from multiple scenes under numerous conditions such as varied orientation, poor quality images, blurred images and complex environmental backgrounds. Extensive experiments show the robustness and efficiency of our suggested framework in all datasets.
Tasks
Published 2019-10-04
URL https://arxiv.org/abs/1910.01853v1
PDF https://arxiv.org/pdf/1910.01853v1.pdf
PWC https://paperswithcode.com/paper/delp-dar-system-for-license-plate-detection
Repo
Framework

Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets

Title Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Authors Ola Spjuth, Robin Carrión Brännström, Lars Carlsson, Niharika Gauraha
Abstract Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying number of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.
Tasks
Published 2019-08-15
URL https://arxiv.org/abs/1908.05571v1
PDF https://arxiv.org/pdf/1908.05571v1.pdf
PWC https://paperswithcode.com/paper/combining-prediction-intervals-on-multi
Repo
Framework
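
The workflow in the abstract above, one conformal predictor per data source followed by a merge of the resulting intervals, can be sketched as follows. The split-conformal construction is standard, but the aggregation rule used here (median of the lower and upper bounds) is only an assumption for illustration and is not necessarily the rule used in NDCP. All names and numbers are hypothetical.

```python
import math
from statistics import median


def conformal_interval(prediction, calibration_residuals, significance):
    """Split-conformal interval from absolute residuals on a calibration set."""
    scores = sorted(calibration_residuals)
    n = len(scores)
    # Rank of the conformal quantile. Strictly, a rank above n calls for an
    # unbounded interval; this sketch clips to the largest residual instead.
    rank = min(n, math.ceil((n + 1) * (1.0 - significance)))
    half_width = scores[rank - 1]
    return prediction - half_width, prediction + half_width


def combine_intervals(intervals):
    """Assumed aggregation rule: median of the lower and of the upper bounds."""
    lowers, uppers = zip(*intervals)
    return median(lowers), median(uppers)


# Three sources, each with its own model prediction and calibration residuals.
sources = [
    (10.0, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]),
    (10.4, [0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1]),
    (9.8, [0.1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.6]),
]
per_source = [conformal_interval(pred, res, significance=0.25)
              for pred, res in sources]
print(combine_intervals(per_source))
```

Only the interval bounds leave each source, so no training or calibration data has to be pooled.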

Image-Question-Answer Synergistic Network for Visual Dialog

Title Image-Question-Answer Synergistic Network for Visual Dialog
Authors Dalu Guo, Chang Xu, Dacheng Tao
Abstract The image, question (combined with the history for de-referencing), and the corresponding answer are three vital components of visual dialog. Classical visual dialog systems integrate the image, question, and history to search for or generate the best matched answer, and so, this approach significantly ignores the role of the answer. In this paper, we devise a novel image-question-answer synergistic network to value the role of the answer for precise visual dialog. We extend the traditional one-stage solution to a two-stage solution. In the first stage, candidate answers are coarsely scored according to their relevance to the image and question pair. Afterward, in the second stage, answers with high probability of being correct are re-ranked by synergizing with image and question. On the Visual Dialog v1.0 dataset, the proposed synergistic network boosts the discriminative visual dialog model to achieve a new state-of-the-art of 57.88% normalized discounted cumulative gain. A generative visual dialog model equipped with the proposed technique also shows promising improvements.
Tasks Visual Dialog
Published 2019-02-26
URL http://arxiv.org/abs/1902.09774v1
PDF http://arxiv.org/pdf/1902.09774v1.pdf
PWC https://paperswithcode.com/paper/image-question-answer-synergistic-network-for
Repo
Framework
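
The normalized discounted cumulative gain (NDCG) reported in the abstract above rewards rankings that place relevant answers near the top. A minimal sketch of the generic metric, assuming graded relevance scores listed in the order the model ranked the candidates (the benchmark's own evaluation may use a variant, for example a cutoff on the number of ranked answers):

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return 0.0 if ideal == 0 else dcg(ranked_relevances) / ideal


# Hypothetical relevance of four candidate answers, in the model's ranked order.
print(ndcg([0.5, 1.0, 0.0, 0.5]))
```

A perfect ranking yields 1.0, and moving a relevant answer further down the list lowers the score logarithmically in its rank.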

Quaternion Equivariant Capsule Networks for 3D Point Clouds

Title Quaternion Equivariant Capsule Networks for 3D Point Clouds
Authors Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari
Abstract We present a 3D capsule architecture for processing of point clouds that is equivariant with respect to the $SO(3)$ rotation group, translation and permutation of the unordered input sets. The network operates on a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end equivariance through a novel 3D quaternion group capsule layer, including an equivariant dynamic routing procedure. The capsule layer enables us to disentangle geometry from pose, paving the way for more informative descriptions and a structured latent space. In the process, we theoretically connect the process of dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iterative re-weighted least squares (IRLS) problems with provable convergence properties, enabling robust pose estimation between capsule layers. Due to the sparse equivariant quaternion capsules, our architecture allows joint object classification and orientation estimation, which we validate empirically on common benchmark datasets.
Tasks Object Classification, Pose Estimation
Published 2019-12-27
URL https://arxiv.org/abs/1912.12098v1
PDF https://arxiv.org/pdf/1912.12098v1.pdf
PWC https://paperswithcode.com/paper/quaternion-equivariant-capsule-networks-for-1
Repo
Framework
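
The Weiszfeld algorithm named in the abstract above is easiest to see in its classical Euclidean setting, where it computes a geometric median by iteratively re-weighted averaging. The sketch below covers only that textbook case, not the quaternion averaging or capsule routing of the paper; the function name, the iteration limits, and the toy points are assumptions.

```python
import math


def geometric_median(points, n_iter=1000, tol=1e-9):
    """Weiszfeld iteration: re-weighted mean with weights 1 / distance."""
    dim = len(points[0])
    # Start from the ordinary mean.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(n_iter):
        # Points far from the current estimate get small weights.
        weights = [1.0 / max(math.dist(p, estimate), 1e-12) for p in points]
        total = sum(weights)
        new_estimate = [sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)]
        if math.dist(new_estimate, estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Four corners of a unit square plus one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
print(geometric_median(points))
```

The plain mean of these points is (20.4, 20.4), whereas the re-weighted estimate stays inside the unit square. This robustness to outliers is the property that makes such a scheme attractive for aggregating noisy pose votes.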

Efficient Real-Time Camera Based Estimation of Heart Rate and Its Variability

Title Efficient Real-Time Camera Based Estimation of Heart Rate and Its Variability
Authors Amogh Gudi, Marian Bittner, Roelof Lochmans, Jan van Gemert
Abstract Remote photo-plethysmography (rPPG) uses a remotely placed camera to estimate a person’s heart rate (HR). Similar to how heart rate can provide useful information about a person’s vital signs, insights about the underlying physio/psychological conditions can be obtained from heart rate variability (HRV). HRV is a measure of the fine fluctuations in the intervals between heart beats. However, this measure requires temporally locating heart beats with a high degree of precision. We introduce a refined and efficient real-time rPPG pipeline with novel filtering and motion suppression that not only estimates heart rate more accurately, but also extracts the pulse waveform to time heart beats and measure heart rate variability. This method requires no rPPG specific training and is able to operate in real-time. We validate our method on a self-recorded dataset under an idealized lab setting, and show state-of-the-art results on two public datasets with realistic conditions (VicarPPG and PURE).
Tasks Heart Rate Variability, Photoplethysmography (PPG)
Published 2019-09-03
URL https://arxiv.org/abs/1909.01206v1
PDF https://arxiv.org/pdf/1909.01206v1.pdf
PWC https://paperswithcode.com/paper/efficient-real-time-camera-based-estimation
Repo
Framework
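
Once heart beats have been located in time, as the abstract above describes, heart rate and a variability statistic follow from the inter-beat intervals. The sketch below uses RMSSD, one common HRV statistic; which HRV measures the paper reports is not stated in the abstract, so that choice, the function names, and the beat times are assumptions.

```python
import math


def inter_beat_intervals_ms(beat_times_s):
    """Intervals between consecutive beats, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(beat_times_s, beat_times_s[1:])]


def mean_heart_rate_bpm(beat_times_s):
    """Average heart rate in beats per minute over the recording."""
    intervals = inter_beat_intervals_ms(beat_times_s)
    return 60000.0 / (sum(intervals) / len(intervals))


def rmssd_ms(beat_times_s):
    """Root mean square of successive differences of inter-beat intervals."""
    intervals = inter_beat_intervals_ms(beat_times_s)
    diffs = [y - x for x, y in zip(intervals, intervals[1:])]
    if not diffs:
        raise ValueError("need at least three beats")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


beats = [0.0, 0.8, 1.7, 2.5]  # hypothetical beat timestamps in seconds
print(mean_heart_rate_bpm(beats), rmssd_ms(beats))
```

Because RMSSD depends on differences between neighbouring intervals, timing errors of a few tens of milliseconds per beat can dominate the result, which is why HRV needs more precise beat localization than heart rate alone.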

Towards More Usable Dataset Search: From Query Characterization to Snippet Generation

Title Towards More Usable Dataset Search: From Query Characterization to Snippet Generation
Authors Jinchi Chen, Xiaxia Wang, Gong Cheng, Evgeny Kharlamov, Yuzhong Qu
Abstract Reusing published datasets on the Web is of great interest to researchers and developers. Their data needs may be met by submitting queries to a dataset search engine to retrieve relevant datasets. In this ongoing work towards developing a more usable dataset search engine, we characterize real data needs by annotating the semantics of 1,947 queries using a novel fine-grained scheme, to provide implications for enhancing dataset search. Based on the findings, we present a query-centered framework for dataset search, and explore the implementation of snippet generation and evaluate it with a preliminary user study.
Tasks
Published 2019-08-29
URL https://arxiv.org/abs/1908.11146v1
PDF https://arxiv.org/pdf/1908.11146v1.pdf
PWC https://paperswithcode.com/paper/towards-more-usable-dataset-search-from-query
Repo
Framework

Recognizing Topic Change in Search Sessions of Digital Libraries based on Thesaurus and Classification System

Title Recognizing Topic Change in Search Sessions of Digital Libraries based on Thesaurus and Classification System
Authors Daniel Hienert, Dagmar Kern
Abstract Log analysis in Web search showed that user sessions often contain several different topics. This means sessions need to be segmented into parts which handle the same topic in order to give appropriate user support based on the topic, and not on a mixture of topics. Different methods have been proposed to segment a user session to different topics based on timeouts, lexical analysis, query similarity or external knowledge sources. In this paper, we study the problem in a digital library for the social sciences. We present a method based on a thesaurus and a classification system which are typical knowledge organization systems in digital libraries. Five experts evaluated our approach and rated it as good for the segmentation of search sessions into parts that treat the same topic.
Tasks Lexical Analysis
Published 2019-09-24
URL https://arxiv.org/abs/1909.10736v1
PDF https://arxiv.org/pdf/1909.10736v1.pdf
PWC https://paperswithcode.com/paper/recognizing-topic-change-in-search-sessions
Repo
Framework

Polylingual Wordnet

Title Polylingual Wordnet
Authors Mihael Arcan, John McCrae, Paul Buitelaar
Abstract Princeton WordNet is one of the most important resources for natural language processing, but is only available for English. While it has been translated using the expand approach to many other languages, this is an expensive manual process. Therefore it would be beneficial to have a high-quality automatic translation approach that would support NLP techniques, which rely on WordNet in new languages. The translation of wordnets is fundamentally complex because of the need to translate all senses of a word including low frequency senses, which is very challenging for current machine translation approaches. For this reason we leverage existing translations of WordNet in other languages to identify contextual information for wordnet senses from a large set of generic parallel corpora. We evaluate our approach using 10 translated wordnets for European languages. Our experiment shows a significant improvement over translation without any contextual information. Furthermore, we evaluate how the choice of pivot languages affects performance of multilingual word sense disambiguation.
Tasks Machine Translation, Word Sense Disambiguation
Published 2019-03-04
URL http://arxiv.org/abs/1903.01411v1
PDF http://arxiv.org/pdf/1903.01411v1.pdf
PWC https://paperswithcode.com/paper/polylingual-wordnet
Repo
Framework

Testing Deep Learning Models for Image Analysis Using Object-Relevant Metamorphic Relations

Title Testing Deep Learning Models for Image Analysis Using Object-Relevant Metamorphic Relations
Authors Yongqiang Tian, Shiqing Ma, Ming Wen, Yepang Liu, Shing-Chi Cheung, Xiangyu Zhang
Abstract Deep learning models are widely used for image analysis. While they offer high performance in terms of accuracy, people are concerned about whether these models inappropriately make inferences using irrelevant features that are not encoded from the target object in a given image. To address the concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two novel metamorphic relations to detect such inappropriate inferences. We applied our approach to 10 image classification models and 10 object detection models, with three large datasets, i.e., ImageNet, COCO, and Pascal VOC. Over 5.3% of the top-5 correct predictions made by the image classification models are subject to inappropriate inferences using irrelevant features. The corresponding rate for the object detection models is over 8.5%. Based on the findings, we further designed a new image generation strategy that can effectively attack existing models. Compared with a baseline approach, our strategy can double the success rate of attacks.
Tasks Image Classification, Image Generation, Object Detection
Published 2019-09-06
URL https://arxiv.org/abs/1909.03824v1
PDF https://arxiv.org/pdf/1909.03824v1.pdf
PWC https://paperswithcode.com/paper/testing-deep-learning-models-for-image
Repo
Framework