January 26, 2020

3451 words 17 mins read

Paper Group ANR 1474

Precise Estimation of Renal Vascular Dominant Regions Using Spatially Aware Fully Convolutional Networks, Tensor-Cut and Voronoi Diagrams

Title Precise Estimation of Renal Vascular Dominant Regions Using Spatially Aware Fully Convolutional Networks, Tensor-Cut and Voronoi Diagrams
Authors Chenglong Wang, Holger R. Roth, Takayuki Kitasaka, Masahiro Oda, Yuichiro Hayashi, Yasushi Yoshino, Tokunori Yamamoto, Naoto Sassa, Momokazu Goto, Kensaku Mori
Abstract This paper presents a new approach for precisely estimating the renal vascular dominant region using a Voronoi diagram. To provide computer-assisted diagnostics for the pre-surgical simulation of partial nephrectomy surgery, we must obtain information on the renal arteries and the renal vascular dominant regions. We propose a fully automatic segmentation method that combines a neural network and tensor-based graph-cut methods to precisely extract the kidney and renal arteries. First, we use a convolutional neural network to localize the kidney regions and extract tiny renal arteries with a tensor-based graph-cut method. Then we generate a Voronoi diagram to estimate the renal vascular dominant regions based on the segmented kidney and renal arteries. Kidney segmentation in 27 cases with 8-fold cross-validation reached a Dice score of 95%. Renal artery segmentation in 8 cases achieved a centerline overlap ratio of 80%. Each partition region corresponds to a renal vascular dominant region. The final dominant-region estimation achieved a Dice coefficient of 80%. A clinical application showed the potential of our proposed estimation approach in a real clinical surgical environment. Further validation using a large-scale database is left as future work.
Tasks
Published 2019-08-05
URL https://arxiv.org/abs/1908.01543v1
PDF https://arxiv.org/pdf/1908.01543v1.pdf
PWC https://paperswithcode.com/paper/precise-estimation-of-renal-vascular-dominant
Repo
Framework
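The Voronoi step described above — assigning each kidney voxel to its nearest artery branch — can be pictured in a few lines. This is a toy 2D illustration with made-up seed coordinates, not the authors' pipeline:

```python
import numpy as np

# Hypothetical artery-branch seed points (centerline voxels), one label per branch.
seeds = np.array([[2.0, 2.0], [8.0, 3.0], [5.0, 8.0]])   # seed coordinates
seed_labels = np.array([0, 1, 2])                         # branch IDs

# Kidney voxels to partition (here: a small 2D grid for illustration).
xs, ys = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
voxels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

# Voronoi assignment: every voxel takes the label of its nearest seed,
# so each partition cell is one (hypothetical) vascular dominant region.
dists = np.linalg.norm(voxels[:, None, :] - seeds[None, :, :], axis=2)
regions = seed_labels[np.argmin(dists, axis=1)]

print(regions.reshape(10, 10))
```

In the real method the seeds come from the segmented artery centerlines and the distance would be computed in 3D over the segmented kidney mask.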

Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

Title Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder
Authors Minkyu Lim, Ji-Hwan Kim
Abstract While the Kaldi framework provides state-of-the-art components for speech recognition like feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite state transducer (WFST)-based decoder, it is difficult to implement a new flexible DNN model within it. By contrast, a general-purpose deep learning framework, such as TensorFlow, can easily build various types of neural network architectures using a tensor-based computation method, but it is difficult to apply them to WFST-based speech recognition. In this study, a TensorFlow-based acoustic model is integrated with a WFST-based Kaldi decoder to combine the two frameworks. The features and alignments used in Kaldi are converted so they can be trained by the TensorFlow model, and the DNN-based acoustic model is then trained. In the integrated Kaldi decoder, the posterior probabilities are calculated by querying the trained TensorFlow model, and a beam search is performed to generate the lattice. The advantages of the proposed one-pass decoder include the application of various types of neural networks to WFST-based speech recognition and WFST-based online decoding using a TensorFlow-based acoustic model. The TensorFlow-based acoustic models trained on the RM, WSJ, and LibriSpeech datasets show the same level of performance as models trained using the Kaldi framework.
Tasks Speech Recognition
Published 2019-06-21
URL https://arxiv.org/abs/1906.11018v1
PDF https://arxiv.org/pdf/1906.11018v1.pdf
PWC https://paperswithcode.com/paper/integration-of-tensorflow-based-acoustic
Repo
Framework
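The key numerical glue in such an integration is converting the neural network's posteriors into the scaled pseudo log-likelihoods that a hybrid WFST decoder consumes, by subtracting log state priors (Bayes' rule, dropping the constant p(x)). A minimal sketch with toy numbers, not the paper's actual interface:

```python
import numpy as np

def posteriors_to_loglikes(log_posteriors, state_priors):
    """Convert DNN log-posteriors log p(s|x) into the scaled pseudo
    log-likelihoods log p(x|s) used by a WFST decoder, by subtracting
    the log state priors (estimated from training alignments)."""
    return log_posteriors - np.log(state_priors)

# Toy example: 3 frames, 4 senone states (made-up numbers).
logits = np.random.randn(3, 4)
log_post = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # log-softmax
priors = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical state priors

loglikes = posteriors_to_loglikes(log_post, priors)
print(loglikes.shape)
```

In the integrated decoder these pseudo log-likelihoods would replace the Kaldi acoustic model's output during beam search.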

A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database

Title A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database
Authors Hossein Zeinali, Lukáš Burget, Jan "Honza" Černocký
Abstract DeepMine is a speech database in Persian and English designed to build and evaluate text-dependent, text-prompted, and text-independent speaker verification, as well as Persian speech recognition systems. It contains more than 1850 speakers and 540 thousand recordings overall; more than 480 hours of speech are transcribed. It is the first public large-scale speaker verification database in Persian, the largest public text-dependent and text-prompted speaker verification database in English, and the largest public evaluation dataset for text-independent speaker verification. It has a good coverage of age, gender, and accents. We provide several evaluation protocols for each part of the database to allow for research on different aspects of speaker verification. We also provide the results of several experiments that can be considered as baselines: HMM-based i-vectors for text-dependent speaker verification, and HMM-based as well as state-of-the-art deep neural network based ASR. We demonstrate that the database can serve for training robust ASR models.
Tasks Speaker Verification, Speech Recognition, Text-Dependent Speaker Verification, Text-Independent Speaker Verification
Published 2019-12-08
URL https://arxiv.org/abs/1912.03627v1
PDF https://arxiv.org/pdf/1912.03627v1.pdf
PWC https://paperswithcode.com/paper/a-multi-purpose-and-large-scale-speech-corpus
Repo
Framework

Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification

Title Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification
Authors Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen
Abstract Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but they are difficult to implement. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2019-11-19
URL https://arxiv.org/abs/1911.08077v1
PDF https://arxiv.org/pdf/1911.08077v1.pdf
PWC https://paperswithcode.com/paper/partial-auc-optimization-based-deep-speaker
Repo
Framework
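A hinge-style surrogate for a partial-AUC objective like the one described above might look as follows. This is a simplified sketch with illustrative thresholds, not the authors' exact formulation (which also involves class-center based trial construction):

```python
import numpy as np

def pauc_loss(target_scores, nontarget_scores, fpr_max=0.1, delta=0.1):
    """Hinge surrogate for partial AUC: only the hardest non-target
    trials (top fpr_max fraction of impostor scores) enter the loss,
    focusing optimization on the low-false-positive-rate region."""
    k = max(1, int(len(nontarget_scores) * fpr_max))
    hard_neg = np.sort(nontarget_scores)[-k:]           # highest impostor scores
    diffs = target_scores[:, None] - hard_neg[None, :]  # pairwise margins
    return np.maximum(0.0, delta - diffs).mean()

tgt = np.array([0.9, 0.8, 0.7])          # made-up target-trial scores
non = np.array([0.2, 0.3, 0.95, 0.1, 0.4])  # made-up impostor scores
print(pauc_loss(tgt, non))
```

When every target score exceeds every selected impostor score by the margin, the loss is zero; violating pairs contribute linearly.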

Inference with Hybrid Bio-hardware Neural Networks

Title Inference with Hybrid Bio-hardware Neural Networks
Authors Yuan Zeng, Zubayer Ibne Ferdous, Weixiang Zhang, Mufan Xu, Anlan Yu, Drew Patel, Xiaochen Guo, Yevgeny Berdichevsky, Zhiyuan Yan
Abstract To understand the learning process in brains, biologically plausible algorithms have been explored by modeling the detailed neuron properties and dynamics. On the other hand, simplified multi-layer models of neural networks have shown great success on computational tasks such as image classification and speech recognition. However, the computational models that can achieve good accuracy for these learning applications are very different from the bio-plausible models. This paper studies whether a bio-plausible model of an in vitro living neural network can be used to perform machine learning tasks and achieve good inference accuracy. A novel two-layer bio-hardware hybrid neural network is proposed. The biological layer faithfully models variations of synapses, neurons, and network sparsity in in vitro living neural networks. The hardware layer is a computational fully-connected layer that tunes parameters to optimize for accuracy. Several techniques are proposed to improve the inference accuracy of the proposed hybrid neural network. For instance, an adaptive pre-processing technique helps the proposed neural network to achieve good learning accuracy for different living-neural-network sparsity levels. The proposed hybrid neural network with realistic neuron parameters and variations achieves a 98.3% testing accuracy for the handwritten digit recognition task on the full MNIST dataset.
Tasks Handwritten Digit Recognition, Image Classification, Speech Recognition
Published 2019-05-28
URL https://arxiv.org/abs/1905.11594v2
PDF https://arxiv.org/pdf/1905.11594v2.pdf
PWC https://paperswithcode.com/paper/inference-with-hybrid-bio-hardware-neural
Repo
Framework
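The two-layer split — a fixed "biological" layer followed by a trainable fully-connected readout — can be pictured with a toy forward pass. Random sparse weights stand in for the in vitro network here; this is purely illustrative and omits the paper's detailed neuron modeling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, sparse 'biological' layer: random weights with ~20% connectivity
# standing in for synapse variation and network sparsity (illustrative only).
W_bio = rng.normal(size=(64, 16)) * (rng.random((64, 16)) < 0.2)

def hybrid_forward(x, W_out):
    """Two-layer hybrid: a frozen sparse layer, then a trainable
    fully-connected readout (only the forward pass is shown)."""
    h = np.maximum(0.0, x @ W_bio)   # ReLU stands in for neural response
    return h @ W_out

x = rng.normal(size=(4, 64))         # a batch of 4 made-up inputs
W_out = rng.normal(size=(16, 10))    # readout weights (untrained here)
print(hybrid_forward(x, W_out).shape)
```

In the paper's setting only the readout layer's parameters would be tuned for accuracy, since the biological layer is not trainable.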

BERT Meets Chinese Word Segmentation

Title BERT Meets Chinese Word Segmentation
Authors Haiqin Yang
Abstract Chinese word segmentation (CWS) is a fundamental task for Chinese language understanding. Recently, neural network-based models have attained superior performance on the in-domain CWS task. Last year, Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, was proposed as a backbone model for many natural language tasks and redefined the corresponding performance. The excellent performance of BERT motivates us to apply it to the CWS task. By conducting intensive experiments on the benchmark datasets from the Second International Chinese Word Segmentation Bakeoff, we make several key observations. BERT can slightly improve performance even when the datasets suffer from labeling inconsistency. With sufficiently well-learned features, softmax, a simpler classifier, can attain the same performance as a more complicated classifier, e.g., a Conditional Random Field (CRF). The performance of BERT usually increases as the model size increases. The features extracted by BERT can also serve as good candidates for other neural network models.
Tasks Chinese Word Segmentation
Published 2019-09-20
URL https://arxiv.org/abs/1909.09292v1
PDF https://arxiv.org/pdf/1909.09292v1.pdf
PWC https://paperswithcode.com/paper/bert-meets-chinese-word-segmentation
Repo
Framework
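CWS is typically cast as per-character tagging in a BMES scheme; independent of the BERT backbone, decoding such tags into words is straightforward. A minimal sketch:

```python
def bmes_decode(chars, tags):
    """Turn per-character BMES tags (the usual CWS labeling scheme)
    into segmented words: B=begin, M=middle, E=end, S=single-char word."""
    words, buf = [], ""
    for ch, t in zip(chars, tags):
        buf += ch
        if t in ("E", "S"):   # a word ends here
            words.append(buf)
            buf = ""
    if buf:                   # flush any dangling characters
        words.append(buf)
    return words

print(bmes_decode("我爱北京", ["S", "S", "B", "E"]))  # ['我', '爱', '北京']
```

In the paper's setup, the tags would come from a softmax (or CRF) head on top of BERT's per-character features.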

Systematic Analysis of Image Generation using GANs

Title Systematic Analysis of Image Generation using GANs
Authors Rohan Akut, Sumukh Marathe, Rucha Apte, Ishan Joshi, Siddhivinayak Kulkarni
Abstract Generative Adversarial Networks have been crucial in the developments made in unsupervised learning in recent times. Exemplars of image synthesis from text or other images, these networks have shown remarkable improvements over conventional methods in terms of performance. Trained on the adversarial training philosophy, these networks aim to estimate the potential distribution from the real data and then use this as input to generate the synthetic data. Based on this fundamental principle, several frameworks can be generated that are paragon implementations in several real-life applications such as art synthesis, generation of high resolution outputs and synthesis of images from human drawn sketches, to name a few. While theoretically GANs present better results and prove to be an improvement over conventional methods in many factors, the implementation of these frameworks for dedicated applications remains a challenge. This study explores and presents a taxonomy of these frameworks and their use in various image to image synthesis and text to image synthesis applications. The basic GANs, as well as a variety of different niche frameworks, are critically analyzed. The advantages of GANs for image generation over conventional methods as well their disadvantages amongst other frameworks are presented. The future applications of GANs in industries such as healthcare, art and entertainment are also discussed.
Tasks Image Generation
Published 2019-08-30
URL https://arxiv.org/abs/1908.11863v1
PDF https://arxiv.org/pdf/1908.11863v1.pdf
PWC https://paperswithcode.com/paper/systematic-analysis-of-image-generation-using
Repo
Framework

Graph Convolutional Networks Meet with High Dimensionality Reduction

Title Graph Convolutional Networks Meet with High Dimensionality Reduction
Authors Mustafa Coskun
Abstract Recently, Graph Convolutional Networks (GCNs) and their variants have received much research interest for learning graph-related tasks. While GCNs have been successfully applied to the node classification problem, some caveats inherited from classical deep learning remain open research topics in this context. One such inherited caveat is that GCNs only consider nodes that are a few propagation steps away from the labeled nodes when classifying them. However, taking only nodes a few propagation steps away into account defeats the purpose of using graph topological information in GCNs. To remedy this problem, state-of-the-art methods leverage network diffusion approaches, namely personalized PageRank and its variants, to fully account for the graph topology, {\em after} they apply the neural networks in the GCNs. However, these approaches overlook the fact that network diffusion methods favour high-degree nodes in the graph, resulting in the propagation of labels to unlabeled central (hub) nodes. To address this hub-node bias, in this paper we propose to utilize a dimensionality reduction technique in conjunction with personalized PageRank, so that we can both take advantage of the graph topology and resolve the hub-node-favouring problem for GCNs. Our approach opens a new direction for the message-passing phase of GCNs by suggesting the use of other proximity matrices instead of the well-known Laplacian. Testing on two real-world networks commonly used to benchmark GCN performance on node classification, we systematically evaluate the proposed methodology and show that our approach outperforms existing methods over wide ranges of parameter values with very few deep learning training {\em epochs}.
Tasks Dimensionality Reduction, Node Classification
Published 2019-11-07
URL https://arxiv.org/abs/1911.02928v1
PDF https://arxiv.org/pdf/1911.02928v1.pdf
PWC https://paperswithcode.com/paper/graph-convolutional-networks-meet-with-high
Repo
Framework
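The personalized-PageRank propagation that such methods apply after the neural network has a closed form, Z = α(I − (1 − α)Â)⁻¹H. A dense toy version on made-up logits (the paper's proposed proximity-matrix substitution and dimensionality reduction are not shown):

```python
import numpy as np

def ppr_propagate(adj, logits, alpha=0.1):
    """Closed-form personalized-PageRank smoothing of node logits:
    Z = alpha * (I - (1 - alpha) * A_hat)^{-1} @ H, where A_hat is the
    symmetrically normalized adjacency with self-loops."""
    n = adj.shape[0]
    a = adj + np.eye(n)                            # add self-loops
    d = a.sum(axis=1)
    a_hat = a / np.sqrt(d[:, None] * d[None, :])   # D^{-1/2} A D^{-1/2}
    return alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * a_hat, logits)

# Toy 4-node path graph with 2-class logits (illustrative numbers).
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
logits = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
print(ppr_propagate(adj, logits))
```

The diffusion spreads each labeled node's evidence to its unlabeled neighbours; real implementations use sparse solvers or truncated power iteration instead of a dense inverse.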

Abductive Reasoning as Self-Supervision for Common Sense Question Answering

Title Abductive Reasoning as Self-Supervision for Common Sense Question Answering
Authors Sathyanarayanan N. Aakur, Sudeep Sarkar
Abstract Question answering has seen significant advances in recent times, especially with the introduction of increasingly large transformer-based models pre-trained on massive amounts of data. While achieving impressive results on many benchmarks, their performance appears to be proportional to the amount of training data available in the target domain. In this work, we explore the ability of current question-answering models to generalize, both to other domains and with restricted training data. We find that large amounts of training data are necessary, both for pre-training and for fine-tuning to a task, for the models to perform well on the designated task. We introduce a novel abductive reasoning approach based on Grenander’s Pattern Theory framework to provide self-supervised domain adaptation cues or “pseudo-labels,” which can be used instead of expensive human annotations. The proposed self-supervised training regimen allows for effective domain adaptation without losing performance compared to fully supervised baselines. Extensive experiments on two publicly available benchmarks show the efficacy of the proposed approach. We show that neural network models trained using self-labeled data can retain up to 75% of the performance of models trained on large amounts of human-annotated training data.
Tasks Common Sense Reasoning, Domain Adaptation, Question Answering
Published 2019-09-06
URL https://arxiv.org/abs/1909.03099v2
PDF https://arxiv.org/pdf/1909.03099v2.pdf
PWC https://paperswithcode.com/paper/abductive-reasoning-as-self-supervision-for
Repo
Framework

An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network

Title An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network
Authors Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang
Abstract This paper presents an end-to-end text-independent speaker verification framework by jointly considering the speaker embedding (SE) network and automatic speech recognition (ASR) network. The SE network learns to output an embedding vector which distinguishes the speaker characteristics of the input utterance, while the ASR network learns to recognize the phonetic context of the input. In training our speaker verification framework, we consider both the triplet loss minimization and the adversarial gradient of the ASR network to obtain more discriminative and text-independent speaker embedding vectors. With the triplet loss, the distances between the embedding vectors of the same speaker are minimized while those of different speakers are maximized. Also, with the adversarial gradient of the ASR network, the text-dependency of the speaker embedding vector can be reduced. In the experiments, we evaluated our framework using the LibriSpeech and CHiME 2013 datasets, and the results show that it achieves a lower equal error rate and better text-independence than the other approaches.
Tasks Speaker Verification, Speech Recognition, Text-Independent Speaker Verification
Published 2019-08-06
URL https://arxiv.org/abs/1908.02612v1
PDF https://arxiv.org/pdf/1908.02612v1.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-text-independent-speaker
Repo
Framework
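The triplet part of the objective is standard; a numpy sketch on made-up embeddings (the adversarial ASR gradient, the paper's other ingredient, is omitted):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on speaker embeddings: pull same-speaker pairs
    together and push different-speaker pairs apart by at least `margin`
    (in squared Euclidean distance)."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # anchor-positive
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # anchor-negative
    return np.maximum(0.0, d_ap - d_an + margin).mean()

# Toy embeddings: anchor and positive close, negative far away.
a = np.array([[1.0, 0.0]])
p = np.array([[0.9, 0.1]])
n = np.array([[-1.0, 0.0]])
print(triplet_loss(a, p, n))
```

A satisfied triplet (negative well separated) contributes zero; swapping positive and negative makes the hinge active.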

2D moment invariants from the point of view of the classical invariant theory

Title 2D moment invariants from the point of view of the classical invariant theory
Authors Leonid Bedratyuk
Abstract Invariants allow us to classify images up to the action of a group of transformations. In this paper we introduce the notions of the algebras of simultaneous polynomial and rational 2D moment invariants and prove that they are isomorphic to the algebras of joint polynomial and rational $SO(2)$-invariants of binary forms. Also, to simplify the calculation of invariants, we pass from an action of the Lie group $SO(2)$ to an action of its Lie algebra $\mathfrak{so}_2$. This allows us to reduce the problem to standard problems of classical invariant theory.
Tasks
Published 2019-08-20
URL https://arxiv.org/abs/1908.08927v2
PDF https://arxiv.org/pdf/1908.08927v2.pdf
PWC https://paperswithcode.com/paper/2d-moment-invariants-from-the-point-of-view
Repo
Framework
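For context, the geometric moments in question are the standard ones,

```latex
m_{pq} = \iint_{\mathbb{R}^2} x^p\, y^q\, f(x,y)\,\mathrm{d}x\,\mathrm{d}y,
\qquad p, q = 0, 1, 2, \ldots
```

and a rotation of the image plane by an angle $\theta$ acts on them through $SO(2)$; the simplest classical invariant of this action (for rotations about the origin) is the sum $m_{20} + m_{02}$.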

3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach

Title 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach
Authors Ngoc-Trung Tran, Fakhr-Eddine Ababsa, Maurice Charbit, Jacques Feldmar, Dijana Petrovska-Delacrétaz, Gérard Chollet
Abstract This paper presents a new method to track both the face pose and the face animation with a monocular camera. The approach is based on the 3D face model CANDIDE and on SIFT (Scale-Invariant Feature Transform) descriptors, extracted around a few given landmarks (26 selected vertices of the CANDIDE model) with a Bayesian approach. The training phase is performed on a synthetic database generated from the first video frame. At each current frame, the face pose and animation parameters are estimated via a Bayesian approach, with a Gaussian prior and a Gaussian likelihood function whose mean and covariance-matrix eigenvalues are updated from the previous frame using eigen-decomposition. Numerical results on pose estimation and landmark locations are reported using the Boston University Face Tracking (BUFT) database and the Talking Face video. They show that, compared to six other published algorithms, our approach provides a very good compromise and a promising perspective thanks to its good landmark-localization results.
Tasks Pose Estimation
Published 2019-08-29
URL https://arxiv.org/abs/1908.11039v1
PDF https://arxiv.org/pdf/1908.11039v1.pdf
PWC https://paperswithcode.com/paper/3d-face-pose-and-animation-tracking-via-eigen
Repo
Framework
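The Gaussian prior times Gaussian likelihood update at the heart of such a tracker is the standard conjugate fusion; a generic sketch (not the paper's eigen-decomposition machinery for updating the covariance):

```python
import numpy as np

def gaussian_fuse(mu_prior, cov_prior, mu_like, cov_like):
    """Fuse a Gaussian prior with a Gaussian likelihood over the
    pose/animation parameters: the posterior is Gaussian with a
    precision-weighted mean (the standard conjugate update)."""
    p1 = np.linalg.inv(cov_prior)    # prior precision
    p2 = np.linalg.inv(cov_like)     # likelihood precision
    cov_post = np.linalg.inv(p1 + p2)
    mu_post = cov_post @ (p1 @ mu_prior + p2 @ mu_like)
    return mu_post, cov_post

# Toy 2D parameter vector with equal uncertainties (made-up numbers).
mu, cov = gaussian_fuse(np.array([0.0, 0.0]), np.eye(2),
                        np.array([1.0, 2.0]), np.eye(2))
print(mu)   # posterior mean lies midway between prior and measurement
```

With equal covariances the posterior mean is the average of the two means, and the posterior covariance shrinks.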

BIAS: Transparent reporting of biomedical image analysis challenges

Title BIAS: Transparent reporting of biomedical image analysis challenges
Authors Lena Maier-Hein, Annika Reinke, Michal Kozubek, Anne L. Martel, Tal Arbel, Matthias Eisenmann, Allan Hanbury, Pierre Jannin, Henning Müller, Sinan Onogur, Julio Saez-Rodriguez, Bram van Ginneken, Annette Kopp-Schneider, Bennett Landman
Abstract The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and their quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include when submitting a challenge paper for review. The purpose of the checklist is to standardize and facilitate the review process and to raise the interpretability and reproducibility of challenge results by making relevant information explicit.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.04071v3
PDF https://arxiv.org/pdf/1910.04071v3.pdf
PWC https://paperswithcode.com/paper/bias-transparent-reporting-of-biomedical
Repo
Framework

Push for Quantization: Deep Fisher Hashing

Title Push for Quantization: Deep Fisher Hashing
Authors Yunqiang Li, Wenjie Pei, Yufei zha, Jan van Gemert
Abstract Current massive datasets demand light-weight access for analysis. Discrete hashing methods are thus beneficial because they map high-dimensional data to compact binary codes that are efficient to store and process, while preserving semantic similarity. To optimize powerful deep learning methods for image hashing, gradient-based methods are required. Binary codes, however, are discrete and thus have no continuous derivatives. Relaxing the problem by solving it in a continuous space and then quantizing the solution is not guaranteed to yield separable binary codes. The quantization needs to be included in the optimization. In this paper we push for quantization: We optimize maximum class separability in the binary space. We introduce a margin on distances between dissimilar image pairs as measured in the binary space. In addition to pair-wise distances, we draw inspiration from Fisher’s Linear Discriminant Analysis (Fisher LDA) to maximize the binary distances between classes and at the same time minimize the binary distance of images within the same class. Experiments on CIFAR-10, NUS-WIDE and ImageNet100 demonstrate compact codes comparing favorably to the current state of the art.
Tasks Quantization, Semantic Similarity, Semantic Textual Similarity
Published 2019-08-31
URL https://arxiv.org/abs/1909.00206v1
PDF https://arxiv.org/pdf/1909.00206v1.pdf
PWC https://paperswithcode.com/paper/push-for-quantization-deep-fisher-hashing
Repo
Framework
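The objective's two forces — small within-class Hamming distances, between-class distances pushed past a margin — can be sketched directly on binary codes. Toy codes and an illustrative margin, not the paper's differentiable training objective:

```python
import numpy as np

def fisher_hash_objective(codes, labels, margin=4.0):
    """Fisher-style objective on binary codes: accumulate intra-class
    Hamming distances (to be minimized) and hinge-penalize inter-class
    pairs that are closer than `margin` bits."""
    intra, inter = 0.0, 0.0
    n = len(codes)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sum(codes[i] != codes[j])     # Hamming distance
            if labels[i] == labels[j]:
                intra += d                        # pull same class together
            else:
                inter += max(0.0, margin - d)     # push classes past margin
    return intra + inter

codes = np.array([[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 1, 0]])
labels = np.array([0, 0, 1])
print(fisher_hash_objective(codes, labels))
```

Training a deep hashing network would minimize a smooth relaxation of such an objective so that quantization is part of the optimization rather than an afterthought.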

Real-time Approximate Bayesian Computation for Scene Understanding

Title Real-time Approximate Bayesian Computation for Scene Understanding
Authors Javier Felip, Nilesh Ahuja, David Gómez-Gutiérrez, Omesh Tickoo, Vikash Mansinghka
Abstract Consider scene understanding problems such as predicting where a person is probably reaching, or inferring the pose of 3D objects from depth images, or inferring the probable street crossings of pedestrians at a busy intersection. This paper shows how to solve these problems using Approximate Bayesian Computation. The underlying generative models are built from realistic simulation software, wrapped in a Bayesian error model for the gap between simulation outputs and real data. The simulators are drawn from off-the-shelf computer graphics, video game, and traffic simulation code. The paper introduces two techniques for speeding up inference that can be used separately or in combination. The first is to train neural surrogates of the simulators, using a simple form of domain randomization to make the surrogates more robust to the gap between the simulation and reality. The second is to adaptively discretize the latent variables using a Tree-pyramid approach adapted from computer graphics. This paper also shows performance and accuracy measurements on real-world problems, establishing that it is feasible to solve these problems in real time.
Tasks Scene Understanding
Published 2019-05-22
URL https://arxiv.org/abs/1905.13307v1
PDF https://arxiv.org/pdf/1905.13307v1.pdf
PWC https://paperswithcode.com/paper/190513307
Repo
Framework
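The simplest form of Approximate Bayesian Computation underlying this line of work is rejection ABC: sample latents from the prior, run the simulator, and keep samples whose output lands close to the observation. A sketch with a trivial stand-in simulator rather than graphics or traffic code:

```python
import numpy as np

def abc_rejection(simulate, observed, prior_sampler, eps, n_draws=2000, rng=None):
    """Rejection ABC: draw latents from the prior, simulate data, and
    keep draws whose simulated output lands within `eps` of the
    observation. The kept draws approximate the posterior."""
    rng = rng or np.random.default_rng(0)
    kept = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        if abs(simulate(theta, rng) - observed) < eps:
            kept.append(theta)
    return np.array(kept)

# Toy 'simulator': observation = theta + noise (stand-in for a renderer).
post = abc_rejection(
    simulate=lambda t, rng: t + rng.normal(0.0, 0.1),
    observed=1.0,
    prior_sampler=lambda rng: rng.uniform(-3.0, 3.0),
    eps=0.2,
)
print(len(post), post.mean())
```

The paper's two speed-ups slot in here: a neural surrogate replaces `simulate`, and adaptive discretization replaces the blind prior sampling.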