October 18, 2019

3168 words 15 mins read

Paper Group ANR 549

Selective Zero-Shot Classification with Augmented Attributes. Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings. Deep Convolutional Neural Network for Plant Seedlings Classification. Generative x-vectors for text-independent speaker verification. Aligning Very Small Parallel Corpora …

Selective Zero-Shot Classification with Augmented Attributes

Title Selective Zero-Shot Classification with Augmented Attributes
Authors Jie Song, Chengchao Shen, Jie Lei, An-Xiang Zeng, Kairi Ou, Dacheng Tao, Mingli Song
Abstract In this paper, we introduce a selective zero-shot classification problem: how can the classifier avoid making dubious predictions? Existing attribute-based zero-shot classification methods are shown to work poorly in the selective classification scenario. We argue that the under-complete, human-defined attribute vocabulary accounts for the poor performance. We propose a selective zero-shot classifier based on both the human-defined and the automatically discovered residual attributes. The proposed classifier is constructed by first learning the defined and the residual attributes jointly. Predictions are then made within the subspace of the defined attributes. Finally, the prediction confidence is measured by both the defined and the residual attributes. Experiments conducted on several benchmarks demonstrate that our classifier outperforms other methods under the risk-coverage trade-off metric.
Tasks Zero-Shot Learning
Published 2018-07-19
URL http://arxiv.org/abs/1807.07437v1
PDF http://arxiv.org/pdf/1807.07437v1.pdf
PWC https://paperswithcode.com/paper/selective-zero-shot-classification-with
Repo
Framework
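The selection mechanism described in the abstract above can be illustrated with a small sketch: classes are predicted in the defined-attribute subspace, while the confidence that gates each prediction combines agreement in both the defined and the residual attribute spaces. The attribute predictors, class signatures, equal weighting, and threshold below are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of selective zero-shot prediction with defined + residual attributes.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_def, n_res, n_classes = 8, 85, 40, 10

# Assumed inputs: attribute scores predicted for each image and per-class signatures.
img_def = rng.normal(size=(n_samples, n_def))   # predicted defined attributes
img_res = rng.normal(size=(n_samples, n_res))   # predicted residual attributes
cls_def = rng.normal(size=(n_classes, n_def))   # class prototypes (defined)
cls_res = rng.normal(size=(n_classes, n_res))   # class prototypes (residual)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Predict within the defined-attribute subspace only.
scores_def = cosine(img_def, cls_def)
pred = scores_def.argmax(axis=1)

# Confidence combines defined and residual agreement with the predicted class.
scores_res = cosine(img_res, cls_res)
confidence = 0.5 * scores_def[np.arange(n_samples), pred] \
           + 0.5 * scores_res[np.arange(n_samples), pred]

threshold = 0.1                    # tuned on validation data in practice
accept = confidence >= threshold   # abstain on the rest (selective prediction)
print(pred, accept)
```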

Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings

Title Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings
Authors Jee-weon Jung, Hee-soo Heo, Hye-jin Shim, Ha-jin Yu
Abstract The short duration of an input utterance is one of the most critical factors degrading the performance of speaker verification systems. This study aims to develop an integrated text-independent speaker verification system for input utterances of 2 seconds or less. We propose an approach using a teacher-student learning framework for this goal, applied to short utterance compensation for the first time to the best of our knowledge. The core concept of the proposed system is to conduct the compensation throughout the network that extracts the speaker embedding, mainly at the phonetic level, rather than compensating via a separate system after the speaker embedding has been extracted. In the proposed architecture, phonetic-level features, each representing a 130 ms segment, are extracted using convolutional layers. A layer of gated recurrent units then extracts an utterance-level feature from the phonetic-level features. The proposed approach also adopts a new objective function for teacher-student learning that considers both the Kullback-Leibler divergence of the output layers and the cosine distance of the speaker embedding layers. Experiments were conducted on the VoxCeleb1 dataset using deep neural networks that take raw waveforms as input and output speaker embeddings. The proposed model compensated for approximately 65% of the performance degradation caused by the shortened duration.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2018-10-25
URL http://arxiv.org/abs/1810.10884v2
PDF http://arxiv.org/pdf/1810.10884v2.pdf
PWC https://paperswithcode.com/paper/short-utterance-compensation-in-speaker
Repo
Framework
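A minimal sketch of the combined objective named in the abstract above: Kullback-Leibler divergence between the output layers plus cosine distance between the speaker embedding layers. The tensor shapes and the weighting factor `alpha` are assumptions, not the authors' exact configuration.

```python
# Teacher-student loss: KL over output posteriors + cosine distance over embeddings.
import torch
import torch.nn.functional as F

def ts_loss(student_logits, teacher_logits, student_emb, teacher_emb, alpha=1.0):
    # KL divergence between teacher and student output distributions.
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")
    # Cosine distance between the embedding layers (1 - cosine similarity).
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return kl + alpha * cos

# Toy usage: teacher sees the full utterance, student sees a truncated one.
student_logits = torch.randn(4, 1211)   # e.g. VoxCeleb1 speaker classes
teacher_logits = torch.randn(4, 1211)
student_emb = torch.randn(4, 512)
teacher_emb = torch.randn(4, 512)
print(ts_loss(student_logits, teacher_logits, student_emb, teacher_emb))
```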

Deep Convolutional Neural Network for Plant Seedlings Classification

Title Deep Convolutional Neural Network for Plant Seedlings Classification
Authors Daniel K. Nkemelu, Daniel Omeiza, Nancy Lubalo
Abstract Agriculture is vital for human survival and remains a major driver of several economies around the world, more so in underdeveloped and developing economies. With increasing demand for food and cash crops, due to a growing global population and the challenges posed by climate change, there is a pressing need to increase farm outputs while incurring minimal costs. Previous machine vision technologies developed for selective weeding have faced the challenge of reliable and accurate weed detection. We present approaches for plant seedlings classification using a dataset that contains 4,275 images of approximately 960 unique plants belonging to 12 species at several growth stages. We compare the performance of two traditional algorithms and a Convolutional Neural Network (CNN), a deep learning technique widely applied to image recognition, on this task. Our findings show that CNN-driven seedling classification applications, when designed appropriately, have the potential to optimize crop yield and improve productivity and efficiency in farming automation.
Tasks
Published 2018-11-20
URL http://arxiv.org/abs/1811.08404v1
PDF http://arxiv.org/pdf/1811.08404v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-neural-network-for-plant
Repo
Framework
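For orientation, here is a minimal CNN classifier sketch for the 12 seedling species; the architecture and preprocessing are illustrative assumptions, not the network used in the paper.

```python
# Small CNN that maps RGB seedling images to 12 class logits.
import torch
import torch.nn as nn

class SeedlingCNN(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Toy forward pass on a batch of 224x224 RGB images.
model = SeedlingCNN()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 12])
```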

Generative x-vectors for text-independent speaker verification

Title Generative x-vectors for text-independent speaker verification
Authors Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang, Haizhou Li
Abstract Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular due to their performance, which is superior to that of i-vector systems. The fusion of these systems provides improved performance, benefiting both from the discriminatively trained x-vectors and from the generative i-vectors capturing distinct speaker characteristics. In this paper, we propose a novel method to combine the complementary information of the i-vector and the x-vector, which we call the generative x-vector. The generative x-vector utilizes a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is later used to transform the standard x-vectors of the enrollment and test segments into the corresponding generative x-vectors. The SV experiments performed on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors provides considerably better performance than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2018-09-17
URL http://arxiv.org/abs/1809.06798v1
PDF http://arxiv.org/pdf/1809.06798v1.pdf
PWC https://paperswithcode.com/paper/generative-x-vectors-for-text-independent
Repo
Framework
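The CCA-based transformation described above can be sketched as follows: fit CCA on paired i-vectors and x-vectors from background data, then project enrollment and test x-vectors to obtain the generative x-vectors. The vector dimensions and component count are assumptions.

```python
# CCA transformation from standard x-vectors to "generative x-vectors".
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
xvec_bg = rng.normal(size=(500, 200))    # background x-vectors
ivec_bg = rng.normal(size=(500, 150))    # matching background i-vectors

cca = CCA(n_components=50, max_iter=1000)
cca.fit(xvec_bg, ivec_bg)                # learn the transformation model

xvec_eval = rng.normal(size=(10, 200))   # standard x-vectors (enroll/test)
gen_xvec = cca.transform(xvec_eval)      # projected generative x-vectors
print(gen_xvec.shape)                    # (10, 50)
```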

Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

Title Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective
Authors Nina Poerner, Masoud Jalili Sabet, Benjamin Roth, Hinrich Schütze
Abstract Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on between 25 and 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves a CLWE-only baseline.
Tasks Word Alignment, Word Embeddings
Published 2018-10-31
URL http://arxiv.org/abs/1811.00066v1
PDF http://arxiv.org/pdf/1811.00066v1.pdf
PWC https://paperswithcode.com/paper/aligning-very-small-parallel-corpora-using
Repo
Framework
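As a rough illustration of CLWE-based alignment, the sketch below scores source/target word pairs by cosine similarity of their cross-lingual embeddings and extracts a one-to-one ("monogamous") alignment with the Hungarian algorithm. The random embeddings and the assignment step are stand-ins; the paper's unsupervised fine-tuning objective is not reproduced here.

```python
# One-to-one word alignment from cross-lingual word embeddings.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["the", "house", "is", "red", "das", "haus", "ist", "rot"]}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

src, tgt = ["the", "house", "is", "red"], ["das", "haus", "ist", "rot"]
sim = np.array([[cos(emb[s], emb[t]) for t in tgt] for s in src])

# One-to-one assignment maximizing total similarity.
rows, cols = linear_sum_assignment(-sim)
print([(src[i], tgt[j]) for i, j in zip(rows, cols)])
```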

Training a Ranking Function for Open-Domain Question Answering

Title Training a Ranking Function for Open-Domain Question Answering
Authors Phu Mon Htut, Samuel R. Bowman, Kyunghyun Cho
Abstract In recent years, there have been remarkable advances in deep learning methods for machine reading. In machine reading, the machine reader has to extract the answer from a given ground-truth paragraph. Recently, state-of-the-art machine reading models have achieved human-level performance on SQuAD, a reading comprehension-style question answering (QA) task. The success of machine reading has inspired researchers to combine information retrieval with machine reading to tackle open-domain QA. However, these systems perform poorly compared to reading comprehension-style QA because it is difficult to retrieve the passages that contain the answer to the question. In this study, we propose two neural network rankers that assign scores to different passages based on their likelihood of containing the answer to a given question. Additionally, we analyze the relative importance of semantic similarity and word-level relevance matching in open-domain QA.
Tasks Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension, Semantic Similarity, Semantic Textual Similarity
Published 2018-04-12
URL http://arxiv.org/abs/1804.04264v1
PDF http://arxiv.org/pdf/1804.04264v1.pdf
PWC https://paperswithcode.com/paper/training-a-ranking-function-for-open-domain
Repo
Framework
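A minimal sketch of a neural passage ranker in the spirit described above: encode the question and each candidate passage, then score each passage by its likelihood of containing the answer. The bag-of-embeddings encoder and bilinear scoring head are illustrative stand-ins for the rankers studied in the paper.

```python
# Score candidate passages for a question; higher score = more likely to contain the answer.
import torch
import torch.nn as nn

class PassageRanker(nn.Module):
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)   # averages token embeddings
        self.score = nn.Bilinear(dim, dim, 1)          # similarity-style score

    def forward(self, question_ids, passage_ids):
        q = self.emb(question_ids)
        p = self.emb(passage_ids)
        return self.score(q, p).squeeze(-1)

# Toy usage: rank 3 candidate passages for one question.
ranker = PassageRanker()
q = torch.randint(0, 10000, (3, 12))   # question repeated per candidate
p = torch.randint(0, 10000, (3, 60))   # candidate passages
print(ranker(q, p))                    # scores used to sort the passages
```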

Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification

Title Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
Authors Sobhan Soleymani, Ali Dabouei, Seyed Mehdi Iranmanesh, Hadi Kazemi, Jeremy Dawson, Nasser M. Nasrabadi
Abstract In this paper, a novel cross-device text-independent speaker verification architecture is proposed. The majority of state-of-the-art deep architectures used for speaker verification tasks consider Mel-frequency cepstral coefficients. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to benefit from the dependency between adjacent spectro-temporal features. Moreover, although spectro-temporal features have proved to be highly reliable in speaker verification models, they only represent some aspects of the short-term, acoustic-level traits of the speaker’s voice. The human voice, however, carries several linguistic levels, such as the acoustic, lexical, prosodic, and phonetic levels, that can be utilized in speaker verification models. To compensate for these inherent shortcomings of spectro-temporal features, we propose to enhance the Siamese convolutional neural network architecture by deploying a multilayer perceptron network to incorporate prosodic, jitter, and shimmer features. The proposed end-to-end verification architecture performs feature extraction and verification simultaneously. It displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2018-07-31
URL http://arxiv.org/abs/1808.01026v1
PDF http://arxiv.org/pdf/1808.01026v1.pdf
PWC https://paperswithcode.com/paper/prosodic-enhanced-siamese-convolutional
Repo
Framework
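The fusion idea can be sketched as a CNN branch over Mel-frequency spectrogram coefficients and an MLP branch over prosodic, jitter, and shimmer features, concatenated into one utterance embedding that a Siamese setup compares across a trial pair. Layer sizes and feature dimensions are assumptions.

```python
# Two-branch embedding network: spectrogram CNN + prosodic-feature MLP.
import torch
import torch.nn as nn

class SpectroProsodicNet(nn.Module):
    def __init__(self, n_prosodic=20, emb_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.mlp = nn.Sequential(nn.Linear(n_prosodic, 64), nn.ReLU())
        self.fuse = nn.Linear(64 + 64, emb_dim)

    def forward(self, spectrogram, prosodic):
        s = self.cnn(spectrogram).flatten(1)
        p = self.mlp(prosodic)
        return self.fuse(torch.cat([s, p], dim=1))

# In a Siamese setup the same network embeds both utterances of a trial pair
# and a distance (e.g. cosine) decides same / different speaker.
net = SpectroProsodicNet()
emb_a = net(torch.randn(2, 1, 80, 300), torch.randn(2, 20))
emb_b = net(torch.randn(2, 1, 80, 300), torch.randn(2, 20))
print(torch.nn.functional.cosine_similarity(emb_a, emb_b))
```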

Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers

Title Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers
Authors Vahid Mirjalili, Sebastian Raschka, Arun Ross
Abstract Recent research has proposed the use of Semi Adversarial Networks (SAN) for imparting privacy to face images. SANs are convolutional autoencoders that perturb face images such that the perturbed images cannot be reliably used by an attribute classifier (e.g., a gender classifier) but can still be used by a face matcher for matching purposes. However, the generalizability of SANs across multiple arbitrary gender classifiers has not been demonstrated in the literature. In this work, we tackle the generalization issue by designing an ensemble SAN model that generates a diverse set of perturbed outputs for a given input face image. This is accomplished by enforcing diversity among the individual models in the ensemble through the use of different data augmentation techniques. The goal is to ensure that at least one of the perturbed output faces will confound an arbitrary, previously unseen gender classifier. Extensive experiments using different unseen gender classifiers and face matchers are performed to demonstrate the efficacy of the proposed paradigm in imparting gender privacy to face images.
Tasks Data Augmentation
Published 2018-07-31
URL http://arxiv.org/abs/1807.11936v1
PDF http://arxiv.org/pdf/1807.11936v1.pdf
PWC https://paperswithcode.com/paper/gender-privacy-an-ensemble-of-semi
Repo
Framework
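A conceptual sketch of the ensemble idea: several convolutional autoencoders produce a diverse set of perturbed outputs for the same input face. The actual SAN training losses (face-matcher and gender-classifier objectives) and the augmentation schemes that enforce diversity are not reproduced here; the members below differ only by initialization.

```python
# Ensemble of tiny autoencoders producing multiple perturbed versions of one face image.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
                                 nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

# Each ensemble member would be trained with a different augmentation scheme.
ensemble = [TinyAutoencoder() for _ in range(3)]
face = torch.rand(1, 3, 64, 64)
perturbed = [san(face) for san in ensemble]   # diverse candidate outputs
print([p.shape for p in perturbed])
```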

Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge

Title Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge
Authors Mitko Veta, Yujing J. Heng, Nikolas Stathonikos, Babak Ehteshami Bejnordi, Francisco Beca, Thomas Wollmann, Karl Rohr, Manan A. Shah, Dayong Wang, Mikael Rousson, Martin Hedlund, David Tellez, Francesco Ciompi, Erwan Zerhouni, David Lanyi, Matheus Viana, Vassili Kovalev, Vitali Liauchuk, Hady Ahmady Phoulady, Talha Qaiser, Simon Graham, Nasir Rajpoot, Erik Sjöblom, Jesper Molin, Kyunghyun Paeng, Sangheum Hwang, Sunggyun Park, Zhipeng Jia, Eric I-Chao Chang, Yan Xu, Andrew H. Beck, Paul J. van Diest, Josien P. W. Pluim
Abstract Tumor proliferation is an important biomarker indicative of the prognosis of breast cancer patients. Assessment of tumor proliferation in a clinical setting is a highly subjective and labor-intensive task. Previous efforts to automate tumor proliferation assessment by image analysis focused only on mitosis detection in predefined tumor regions. However, in a real-world scenario, automatic mitosis detection should be performed in whole-slide images (WSIs), and an automatic method should be able to produce a tumor proliferation score given a WSI as input. To address this, we organized the TUmor Proliferation Assessment Challenge 2016 (TUPAC16) on prediction of tumor proliferation scores from WSIs. The challenge dataset consisted of 500 training and 321 testing breast cancer histopathology WSIs. In order to ensure fair and independent evaluation, only the ground truth for the training dataset was provided to the challenge participants. The first task of the challenge was to predict mitotic scores, i.e., to reproduce the manual method of assessing tumor proliferation by a pathologist. The second task was to predict the gene-expression-based PAM50 proliferation scores from the WSI. The best performing automatic method for the first task achieved a quadratic-weighted Cohen’s kappa score of $\kappa$ = 0.567, 95% CI [0.464, 0.671] between the predicted scores and the ground truth. For the second task, the predictions of the top method had a Spearman’s correlation coefficient of r = 0.617, 95% CI [0.581, 0.651] with the ground truth. This was the first study to investigate tumor proliferation assessment from WSIs. The achieved results are promising given the difficulty of the tasks and the weakly labelled nature of the ground truth. However, further research is needed to improve the practical utility of image analysis methods for this task.
Tasks Mitosis Detection
Published 2018-07-22
URL http://arxiv.org/abs/1807.08284v2
PDF http://arxiv.org/pdf/1807.08284v2.pdf
PWC https://paperswithcode.com/paper/predicting-breast-tumor-proliferation-from
Repo
Framework
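The two evaluation metrics quoted above, quadratic-weighted Cohen's kappa and Spearman's correlation, can be computed as in the toy example below (synthetic scores, not challenge submissions).

```python
# Quadratic-weighted kappa for mitotic scores and Spearman correlation for PAM50 scores.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
true_mitotic = rng.integers(1, 4, size=100)          # mitotic scores 1-3
pred_mitotic = np.clip(true_mitotic + rng.integers(-1, 2, size=100), 1, 3)
print(cohen_kappa_score(true_mitotic, pred_mitotic, weights="quadratic"))

true_pam50 = rng.normal(size=100)                    # proliferation scores
pred_pam50 = true_pam50 + rng.normal(scale=0.8, size=100)
print(spearmanr(true_pam50, pred_pam50).correlation)
```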

Text-Independent Speaker Verification Using Long Short-Term Memory Networks

Title Text-Independent Speaker Verification Using Long Short-Term Memory Networks
Authors Aryan Mobiny, Mohammad Najarian
Abstract In this paper, an architecture based on Long Short-Term Memory networks is proposed for the text-independent scenario, aimed at capturing temporal speaker-related information by operating over traditional speech features. For speaker verification, a background model must first be created for speaker representation. Then, in the enrollment stage, speaker models are created from the enrollment utterances. In this work, the model is trained in an end-to-end fashion to combine these first two stages. The main goal of end-to-end training is to optimize the model to be consistent with the speaker verification protocol. End-to-end training jointly learns the background and speaker models by creating the representation space. The LSTM architecture is trained to create a discriminative space for validating match and non-match pairs for speaker verification. The proposed architecture demonstrates its superiority in the text-independent scenario compared to traditional methods.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2018-05-02
URL http://arxiv.org/abs/1805.00604v3
PDF http://arxiv.org/pdf/1805.00604v3.pdf
PWC https://paperswithcode.com/paper/text-independent-speaker-verification-using-1
Repo
Framework
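A minimal sketch of an LSTM speaker-embedding network trained end-to-end on match/non-match pairs; the feature dimensions and the simple pair loss below are assumptions rather than the paper's exact setup.

```python
# LSTM over speech features -> normalized speaker embedding -> pair-based loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSpeakerEmbedder(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, emb_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, x):                  # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)
        return F.normalize(self.proj(h[-1]), dim=-1)

net = LSTMSpeakerEmbedder()
utt_a = torch.randn(4, 200, 40)            # e.g. MFCC sequences
utt_b = torch.randn(4, 200, 40)
same = torch.tensor([1., 0., 1., 0.])      # match / non-match labels
sim = F.cosine_similarity(net(utt_a), net(utt_b))
loss = F.binary_cross_entropy_with_logits(sim, same)   # one possible pair loss
loss.backward()
```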

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

Title I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Authors Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
Abstract I-vector based text-independent speaker verification (SV) systems often perform poorly with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where the generator network is trained to generate a compensated i-vector from a short-utterance i-vector, and the discriminator network is trained to determine whether an i-vector was generated by the generator or extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 “10sec-10sec” condition show that our method reduced the equal error rate by 11.3% compared with the conventional i-vector and PLDA system.
Tasks Speaker Verification, Text-Independent Speaker Verification
Published 2018-04-01
URL http://arxiv.org/abs/1804.00290v1
PDF http://arxiv.org/pdf/1804.00290v1.pdf
PWC https://paperswithcode.com/paper/i-vector-transformation-using-conditional
Repo
Framework
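The adversarial setup can be sketched as follows: a generator maps short-utterance i-vectors to compensated i-vectors, and a discriminator tries to distinguish generated i-vectors from those extracted from long utterances. The two auxiliary learning tasks mentioned in the abstract are omitted, and the layer sizes are assumptions.

```python
# Generator/discriminator losses for i-vector compensation (simplified GAN step).
import torch
import torch.nn as nn

dim = 400
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim))
D = nn.Sequential(nn.Linear(dim, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

bce = nn.BCEWithLogitsLoss()
short_iv = torch.randn(8, dim)            # i-vectors from short utterances
long_iv = torch.randn(8, dim)             # i-vectors from long utterances

# Discriminator step: real = long-utterance i-vectors, fake = generated ones.
d_loss = bce(D(long_iv), torch.ones(8, 1)) + \
         bce(D(G(short_iv).detach()), torch.zeros(8, 1))

# Generator step: fool the discriminator with compensated i-vectors.
g_loss = bce(D(G(short_iv)), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())
```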

Gradient Hyperalignment for multi-subject fMRI data alignment

Title Gradient Hyperalignment for multi-subject fMRI data alignment
Authors Tonglin Xu, Muhammad Yousefnezhad, Daoqiang Zhang
Abstract Multi-subject fMRI data analysis is an interesting and challenging problem in human brain decoding studies. The inherent anatomical and functional variability across subjects makes it necessary to perform both anatomical and functional alignment before classification analysis. Moreover, when it comes to big data, time complexity becomes a problem that cannot be ignored. This paper proposes Gradient Hyperalignment (Gradient-HA), a gradient-based functional alignment method suitable for multi-subject fMRI datasets with large numbers of samples and voxels. The advantage of Gradient-HA is that it can address the independence and high-dimensionality problems by using Independent Component Analysis (ICA) and Stochastic Gradient Ascent (SGA). Validation using multi-classification tasks on big data demonstrates that the Gradient-HA method has lower time complexity and better or comparable performance compared with other state-of-the-art functional alignment methods.
Tasks Brain Decoding, Multi-Subject Fmri Data Alignment
Published 2018-07-07
URL http://arxiv.org/abs/1807.02612v1
PDF http://arxiv.org/pdf/1807.02612v1.pdf
PWC https://paperswithcode.com/paper/gradient-hyperalignment-for-multi-subject
Repo
Framework
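A highly simplified sketch of gradient-based functional alignment: each subject's (samples x voxels) matrix is mapped by a learned matrix toward a shared template, updated with stochastic gradient steps on mini-batches. The ICA preprocessing and the exact Gradient-HA objective are not reproduced here.

```python
# Stochastic-gradient functional alignment of multiple subjects to a shared template.
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_samples, n_voxels, lr = 3, 500, 100, 1e-3
data = [rng.normal(size=(n_samples, n_voxels)) for _ in range(n_subj)]
maps = [np.eye(n_voxels) for _ in range(n_subj)]

for step in range(200):
    idx = rng.choice(n_samples, size=32, replace=False)      # mini-batch
    batch = [X[idx] for X in data]
    template = np.mean([b @ W for b, W in zip(batch, maps)], axis=0)
    for s in range(n_subj):
        # Gradient of ||X_s W_s - template||^2 w.r.t. W_s (template held fixed).
        grad = 2 * batch[s].T @ (batch[s] @ maps[s] - template)
        maps[s] -= lr * grad

aligned = [X @ W for X, W in zip(data, maps)]
print(aligned[0].shape)
```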

Learning to Interpret Satellite Images Using Wikipedia

Title Learning to Interpret Satellite Images Using Wikipedia
Authors Evan Sheehan, Burak Uzkent, Chenlin Meng, Zhongyi Tang, Marshall Burke, David Lobell, Stefano Ermon
Abstract Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we propose using Wikipedia as a previously untapped source of rich, georeferenced textual information with global coverage. We construct a novel large-scale, multi-modal dataset by pairing geo-referenced Wikipedia articles with satellite imagery of their corresponding locations. To prove the efficacy of this dataset, we focus on the African continent and train a deep network to classify images based on labels extracted from articles. We then fine-tune the model on a human annotated dataset and demonstrate that this weak form of supervision can drastically reduce the quantity of human annotated labels and time required for downstream tasks.
Tasks
Published 2018-09-19
URL http://arxiv.org/abs/1809.10236v1
PDF http://arxiv.org/pdf/1809.10236v1.pdf
PWC https://paperswithcode.com/paper/learning-to-interpret-satellite-images-using
Repo
Framework
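The two-stage recipe described above can be sketched as pre-training a CNN on labels derived from geo-referenced Wikipedia articles, then swapping the classifier head and fine-tuning on a small human-annotated set. The class counts, the ResNet-18 backbone, and the omitted data loaders are placeholders.

```python
# Weak-supervision pre-training followed by fine-tuning on human labels.
import torch
import torch.nn as nn
from torchvision.models import resnet18

n_weak_classes, n_target_classes = 100, 10

# Stage 1: pre-train on weak (article-derived) labels.
model = resnet18()
model.fc = nn.Linear(model.fc.in_features, n_weak_classes)
# ... train on (satellite image, Wikipedia-derived label) pairs ...

# Stage 2: swap the classifier head and fine-tune on human-annotated data.
model.fc = nn.Linear(model.fc.in_features, n_target_classes)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 10])
```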

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Title Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Authors Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine
Abstract Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample efficiency is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information, by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods. Our experimental results show that, on a range of continuous control tasks, TDMs provide a substantial improvement in efficiency compared to state-of-the-art model-based and model-free methods.
Tasks Continuous Control, Q-Learning
Published 2018-02-25
URL https://arxiv.org/abs/1802.09081v2
PDF https://arxiv.org/pdf/1802.09081v2.pdf
PWC https://paperswithcode.com/paper/temporal-difference-models-model-free-deep-rl
Repo
Framework
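A sketch of the TDM target construction: a goal- and horizon-conditioned Q-function whose regression target is the negative distance to the goal when the horizon reaches zero, and a bootstrapped value otherwise. The network size, the Euclidean distance, and the discount handling are simplified assumptions.

```python
# Goal- and horizon-conditioned Q target for temporal difference models.
import torch
import torch.nn as nn

s_dim, a_dim = 4, 2
q_net = nn.Sequential(nn.Linear(s_dim + a_dim + s_dim + 1, 64), nn.ReLU(),
                      nn.Linear(64, 1))

def tdm_target(next_state, goal, tau, next_action, gamma=1.0):
    # Terminal horizon: reward is the negative distance to the goal.
    terminal = -torch.norm(next_state - goal, dim=-1, keepdim=True)
    # Otherwise bootstrap from Q(s', a', g, tau - 1).
    q_next = q_net(torch.cat([next_state, next_action, goal, tau - 1], dim=-1))
    return torch.where(tau == 0, terminal, gamma * q_next)

next_state = torch.randn(8, s_dim)
next_action = torch.randn(8, a_dim)
goal = torch.randn(8, s_dim)
tau = torch.randint(0, 5, (8, 1)).float()
print(tdm_target(next_state, goal, tau, next_action).shape)
```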

Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias

Title Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias
Authors Aboul Ella Hassanien, Moataz Kilany, Essam H. Houssein
Abstract Many people currently suffer from heart diseases that can lead to untimely death. The most common heart abnormality is arrhythmia, which is simply an irregular beating of the heart. A prediction system for the early intervention and prevention of heart diseases, including cardiovascular diseases (CVDs) and arrhythmia, is important. This paper introduces the classification of electrocardiogram (ECG) heartbeats into normal or abnormal. The approach is based on the combination of swarm optimization algorithms with a modified Pan-Tompkins algorithm (MPTA) and support vector machines (SVMs). The MPTA was implemented to remove ECG noise, followed by the application of the extended feature extraction algorithm (EFEA) for ECG feature extraction. Then, elephant herding optimization (EHO) was used to find a subset of ECG features from a larger feature pool that provided better classification performance than that achieved using the whole set. Finally, SVMs were used for classification. The results show that the EHO-SVM approach achieved good classification results in terms of five statistical indices: accuracy, 93.31%; sensitivity, 45.49%; precision, 46.45%; F-measure, 45.48%; and specificity, 45.48%. Furthermore, the results demonstrate a clear improvement in accuracy compared to that of other methods when applied to the MIT-BIH arrhythmia database.
Tasks
Published 2018-06-20
URL http://arxiv.org/abs/1806.08242v1
PDF http://arxiv.org/pdf/1806.08242v1.pdf
PWC https://paperswithcode.com/paper/combining-support-vector-machine-and-elephant
Repo
Framework
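A wrapper-style sketch of the feature-selection idea: a binary feature mask is searched and scored by SVM cross-validation accuracy. Plain random search stands in for elephant herding optimization, and synthetic data replaces the MIT-BIH features.

```python
# Feature-subset search scored by SVM cross-validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                           random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

best_mask = np.ones(30, dtype=bool)
best_score = fitness(best_mask)
for _ in range(20):                     # EHO would update clan members here instead
    mask = rng.random(30) < 0.5
    score = fitness(mask)
    if score > best_score:
        best_mask, best_score = mask, score

print(best_mask.sum(), "features selected, CV accuracy %.3f" % best_score)
```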