Paper Group ANR 549
Selective Zero-Shot Classification with Augmented Attributes. Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings. Deep Convolutional Neural Network for Plant Seedlings Classification. Generative x-vectors for text-independent speaker verification. Aligning Very Small Parallel Corpora …
Selective Zero-Shot Classification with Augmented Attributes
Title | Selective Zero-Shot Classification with Augmented Attributes |
Authors | Jie Song, Chengchao Shen, Jie Lei, An-Xiang Zeng, Kairi Ou, Dacheng Tao, Mingli Song |
Abstract | In this paper, we introduce a selective zero-shot classification problem: how can the classifier avoid making dubious predictions? Existing attribute-based zero-shot classification methods are shown to work poorly in the selective classification scenario. We argue the under-complete human defined attribute vocabulary accounts for the poor performance. We propose a selective zero-shot classifier based on both the human defined and the automatically discovered residual attributes. The proposed classifier is constructed by firstly learning the defined and the residual attributes jointly. Then the predictions are conducted within the subspace of the defined attributes. Finally, the prediction confidence is measured by both the defined and the residual attributes. Experiments conducted on several benchmarks demonstrate that our classifier produces a superior performance to other methods under the risk-coverage trade-off metric. |
Tasks | Zero-Shot Learning |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07437v1 |
http://arxiv.org/pdf/1807.07437v1.pdf | |
PWC | https://paperswithcode.com/paper/selective-zero-shot-classification-with |
Repo | |
Framework | |
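
The risk-coverage trade-off named in the abstract above can be illustrated with a small sketch. It assumes each prediction carries a scalar confidence and that a selective classifier answers only its most confident samples; the function name and the toy data are hypothetical and are not taken from the paper.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) pairs as more samples are accepted.

    confidences: list of floats, higher means more confident (assumed).
    correct: list of bools, True where the prediction was right.
    """
    # Rank samples from most to least confident.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    curve = []
    errors = 0
    for accepted, idx in enumerate(order, start=1):
        if not correct[idx]:
            errors += 1
        coverage = accepted / len(order)  # fraction of samples answered
        risk = errors / accepted          # error rate among answered samples
        curve.append((coverage, risk))
    return curve


# Toy data, made up for illustration only.
conf = [0.9, 0.8, 0.6, 0.3]
hit = [True, True, False, True]
curve = risk_coverage_curve(conf, hit)
```

A classifier whose confidence tracks correctness well keeps the risk low at small coverage and lets it rise only as less confident samples are admitted.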
Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings
Title | Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings |
Authors | Jee-weon Jung, Hee-soo Heo, Hye-jin Shim, Ha-jin Yu |
Abstract | The short duration of an input utterance is one of the most critical threats that degrade the performance of speaker verification systems. This study aimed to develop an integrated text-independent speaker verification system that takes utterances with a short duration of 2 seconds or less as input. We propose an approach using a teacher-student learning framework for this goal, applied to short utterance compensation for the first time to our knowledge. The core concept of the proposed system is to conduct the compensation throughout the network that extracts the speaker embedding, mainly at the phonetic level, rather than compensating via a separate system after extracting the speaker embedding. In the proposed architecture, phonetic-level features, where each feature represents a segment of 130 ms, are extracted using convolutional layers. A layer of gated recurrent units extracts an utterance-level feature using phonetic-level features. The proposed approach also adopts a new objective function for teacher-student learning that considers both the Kullback-Leibler divergence of output layers and the cosine distance of speaker embedding layers. Experiments were conducted using deep neural networks that take raw waveforms as input and output speaker embeddings on the VoxCeleb1 dataset. The proposed model could compensate for approximately 65% of the performance degradation due to the shortened duration. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10884v2 |
http://arxiv.org/pdf/1810.10884v2.pdf | |
PWC | https://paperswithcode.com/paper/short-utterance-compensation-in-speaker |
Repo | |
Framework | |
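
The objective described in the abstract above combines a Kullback-Leibler divergence on output layers with a cosine distance on speaker embeddings. A minimal sketch of such a combined loss follows. The abstract does not give the exact form, so the direction of the KL term, the simple sum, and the weight `alpha` are assumptions, and all function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def teacher_student_loss(teacher_probs, student_probs,
                         teacher_emb, student_emb, alpha=1.0):
    """Assumed form: KL on the outputs plus weighted cosine distance on embeddings."""
    return (kl_divergence(teacher_probs, student_probs)
            + alpha * cosine_distance(teacher_emb, student_emb))


# The teacher would see the long utterance and the student the short one;
# the numbers below are made up for illustration.
loss = teacher_student_loss([0.7, 0.3], [0.6, 0.4], [1.0, 0.0], [0.8, 0.6])
```

The loss is zero when the student reproduces both the teacher's output distribution and the direction of its embedding, and grows as either one drifts.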
Deep Convolutional Neural Network for Plant Seedlings Classification
Title | Deep Convolutional Neural Network for Plant Seedlings Classification |
Authors | Daniel K. Nkemelu, Daniel Omeiza, Nancy Lubalo |
Abstract | Agriculture is vital for human survival and remains a major driver of several economies around the world, more so in underdeveloped and developing economies. With increasing demand for food and cash crops, due to a growing global population and the challenges posed by climate change, there is a pressing need to increase farm outputs while incurring minimal costs. Previous machine vision technologies developed for selective weeding have faced the challenge of reliable and accurate weed detection. We present approaches for plant seedlings classification with a dataset that contains 4,275 images of approximately 960 unique plants belonging to 12 species at several growth stages. We compare the performances of two traditional algorithms and a Convolutional Neural Network (CNN), a deep learning technique widely applied to image recognition, for this task. Our findings show that CNN-driven seedling classification applications, when used in farming automation and designed appropriately, have the potential to optimize crop yield and improve productivity and efficiency. |
Tasks | |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08404v1 |
http://arxiv.org/pdf/1811.08404v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-network-for-plant |
Repo | |
Framework | |
Generative x-vectors for text-independent speaker verification
Title | Generative x-vectors for text-independent speaker verification |
Authors | Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang, Haizhou Li |
Abstract | Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems. The fusion of these systems provides improved performance, benefiting both from the discriminatively trained x-vectors and from the generative i-vectors, which capture distinct speaker characteristics. In this paper, we propose a novel method to include the complementary information of the i-vector and the x-vector, called the generative x-vector. The generative x-vector utilizes a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is later used to transform the standard x-vectors of the enrollment and test segments to the corresponding generative x-vectors. The SV experiments performed on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors provides considerably better performance than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06798v1 |
http://arxiv.org/pdf/1809.06798v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-x-vectors-for-text-independent |
Repo | |
Framework | |
Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective
Title | Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective |
Authors | Nina Poerner, Masoud Jalili Sabet, Benjamin Roth, Hinrich Schütze |
Abstract | Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on between 25 and 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves a CLWE-only baseline. |
Tasks | Word Alignment, Word Embeddings |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00066v1 |
http://arxiv.org/pdf/1811.00066v1.pdf | |
PWC | https://paperswithcode.com/paper/aligning-very-small-parallel-corpora-using |
Repo | |
Framework | |
Training a Ranking Function for Open-Domain Question Answering
Title | Training a Ranking Function for Open-Domain Question Answering |
Authors | Phu Mon Htut, Samuel R. Bowman, Kyunghyun Cho |
Abstract | In recent years, there have been amazing advances in deep learning methods for machine reading. In machine reading, the machine reader has to extract the answer from the given ground truth paragraph. Recently, state-of-the-art machine reading models have achieved human-level performance on SQuAD, which is a reading comprehension-style question answering (QA) task. The success of machine reading has inspired researchers to combine information retrieval with machine reading to tackle open-domain QA. However, these systems perform poorly compared to reading comprehension-style QA because it is difficult to retrieve the pieces of paragraphs that contain the answer to the question. In this study, we propose two neural network rankers that assign scores to different passages based on their likelihood of containing the answer to a given question. Additionally, we analyze the relative importance of semantic similarity and word level relevance matching in open-domain QA. |
Tasks | Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension, Semantic Similarity, Semantic Textual Similarity |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04264v1 |
http://arxiv.org/pdf/1804.04264v1.pdf | |
PWC | https://paperswithcode.com/paper/training-a-ranking-function-for-open-domain |
Repo | |
Framework | |
Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
Title | Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification |
Authors | Sobhan Soleymani, Ali Dabouei, Seyed Mehdi Iranmanesh, Hadi Kazemi, Jeremy Dawson, Nasser M. Nasrabadi |
Abstract | In this paper, a novel cross-device text-independent speaker verification architecture is proposed. The majority of the state-of-the-art deep architectures that are used for speaker verification tasks consider Mel-frequency cepstral coefficients. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to benefit from the dependency of the adjacent spectro-temporal features. Moreover, although spectro-temporal features have proved to be highly reliable in speaker verification models, they only represent some aspects of short-term acoustic level traits of the speaker’s voice. However, the human voice consists of several linguistic levels, such as acoustic, lexicon, prosody, and phonetics, that can be utilized in speaker verification models. To compensate for these inherited shortcomings in spectro-temporal features, we propose to enhance the proposed Siamese convolutional neural network architecture by deploying a multilayer perceptron network to incorporate the prosodic, jitter, and shimmer features. The proposed end-to-end verification architecture performs feature extraction and verification simultaneously. This proposed architecture displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1808.01026v1 |
http://arxiv.org/pdf/1808.01026v1.pdf | |
PWC | https://paperswithcode.com/paper/prosodic-enhanced-siamese-convolutional |
Repo | |
Framework | |
Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers
Title | Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers |
Authors | Vahid Mirjalili, Sebastian Raschka, Arun Ross |
Abstract | Recent research has proposed the use of Semi Adversarial Networks (SAN) for imparting privacy to face images. SANs are convolutional autoencoders that perturb face images such that the perturbed images cannot be reliably used by an attribute classifier (e.g., a gender classifier) but can still be used by a face matcher for matching purposes. However, the generalizability of SANs across multiple arbitrary gender classifiers has not been demonstrated in the literature. In this work, we tackle the generalization issue by designing an ensemble SAN model that generates a diverse set of perturbed outputs for a given input face image. This is accomplished by enforcing diversity among the individual models in the ensemble through the use of different data augmentation techniques. The goal is to ensure that at least one of the perturbed output faces will confound an arbitrary, previously unseen gender classifier. Extensive experiments using different unseen gender classifiers and face matchers are performed to demonstrate the efficacy of the proposed paradigm in imparting gender privacy to face images. |
Tasks | Data Augmentation |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11936v1 |
http://arxiv.org/pdf/1807.11936v1.pdf | |
PWC | https://paperswithcode.com/paper/gender-privacy-an-ensemble-of-semi |
Repo | |
Framework | |
Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge
Title | Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge |
Authors | Mitko Veta, Yujing J. Heng, Nikolas Stathonikos, Babak Ehteshami Bejnordi, Francisco Beca, Thomas Wollmann, Karl Rohr, Manan A. Shah, Dayong Wang, Mikael Rousson, Martin Hedlund, David Tellez, Francesco Ciompi, Erwan Zerhouni, David Lanyi, Matheus Viana, Vassili Kovalev, Vitali Liauchuk, Hady Ahmady Phoulady, Talha Qaiser, Simon Graham, Nasir Rajpoot, Erik Sjöblom, Jesper Molin, Kyunghyun Paeng, Sangheum Hwang, Sunggyun Park, Zhipeng Jia, Eric I-Chao Chang, Yan Xu, Andrew H. Beck, Paul J. van Diest, Josien P. W. Pluim |
Abstract | Tumor proliferation is an important biomarker indicative of the prognosis of breast cancer patients. Assessment of tumor proliferation in a clinical setting is a highly subjective and labor-intensive task. Previous efforts to automate tumor proliferation assessment by image analysis only focused on mitosis detection in predefined tumor regions. However, in a real-world scenario, automatic mitosis detection should be performed in whole-slide images (WSIs) and an automatic method should be able to produce a tumor proliferation score given a WSI as input. To address this, we organized the TUmor Proliferation Assessment Challenge 2016 (TUPAC16) on prediction of tumor proliferation scores from WSIs. The challenge dataset consisted of 500 training and 321 testing breast cancer histopathology WSIs. In order to ensure fair and independent evaluation, only the ground truth for the training dataset was provided to the challenge participants. The first task of the challenge was to predict mitotic scores, i.e., to reproduce the manual method of assessing tumor proliferation by a pathologist. The second task was to predict the gene expression based PAM50 proliferation scores from the WSI. The best performing automatic method for the first task achieved a quadratic-weighted Cohen’s kappa score of $\kappa$ = 0.567, 95% CI [0.464, 0.671] between the predicted scores and the ground truth. For the second task, the predictions of the top method had a Spearman’s correlation coefficient of r = 0.617, 95% CI [0.581, 0.651] with the ground truth. This was the first study that investigated tumor proliferation assessment from WSIs. The achieved results are promising given the difficulty of the tasks and the weakly-labelled nature of the ground truth. However, further research is needed to improve the practical utility of image analysis methods for this task. |
Tasks | Mitosis Detection |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08284v2 |
http://arxiv.org/pdf/1807.08284v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-breast-tumor-proliferation-from |
Repo | |
Framework | |
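
The first task above is scored with a quadratic-weighted Cohen's kappa. A minimal sketch of that statistic follows, assuming scores are encoded as integer labels `0 .. num_classes - 1`; the encoding and the function name are illustrative assumptions and this is not the challenge's evaluation code.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic-weighted Cohen's kappa for two lists of integer labels.

    Undefined (division by zero) when the expected disagreement is zero,
    e.g. when both raters use a single identical class throughout.
    """
    n = len(rater_a)
    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Disagreement weight grows with the squared label distance.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # chance agreement count
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Toy labels, made up for illustration.
kappa = quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 1, 2], num_classes=3)
```

Because the weights are quadratic, confusing the two extreme scores is penalized four times as heavily as confusing adjacent scores in a three-class setting.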
Text-Independent Speaker Verification Using Long Short-Term Memory Networks
Title | Text-Independent Speaker Verification Using Long Short-Term Memory Networks |
Authors | Aryan Mobiny, Mohammad Najarian |
Abstract | In this paper, an architecture based on Long Short-Term Memory Networks is proposed for the text-independent scenario, which aims to capture the temporal speaker-related information by operating over traditional speech features. For speaker verification, at first, a background model must be created for speaker representation. Then, in the enrollment stage, the speaker models will be created based on the enrollment utterances. For this work, the model will be trained in an end-to-end fashion to combine the first two stages. The main goal of end-to-end training is for the model to be optimized to be consistent with the speaker verification protocol. The end-to-end training jointly learns the background and speaker models by creating the representation space. The LSTM architecture is trained to create a discrimination space for validating the match and non-match pairs for speaker verification. The proposed architecture demonstrates its superiority in the text-independent scenario compared to other traditional methods. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00604v3 |
http://arxiv.org/pdf/1805.00604v3.pdf | |
PWC | https://paperswithcode.com/paper/text-independent-speaker-verification-using-1 |
Repo | |
Framework | |
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Title | I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification |
Authors | Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda |
Abstract | I-vector based text-independent speaker verification (SV) systems often have poor performance with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where its generator network is trained to generate a compensated i-vector from a short-utterance i-vector and its discriminator network is trained to determine whether an i-vector is generated by the generator or is one extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 “10sec-10sec” condition show that our method reduced the equal error rate by 11.3% from the conventional i-vector and PLDA system. |
Tasks | Speaker Verification, Text-Independent Speaker Verification |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.00290v1 |
http://arxiv.org/pdf/1804.00290v1.pdf | |
PWC | https://paperswithcode.com/paper/i-vector-transformation-using-conditional |
Repo | |
Framework | |
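
The equal error rate (EER) reported above is the operating point where the false acceptance rate and the false rejection rate coincide. A minimal sketch of how it can be estimated from verification scores follows, assuming higher scores mean "same speaker" and approximating the crossing point by the threshold with the smallest gap between the two rates. The function name and toy scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    best_eer = None
    for t in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


# Toy scores, made up for illustration.
eer = equal_error_rate([0.9, 0.4], [0.6, 0.1])
```

With few trials the two rates may never be exactly equal, which is why the sketch averages them at the closest threshold rather than interpolating.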
Gradient Hyperalignment for multi-subject fMRI data alignment
Title | Gradient Hyperalignment for multi-subject fMRI data alignment |
Authors | Tonglin Xu, Muhammad Yousefnezhad, Daoqiang Zhang |
Abstract | Multi-subject fMRI data analysis is an interesting and challenging problem in human brain decoding studies. The inherent anatomical and functional variability across subjects makes it necessary to do both anatomical and functional alignment before classification analysis. Besides, when it comes to big data, time complexity becomes a problem that cannot be ignored. This paper proposes Gradient Hyperalignment (Gradient-HA) as a gradient-based functional alignment method that is suitable for multi-subject fMRI datasets with large amounts of samples and voxels. The advantage of Gradient-HA is that it can solve independence and high dimension problems by using Independent Component Analysis (ICA) and Stochastic Gradient Ascent (SGA). Validation using multi-classification tasks on big data demonstrates that the Gradient-HA method has lower time complexity and better or comparable performance compared with other state-of-the-art functional alignment methods. |
Tasks | Brain Decoding, Multi-Subject Fmri Data Alignment |
Published | 2018-07-07 |
URL | http://arxiv.org/abs/1807.02612v1 |
http://arxiv.org/pdf/1807.02612v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-hyperalignment-for-multi-subject |
Repo | |
Framework | |
Learning to Interpret Satellite Images Using Wikipedia
Title | Learning to Interpret Satellite Images Using Wikipedia |
Authors | Evan Sheehan, Burak Uzkent, Chenlin Meng, Zhongyi Tang, Marshall Burke, David Lobell, Stefano Ermon |
Abstract | Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we propose using Wikipedia as a previously untapped source of rich, georeferenced textual information with global coverage. We construct a novel large-scale, multi-modal dataset by pairing geo-referenced Wikipedia articles with satellite imagery of their corresponding locations. To prove the efficacy of this dataset, we focus on the African continent and train a deep network to classify images based on labels extracted from articles. We then fine-tune the model on a human annotated dataset and demonstrate that this weak form of supervision can drastically reduce the quantity of human annotated labels and time required for downstream tasks. |
Tasks | |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.10236v1 |
http://arxiv.org/pdf/1809.10236v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-interpret-satellite-images-using |
Repo | |
Framework | |
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Title | Temporal Difference Models: Model-Free Deep RL for Model-Based Control |
Authors | Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine |
Abstract | Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample efficiency is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information, by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods. Our experimental results show that, on a range of continuous control tasks, TDMs provide a substantial improvement in efficiency compared to state-of-the-art model-based and model-free methods. |
Tasks | Continuous Control, Q-Learning |
Published | 2018-02-25 |
URL | https://arxiv.org/abs/1802.09081v2 |
https://arxiv.org/pdf/1802.09081v2.pdf | |
PWC | https://paperswithcode.com/paper/temporal-difference-models-model-free-deep-rl |
Repo | |
Framework | |
Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias
Title | Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias |
Authors | Aboul Ella Hassanien, Moataz Kilany, Essam H. Houssein |
Abstract | Many people are currently suffering from heart diseases that can lead to untimely death. The most common heart abnormality is arrhythmia, which is simply irregular beating of the heart. A prediction system for the early intervention and prevention of heart diseases, including cardiovascular diseases (CVDs) and arrhythmia, is important. This paper introduces the classification of electrocardiogram (ECG) heartbeats into normal or abnormal. The approach is based on the combination of swarm optimization algorithms with a modified Pan-Tompkins algorithm (MPTA) and support vector machines (SVMs). The MPTA was implemented to remove ECG noise, followed by the application of the extended features extraction algorithm (EFEA) for ECG feature extraction. Then, elephant herding optimization (EHO) was used to find a subset of ECG features from a larger feature pool that provided better classification performance than that achieved using the whole set. Finally, SVMs were used for classification. The results show that the EHO-SVM approach achieved good classification results in terms of five statistical indices: accuracy, 93.31%; sensitivity, 45.49%; precision, 46.45%; F-measure, 45.48%; and specificity, 45.48%. Furthermore, the results demonstrate a clear improvement in accuracy compared to that of other methods when applied to the MIT-BIH arrhythmia database. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.08242v1 |
http://arxiv.org/pdf/1806.08242v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-support-vector-machine-and-elephant |
Repo | |
Framework | |
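
The five statistical indices listed in the abstract above have standard definitions for a binary (normal versus abnormal) classifier. The sketch below computes them from confusion-matrix counts. It assumes the usual textbook definitions; the paper may average them differently, and the function name and counts are made up for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, precision, F-measure and specificity from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)   # recall on the negative class
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "specificity": specificity,
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```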