Paper Group ANR 997
Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition
Title | Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition |
Authors | Che-Wei Huang, Shrikanth S. Narayanan |
Abstract | Regularization is crucial to the success of many practical deep learning models, particularly in the common scenario where only a few to a moderate number of training samples are accessible. In addition to weight decay, data augmentation and dropout, regularization based on multi-branch architectures, such as Shake-Shake regularization, has proven successful in many applications and has attracted increasing attention. However, beyond model-based representation augmentation, it is unclear how Shake-Shake regularization provides further improvement on classification tasks, let alone how it interacts with batch normalization. In this work, we present our investigation of Shake-Shake regularization, drawing connections to the vicinal risk minimization principle and to discriminative feature learning in verification tasks. Furthermore, we identify a strong resemblance between batch normalized residual blocks and batch normalized recurrent neural networks: both share a similar convergence behavior, which can be mitigated by a proper initialization of batch normalization. Based on these findings, our experiments on speech emotion recognition demonstrate a statistically significant improvement in classification accuracy together with a reduction in the generalization gap. |
Tasks | Data Augmentation, Emotion Recognition, Speech Emotion Recognition |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00876v2 |
PDF | http://arxiv.org/pdf/1808.00876v2.pdf |
PWC | https://paperswithcode.com/paper/normalization-before-shaking-toward-learning |
Repo | |
Framework | |
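As a concrete reference for the mechanism discussed in the abstract, here is a minimal sketch of the Shake-Shake combination rule: two residual-branch outputs are blended with a random convex weight during training and with the 0.5 expectation at test time. The branch outputs below are stand-ins, and the paper's batch-normalization initialization analysis is not reproduced here.

```python
import numpy as np

def shake_shake(branch1_out, branch2_out, training=True,
                rng=np.random.default_rng()):
    """Blend two residual-branch outputs with a random convex combination.

    At training time a fresh alpha ~ U(0, 1) is drawn per forward pass;
    at test time the expectation alpha = 0.5 is used, as in the original
    Shake-Shake formulation.
    """
    alpha = rng.uniform(0.0, 1.0) if training else 0.5
    return alpha * branch1_out + (1.0 - alpha) * branch2_out

# Hypothetical residual block: x + shake(branch1(x), branch2(x)).
x = np.ones((4, 16))                # batch of 4 feature vectors
b1, b2 = x * 0.1, x * -0.2          # stand-ins for two conv branches
y = x + shake_shake(b1, b2, training=True)
```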
Real-time clustering and multi-target tracking using event-based sensors
Title | Real-time clustering and multi-target tracking using event-based sensors |
Authors | Francisco Barranco, Cornelia Fermuller, Eduardo Ros |
Abstract | Clustering is crucial for many computer vision applications such as robust tracking, object detection and segmentation. This work presents a real-time clustering technique that takes advantage of the unique properties of event-based vision sensors. Since event-based sensors trigger events only when the intensity changes, the data is sparse, with low redundancy. Thus, our approach redefines the well-known mean-shift clustering method using asynchronous events instead of conventional frames. The potential of our approach is demonstrated in a multi-target tracking application using Kalman filters to smooth the trajectories. We evaluated our method on an existing dataset with patterns of different shapes and speeds, and on a new dataset that we collected, in which the sensor was attached to a Baxter robot in an eye-in-hand setup monitoring real-world objects during a manipulation task. Clustering achieved an F-measure of 0.95 while reducing the computational cost by 88% compared to the frame-based method. The average tracking error was 2.5 pixels, and the clustering yielded a consistent number of clusters over time. |
Tasks | Event-based vision, Object Detection |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02851v1 |
PDF | http://arxiv.org/pdf/1807.02851v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-clustering-and-multi-target |
Repo | |
Framework | |
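A minimal sketch of mean-shift clustering applied to a batch of event coordinates may help make the approach concrete; the paper processes events asynchronously, whereas this toy version shifts a fixed set of (x, y) events toward local density peaks with a flat kernel.

```python
import numpy as np

def mean_shift_events(events, bandwidth=5.0, n_iters=20):
    """Shift each event toward the local density peak of its neighborhood.

    `events` is an (N, 2) array of (x, y) pixel coordinates of recent
    asynchronous events; a flat kernel of radius `bandwidth` is assumed.
    """
    modes = events.astype(float)
    for _ in range(n_iters):
        for i, p in enumerate(modes):
            neighbors = events[np.linalg.norm(events - p, axis=1) < bandwidth]
            if len(neighbors):
                modes[i] = neighbors.mean(axis=0)
    return modes  # events that share a mode belong to the same cluster

events = np.array([[10, 10], [11, 9], [12, 11], [50, 52], [51, 50]], dtype=float)
print(np.round(mean_shift_events(events)))  # two clusters emerge
```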
Humans can decipher adversarial images
Title | Humans can decipher adversarial images |
Authors | Zhenglong Zhou, Chaz Firestone |
Abstract | How similar is the human mind to the sophisticated machine-learning systems that mirror its performance? Models of object categorization based on convolutional neural networks (CNNs) have achieved human-level benchmarks in assigning known labels to novel images. These advances promise to support transformative technologies such as autonomous vehicles and machine diagnosis; beyond this, they also serve as candidate models for the visual system itself – not only in their output but perhaps even in their underlying mechanisms and principles. However, unlike human vision, CNNs can be “fooled” by adversarial examples – carefully crafted images that appear as nonsense patterns to humans but are recognized as familiar objects by machines, or that appear as one object to humans and a different object to machines. This seemingly extreme divergence between human and machine classification challenges the promise of these new advances, both as applied image-recognition systems and also as models of the human mind. Surprisingly, however, little work has empirically investigated human classification of such adversarial stimuli: Does human and machine performance fundamentally diverge? Or could humans decipher such images and predict the machine’s preferred labels? Here, we show that human and machine classification of adversarial stimuli are robustly related: In eight experiments on five prominent and diverse adversarial imagesets, human subjects reliably identified the machine’s chosen label over relevant foils. This pattern persisted for images with strong antecedent identities, and even for images described as “totally unrecognizable to human eyes”. We suggest that human intuition may be a more reliable guide to machine (mis)classification than has typically been imagined, and we explore the consequences of this result for minds and machines alike. |
Tasks | Autonomous Vehicles |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04120v3 |
PDF | http://arxiv.org/pdf/1809.04120v3.pdf |
PWC | https://paperswithcode.com/paper/humans-can-decipher-adversarial-images |
Repo | |
Framework | |
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Title | Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition |
Authors | Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng |
Abstract | In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciations that occur in local code-switching conversations. The learned lexicon yields improved performance. Furthermore, we use semi-supervised training to deal with transcriptions that are highly mismatched between human transcribers and the ASR system: we treat the poorly transcribed data as unsupervised data and find that semi-supervised acoustic modeling leads to improved results. Finally, to compensate for the data sparsity that limits conventional n-gram language models, we perform lattice rescoring with neural network language models and obtain a significant WER reduction. |
Tasks | Speech Recognition |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06200v1 |
PDF | http://arxiv.org/pdf/1806.06200v1.pdf |
PWC | https://paperswithcode.com/paper/study-of-semi-supervised-approaches-to |
Repo | |
Framework | |
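A hedged sketch of the paper's data-splitting idea: utterances whose human transcript disagrees strongly with an ASR hypothesis are treated as unsupervised data. The word-overlap score and field names below are illustrative placeholders, not the paper's lattice-based confidence measure.

```python
def split_by_confidence(utterances, threshold=0.7):
    """Partition utterances by agreement between the human transcript and an
    ASR hypothesis. Poorly matched utterances become unsupervised data,
    mirroring the paper's strategy; the crude word-overlap score below is a
    placeholder for a real confidence measure."""
    supervised, unsupervised = [], []
    for utt in utterances:
        hyp = set(utt["asr_hypothesis"].split())
        ref = set(utt["transcript"].split())
        overlap = len(hyp & ref) / max(len(ref), 1)
        (supervised if overlap >= threshold else unsupervised).append(utt)
    return supervised, unsupervised

# Hypothetical code-switching utterances with assumed field names:
utts = [{"transcript": "我 要 go to the market", "asr_hypothesis": "我 要 go to the market"},
        {"transcript": "turn left 好 吗", "asr_hypothesis": "turn lift 好 吧"}]
sup, unsup = split_by_confidence(utts)  # 1 supervised, 1 unsupervised
```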
Architecture Based Classification of Leaf Images
Title | Architecture Based Classification of Leaf Images |
Authors | Mahmoud Sadeghi, Ali Zakerolhosseini, Ali Sonboli |
Abstract | Plant classification and identification have long been important and difficult tasks. In this paper, an efficient and systematic approach for extracting leaf architecture characters from captured digital images is proposed. The input image is first pre-processed in five steps to prepare it for feature extraction. In the second stage, methods for extracting different architectural features are studied using various mathematical and computational methods. Classification rules for mapping the calculated value of each feature to semantic botanical terms are also proposed. Compared with previous studies, the proposed method combines the features extracted from an image with specific knowledge of leaf architecture from the domain of botany to provide a comprehensive framework for both computer engineers and botanists. Finally, based on the proposed method, experiments on the classification of the ImageCLEF 2012 dataset have been performed with promising results. |
Tasks | |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02121v1 |
PDF | http://arxiv.org/pdf/1801.02121v1.pdf |
PWC | https://paperswithcode.com/paper/architecture-based-classification-of-leaf |
Repo | |
Framework | |
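To illustrate the kind of rule-based mapping the abstract describes, here is a hypothetical example that maps a measured apex angle to a botanical term. The thresholds and the single character shown are illustrative only; the paper derives its rules from leaf-architecture conventions in botany.

```python
def classify_leaf_apex(apex_angle_deg):
    """Map a measured apex angle to a semantic botanical term via simple
    rules. Thresholds here are illustrative, not the paper's."""
    if apex_angle_deg < 45.0:
        return "acuminate"
    elif apex_angle_deg < 90.0:
        return "acute"
    return "obtuse"

print(classify_leaf_apex(60.0))  # "acute"
```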
Convolutional Neural Networks based Intra Prediction for HEVC
Title | Convolutional Neural Networks based Intra Prediction for HEVC |
Authors | Wenxue Cui, Tao Zhang, Shengping Zhang, Feng Jiang, Wangmeng Zuo, Debin Zhao |
Abstract | Traditional intra prediction methods for HEVC rely on using the nearest reference lines for predicting a block, which ignores the much richer context between the current block and its neighboring blocks and therefore causes inaccurate prediction, especially when spatial correlation between the current block and the reference lines is weak. To overcome this problem, in this paper, an intra prediction convolutional neural network (IPCNN) is proposed for intra prediction, which exploits the rich context of the current block and is therefore capable of improving the accuracy of predicting the current block. Meanwhile, the predictions of the three nearest blocks can also be refined. To the best of our knowledge, this is the first paper that directly applies CNNs to intra prediction for HEVC. Experimental results validate the effectiveness of applying CNNs to intra prediction and show significant performance improvements over traditional intra prediction methods. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05734v1 |
PDF | http://arxiv.org/pdf/1808.05734v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-based-intra |
Repo | |
Framework | |
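A toy stand-in for the IPCNN idea, assuming PyTorch: a small CNN receives a context patch in which the block to be predicted is masked out and regresses that block. Layer sizes, patch sizes, and the masking scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class IntraPredCNN(nn.Module):
    """Toy stand-in for the paper's IPCNN: predict an 8x8 block from a
    16x16 context patch whose bottom-right quadrant (the current block)
    is zeroed out. Layer sizes are illustrative."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, context):        # context: (B, 1, 16, 16)
        full = self.net(context)       # reconstruct the full patch
        return full[:, :, 8:, 8:]      # keep only the predicted block

ctx = torch.rand(2, 1, 16, 16)
ctx[:, :, 8:, 8:] = 0.0                # mask the block to be predicted
pred = IntraPredCNN()(ctx)             # (2, 1, 8, 8)
```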
Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample Images
Title | Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample Images |
Authors | Chieh-Chi Kao, Yuxiang Wang, Jonathan Waltman, Pradeep Sen |
Abstract | Image hallucination and super-resolution have been studied for decades, and many approaches have been proposed to upsample low-resolution images using information from the images themselves, multiple example images, or large image databases. However, most of this work has focused exclusively on small magnification levels because the algorithms simply sharpen the blurry edges in the upsampled images - no actual new detail is typically reconstructed in the final result. In this paper, we present a patch-based algorithm for image hallucination which, for the first time, properly synthesizes novel high frequency detail. To do this, we pose the synthesis problem as a patch-based optimization which inserts coherent, high-frequency detail from contextually-similar images of the same physical scene/subject provided from either a personal image collection or a large online database. The resulting image is visually plausible and contains coherent high frequency information. We demonstrate the robustness of our algorithm by testing it on a large number of images and show that its performance is considerably superior to all state-of-the-art approaches, a result that is verified to be statistically significant through a randomized user study. |
Tasks | Super-Resolution |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00874v1 |
PDF | http://arxiv.org/pdf/1806.00874v1.pdf |
PWC | https://paperswithcode.com/paper/patch-based-image-hallucination-for-super |
Repo | |
Framework | |
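The core of a patch-based synthesis step can be sketched as a brute-force nearest-patch search; the paper's optimization additionally enforces coherence between neighboring patches and draws patches from contextually similar images, which is omitted here.

```python
import numpy as np

def best_matching_patch(query, exemplar, size=8):
    """Brute-force search for the exemplar patch closest to `query` in L2.

    A stand-in for one step of the paper's patch-based optimization, which
    additionally enforces coherence between neighboring patches."""
    h, w = exemplar.shape
    best, best_err = None, np.inf
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            cand = exemplar[y:y + size, x:x + size]
            err = np.sum((cand - query) ** 2)
            if err < best_err:
                best, best_err = cand, err
    return best

rng = np.random.default_rng(0)
exemplar = rng.random((32, 32))                           # a "similar sample image"
query = exemplar[5:13, 7:15] + 0.01 * rng.random((8, 8))  # noisy query patch
patch = best_matching_patch(query, exemplar)              # recovers the source region
```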
On Enhancing Speech Emotion Recognition using Generative Adversarial Networks
Title | On Enhancing Speech Emotion Recognition using Generative Adversarial Networks |
Authors | Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson |
Abstract | Generative Adversarial Networks (GANs) have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. GANs consist of a discriminator and a generator working in tandem, playing a min-max game to learn a target underlying data distribution when fed with data points sampled from a simpler distribution (such as a uniform or Gaussian distribution). Once trained, they allow synthetic generation of examples sampled from the target distribution. We investigate the application of GANs to generate synthetic feature vectors used for speech emotion recognition. Specifically, we investigate two setups: (i) a vanilla GAN that learns the distribution of a lower-dimensional representation of the actual higher-dimensional feature vector, and (ii) a conditional GAN that learns the distribution of the higher-dimensional feature vectors conditioned on the labels or the emotional class to which they belong. As a potential practical application of these synthetically generated samples, we measure the improvement in a classifier's performance when the synthetic data is used along with real data for training. We perform cross-validation analyses followed by a cross-corpus study. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06626v1 |
PDF | http://arxiv.org/pdf/1806.06626v1.pdf |
PWC | https://paperswithcode.com/paper/on-enhancing-speech-emotion-recognition-using |
Repo | |
Framework | |
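Setup (ii) can be sketched as a conditional generator that consumes noise plus a one-hot emotion label and emits a synthetic feature vector. The dimensions and architecture below are assumptions, not the paper's; only the conditioning pattern is the point.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Minimal conditional generator: noise plus a one-hot emotion label in,
    synthetic feature vector out. Sizes are illustrative, not the paper's
    actual feature dimensions."""

    def __init__(self, noise_dim=32, n_classes=4, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, z, labels_onehot):
        return self.net(torch.cat([z, labels_onehot], dim=1))

z = torch.randn(8, 32)
y = torch.eye(4)[torch.randint(0, 4, (8,))]  # random one-hot emotion labels
fake_feats = CondGenerator()(z, y)           # (8, 128) synthetic features
```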
Learning to Explore with Meta-Policy Gradient
Title | Learning to Explore with Meta-Policy Gradient |
Authors | Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng |
Abstract | The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore \emph{local} regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient algorithm that allows us to adaptively learn the exploration policy in DDPG. Our algorithm allows us to train flexible exploration behaviors that are independent of the actor policy, yielding a \emph{global exploration} that significantly speeds up the learning process. With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks. |
Tasks | Q-Learning |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.05044v2 |
PDF | http://arxiv.org/pdf/1803.05044v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-explore-with-meta-policy-gradient |
Repo | |
Framework | |
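A heavily simplified, self-contained rendition of the meta-policy gradient loop on a 1-D toy task: the exploration policy is trained with REINFORCE using the actor's improvement as the meta-reward. The actor update and the bandit task are stand-ins for DDPG and a real environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_reward(a):              # 1-D continuous bandit, optimum at a = 3
    return -(a - 3.0) ** 2

actor = 0.0                      # deterministic "actor policy" (a scalar action)
mu, log_sigma = 0.0, 0.0         # parameters of the exploration policy
meta_lr = 0.02

for step in range(300):
    before = task_reward(actor)
    # exploration policy proposes actions independently of the actor
    actions = mu + np.exp(log_sigma) * rng.standard_normal(16)
    # stand-in for the off-policy update: move toward the best explored action
    actor += 0.2 * (actions[np.argmax(task_reward(actions))] - actor)
    meta_reward = task_reward(actor) - before    # the actor's improvement
    # REINFORCE update of the exploration parameters with the meta-reward
    sigma = np.exp(log_sigma)
    grad_mu = np.mean((actions - mu) / sigma ** 2) * meta_reward
    grad_ls = np.mean((actions - mu) ** 2 / sigma ** 2 - 1.0) * meta_reward
    mu += meta_lr * grad_mu
    log_sigma += meta_lr * grad_ls

print(round(actor, 2), round(task_reward(actor), 2))  # return improves over meta-iterations
```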
Abstractive Text Classification Using Sequence-to-convolution Neural Networks
Title | Abstractive Text Classification Using Sequence-to-convolution Neural Networks |
Authors | Taehoon Kim, Jihoon Yang |
Abstract | We propose a new deep neural network model and its training scheme for text classification. Our model, Sequence-to-convolution Neural Networks (Seq2CNN), consists of two blocks: a Sequential Block that summarizes input texts and a Convolution Block that receives the summary and classifies it to a label. Seq2CNN is trained end-to-end to classify variable-length texts without preprocessing inputs into a fixed length. We also present the Gradual Weight Shift (GWS) method, which stabilizes training; GWS is applied to our model's loss function. We compared our model with word-based TextCNN trained with different data preprocessing methods and obtained a significant improvement in classification accuracy over word-based TextCNN without any ensemble or data augmentation. |
Tasks | Data Augmentation, Text Classification |
Published | 2018-05-20 |
URL | https://arxiv.org/abs/1805.07745v5 |
PDF | https://arxiv.org/pdf/1805.07745v5.pdf |
PWC | https://paperswithcode.com/paper/abstractive-text-classification-using |
Repo | |
Framework | |
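A sketch of the two-block structure, assuming PyTorch: an LSTM summarizes a variable-length token sequence, and a 1-D convolution block classifies the summary. Using the LSTM hidden states as the summary, and all sizes, are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class Seq2CNN(nn.Module):
    """Sketch of the Seq2CNN idea: a Sequential Block (LSTM) summarizes the
    input, a Convolution Block classifies the summary."""

    def __init__(self, vocab=10000, emb=64, hidden=64, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.summarizer = nn.LSTM(emb, hidden, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv1d(hidden, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, tokens):                        # (B, T) int64, any T
        seq, _ = self.summarizer(self.embed(tokens))  # (B, T, hidden)
        feat = self.conv(seq.transpose(1, 2)).squeeze(-1)
        return self.fc(feat)

logits = Seq2CNN()(torch.randint(0, 10000, (2, 37)))  # works for any length
```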
Automatic Dataset Annotation to Learn CNN Pore Description for Fingerprint Recognition
Title | Automatic Dataset Annotation to Learn CNN Pore Description for Fingerprint Recognition |
Authors | Gabriel Dahia, Maurício Pamplona Segundo |
Abstract | High-resolution fingerprint recognition often relies on sophisticated matching algorithms based on hand-crafted keypoint descriptors, with pores being the most common keypoint choice. Our method is the opposite of the prevalent approach: we use instead a simple matching algorithm based on robust local pore descriptors that are learned from the data using a CNN. In order to train this CNN in a fully supervised manner, we describe how the automatic alignment of fingerprint images can be used to obtain the required training annotations, which are otherwise missing in all publicly available datasets. This improves the state-of-the-art recognition results for both partial and full fingerprints in a public benchmark. To confirm that the observed improvement is due to the adoption of learned descriptors, we conduct an ablation study using the most successful pore descriptors previously used in the literature. All our code is available at https://github.com/gdahia/high-res-fingerprint-recognition |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10229v4 |
PDF | http://arxiv.org/pdf/1809.10229v4.pdf |
PWC | https://paperswithcode.com/paper/automatic-dataset-annotation-to-learn-cnn |
Repo | |
Framework | |
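For context, descriptor matching of the kind such pipelines rely on can be sketched with a Lowe-style ratio test; this is a generic matcher, not the paper's exact rule, which is available in the linked repository.

```python
import numpy as np

def match_pore_descriptors(desc_a, desc_b, ratio=0.8):
    """Match pore descriptors between two fingerprints with a ratio test:
    keep a match only if the nearest neighbor is clearly closer than the
    second nearest. A generic sketch, not the paper's matching rule."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

a = np.random.default_rng(0).normal(size=(20, 32))  # 20 descriptors, dim 32
b = np.random.default_rng(1).normal(size=(25, 32))
print(len(match_pore_descriptors(a, b)))
```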
On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks
Title | On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks |
Authors | Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter |
Abstract | Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received considerable attention from the research community. For example, many neural network-based architectures have been proposed recently, pushing performance to a new level. However, the applicability of such neural SER models, trained only on in-domain data, to noisy conditions is currently under-researched. In this work, we evaluate the robustness of state-of-the-art neural acoustic emotion recognition models in human-robot interaction scenarios. We hypothesize that a robot's ego noise, room conditions, and the various acoustic events that can occur in a home environment can significantly affect the performance of a model. We conduct several experiments on the iCub robot platform and propose several novel ways to reduce the gap between the model's performance during training and testing in real-world conditions. We observe large improvements in model performance on the robot and demonstrate the necessity of data augmentation techniques such as overlaying background noise and varying loudness to improve the robustness of neural approaches. |
Tasks | Data Augmentation, Emotion Recognition, Speech Emotion Recognition |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02173v1 |
PDF | http://arxiv.org/pdf/1804.02173v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-robustness-of-speech-emotion |
Repo | |
Framework | |
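Two of the augmentations named in the abstract, noise overlay at a target SNR and random loudness variation, can be sketched as follows; the signal parameters are illustrative and both inputs are assumed to share one sampling rate.

```python
import numpy as np

def augment_waveform(speech, noise, snr_db=10.0, gain_db=(-6.0, 6.0),
                     rng=np.random.default_rng()):
    """Overlay background noise at a target SNR, then apply a random gain
    (loudness variation). Both inputs are assumed to be float arrays
    sampled at the same rate."""
    noise = np.resize(noise, speech.shape)          # loop or crop the noise
    scale = np.sqrt(np.mean(speech ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10) + 1e-12))
    mixed = speech + scale * noise
    return mixed * 10 ** (rng.uniform(*gain_db) / 20)

sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)       # 1 s dummy "speech"
noise = np.random.default_rng(0).normal(0.0, 0.1, sr // 2)  # short noise clip
augmented = augment_waveform(speech, noise, snr_db=5.0)
```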
Speech Emotion Recognition Considering Local Dynamic Features
Title | Speech Emotion Recognition Considering Local Dynamic Features |
Authors | Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu |
Abstract | Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, reflected through dynamic durations, energies, and other prosodic information as one speaks. In this paper, a novel local dynamic pitch probability distribution feature, obtained by computing a histogram of pitch values, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than traditional global features. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07738v1 |
PDF | http://arxiv.org/pdf/1803.07738v1.pdf |
PWC | https://paperswithcode.com/paper/speech-emotion-recognition-considering-local |
Repo | |
Framework | |
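A simplified version of the pitch probability distribution feature: a normalized histogram over the voiced F0 values of a local window. The bin count, range, and unvoiced-frame handling below are assumptions.

```python
import numpy as np

def pitch_histogram_feature(f0_track, n_bins=12, f0_range=(50.0, 500.0)):
    """Build a normalized histogram over a window of F0 values, a simplified
    take on the paper's local pitch probability distribution feature.
    Unvoiced frames (F0 == 0) are dropped."""
    voiced = f0_track[f0_track > 0]
    hist, _ = np.histogram(voiced, bins=n_bins, range=f0_range)
    total = hist.sum()
    return hist / total if total else hist.astype(float)

f0 = np.array([0, 180, 185, 190, 0, 220, 230, 0, 210])  # toy F0 contour (Hz)
print(pitch_histogram_feature(f0, n_bins=6))
```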
Cross-lingual and Multilingual Speech Emotion Recognition on English and French
Title | Cross-lingual and Multilingual Speech Emotion Recognition on English and French |
Authors | Michael Neumann, Ngoc Thang Vu |
Abstract | Research on multilingual speech emotion recognition faces the problem that most available speech corpora differ from each other in important ways, such as annotation methods or interaction scenarios. These inconsistencies complicate building a multilingual system. We present results for cross-lingual and multilingual emotion recognition on English and French speech data with similar characteristics in terms of interaction (human-human conversations). Further, we explore the possibility of fine-tuning a pre-trained cross-lingual model with only a small number of samples from the target language, which is of great interest for low-resource languages. To gain more insight into what is learned by the deployed convolutional neural network, we analyze the attention mechanism inside the network. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00357v1 |
PDF | http://arxiv.org/pdf/1803.00357v1.pdf |
PWC | https://paperswithcode.com/paper/cross-lingual-and-multilingual-speech-emotion |
Repo | |
Framework | |
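The few-sample fine-tuning setup can be sketched as follows, assuming PyTorch; freezing everything but the final classifier layer is an assumption made here for illustration, and the stand-in model replaces the paper's pre-trained CNN.

```python
import torch
import torch.nn as nn

def finetune_few_samples(model, target_batch, labels, lr=1e-4, steps=20):
    """Adapt a pre-trained cross-lingual SER model with a handful of
    target-language samples by updating only the final layer."""
    for p in model.parameters():
        p.requires_grad = False
    head = list(model.children())[-1]   # assume the last module is the classifier
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(target_batch), labels)
        loss.backward()
        opt.step()
    return model

# Usage with a stand-in model and 8 target-language samples of 128-dim features:
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
x, y = torch.randn(8, 128), torch.randint(0, 4, (8,))
finetune_few_samples(model, x, y)
```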
Skin lesion segmentation using U-Net and good training strategies
Title | Skin lesion segmentation using U-Net and good training strategies |
Authors | Fred Guth, Teofilo E. deCampos |
Abstract | In this paper we approach the problem of skin lesion segmentation using a convolutional neural network based on the U-Net architecture. We present a set of training strategies that had a significant impact on the performance of this model. We evaluated our method on the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection, obtaining a threshold Jaccard index of 77.5%. |
Tasks | Lesion Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11314v1 |
PDF | http://arxiv.org/pdf/1811.11314v1.pdf |
PWC | https://paperswithcode.com/paper/skin-lesion-segmentation-using-u-net-and-good |
Repo | |
Framework | |
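A one-level U-Net sketch in PyTorch shows the encoder-decoder-with-skip pattern underlying the model; real U-Nets are deeper, and the paper's training strategies (its actual contribution) are omitted here.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level U-Net sketch for binary lesion masks: a single down/up
    stage with a skip connection. Depth and channel counts are far smaller
    than in a real U-Net."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1))  # logits for the mask

    def forward(self, x):
        e = self.enc(x)                         # (B, 16, H, W)
        m = self.mid(self.down(e))              # (B, 32, H/2, W/2)
        u = self.up(m)                          # (B, 16, H, W)
        return self.dec(torch.cat([e, u], 1))   # skip connection, then decode

mask_logits = TinyUNet()(torch.rand(1, 3, 64, 64))  # (1, 1, 64, 64)
```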