Paper Group ANR 997
Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition
Title | Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition |
Authors | Che-Wei Huang, Shrikanth S. Narayanan |
Abstract | Regularization is crucial to the success of many practical deep learning models, particularly in the common scenario where only a few to a moderate number of training samples are accessible. In addition to weight decay, data augmentation and dropout, regularization based on multi-branch architectures, such as Shake-Shake regularization, has proven successful in many applications and has attracted increasing attention. However, beyond model-based representation augmentation, it is unclear how Shake-Shake regularization provides further improvement on classification tasks, let alone how it interacts with batch normalization. In this work, we present our investigation of Shake-Shake regularization, drawing connections to the vicinal risk minimization principle and to discriminative feature learning in verification tasks. Furthermore, we identify a strong resemblance between batch normalized residual blocks and batch normalized recurrent neural networks: both share a similar convergence behavior, which can be mitigated by a proper initialization of batch normalization. Based on these findings, our experiments on speech emotion recognition demonstrate a statistically significant improvement in classification accuracy together with a reduction in the generalization gap. |
Tasks | Data Augmentation, Emotion Recognition, Speech Emotion Recognition |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00876v2 |
PDF | http://arxiv.org/pdf/1808.00876v2.pdf |
PWC | https://paperswithcode.com/paper/normalization-before-shaking-toward-learning |
Repo | |
Framework | |
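As a concrete reference for the mechanism discussed in the abstract, here is a minimal sketch of the Shake-Shake combination rule: two residual-branch outputs are blended with a random convex weight during training and with the 0.5 expectation at test time. The branch outputs below are stand-ins, and the paper's batch-normalization initialization analysis is not reproduced here.

```python
import numpy as np

def shake_shake(branch1_out, branch2_out, training=True,
                rng=np.random.default_rng()):
    """Blend two residual-branch outputs with a random convex combination.

    At training time a fresh alpha ~ U(0, 1) is drawn per forward pass;
    at test time the expectation alpha = 0.5 is used, as in the original
    Shake-Shake formulation.
    """
    alpha = rng.uniform(0.0, 1.0) if training else 0.5
    return alpha * branch1_out + (1.0 - alpha) * branch2_out

# Hypothetical residual block: x + shake(branch1(x), branch2(x)).
x = np.ones((4, 16))                # batch of 4 feature vectors
b1, b2 = x * 0.1, x * -0.2          # stand-ins for two conv branches
y = x + shake_shake(b1, b2, training=True)
```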
Real-time clustering and multi-target tracking using event-based sensors
Title | Real-time clustering and multi-target tracking using event-based sensors |
Authors | Francisco Barranco, Cornelia Fermuller, Eduardo Ros |
Abstract | Clustering is crucial for many computer vision applications such as robust tracking, object detection and segmentation. This work presents a real-time clustering technique that takes advantage of the unique properties of event-based vision sensors. Since event-based sensors trigger events only when the intensity changes, the data is sparse, with low redundancy. Thus, our approach redefines the well-known mean-shift clustering method using asynchronous events instead of conventional frames. The potential of our approach is demonstrated in a multi-target tracking application using Kalman filters to smooth the trajectories. We evaluated our method on an existing dataset with patterns of different shapes and speeds, and on a new dataset that we collected, in which the sensor was attached to a Baxter robot in an eye-in-hand setup monitoring real-world objects during a manipulation task. Clustering achieved an F-measure of 0.95 while reducing the computational cost by 88% compared to the frame-based method. The average tracking error was 2.5 pixels, and the clustering yielded a consistent number of clusters over time. |
Tasks | Event-based vision, Object Detection |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02851v1 |
PDF | http://arxiv.org/pdf/1807.02851v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-clustering-and-multi-target |
Repo | |
Framework | |
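A minimal sketch of mean-shift clustering applied to a batch of event coordinates may help make the approach concrete; the paper processes events asynchronously, whereas this toy version shifts a fixed set of (x, y) events toward local density peaks with a flat kernel.

```python
import numpy as np

def mean_shift_events(events, bandwidth=5.0, n_iters=20):
    """Shift each event toward the local density peak of its neighborhood.

    `events` is an (N, 2) array of (x, y) pixel coordinates of recent
    asynchronous events; a flat kernel of radius `bandwidth` is assumed.
    """
    modes = events.astype(float)
    for _ in range(n_iters):
        for i, p in enumerate(modes):
            neighbors = events[np.linalg.norm(events - p, axis=1) < bandwidth]
            if len(neighbors):
                modes[i] = neighbors.mean(axis=0)
    return modes  # events that share a mode belong to the same cluster

events = np.array([[10, 10], [11, 9], [12, 11], [50, 52], [51, 50]], dtype=float)
print(np.round(mean_shift_events(events)))  # two clusters emerge
```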
Humans can decipher adversarial images
Title | Humans can decipher adversarial images |
Authors | Zhenglong Zhou, Chaz Firestone |
Abstract | How similar is the human mind to the sophisticated machine-learning systems that mirror its performance? Models of object categorization based on convolutional neural networks (CNNs) have achieved human-level benchmarks in assigning known labels to novel images. These advances promise to support transformative technologies such as autonomous vehicles and machine diagnosis; beyond this, they also serve as candidate models for the visual system itself – not only in their output but perhaps even in their underlying mechanisms and principles. However, unlike human vision, CNNs can be “fooled” by adversarial examples – carefully crafted images that appear as nonsense patterns to humans but are recognized as familiar objects by machines, or that appear as one object to humans and a different object to machines. This seemingly extreme divergence between human and machine classification challenges the promise of these new advances, both as applied image-recognition systems and also as models of the human mind. Surprisingly, however, little work has empirically investigated human classification of such adversarial stimuli: Does human and machine performance fundamentally diverge? Or could humans decipher such images and predict the machine’s preferred labels? Here, we show that human and machine classification of adversarial stimuli are robustly related: In eight experiments on five prominent and diverse adversarial imagesets, human subjects reliably identified the machine’s chosen label over relevant foils. This pattern persisted for images with strong antecedent identities, and even for images described as “totally unrecognizable to human eyes”. We suggest that human intuition may be a more reliable guide to machine (mis)classification than has typically been imagined, and we explore the consequences of this result for minds and machines alike. |
Tasks | Autonomous Vehicles |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04120v3 |
PDF | http://arxiv.org/pdf/1809.04120v3.pdf |
PWC | https://paperswithcode.com/paper/humans-can-decipher-adversarial-images |
Repo | |
Framework | |
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Title | Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition |
Authors | Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng |
Abstract | In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciations that occur in local code-switching conversations. The learned lexicon yields improved performance. Furthermore, we use semi-supervised training to deal with transcriptions that are highly mismatched between human transcribers and the ASR system: we treat the poorly transcribed data as unsupervised data and find that semi-supervised acoustic modeling leads to improved results. Finally, to compensate for the data sparsity that limits conventional n-gram language models, we perform lattice rescoring with neural network language models and obtain a significant WER reduction. |
Tasks | Speech Recognition |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06200v1 |
PDF | http://arxiv.org/pdf/1806.06200v1.pdf |
PWC | https://paperswithcode.com/paper/study-of-semi-supervised-approaches-to |
Repo | |
Framework | |
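A hedged sketch of the paper's data-splitting idea: utterances whose human transcript disagrees strongly with an ASR hypothesis are treated as unsupervised data. The word-overlap score and field names below are illustrative placeholders, not the paper's lattice-based confidence measure.

```python
def split_by_confidence(utterances, threshold=0.7):
    """Partition utterances by agreement between the human transcript and an
    ASR hypothesis. Poorly matched utterances become unsupervised data,
    mirroring the paper's strategy; the crude word-overlap score below is a
    placeholder for a real confidence measure."""
    supervised, unsupervised = [], []
    for utt in utterances:
        hyp = set(utt["asr_hypothesis"].split())
        ref = set(utt["transcript"].split())
        overlap = len(hyp & ref) / max(len(ref), 1)
        (supervised if overlap >= threshold else unsupervised).append(utt)
    return supervised, unsupervised

# Hypothetical code-switching utterances with assumed field names:
utts = [{"transcript": "我 要 go to the market", "asr_hypothesis": "我 要 go to the market"},
        {"transcript": "turn left 好 吗", "asr_hypothesis": "turn lift 好 吧"}]
sup, unsup = split_by_confidence(utts)  # 1 supervised, 1 unsupervised
```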
Architecture Based Classification of Leaf Images
Title | Architecture Based Classification of Leaf Images |
Authors | Mahmoud Sadeghi, Ali Zakerolhosseini, Ali Sonboli |
Abstract | Plant classification and identification have long been important and difficult tasks. In this paper, an efficient and systematic approach for extracting leaf architecture characters from captured digital images is proposed. The input image is first pre-processed in five steps to prepare it for feature extraction. In the second stage, methods for extracting different architectural features are studied using various mathematical and computational methods. Classification rules for mapping the calculated value of each feature to semantic botanical terms are also proposed. Compared with previous studies, the proposed method combines the features extracted from an image with specific knowledge of leaf architecture from the domain of botany to provide a comprehensive framework for both computer engineers and botanists. Finally, based on the proposed method, experiments on the classification of the ImageCLEF 2012 dataset have been performed with promising results. |
Tasks | |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02121v1 |
PDF | http://arxiv.org/pdf/1801.02121v1.pdf |
PWC | https://paperswithcode.com/paper/architecture-based-classification-of-leaf |
Repo | |
Framework | |
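To illustrate the kind of rule-based mapping the abstract describes, here is a hypothetical example that maps a measured apex angle to a botanical term. The thresholds and the single character shown are illustrative only; the paper derives its rules from leaf-architecture conventions in botany.

```python
def classify_leaf_apex(apex_angle_deg):
    """Map a measured apex angle to a semantic botanical term via simple
    rules. Thresholds here are illustrative, not the paper's."""
    if apex_angle_deg < 45.0:
        return "acuminate"
    elif apex_angle_deg < 90.0:
        return "acute"
    return "obtuse"

print(classify_leaf_apex(60.0))  # "acute"
```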
Convolutional Neural Networks based Intra Prediction for HEVC
Title | Convolutional Neural Networks based Intra Prediction for HEVC |
Authors | Wenxue Cui, Tao Zhang, Shengping Zhang, Feng Jiang, Wangmeng Zuo, Debin Zhao |
Abstract | Traditional intra prediction methods for HEVC rely on using the nearest reference lines for predicting a block, which ignores the much richer context between the current block and its neighboring blocks and therefore causes inaccurate prediction, especially when spatial correlation between the current block and the reference lines is weak. To overcome this problem, in this paper, an intra prediction convolutional neural network (IPCNN) is proposed for intra prediction, which exploits the rich context of the current block and is therefore capable of improving the accuracy of predicting the current block. Meanwhile, the predictions of the three nearest blocks can also be refined. To the best of our knowledge, this is the first paper that directly applies CNNs to intra prediction for HEVC. Experimental results validate the effectiveness of applying CNNs to intra prediction and show significant performance improvements over traditional intra prediction methods. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05734v1 |
PDF | http://arxiv.org/pdf/1808.05734v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-based-intra |
Repo | |
Framework | |
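A toy stand-in for the IPCNN idea, assuming PyTorch: a small CNN receives a context patch in which the block to be predicted is masked out and regresses that block. Layer sizes, patch sizes, and the masking scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class IntraPredCNN(nn.Module):
    """Toy stand-in for the paper's IPCNN: predict an 8x8 block from a
    16x16 context patch whose bottom-right quadrant (the current block)
    is zeroed out. Layer sizes are illustrative."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, context):        # context: (B, 1, 16, 16)
        full = self.net(context)       # reconstruct the full patch
        return full[:, :, 8:, 8:]      # keep only the predicted block

ctx = torch.rand(2, 1, 16, 16)
ctx[:, :, 8:, 8:] = 0.0                # mask the block to be predicted
pred = IntraPredCNN()(ctx)             # (2, 1, 8, 8)
```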
Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample Images
Title | Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample Images |
Authors | Chieh-Chi Kao, Yuxiang Wang, Jonathan Waltman, Pradeep Sen |
Abstract | Image hallucination and super-resolution have been studied for decades, and many approaches have been proposed to upsample low-resolution images using information from the images themselves, multiple example images, or large image databases. However, most of this work has focused exclusively on small magnification levels because the algorithms simply sharpen the blurry edges in the upsampled images - no actual new detail is typically reconstructed in the final result. In this paper, we present a patch-based algorithm for image hallucination which, for the first time, properly synthesizes novel high frequency detail. To do this, we pose the synthesis problem as a patch-based optimization which inserts coherent, high-frequency detail from contextually-similar images of the same physical scene/subject provided from either a personal image collection or a large online database. The resulting image is visually plausible and contains coherent high frequency information. We demonstrate the robustness of our algorithm by testing it on a large number of images and show that its performance is considerably superior to all state-of-the-art approaches, a result that is verified to be statistically significant through a randomized user study. |
Tasks | Super-Resolution |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00874v1 |
PDF | http://arxiv.org/pdf/1806.00874v1.pdf |
PWC | https://paperswithcode.com/paper/patch-based-image-hallucination-for-super |
Repo | |
Framework | |
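The core of a patch-based synthesis step can be sketched as a brute-force nearest-patch search; the paper's optimization additionally enforces coherence between neighboring patches and draws patches from contextually similar images, which is omitted here.

```python
import numpy as np

def best_matching_patch(query, exemplar, size=8):
    """Brute-force search for the exemplar patch closest to `query` in L2.

    A stand-in for one step of the paper's patch-based optimization, which
    additionally enforces coherence between neighboring patches."""
    h, w = exemplar.shape
    best, best_err = None, np.inf
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            cand = exemplar[y:y + size, x:x + size]
            err = np.sum((cand - query) ** 2)
            if err < best_err:
                best, best_err = cand, err
    return best

rng = np.random.default_rng(0)
exemplar = rng.random((32, 32))                           # a "similar sample image"
query = exemplar[5:13, 7:15] + 0.01 * rng.random((8, 8))  # noisy query patch
patch = best_matching_patch(query, exemplar)              # recovers the source region
```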
On Enhancing Speech Emotion Recognition using Generative Adversarial Networks
Title | On Enhancing Speech Emotion Recognition using Generative Adversarial Networks |
Authors | Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson |
Abstract | Generative Adversarial Networks (GANs) have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. GANs consist of a discriminator and a generator working in tandem, playing a min-max game to learn a target underlying data distribution when fed with data points sampled from a simpler distribution (such as a uniform or Gaussian distribution). Once trained, they allow synthetic generation of examples sampled from the target distribution. We investigate the application of GANs to generate synthetic feature vectors used for speech emotion recognition. Specifically, we investigate two setups: (i) a vanilla GAN that learns the distribution of a lower-dimensional representation of the actual higher-dimensional feature vector, and (ii) a conditional GAN that learns the distribution of the higher-dimensional feature vectors conditioned on the labels or the emotional class to which they belong. As a potential practical application of these synthetically generated samples, we measure the improvement in a classifier's performance when the synthetic data is used along with real data for training. We perform cross-validation analyses followed by a cross-corpus study. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06626v1 |
PDF | http://arxiv.org/pdf/1806.06626v1.pdf |
PWC | https://paperswithcode.com/paper/on-enhancing-speech-emotion-recognition-using |
Repo | |
Framework | |
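Setup (ii) can be sketched as a conditional generator that consumes noise plus a one-hot emotion label and emits a synthetic feature vector. The dimensions and architecture below are assumptions, not the paper's; only the conditioning pattern is the point.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Minimal conditional generator: noise plus a one-hot emotion label in,
    synthetic feature vector out. Sizes are illustrative, not the paper's
    actual feature dimensions."""

    def __init__(self, noise_dim=32, n_classes=4, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, z, labels_onehot):
        return self.net(torch.cat([z, labels_onehot], dim=1))

z = torch.randn(8, 32)
y = torch.eye(4)[torch.randint(0, 4, (8,))]  # random one-hot emotion labels
fake_feats = CondGenerator()(z, y)           # (8, 128) synthetic features
```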
Learning to Explore with Meta-Policy Gradient
Title | Learning to Explore with Meta-Policy Gradient |
Authors | Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng |
Abstract | The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore \emph{local} regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient algorithm that allows us to adaptively learn the exploration policy in DDPG. Our algorithm allows us to train flexible exploration behaviors that are independent of the actor policy, yielding a \emph{global exploration} that significantly speeds up the learning process. With an extensive study, we show that our method significantly improves the sample-efficiency of DDPG on a variety of reinforcement learning tasks. |
Tasks | Q-Learning |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.05044v2 |
PDF | http://arxiv.org/pdf/1803.05044v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-explore-with-meta-policy-gradient |
Repo | |
Framework | |
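A heavily simplified, self-contained rendition of the meta-policy gradient loop on a 1-D toy task: the exploration policy is trained with REINFORCE using the actor's improvement as the meta-reward. The actor update and the bandit task are stand-ins for DDPG and a real environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_reward(a):              # 1-D continuous bandit, optimum at a = 3
    return -(a - 3.0) ** 2

actor = 0.0                      # deterministic "actor policy" (a scalar action)
mu, log_sigma = 0.0, 0.0         # parameters of the exploration policy
meta_lr = 0.02

for step in range(300):
    before = task_reward(actor)
    # exploration policy proposes actions independently of the actor
    actions = mu + np.exp(log_sigma) * rng.standard_normal(16)
    # stand-in for the off-policy update: move toward the best explored action
    actor += 0.2 * (actions[np.argmax(task_reward(actions))] - actor)
    meta_reward = task_reward(actor) - before    # the actor's improvement
    # REINFORCE update of the exploration parameters with the meta-reward
    sigma = np.exp(log_sigma)
    grad_mu = np.mean((actions - mu) / sigma ** 2) * meta_reward
    grad_ls = np.mean((actions - mu) ** 2 / sigma ** 2 - 1.0) * meta_reward
    mu += meta_lr * grad_mu
    log_sigma += meta_lr * grad_ls

print(round(actor, 2), round(task_reward(actor), 2))  # return improves over meta-iterations
```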
Abstractive Text Classification Using Sequence-to-convolution Neural Networks
Title | Abstractive Text Classification Using Sequence-to-convolution Neural Networks |
Authors | Taehoon Kim, Jihoon Yang |
Abstract | We propose a new deep neural network model and its training scheme for text classification. Our model, Sequence-to-convolution Neural Networks (Seq2CNN), consists of two blocks: a Sequential Block that summarizes input texts and a Convolution Block that receives the summary and classifies it to a label. Seq2CNN is trained end-to-end to classify variable-length texts without preprocessing inputs into a fixed length. We also present the Gradual Weight Shift (GWS) method, which stabilizes training; GWS is applied to our model's loss function. We compared our model with word-based TextCNN trained with different data preprocessing methods and obtained a significant improvement in classification accuracy over word-based TextCNN without any ensemble or data augmentation. |
Tasks | Data Augmentation, Text Classification |
Published | 2018-05-20 |
URL | https://arxiv.org/abs/1805.07745v5 |
PDF | https://arxiv.org/pdf/1805.07745v5.pdf |
PWC | https://paperswithcode.com/paper/abstractive-text-classification-using |
Repo | |
Framework | |
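A sketch of the two-block structure, assuming PyTorch: an LSTM summarizes a variable-length token sequence, and a 1-D convolution block classifies the summary. Using the LSTM hidden states as the summary, and all sizes, are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class Seq2CNN(nn.Module):
    """Sketch of the Seq2CNN idea: a Sequential Block (LSTM) summarizes the
    input, a Convolution Block classifies the summary."""

    def __init__(self, vocab=10000, emb=64, hidden=64, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.summarizer = nn.LSTM(emb, hidden, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv1d(hidden, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, tokens):                        # (B, T) int64, any T
        seq, _ = self.summarizer(self.embed(tokens))  # (B, T, hidden)
        feat = self.conv(seq.transpose(1, 2)).squeeze(-1)
        return self.fc(feat)

logits = Seq2CNN()(torch.randint(0, 10000, (2, 37)))  # works for any length
```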
Automatic Dataset Annotation to Learn CNN Pore Description for Fingerprint Recognition
Title | Automatic Dataset Annotation to Learn CNN Pore Description for Fingerprint Recognition |
Authors | Gabriel Dahia, Maurício Pamplona Segundo |
Abstract | High-resolution fingerprint recognition often relies on sophisticated matching algorithms based on hand-crafted keypoint descriptors, with pores being the most common keypoint choice. Our method is the opposite of the prevalent approach: we use instead a simple matching algorithm based on robust local pore descriptors that are learned from the data using a CNN. In order to train this CNN in a fully supervised manner, we describe how the automatic alignment of fingerprint images can be used to obtain the required training annotations, which are otherwise missing in all publicly available datasets. This improves the state-of-the-art recognition results for both partial and full fingerprints in a public benchmark. To confirm that the observed improvement is due to the adoption of learned descriptors, we conduct an ablation study using the most successful pore descriptors previously used in the literature. All our code is available at https://github.com/gdahia/high-res-fingerprint-recognition |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10229v4 |
PDF | http://arxiv.org/pdf/1809.10229v4.pdf |
PWC | https://paperswithcode.com/paper/automatic-dataset-annotation-to-learn-cnn |
Repo | |
Framework | |
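For context, descriptor matching of the kind such pipelines rely on can be sketched with a Lowe-style ratio test; this is a generic matcher, not the paper's exact rule, which is available in the linked repository.

```python
import numpy as np

def match_pore_descriptors(desc_a, desc_b, ratio=0.8):
    """Match pore descriptors between two fingerprints with a ratio test:
    keep a match only if the nearest neighbor is clearly closer than the
    second nearest. A generic sketch, not the paper's matching rule."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

a = np.random.default_rng(0).normal(size=(20, 32))  # 20 descriptors, dim 32
b = np.random.default_rng(1).normal(size=(25, 32))
print(len(match_pore_descriptors(a, b)))
```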
On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks
Title | On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks |
Authors | Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter |
Abstract | Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received considerable attention from the research community. For example, many neural network-based architectures have been proposed recently, pushing performance to a new level. However, the applicability of such neural SER models, trained only on in-domain data, to noisy conditions is currently under-researched. In this work, we evaluate the robustness of state-of-the-art neural acoustic emotion recognition models in human-robot interaction scenarios. We hypothesize that a robot's ego noise, room conditions, and the various acoustic events that can occur in a home environment can significantly affect the performance of a model. We conduct several experiments on the iCub robot platform and propose several novel ways to reduce the gap between the model's performance during training and testing in real-world conditions. We observe large improvements in model performance on the robot and demonstrate the necessity of data augmentation techniques such as overlaying background noise and varying loudness to improve the robustness of neural approaches. |
Tasks | Data Augmentation, Emotion Recognition, Speech Emotion Recognition |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02173v1 |
PDF | http://arxiv.org/pdf/1804.02173v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-robustness-of-speech-emotion |
Repo | |
Framework | |
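Two of the augmentations named in the abstract, noise overlay at a target SNR and random loudness variation, can be sketched as follows; the signal parameters are illustrative and both inputs are assumed to share one sampling rate.

```python
import numpy as np

def augment_waveform(speech, noise, snr_db=10.0, gain_db=(-6.0, 6.0),
                     rng=np.random.default_rng()):
    """Overlay background noise at a target SNR, then apply a random gain
    (loudness variation). Both inputs are assumed to be float arrays
    sampled at the same rate."""
    noise = np.resize(noise, speech.shape)          # loop or crop the noise
    scale = np.sqrt(np.mean(speech ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10) + 1e-12))
    mixed = speech + scale * noise
    return mixed * 10 ** (rng.uniform(*gain_db) / 20)

sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)       # 1 s dummy "speech"
noise = np.random.default_rng(0).normal(0.0, 0.1, sr // 2)  # short noise clip
augmented = augment_waveform(speech, noise, snr_db=5.0)
```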
Speech Emotion Recognition Considering Local Dynamic Features
Title | Speech Emotion Recognition Considering Local Dynamic Features |
Authors | Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu |
Abstract | Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, reflected through dynamic durations, energies, and other prosodic information as one speaks. In this paper, a novel local dynamic pitch probability distribution feature, obtained by computing a histogram of pitch values, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than traditional global features. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07738v1 |
PDF | http://arxiv.org/pdf/1803.07738v1.pdf |
PWC | https://paperswithcode.com/paper/speech-emotion-recognition-considering-local |
Repo | |
Framework | |
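A simplified version of the pitch probability distribution feature: a normalized histogram over the voiced F0 values of a local window. The bin count, range, and unvoiced-frame handling below are assumptions.

```python
import numpy as np

def pitch_histogram_feature(f0_track, n_bins=12, f0_range=(50.0, 500.0)):
    """Build a normalized histogram over a window of F0 values, a simplified
    take on the paper's local pitch probability distribution feature.
    Unvoiced frames (F0 == 0) are dropped."""
    voiced = f0_track[f0_track > 0]
    hist, _ = np.histogram(voiced, bins=n_bins, range=f0_range)
    total = hist.sum()
    return hist / total if total else hist.astype(float)

f0 = np.array([0, 180, 185, 190, 0, 220, 230, 0, 210])  # toy F0 contour (Hz)
print(pitch_histogram_feature(f0, n_bins=6))
```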
Cross-lingual and Multilingual Speech Emotion Recognition on English and French
Title | Cross-lingual and Multilingual Speech Emotion Recognition on English and French |
Authors | Michael Neumann, Ngoc Thang Vu |
Abstract | Research on multilingual speech emotion recognition faces the problem that most available speech corpora differ from each other in important ways, such as annotation methods or interaction scenarios. These inconsistencies complicate building a multilingual system. We present results for cross-lingual and multilingual emotion recognition on English and French speech data with similar characteristics in terms of interaction (human-human conversations). Further, we explore the possibility of fine-tuning a pre-trained cross-lingual model with only a small number of samples from the target language, which is of great interest for low-resource languages. To gain more insight into what is learned by the deployed convolutional neural network, we analyze the attention mechanism inside the network. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00357v1 |
PDF | http://arxiv.org/pdf/1803.00357v1.pdf |
PWC | https://paperswithcode.com/paper/cross-lingual-and-multilingual-speech-emotion |
Repo | |
Framework | |
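The few-sample fine-tuning setup can be sketched as follows, assuming PyTorch; freezing everything but the final classifier layer is an assumption made here for illustration, and the stand-in model replaces the paper's pre-trained CNN.

```python
import torch
import torch.nn as nn

def finetune_few_samples(model, target_batch, labels, lr=1e-4, steps=20):
    """Adapt a pre-trained cross-lingual SER model with a handful of
    target-language samples by updating only the final layer."""
    for p in model.parameters():
        p.requires_grad = False
    head = list(model.children())[-1]   # assume the last module is the classifier
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(target_batch), labels)
        loss.backward()
        opt.step()
    return model

# Usage with a stand-in model and 8 target-language samples of 128-dim features:
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
x, y = torch.randn(8, 128), torch.randint(0, 4, (8,))
finetune_few_samples(model, x, y)
```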
Skin lesion segmentation using U-Net and good training strategies
Title | Skin lesion segmentation using U-Net and good training strategies |
Authors | Fred Guth, Teofilo E. deCampos |
Abstract | In this paper we approach the problem of skin lesion segmentation using a convolutional neural network based on the U-Net architecture. We present a set of training strategies that had a significant impact on the performance of this model. We evaluated our method on the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection, obtaining a threshold Jaccard index of 77.5%. |
Tasks | Lesion Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11314v1 |
PDF | http://arxiv.org/pdf/1811.11314v1.pdf |
PWC | https://paperswithcode.com/paper/skin-lesion-segmentation-using-u-net-and-good |
Repo | |
Framework | |
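A one-level U-Net sketch in PyTorch shows the encoder-decoder-with-skip pattern underlying the model; real U-Nets are deeper, and the paper's training strategies (its actual contribution) are omitted here.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level U-Net sketch for binary lesion masks: a single down/up
    stage with a skip connection. Depth and channel counts are far smaller
    than in a real U-Net."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1))  # logits for the mask

    def forward(self, x):
        e = self.enc(x)                         # (B, 16, H, W)
        m = self.mid(self.down(e))              # (B, 32, H/2, W/2)
        u = self.up(m)                          # (B, 16, H, W)
        return self.dec(torch.cat([e, u], 1))   # skip connection, then decode

mask_logits = TinyUNet()(torch.rand(1, 3, 64, 64))  # (1, 1, 64, 64)
```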