October 17, 2019

3175 words 15 mins read

Paper Group ANR 890

Phasebook and Friends: Leveraging Discrete Representations for Source Separation. Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models. Structural inpainting. Hydranet: Data Augmentation for Regression Neural Networks. Distribution-based Label Space Transformation for Multi-label Learning. Why is unsupervised alignment of English embeddings from different algorithms so hard? Input Perturbations for Adaptive Control and Learning. Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition. Subspace Clustering by Block Diagonal Representation. Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis. An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls. Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data. Multi-Source Fusion Operations in Subjective Logic. Kickstarting Deep Reinforcement Learning. Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks.

Phasebook and Friends: Leveraging Discrete Representations for Source Separation

Title Phasebook and Friends: Leveraging Discrete Representations for Source Separation
Authors Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey
Abstract Deep learning based speech enhancement and source separation systems have recently reached unprecedented levels of quality, to the point that performance is reaching a new ceiling. Most systems rely on estimating the magnitude of a target source by estimating a real-valued mask to be applied to a time-frequency representation of the mixture signal. A limiting factor in such approaches is a lack of phase estimation: the phase of the mixture is most often used when reconstructing the estimated time-domain signal. Here, we propose “magbook”, “phasebook”, and “combook”, three new types of layers based on discrete representations that can be used to estimate complex time-frequency masks. Magbook layers extend classical sigmoidal units and a recently introduced convex softmax activation for mask-based magnitude estimation. Phasebook layers use a similar structure to give an estimate of the phase mask without suffering from phase wrapping issues. Combook layers are an alternative to the magbook-phasebook combination that directly estimate complex masks. We present various training and inference schemes involving these representations, and explain in particular how to include them in an end-to-end learning framework. We also present an oracle study to assess upper bounds on performance for various types of masks using discrete phase representations. We evaluate the proposed methods on the wsj0-2mix dataset, a well-studied corpus for single-channel speaker-independent speaker separation, matching the performance of state-of-the-art mask-based approaches without requiring additional phase reconstruction steps.
Tasks Speaker Separation, Speech Enhancement
Published 2018-10-02
URL http://arxiv.org/abs/1810.01395v2
PDF http://arxiv.org/pdf/1810.01395v2.pdf
PWC https://paperswithcode.com/paper/phasebook-and-friends-leveraging-discrete
Repo
Framework
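A minimal PyTorch sketch of the phasebook idea: a softmax over a small codebook of discrete phase values per time-frequency bin, with the phase estimate taken as the angle of the expected unit phasor. The fixed uniform codebook and layer shapes are assumptions; the paper also considers learned codebooks and pairs this with magbook or combook layers, which are omitted here.

```python
import math
import torch
import torch.nn as nn

class PhasebookLayer(nn.Module):
    """Sketch of a phasebook-style output layer (not the authors' code)."""
    def __init__(self, in_features, n_bins, codebook_size=8):
        super().__init__()
        self.n_bins, self.codebook_size = n_bins, codebook_size
        # Fixed codebook of K phases uniformly spaced over [-pi, pi);
        # a learned codebook is a straightforward variant.
        phases = torch.linspace(-math.pi, math.pi, codebook_size + 1)[:-1]
        self.register_buffer("codebook", phases)
        self.logits = nn.Linear(in_features, n_bins * codebook_size)

    def forward(self, h):
        # h: (batch, in_features) hidden activations of the mask network.
        w = self.logits(h).view(-1, self.n_bins, self.codebook_size)
        w = torch.softmax(w, dim=-1)
        # Expected unit phasor per bin; taking its angle avoids the
        # wrap-around problems of regressing phase directly.
        phasor = (w * torch.exp(1j * self.codebook)).sum(dim=-1)
        return torch.angle(phasor)
```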

Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models

Title Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models
Authors Satyajit Kamble, Aditya Joshi
Abstract This paper reports an increment to the state-of-the-art in hate speech detection for English-Hindi code-mixed tweets. We compare three typical deep learning models using domain-specific embeddings. On experimenting with a benchmark dataset of English-Hindi code-mixed tweets, we observe that using domain-specific embeddings results in an improved representation of target groups, and an improved F-score.
Tasks Hate Speech Detection
Published 2018-11-13
URL http://arxiv.org/abs/1811.05145v1
PDF http://arxiv.org/pdf/1811.05145v1.pdf
PWC https://paperswithcode.com/paper/hate-speech-detection-from-code-mixed-hindi
Repo
Framework
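The abstract does not name the three architectures compared; a BiLSTM over pretrained domain-specific embeddings is one typical model of that kind. The PyTorch sketch below is hypothetical, not the authors' code; emb_matrix is assumed to hold embeddings trained on in-domain tweets.

```python
import torch.nn as nn

class TweetBiLSTM(nn.Module):
    """Hypothetical BiLSTM tweet classifier with domain-specific embeddings."""
    def __init__(self, emb_matrix, n_classes=2, hidden=64):
        super().__init__()
        # emb_matrix: (vocab_size, dim) tensor of domain-specific embeddings.
        self.emb = nn.Embedding.from_pretrained(emb_matrix, freeze=False)
        self.lstm = nn.LSTM(emb_matrix.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):          # tokens: (batch, seq_len) int ids
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h.mean(dim=1))  # mean-pool over time, then classify
```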

Structural inpainting

Title Structural inpainting
Authors Huy V. Vo, Ngoc Q. K. Duong, Patrick Perez
Abstract Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 have introduced convolutional “context encoders” (CEs) for unsupervised feature learning through image completion tasks. With the additional help of adversarial training, CEs turned out to be a promising tool to complete complex structures in real inpainting problems. In the present paper we propose to push this key ability further by relying on perceptual reconstruction losses at training time. We show on a wide variety of visual scenes the merit of the approach for structural inpainting, and confirm it through a user study. Combined with the optimization-based refinement of Yang et al. 2016 with neural patches, our context encoder opens up new opportunities for prior-free visual inpainting.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.10348v1
PDF http://arxiv.org/pdf/1803.10348v1.pdf
PWC https://paperswithcode.com/paper/structural-inpainting
Repo
Framework
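A perceptual reconstruction loss of the kind the abstract mentions compares completed and ground-truth images in the feature space of a fixed pretrained network. The sketch below uses VGG16 features; the backbone and cut layer are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """L2 distance between frozen VGG16 feature maps (assumed layer cut)."""
    def __init__(self, cut=16):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:cut].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)      # the loss network stays fixed

    def forward(self, completed, target):
        return nn.functional.mse_loss(self.features(completed),
                                      self.features(target))
```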

Hydranet: Data Augmentation for Regression Neural Networks

Title Hydranet: Data Augmentation for Regression Neural Networks
Authors Florian Dubost, Gerda Bortsova, Hieab Adams, M. Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen de Bruijne
Abstract Deep learning techniques are often criticized for depending heavily on large quantities of labeled data. This problem is even more challenging in medical image analysis, where annotator expertise is often scarce. We propose a novel data-augmentation method to regularize neural network regressors that learn from a single global label per image. The principle of the method is to create new samples by recombining existing ones. We demonstrate the performance of our algorithm on two tasks: estimation of the number of enlarged perivascular spaces in the basal ganglia, and estimation of white matter hyperintensities volume. We show that the proposed method improves performance over more basic data augmentation. The proposed method reached an intraclass correlation coefficient between ground truth and network predictions of 0.73 on the first task and 0.84 on the second task, using only between 25 and 30 scans with a single global label per scan for training. With the same number of training scans, more conventional data augmentation methods could only reach intraclass correlation coefficients of 0.68 on the first task and 0.79 on the second task.
Tasks Data Augmentation
Published 2018-07-12
URL https://arxiv.org/abs/1807.04798v3
PDF https://arxiv.org/pdf/1807.04798v3.pdf
PWC https://paperswithcode.com/paper/hydranet-data-augmentation-for-regression
Repo
Framework
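The abstract states the principle (create new samples by recombining existing ones) without the exact recipe. The sketch below shows one simple recombination scheme for regression, essentially mixup applied to images with global scalar labels; it illustrates the general idea but is not the paper's exact procedure.

```python
import numpy as np

def recombine(x1, y1, x2, y2, rng, alpha=0.4):
    """Convex recombination of two labeled samples (mixup-style sketch).

    x1, x2: images (arrays of equal shape); y1, y2: global scalar labels.
    Returns a synthetic sample whose label is blended consistently.
    """
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x_new, y_new = recombine(np.zeros((64, 64)), 3.0,
                         np.ones((64, 64)), 7.0, rng)
```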

Distribution-based Label Space Transformation for Multi-label Learning

Title Distribution-based Label Space Transformation for Multi-label Learning
Authors Zongting Lyu, Yan Yan, Fei Wu
Abstract Multi-label learning problems have manifested themselves in various machine learning applications. The key to successful multi-label learning algorithms lies in the exploration of inter-label correlations, which usually incur great computational cost. Another notable factor in multi-label learning is that the label vectors are usually extremely sparse, especially when the candidate label vocabulary is very large and only a few instances are assigned to each category. Recently, a label space transformation (LST) framework has been proposed targeting these challenges. However, current methods based on LST usually suffer from information loss in the label space dimension reduction process and fail to address the sparsity problem effectively. In this paper, we propose a distribution-based label space transformation (DLST) model. By defining the distribution based on the similarity of label vectors, a more comprehensive label structure can be captured. Then, by minimizing the KL-divergence of the two distributions, the information of the original label space can be approximately preserved in the latent space. Consequently, a multi-label classifier trained on the dense latent codes yields better performance. The use of distributions enables DLST to fill in additional information about label correlations. This endows DLST with the capability to handle label-set sparsity and training-data sparsity in multi-label learning problems. With the optimal latent code, a kernel logistic regression function is learned to map from the feature space to the latent space. Then ML-KNN is employed to recover the original label vector from the transformed latent code. Extensive experiments on several benchmark datasets demonstrate that DLST not only achieves high classification performance but is also computationally more efficient.
Tasks Dimensionality Reduction, Multi-Label Learning
Published 2018-05-15
URL http://arxiv.org/abs/1805.05687v1
PDF http://arxiv.org/pdf/1805.05687v1.pdf
PWC https://paperswithcode.com/paper/distribution-based-label-space-transformation
Repo
Framework
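The core objective can be sketched directly: define a similarity-based distribution over instance pairs from the label vectors, another from the latent codes, and minimize the KL-divergence between them. The Gaussian kernel and row-wise normalization below are assumptions about the exact form of the distributions.

```python
import numpy as np

def similarity_distribution(V, sigma=1.0):
    """Row-stochastic distribution from pairwise similarities of the rows
    of V (label vectors or latent codes); Gaussian kernel is an assumption."""
    d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum(axis=1, keepdims=True)

def dlst_objective(Y, Z):
    """KL(P_label || Q_latent): the quantity minimized over latent codes Z
    so that label-space structure is preserved in the latent space."""
    P, Q = similarity_distribution(Y), similarity_distribution(Z)
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())
```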

Why is unsupervised alignment of English embeddings from different algorithms so hard?

Title Why is unsupervised alignment of English embeddings from different algorithms so hard?
Authors Mareike Hartmann, Yova Kementchedjhieva, Anders Søgaard
Abstract This paper presents a challenge to the community: generative adversarial networks (GANs) can perfectly align independent English word embeddings induced using the same algorithm, based on distributional information alone, but fail to do so when the two embeddings are induced by different algorithms. Why is that? We believe understanding why is key to understanding both modern word embedding algorithms and the limitations and instability dynamics of GANs. This paper shows that (a) in all these cases where alignment fails, there exists a linear transform between the two embeddings (so algorithm biases do not lead to non-linear differences), and (b) similar effects cannot easily be obtained by varying hyper-parameters. One plausible suggestion based on our initial experiments is that the differences in the inductive biases of the embedding algorithms lead to an optimization landscape that is riddled with local optima, leading to a very small basin of convergence, but we present this more as a challenge paper than as a technical contribution.
Tasks Word Embeddings
Published 2018-09-01
URL http://arxiv.org/abs/1809.00150v1
PDF http://arxiv.org/pdf/1809.00150v1.pdf
PWC https://paperswithcode.com/paper/why-is-unsupervised-alignment-of-english
Repo
Framework
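Finding (a) can be checked with the classical orthogonal Procrustes solution: given two embedding matrices with row-aligned vocabularies, the best orthogonal map has a closed form via the SVD. This is a standard supervised check, not the GAN-based unsupervised aligner the paper studies.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F for row-aligned embeddings
    X, Y of shape (vocab, dim); closed form via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# If a good linear map exists (the paper's finding), the residual
# ||X @ procrustes(X, Y) - Y||_F is small even when GAN alignment fails.
```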

Input Perturbations for Adaptive Control and Learning

Title Input Perturbations for Adaptive Control and Learning
Authors Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis
Abstract This paper studies adaptive algorithms for simultaneous regulation (i.e., control) and estimation (i.e., learning) of Multiple Input Multiple Output (MIMO) linear dynamical systems. It proposes practical, easy-to-implement control policies based on perturbations of input signals. Such policies are shown to achieve a worst-case regret that scales as the square root of the time horizon and holds uniformly over time. Further, it discusses specific settings where such greedy policies attain the information-theoretic lower bound of logarithmic regret. To establish the results, recent advances on self-normalized martingales are leveraged, together with a novel method of policy decomposition.
Tasks
Published 2018-11-10
URL https://arxiv.org/abs/1811.04258v3
PDF https://arxiv.org/pdf/1811.04258v3.pdf
PWC https://paperswithcode.com/paper/input-perturbations-for-adaptive-regulation
Repo
Framework
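A minimal sketch of the policy class described above: estimate the dynamics by least squares from observed trajectories, apply the resulting greedy (certainty-equivalent) feedback, and add a random input perturbation for exploration. The design of the feedback gain and the perturbation scale schedule are the paper's subject and are not reproduced here.

```python
import numpy as np

def estimate_dynamics(X, U, X_next):
    """Least-squares estimate of (A, B) in x_{t+1} = A x_t + B u_t + noise.
    X, U, X_next stack the trajectory row-wise."""
    Z = np.hstack([X, U])                              # (T, n + m)
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    n = X.shape[1]
    return Theta[:n].T, Theta[n:].T                    # A_hat, B_hat

def perturbed_input(K_hat, x, sigma, rng):
    """Greedy feedback from current estimates plus Gaussian exploration."""
    return K_hat @ x + sigma * rng.standard_normal(K_hat.shape[0])
```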

Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition

Title Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition
Authors Aravind Ravi
Abstract Facial expression recognition has been an active area in computer vision, with applications including animation, social robots, and personalized banking. In this study, we explore the problem of image classification for detecting facial expressions, based on features extracted from convolutional neural networks pre-trained on the ImageNet database. Features are extracted and transferred to a linear support vector machine for classification. All experiments are performed on two publicly available datasets, JAFFE and CK+. The results show that representations learned from networks pre-trained for a task such as object recognition can be transferred and used for facial expression recognition. Furthermore, for a small dataset, using features from earlier layers of the VGG19 network provides better classification accuracy. Accuracies of 92.26% and 92.86% were achieved for the CK+ and JAFFE datasets, respectively.
Tasks Facial Expression Recognition, Image Classification, Object Recognition
Published 2018-12-16
URL http://arxiv.org/abs/1812.06387v1
PDF http://arxiv.org/pdf/1812.06387v1.pdf
PWC https://paperswithcode.com/paper/pre-trained-convolutional-neural-network
Repo
Framework
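The pipeline is easy to sketch: cut a pretrained VGG19 at an early layer, extract features, and fit a linear SVM. The slice index below is an assumption, not the paper's exact cut point.

```python
import torch
from torchvision.models import vgg19
from sklearn.svm import LinearSVC

# Frozen early portion of ImageNet-pretrained VGG19 (cut index assumed).
backbone = vgg19(weights="IMAGENET1K_V1").features[:9].eval()

@torch.no_grad()
def extract(x):                 # x: (N, 3, 224, 224), ImageNet-normalized
    return backbone(x).flatten(1).cpu().numpy()

# clf = LinearSVC().fit(extract(train_images), train_labels)
# acc = clf.score(extract(test_images), test_labels)
```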

Subspace Clustering by Block Diagonal Representation

Title Subspace Clustering by Block Diagonal Representation
Authors Canyi Lu, Jiashi Feng, Zhouchen Lin, Tao Mei, Shuicheng Yan
Abstract This paper studies the subspace clustering problem. Given some data points approximately drawn from a union of subspaces, the goal is to group these data points into their underlying subspaces. Many subspace clustering methods have been proposed, among which sparse subspace clustering and low-rank representation are two representative ones. Despite their different motivations, we observe that many existing methods share the common block diagonal property, which possibly leads to correct clustering, yet their proofs are given case by case. First, we consider a general formulation and provide a unified theoretical guarantee of the block diagonal property; the block diagonal property of many existing methods falls out as a special case. Second, we observe that many existing methods approximate the block diagonal representation matrix using different structural priors, e.g., sparsity and low-rankness, which are indirect. We propose the first block-diagonal-matrix-induced regularizer for directly pursuing a block diagonal matrix. With this regularizer, we solve the subspace clustering problem by Block Diagonal Representation (BDR), which uses the block diagonal structure prior. The BDR model is nonconvex; we propose an alternating minimization solver and prove its convergence. Experiments on real datasets demonstrate the effectiveness of BDR.
Tasks
Published 2018-05-23
URL http://arxiv.org/abs/1805.09243v1
PDF http://arxiv.org/pdf/1805.09243v1.pdf
PWC https://paperswithcode.com/paper/subspace-clustering-by-block-diagonal
Repo
Framework
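The block diagonal regularizer can be sketched concretely: for an affinity built from the representation matrix, the sum of the k smallest eigenvalues of its graph Laplacian is zero exactly when the affinity graph has at least k connected components, i.e. when the matrix is block diagonal with k blocks. This numpy sketch computes that quantity; the alternating minimization solver is omitted.

```python
import numpy as np

def bdr_regularizer(B, k):
    """Sum of the k smallest Laplacian eigenvalues of the symmetrized,
    nonnegative affinity derived from representation matrix B; equals 0
    iff the affinity graph splits into at least k blocks."""
    A = (np.abs(B) + np.abs(B).T) / 2.0
    L = np.diag(A.sum(axis=1)) - A
    return float(np.linalg.eigvalsh(L)[:k].sum())  # eigvalsh: ascending
```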

Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis

Title Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis
Authors Yang Li, Mengying Lei, Xianrui Zhang, Weigang Cui, Yuzhu Guo, Ting-Wen Huang, Hua-Liang Wei
Abstract Decoding EEG signals of different mental states is a challenging task for brain-computer interfaces (BCIs) due to the nonstationarity of perceptual decision processes. This paper presents a novel boosted convolutional neural network (ConvNet) decoding scheme for motor imagery (MI) EEG signals, assisted by multiwavelet-based time-frequency (TF) causality analysis. Specifically, multiwavelet basis functions are first combined with the Geweke spectral measure to obtain high-resolution TF conditional Granger causality (TF-CGC) representations, where a regularized orthogonal forward regression (ROFR) algorithm is adopted to detect a parsimonious model with good generalization performance. Causality images for the network input, preserving the time, frequency, and location information of connectivity, are then designed based on the TF-CGC distributions of alpha-band multichannel EEG signals. We further construct boosted ConvNets using spatio-temporal convolutions, together with deep-learning advances including cropping and boosting, to extract discriminative causality features and classify MI tasks. Our proposed approach outperforms the competition-winning algorithm, with a 12.15% increase in average accuracy and a 74.02% decrease in the associated inter-subject standard deviation, for the same binary classification on BCI Competition IV dataset IIa. The results indicate that boosted ConvNets with causality images work well in decoding MI-EEG signals and provide a promising framework for developing MI-BCI systems.
Tasks EEG, Eeg Decoding
Published 2018-10-22
URL http://arxiv.org/abs/1810.10353v1
PDF http://arxiv.org/pdf/1810.10353v1.pdf
PWC https://paperswithcode.com/paper/boosted-convolutional-neural-networks-for
Repo
Framework
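The classification stage can be pictured as a small 2-D ConvNet over the TF-CGC "causality images" (frequency-by-time connectivity maps). The sketch below is heavily simplified and hypothetical: the paper's boosted ensemble, cropped training, and ROFR-based model detection are all omitted.

```python
import torch.nn as nn

# Hypothetical single ConvNet over causality images of shape (1, F, T);
# the paper boosts several such nets and crops inputs for augmentation.
net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5), nn.BatchNorm2d(16), nn.ELU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3), nn.BatchNorm2d(32), nn.ELU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),   # binary motor imagery classification
)
```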

An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls

Title An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls
Authors Yan Zhang, Heiko Neumann
Abstract Detecting unintended falls is essential for ambient intelligence and the healthcare of elderly people living alone. In recent years, deep convolutional nets have been widely used in human action analysis, and a number of fall detection methods have been proposed on this basis. Despite their highly effective performance, how the convolutional nets recognize falls is still not clear. In this paper, instead of proposing a novel approach, we perform a systematic empirical study, attempting to investigate the underlying fall recognition process. We propose four tasks to investigate, involving five types of input modalities, seven net instances, and different training samples. The obtained quantitative and qualitative results reveal the patterns that the nets tend to learn, as well as several factors that can heavily influence fall recognition performance. We expect these conclusions to be helpful for designing better deep learning solutions for fall detection systems.
Tasks
Published 2018-12-05
URL http://arxiv.org/abs/1812.01923v1
PDF http://arxiv.org/pdf/1812.01923v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-towards-understanding-how
Repo
Framework

Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data

Title Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data
Authors Martin Wistuba, Ambrish Rawat
Abstract We introduce a new Bayesian multi-class support vector machine by formulating a pseudo-likelihood for a multi-class hinge loss in the form of a location-scale mixture of Gaussians. We derive a variational-inference-based training objective for gradient-based learning. Additionally, we employ an inducing point approximation which scales inference to large data sets. Furthermore, we develop hybrid Bayesian neural networks that combine standard deep learning components with the proposed model to enable learning for unstructured data. We provide empirical evidence that our model outperforms the competitor methods with respect to both training time and accuracy in classification experiments on 68 structured and two unstructured data sets. Finally, we highlight the key capability of our model in yielding prediction uncertainty for classification by demonstrating its effectiveness in the tasks of large-scale active learning and detection of adversarial images.
Tasks Active Learning
Published 2018-06-07
URL http://arxiv.org/abs/1806.02659v1
PDF http://arxiv.org/pdf/1806.02659v1.pdf
PWC https://paperswithcode.com/paper/scalable-multi-class-bayesian-support-vector
Repo
Framework
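The model's starting point is a multi-class hinge loss whose pseudo-likelihood is then expressed as a location-scale mixture of Gaussians. The sketch below computes a standard Crammer-Singer multi-class hinge; whether this is the paper's exact variant is an assumption, and the variational inference and inducing-point machinery are omitted.

```python
import torch

def multiclass_hinge(scores, y):
    """Crammer-Singer multi-class hinge: largest margin violation by any
    wrong class, averaged over the batch.
    scores: (N, C) class scores; y: (N,) integer labels."""
    true = scores.gather(1, y[:, None])                        # (N, 1)
    margins = (scores - true + 1.0).scatter(1, y[:, None], 0.0)
    return margins.clamp(min=0).max(dim=1).values.mean()
```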

Multi-Source Fusion Operations in Subjective Logic

Title Multi-Source Fusion Operations in Subjective Logic
Authors Rens Wouter van der Heijden, Henning Kopp, Frank Kargl
Abstract The purpose of multi-source fusion is to combine information from more than two evidence sources, or subjective opinions from multiple actors. For subjective logic, a number of different fusion operators have been proposed, each matching a fusion scenario with different assumptions. However, not all of these operators are associative, and therefore multi-source fusion is not well-defined for these settings. In this paper, we address this challenge, and define multi-source fusion for weighted belief fusion (WBF) and consensus & compromise fusion (CCF). For WBF, we show the definition to be equivalent to the intuitive formulation under the bijective mapping between subjective logic and Dirichlet evidence PDFs. For CCF, since there is no independent generalization, we show that the resulting multi-source fusion produces valid opinions, and explain why our generalization is sound. For completeness, we also provide corrections to previous results for averaging and cumulative belief fusion (ABF and CBF), as well as belief constraint fusion (BCF), which is an extension of Dempster’s rule. With our generalizations of fusion operators, fusing information from multiple sources is now well-defined for all different fusion types defined in subjective logic. This enables wider applicability of subjective logic in applications where multiple actors interact.
Tasks
Published 2018-05-03
URL http://arxiv.org/abs/1805.01388v1
PDF http://arxiv.org/pdf/1805.01388v1.pdf
PWC https://paperswithcode.com/paper/multi-source-fusion-operations-in-subjective
Repo
Framework
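For context, the role of associativity can be shown with cumulative belief fusion (CBF), one of the operators the paper corrects: because CBF is associative, fusing many sources is just a fold over the pairwise operator. This sketch uses the standard CBF formula; the WBF and CCF generalizations the paper defines are not reproduced here.

```python
def cumulative_fusion(op_a, op_b):
    """Cumulative belief fusion of two multinomial opinions.
    An opinion is (b, u): a dict of belief masses plus uncertainty u > 0.
    The dogmatic case u_a = u_b = 0 needs a separate limit rule (omitted)."""
    (b_a, u_a), (b_b, u_b) = op_a, op_b
    denom = u_a + u_b - u_a * u_b
    fused_b = {k: (b_a.get(k, 0.0) * u_b + b_b.get(k, 0.0) * u_a) / denom
               for k in set(b_a) | set(b_b)}
    return fused_b, (u_a * u_b) / denom

# Associativity makes N-source fusion a simple fold:
# from functools import reduce; reduce(cumulative_fusion, opinions)
```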

Kickstarting Deep Reinforcement Learning

Title Kickstarting Deep Reinforcement Learning
Authors Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami
Abstract We present a method for using previously-trained ‘teacher’ agents to kickstart the training of a new ‘student’ agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design. We also show that the same kickstarting pipeline can allow a single student agent to leverage multiple ‘expert’ teachers which specialize on individual tasks. In this setting kickstarting yields surprisingly large gains, with the kickstarted agent matching the performance of an agent trained from scratch in almost 10x fewer steps, and surpassing its final performance by 42 percent. Kickstarting is conceptually simple and can easily be incorporated into reinforcement learning experiments.
Tasks
Published 2018-03-10
URL http://arxiv.org/abs/1803.03835v1
PDF http://arxiv.org/pdf/1803.03835v1.pdf
PWC https://paperswithcode.com/paper/kickstarting-deep-reinforcement-learning
Repo
Framework
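The core of kickstarting is an auxiliary distillation term added to the student's RL loss, pulling the student's policy toward the teacher's; its weight is annealed so the student can eventually surpass the teacher. A PyTorch sketch, with the weighting schedule left abstract:

```python
import torch.nn.functional as F

def kickstart_loss(student_logits, teacher_logits, rl_loss, lam):
    """RL loss plus a policy-distillation term weighted by lam.
    lam is annealed toward zero over training (schedule not shown)."""
    distill = F.kl_div(F.log_softmax(student_logits, dim=-1),
                       F.softmax(teacher_logits, dim=-1),
                       reduction="batchmean")   # KL(teacher || student)
    return rl_loss + lam * distill
```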

Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks

Title Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks
Authors He Zhang, Benjamin S. Riggan, Shuowen Hu, Nathaniel J. Short, Vishal M. Patel
Abstract The large domain discrepancy between faces captured in the polarimetric (or conventional) thermal domain and the visible domain makes cross-domain face verification a highly challenging problem for human examiners as well as computer vision algorithms. Previous approaches utilize either a two-step procedure (visible feature estimation and visible image reconstruction) or an input-level fusion technique, where different Stokes images are concatenated and used as a multi-channel input to synthesize the visible image given the corresponding polarimetric signatures. Although these methods have yielded improvements, we argue that input-level fusion alone may not be sufficient to realize the full potential of the available Stokes images. We propose a generative adversarial network (GAN) based multi-stream feature-level fusion technique to synthesize high-quality visible images from polarimetric thermal images. The proposed network consists of a generator sub-network, constructed using an encoder-decoder network based on dense residual blocks, and a multi-scale discriminator sub-network. The generator network is trained by optimizing an adversarial loss in addition to a perceptual loss and an identity-preserving loss to enable photo-realistic generation of visible images while preserving discriminative characteristics. An extended dataset consisting of polarimetric thermal facial signatures of 111 subjects is also introduced. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance. Code will be made available at https://github.com/hezhangsprinter.
Tasks Face Generation, Face Verification, Image Generation, Image Reconstruction
Published 2018-12-12
URL http://arxiv.org/abs/1812.05155v1
PDF http://arxiv.org/pdf/1812.05155v1.pdf
PWC https://paperswithcode.com/paper/synthesis-of-high-quality-visible-faces-from
Repo
Framework
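The generator objective combines three terms. The following sketch assumes a least-squares adversarial loss and equal-shape feature tensors from a perceptual network and an identity network; the loss weights are placeholders, not the paper's values.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake, feat_fake, feat_real, id_fake, id_real,
                   lambda_p=1.0, lambda_id=0.1):
    """Adversarial + perceptual + identity-preserving generator objective.
    d_fake: discriminator scores on synthesized visible images;
    feat_*: perceptual-network features; id_*: identity-network features."""
    adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # LSGAN form (assumed)
    perceptual = F.mse_loss(feat_fake, feat_real)
    identity = F.mse_loss(id_fake, id_real)
    return adv + lambda_p * perceptual + lambda_id * identity
```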