Paper Group ANR 890
Phasebook and Friends: Leveraging Discrete Representations for Source Separation
Title | Phasebook and Friends: Leveraging Discrete Representations for Source Separation |
Authors | Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey |
Abstract | Deep learning based speech enhancement and source separation systems have recently reached unprecedented levels of quality, to the point that performance is reaching a new ceiling. Most systems rely on estimating the magnitude of a target source by estimating a real-valued mask to be applied to a time-frequency representation of the mixture signal. A limiting factor in such approaches is a lack of phase estimation: the phase of the mixture is most often used when reconstructing the estimated time-domain signal. Here, we propose “magbook”, “phasebook”, and “combook”, three new types of layers based on discrete representations that can be used to estimate complex time-frequency masks. Magbook layers extend classical sigmoidal units and a recently introduced convex softmax activation for mask-based magnitude estimation. Phasebook layers use a similar structure to give an estimate of the phase mask without suffering from phase wrapping issues. Combook layers are an alternative to the magbook-phasebook combination that directly estimate complex masks. We present various training and inference schemes involving these representations, and explain in particular how to include them in an end-to-end learning framework. We also present an oracle study to assess upper bounds on performance for various types of masks using discrete phase representations. We evaluate the proposed methods on the wsj0-2mix dataset, a well-studied corpus for single-channel speaker-independent speaker separation, matching the performance of state-of-the-art mask-based approaches without requiring additional phase reconstruction steps. |
Tasks | Speaker Separation, Speech Enhancement |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01395v2 |
http://arxiv.org/pdf/1810.01395v2.pdf | |
PWC | https://paperswithcode.com/paper/phasebook-and-friends-leveraging-discrete |
Repo | |
Framework | |
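To make the discrete-representation idea concrete, here is a minimal sketch of a phasebook-style layer in PyTorch: the network produces a softmax over a small codebook of phase values, and the expected phase is taken on the unit circle so no wrapping occurs. The feature dimension, codebook size, and the choice of a fixed (rather than learned) codebook are illustrative assumptions, not the authors' implementation.

```python
# Minimal phasebook-style layer sketch (illustrative, not the authors' code).
import math
import torch
import torch.nn as nn

class PhasebookLayer(nn.Module):
    def __init__(self, feat_dim, n_codes=8):
        super().__init__()
        self.logits = nn.Linear(feat_dim, n_codes)
        # Fixed codebook of candidate phases in [-pi, pi); it could instead
        # be an nn.Parameter and trained end-to-end.
        self.register_buffer(
            "codebook", torch.linspace(-math.pi, math.pi, n_codes + 1)[:-1]
        )

    def forward(self, feats):
        # feats: (..., feat_dim) -> probabilities over the discrete codebook.
        p = torch.softmax(self.logits(feats), dim=-1)
        # Expected phase taken on the unit circle, avoiding wrapping issues.
        real = (p * torch.cos(self.codebook)).sum(-1)
        imag = (p * torch.sin(self.codebook)).sum(-1)
        return torch.complex(real, imag)

# Phase mask for a batch of 4 spectrograms, 257 bins, 600-dim features.
mask = PhasebookLayer(feat_dim=600)(torch.randn(4, 257, 600))
```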
Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models
Title | Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models |
Authors | Satyajit Kamble, Aditya Joshi |
Abstract | This paper reports an increment to the state-of-the-art in hate speech detection for English-Hindi code-mixed tweets. We compare three typical deep learning models using domain-specific embeddings. On experimenting with a benchmark dataset of English-Hindi code-mixed tweets, we observe that using domain-specific embeddings results in an improved representation of target groups, and an improved F-score. |
Tasks | Hate Speech Detection |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05145v1 |
http://arxiv.org/pdf/1811.05145v1.pdf | |
PWC | https://paperswithcode.com/paper/hate-speech-detection-from-code-mixed-hindi |
Repo | |
Framework | |
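As an illustration of the kind of model the paper compares, below is a toy LSTM tweet classifier in PyTorch seeded with a pre-computed embedding matrix. The vocabulary size, dimensions, and two-class output are placeholder assumptions; the domain-specific embeddings themselves (e.g. trained on code-mixed tweets) would be supplied externally.

```python
# Toy LSTM classifier with externally supplied domain-specific embeddings.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, embedding_matrix, hidden=64, n_classes=2):
        super().__init__()
        # Domain-specific embeddings passed in as a pre-computed tensor.
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq, emb)
        _, (h, _) = self.lstm(x)       # h: (1, batch, hidden)
        return self.out(h[-1])         # class logits

model = LSTMClassifier(torch.randn(5000, 100))   # stand-in embedding matrix
logits = model(torch.randint(0, 5000, (8, 30)))  # 8 tweets of 30 tokens
```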
Structural inpainting
Title | Structural inpainting |
Authors | Huy V. Vo, Ngoc Q. K. Duong, Patrick Pérez |
Abstract | Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 introduced convolutional “context encoders” (CEs) for unsupervised feature learning through image completion tasks. With the additional help of adversarial training, CEs turned out to be a promising tool for completing complex structures in real inpainting problems. In the present paper we propose to push this key ability further by relying on perceptual reconstruction losses at training time. We show the merit of the approach for structural inpainting on a wide variety of visual scenes, and confirm it through a user study. Combined with the optimization-based refinement of Yang et al. 2016 with neural patches, our context encoder opens up new opportunities for prior-free visual inpainting. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10348v1 |
http://arxiv.org/pdf/1803.10348v1.pdf | |
PWC | https://paperswithcode.com/paper/structural-inpainting |
Repo | |
Framework | |
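A hedged sketch of the training objective described above: an L2 reconstruction term plus a perceptual term computed on features of a fixed pre-trained network. VGG16 from torchvision, the layer cut-off, and the weighting are stand-in assumptions, and the adversarial term is omitted.

```python
# Reconstruction + perceptual loss sketch; adversarial term omitted.
import torch
import torchvision.models as models

vgg_feats = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)   # fixed feature extractor

def inpainting_loss(pred, target, lam=0.1):
    """pred, target: (N, 3, H, W) image batches in VGG-compatible range."""
    recon = torch.mean((pred - target) ** 2)
    perceptual = torch.mean((vgg_feats(pred) - vgg_feats(target)) ** 2)
    return recon + lam * perceptual   # lam is an assumed weighting
```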
Hydranet: Data Augmentation for Regression Neural Networks
Title | Hydranet: Data Augmentation for Regression Neural Networks |
Authors | Florian Dubost, Gerda Bortsova, Hieab Adams, M. Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen de Bruijne |
Abstract | Deep learning techniques are often criticized for depending heavily on large quantities of labeled data. This problem is even more challenging in medical image analysis, where annotator expertise is often scarce. We propose a novel data-augmentation method to regularize neural network regressors that learn from a single global label per image. The principle of the method is to create new samples by recombining existing ones. We demonstrate the performance of our algorithm on two tasks: estimating the number of enlarged perivascular spaces in the basal ganglia, and estimating white matter hyperintensity volume. We show that the proposed method improves performance over more basic data augmentation. The proposed method reached an intraclass correlation coefficient between ground truth and network predictions of 0.73 on the first task and 0.84 on the second, using only 25 to 30 scans with a single global label per scan for training. With the same number of training scans, more conventional data augmentation methods reached intraclass correlation coefficients of only 0.68 on the first task and 0.79 on the second. |
Tasks | Data Augmentation |
Published | 2018-07-12 |
URL | https://arxiv.org/abs/1807.04798v3 |
https://arxiv.org/pdf/1807.04798v3.pdf | |
PWC | https://paperswithcode.com/paper/hydranet-data-augmentation-for-regression |
Repo | |
Framework | |
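The recombination principle can be illustrated as follows, under the assumption (natural for count or volume targets such as those above, though not spelled out in the abstract) that a synthetic sample pairs two existing scans and takes the sum of their global labels as its target; the random pairing strategy is likewise illustrative.

```python
# Recombination sketch: pair scans, sum their global labels (assumed rule).
import numpy as np

def recombine(images, labels, n_new, seed=0):
    """images: (N, ...) array of scans; labels: (N,) global counts/volumes."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(images), size=n_new)
    j = rng.integers(0, len(images), size=n_new)
    # Each synthetic sample stacks two scans; its regression target is the
    # sum of the two original global labels.
    pairs = np.stack([images[i], images[j]], axis=1)   # (n_new, 2, ...)
    return pairs, labels[i] + labels[j]

pairs, targets = recombine(np.zeros((30, 64, 64)), np.arange(30.0), n_new=100)
```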
Distribution-based Label Space Transformation for Multi-label Learning
Title | Distribution-based Label Space Transformation for Multi-label Learning |
Authors | Zongting Lyu, Yan Yan, Fei Wu |
Abstract | Multi-label learning problems have manifested themselves in various machine learning applications. The key to successful multi-label learning algorithms lies in the exploration of inter-label correlations, which usually incurs great computational cost. Another notable factor in multi-label learning is that the label vectors are usually extremely sparse, especially when the candidate label vocabulary is very large and only a few instances are assigned to each category. Recently, a label space transformation (LST) framework has been proposed to target these challenges. However, current methods based on LST usually suffer from information loss in the label space dimension reduction process and fail to address the sparsity problem effectively. In this paper, we propose a distribution-based label space transformation (DLST) model. By defining a distribution based on the similarity of label vectors, a more comprehensive label structure can be captured. Then, by minimizing the KL-divergence between the two distributions, the information of the original label space can be approximately preserved in the latent space. Consequently, a multi-label classifier trained on the dense latent codes yields better performance. Leveraging the distribution enables DLST to recover additional information about label correlations. This endows DLST with the capability to handle label set sparsity and training data sparsity in multi-label learning problems. With the optimal latent code, a kernel logistic regression function is learned to map from the feature space to the latent space. Then ML-KNN is employed to recover the original label vector from the transformed latent code. Extensive experiments on several benchmark datasets demonstrate that DLST not only achieves high classification performance but is also computationally more efficient. |
Tasks | Dimensionality Reduction, Multi-Label Learning |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05687v1 |
http://arxiv.org/pdf/1805.05687v1.pdf | |
PWC | https://paperswithcode.com/paper/distribution-based-label-space-transformation |
Repo | |
Framework | |
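An illustrative NumPy sketch (not the authors' code) of the DLST objective: define a similarity-based distribution over label vectors, a matching distribution over latent codes, and minimize the KL-divergence between them. The Gaussian kernel and row-normalization are assumptions of this sketch.

```python
# KL-matching of a label-space distribution and a latent-space distribution.
import numpy as np

def similarity_distribution(X):
    # Gaussian-kernel similarities, row-normalized into a distribution.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    s = np.exp(-d2)
    np.fill_diagonal(s, 0.0)
    return s / s.sum(axis=1, keepdims=True)

def kl_objective(labels, latent, eps=1e-12):
    P = similarity_distribution(labels)   # from sparse label vectors
    Q = similarity_distribution(latent)   # from dense latent codes
    return np.sum(P * np.log((P + eps) / (Q + eps)))

loss = kl_objective(np.eye(5)[[0, 0, 1, 2, 3]], np.random.randn(5, 2))
```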
Why is unsupervised alignment of English embeddings from different algorithms so hard?
Title | Why is unsupervised alignment of English embeddings from different algorithms so hard? |
Authors | Mareike Hartmann, Yova Kementchedjhieva, Anders Søgaard |
Abstract | This paper presents a challenge to the community: generative adversarial networks (GANs) can perfectly align independent English word embeddings induced using the same algorithm, based on distributional information alone, but fail to do so when the two embeddings come from different algorithms. Why is that? We believe understanding why is key to understanding both modern word embedding algorithms and the limitations and instability dynamics of GANs. This paper shows that (a) in all the cases where alignment fails, there exists a linear transform between the two embeddings (so algorithm biases do not lead to non-linear differences), and (b) similar effects cannot easily be obtained by varying hyper-parameters. One plausible suggestion based on our initial experiments is that differences in the inductive biases of the embedding algorithms lead to an optimization landscape riddled with local optima, and hence a very small basin of convergence, but we present this more as a challenge paper than a technical contribution. |
Tasks | Word Embeddings |
Published | 2018-09-01 |
URL | http://arxiv.org/abs/1809.00150v1 |
http://arxiv.org/pdf/1809.00150v1.pdf | |
PWC | https://paperswithcode.com/paper/why-is-unsupervised-alignment-of-english |
Repo | |
Framework | |
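Claim (a), that a linear transform relates the two embedding spaces, can be checked with an orthogonal Procrustes fit, as in this synthetic NumPy sketch (shapes and data are placeholders):

```python
# Orthogonal Procrustes: recover a linear map between two embedding spaces.
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F for row-aligned X, Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 300))              # embeddings, algorithm A
R_true = np.linalg.qr(rng.standard_normal((300, 300)))[0]
Y = X @ R_true                                    # a linearly related space
W = procrustes(X, Y)
print(np.linalg.norm(X @ W - Y))                  # ~0: linear map recovered
```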
Input Perturbations for Adaptive Control and Learning
Title | Input Perturbations for Adaptive Control and Learning |
Authors | Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis |
Abstract | This paper studies adaptive algorithms for the simultaneous regulation (i.e., control) and estimation (i.e., learning) of Multiple Input Multiple Output (MIMO) linear dynamical systems. It proposes practical, easy-to-implement control policies based on perturbations of the input signals. Such policies are shown to achieve a worst-case regret that scales as the square root of the time horizon and holds uniformly over time. Further, it discusses specific settings in which such greedy policies attain the information-theoretic lower bound of logarithmic regret. To establish the results, recent advances on self-normalized martingales are leveraged together with a novel method of policy decomposition. |
Tasks | |
Published | 2018-11-10 |
URL | https://arxiv.org/abs/1811.04258v3 |
https://arxiv.org/pdf/1811.04258v3.pdf | |
PWC | https://paperswithcode.com/paper/input-perturbations-for-adaptive-regulation |
Repo | |
Framework | |
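A schematic sketch of the kind of policy the paper studies: a certainty-equivalent feedback gain computed from current parameter estimates, plus a Gaussian input perturbation whose magnitude decays over time. The decay rate and the elided estimation step are illustrative assumptions, not the paper's construction.

```python
# Perturbed certainty-equivalent input (sketch; estimation step elided).
import numpy as np

def perturbed_input(K_hat, x, t, sigma0=1.0, rng=None):
    """K_hat: current feedback-gain estimate; x: state; t: time step."""
    rng = rng or np.random.default_rng(0)
    sigma_t = sigma0 / (t + 1) ** 0.25   # assumed decay rate, illustrative
    return K_hat @ x + sigma_t * rng.standard_normal(K_hat.shape[0])

u = perturbed_input(np.zeros((2, 4)), np.ones(4), t=10)
```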
Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition
Title | Pre-Trained Convolutional Neural Network Features for Facial Expression Recognition |
Authors | Aravind Ravi |
Abstract | Facial expression recognition has been an active area in computer vision, with application areas including animation, social robots, and personalized banking. In this study, we explore the problem of image classification for detecting facial expressions based on features extracted from pre-trained convolutional neural networks trained on the ImageNet database. Features are extracted and transferred to a Linear Support Vector Machine for classification. All experiments are performed on two publicly available datasets, JAFFE and CK+. The results show that representations learned from pre-trained networks for a task such as object recognition can be transferred and used for facial expression recognition. Furthermore, for a small dataset, using features from earlier layers of the VGG19 network provides better classification accuracy. Accuracies of 92.26% and 92.86% were achieved on the CK+ and JAFFE datasets, respectively. |
Tasks | Facial Expression Recognition, Image Classification, Object Recognition |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06387v1 |
http://arxiv.org/pdf/1812.06387v1.pdf | |
PWC | https://paperswithcode.com/paper/pre-trained-convolutional-neural-network |
Repo | |
Framework | |
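The pipeline is easy to reproduce in outline: extract features from an early block of a pre-trained VGG19 and feed them to a linear SVM. The layer cut-off, the added average pooling, and the random stand-in data below are assumptions made for the sketch.

```python
# VGG19 feature extraction + linear SVM (layer choice and pooling assumed).
import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.svm import LinearSVC

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:10].eval()

def extract(batch):
    """batch: (N, 3, 224, 224) preprocessed face crops."""
    with torch.no_grad():
        feats = vgg(batch)
        # Average-pool to keep the feature vector small (an added
        # simplification, not from the paper).
        return F.adaptive_avg_pool2d(feats, 4).flatten(1).numpy()

X = extract(torch.randn(16, 3, 224, 224))   # stand-in for real face images
y = [0, 1] * 8                              # stand-in expression labels
clf = LinearSVC().fit(X, y)
```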
Subspace Clustering by Block Diagonal Representation
Title | Subspace Clustering by Block Diagonal Representation |
Authors | Canyi Lu, Jiashi Feng, Zhouchen Lin, Tao Mei, Shuicheng Yan |
Abstract | This paper studies the subspace clustering problem. Given some data points approximately drawn from a union of subspaces, the goal is to group these data points into their underlying subspaces. Many subspace clustering methods have been proposed, among which sparse subspace clustering and low-rank representation are two representative ones. Despite their different motivations, we observe that many existing methods share the common block diagonal property, which possibly leads to correct clustering, yet with their proofs given case by case. In this work, we consider a general formulation and provide a unified theoretical guarantee of the block diagonal property; the block diagonal property of many existing methods falls out as a special case. Second, we observe that many existing methods approximate the block diagonal representation matrix using different structure priors, e.g., sparsity and low-rankness, which are indirect. We propose the first block-diagonal-matrix-induced regularizer for directly pursuing a block diagonal matrix. With this regularizer, we solve the subspace clustering problem by Block Diagonal Representation (BDR), which uses the block diagonal structure prior. The BDR model is nonconvex; we propose an alternating minimization solver and prove its convergence. Experiments on real datasets demonstrate the effectiveness of BDR. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09243v1 |
http://arxiv.org/pdf/1805.09243v1.pdf | |
PWC | https://paperswithcode.com/paper/subspace-clustering-by-block-diagonal |
Repo | |
Framework | |
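The key regularizer can be sketched directly: for an affinity built from the representation matrix Z, the sum of the k smallest eigenvalues of its Laplacian vanishes exactly when the affinity graph has at least k connected components, i.e. when Z is k-block-diagonal up to permutation. A minimal NumPy version (the alternating minimization solver is not shown):

```python
# Block diagonal regularizer: sum of the k smallest Laplacian eigenvalues.
import numpy as np

def block_diag_regularizer(Z, k):
    W = (np.abs(Z) + np.abs(Z.T)) / 2      # symmetric affinity from Z
    L = np.diag(W.sum(axis=1)) - W         # graph Laplacian
    eigvals = np.linalg.eigvalsh(L)        # ascending order
    return eigvals[:k].sum()               # 0 iff >= k connected components

Z = np.eye(6)                              # trivially block diagonal
print(block_diag_regularizer(Z, k=3))      # 0.0
```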
Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis
Title | Boosted Convolutional Neural Networks for Motor Imagery EEG Decoding with Multiwavelet-based Time-Frequency Conditional Granger Causality Analysis |
Authors | Yang Li, Mengying Lei, Xianrui Zhang, Weigang Cui, Yuzhu Guo, Ting-Wen Huang, Hua-Liang Wei |
Abstract | Decoding EEG signals of different mental states is a challenging task for brain-computer interfaces (BCIs) due to the nonstationarity of perceptual decision processes. This paper presents a novel boosted convolutional neural network (ConvNet) decoding scheme for motor imagery (MI) EEG signals, assisted by multiwavelet-based time-frequency (TF) causality analysis. Specifically, multiwavelet basis functions are first combined with the Geweke spectral measure to obtain high-resolution TF conditional Granger causality (CGC) representations, where a regularized orthogonal forward regression (ROFR) algorithm is adopted to detect a parsimonious model with good generalization performance. Causality images for the network input, preserving the time, frequency, and location information of connectivity, are then designed based on the TF-CGC distributions of alpha-band multichannel EEG signals. Boosted ConvNets are then constructed using spatio-temporal convolutions, together with advances in deep learning including cropping and boosting methods, to extract discriminative causality features and classify MI tasks. Our proposed approach outperforms the competition-winning algorithm, with a 12.15% increase in average accuracy and a 74.02% decrease in the associated inter-subject standard deviation, for the same binary classification on BCI competition IV dataset IIa. Experimental results indicate that boosted ConvNets with causality images work well in decoding MI-EEG signals and provide a promising framework for developing MI-BCI systems. |
Tasks | EEG, Eeg Decoding |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.10353v1 |
http://arxiv.org/pdf/1810.10353v1.pdf | |
PWC | https://paperswithcode.com/paper/boosted-convolutional-neural-networks-for |
Repo | |
Framework | |
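A hedged sketch of a spatio-temporal ConvNet operating on causality "images" follows; the input shape, depths, and kernel sizes are illustrative, and the cropping and boosting components are elided.

```python
# Spatio-temporal ConvNet sketch for causality-image inputs (shapes assumed).
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(1, 5)),   # temporal convolution
    nn.Conv2d(16, 16, kernel_size=(5, 1)),  # spatial/spectral convolution
    nn.ELU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),                       # binary MI classification
)
logits = net(torch.randn(8, 1, 22, 250))    # batch of 22 x 250 "images"
```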
An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls
Title | An Empirical Study towards Understanding How Deep Convolutional Nets Recognize Falls |
Authors | Yan Zhang, Heiko Neumann |
Abstract | Detecting unintended falls is essential for ambient intelligence and the healthcare of elderly people living alone. In recent years, deep convolutional nets have been widely used in human action analysis, based on which a number of fall detection methods have been proposed. Despite their highly effective performance, it is still not clear how the convolutional nets recognize falls. In this paper, instead of proposing a novel approach, we perform a systematic empirical study attempting to investigate the underlying fall recognition process. We propose four tasks to investigate, which involve five types of input modalities, seven net instances, and different training samples. The obtained quantitative and qualitative results reveal the patterns that the nets tend to learn, as well as several factors that can heavily influence fall recognition performance. We expect our conclusions to be helpful in devising better deep learning solutions for fall detection systems. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01923v1 |
http://arxiv.org/pdf/1812.01923v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-towards-understanding-how |
Repo | |
Framework | |
Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data
Title | Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data |
Authors | Martin Wistuba, Ambrish Rawat |
Abstract | We introduce a new Bayesian multi-class support vector machine by formulating a pseudo-likelihood for a multi-class hinge loss in the form of a location-scale mixture of Gaussians. We derive a variational-inference-based training objective for gradient-based learning. Additionally, we employ an inducing point approximation which scales inference to large data sets. Furthermore, we develop hybrid Bayesian neural networks that combine standard deep learning components with the proposed model to enable learning for unstructured data. We provide empirical evidence that our model outperforms the competitor methods with respect to both training time and accuracy in classification experiments on 68 structured and two unstructured data sets. Finally, we highlight the key capability of our model in yielding prediction uncertainty for classification by demonstrating its effectiveness in the tasks of large-scale active learning and detection of adversarial images. |
Tasks | Active Learning |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02659v1 |
http://arxiv.org/pdf/1806.02659v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-multi-class-bayesian-support-vector |
Repo | |
Framework | |
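For orientation, the multi-class hinge loss at the heart of the pseudo-likelihood can be written as below. A Crammer-Singer-style variant is assumed here; the location-scale mixture construction and the variational inference objective are beyond this snippet.

```python
# Crammer-Singer-style multi-class hinge (assumed form, for orientation).
import numpy as np

def multiclass_hinge(scores, y):
    """scores: (N, C) latent function values; y: (N,) integer class labels."""
    n = np.arange(len(y))
    margins = 1.0 + scores - scores[n, y][:, None]   # margin vs. true class
    margins[n, y] = 0.0                              # no penalty on the truth
    return np.maximum(margins, 0.0).max(axis=1).sum()

print(multiclass_hinge(np.array([[2.0, 0.1, -1.0]]), np.array([0])))  # 0.0
```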
Multi-Source Fusion Operations in Subjective Logic
Title | Multi-Source Fusion Operations in Subjective Logic |
Authors | Rens Wouter van der Heijden, Henning Kopp, Frank Kargl |
Abstract | The purpose of multi-source fusion is to combine information from more than two evidence sources, or subjective opinions from multiple actors. For subjective logic, a number of different fusion operators have been proposed, each matching a fusion scenario with different assumptions. However, not all of these operators are associative, and therefore multi-source fusion is not well-defined for these settings. In this paper, we address this challenge, and define multi-source fusion for weighted belief fusion (WBF) and consensus & compromise fusion (CCF). For WBF, we show the definition to be equivalent to the intuitive formulation under the bijective mapping between subjective logic and Dirichlet evidence PDFs. For CCF, since there is no independent generalization, we show that the resulting multi-source fusion produces valid opinions, and explain why our generalization is sound. For completeness, we also provide corrections to previous results for averaging and cumulative belief fusion (ABF and CBF), as well as belief constraint fusion (BCF), which is an extension of Dempster’s rule. With our generalizations of fusion operators, fusing information from multiple sources is now well-defined for all different fusion types defined in subjective logic. This enables wider applicability of subjective logic in applications where multiple actors interact. |
Tasks | |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01388v1 |
http://arxiv.org/pdf/1805.01388v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-source-fusion-operations-in-subjective |
Repo | |
Framework | |
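To see why some fusion operators extend trivially to many sources while others do not, here is a sketch of multi-source cumulative belief fusion (CBF) using the standard bijection between binomial opinions and Beta evidence: because CBF is associative, fusing n sources reduces to summing their evidence. WBF and CCF, the paper's focus, need their own multi-source definitions precisely because this reduction fails for them. The prior weight W=2 is the usual binomial convention.

```python
# Multi-source cumulative fusion via the opinion <-> evidence bijection.
def opinion_to_evidence(b, d, u, W=2.0):
    # Binomial opinion (b, d, u), b + d + u = 1 -> Beta evidence (r, s).
    return W * b / u, W * d / u

def evidence_to_opinion(r, s, W=2.0):
    total = r + s + W
    return r / total, s / total, W / total

def cumulative_fuse(opinions):
    # Associativity of CBF makes multi-source fusion a simple evidence sum.
    r_sum = s_sum = 0.0
    for b, d, u in opinions:
        r, s = opinion_to_evidence(b, d, u)
        r_sum, s_sum = r_sum + r, s_sum + s
    return evidence_to_opinion(r_sum, s_sum)

print(cumulative_fuse([(0.6, 0.2, 0.2), (0.3, 0.3, 0.4)]))
```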
Kickstarting Deep Reinforcement Learning
Title | Kickstarting Deep Reinforcement Learning |
Authors | Simon Schmitt, Jonathan J. Hudson, Augustin Žídek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Küttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami |
Abstract | We present a method for using previously-trained ‘teacher’ agents to kickstart the training of a new ‘student’ agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design. We also show that the same kickstarting pipeline can allow a single student agent to leverage multiple ‘expert’ teachers which specialize on individual tasks. In this setting kickstarting yields surprisingly large gains, with the kickstarted agent matching the performance of an agent trained from scratch in almost 10x fewer steps, and surpassing its final performance by 42 percent. Kickstarting is conceptually simple and can easily be incorporated into reinforcement learning experiments. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03835v1 |
http://arxiv.org/pdf/1803.03835v1.pdf | |
PWC | https://paperswithcode.com/paper/kickstarting-deep-reinforcement-learning |
Repo | |
Framework | |
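A minimal sketch of the kickstarting objective: the usual RL loss plus a distillation term matching the student's policy to the teacher's, weighted by a coefficient lambda_t that is annealed over training so the student can eventually depart from, and surpass, the teacher. The KL form and the schedule are assumptions of this sketch, not the paper's exact formulation.

```python
# RL loss + annealed teacher-distillation term (kickstarting sketch).
import torch
import torch.nn.functional as F

def kickstart_loss(student_logits, teacher_logits, rl_loss, lambda_t):
    """student_logits, teacher_logits: (batch, n_actions) policy logits."""
    distill = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # lambda_t is annealed toward 0 so the student can surpass the teacher.
    return rl_loss + lambda_t * distill

loss = kickstart_loss(torch.randn(8, 4), torch.randn(8, 4),
                      rl_loss=torch.tensor(1.0), lambda_t=0.5)
```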
Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks
Title | Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks |
Authors | He Zhang, Benjamin S. Riggan, Shuowen Hu, Nathaniel J. Short, Vishal M. Patel |
Abstract | The large domain discrepancy between faces captured in the polarimetric (or conventional) thermal and visible domains makes cross-domain face verification a highly challenging problem for human examiners as well as computer vision algorithms. Previous approaches utilize either a two-step procedure (visible feature estimation and visible image reconstruction) or an input-level fusion technique, where different Stokes images are concatenated and used as a multi-channel input to synthesize the visible image given the corresponding polarimetric signatures. Although these methods have yielded improvements, we argue that input-level fusion alone may not be sufficient to realize the full potential of the available Stokes images. We propose a Generative Adversarial Network (GAN) based multi-stream feature-level fusion technique to synthesize high-quality visible images from polarimetric thermal images. The proposed network consists of a generator sub-network, constructed using an encoder-decoder network based on dense residual blocks, and a multi-scale discriminator sub-network. The generator network is trained by optimizing an adversarial loss in addition to a perceptual loss and an identity-preserving loss to enable photo-realistic generation of visible images while preserving discriminative characteristics. An extended dataset consisting of polarimetric thermal facial signatures of 111 subjects is also introduced. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance. Code will be made available at https://github.com/hezhangsprinter. |
Tasks | Face Generation, Face Verification, Image Generation, Image Reconstruction |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.05155v1 |
http://arxiv.org/pdf/1812.05155v1.pdf | |
PWC | https://paperswithcode.com/paper/synthesis-of-high-quality-visible-faces-from |
Repo | |
Framework | |
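A schematic generator objective matching the description above, combining adversarial, perceptual, and identity-preserving terms; the LSGAN-style adversarial form, the feature extractors, and the weights are stand-ins, not the released code.

```python
# Generator loss sketch: adversarial + perceptual + identity terms.
import torch

def generator_loss(d_fake, feat, id_net, fake, real, lp=1.0, li=0.1):
    """d_fake: discriminator scores on fakes; feat, id_net: fixed nets."""
    adv = torch.mean((d_fake - 1.0) ** 2)                    # LSGAN-style
    perc = torch.mean((feat(fake) - feat(real)) ** 2)        # perceptual
    ident = torch.mean((id_net(fake) - id_net(real)) ** 2)   # identity
    return adv + lp * perc + li * ident

feat = id_net = lambda x: x   # identity stand-ins, just for the demo
loss = generator_loss(torch.zeros(4, 1), feat, id_net,
                      torch.rand(4, 3, 8, 8), torch.rand(4, 3, 8, 8))
```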