Paper Group ANR 813
Clustering of Data with Missing Entries. Improving latent variable descriptiveness with AutoGen. Knowledge-based Word Sense Disambiguation using Topic Models. Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events. Robust Compressive Phase Retrieval via Deep Generative Priors. End-to-End Polyphonic Sound Event De …
Clustering of Data with Missing Entries
Title | Clustering of Data with Missing Entries |
Authors | Sunrita Poddar, Mathews Jacob |
Abstract | The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that provides good clustering even in the presence of missing data. The proposed technique solves an $\ell_0$ fusion penalty based optimization problem to recover the clusters. We theoretically analyze the conditions needed for the successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets, and is observed to perform well in the presence of large fractions of missing entries. |
Tasks | |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.01455v1 |
http://arxiv.org/pdf/1801.01455v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-of-data-with-missing-entries |
Repo | |
Framework | |
Improving latent variable descriptiveness with AutoGen
Title | Improving latent variable descriptiveness with AutoGen |
Authors | Alex Mansbridge, Roberto Fierimonte, Ilya Feige, David Barber |
Abstract | Powerful generative models, particularly in Natural Language Modelling, are commonly trained by maximizing a variational lower bound on the data log likelihood. These models often suffer from poor use of their latent variable, with ad-hoc annealing factors used to encourage retention of information in the latent variable. We discuss an alternative and general approach to latent variable modelling, based on an objective that combines the data log likelihood as well as the likelihood of a perfect reconstruction through an autoencoder. Tying these together ensures by design that the latent variable captures information about the observations, whilst retaining the ability to generate well. Interestingly, though this approach is a priori unrelated to VAEs, the lower bound attained is identical to the standard VAE bound but with the addition of a simple pre-factor; thus, providing a formal interpretation of the commonly used, ad-hoc pre-factors in training VAEs. |
Tasks | Language Modelling |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04480v1 |
http://arxiv.org/pdf/1806.04480v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-latent-variable-descriptiveness |
Repo | |
Framework | |
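The relationship described in the abstract above can be written out as a short worked equation. This is only a sketch under stated assumptions: it uses the usual VAE notation (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, prior $p(z)$), and the symbol $\lambda$ for the pre-factor is hypothetical. The abstract does not say which term carries the pre-factor; here it is assumed to multiply the reconstruction term.

```latex
% Standard VAE lower bound on \log p_\theta(x)
\mathcal{L}_{\mathrm{VAE}}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)

% Same bound with an assumed pre-factor \lambda > 0 on the reconstruction term
\mathcal{L}_{\lambda}(x)
  = \lambda \, \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

Setting $\lambda = 1$ recovers the standard bound. Dividing $\mathcal{L}_{\lambda}$ by $\lambda$ leaves the maximizer unchanged and moves the factor onto the KL term as $1/\lambda$, which is the form in which annealing-style weights are usually written.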
Knowledge-based Word Sense Disambiguation using Topic Models
Title | Knowledge-based Word Sense Disambiguation using Topic Models |
Authors | Devendra Singh Chaplot, Ruslan Salakhutdinov |
Abstract | Word Sense Disambiguation is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to utilize the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further utilize the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Word WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin. |
Tasks | Topic Models, Word Sense Disambiguation |
Published | 2018-01-05 |
URL | http://arxiv.org/abs/1801.01900v1 |
http://arxiv.org/pdf/1801.01900v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-based-word-sense-disambiguation |
Repo | |
Framework | |
Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events
Title | Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events |
Authors | Danilo Comminiello, Marco Lella, Simone Scardapane, Aurelio Uncini |
Abstract | Learning from data in the quaternion domain enables us to exploit internal dependencies of 4D signals and treat them as a single entity. One class of data that is well suited to quaternion-valued processing is 3D acoustic signals in their spherical harmonics decomposition. In this paper, we address the problem of localizing and detecting sound events in the spatial sound field by using quaternion-valued data processing. In particular, we consider the spherical harmonic components of the signals captured by a first-order ambisonic microphone and process them by using a quaternion convolutional neural network. Experimental results show that the proposed approach exploits the correlated nature of the ambisonic signals, thus improving accuracy in 3D sound event detection and localization. |
Tasks | Sound Event Detection |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.06811v1 |
http://arxiv.org/pdf/1812.06811v1.pdf | |
PWC | https://paperswithcode.com/paper/quaternion-convolutional-neural-networks-for |
Repo | |
Framework | |
Robust Compressive Phase Retrieval via Deep Generative Priors
Title | Robust Compressive Phase Retrieval via Deep Generative Priors |
Authors | Fahad Shamshad, Ali Ahmed |
Abstract | This paper proposes a new framework to regularize the highly ill-posed and non-linear phase retrieval problem through deep generative priors using a simple gradient descent algorithm. We experimentally show the effectiveness of the proposed algorithm for random Gaussian measurements (practically relevant in imaging through scattering media) and Fourier-friendly measurements (relevant in optical setups). We demonstrate that the proposed approach achieves impressive results compared with traditional hand-engineered priors, including sparsity and denoising frameworks, in terms of the number of measurements and robustness against noise. Finally, we show the effectiveness of the proposed approach on a real transmission matrix dataset in an actual application of multiple scattering media imaging. |
Tasks | Denoising |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05854v1 |
http://arxiv.org/pdf/1808.05854v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-compressive-phase-retrieval-via-deep |
Repo | |
Framework | |
End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input
Title | End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input |
Authors | Emre Çakır, Tuomas Virtanen |
Abstract | Sound event detection systems typically consist of two stages: extracting hand-crafted features from the raw audio waveform, and learning a mapping between these features and the target sound events using a classifier. Recently, the focus of sound event detection research has mostly shifted to the latter stage, using standard features such as the mel spectrogram as the input for classifiers such as deep neural networks. In this work, we use an end-to-end approach and propose to combine these two stages in a single deep neural network classifier. The feature extraction over the raw waveform is conducted by a feedforward layer block, whose parameters are initialized to extract the time-frequency representations. The feature extraction parameters are updated during training, resulting in a representation that is optimized for the specific task. This feature extraction block is followed by (and jointly trained with) a convolutional recurrent network, which has recently given state-of-the-art results in many sound recognition tasks. The proposed system does not outperform a convolutional recurrent network with fixed hand-crafted features. The final magnitude spectrum characteristics of the feature extraction block parameters indicate that the most relevant information for the given task is contained in the 0-3 kHz frequency range, and this is also supported by the empirical results on the SED performance. |
Tasks | Sound Event Detection |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03647v1 |
http://arxiv.org/pdf/1805.03647v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-polyphonic-sound-event-detection |
Repo | |
Framework | |
Prediction of final infarct volume from native CT perfusion and treatment parameters using deep learning
Title | Prediction of final infarct volume from native CT perfusion and treatment parameters using deep learning |
Authors | David Robben, Anna M. M. Boers, Henk A. Marquering, Lucianne L. C. M. Langezaal, Yvo B. W. E. M. Roos, Robert J. van Oostenbrugge, Wim H. van Zwam, Diederik W. J. Dippel, Charles B. L. M. Majoie, Aad van der Lugt, Robin Lemmens, Paul Suetens |
Abstract | CT Perfusion (CTP) imaging has gained importance in the diagnosis of acute stroke. Conventional perfusion analysis performs a deconvolution of the measurements and thresholds the perfusion parameters to determine the tissue status. We pursue a data-driven and deconvolution-free approach, where a deep neural network learns to predict the final infarct volume directly from the native CTP images and metadata such as the time parameters and treatment. This would allow clinicians to simulate various treatments and gain insight into predicted tissue status over time. We demonstrate on a multicenter dataset that our approach is able to predict the final infarct and effectively uses the metadata. An ablation study shows that using the native CTP measurements instead of the deconvolved measurements improves the prediction. |
Tasks | |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02496v2 |
https://arxiv.org/pdf/1812.02496v2.pdf | |
PWC | https://paperswithcode.com/paper/prediction-of-final-infarct-volume-from |
Repo | |
Framework | |
Drift Theory in Continuous Search Spaces: Expected Hitting Time of the (1+1)-ES with 1/5 Success Rule
Title | Drift Theory in Continuous Search Spaces: Expected Hitting Time of the (1+1)-ES with 1/5 Success Rule |
Authors | Youhei Akimoto, Anne Auger, Tobias Glasmachers |
Abstract | This paper explores the use of the standard approach for proving runtime bounds in discrete domains—often referred to as drift analysis—in the context of optimization on a continuous domain. Using this framework we analyze the (1+1) Evolution Strategy with one-fifth success rule on the sphere function. To deal with potential functions that are not lower-bounded, we formulate novel drift theorems. We then use the theorems to prove bounds on the expected hitting time to reach a certain target fitness in finite dimension $d$. The bounds are akin to linear convergence. We then study the dependency of the different terms on $d$ proving a convergence rate dependency of $\Theta(1/d)$. Our results constitute the first non-asymptotic analysis for the algorithm considered as well as the first explicit application of drift analysis to a randomized search heuristic with continuous domain. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03209v4 |
http://arxiv.org/pdf/1802.03209v4.pdf | |
PWC | https://paperswithcode.com/paper/drift-theory-in-continuous-search-spaces |
Repo | |
Framework | |
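The algorithm analyzed in the entry above, the (1+1)-ES with one-fifth success rule on the sphere function, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names, the step-size factor `alpha = 1.5`, the starting point, and the iteration budget are all assumptions. The update used here (multiply the step size by `alpha` on success, by `alpha ** -0.25` on failure) is one common textbook parametrization, chosen so that the step size is stationary when one in five candidates succeeds.

```python
import random


def sphere(x):
    """Sphere function: sum of squared coordinates, minimized at the origin."""
    return sum(v * v for v in x)


def one_plus_one_es(f, x0, sigma0=1.0, iterations=2000, alpha=1.5, seed=0):
    """(1+1)-ES with a one-fifth success rule (illustrative parametrization).

    alpha and the alpha ** -0.25 decrease are assumed values, not taken
    from the paper.
    """
    rng = random.Random(seed)
    x, fx, sigma = list(x0), f(x0), sigma0
    for _ in range(iterations):
        # Sample one candidate from an isotropic Gaussian around the parent.
        candidate = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        fc = f(candidate)
        if fc <= fx:
            # Success: keep the candidate and enlarge the step size.
            x, fx = candidate, fc
            sigma *= alpha
        else:
            # Failure: keep the parent and shrink the step size slightly.
            sigma *= alpha ** -0.25
    return x, fx, sigma


if __name__ == "__main__":
    x, fx, sigma = one_plus_one_es(sphere, [1.0] * 5)
    print("final fitness:", fx, "final step size:", sigma)
```

With this parametrization, the logarithm of the step size has zero expected change exactly when the success probability is 1/5, since (1/5) ln(alpha) - (4/5)(1/4) ln(alpha) = 0.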
End-to-end Networks for Supervised Single-channel Speech Separation
Title | End-to-end Networks for Supervised Single-channel Speech Separation |
Authors | Shrikant Venkataramani, Paris Smaragdis |
Abstract | The performance of single channel source separation algorithms has improved greatly in recent times with the development and deployment of neural networks. However, many such networks continue to operate on the magnitude spectrogram of a mixture, and produce an estimate of source magnitude spectrograms, to perform source separation. In this paper, we interpret these steps as additional neural network layers and propose an end-to-end source separation network that allows us to estimate the separated speech waveform by operating directly on the raw waveform of the mixture. Furthermore, we also propose the use of masking based end-to-end separation networks that jointly optimize the mask and the latent representations of the mixture waveforms. These networks show a significant improvement in separation performance compared to existing architectures in our experiments. To train these end-to-end models, we investigate the use of composite cost functions that are derived from objective evaluation metrics as measured on waveforms. We present subjective listening test results that demonstrate the improvement attained by using masking based end-to-end networks and also reveal insights into the performance of these cost functions for end-to-end source separation. |
Tasks | Speech Separation |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02568v1 |
http://arxiv.org/pdf/1810.02568v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-networks-for-supervised-single |
Repo | |
Framework | |
SUNLayer: Stable denoising with generative networks
Title | SUNLayer: Stable denoising with generative networks |
Authors | Dustin G. Mixon, Soledad Villar |
Abstract | It has been experimentally established that deep neural networks can be used to produce good generative models for real world data. It has also been established that such generative models can be exploited to solve classical inverse problems like compressed sensing and super resolution. In this work we focus on the classical signal processing problem of image denoising. We propose a theoretical setting that uses spherical harmonics to identify what mathematical properties of the activation functions will allow signal denoising with local methods. |
Tasks | Denoising, Image Denoising, Super-Resolution |
Published | 2018-03-25 |
URL | http://arxiv.org/abs/1803.09319v1 |
http://arxiv.org/pdf/1803.09319v1.pdf | |
PWC | https://paperswithcode.com/paper/sunlayer-stable-denoising-with-generative |
Repo | |
Framework | |
A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification
Title | A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification |
Authors | Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li |
Abstract | A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of the traditional GMM training and Supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate an utterance-level fixed-dimensional vector representation. Unlike the conventional methods, our new approach provides an end-to-end learning framework, where the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the results reveal that our proposed dictionary encoding layer achieves a significant error reduction compared with simple average pooling. |
Tasks | Language Identification |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00385v1 |
http://arxiv.org/pdf/1804.00385v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-learnable-dictionary-encoding-layer |
Repo | |
Framework | |
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
Title | End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction |
Authors | Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey |
Abstract | This paper proposes an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its inverse are represented as layers within a deep network. Previous approaches, rather than computing a loss on the reconstructed signal, used a surrogate loss based on the target STFT magnitudes. This ignores reconstruction error introduced by phase inconsistency. In our approach, the loss function is directly defined on the reconstructed signals, which are optimized for best separation. In addition, we train through unfolded iterations of a phase reconstruction algorithm, represented as a series of STFT and inverse STFT layers. While mask values are typically limited to lie between zero and one for approaches using the mixture phase for reconstruction, this limitation is less relevant if the estimated magnitudes are to be used together with phase reconstruction. We thus propose several novel activation functions for the output layer of the T-F masking, to allow mask values beyond one. On the publicly-available wsj0-2mix dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new possibilities for deep learning based phase reconstruction and representing a fundamental progress towards solving the notoriously-hard cocktail party problem. |
Tasks | Speech Separation |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10204v1 |
http://arxiv.org/pdf/1804.10204v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-speech-separation-with-unfolded |
Repo | |
Framework | |
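The entry above reports results in scale-invariant signal-to-distortion ratio (SI-SDR). As a reading aid, the metric can be sketched as follows, using its widely used definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The function name `si_sdr` and the pure-Python list representation are assumptions for illustration, and this is not the evaluation code behind the reported numbers.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")
    # Optimal scaling of the reference that best explains the estimate.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]
    # Whatever the scaled reference does not explain counts as distortion.
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        # Estimate is orthogonal to the reference: no target energy at all.
        return -math.inf
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [1.0, 0.1, -1.0, 0.1]
    print(si_sdr(estimate, reference))
    # Rescaling the estimate should leave the value essentially unchanged.
    print(si_sdr([2.0 * e for e in estimate], reference))
```

Because the reference is rescaled to match the estimate before the comparison, multiplying the estimate by any nonzero constant leaves the value unchanged, which is the property that distinguishes SI-SDR from plain SDR.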
Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation
Title | Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation |
Authors | Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, Bill Byrne |
Abstract | SGNMT is a decoding platform for machine translation which allows pairing various modern neural models of translation with different kinds of constraints and symbolic models. In this paper, we describe three use cases in which SGNMT is currently playing an active role: (1) teaching, as SGNMT is being used for course work and student theses in the MPhil in Machine Learning, Speech and Language Technology at the University of Cambridge, (2) research, as most of the research work of the Cambridge MT group is based on SGNMT, and (3) technology transfer, as we show how SGNMT is helping to transfer research findings from the laboratory to industry, e.g. into a product of SDL plc. |
Tasks | Machine Translation |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07204v1 |
http://arxiv.org/pdf/1803.07204v1.pdf | |
PWC | https://paperswithcode.com/paper/why-not-be-versatile-applications-of-the |
Repo | |
Framework | |
Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
Title | Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method |
Authors | Jinyue Su, Jiacheng Xu, Xipeng Qiu, Xuanjing Huang |
Abstract | Generating plausible and fluent sentences with desired properties has long been a challenge. Most recent work uses recurrent neural networks (RNNs) and their variants to predict the following words given the previous sequence and a target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method in generating plausible and diverse sentences. |
Tasks | |
Published | 2018-02-25 |
URL | http://arxiv.org/abs/1802.08970v1 |
http://arxiv.org/pdf/1802.08970v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-discriminator-in-sentence |
Repo | |
Framework | |
Beyond Gröbner Bases: Basis Selection for Minimal Solvers
Title | Beyond Gröbner Bases: Basis Selection for Minimal Solvers |
Authors | Viktor Larsson, Magnus Oskarsson, Kalle Åström, Alge Wallis, Zuzana Kukelova, Tomas Pajdla |
Abstract | Many computer vision applications require robust estimation of the underlying geometry, in terms of camera motion and 3D structure of the scene. These robust methods often rely on running minimal solvers in a RANSAC framework. In this paper we show how we can make polynomial solvers based on the action matrix method faster, by careful selection of the monomial bases. These monomial bases have traditionally been based on a Gröbner basis for the polynomial ideal. Here we describe how we can enumerate all such bases in an efficient way. We also show that going beyond Gröbner bases leads to more efficient solvers in many cases. We present a novel basis sampling scheme that we evaluate on a number of problems. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04360v1 |
http://arxiv.org/pdf/1803.04360v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-grobner-bases-basis-selection-for-1 |
Repo | |
Framework | |