October 17, 2019

Paper Group ANR 813

Clustering of Data with Missing Entries


Title	Clustering of Data with Missing Entries
Authors	Sunrita Poddar, Mathews Jacob
Abstract	The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that provides good clustering even in the presence of missing data. The proposed technique solves an $\ell_0$ fusion penalty based optimization problem to recover the clusters. We theoretically analyze the conditions needed for the successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets and is observed to perform well in the presence of large fractions of missing entries.
Tasks
Published	2018-01-03
URL	http://arxiv.org/abs/1801.01455v1
PDF	http://arxiv.org/pdf/1801.01455v1.pdf
PWC	https://paperswithcode.com/paper/clustering-of-data-with-missing-entries
Repo
Framework
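A minimal sketch of the kind of relaxed objective the abstract describes: a data-fidelity term computed only over observed entries, plus a saturating, non-convex penalty on the pairwise distances between the per-point center estimates (the rows of `U`). The penalty phi(t) = 1 - exp(-t^2/sigma^2) is a hypothetical choice for illustration, not necessarily the one used in the paper; it tends to the $\ell_0$ indicator [t != 0] as sigma goes to 0. The code only evaluates the objective; the paper's solver is not reproduced.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing entry


def saturating_penalty(t: float, sigma: float = 1.0) -> float:
    """Hypothetical saturating penalty: about t^2/sigma^2 near 0, tends to 1 for large t."""
    return 1.0 - math.exp(-(t * t) / (sigma * sigma))


def fusion_objective(U: List[List[float]], X: List[Row], lam: float, sigma: float = 1.0) -> float:
    """Data term over observed entries of X plus lam * sum of pairwise fusion penalties on rows of U."""
    data = sum(
        (u - x) ** 2
        for urow, xrow in zip(U, X)
        for u, x in zip(urow, xrow)
        if x is not None  # missing entries contribute nothing to the data term
    )
    fusion = 0.0
    for i in range(len(U)):
        for j in range(i + 1, len(U)):
            dist = math.dist(U[i], U[j])  # Euclidean distance between two center estimates
            fusion += saturating_penalty(dist, sigma)
    return data + lam * fusion
```

Two points that agree on their observed coordinates can share one center at zero cost, while a far-apart pair costs at most `lam` because the penalty saturates.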

Improving latent variable descriptiveness with AutoGen


Title	Improving latent variable descriptiveness with AutoGen
Authors	Alex Mansbridge, Roberto Fierimonte, Ilya Feige, David Barber
Abstract	Powerful generative models, particularly in Natural Language Modelling, are commonly trained by maximizing a variational lower bound on the data log likelihood. These models often suffer from poor use of their latent variable, with ad-hoc annealing factors used to encourage retention of information in the latent variable. We discuss an alternative and general approach to latent variable modelling, based on an objective that combines the data log likelihood with the likelihood of a perfect reconstruction through an autoencoder. Tying these together ensures by design that the latent variable captures information about the observations, whilst retaining the ability to generate well. Interestingly, though this approach is a priori unrelated to VAEs, the lower bound attained is identical to the standard VAE bound but with the addition of a simple pre-factor, thus providing a formal interpretation of the commonly used, ad-hoc pre-factors in training VAEs.
Tasks	Language Modelling
Published	2018-06-12
URL	http://arxiv.org/abs/1806.04480v1
PDF	http://arxiv.org/pdf/1806.04480v1.pdf
PWC	https://paperswithcode.com/paper/improving-latent-variable-descriptiveness
Repo
Framework
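The abstract states that the AutoGen bound equals the standard VAE bound with an extra pre-factor. Below is a sketch of such a weighted bound for a diagonal-Gaussian posterior and a standard-normal prior. Placing the factor (1 + m) on the reconstruction log-likelihood, with `m` the number of tied reconstructions, is an assumption about the form here, so check the paper before relying on it. Setting `m = 0` recovers the usual ELBO.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def weighted_elbo(recon_loglik: float, mu: Sequence[float], logvar: Sequence[float], m: int = 0) -> float:
    """ELBO with the reconstruction term scaled by an assumed pre-factor (1 + m)."""
    return (1 + m) * recon_loglik - gaussian_kl(mu, logvar)
```

Dividing the weighted bound by (1 + m) shows it is equivalent to down-weighting the KL term by 1/(1 + m), which is how a pre-factor of this kind relates to the ad-hoc annealing factors mentioned in the abstract.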

Knowledge-based Word Sense Disambiguation using Topic Models


Title	Knowledge-based Word Sense Disambiguation using Topic Models
Authors	Devendra Singh Chaplot, Ruslan Salakhutdinov
Abstract	Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing that is particularly challenging and useful in the unsupervised setting, where all the words in any given text need to be disambiguated without using any labeled data. Typically, WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to use the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further use the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Word WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.
Tasks	Topic Models, Word Sense Disambiguation
Published	2018-01-05
URL	http://arxiv.org/abs/1801.01900v1
PDF	http://arxiv.org/pdf/1801.01900v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-based-word-sense-disambiguation
Repo
Framework
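The abstract places a logistic-normal prior on each document's distribution over synsets. A minimal sketch of drawing from such a prior: sample a Gaussian vector and map it through a softmax. A diagonal covariance is assumed here for brevity; the general logistic-normal allows an arbitrary covariance matrix, and the paper's parametrization may differ.

```python
import math
import random
from typing import List, Sequence


def sample_logistic_normal(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> List[float]:
    """Draw theta = softmax(eta) with eta_k ~ N(mu_k, sigma_k^2) independently (diagonal covariance)."""
    eta = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    shift = max(eta)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(e - shift) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `sample_logistic_normal([0.0, 0.5, -1.0], [1.0, 1.0, 1.0], random.Random(0))` returns a probability vector over three hypothetical synsets.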

Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events


Title	Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events
Authors	Danilo Comminiello, Marco Lella, Simone Scardapane, Aurelio Uncini
Abstract	Learning from data in the quaternion domain enables us to exploit internal dependencies of 4D signals and to treat them as a single entity. 3D acoustic signals in their spherical harmonics decomposition are a model particularly well suited to quaternion-valued data processing. In this paper, we address the problem of localizing and detecting sound events in the spatial sound field by using quaternion-valued data processing. In particular, we consider the spherical harmonic components of the signals captured by a first-order ambisonic microphone and process them using a quaternion convolutional neural network. Experimental results show that the proposed approach exploits the correlated nature of the ambisonic signals, thus improving accuracy in 3D sound event detection and localization.
Tasks	Sound Event Detection
Published	2018-12-17
URL	http://arxiv.org/abs/1812.06811v1
PDF	http://arxiv.org/pdf/1812.06811v1.pdf
PWC	https://paperswithcode.com/paper/quaternion-convolutional-neural-networks-for
Repo
Framework
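In quaternion neural networks, the real-valued multiply inside a convolution or dense unit is replaced by the Hamilton product, so one quaternion weight acts jointly on all four components of an input. A minimal sketch follows. Packing the four first-order ambisonic channels into one quaternion per input position is an assumption for illustration; the paper's exact layer layout is not reproduced.

```python
from typing import Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (real, i, j, k)


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_unit(weights: Sequence[Quaternion], inputs: Sequence[Quaternion], bias: Quaternion) -> Quaternion:
    """One quaternion 'neuron': sum_k w_k * x_k + b, with * the Hamilton product."""
    acc = list(bias)
    for w, x in zip(weights, inputs):
        prod = hamilton(w, x)
        acc = [a + p for a, p in zip(acc, prod)]
    return tuple(acc)
```

Non-commutativity (ij = k but ji = -k) lets a single quaternion weight mix the four channels in a structured way with 4 parameters instead of the 16 independent weights a real-valued 4-by-4 map would need.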

Robust Compressive Phase Retrieval via Deep Generative Priors


Title	Robust Compressive Phase Retrieval via Deep Generative Priors
Authors	Fahad Shamshad, Ali Ahmed
Abstract	This paper proposes a new framework to regularize the highly ill-posed and non-linear phase retrieval problem through deep generative priors using a simple gradient descent algorithm. We experimentally show the effectiveness of the proposed algorithm for random Gaussian measurements (practically relevant in imaging through scattering media) and Fourier-friendly measurements (relevant in optical setups). We demonstrate that the proposed approach achieves impressive results compared with traditional hand-engineered priors, including sparsity and denoising frameworks, in terms of the number of measurements required and robustness against noise. Finally, we show the effectiveness of the proposed approach on a real transmission matrix dataset in an actual application of multiple scattering media imaging.
Tasks	Denoising
Published	2018-08-17
URL	http://arxiv.org/abs/1808.05854v1
PDF	http://arxiv.org/pdf/1808.05854v1.pdf
PWC	https://paperswithcode.com/paper/robust-compressive-phase-retrieval-via-deep
Repo
Framework
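A toy sketch of the idea in the abstract: optimize the latent code z of a generator G by gradient descent so that the magnitudes |A G(z)| match the measurements y. To keep it self-contained, G(z) = Wz is a hypothetical linear "generator" and A is a real Gaussian matrix; the paper uses a trained deep generative model and its own measurement models. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def loss_and_grad(z: Sequence[float], A: Matrix, W: Matrix, y: Sequence[float]) -> Tuple[float, List[float]]:
    """Loss sum_i (|a_i . G(z)| - y_i)^2 and its (sub)gradient with respect to z."""
    x = matvec(W, z)   # hypothetical linear generator G(z) = W z
    ax = matvec(A, x)  # linear measurements before taking magnitudes
    residual = [abs(v) - t for v, t in zip(ax, y)]
    loss = sum(r * r for r in residual)
    # d/dv (|v| - t)^2 = 2 (|v| - t) sign(v); then chain rule back through A and W
    coef = [2.0 * r * (1.0 if v >= 0 else -1.0) for r, v in zip(residual, ax)]
    grad_z = matvec(transpose(W), matvec(transpose(A), coef))
    return loss, grad_z


def recover(A: Matrix, W: Matrix, y: Sequence[float], steps: int = 300, lr: float = 0.005, seed: int = 0):
    """Plain gradient descent on the latent code from a random start; returns (z, loss history)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(z, A, W, y)
        history.append(loss)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z, history


def make_toy_problem(n: int = 8, k: int = 2, m: int = 20, seed: int = 1):
    """Random W (n x k), Gaussian A (m x n) scaled by 1/sqrt(m), and magnitude-only data y."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
    z_true = [rng.gauss(0.0, 1.0) for _ in range(k)]
    y = [abs(v) for v in matvec(A, matvec(W, z_true))]
    return A, W, y, z_true
```

Because only magnitudes are observed, z and -z give the same loss for this linear generator, so the sketch can at best recover the latent code up to a global sign.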

End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input


Title	End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input
Authors	Emre Çakır, Tuomas Virtanen
Abstract	Sound event detection (SED) systems typically consist of two stages: extracting hand-crafted features from the raw audio waveform, and learning a mapping between these features and the target sound events using a classifier. Recently, the focus of SED research has mostly shifted to the latter stage, using standard features such as the mel spectrogram as the input for classifiers such as deep neural networks. In this work, we use an end-to-end approach and propose to combine these two stages in a single deep neural network classifier. Feature extraction over the raw waveform is conducted by a feedforward layer block whose parameters are initialized to extract time-frequency representations. The feature extraction parameters are updated during training, resulting in a representation that is optimized for the specific task. This feature extraction block is followed by (and jointly trained with) a convolutional recurrent network, which has recently given state-of-the-art results in many sound recognition tasks. The proposed system does not outperform a convolutional recurrent network with fixed hand-crafted features. The final magnitude spectrum characteristics of the feature extraction block parameters indicate that the most relevant information for the given task is contained in the 0-3 kHz frequency range, and this is also supported by the empirical results on SED performance.
Tasks	Sound Event Detection
Published	2018-05-09
URL	http://arxiv.org/abs/1805.03647v1
PDF	http://arxiv.org/pdf/1805.03647v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-polyphonic-sound-event-detection
Repo
Framework
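The abstract says the feedforward feature-extraction block is initialized to extract time-frequency representations and then updated during training. One plausible initialization, assumed here for illustration, uses the cosine and sine rows of a real DFT as the layer's weights, so that before any training the block outputs a magnitude spectrum per frame. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dft_weights(frame_len: int, n_bins: int) -> Tuple[Matrix, Matrix]:
    """Cosine and (negated) sine rows of a real DFT, used as initial layer weights."""
    cos_w = [[math.cos(2 * math.pi * k * n / frame_len) for n in range(frame_len)] for k in range(n_bins)]
    sin_w = [[-math.sin(2 * math.pi * k * n / frame_len) for n in range(frame_len)] for k in range(n_bins)]
    return cos_w, sin_w


def magnitude_features(frame: Sequence[float], cos_w: Matrix, sin_w: Matrix) -> List[float]:
    """Feedforward pass: two linear maps over the raw frame, then a per-bin magnitude."""
    re = [sum(w * x for w, x in zip(row, frame)) for row in cos_w]
    im = [sum(w * x for w, x in zip(row, frame)) for row in sin_w]
    return [math.hypot(r, i) for r, i in zip(re, im)]
```

For a 64-sample frame and 33 bins (DC to Nyquist), a pure cosine at bin 5 produces a single peak of height 32 at index 5. In a trainable version, `cos_w` and `sin_w` would be the parameters that gradient descent updates jointly with the downstream network.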

Prediction of final infarct volume from native CT perfusion and treatment parameters using deep learning


Title	Prediction of final infarct volume from native CT perfusion and treatment parameters using deep learning
Authors	David Robben, Anna M. M. Boers, Henk A. Marquering, Lucianne L. C. M. Langezaal, Yvo B. W. E. M. Roos, Robert J. van Oostenbrugge, Wim H. van Zwam, Diederik W. J. Dippel, Charles B. L. M. Majoie, Aad van der Lugt, Robin Lemmens, Paul Suetens
Abstract	CT Perfusion (CTP) imaging has gained importance in the diagnosis of acute stroke. Conventional perfusion analysis performs a deconvolution of the measurements and thresholds the perfusion parameters to determine the tissue status. We pursue a data-driven and deconvolution-free approach, where a deep neural network learns to predict the final infarct volume directly from the native CTP images and metadata such as the time parameters and treatment. This would allow clinicians to simulate various treatments and gain insight into predicted tissue status over time. We demonstrate on a multicenter dataset that our approach is able to predict the final infarct and effectively uses the metadata. An ablation study shows that using the native CTP measurements instead of the deconvolved measurements improves the prediction.
Tasks
Published	2018-12-06
URL	https://arxiv.org/abs/1812.02496v2
PDF	https://arxiv.org/pdf/1812.02496v2.pdf
PWC	https://paperswithcode.com/paper/prediction-of-final-infarct-volume-from
Repo
Framework

Drift Theory in Continuous Search Spaces: Expected Hitting Time of the (1+1)-ES with 1/5 Success Rule


Title	Drift Theory in Continuous Search Spaces: Expected Hitting Time of the (1+1)-ES with 1/5 Success Rule
Authors	Youhei Akimoto, Anne Auger, Tobias Glasmachers
Abstract	This paper explores the use of the standard approach for proving runtime bounds in discrete domains, often referred to as drift analysis, in the context of optimization on a continuous domain. Using this framework, we analyze the (1+1) Evolution Strategy with one-fifth success rule on the sphere function. To deal with potential functions that are not lower-bounded, we formulate novel drift theorems. We then use the theorems to prove bounds on the expected hitting time to reach a certain target fitness in finite dimension $d$. The bounds are akin to linear convergence. We then study the dependency of the different terms on $d$, proving a convergence rate dependency of $\Theta(1/d)$. Our results constitute the first non-asymptotic analysis for the algorithm considered as well as the first explicit application of drift analysis to a randomized search heuristic with continuous domain.
Tasks
Published	2018-02-09
URL	http://arxiv.org/abs/1802.03209v4
PDF	http://arxiv.org/pdf/1802.03209v4.pdf
PWC	https://paperswithcode.com/paper/drift-theory-in-continuous-search-spaces
Repo
Framework
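A minimal sketch of the (1+1)-ES with a one-fifth success rule on the sphere function f(x) = ||x||^2, stopping at the first hitting time of a target fitness. The step-size update (multiply sigma by exp(1/d) on success and by exp(-1/(4d)) on failure) is one common parametrization, assumed here; it leaves log(sigma) unchanged on average exactly when the success rate is 1/5. The constants analysed in the paper may differ.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def one_plus_one_es(f: Callable[[Sequence[float]], float], x0: Sequence[float], sigma0: float,
                    target: float, max_evals: int, seed: int = 0) -> Tuple[List[float], float, int]:
    """(1+1)-ES with a 1/5 success rule; returns (best point, its fitness, evaluations used)."""
    rng = random.Random(seed)
    d = len(x0)
    up, down = math.exp(1.0 / d), math.exp(-1.0 / (4.0 * d))  # log(sigma) is stationary at 1/5 success
    x, fx, sigma, evals = list(x0), f(x0), sigma0, 1
    while fx > target and evals < max_evals:
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]  # isotropic Gaussian mutation
        fy = f(y)
        evals += 1
        if fy <= fx:  # success: accept the offspring and enlarge the step size
            x, fx = y, fy
            sigma *= up
        else:         # failure: keep the parent and shrink the step size
            sigma *= down
    return x, fx, evals
```

The returned `evals` is the hitting time whose expectation the paper bounds; repeating the run across seeds and dimensions is one way to observe empirically how it grows with d.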

End-to-end Networks for Supervised Single-channel Speech Separation


Title	End-to-end Networks for Supervised Single-channel Speech Separation
Authors	Shrikant Venkataramani, Paris Smaragdis
Abstract	The performance of single-channel source separation algorithms has improved greatly in recent times with the development and deployment of neural networks. However, many such networks continue to operate on the magnitude spectrogram of a mixture and produce an estimate of source magnitude spectrograms to perform source separation. In this paper, we interpret these steps as additional neural network layers and propose an end-to-end source separation network that allows us to estimate the separated speech waveform by operating directly on the raw waveform of the mixture. Furthermore, we also propose the use of masking-based end-to-end separation networks that jointly optimize the mask and the latent representations of the mixture waveforms. These networks show a significant improvement in separation performance compared to existing architectures in our experiments. To train these end-to-end models, we investigate the use of composite cost functions that are derived from objective evaluation metrics as measured on waveforms. We present subjective listening test results that demonstrate the improvement attained by using masking-based end-to-end networks and also reveal insights into the performance of these cost functions for end-to-end source separation.
Tasks	Speech Separation
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02568v1
PDF	http://arxiv.org/pdf/1810.02568v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-networks-for-supervised-single
Repo
Framework

SUNLayer: Stable denoising with generative networks


Title	SUNLayer: Stable denoising with generative networks
Authors	Dustin G. Mixon, Soledad Villar
Abstract	It has been experimentally established that deep neural networks can be used to produce good generative models for real-world data. It has also been established that such generative models can be exploited to solve classical inverse problems like compressed sensing and super-resolution. In this work we focus on the classical signal processing problem of image denoising. We propose a theoretical setting that uses spherical harmonics to identify what mathematical properties of the activation functions will allow signal denoising with local methods.
Tasks	Denoising, Image Denoising, Super-Resolution
Published	2018-03-25
URL	http://arxiv.org/abs/1803.09319v1
PDF	http://arxiv.org/pdf/1803.09319v1.pdf
PWC	https://paperswithcode.com/paper/sunlayer-stable-denoising-with-generative
Repo
Framework

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification


Title	A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification
Authors	Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li
Abstract	A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and the supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from variable-length input sequences and generate an utterance-level, fixed-dimensional vector representation. Unlike conventional methods, our new approach provides an end-to-end learning framework, where the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the results reveal that our proposed dictionary encoding layer achieves a significant error reduction compared with simple average pooling.
Tasks	Language Identification
Published	2018-04-02
URL	http://arxiv.org/abs/1804.00385v1
PDF	http://arxiv.org/pdf/1804.00385v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-learnable-dictionary-encoding-layer
Repo
Framework
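The layer described in the abstract soft-assigns each frame to learnable dictionary components and accumulates residuals, much like a differentiable GMM supervector. A minimal forward-pass sketch, assuming softmax assignments over -s_k ||x_t - c_k||^2 and averaging over frames; the exact weighting and normalization in the paper may differ, and in training `centers` and `smoothing` would be learned by backpropagation.

```python
import math
from typing import List, Sequence


def lde_encode(frames: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
               smoothing: Sequence[float]) -> List[float]:
    """Soft-assign each frame to K centers and average the weighted residuals into a K*D vector."""
    K, D = len(centers), len(centers[0])
    acc = [[0.0] * D for _ in range(K)]
    for x in frames:
        residuals = [[xi - ci for xi, ci in zip(x, c)] for c in centers]
        logits = [-s * sum(r * r for r in res) for s, res in zip(smoothing, residuals)]
        shift = max(logits)  # stabilize the softmax
        exps = [math.exp(logit - shift) for logit in logits]
        total = sum(exps)
        for k in range(K):
            weight = exps[k] / total  # soft assignment of this frame to center k
            for j in range(D):
                acc[k][j] += weight * residuals[k][j]
    T = len(frames)
    return [v / T for row in acc for v in row]  # averaging over T frames is an assumption
```

The output length is always K*D regardless of the number of frames, and reordering the frames only changes the summation order, which is why such an encoding is orderless.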

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction


Title	End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
Authors	Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey
Abstract	This paper proposes an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its inverse are represented as layers within a deep network. Previous approaches, rather than computing a loss on the reconstructed signal, used a surrogate loss based on the target STFT magnitudes. This ignores reconstruction error introduced by phase inconsistency. In our approach, the loss function is directly defined on the reconstructed signals, which are optimized for best separation. In addition, we train through unfolded iterations of a phase reconstruction algorithm, represented as a series of STFT and inverse STFT layers. While mask values are typically limited to lie between zero and one for approaches using the mixture phase for reconstruction, this limitation is less relevant if the estimated magnitudes are to be used together with phase reconstruction. We thus propose several novel activation functions for the output layer of the T-F masking, to allow mask values beyond one. On the publicly available wsj0-2mix dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new possibilities for deep learning based phase reconstruction and representing fundamental progress towards solving the notoriously hard cocktail party problem.
Tasks	Speech Separation
Published	2018-04-26
URL	http://arxiv.org/abs/1804.10204v1
PDF	http://arxiv.org/pdf/1804.10204v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-speech-separation-with-unfolded
Repo
Framework
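The abstract reports results in scale-invariant SDR (SI-SDR). A minimal sketch of the usual definition: project the estimate onto the reference, treat the remainder as error, and take the energy ratio in dB. Many implementations first subtract the mean of each signal; that step is omitted here for brevity, and the small `eps` guard is an implementation choice.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB (no mean removal)."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # optimal gain applied to the reference
    target = [alpha * r for r in reference]  # projection of the estimate onto the reference
    error = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(x * x for x in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))
```

Because `alpha` absorbs any gain on the estimate, rescaling the estimate leaves the score unchanged, which is the property the "scale-invariant" name refers to.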

Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation


Title	Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation
Authors	Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, Bill Byrne
Abstract	SGNMT is a decoding platform for machine translation that allows pairing various modern neural models of translation with different kinds of constraints and symbolic models. In this paper, we describe three use cases in which SGNMT is currently playing an active role: (1) teaching, as SGNMT is being used for course work and student theses in the MPhil in Machine Learning, Speech and Language Technology at the University of Cambridge; (2) research, as most of the research work of the Cambridge MT group is based on SGNMT; and (3) technology transfer, as we show how SGNMT is helping to transfer research findings from the laboratory to industry, e.g. into a product of SDL plc.
Tasks	Machine Translation
Published	2018-03-20
URL	http://arxiv.org/abs/1803.07204v1
PDF	http://arxiv.org/pdf/1803.07204v1.pdf
PWC	https://paperswithcode.com/paper/why-not-be-versatile-applications-of-the
Repo
Framework
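SGNMT pairs neural models with constraints and symbolic models. A common way to combine such components is a weighted (log-linear) sum of their per-token log-scores, with hard constraints expressed as scores of minus infinity. A toy greedy-decoding sketch in that spirit follows; the scorer functions, weights, and vocabulary are hypothetical, and this is not SGNMT's actual interface.

```python
import math
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[List[str]], Dict[str, float]]  # prefix -> log-score per token


def greedy_decode(scorers: Sequence[Scorer], weights: Sequence[float], vocab: Sequence[str],
                  eos: str = "</s>", max_len: int = 10) -> List[str]:
    """At each step, pick the token with the highest weighted sum of log-scores."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        outputs = [scorer(prefix) for scorer in scorers]

        def combined(tok: str) -> float:
            # weights must be positive so that -inf (a hard constraint) stays -inf
            return sum(w * out.get(tok, -math.inf) for w, out in zip(weights, outputs))

        best = max(vocab, key=combined)
        prefix.append(best)
        if best == eos:
            break
    return prefix


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical 'neural' scorer that always prefers 'a'."""
    return {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)}


def no_repeat_constraint(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical symbolic constraint: each word at most once; end-of-sentence always allowed."""
    scores = {tok: (-math.inf if tok in prefix else 0.0) for tok in ("a", "b")}
    return {**scores, "</s>": 0.0}
```

On its own, `toy_model` keeps emitting "a" until `max_len`; combined with `no_repeat_constraint` the decoder produces `["a", "b", "</s>"]`, showing how a symbolic constraint reshapes the neural model's choices without retraining it.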

Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method


Title	Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
Authors	Jinyue Su, Jiacheng Xu, Xipeng Qiu, Xuanjing Huang
Abstract	Generating plausible and fluent sentences with desired properties has long been a challenge. Most recent works use recurrent neural networks (RNNs) and their variants to predict the following words given the previous sequence and a target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method in generating plausible and diverse sentences.
Tasks
Published	2018-02-25
URL	http://arxiv.org/abs/1802.08970v1
PDF	http://arxiv.org/pdf/1802.08970v1.pdf
PWC	https://paperswithcode.com/paper/incorporating-discriminator-in-sentence
Repo
Framework
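A toy sketch of one Gibbs sweep over a fixed-length sentence: each position is resampled from the vocabulary in proportion to a score of the whole revised sentence. The title suggests that in the paper a discriminator enters that score; here `score` and the helper `make_sharp_score` are hypothetical stand-ins, and the vocabulary is tiny so every candidate can be enumerated.

```python
import math
import random
from typing import Callable, List, Sequence


def gibbs_sweep(sentence: Sequence[str], vocab: Sequence[str],
                score: Callable[[List[str]], float], rng: random.Random) -> List[str]:
    """Resample each position i from p(w | other words), proportional to score(sentence with w at i)."""
    sent = list(sentence)
    for i in range(len(sent)):
        candidates = [sent[:i] + [w] + sent[i + 1:] for w in vocab]
        weights = [score(c) for c in candidates]  # must be non-negative and not all zero
        sent[i] = rng.choices(list(vocab), weights=weights, k=1)[0]
    return sent


def make_sharp_score(target: Sequence[str], strength: float = 20.0) -> Callable[[List[str]], float]:
    """Hypothetical score that strongly rewards agreement with a target sentence."""
    def score(s: List[str]) -> float:
        return math.exp(strength * sum(a == b for a, b in zip(s, target)))
    return score
```

With a very sharp score a single sweep almost surely lands on the favoured sentence; with a real language-model-plus-discriminator score, repeated sweeps would instead explore plausible revisions.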

Beyond Gröbner Bases: Basis Selection for Minimal Solvers


Title	Beyond Gröbner Bases: Basis Selection for Minimal Solvers
Authors	Viktor Larsson, Magnus Oskarsson, Kalle Åström, Alge Wallis, Zuzana Kukelova, Tomas Pajdla
Abstract	Many computer vision applications require robust estimation of the underlying geometry, in terms of camera motion and 3D structure of the scene. These robust methods often rely on running minimal solvers in a RANSAC framework. In this paper we show how we can make polynomial solvers based on the action matrix method faster, by careful selection of the monomial bases. These monomial bases have traditionally been based on a Gröbner basis for the polynomial ideal. Here we describe how we can enumerate all such bases in an efficient way. We also show that going beyond Gröbner bases leads to more efficient solvers in many cases. We present a novel basis sampling scheme that we evaluate on a number of problems.
Tasks
Published	2018-03-12
URL	http://arxiv.org/abs/1803.04360v1
PDF	http://arxiv.org/pdf/1803.04360v1.pdf
PWC	https://paperswithcode.com/paper/beyond-grobner-bases-basis-selection-for-1
Repo
Framework
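The action matrix method reads solutions of a polynomial system off the eigenvectors of the matrix that represents "multiply by x" in a chosen monomial basis of the quotient ring. A univariate toy version makes the idea concrete: for a monic p(x) of degree n with basis {1, x, ..., x^(n-1)}, the action matrix is a companion matrix, and [1, r, ..., r^(n-1)] is an eigenvector with eigenvalue r for every root r. This is only an illustration; the paper's contribution concerns choosing the basis for multivariate systems, where many valid choices exist.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def action_matrix(coeffs: Sequence[float]) -> Matrix:
    """Multiplication-by-x matrix for monic p(x) = x^n + c_{n-1} x^{n-1} + ... + c_0.

    coeffs = [c_0, c_1, ..., c_{n-1}]. Row i holds the coefficients of x * x^i reduced
    modulo p in the basis [1, x, ..., x^{n-1}], so for any root r:
    M @ [1, r, ..., r^{n-1}] == r * [1, r, ..., r^{n-1}].
    """
    n = len(coeffs)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        M[i][i + 1] = 1.0  # x * x^i = x^(i+1), still inside the basis
    M[n - 1] = [-c for c in coeffs]  # x * x^(n-1) = x^n = -(c_0 + ... + c_{n-1} x^(n-1)) mod p
    return M


def monomial_vector(r: float, n: int) -> List[float]:
    """Basis monomials [1, x, ..., x^(n-1)] evaluated at x = r."""
    return [r ** i for i in range(n)]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]
```

For p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6, `action_matrix([-6.0, 11.0, -6.0])` has the roots 1, 2, 3 as eigenvalues, with the monomial vectors as eigenvectors.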