July 27, 2019

3415 words 17 mins read

Paper Group ANR 606

Supervised Speech Separation Based on Deep Learning: An Overview. An Empirical Evaluation of Visual Question Answering for Novel Objects. Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting. Inference in Graphical Models via Semidefinite Programming Hierarchies. Comparison of Batch Normalization and Weight N …

Supervised Speech Separation Based on Deep Learning: An Overview

Title Supervised Speech Separation Based on Deep Learning: An Overview
Authors DeLiang Wang, Jitong Chen
Abstract Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
Tasks Speaker Separation, Speech Enhancement, Speech Separation
Published 2017-08-24
URL http://arxiv.org/abs/1708.07524v2
PDF http://arxiv.org/pdf/1708.07524v2.pdf
PWC https://paperswithcode.com/paper/supervised-speech-separation-based-on-deep
Repo
Framework
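One of the training targets surveyed in this overview is the ideal ratio mask. Below is a minimal sketch, with random arrays standing in for real magnitude spectrograms, of how such a mask is formed and applied; it illustrates the idea only, not the paper's full pipeline.

```python
import numpy as np

# Synthetic magnitude spectrograms standing in for real STFTs
# (frequency bins x time frames); in practice these come from STFTs
# of the clean speech and the noise recordings.
rng = np.random.default_rng(0)
speech_mag = rng.rayleigh(1.0, size=(257, 100))
noise_mag = rng.rayleigh(0.5, size=(257, 100))

# Ideal ratio mask: per time-frequency unit, the fraction of energy
# attributed to the target speech.
irm = np.sqrt(speech_mag**2 / (speech_mag**2 + noise_mag**2 + 1e-8))

# A supervised separator is trained to predict the mask from features of
# the mixture; at test time the predicted mask scales the mixture
# magnitude before the signal is resynthesized.
mixture_mag = np.sqrt(speech_mag**2 + noise_mag**2)  # rough mixture proxy (phase ignored)
enhanced_mag = irm * mixture_mag
print(enhanced_mag.shape)
```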

An Empirical Evaluation of Visual Question Answering for Novel Objects

Title An Empirical Evaluation of Visual Question Answering for Novel Objects
Authors Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal
Abstract We study the problem of answering questions about images in the harder setting where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects will not be annotated in the training set. We show that the performance of two popular existing methods drops significantly (by up to 28%) when evaluated on novel objects compared to known objects. We propose methods which use large existing external corpora of (i) unlabeled text, e.g. books, and (ii) images tagged with classes, to achieve novel object based visual question answering. We do systematic empirical studies, both for an oracle case where the novel objects are known textually, and for a fully automatic case without any explicit knowledge of the novel objects, but with the minimal assumption that the novel objects are semantically related to the existing objects in training. The proposed methods for novel object based visual question answering are modular and can potentially be used with many visual question answering architectures. We show consistent improvements with the two popular architectures and give qualitative analysis of the cases where the model does well and of those where it fails to bring improvements.
Tasks Question Answering, Visual Question Answering
Published 2017-04-08
URL http://arxiv.org/abs/1704.02516v1
PDF http://arxiv.org/pdf/1704.02516v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-evaluation-of-visual-question
Repo
Framework
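The "minimal assumption" above is that novel objects are semantically related to the objects seen in training. A hypothetical sketch of that back-off idea with placeholder word vectors (not the authors' data or exact method): map an unseen object to its nearest known object in embedding space.

```python
import numpy as np

# Placeholder word vectors; in practice these would come from embeddings
# trained on a large unlabeled text corpus (e.g. word2vec on books).
word_vecs = {
    "zebra":   np.array([0.9, 0.1, 0.0]),
    "horse":   np.array([0.8, 0.2, 0.1]),
    "bicycle": np.array([0.0, 0.9, 0.3]),
}
known_objects = ["horse", "bicycle"]   # objects annotated in VQA training
novel_object = "zebra"                 # object that only appears at test time

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Back off from the novel object to its semantically closest known object,
# whose visual grounding the VQA model has already learned.
nearest = max(known_objects, key=lambda o: cosine(word_vecs[novel_object], word_vecs[o]))
print(novel_object, "->", nearest)
```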

Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting

Title Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
Authors Rebecca E. Morrison, Ricardo Baptista, Youssef Marzouk
Abstract We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.
Tasks
Published 2017-11-02
URL http://arxiv.org/abs/1711.00950v2
PDF http://arxiv.org/pdf/1711.00950v2.pdf
PWC https://paperswithcode.com/paper/beyond-normality-learning-sparse
Repo
Framework

Inference in Graphical Models via Semidefinite Programming Hierarchies

Title Inference in Graphical Models via Semidefinite Programming Hierarchies
Authors Murat A. Erdogdu, Yash Deshpande, Andrea Montanari
Abstract Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.
Tasks Combinatorial Optimization, Denoising, Image Denoising
Published 2017-09-19
URL http://arxiv.org/abs/1709.06525v1
PDF http://arxiv.org/pdf/1709.06525v1.pdf
PWC https://paperswithcode.com/paper/inference-in-graphical-models-via
Repo
Framework
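As a toy illustration of the Burer-Monteiro idea (not the paper's region-based partial SOS construction), the sketch below relaxes an Ising MAP problem $\max_{x \in \{\pm 1\}^n} x^\top J x$ by factoring the SDP variable as $X = VV^\top$ with unit-norm rows, runs coordinate-ascent updates, and rounds with a random hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 8                       # n binary variables, rank-k factorization
J = rng.standard_normal((n, n))
J = (J + J.T) / 2                  # symmetric couplings
np.fill_diagonal(J, 0.0)

# Burer-Monteiro factor of the SDP variable: X = V V^T, rows of V unit-norm.
V = rng.standard_normal((n, k))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Coordinate-ascent updates: each row moves toward the coupling-weighted
# sum of the other rows, then is renormalized (each step cannot decrease
# the relaxed objective <J, V V^T>).
for _ in range(200):
    for i in range(n):
        g = J[i] @ V
        norm = np.linalg.norm(g)
        if norm > 1e-12:
            V[i] = g / norm

# Randomized hyperplane rounding back to a binary assignment.
r = rng.standard_normal(k)
x = np.sign(V @ r)
print("rounded objective x^T J x:", x @ J @ x)
```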

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Title Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
Authors Igor Gitman, Boris Ginsburg
Abstract Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth bounded operation. Such a drawback of BN motivates us to explore recently proposed weight normalization algorithms (WN algorithms), i.e. weight normalization, normalization propagation and weight normalization with translated ReLU. These algorithms don't slow down training iterations and were experimentally shown to outperform BN on relatively small networks and datasets. However, it is not clear if these algorithms could replace BN in practical, large-scale applications. We answer this question by providing a detailed comparison of BN and WN algorithms using a ResNet-50 network trained on ImageNet. We found that although WN achieves better training accuracy, the final test accuracy is significantly lower ($\approx 6\%$) than that of BN. This result demonstrates the surprising strength of the BN regularization effect, which we were unable to compensate for using standard regularization techniques like dropout and weight decay. We also found that training of deep networks with WN algorithms is significantly less stable compared to BN, limiting their practical applications.
Tasks Image Classification
Published 2017-09-24
URL http://arxiv.org/abs/1709.08145v2
PDF http://arxiv.org/pdf/1709.08145v2.pdf
PWC https://paperswithcode.com/paper/comparison-of-batch-normalization-and-weight
Repo
Framework
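For reference, the weight normalization reparameterization compared here decouples the direction and length of each weight vector, $w = g \cdot v / \|v\|$, so the normalization cost is per-parameter rather than per-activation. A minimal NumPy sketch of one linear layer (synthetic shapes, not the paper's ResNet-50 setup):

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, batch = 64, 32, 16

# Weight normalization: w = g * v / ||v||; the trainable parameters are (v, g, b).
v = rng.standard_normal((out_dim, in_dim))
g = np.ones(out_dim)
b = np.zeros(out_dim)

def wn_linear(x):
    w = (g / np.linalg.norm(v, axis=1))[:, None] * v  # rescale each row of v
    return x @ w.T + b

x = rng.standard_normal((batch, in_dim))
print(wn_linear(x).shape)  # (16, 32); no batch statistics are needed, unlike BN
```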

JESC: Japanese-English Subtitle Corpus

Title JESC: Japanese-English Subtitle Corpus
Authors Reid Pryzant, Yongjoo Chung, Dan Jurafsky, Denny Britz
Abstract In this paper we describe the Japanese-English Subtitle Corpus (JESC). JESC is a large Japanese-English parallel corpus covering the underrepresented domain of conversational dialogue. It consists of more than 3.2 million examples, making it the largest freely available dataset of its kind. The corpus was assembled by crawling and aligning subtitles found on the web. The assembly process incorporates a number of novel preprocessing elements to ensure high monolingual fluency and accurate bilingual alignments. We summarize its contents and evaluate its quality using human experts and baseline machine translation (MT) systems.
Tasks Machine Translation
Published 2017-10-29
URL http://arxiv.org/abs/1710.10639v4
PDF http://arxiv.org/pdf/1710.10639v4.pdf
PWC https://paperswithcode.com/paper/jesc-japanese-english-subtitle-corpus
Repo
Framework

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Title Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Authors Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
Abstract We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the almost minimizer within $\tilde O\big(nd/(\lambda\epsilon) \big)$ and $\tilde O\big(d^7/(\lambda^5\epsilon^5) \big)$ stochastic gradient evaluations respectively, where $d$ is the problem dimension, and $\lambda$ is the spectral gap of the Markov chain generated by GLD. Both of the results improve upon the best known gradient complexity results. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (VR-SGLD) to the almost minimizer after $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees.
Tasks
Published 2017-07-20
URL http://arxiv.org/abs/1707.06618v2
PDF http://arxiv.org/pdf/1707.06618v2.pdf
PWC https://paperswithcode.com/paper/global-convergence-of-langevin-dynamics-based
Repo
Framework
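The SGLD iteration analyzed here adds scaled Gaussian noise to a stochastic gradient step, $\theta_{t+1} = \theta_t - \eta \nabla \tilde f(\theta_t) + \sqrt{2\eta/\beta}\,\xi_t$. A toy sketch on a one-dimensional nonconvex finite sum (the function and constants are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonconvex finite-sum objective: f(x) = (1/n) * sum_i [(x^2 - 1)^2 + c_i * x]
n = 200
c = rng.normal(0.0, 0.1, size=n)
def grad_component(x, i):
    return 4.0 * x * (x**2 - 1.0) + c[i]

eta, beta = 1e-3, 20.0      # step size and inverse temperature
x = 3.0                     # start far from the minimizers near +-1
for _ in range(20000):
    i = rng.integers(n)                                  # stochastic gradient
    x += -eta * grad_component(x, i) \
         + np.sqrt(2.0 * eta / beta) * rng.standard_normal()
print("final iterate:", x)  # hovers around an almost-minimizer near +-1
```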

Discovery and visualization of structural biomarkers from MRI using transport-based morphometry

Title Discovery and visualization of structural biomarkers from MRI using transport-based morphometry
Authors Shinjini Kundu, Soheil Kolouri, Kirk I Erickson, Arthur F Kramer, Edward McAuley, Gustavo K Rohde
Abstract Disease in the brain is often associated with subtle, spatially diffuse, or complex tissue changes that may lie beneath the level of gross visual inspection, even on magnetic resonance imaging (MRI). Unfortunately, current computer-assisted approaches that examine pre-specified features, whether anatomically-defined (i.e. thalamic volume, cortical thickness) or based on pixelwise comparison (i.e. deformation-based methods), are prone to missing a vast array of physical changes that are not well-encapsulated by these metrics. In this paper, we have developed a technique for automated pattern analysis that can fully determine the relationship between brain structure and observable phenotype without requiring any a priori features. Our technique, called transport-based morphometry (TBM), is an image transformation that maps brain images losslessly to a domain where they become much more separable. The new approach is validated on structural brain images of healthy older adult subjects where even linear models for discrimination, regression, and blind source separation enable TBM to independently discover the characteristic changes of aging and highlight potential mechanisms by which aerobic fitness may mediate brain health later in life. TBM is a generative approach that can provide visualization of physically meaningful shifts in tissue distribution through inverse transformation. The proposed framework is a powerful technique that can potentially elucidate genotype-structural-behavioral associations in myriad diseases.
Tasks
Published 2017-05-14
URL http://arxiv.org/abs/1705.04919v1
PDF http://arxiv.org/pdf/1705.04919v1.pdf
PWC https://paperswithcode.com/paper/discovery-and-visualization-of-structural
Repo
Framework

Rate of Change Analysis for Interestingness Measures

Title Rate of Change Analysis for Interestingness Measures
Authors Nandan Sudarsanam, Nishanth Kumar, Abhishek Sharma, Balaraman Ravindran
Abstract The use of Association Rule Mining techniques in diverse contexts and domains has resulted in the creation of numerous interestingness measures. This, in turn, has motivated researchers to come up with various classification schemes for these measures. One popular approach to classify the objective measures is to assess the set of mathematical properties they satisfy in order to help practitioners select the right measure for a given problem. In this research, we discuss the insufficiency of the existing properties in the literature to capture certain behaviors of interestingness measures. This motivates us to present a novel approach to analyze and classify measures. We refer to this as rate of change analysis (RCA). In this analysis a measure is described by how it varies if there is a unit change in the frequency count $(f_{11},f_{10},f_{01},f_{00})$, for different pre-existing states of the frequency counts. More formally, we look at the first partial derivative of the measure with respect to the various frequency count variables. We then use this analysis to define two new properties, Unit-Null Asymptotic Invariance (UNAI) and Unit-Null Zero Rate (UNZR). UNAI looks at the asymptotic effect of adding frequency patterns, while UNZR looks at the initial effect of adding frequency patterns when they do not pre-exist in the dataset. We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties. We also present empirical studies, involving both synthetic and real-world datasets, which are used to cluster various measures according to the rule ranking patterns of the measures. The study concludes with the observation that the classification of measures using the empirical clusters shares significant similarities with the classification of measures done through the properties presented in this research.
Tasks
Published 2017-12-14
URL http://arxiv.org/abs/1712.05193v1
PDF http://arxiv.org/pdf/1712.05193v1.pdf
PWC https://paperswithcode.com/paper/rate-of-change-analysis-for-interestingness
Repo
Framework
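The rate-of-change analysis asks how a measure moves when one of the contingency-table counts $(f_{11}, f_{10}, f_{01}, f_{00})$ is incremented by one. A small sketch using lift as the example measure (the counts below are made up):

```python
# Unit rate of change of an interestingness measure with respect to the
# contingency-table counts, in the spirit of the paper's RCA.
def lift(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    return (f11 * n) / ((f11 + f10) * (f11 + f01))

def unit_changes(measure, counts):
    """Change in the measure when each count is incremented by one unit."""
    base = measure(*counts)
    deltas = {}
    for idx, name in enumerate(["f11", "f10", "f01", "f00"]):
        bumped = list(counts)
        bumped[idx] += 1
        deltas[name] = measure(*bumped) - base
    return deltas

counts = (40, 10, 15, 935)   # made-up frequency counts
print(unit_changes(lift, counts))
```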

Output Range Analysis for Deep Neural Networks

Title Output Range Analysis for Deep Neural Networks
Authors Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, Ashish Tiwari
Abstract Deep neural networks (NN) are extensively used for machine learning tasks such as image classification, perception and control of autonomous systems. Increasingly, these deep NNs are also being deployed in high-assurance applications. Thus, there is a pressing need for developing techniques to verify neural networks to check whether certain user-expected properties are satisfied. In this paper, we study a specific verification problem of computing a guaranteed range for the output of a deep neural network given a set of inputs represented as a convex polyhedron. Range estimation is a key primitive for verifying deep NNs. We present an efficient range estimation algorithm that uses a combination of local search and linear programming problems to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set. In contrast to recently proposed “monolithic” optimization approaches, we use local gradient descent to repeatedly find and eliminate local minima of the function. The final global optimum is certified using a mixed integer programming instance. We implement our approach and compare it with Reluplex, a recently proposed solver for deep neural networks. We demonstrate the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification.
Tasks Image Classification
Published 2017-09-26
URL http://arxiv.org/abs/1709.09130v1
PDF http://arxiv.org/pdf/1709.09130v1.pdf
PWC https://paperswithcode.com/paper/output-range-analysis-for-deep-neural
Repo
Framework
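The local-search half of the method can be sketched as projected gradient descent on the network output over the input box; the mixed integer programming step that certifies the final bound is omitted. The tiny ReLU network and box below are placeholders, not a verified benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed 2-layer ReLU network: f(x) = w2 . relu(W1 x + b1) + b2
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
w2, b2 = rng.standard_normal(8), 0.0
def f(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # input box

def local_min(x, step=1e-2, iters=2000, eps=1e-5):
    """Projected gradient descent with finite-difference gradients."""
    for _ in range(iters):
        g = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                      for e in np.eye(len(x))])
        x = np.clip(x - step * g, lo, hi)   # project back into the box
    return f(x)

# Restart local search from corners and random interior points and keep the
# best candidate; a MILP step would then certify (or refute) this bound.
starts = [lo, hi, np.zeros(2)] + [rng.uniform(lo, hi) for _ in range(5)]
print("candidate output lower bound:",
      min(local_min(np.array(s, dtype=float)) for s in starts))
```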

Improving Negative Sampling for Word Representation using Self-embedded Features

Title Improving Negative Sampling for Word Representation using Self-embedded Features
Authors Long Chen, Fajie Yuan, Joemon M. Jose, Weinan Zhang
Abstract Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skip-gram model without a proper negative sampler. By performing an insightful analysis from the stochastic gradient descent (SGD) learning perspective, we demonstrate that, both theoretically and intuitively, negative samples with larger inner product scores are more informative than those with lower scores for the SGD learner in terms of both convergence rate and accuracy. Understanding this, we propose an alternative sampling algorithm that dynamically selects informative negative samples during each SGD update. More importantly, the proposed sampler accounts for multi-dimensional self-embedded features during the sampling process, which essentially makes it more effective than the original popularity-based (one-dimensional) sampler. Empirical experiments further verify our observations, and show that our fine-grained samplers gain significant improvement over the existing ones without increasing computational complexity.
Tasks
Published 2017-10-26
URL http://arxiv.org/abs/1710.09805v3
PDF http://arxiv.org/pdf/1710.09805v3.pdf
PWC https://paperswithcode.com/paper/improving-negative-sampling-for-word
Repo
Framework
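The core idea, preferring negatives with large inner-product scores under the current model, can be sketched as a two-stage sampler: draw a candidate pool (for instance by popularity) and keep the candidate the current embeddings score highest. All arrays below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1000, 50

# Current (partially trained) input/output embeddings -- placeholders.
in_emb = 0.1 * rng.standard_normal((vocab, dim))
out_emb = 0.1 * rng.standard_normal((vocab, dim))

# Popularity (unigram^0.75) distribution, as in standard negative sampling.
counts = rng.zipf(1.5, size=vocab).astype(float)
popularity = counts**0.75
popularity /= popularity.sum()

def sample_negative(center_word, pool_size=16):
    """Draw a candidate pool by popularity, keep the highest-scoring one."""
    pool = rng.choice(vocab, size=pool_size, p=popularity)
    scores = out_emb[pool] @ in_emb[center_word]   # inner-product scores
    return int(pool[np.argmax(scores)])

print(sample_negative(center_word=42))
```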

Learning-based Surgical Workflow Detection from Intra-Operative Signals

Title Learning-based Surgical Workflow Detection from Intra-Operative Signals
Authors Ralf Stauder, Ergün Kayis, Nassir Navab
Abstract A modern operating room (OR) provides a plethora of advanced medical devices. In order to make better use of the information they offer, these devices need to react automatically to the intra-operative context. To this end, the progress of the surgical workflow must be detected and interpreted, so that the current status can be given in machine-readable form. In this work, Random Forests (RF) and Hidden Markov Models (HMM) are compared and combined to detect the surgical workflow phase of a laparoscopic cholecystectomy. Various combinations of data were tested, from using only raw sensor data to filtered and augmented datasets. Achieved accuracies ranged from 64% to 72% for the RF approach, and from 80% to 82% for the combination of RF and HMM.
Tasks
Published 2017-06-02
URL http://arxiv.org/abs/1706.00587v1
PDF http://arxiv.org/pdf/1706.00587v1.pdf
PWC https://paperswithcode.com/paper/learning-based-surgical-workflow-detection
Repo
Framework
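A minimal sketch of the RF and HMM combination, with synthetic signals in place of real OR sensor data: a random forest produces per-time-step phase posteriors, and a Viterbi pass with a sticky transition matrix smooths them into a phase sequence.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_steps, n_feat, n_phases = 300, 6, 4

# Synthetic intra-operative signals and ground-truth phases (placeholders).
phases = np.repeat(np.arange(n_phases), n_steps // n_phases)
X = rng.standard_normal((n_steps, n_feat)) + phases[:, None]

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, phases)
post = rf.predict_proba(X)                       # per-time-step phase posteriors

# Sticky transition matrix: phases mostly persist, occasionally change.
A = np.full((n_phases, n_phases), 0.01)
np.fill_diagonal(A, 0.9)
A /= A.sum(axis=1, keepdims=True)

def viterbi(post, A):
    """Most likely phase sequence given posteriors and transition matrix."""
    logd, back = np.log(post[0] + 1e-12), []
    for p in post[1:]:
        scores = logd[:, None] + np.log(A)
        back.append(scores.argmax(axis=0))
        logd = scores.max(axis=0) + np.log(p + 1e-12)
    path = [int(logd.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return np.array(path[::-1])

print("smoothed accuracy:", np.mean(viterbi(post, A) == phases))
```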

Towards Deep Modeling of Music Semantics using EEG Regularizers

Title Towards Deep Modeling of Music Semantics using EEG Regularizers
Authors Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Suhua Tang, Yi Yu
Abstract Modeling of music audio semantics has been previously tackled through learning of mappings from audio data to high-level tags or latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data itself is not enough to determine music semantics. In this paper, we propose a generic framework for semantics modeling that focuses on the perception of the listener, through EEG data, in addition to audio data. We implement this framework using a novel end-to-end 2-view Neural Network (NN) architecture and a Deep Canonical Correlation Analysis (DCCA) loss function that forces the semantic embedding spaces of both views to be maximally correlated. We also detail how the EEG dataset was collected and use it to train our proposed model. We evaluate the learned semantic space in a transfer learning context, by using it as an audio feature extractor in an independent dataset and proxy task: music audio-lyrics cross-modal retrieval. We show that our embedding model outperforms Spotify features and performs comparably to a state-of-the-art embedding model that was trained on 700 times more data. We further discuss improvements to the model that are likely to improve its performance.
Tasks Cross-Modal Retrieval, EEG, Transfer Learning
Published 2017-12-14
URL http://arxiv.org/abs/1712.05197v2
PDF http://arxiv.org/pdf/1712.05197v2.pdf
PWC https://paperswithcode.com/paper/towards-deep-modeling-of-music-semantics
Repo
Framework
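The DCCA loss used here maximizes the total canonical correlation between the embeddings of the two views. On fixed feature matrices (placeholders below; in the paper they are outputs of the audio and EEG sub-networks), that objective is the sum of singular values of $T = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$:

```python
import numpy as np

def total_canonical_correlation(H1, H2, reg=1e-4):
    """Sum of canonical correlations between two views (negated in a DCCA-style loss)."""
    n = H1.shape[0]
    H1 = H1 - H1.mean(axis=0)
    H2 = H2 - H2.mean(axis=0)
    S11 = H1.T @ H1 / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2.T @ H2 / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1.T @ H2 / (n - 1)

    def inv_sqrt(S):                      # S^{-1/2} via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()

rng = np.random.default_rng(0)
audio_emb = rng.standard_normal((256, 16))            # placeholder audio-view embeddings
eeg_emb = (0.5 * audio_emb @ rng.standard_normal((16, 16))
           + 0.5 * rng.standard_normal((256, 16)))    # correlated EEG-view placeholders
print("total canonical correlation:", total_canonical_correlation(audio_emb, eeg_emb))
```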

Neural Affine Grayscale Image Denoising

Title Neural Affine Grayscale Image Denoising
Authors Sungmin Cha, Taesup Moon
Abstract We propose a new grayscale image denoiser, dubbed Neural Affine Image Denoiser (Neural AIDE), which utilizes a neural network in a novel way. Unlike other neural network based image denoising methods, which typically apply simple supervised learning to learn a mapping from a noisy patch to a clean patch, we train a neural network to learn an \emph{affine} mapping that gets applied to a noisy pixel, based on its context. Our formulation enables both supervised training of the network from the labeled training dataset and adaptive fine-tuning of the network parameters using the given noisy image subject to denoising. The key tool for devising Neural AIDE is an estimated loss function for the MSE of the affine mapping, based solely on the noisy data. As a result, our algorithm can outperform most of the recent state-of-the-art methods on the standard benchmark datasets. Moreover, our fine-tuning method can nicely overcome one of the drawbacks of patch-level supervised learning methods in image denoising; namely, a supervised model trained with a mismatched noise variance can be mostly corrected as long as we have the matched noise variance during the fine-tuning step.
Tasks Denoising, Image Denoising
Published 2017-09-17
URL http://arxiv.org/abs/1709.05672v1
PDF http://arxiv.org/pdf/1709.05672v1.pdf
PWC https://paperswithcode.com/paper/neural-affine-grayscale-image-denoising
Repo
Framework
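The estimated loss "based solely on the noisy data" can be illustrated with an unbiased-risk argument for additive Gaussian noise; this is a reconstruction under that assumption, not necessarily the paper's exact expression. For $Z = X + N$ with $N \sim \mathcal{N}(0, \sigma^2)$, $\mathbb{E}[(aZ + b - X)^2] = \mathbb{E}[(aZ + b - Z)^2 + 2\sigma^2 a - \sigma^2]$, so the bracketed quantity can be minimized over $(a, b)$ without ever observing the clean pixel $X$. A quick Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3
x = 0.7                                            # clean pixel (unknown in practice)
z = x + rng.normal(0.0, sigma, size=1_000_000)     # noisy observations

a, b = 0.8, 0.1                                    # some affine mapping of the noisy pixel
true_loss = np.mean((a * z + b - x) ** 2)                                     # needs the clean pixel
estimated_loss = np.mean((a * z + b - z) ** 2 + 2 * sigma**2 * a - sigma**2)  # noisy data only
print(true_loss, estimated_loss)                   # agree up to Monte Carlo error
```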

Une véritable approche $\ell_0$ pour l’apprentissage de dictionnaire

Title Une véritable approche $\ell_0$ pour l’apprentissage de dictionnaire (in English: A genuine $\ell_0$ approach to dictionary learning)
Authors Yuan Liu, Stéphane Canu, Paul Honeine, Su Ruan
Abstract Sparse representation learning has recently gained great success in signal and image processing, thanks to recent advances in dictionary learning. To this end, the $\ell_0$-norm is often used to control the sparsity level. Nevertheless, optimization problems based on the $\ell_0$-norm are non-convex and NP-hard. For these reasons, relaxation techniques have attracted much attention from researchers, by targeting approximate solutions up front (e.g. the $\ell_1$-norm, pursuit strategies). On the contrary, this paper considers the exact $\ell_0$-norm optimization problem and proves that it can be solved effectively, despite its complexity. The proposed method reformulates the problem as a Mixed-Integer Quadratic Program (MIQP) and obtains the globally optimal solution by applying existing optimization software. Because the main difficulty of this approach is its computational time, two techniques are introduced that improve the computational speed. Finally, our method is applied to image denoising, which shows its feasibility and relevance compared to the state of the art.
Tasks Denoising, Dictionary Learning, Image Denoising, Representation Learning
Published 2017-09-12
URL http://arxiv.org/abs/1709.05937v1
PDF http://arxiv.org/pdf/1709.05937v1.pdf
PWC https://paperswithcode.com/paper/une-veritable-approche-ell_0-pour
Repo
Framework
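For very small problems, the exact $\ell_0$-constrained sparse coding subproblem that the paper handles via an MIQP can be solved by brute force over supports, which makes the notion of a global optimum concrete. This enumeration is illustrative only and does not scale, which is precisely why the MIQP reformulation matters.

```python
import numpy as np
from itertools import combinations

def exact_l0_sparse_code(y, D, k):
    """Globally optimal x with at most k nonzeros minimizing ||y - D x||^2 (tiny problems only)."""
    n_atoms = D.shape[1]
    best_err, best_x = np.inf, None
    for support in combinations(range(n_atoms), k):
        cols = list(support)
        coeffs, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)
        err = float(np.sum((y - D[:, cols] @ coeffs) ** 2))
        if err < best_err:
            best_err = err
            best_x = np.zeros(n_atoms)
            best_x[cols] = coeffs
    return best_x, best_err

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))                  # toy dictionary
x_true = np.zeros(10)
x_true[[2, 7]] = [1.5, -2.0]                       # 2-sparse ground truth
y = D @ x_true + 0.01 * rng.standard_normal(20)
x_hat, err = exact_l0_sparse_code(y, D, k=2)
print(np.nonzero(x_hat)[0], err)
```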