Paper Group ANR 606
Supervised Speech Separation Based on Deep Learning: An Overview
Title | Supervised Speech Separation Based on Deep Learning: An Overview |
Authors | DeLiang Wang, Jitong Chen |
Abstract | Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source. |
Tasks | Speaker Separation, Speech Enhancement, Speech Separation |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07524v2 |
http://arxiv.org/pdf/1708.07524v2.pdf | |
PWC | https://paperswithcode.com/paper/supervised-speech-separation-based-on-deep |
Repo | |
Framework | |
An Empirical Evaluation of Visual Question Answering for Novel Objects
Title | An Empirical Evaluation of Visual Question Answering for Novel Objects |
Authors | Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal |
Abstract | We study the problem of answering questions about images in the harder setting, where the test questions and corresponding images contain novel objects, which were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of the visual categories, there would be some objects which would not be annotated in the train set. We show that the performance of two popular existing methods drops significantly (up to 28%) when evaluated on novel objects compared with known objects. We propose methods which use large existing external corpora of (i) unlabeled text, i.e. books, and (ii) images tagged with classes, to achieve novel object based visual question answering. We do systematic empirical studies, for both an oracle case where the novel objects are known textually, as well as a fully automatic case without any explicit knowledge of the novel objects, but with the minimal assumption that the novel objects are semantically related to the existing objects in training. The proposed methods for novel object based visual question answering are modular and can potentially be used with many visual question answering architectures. We show consistent improvements with the two popular architectures and give qualitative analysis of the cases where the model does well and of those where it fails to bring improvements. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02516v1 |
http://arxiv.org/pdf/1704.02516v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-evaluation-of-visual-question |
Repo | |
Framework | |
Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
Title | Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting |
Authors | Rebecca E. Morrison, Ricardo Baptista, Youssef Marzouk |
Abstract | We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another. |
Tasks | |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00950v2 |
http://arxiv.org/pdf/1711.00950v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-normality-learning-sparse |
Repo | |
Framework | |
Inference in Graphical Models via Semidefinite Programming Hierarchies
Title | Inference in Graphical Models via Semidefinite Programming Hierarchies |
Authors | Murat A. Erdogdu, Yash Deshpande, Andrea Montanari |
Abstract | Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation. |
Tasks | Combinatorial Optimization, Denoising, Image Denoising |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06525v1 |
http://arxiv.org/pdf/1709.06525v1.pdf | |
PWC | https://paperswithcode.com/paper/inference-in-graphical-models-via |
Repo | |
Framework | |
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
Title | Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification |
Authors | Igor Gitman, Boris Ginsburg |
Abstract | Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth-bound operation. This drawback of BN motivates us to explore recently proposed weight normalization algorithms (WN algorithms), i.e. weight normalization, normalization propagation and weight normalization with translated ReLU. These algorithms do not slow down training iterations and were experimentally shown to outperform BN on relatively small networks and datasets. However, it is not clear if these algorithms could replace BN in practical, large-scale applications. We answer this question by providing a detailed comparison of BN and WN algorithms using a ResNet-50 network trained on ImageNet. We found that although WN achieves better training accuracy, the final test accuracy is significantly lower ($\approx 6\%$) than that of BN. This result demonstrates the surprising strength of the BN regularization effect which we were unable to compensate for using standard regularization techniques like dropout and weight decay. We also found that training of deep networks with WN algorithms is significantly less stable compared to BN, limiting their practical applications. |
Tasks | Image Classification |
Published | 2017-09-24 |
URL | http://arxiv.org/abs/1709.08145v2 |
http://arxiv.org/pdf/1709.08145v2.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-batch-normalization-and-weight |
Repo | |
Framework | |
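As a rough illustration of the basic weight normalization reparameterization mentioned in the abstract above, a weight vector can be written as $w = g \cdot v / \lVert v \rVert$, so that its length $g$ and its direction $v$ are learned separately. The sketch below assumes this standard form only; the function name `weight_norm` and the example values are hypothetical, and it does not cover normalization propagation or the translated-ReLU variant compared in the paper.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| (hypothetical helper, plain Python lists)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]


# Example: the direction (3, 4) has length 5, so rescaling to g = 2
# gives a weight vector whose Euclidean norm is exactly g.
w = weight_norm([3.0, 4.0], g=2.0)
w_length = math.sqrt(sum(x * x for x in w))
```

Unlike batch normalization, this rescaling depends only on the parameters and not on statistics of the current mini-batch, which is consistent with the abstract's remark that WN algorithms do not slow down training iterations.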
JESC: Japanese-English Subtitle Corpus
Title | JESC: Japanese-English Subtitle Corpus |
Authors | Reid Pryzant, Yongjoo Chung, Dan Jurafsky, Denny Britz |
Abstract | In this paper we describe the Japanese-English Subtitle Corpus (JESC). JESC is a large Japanese-English parallel corpus covering the underrepresented domain of conversational dialogue. It consists of more than 3.2 million examples, making it the largest freely available dataset of its kind. The corpus was assembled by crawling and aligning subtitles found on the web. The assembly process incorporates a number of novel preprocessing elements to ensure high monolingual fluency and accurate bilingual alignments. We summarize its contents and evaluate its quality using human experts and baseline machine translation (MT) systems. |
Tasks | Machine Translation |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10639v4 |
http://arxiv.org/pdf/1710.10639v4.pdf | |
PWC | https://paperswithcode.com/paper/jesc-japanese-english-subtitle-corpus |
Repo | |
Framework | |
Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Title | Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization |
Authors | Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu |
Abstract | We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the almost minimizer within $\tilde O\big(nd/(\lambda\epsilon) \big)$ and $\tilde O\big(d^7/(\lambda^5\epsilon^5) \big)$ stochastic gradient evaluations respectively, where $d$ is the problem dimension, and $\lambda$ is the spectral gap of the Markov chain generated by GLD. Both of the results improve upon the best known gradient complexity results. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (VR-SGLD) to the almost minimizer after $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees. |
Tasks | |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06618v2 |
http://arxiv.org/pdf/1707.06618v2.pdf | |
PWC | https://paperswithcode.com/paper/global-convergence-of-langevin-dynamics-based |
Repo | |
Framework | |
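For orientation, the commonly used SGLD update is $x_{k+1} = x_k - \eta \nabla f_i(x_k) + \sqrt{2\eta/\beta}\,\xi_k$, where $i$ is a uniformly sampled component and $\xi_k$ is standard Gaussian noise. The following is a minimal one-dimensional sketch under that assumption; the function `sgld`, the step size, the inverse temperature `inv_temp`, and the toy quadratic components are all illustrative choices, not the paper's experimental setup or its exact algorithmic variant.

```python
import math
import random


def sgld(grad_i, n, x0, step, inv_temp, iters, rng):
    """Run SGLD on a 1-D finite sum f(x) = (1/n) * sum_i f_i(x)."""
    x = x0
    noise_scale = math.sqrt(2.0 * step / inv_temp)
    for _ in range(iters):
        i = rng.randrange(n)  # sample one component function
        # Stochastic gradient step plus injected Gaussian noise.
        x = x - step * grad_i(i, x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


# Toy components f_i(x) = 0.5 * (x - t_i)^2; the minimizer of their
# average is the mean of the targets, here 2.0.
targets = [1.0, 2.0, 3.0]


def grad_i(i, x):
    return x - targets[i]


x_final = sgld(grad_i, n=len(targets), x0=10.0, step=0.05,
               inv_temp=1e4, iters=2000, rng=random.Random(0))
```

With a large inverse temperature the injected noise is small and the iterate hovers near the minimizer; replacing the sampled gradient with the full average gradient would give the GLD update in the same sketch.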
Discovery and visualization of structural biomarkers from MRI using transport-based morphometry
Title | Discovery and visualization of structural biomarkers from MRI using transport-based morphometry |
Authors | Shinjini Kundu, Soheil Kolouri, Kirk I Erickson, Arthur F Kramer, Edward McAuley, Gustavo K Rohde |
Abstract | Disease in the brain is often associated with subtle, spatially diffuse, or complex tissue changes that may lie beneath the level of gross visual inspection, even on magnetic resonance imaging (MRI). Unfortunately, current computer-assisted approaches that examine pre-specified features, whether anatomically-defined (e.g. thalamic volume, cortical thickness) or based on pixelwise comparison (e.g. deformation-based methods), are prone to missing a vast array of physical changes that are not well-encapsulated by these metrics. In this paper, we have developed a technique for automated pattern analysis that can fully determine the relationship between brain structure and observable phenotype without requiring any a priori features. Our technique, called transport-based morphometry (TBM), is an image transformation that maps brain images losslessly to a domain where they become much more separable. The new approach is validated on structural brain images of healthy older adult subjects where even linear models for discrimination, regression, and blind source separation enable TBM to independently discover the characteristic changes of aging and highlight potential mechanisms by which aerobic fitness may mediate brain health later in life. TBM is a generative approach that can provide visualization of physically meaningful shifts in tissue distribution through inverse transformation. The proposed framework is a powerful technique that can potentially elucidate genotype-structural-behavioral associations in myriad diseases. |
Tasks | |
Published | 2017-05-14 |
URL | http://arxiv.org/abs/1705.04919v1 |
http://arxiv.org/pdf/1705.04919v1.pdf | |
PWC | https://paperswithcode.com/paper/discovery-and-visualization-of-structural |
Repo | |
Framework | |
Rate of Change Analysis for Interestingness Measures
Title | Rate of Change Analysis for Interestingness Measures |
Authors | Nandan Sudarsanam, Nishanth Kumar, Abhishek Sharma, Balaraman Ravindran |
Abstract | The use of Association Rule Mining techniques in diverse contexts and domains has resulted in the creation of numerous interestingness measures. This, in turn, has motivated researchers to come up with various classification schemes for these measures. One popular approach to classify the objective measures is to assess the set of mathematical properties they satisfy in order to help practitioners select the right measure for a given problem. In this research, we discuss the insufficiency of the existing properties in the literature to capture certain behaviors of interestingness measures. This motivates us to present a novel approach to analyze and classify measures. We refer to this as a rate of change analysis (RCA). In this analysis a measure is described by how it varies if there is a unit change in the frequency count $(f_{11},f_{10},f_{01},f_{00})$, for different pre-existing states of the frequency counts. More formally, we look at the first partial derivative of the measure with respect to the various frequency count variables. We then use this analysis to define two new properties, Unit-Null Asymptotic Invariance (UNAI) and Unit-Null Zero Rate (UNZR). UNAI looks at the asymptotic effect of adding frequency patterns, while UNZR looks at the initial effect of adding frequency patterns when they do not pre-exist in the dataset. We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties. We also present empirical studies, involving both synthetic and real-world datasets, which are used to cluster various measures according to the rule ranking patterns of the measures. The study concludes with the observation that the classification of measures using the empirical clusters shares significant similarities with the classification of measures done through the properties presented in this research. |
Tasks | |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05193v1 |
http://arxiv.org/pdf/1712.05193v1.pdf | |
PWC | https://paperswithcode.com/paper/rate-of-change-analysis-for-interestingness |
Repo | |
Framework | |
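The unit-change idea in the abstract above can be sketched numerically. The example below assumes the usual contingency-table reading of $(f_{11},f_{10},f_{01},f_{00})$ for a rule $A \Rightarrow B$ and uses confidence, $f_{11}/(f_{11}+f_{10})$, purely as an illustrative measure; the helper `unit_change` and the example counts are hypothetical, and the paper's formal definitions of UNAI and UNZR are not reproduced here.

```python
def confidence(f11, f10, f01, f00):
    """Confidence of A => B: fraction of A-transactions that also contain B."""
    return f11 / (f11 + f10)


def unit_change(measure, counts, index):
    """Change in the measure when one frequency count grows by one unit."""
    bumped = list(counts)
    bumped[index] += 1
    return measure(*bumped) - measure(*counts)


# Counts in the order (f11, f10, f01, f00); values chosen for illustration.
counts = (10, 10, 5, 75)
delta_f11 = unit_change(confidence, counts, 0)  # one more A-and-B transaction
delta_f10 = unit_change(confidence, counts, 1)  # one more A-without-B transaction
delta_f00 = unit_change(confidence, counts, 3)  # one more transaction with neither
```

In this sketch confidence rises with $f_{11}$, falls with $f_{10}$, and does not react to $f_{00}$ at all, which is the kind of per-count behavior a rate of change analysis is meant to expose; the analytic counterpart for $f_{11}$ is the partial derivative $f_{10}/(f_{11}+f_{10})^2$.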
Output Range Analysis for Deep Neural Networks
Title | Output Range Analysis for Deep Neural Networks |
Authors | Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, Ashish Tiwari |
Abstract | Deep neural networks (NN) are extensively used for machine learning tasks such as image classification, perception and control of autonomous systems. Increasingly, these deep NNs are also being deployed in high-assurance applications. Thus, there is a pressing need for developing techniques to verify neural networks to check whether certain user-expected properties are satisfied. In this paper, we study a specific verification problem of computing a guaranteed range for the output of a deep neural network given a set of inputs represented as a convex polyhedron. Range estimation is a key primitive for verifying deep NNs. We present an efficient range estimation algorithm that uses a combination of local search and linear programming problems to efficiently find the maximum and minimum values taken by the outputs of the NN over the given input set. In contrast to recently proposed “monolithic” optimization approaches, we use local gradient descent to repeatedly find and eliminate local minima of the function. The final global optimum is certified using a mixed integer programming instance. We implement our approach and compare it with Reluplex, a recently proposed solver for deep neural networks. We demonstrate the effectiveness of the proposed approach for verification of NNs used in automated control as well as those used in classification. |
Tasks | Image Classification |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09130v1 |
http://arxiv.org/pdf/1709.09130v1.pdf | |
PWC | https://paperswithcode.com/paper/output-range-analysis-for-deep-neural |
Repo | |
Framework | |
Improving Negative Sampling for Word Representation using Self-embedded Features
Title | Improving Negative Sampling for Word Representation using Self-embedded Features |
Authors | Long Chen, Fajie Yuan, Joemon M. Jose, Weinan Zhang |
Abstract | Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skip-gram model without a proper negative sampler. By performing an insightful analysis from the stochastic gradient descent (SGD) learning perspective, we demonstrate that, both theoretically and intuitively, negative samples with larger inner product scores are more informative than those with lower scores for the SGD learner in terms of both convergence rate and accuracy. Understanding this, we propose an alternative sampling algorithm that dynamically selects informative negative samples during each SGD update. More importantly, the proposed sampler accounts for multi-dimensional self-embedded features during the sampling process, which essentially makes it more effective than the original popularity-based (one-dimensional) sampler. Empirical experiments further verify our observations, and show that our fine-grained samplers gain significant improvement over the existing ones without increasing computational complexity. |
Tasks | |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09805v3 |
http://arxiv.org/pdf/1710.09805v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-negative-sampling-for-word |
Repo | |
Framework | |
Learning-based Surgical Workflow Detection from Intra-Operative Signals
Title | Learning-based Surgical Workflow Detection from Intra-Operative Signals |
Authors | Ralf Stauder, Ergün Kayis, Nassir Navab |
Abstract | A modern operating room (OR) provides a plethora of advanced medical devices. To make better use of the information they offer, these devices need to react automatically to the intra-operative context. To this end, the progress of the surgical workflow must be detected and interpreted, so that the current status can be given in machine-readable form. In this work, Random Forests (RF) and Hidden Markov Models (HMM) are compared and combined to detect the surgical workflow phase of a laparoscopic cholecystectomy. Various combinations of data were tested, from using only raw sensor data to filtered and augmented datasets. Achieved accuracies ranged from 64% to 72% for the RF approach, and from 80% to 82% for the combination of RF and HMM. |
Tasks | |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00587v1 |
http://arxiv.org/pdf/1706.00587v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-surgical-workflow-detection |
Repo | |
Framework | |
Towards Deep Modeling of Music Semantics using EEG Regularizers
Title | Towards Deep Modeling of Music Semantics using EEG Regularizers |
Authors | Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Suhua Tang, Yi Yu |
Abstract | Modeling of music audio semantics has been previously tackled through learning of mappings from audio data to high-level tags or latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data itself is not enough to determine music semantics. In this paper, we propose a generic framework for semantics modeling that focuses on the perception of the listener, through EEG data, in addition to audio data. We implement this framework using a novel end-to-end 2-view Neural Network (NN) architecture and a Deep Canonical Correlation Analysis (DCCA) loss function that forces the semantic embedding spaces of both views to be maximally correlated. We also detail how the EEG dataset was collected and use it to train our proposed model. We evaluate the learned semantic space in a transfer learning context, by using it as an audio feature extractor in an independent dataset and proxy task: music audio-lyrics cross-modal retrieval. We show that our embedding model outperforms Spotify features and performs comparably to a state-of-the-art embedding model that was trained on 700 times more data. We further discuss improvements to the model that are likely to improve its performance. |
Tasks | Cross-Modal Retrieval, EEG, Transfer Learning |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05197v2 |
http://arxiv.org/pdf/1712.05197v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-deep-modeling-of-music-semantics |
Repo | |
Framework | |
Neural Affine Grayscale Image Denoising
Title | Neural Affine Grayscale Image Denoising |
Authors | Sungmin Cha, Taesup Moon |
Abstract | We propose a new grayscale image denoiser, dubbed Neural Affine Image Denoiser (Neural AIDE), which utilizes a neural network in a novel way. Unlike other neural network based image denoising methods, which typically apply simple supervised learning to learn a mapping from a noisy patch to a clean patch, we train a neural network to learn an affine mapping that is applied to a noisy pixel, based on its context. Our formulation enables both supervised training of the network from the labeled training dataset and adaptive fine-tuning of the network parameters using the given noisy image subject to denoising. The key tool for devising Neural AIDE is an estimated loss function of the MSE of the affine mapping, based solely on the noisy data. As a result, our algorithm can outperform most of the recent state-of-the-art methods in the standard benchmark datasets. Moreover, our fine-tuning method can nicely overcome one of the drawbacks of the patch-level supervised learning methods in image denoising; namely, a supervised trained model with a mismatched noise variance can be mostly corrected as long as we have the matched noise variance during the fine-tuning step. |
Tasks | Denoising, Image Denoising |
Published | 2017-09-17 |
URL | http://arxiv.org/abs/1709.05672v1 |
http://arxiv.org/pdf/1709.05672v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-affine-grayscale-image-denoising |
Repo | |
Framework | |
Une véritable approche $\ell_0$ pour l’apprentissage de dictionnaire
Title | Une véritable approche $\ell_0$ pour l’apprentissage de dictionnaire |
Authors | Yuan Liu, Stéphane Canu, Paul Honeine, Su Ruan |
Abstract | Sparse representation learning has recently achieved great success in signal and image processing, thanks to recent advances in dictionary learning. To this end, the $\ell_0$-norm is often used to control the sparsity level. Nevertheless, optimization problems based on the $\ell_0$-norm are non-convex and NP-hard. For these reasons, relaxation techniques have attracted much attention from researchers, who primarily target approximate solutions (e.g. $\ell_1$-norm, pursuit strategies). In contrast, this paper considers the exact $\ell_0$-norm optimization problem and proves that it can be solved effectively, despite its complexity. The proposed method reformulates the problem as a Mixed-Integer Quadratic Program (MIQP) and obtains the globally optimal solution by applying existing optimization software. Because the main difficulty of this approach is its computational time, two techniques are introduced that improve the computational speed. Finally, our method is applied to image denoising, which shows its feasibility and relevance compared to the state-of-the-art. |
Tasks | Denoising, Dictionary Learning, Image Denoising, Representation Learning |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.05937v1 |
http://arxiv.org/pdf/1709.05937v1.pdf | |
PWC | https://paperswithcode.com/paper/une-veritable-approche-ell_0-pour |
Repo | |
Framework | |