Paper Group ANR 415
Unsupervised Image Matching and Object Discovery as Optimization
Title | Unsupervised Image Matching and Object Discovery as Optimization |
Authors | Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce |
Abstract | Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. 2015. We show that the original approach can be reformulated and solved as a proper optimization problem. Experiments on several benchmarks establish the merit of our approach. |
Tasks | |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03148v1 |
http://arxiv.org/pdf/1904.03148v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-image-matching-and-object |
Repo | |
Framework | |
gradSLAM: Dense SLAM meets Automatic Differentiation
Title | gradSLAM: Dense SLAM meets Automatic Differentiation |
Authors | Krishna Murthy Jatavallabhula, Ganesh Iyer, Liam Paull |
Abstract | The question of “representation” is central in the context of dense simultaneous localization and mapping (SLAM). Newer learning-based approaches have the potential to leverage data or task performance to directly inform the choice of representation. However, learning representations for SLAM has been an open question, because traditional SLAM systems are not end-to-end differentiable. In this work, we present gradSLAM, a differentiable computational graph take on SLAM. Leveraging the automatic differentiation capabilities of computational graphs, gradSLAM enables the design of SLAM systems that allow for gradient-based learning across each of their components, or the system as a whole. This is achieved by creating differentiable alternatives for each non-differentiable component in a typical dense SLAM system. Specifically, we demonstrate how to design differentiable trust-region optimizers, surface measurement and fusion schemes, as well as differentiate over rays, without sacrificing performance. This amalgamation of dense SLAM with computational graphs enables us to backprop all the way from 3D maps to 2D pixels, opening up new possibilities in gradient-based learning for SLAM. TL;DR: We leverage the power of automatic differentiation frameworks to make dense SLAM differentiable. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10672v1 |
https://arxiv.org/pdf/1910.10672v1.pdf | |
PWC | https://paperswithcode.com/paper/gradslam-dense-slam-meets-automatic |
Repo | |
Framework | |
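To make the differentiability idea concrete, here is a toy sketch (not the gradSLAM API): a dense-alignment-style objective whose gradient with respect to a pose parameter is available in closed form, so the pose, or anything upstream of it, can be optimized by gradient descent. The point cloud, offset, and step size are all made up for illustration; in gradSLAM this role is played by automatic differentiation through the full SLAM pipeline.

```python
import numpy as np

def alignment_error(t, source, target):
    """Sum of squared distances after translating `source` by `t`."""
    return np.sum((source + t - target) ** 2)

def alignment_grad(t, source, target):
    """Analytic gradient of the error with respect to the translation t."""
    return 2.0 * np.sum(source + t - target, axis=0)

rng = np.random.default_rng(0)
source = rng.normal(size=(50, 3))
target = source + np.array([0.5, -0.2, 0.1])  # ground-truth offset

t = np.zeros(3)
for _ in range(100):              # plain gradient descent on the pose
    t -= 0.005 * alignment_grad(t, source, target)

print(np.round(t, 3))             # recovers the ground-truth offset [0.5, -0.2, 0.1]
```

Because the objective is differentiable end to end, the same gradient could instead flow past the pose into learned components, which is the possibility the paper opens up for full SLAM systems.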
Probabilistic Structure Learning for EEG/MEG Source Imaging with Hierarchical Graph Prior
Title | Probabilistic Structure Learning for EEG/MEG Source Imaging with Hierarchical Graph Prior |
Authors | Feng Liu, Li Wang, Yifei Lou, Rencang Li, Patrick Purdon |
Abstract | Brain source imaging is an important method for noninvasively characterizing brain activity using Electroencephalogram (EEG) or Magnetoencephalography (MEG) recordings. Traditional EEG/MEG Source Imaging (ESI) methods usually assume that either source activity at different time points is unrelated, or that similar spatiotemporal patterns exist across an entire study period. The former assumption makes ESI analyses sensitive to noise, while the latter renders ESI analyses unable to account for time-varying patterns of activity. To effectively deal with noise while maintaining flexibility and continuity among brain activation patterns, we propose a novel probabilistic ESI model based on a hierarchical graph prior. Under our method, a spanning tree constraint ensures that activity patterns have spatiotemporal continuity. An efficient algorithm based on alternating convex search is presented to solve the proposed model and is provably convergent. Comprehensive numerical studies using synthetic data on a real brain model are conducted under different levels of signal-to-noise ratio (SNR) from both sensor and source spaces. We also examine the EEG/MEG data in a real application, in which our ESI reconstructions are neurologically plausible. All the results demonstrate significant improvements of the proposed algorithm over the benchmark methods in terms of source localization performance, especially at high noise levels. |
Tasks | EEG |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02252v1 |
https://arxiv.org/pdf/1906.02252v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-structure-learning-for-eegmeg |
Repo | |
Framework | |
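The spatial-continuity idea behind the graph prior can be illustrated with a graph-regularized inverse problem (an illustrative sketch only, not the authors' hierarchical model or alternating algorithm; the lead field `A`, the chain graph, and the noise level are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sensors, n_sources = 20, 40
A = rng.normal(size=(n_sensors, n_sources))  # stand-in lead field

# Chain-graph Laplacian: penalizes differences between neighboring sources,
# so reconstructed activity varies smoothly along the graph.
L = 2 * np.eye(n_sources) - np.eye(n_sources, k=1) - np.eye(n_sources, k=-1)
L[0, 0] = L[-1, -1] = 1

x_true = np.zeros(n_sources)
x_true[15:20] = 1.0                           # a spatially contiguous active patch
y = A @ x_true + 0.05 * rng.normal(size=n_sensors)

# Solve min_x ||Ax - y||^2 + lam * x^T L x (convex, closed form)
lam = 1.0
x_hat = np.linalg.solve(A.T @ A + lam * L, A.T @ y)
print(np.round(x_hat[13:22], 2))              # reconstruction around the patch
```

The paper's spanning-tree prior plays a similar structural role to `L` here, but the graph itself is learned, which is what the alternating convex search handles.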
Approximation of Reeb spaces with Mappers and Applications to Stochastic Filters
Title | Approximation of Reeb spaces with Mappers and Applications to Stochastic Filters |
Authors | Mathieu Carrière, Bertrand Michel |
Abstract | Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with plenty of applications in various fields of science, such as computational biology and data visualization, among others. The stability and quantification of the rate of convergence of the Mapper to the Reeb space has been studied a lot in recent works~\cite{Brown2019, Carriere2018a, Carriere2018, Munch2016}, focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, where the domain of the filter is in $\mathbb R^d$ instead of $\mathbb R$. The only available result in this setting~\cite{Munch2016} only works for topological spaces and cannot be used as is for finite metric spaces representing data, such as point clouds and distance matrices. In this article, we present an approximation result for the Reeb space in the multivariate case using a Mapper-based estimator, which is a slight modification of the usual Mapper construction. Moreover, our approximation is stated with respect to a pseudometric that is an extension of the usual {\em interleaving distance} between persistence modules~\cite{Chazal2016}. Finally, we apply our results to the case where the filter function used to compute the Mapper is estimated from the data. We provide applications of this setting in statistics and machine learning and probability for different kinds of target filters, as well as numerical experiments that demonstrate the relevance of our approach. |
Tasks | Topological Data Analysis |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10742v1 |
https://arxiv.org/pdf/1912.10742v1.pdf | |
PWC | https://paperswithcode.com/paper/approximation-of-reeb-spaces-with-mappers-and |
Repo | |
Framework | |
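For readers unfamiliar with the construction, here is a minimal scalar-filter Mapper (illustrative only; the paper's estimator is a slight modification of this construction and also covers multivariate filters). Nodes are connected clusters of points whose filter values land in an overlapping interval cover; edges join clusters that share points.

```python
import numpy as np

def cluster(idx, points, eps):
    """Connected components of a preimage at linkage scale eps (union-find)."""
    parent = {j: j for j in idx}
    def find(j):
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j
    for a in idx:
        for b in idx:
            if a < b and np.linalg.norm(points[a] - points[b]) < eps:
                parent[find(a)] = find(b)
    groups = {}
    for j in idx:
        groups.setdefault(find(j), set()).add(j)
    return list(groups.values())

def mapper(points, filt, n_intervals=4, overlap=0.3, eps=0.5):
    lo, hi = filt.min(), filt.max()
    length = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        idx = np.where((filt >= a) & (filt <= b))[0]
        for c in cluster(list(idx), points, eps):
            for m, other in enumerate(nodes):
                if c & other:                 # shared points across intervals
                    edges.add((m, len(nodes)))
            nodes.append(c)
    return nodes, edges

# A circle with the height function as filter: the Mapper graph is a cycle,
# matching the circle's Reeb graph.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
nodes, edges = mapper(pts, pts[:, 1])
print(len(nodes), len(edges))                 # 6 6: as many edges as nodes, a cycle
```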
CNN-based Cost Volume Analysis as Confidence Measure for Dense Matching
Title | CNN-based Cost Volume Analysis as Confidence Measure for Dense Matching |
Authors | Max Mehltretter, Christian Heipke |
Abstract | Due to its capability to identify erroneous disparity assignments in dense stereo matching, confidence estimation is beneficial for a wide range of applications, e.g. autonomous driving, which needs a high degree of confidence as a mandatory prerequisite. In particular, the introduction of deep-learning-based methods has resulted in the increasing popularity of this field in recent years, owing to significantly improved accuracy. Despite this remarkable development, most of these methods rely on features learned from disparity maps only, not taking into account the corresponding 3-dimensional cost volumes. However, it has already been demonstrated that, with conventional methods based on hand-crafted features, this additional information can be used to further increase the accuracy. In order to combine the advantages of deep learning and cost-volume-based features, in this paper we propose a novel Convolutional Neural Network (CNN) architecture to directly learn features for confidence estimation from volumetric 3D data. An extensive evaluation on three datasets using three common dense stereo matching techniques demonstrates the generality and state-of-the-art accuracy of the proposed method. |
Tasks | Autonomous Driving, Stereo Matching
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07287v2 |
https://arxiv.org/pdf/1905.07287v2.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-cost-volume-analysis-as-confidence |
Repo | |
Framework | |
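As context for what the CNN replaces, here is one of the conventional hand-crafted cost-volume measures, a PKRN-style peak ratio (illustrative only, not the paper's network; the cost volume here is synthetic):

```python
import numpy as np

def peak_ratio_confidence(costs):
    """PKRN-style measure: ratio of second-best to best matching cost.

    `costs` has shape (height, width, disparities); lower cost = better match.
    A large ratio means one disparity clearly wins, i.e. a confident match.
    """
    sorted_costs = np.sort(costs, axis=-1)
    best, second = sorted_costs[..., 0], sorted_costs[..., 1]
    return second / (best + 1e-9)

rng = np.random.default_rng(2)
costs = rng.uniform(1.0, 2.0, size=(4, 5, 32))  # ambiguous everywhere...
costs[0, 0, 7] = 0.1                            # ...except one clear winner

conf = peak_ratio_confidence(costs)
print(conf[0, 0] > conf[1, 1])                  # True: unambiguous pixel scores higher
```

The proposed CNN consumes the same volumetric input but learns its features rather than relying on fixed statistics like this ratio.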
Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities
Title | Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities |
Authors | Aristide Tossou, Debabrota Basu, Christos Dimitrakakis |
Abstract | We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show UCRL-V to outperform the state-of-the-art algorithms. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.12425v2 |
https://arxiv.org/pdf/1905.12425v2.pdf | |
PWC | https://paperswithcode.com/paper/near-optimal-optimistic-reinforcement |
Repo | |
Framework | |
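The variance-based confidence interval driving UCRL-V's optimism follows the empirical Bernstein form; a sketch with illustrative constants (not the paper's exact bound): for n i.i.d. samples in [0, b], the deviation of the empirical mean is bounded using the empirical variance, which tightens the interval for low-variance reward or transition estimates.

```python
import numpy as np

def empirical_bernstein_radius(samples, delta, b=1.0):
    """Empirical-Bernstein-style confidence radius for the sample mean."""
    n = len(samples)
    var = np.var(samples)                 # empirical variance, not a worst case
    return (np.sqrt(2 * var * np.log(3 / delta) / n)
            + 3 * b * np.log(3 / delta) / n)

rng = np.random.default_rng(3)
low_var = rng.uniform(0.49, 0.51, size=1000)   # nearly deterministic rewards
high_var = rng.uniform(0.0, 1.0, size=1000)

r_low = empirical_bernstein_radius(low_var, delta=0.05)
r_high = empirical_bernstein_radius(high_var, delta=0.05)
print(r_low < r_high)   # True: the bound is tighter where the variance is low
```

A Hoeffding-style bound would give both estimates the same radius; exploiting the empirical variance is what lets the algorithm close the gap with the lower bound.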
Is Supervised Learning With Adversarial Features Provably Better Than Sole Supervision?
Title | Is Supervised Learning With Adversarial Features Provably Better Than Sole Supervision? |
Authors | Litu Rout |
Abstract | Generative Adversarial Networks (GAN) have shown promising results on a wide variety of complex tasks. Recent experiments show adversarial training provides useful gradients to the generator that help attain better performance. In this paper, we intend to theoretically analyze whether supervised learning with adversarial features can outperform sole supervision, or not. First, we show that supervised learning without adversarial features suffers from a vanishing gradient issue in the near-optimal region. Second, we analyze how adversarial learning augmented with a supervised signal mitigates this vanishing gradient issue. Finally, we prove our main result, showing that supervised learning with adversarial features can be better than sole supervision (under some mild assumptions). We support our main result on two fronts: (i) expected empirical risk and (ii) rate of convergence. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13993v1 |
https://arxiv.org/pdf/1910.13993v1.pdf | |
PWC | https://paperswithcode.com/paper/is-supervised-learning-with-adversarial |
Repo | |
Framework | |
Predict Future Sales using Ensembled Random Forests
Title | Predict Future Sales using Ensembled Random Forests |
Authors | Yuwei Zhang, Xin Wu, Chenyang Gu, Yueqi Xie |
Abstract | This is a method report for the Kaggle data competition ‘Predict future sales’. In this paper, we propose a rather simple approach to predicting future sales, based on feature engineering, a Random Forest Regressor, and ensemble learning. Its performance exceeded that of many conventional methods, achieving a final score of 0.88186 (root mean squared error). As of this writing (8.5.2018), our model ranked 5th on the leaderboard. |
Tasks | Feature Engineering |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.09031v1 |
http://arxiv.org/pdf/1904.09031v1.pdf | |
PWC | https://paperswithcode.com/paper/predict-future-sales-using-ensembled-random |
Repo | |
Framework | |
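The ensembling-plus-RMSE part of the pipeline can be sketched generically (stand-in predictors, not the authors' engineered features or forests): averaging several models with independent errors reduces the error variance, which the competition metric rewards.

```python
import numpy as np

def rmse(y_true, y_pred):
    """The competition metric: root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(4)
y_true = rng.uniform(0, 20, size=500)         # stand-in monthly sales targets

# Three imperfect base models, each with independent prediction errors.
preds = [y_true + rng.normal(scale=2.0, size=500) for _ in range(3)]
ensemble = np.mean(preds, axis=0)             # simple averaging ensemble

print(all(rmse(y_true, ensemble) < rmse(y_true, p) for p in preds))  # True
```

With independent errors of equal variance, averaging k models divides the error variance by k, which is why the ensembled forests beat each individual regressor.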
Speech denoising by parametric resynthesis
Title | Speech denoising by parametric resynthesis |
Authors | Soumi Maiti, Michael I Mandel |
Abstract | This work proposes the use of clean speech vocoder parameters as the target for a neural network performing speech enhancement. These parameters have been designed for text-to-speech synthesis so that they both produce high-quality resyntheses and also are straightforward to model with neural networks, but have not been utilized in speech enhancement until now. In comparison to a matched text-to-speech system that is given the ground truth transcripts of the noisy speech, our model is able to produce more natural speech because it has access to the true prosody in the noisy speech. In comparison to two denoising systems, the oracle Wiener mask and a DNN-based mask predictor, our model equals the oracle Wiener mask in subjective quality and intelligibility and surpasses the realistic system. A vocoder-based upper bound shows that there is still room for improvement with this approach beyond the oracle Wiener mask. We test speaker-dependence with two speakers and show that a single model can be used for multiple speakers. |
Tasks | Denoising, Speech Enhancement, Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01537v1 |
http://arxiv.org/pdf/1904.01537v1.pdf | |
PWC | https://paperswithcode.com/paper/speech-denoising-by-parametric-resynthesis |
Repo | |
Framework | |
Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
Title | Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet |
Authors | Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi |
Abstract | We investigated the training of a shared model for both text-to-speech (TTS) and voice conversion (VC) tasks. We propose using an extended Tacotron architecture, a multi-source sequence-to-sequence model with a dual attention mechanism, as the shared model for both the TTS and VC tasks. This model accomplishes the two different tasks according to the type of input: an end-to-end speech synthesis task is conducted when the model is given text as the input, while a sequence-to-sequence voice conversion task is conducted when it is given the speech of a source speaker as the input. Waveform signals are generated by WaveNet, conditioned on a predicted mel-spectrogram. We propose jointly training a shared model as a decoder for a target speaker that supports multiple sources. Listening experiments show that our proposed multi-source encoder-decoder model can efficiently achieve both the TTS and VC tasks. |
Tasks | Speech Synthesis, Voice Conversion |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12389v2 |
http://arxiv.org/pdf/1903.12389v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-training-framework-for-text-to-speech |
Repo | |
Framework | |
Robust Learning Under Label Noise With Iterative Noise-Filtering
Title | Robust Learning Under Label Noise With Iterative Noise-Filtering |
Authors | Duc Tam Nguyen, Thi-Phuong-Nhung Ngo, Zhongyu Lou, Michael Klar, Laura Beggel, Thomas Brox |
Abstract | We consider the problem of training a model in the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to them or completely removing them from the training set. In the first case, however, the model still learns from noisy labels; in the latter approach, good training data can be lost. In this paper, we propose an iterative semi-supervised mechanism for robust learning which excludes noisy labels but is still able to learn from the corresponding samples. To this end, we add an unsupervised loss term that also serves as a regularizer against the remaining label noise. We evaluate our approach on common classification tasks with different noise ratios. Our robust models outperform the state-of-the-art methods by a large margin. Especially for very large noise ratios, we achieve up to 20% absolute improvement compared to the previous best model. |
Tasks | |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00216v1 |
https://arxiv.org/pdf/1906.00216v1.pdf | |
PWC | https://paperswithcode.com/paper/190600216 |
Repo | |
Framework | |
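A single filtering step of this kind can be sketched as follows (a crude stand-in model and a small-loss criterion; the paper's full method iterates this and keeps the filtered samples via an unsupervised loss rather than discarding them):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y_clean = (x > 0).astype(int)
y_noisy = y_clean.copy()
flip = rng.random(n) < 0.2                # 20% label noise
y_noisy[flip] = 1 - y_noisy[flip]

# A crude "model": predicted probability of class 1 from the sign of x.
p1 = 1 / (1 + np.exp(-4 * x))
loss = -np.where(y_noisy == 1, np.log(p1), np.log(1 - p1))

# Small-loss samples keep their labels; high-loss samples are treated as
# potentially mislabeled (their inputs would still feed the unsupervised loss).
keep = loss < np.median(loss)
noise_rate_kept = np.mean(y_noisy[keep] != y_clean[keep])
print(noise_rate_kept < 0.2)              # True: filtering reduces the noise rate
```

The key difference from simple sample removal, which the abstract criticizes, is that the excluded samples are not wasted: only their labels are dropped.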
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
Title | Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis |
Authors | Bajibabu Bollepalli, Lauri Juvela, Paavo Alku |
Abstract | Recent studies have shown that text-to-speech synthesis quality can be improved by using glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal excitation and vocal tract, that occur in the human speech production apparatus. Current glottal vocoders generate the glottal excitation waveform by using deep neural networks (DNNs). However, the squared error-based training of the present glottal excitation models is limited to generating conditional average waveforms, which fails to capture the stochastic variation of the waveforms. As a result, shaped noise is added as post-processing. In this study, we propose a new method for predicting glottal waveforms by generative adversarial networks (GANs). GANs are generative models that aim to embed the data distribution in a latent space, enabling generation of new instances very similar to the original by randomly sampling the latent distribution. The glottal pulses generated by GANs show a stochastic component similar to natural glottal pulses. In our experiments, we compare synthetic speech generated using glottal waveforms produced by both DNNs and GANs. The results show that the newly proposed GANs achieve synthesis quality comparable to that of widely-used DNNs, without using an additive noise component. |
Tasks | Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05955v1 |
http://arxiv.org/pdf/1903.05955v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-network-based-glottal |
Repo | |
Framework | |
Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data
Title | Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data |
Authors | Shenhao Wang, Qingyi Wang, Jinhua Zhao |
Abstract | It is an enduring question how to combine revealed preference (RP) and stated preference (SP) data to analyze travel behavior. This study presents a framework of multitask learning deep neural networks (MTLDNNs) for this question, and demonstrates that MTLDNNs are more generic than the traditional nested logit (NL) method, owing to their capacity for automatic feature learning and soft constraints. About 1,500 MTLDNN models are designed and applied to survey data collected in Singapore, covering the RP of four current travel modes and the SP with autonomous vehicles (AV) as one new travel mode in addition to those in the RP. We found that MTLDNNs consistently outperform six benchmark models, and particularly the classical NL models, by about 5% prediction accuracy in both RP and SP datasets. This performance improvement can be attributed mainly to the soft constraints specific to MTLDNNs, including their innovative architectural design and regularization methods, and not so much to the generic capacity for automatic feature learning endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs are also interpretable. The empirical results show that AV is mainly a substitute for driving, and that AV alternative-specific variables are more important than the socio-economic variables in determining AV adoption. Overall, this study introduces a new MTLDNN framework to combine RP and SP, and demonstrates its theoretical flexibility and empirical power for prediction and interpretation. Future studies can design new MTLDNN architectures to reflect the specifics of RP and SP and extend this work to other behavioral analyses. |
Tasks | Autonomous Vehicles |
Published | 2019-01-02 |
URL | https://arxiv.org/abs/1901.00227v2 |
https://arxiv.org/pdf/1901.00227v2.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-deep-neural-network-to |
Repo | |
Framework | |
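A shared-bottom forward pass of the kind described, shared feature layers with RP- and SP-specific heads, can be sketched as follows (dimensions, weights, and the single hidden layer are all illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(6)
n_features, n_hidden, n_modes_rp, n_modes_sp = 10, 16, 4, 5

W_shared = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_rp = rng.normal(scale=0.1, size=(n_hidden, n_modes_rp))  # 4 current modes
W_sp = rng.normal(scale=0.1, size=(n_hidden, n_modes_sp))  # 4 modes + AV

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, head):
    h = np.maximum(0.0, x @ W_shared)     # shared representation (ReLU)
    return softmax(h @ head)              # task-specific choice probabilities

x = rng.normal(size=(3, n_features))      # a small batch of travelers
p_rp, p_sp = forward(x, W_rp), forward(x, W_sp)
print(p_rp.shape, p_sp.shape)             # (3, 4) (3, 5)
```

Sharing `W_shared` across the RP and SP heads is the "soft constraint" flavor of the design: the two datasets regularize each other through the common representation instead of being forced into one nested structure.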
The Virtual Doctor: An Interactive Artificial Intelligence based on Deep Learning for Non-Invasive Prediction of Diabetes
Title | The Virtual Doctor: An Interactive Artificial Intelligence based on Deep Learning for Non-Invasive Prediction of Diabetes |
Authors | Sebastian Spänig, Agnes Emberger-Klein, Jan-Peter Sowa, Ali Canbay, Klaus Menrad, Dominik Heider |
Abstract | Artificial intelligence (AI) will pave the way to a new era in medicine. However, currently available AI systems do not interact with a patient, e.g., for anamnesis, and thus are only used by physicians for predictions in diagnosis or prognosis; such systems are nevertheless widely used, e.g., in diabetes or cancer prediction. In the current study, we developed an AI that is able to interact with a patient (a virtual doctor) by using a speech recognition and speech synthesis system, and can thus operate autonomously, which is particularly important for, e.g., rural areas, where the availability of primary medical care is strongly limited by low population densities. As a proof of concept, the system is able to predict type 2 diabetes mellitus (T2DM) based on non-invasive sensors and deep neural networks. Moreover, the system provides an easy-to-interpret probability estimation for T2DM for a given patient. Besides the development of the AI, we further analyzed the acceptance of young people for AI in healthcare to estimate the impact of such a system in the future. |
Tasks | Speech Recognition, Speech Synthesis |
Published | 2019-03-09 |
URL | http://arxiv.org/abs/1903.12069v1 |
http://arxiv.org/pdf/1903.12069v1.pdf | |
PWC | https://paperswithcode.com/paper/the-virtual-doctor-an-interactive-artificial |
Repo | |
Framework | |
FacTweet: Profiling Fake News Twitter Accounts
Title | FacTweet: Profiling Fake News Twitter Accounts |
Authors | Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso |
Abstract | We present an approach to detect fake news in Twitter at the account level using a neural recurrent model and a variety of different semantic and stylistic features. Our method extracts a set of features from the timelines of news Twitter accounts by reading their posts as chunks, rather than dealing with each tweet independently. We show the experimental benefits of modeling latent stylistic signatures of mixed fake and real news with a sequential model over a wide range of strong baselines. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06592v1 |
https://arxiv.org/pdf/1910.06592v1.pdf | |
PWC | https://paperswithcode.com/paper/factweet-profiling-fake-news-twitter-accounts |
Repo | |
Framework | |
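The chunked-timeline representation can be sketched as follows (toy stylistic features; the paper extracts richer semantic and stylistic features per chunk and feeds the resulting sequence to a recurrent model):

```python
def chunk_timeline(tweets, chunk_size=3):
    """Split an account's timeline into ordered chunks of consecutive tweets."""
    return [tweets[i:i + chunk_size] for i in range(0, len(tweets), chunk_size)]

def chunk_features(chunk):
    # Toy stylistic features: mean tweet length and exclamation-mark rate.
    mean_len = sum(len(t) for t in chunk) / len(chunk)
    excl = sum(t.count("!") for t in chunk) / len(chunk)
    return [mean_len, excl]

timeline = ["BREAKING!!!", "you won't believe this", "shocking photos!",
            "weather update", "local council meeting today", "sports recap",
            "read more on our site"]
sequence = [chunk_features(c) for c in chunk_timeline(timeline)]
print(len(sequence), len(sequence[0]))    # 3 chunks, 2 features each
```

Reading posts in chunks rather than tweet by tweet is what lets the sequential model pick up an account-level stylistic signature instead of judging isolated messages.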