October 18, 2019


Paper Group ANR 645


Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention

Title Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention
Authors Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
Abstract There is currently increasing interest in using sequence-to-sequence models with attention for text-to-speech (TTS) synthesis. These models are end-to-end, meaning that they learn both co-articulation and duration properties directly from text and speech. Since they are entirely data-driven, they need large amounts of data to generate synthetic speech of good quality. However, for challenging speaking styles, such as Lombard speech, it is difficult to record sufficiently large speech corpora. Therefore, in this study we propose a transfer learning method to adapt a sequence-to-sequence based TTS system from a normal speaking style to Lombard style. Moreover, we experiment with a WaveNet vocoder for the synthesis of Lombard speech. We conducted subjective evaluations to assess the performance of the adapted TTS systems. The results indicated that the adapted system with the WaveNet vocoder clearly outperformed the conventional deep neural network based TTS system in the synthesis of Lombard speech.
Tasks Speech Synthesis, Text-To-Speech Synthesis, Transfer Learning
Published 2018-10-29
URL http://arxiv.org/abs/1810.12051v1
PDF http://arxiv.org/pdf/1810.12051v1.pdf
PWC https://paperswithcode.com/paper/speaking-style-adaptation-in-text-to-speech
Repo
Framework
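To make the adaptation idea concrete, here is a minimal numpy sketch of the transfer-learning recipe the abstract describes: pretrain on plenty of normal-style data, then fine-tune only part of the model on a small style-shifted corpus. The toy linear "TTS model", the constant style shift, and all variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a seq2seq TTS model: a frozen "encoder" E and a
# trainable "decoder" D mapping text features to acoustic features.
E = rng.normal(size=(8, 8))            # pretrained on a large normal-style corpus
D_normal = rng.normal(size=(8, 8))

def synthesize(D, X):
    return X @ E @ D

# Small "Lombard-style" corpus: same inputs, systematically shifted targets.
X_small = rng.normal(size=(16, 8))
Y_lombard = synthesize(D_normal, X_small) + 2.0

def mse(D):
    return np.mean((synthesize(D, X_small) - Y_lombard) ** 2)

# Adaptation: keep the encoder frozen and fine-tune only the decoder
# with gradient steps on the small corpus (the transfer-learning step).
D = D_normal.copy()
H = X_small @ E                        # frozen encoder activations
for _ in range(500):
    D -= 0.002 * 2 * H.T @ (H @ D - Y_lombard) / len(X_small)

loss_before, loss_after = mse(D_normal), mse(D)
```

Only the decoder's parameters receive gradients; the small corpus would be far too little data to retrain the whole model from scratch.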

Image Restoration by Estimating Frequency Distribution of Local Patches

Title Image Restoration by Estimating Frequency Distribution of Local Patches
Authors Jaeyoung Yoo, Sang-ho Lee, Nojun Kwak
Abstract In this paper, we propose a method for the image restoration problem, which aims to restore the details of a corrupted image, especially those lost to JPEG compression. We treat the image in the frequency domain to explicitly restore the frequency components lost during compression; the distribution in the frequency domain is learned using a cross-entropy loss. Unlike recent approaches, we reconstruct the details of an image without resorting to adversarial training. Instead, the restoration problem is treated as a classification problem that determines the frequency coefficient for each frequency band in an image patch. We show that the proposed method effectively restores a JPEG-compressed image with more detailed high-frequency components, making the restored image more vivid.
Tasks Image Compression, Image Restoration
Published 2018-05-23
URL http://arxiv.org/abs/1805.09097v1
PDF http://arxiv.org/pdf/1805.09097v1.pdf
PWC https://paperswithcode.com/paper/image-restoration-by-estimating-frequency
Repo
Framework
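The core trick — treating restoration as per-band classification of frequency coefficients — can be sketched with an 8x8 DCT and a discretized coefficient range. The bin count and patch values below are illustrative assumptions, not the paper's settings; a network trained with cross-entropy would predict the bin index per frequency band.

```python
import numpy as np

N = 8  # JPEG operates on 8x8 patches

# Orthonormal DCT-II basis matrix: C @ patch @ C.T gives the 2-D DCT.
k = np.arange(N)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] /= np.sqrt(2)

rng = np.random.default_rng(1)
patch = rng.uniform(0, 255, size=(N, N))
coeffs = C @ patch @ C.T

# Cast restoration as classification: each frequency coefficient is
# assigned to one of B discrete bins (the "classes").
B = 64
edges = np.linspace(coeffs.min(), coeffs.max(), B + 1)
classes = np.clip(np.digitize(coeffs, edges) - 1, 0, B - 1)   # class labels

# Decoding a prediction: map each class back to its bin centre, then
# invert the DCT to recover the restored patch.
centres = (edges[:-1] + edges[1:]) / 2
restored = C.T @ centres[classes] @ C
```

Even with only 64 classes per band, the inverse transform of the bin centres reproduces the patch closely, which is why the classification view is workable.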

LIME: Live Intrinsic Material Estimation

Title LIME: Live Intrinsic Material Estimation
Authors Abhimitra Meka, Maxim Maximov, Michael Zollhoefer, Avishek Chatterjee, Hans-Peter Seidel, Christian Richardt, Christian Theobalt
Abstract We present the first end-to-end approach for real-time material estimation for general object shapes with uniform material that only requires a single color image as input. In addition to Lambertian surface properties, our approach fully automatically computes the specular albedo, material shininess, and a foreground segmentation. We tackle this challenging and ill-posed inverse rendering problem using recent advances in image-to-image translation techniques based on deep convolutional encoder-decoder architectures. The underlying core representations of our approach are specular shading, diffuse shading and mirror images, which allow learning an effective and accurate separation of diffuse and specular albedo. In addition, we propose a novel, highly efficient perceptual rendering loss that mimics real-world image formation and obtains intermediate results even at run time. The estimation of material parameters at real-time frame rates enables exciting mixed-reality applications, such as seamless, illumination-consistent integration of virtual objects into real-world scenes, and virtual material cloning. We demonstrate our approach in a live setup, compare it to the state of the art, and demonstrate its effectiveness through quantitative and qualitative evaluation.
Tasks Image-to-Image Translation
Published 2018-01-03
URL http://arxiv.org/abs/1801.01075v2
PDF http://arxiv.org/pdf/1801.01075v2.pdf
PWC https://paperswithcode.com/paper/lime-live-intrinsic-material-estimation
Repo
Framework

Online Continuous Submodular Maximization

Title Online Continuous Submodular Maximization
Authors Lin Chen, Hamed Hassani, Amin Karbasi
Abstract In this paper, we consider an online optimization process where the objective functions are neither convex nor concave, but instead belong to a broad class of continuous submodular functions. We first propose a variant of the Frank-Wolfe algorithm that has access to the full gradient of the objective functions. We show that it achieves a regret bound of $O(\sqrt{T})$ (where $T$ is the horizon of the online optimization problem) against a $(1-1/e)$-approximation to the best feasible solution in hindsight. However, in many scenarios, only unbiased estimates of the gradients are available. For such settings, we then propose an online stochastic gradient ascent algorithm that also achieves an $O(\sqrt{T})$ regret bound, albeit against a weaker $1/2$-approximation to the best feasible solution in hindsight. We also generalize our results to $\gamma$-weakly submodular functions and prove the same sublinear regret bounds. Finally, we demonstrate the efficiency of our algorithms on a few problem instances, including non-convex/non-concave quadratic programs, multilinear extensions of submodular set functions, and D-optimal design.
Tasks
Published 2018-02-16
URL http://arxiv.org/abs/1802.06052v1
PDF http://arxiv.org/pdf/1802.06052v1.pdf
PWC https://paperswithcode.com/paper/online-continuous-submodular-maximization
Repo
Framework
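The Frank-Wolfe variant the abstract refers to can be illustrated on a small monotone DR-submodular instance: a coverage-style multilinear function maximized over a budget polytope. This is a sketch of the offline update with hypothetical problem data; the paper's online algorithm replays this idea against a sequence of adversarially chosen objectives.

```python
import numpy as np

# Monotone DR-submodular objective F(x) = 1 - prod_i (1 - a_i x_i),
# maximized over the polytope {0 <= x <= 1, sum(x) <= b}.
a = np.array([0.9, 0.5, 0.3])
b = 2

def F(x):
    return 1 - np.prod(1 - a * x)

def grad(x):
    p = 1 - a * x
    return a * np.prod(p) / p        # dF/dx_i = a_i * prod_{j!=i} (1 - a_j x_j)

def lmo(g):
    """Linear maximization oracle: put unit mass on the b largest gradients."""
    v = np.zeros_like(g)
    v[np.argsort(g)[::-1][:b]] = 1.0
    return v

# Frank-Wolfe variant: K small steps of size 1/K along the LMO direction,
# which yields a (1 - 1/e)-approximation for monotone DR-submodular F.
K = 50
x = np.zeros(3)
for _ in range(K):
    x = np.clip(x + lmo(grad(x)) / K, 0, 1)

# Best feasible point here puts full mass on the two largest a_i:
opt = F(np.array([1.0, 1.0, 0.0]))   # = 0.95
```

On this instance the iterate actually reaches the optimum, comfortably beating the $(1-1/e)$ guarantee.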

Reasoning about exceptions in ontologies: from the lexicographic closure to the skeptical closure

Title Reasoning about exceptions in ontologies: from the lexicographic closure to the skeptical closure
Authors Laura Giordano, Valentina Gliozzi
Abstract Reasoning about exceptions in ontologies is nowadays one of the challenges the description logics community is facing. The paper describes a preferential approach for dealing with exceptions in Description Logics, based on the rational closure. The rational closure has the merit of providing a simple and efficient approach for reasoning with exceptions, but it does not allow independent handling of the inheritance of different defeasible properties of concepts. In this work we outline a possible solution to this problem by introducing a variant of the lexicographic closure, which we call the skeptical closure, and which requires constructing only a single base. We develop a bi-preference semantics to characterize the skeptical closure.
Tasks
Published 2018-07-08
URL http://arxiv.org/abs/1807.02879v1
PDF http://arxiv.org/pdf/1807.02879v1.pdf
PWC https://paperswithcode.com/paper/reasoning-about-exceptions-in-ontologies-from
Repo
Framework

Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis

Title Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Authors Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan
Abstract Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a state-of-the-art end-to-end text-to-speech synthesis system, to uncover expressive factors of variation in speaking style. In this work, we introduce the Text-Predicted Global Style Token (TP-GST) architecture, which treats GST combination weights or style embeddings as “virtual” speaking style labels within Tacotron. TP-GST learns to predict stylistic renderings from text alone, requiring neither explicit labels during training nor auxiliary inputs for inference. We show that, when trained on a dataset of expressive speech, our system generates audio with more pitch and energy variation than two state-of-the-art baseline models. We further demonstrate that TP-GSTs can synthesize speech with background noise removed, and corroborate these analyses with positive results on human-rated listener preference audiobook tasks. Finally, we demonstrate that multi-speaker TP-GST models successfully factorize speaker identity and speaking style. We provide a website with audio samples for each of our findings.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2018-08-04
URL http://arxiv.org/abs/1808.01410v1
PDF http://arxiv.org/pdf/1808.01410v1.pdf
PWC https://paperswithcode.com/paper/predicting-expressive-speaking-style-from
Repo
Framework

Sequential Copying Networks

Title Sequential Copying Networks
Authors Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou
Abstract Copying mechanisms have proven effective in sequence-to-sequence neural network models for text generation tasks such as abstractive sentence summarization and question generation. However, existing work on copying or pointing mechanisms considers only single-word copying from the source sentences. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which not only learns to copy single words but also copies sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span on the source side to copy to the target side, and integrates this sequential copying mechanism into the generation process of the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.
Tasks Abstractive Sentence Summarization, Question Generation, Text Generation
Published 2018-07-06
URL http://arxiv.org/abs/1807.02301v1
PDF http://arxiv.org/pdf/1807.02301v1.pdf
PWC https://paperswithcode.com/paper/sequential-copying-networks
Repo
Framework
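The span-copy idea can be sketched as a two-stage pointer decision: pick a start position over the source tokens, then an end position at or after it. The scores below are hand-set stand-ins for what a trained SeqCopyNet would compute from decoder state and encoder memories.

```python
import numpy as np

def select_span(start_scores, end_scores):
    """Pointer-network style sub-span selection: the highest-scoring start
    position, then the best end position at or after the start."""
    start = int(np.argmax(start_scores))
    end = start + int(np.argmax(end_scores[start:]))
    return start, end

source = "the proposed model copies meaningful spans from the input".split()

# Toy pointer scores (one per source token), hand-set for illustration.
start_scores = np.array([0.1, 0.2, 0.1, 0.1, 0.9, 0.3, 0.1, 0.1, 0.1])
end_scores   = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1, 0.1])

s, e = select_span(start_scores, end_scores)
copied = source[s:e + 1]   # -> ["meaningful", "spans"]
```

Constraining the end pointer to positions at or after the start is what turns two independent word-copy decisions into a single contiguous-span copy.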

Australia’s long-term electricity demand forecasting using deep neural networks

Title Australia’s long-term electricity demand forecasting using deep neural networks
Authors Homayoun Hamedmoghadam, Nima Joorabloo, Mahdi Jalili
Abstract Accurate prediction of long-term electricity demand has a significant role in demand-side management and in electricity network planning and operation. Demand over-estimation results in over-investment in network assets, driving up electricity prices, while demand under-estimation may lead to under-investment and hence an unreliable and insecure electricity supply. In this manuscript, we apply deep neural networks to predict Australia's long-term electricity demand. A stacked autoencoder is used in combination with multilayer perceptrons or cascade-forward multilayer perceptrons to predict the nation-wide electricity consumption rates 1-24 months ahead of time. The experimental results show that the deep structures perform better than classical neural networks, especially for prediction horizons of 12 to 24 months.
Tasks
Published 2018-01-07
URL http://arxiv.org/abs/1801.02148v2
PDF http://arxiv.org/pdf/1801.02148v2.pdf
PWC https://paperswithcode.com/paper/australias-long-term-electricity-demand
Repo
Framework
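As a rough sketch of the pipeline (autoencoder features feeding a forecasting head), the snippet below substitutes a closed-form linear autoencoder (SVD) for the paper's stacked autoencoder and ordinary least squares for the perceptron head. The synthetic demand series, window length, and horizon are illustrative assumptions.

```python
import numpy as np

# Synthetic monthly "demand": linear trend plus a 12-month seasonal cycle.
t = np.arange(120)
demand = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

# Inputs: the past W months; target: demand HORIZON months after the window.
W, HORIZON = 24, 12
n = len(demand) - W - HORIZON
X = np.array([demand[i:i + W] for i in range(n)])
y = demand[W + HORIZON - 1 + np.arange(n)]

# Linear "autoencoder": the optimal linear encoder is given by the top
# principal components (SVD), compressing 24 inputs to 4 features.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
encode = lambda Z: (Z - mean) @ Vt[:4].T

# Forecast head: least squares on the encoded features (plus a bias).
A = np.c_[encode(X), np.ones(n)]
w, *_ = np.linalg.lstsq(A, y, rcond=None)
mape = np.mean(np.abs(A @ w - y) / y)
```

Because the trend-plus-seasonal series lies in a low-dimensional subspace, four encoded features already suffice for an essentially exact 12-month-ahead forecast on this toy data.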

Low-Shot Learning from Imaginary Data

Title Low-Shot Learning from Imaginary Data
Authors Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan
Abstract Humans can quickly learn new visual concepts, perhaps because they can easily visualize or imagine what novel objects look like from different views. Incorporating this ability to hallucinate novel instances of new concepts might help machine vision systems perform better low-shot learning, i.e., learning concepts from few examples. We present a novel approach to low-shot learning that uses this idea. Our approach builds on recent progress in meta-learning (“learning to learn”) by combining a meta-learner with a “hallucinator” that produces additional training examples, and optimizing both models jointly. Our hallucinator can be incorporated into a variety of meta-learners and provides significant gains: up to a 6 point boost in classification accuracy when only a single training example is available, yielding state-of-the-art performance on the challenging ImageNet low-shot classification benchmark.
Tasks Meta-Learning
Published 2018-01-16
URL http://arxiv.org/abs/1801.05401v2
PDF http://arxiv.org/pdf/1801.05401v2.pdf
PWC https://paperswithcode.com/paper/low-shot-learning-from-imaginary-data
Repo
Framework
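The hallucinator-plus-learner pipeline can be caricatured in a few lines: generate extra examples around each one-shot seed and train a simple classifier on the augmented set. The Gaussian "hallucinator" below is a stand-in — the paper learns the generator jointly with a meta-learner rather than using fixed noise — and all dimensions and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def hallucinate(seed_example, n_extra, scale=0.1):
    """Stand-in hallucinator: extra training examples around a single seed
    (the paper learns this generator jointly with the meta-learner)."""
    return seed_example + scale * rng.normal(size=(n_extra, seed_example.size))

# One real example per novel class (1-shot), in a 16-d feature space.
class_means = rng.normal(size=(5, 16)) * 3
seeds = class_means + rng.normal(size=(5, 16))     # one noisy sample each

# Augment each class with hallucinated examples; classify by nearest centroid.
centroids = np.array([
    np.vstack([s[None, :], hallucinate(s, 20)]).mean(axis=0) for s in seeds
])

queries = class_means + rng.normal(size=(5, 16))   # fresh test samples
pred = np.argmin(((queries[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
```

A learned hallucinator would place the extra samples along plausible directions of variation (pose, lighting) instead of isotropic noise, which is where the reported accuracy gains come from.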

Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition

Title Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition
Authors Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida
Abstract In this paper we propose a method that models speaker and session variability and is able to generate likelihood ratios using neural networks in an end-to-end phrase-dependent speaker verification system. As in Joint Factor Analysis, the model uses tied hidden variables to model speaker and session variability, together with a MAP adaptation of some of the parameters of the model. In the training procedure, our method jointly estimates the network parameters and the values of the speaker and channel hidden variables. This is done with a two-step backpropagation algorithm: first the network weights and factor loading matrices are updated, and then the hidden variables, whose gradients are calculated by aggregating the corresponding speaker or session frames, since these hidden variables are tied. The last layer of the network is defined as a linear regression probabilistic model whose inputs are the previous layer's outputs. This choice has the advantage that it produces likelihoods and, additionally, that it can be adapted during enrolment using MAP without the need for gradient optimization. Decisions are made based on the ratio of the output likelihoods of two neural network models: speaker-adapted and universal background model. The method was evaluated on the RSR2015 database.
Tasks Speaker Recognition, Speaker Verification
Published 2018-12-27
URL http://arxiv.org/abs/1812.11946v1
PDF http://arxiv.org/pdf/1812.11946v1.pdf
PWC https://paperswithcode.com/paper/tied-hidden-factors-in-neural-networks-for
Repo
Framework

The Boosted DC Algorithm for nonsmooth functions

Title The Boosted DC Algorithm for nonsmooth functions
Authors Francisco J. Aragón Artacho, Phan T. Vuong
Abstract The Boosted Difference of Convex functions Algorithm (BDCA) was recently proposed for minimizing smooth difference of convex (DC) functions. BDCA accelerates the convergence of the classical Difference of Convex functions Algorithm (DCA) thanks to an additional line search step. The purpose of this paper is twofold. Firstly, to show that this scheme can be generalized and successfully applied to certain types of nonsmooth DC functions, namely, those that can be expressed as the difference of a smooth function and a possibly nonsmooth one. Secondly, to show that there is complete freedom in the choice of the trial step size for the line search, which can further improve performance. We prove that any limit point of the BDCA iterative sequence is a critical point of the problem under consideration, and that the corresponding objective value is monotonically decreasing and convergent. The global convergence and convergence rate of the iterations are obtained under the Kurdyka-Lojasiewicz property. Applications and numerical experiments for two problems in data science are presented, demonstrating that BDCA outperforms DCA. Specifically, for the Minimum Sum-of-Squares Clustering problem, BDCA was on average sixteen times faster than DCA, and for the Multidimensional Scaling problem, BDCA was three times faster than DCA.
Tasks
Published 2018-12-14
URL https://arxiv.org/abs/1812.06070v2
PDF https://arxiv.org/pdf/1812.06070v2.pdf
PWC https://paperswithcode.com/paper/the-boosted-dc-algorithm-for-nonsmooth
Repo
Framework
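The BDCA recipe — a DCA step followed by a line search along d = y - x — can be sketched on a one-dimensional double-well DC function. The decomposition and line-search constants below are illustrative choices, not the paper's experiments.

```python
import numpy as np

# DC decomposition of the double well f(x) = x**4 - 2*x**2 = g(x) - h(x),
# with g(x) = x**4 and h(x) = 2*x**2 (both convex). Minima at x = +/- 1.
f = lambda x: x**4 - 2 * x**2

def dca_step(x):
    # DCA subproblem: solve grad g(y) = grad h(x), i.e. 4y^3 = 4x.
    return np.cbrt(x)

def bdca_step(x, alpha=0.1, beta=0.5, lam=1.0):
    """One BDCA iteration: a DCA step to y, then a backtracking line
    search along d = y - x, a descent direction for f at y."""
    y = dca_step(x)
    d = y - x
    while f(y + lam * d) > f(y) - alpha * lam**2 * d**2:
        lam *= beta
        if lam < 1e-12:          # line search failed; keep the DCA point
            return y
    return y + lam * d

x = 0.5
for _ in range(50):
    x = bdca_step(x)
```

The extra line search is what distinguishes BDCA from plain DCA: each iteration first takes the standard DCA step and then pushes further along the same direction as long as the sufficient-decrease test holds.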

Five lessons from building a deep neural network recommender

Title Five lessons from building a deep neural network recommender
Authors Simen Eide, Audun M. Øygard, Ning Zhou
Abstract Recommendation algorithms are widely adopted in marketplaces to help users find the items they are looking for. The sparsity of the item-by-user matrix and the cold-start issue in marketplaces pose challenges for off-the-shelf matrix-factorization based recommender systems. To understand user intent and tailor recommendations to their needs, we use deep learning to explore the various heterogeneous data available in marketplaces. This paper summarizes five lessons we learned from experimenting with state-of-the-art deep learning recommenders at the leading Norwegian marketplace FINN.no. We design a hybrid recommender system that takes the user-generated contents of a marketplace (including text, images and meta attributes) and combines them with user behavior data such as page views and messages to provide recommendations for marketplace items. Among the various tactics we experimented with, the following five showed the best impact: staged training instead of end-to-end training, leveraging rich user behaviors beyond page views, using user behaviors as noisy labels to train embeddings, using transfer learning to solve the unbalanced-data problem, and using attention mechanisms in the hybrid model. This system is currently running in production at FINN.no with a click-through rate of around 20% and serves over one million visitors every day.
Tasks Recommendation Systems, Transfer Learning
Published 2018-09-06
URL http://arxiv.org/abs/1809.02131v2
PDF http://arxiv.org/pdf/1809.02131v2.pdf
PWC https://paperswithcode.com/paper/five-lessons-from-building-a-deep-neural
Repo
Framework

Fast Learning-based Registration of Sparse 3D Clinical Images

Title Fast Learning-based Registration of Sparse 3D Clinical Images
Authors Kathleen M. Lewis, Natalia S. Rost, John Guttag, Adrian V. Dalca
Abstract We introduce SparseVM, a method to register clinical 3D scans faster and more accurately than previously possible. Deformable alignment, or registration, of clinical scans is a fundamental task for many medical image applications such as longitudinal population studies. Most registration algorithms are designed for high-resolution research-quality scans and under-perform when applied to clinical data. Clinical scans present unique challenges because, in contrast to research-quality scans, clinical scans are often sparse, missing up to 85% of the slices available in research scans. We build on a state-of-the-art learning-based registration method to improve the accuracy of sparse clinical image registration and demonstrate our method on a clinically-acquired MRI dataset of stroke patients. SparseVM registers 3D scans in under a second on a GPU, which is over 1000x faster than the most accurate clinical registration methods, without compromising accuracy. Because of this, SparseVM enables clinical analyses that were not previously possible. The code is publicly available at voxelmorph.mit.edu.
Tasks Image Registration, Registration Of Sparse Clinical Images
Published 2018-12-17
URL https://arxiv.org/abs/1812.06932v2
PDF https://arxiv.org/pdf/1812.06932v2.pdf
PWC https://paperswithcode.com/paper/fast-learning-based-registration-of-sparse
Repo
Framework

Analyzing the Noise Robustness of Deep Neural Networks

Title Analyzing the Noise Robustness of Deep Neural Networks
Authors Mengchen Liu, Shixia Liu, Hang Su, Kelei Cao, Jun Zhu
Abstract Deep neural networks (DNNs) are vulnerable to maliciously generated adversarial examples. These examples are intentionally designed by making imperceptible perturbations and often mislead a DNN into making an incorrect prediction. This phenomenon means that there is significant risk in applying DNNs to safety-critical applications, such as driverless cars. To address this issue, we present a visual analytics approach to explain the primary cause of the wrong predictions introduced by adversarial examples. The key is to analyze the datapaths of the adversarial examples and compare them with those of the normal examples. A datapath is a group of critical neurons and their connections. To this end, we formulate the datapath extraction as a subset selection problem and approximately solve it based on back-propagation. A multi-level visualization consisting of a segmented DAG (layer level), an Euler diagram (feature map level), and a heat map (neuron level), has been designed to help experts investigate datapaths from the high-level layers to the detailed neuron activations. Two case studies are conducted that demonstrate the promise of our approach in support of explaining the working mechanism of adversarial examples.
Tasks
Published 2018-10-09
URL http://arxiv.org/abs/1810.03913v1
PDF http://arxiv.org/pdf/1810.03913v1.pdf
PWC https://paperswithcode.com/paper/analyzing-the-noise-robustness-of-deep-neural
Repo
Framework

Enhancing Label-Driven Deep Deformable Image Registration with Local Distance Metrics for State-of-the-Art Cardiac Motion Tracking

Title Enhancing Label-Driven Deep Deformable Image Registration with Local Distance Metrics for State-of-the-Art Cardiac Motion Tracking
Authors Alessa Hering, Sven Kuckertz, Stefan Heldmann, Mattias Heinrich
Abstract While deep learning has achieved significant advances in accuracy for medical image segmentation, its benefits for deformable image registration have so far remained limited to reduced computation times. Previous work has either focused on replacing the iterative optimization of distance and smoothness terms with CNN layers or on supervised approaches driven by labels. Our method is the first to combine the complementary strengths of global semantic information (represented by segmentation labels) and local distance metrics that help align surrounding structures. We demonstrate significantly higher Dice scores (86.5%) for deformable cardiac image registration compared to classic registration (79.0%) as well as label-driven deep learning frameworks (83.4%).
Tasks Image Registration, Medical Image Segmentation, Semantic Segmentation
Published 2018-12-05
URL http://arxiv.org/abs/1812.01859v1
PDF http://arxiv.org/pdf/1812.01859v1.pdf
PWC https://paperswithcode.com/paper/enhancing-label-driven-deep-deformable-image
Repo
Framework
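The combination the abstract describes — a global label-overlap term plus a local intensity distance — can be sketched as a single loss on a toy 1-D example, with SSD standing in for the paper's local metric and the weighting chosen arbitrarily:

```python
import numpy as np

def dice(seg_a, seg_b):
    return 2 * np.logical_and(seg_a, seg_b).sum() / (seg_a.sum() + seg_b.sum())

def registration_loss(fixed_img, moved_img, fixed_seg, moved_seg, w=0.5):
    """Global semantic term (1 - Dice on labels) plus a local intensity
    distance (SSD here, standing in for the paper's local metrics)."""
    return (1 - dice(fixed_seg, moved_seg)) + w * np.mean((fixed_img - moved_img) ** 2)

# Toy 1-D "images": a Gaussian bump and its label mask.
x = np.linspace(-3, 3, 64)
fixed_img = np.exp(-x ** 2)
fixed_seg = fixed_img > 0.5

def moved(dx):                       # candidate alignment, shifted by dx
    img = np.exp(-(x - dx) ** 2)
    return img, img > 0.5

img0, seg0 = moved(0.0)
img1, seg1 = moved(1.0)
loss_aligned = registration_loss(fixed_img, img0, fixed_seg, seg0)
loss_shifted = registration_loss(fixed_img, img1, fixed_seg, seg1)
```

The label term supplies a coarse global signal where labels exist, while the intensity term also penalizes misalignment of unlabeled surrounding structure — the complementarity the paper exploits.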