February 1, 2020

3179 words 15 mins read

Paper Group AWR 117


Applying Probabilistic Programming to Affective Computing

Title Applying Probabilistic Programming to Affective Computing
Authors Desmond C. Ong, Harold Soh, Jamil Zaki, Noah D. Goodman
Abstract Affective Computing is a rapidly growing field spurred by advancements in artificial intelligence, but often held back by the inability to translate psychological theories of emotion into tractable computational models. To address this, we propose a probabilistic programming approach to affective computing, which models psychologically-grounded theories as generative models of emotion, and implements them as stochastic, executable computer programs. We first review probabilistic approaches that integrate reasoning about emotions with reasoning about other latent mental states (e.g., beliefs, desires) in context. Recently-developed probabilistic programming languages offer several key desiderata over previous approaches, such as: (i) flexibility in representing emotions and emotional processes; (ii) modularity and compositionality; (iii) integration with deep learning libraries that facilitate efficient inference and learning from large, naturalistic data; and (iv) ease of adoption. Furthermore, using a probabilistic programming framework provides a standardized platform for theory-building and experimentation: competing theories (e.g., of appraisal or other emotional processes) can be easily compared via modular substitution of code followed by model comparison. To jumpstart adoption, we illustrate our points with executable code that researchers can easily modify for their own models. We end with a discussion of applications and future directions of the probabilistic programming approach.
Tasks Probabilistic Programming
Published 2019-03-15
URL http://arxiv.org/abs/1903.06445v1
PDF http://arxiv.org/pdf/1903.06445v1.pdf
PWC https://paperswithcode.com/paper/applying-probabilistic-programming-to
Repo https://github.com/desmond-ong/pplAffComp
Framework pytorch
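
To make the idea concrete, here is a minimal sketch of the kind of generative emotion model the paper advocates, written in Pyro (a PyTorch-based probabilistic programming language). The appraisal variables and the simple form of the intensity function are illustrative assumptions, not the paper's models.

```python
import torch
import pyro
import pyro.distributions as dist

def emotion_model(observed_outcome=None):
    # Latent mental states: how much the agent wanted the outcome, and
    # how likely they believed it to be (illustrative appraisal variables).
    desire = pyro.sample("desire", dist.Normal(0.0, 1.0))
    belief = pyro.sample("belief", dist.Beta(2.0, 2.0))

    # The event outcome (e.g. won/lost) is the observable variable.
    outcome = pyro.sample("outcome", dist.Bernoulli(belief),
                          obs=observed_outcome)

    # A hypothetical appraisal: felt intensity depends on desire, outcome,
    # and how surprising the outcome was given the prior belief.
    surprise = (outcome - belief).abs()
    return pyro.sample("intensity",
                       dist.Normal(desire * outcome + surprise, 0.5))

# Conditioning on an observed outcome lets Pyro's inference machinery
# invert the model to reason about the latent appraisals.
emotion_model(observed_outcome=torch.tensor(1.0))
```

Swapping in a different appraisal function is a one-line change, which is exactly the modular substitution-and-comparison workflow the abstract describes.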

Expediting TTS Synthesis with Adversarial Vocoding

Title Expediting TTS Synthesis with Adversarial Vocoding
Authors Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley
Abstract Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07944v2
PDF https://arxiv.org/pdf/1904.07944v2.pdf
PWC https://paperswithcode.com/paper/expediting-tts-synthesis-with-adversarial
Repo https://github.com/paarthneekhara/advoc
Framework tf
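
As a rough sketch of the core mapping, the snippet below uses a small convolutional generator to translate mel spectrograms into magnitude spectrograms, which are then heuristically vocoded with Griffin-Lim. The paper's actual generator architecture, its GAN discriminator, and the training losses are all omitted; layer sizes here are illustrative.

```python
import torch
import torch.nn as nn
import librosa

class Mel2Mag(nn.Module):
    """Stand-in for the paper's generator: maps 80-bin mel spectrograms
    to 513-bin magnitude spectrograms (n_fft = 1024)."""
    def __init__(self, n_mels=80, n_freq=513):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, n_freq, kernel_size=5, padding=2),
        )

    def forward(self, mel):       # mel: (batch, n_mels, frames)
        return self.net(mel)      # (batch, n_freq, frames)

model = Mel2Mag()                 # in practice: trained adversarially
mel = torch.randn(1, 80, 100)     # dummy mel spectrogram
with torch.no_grad():
    mag = model(mel).squeeze(0).clamp(min=0).numpy()
audio = librosa.griffinlim(mag, n_iter=32)   # heuristic phase recovery
```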

Region Normalization for Image Inpainting

Title Region Normalization for Image Inpainting
Authors Tao Yu, Zongyu Guo, Xin Jin, Shilin Wu, Zhibo Chen, Weiping Li, Zhizheng Zhang, Sen Liu
Abstract Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.
Tasks Image Inpainting
Published 2019-11-23
URL https://arxiv.org/abs/1911.10375v1
PDF https://arxiv.org/pdf/1911.10375v1.pdf
PWC https://paperswithcode.com/paper/region-normalization-for-image-inpainting
Repo https://github.com/geekyutao/RN
Framework pytorch
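
The basic variant (RN-B) is concrete enough to sketch directly from the abstract: pixels inside and outside the inpainting mask are normalized with separate statistics. The learnable region detection and fusion of RN-L are not shown, and the affine parameters below are a simplification of the paper's design.

```python
import torch
import torch.nn as nn

class BasicRegionNorm(nn.Module):
    """Sketch of RN-B: normalize corrupted and uncorrupted regions
    separately, given a mask with 1 = valid and 0 = corrupted pixels."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))

    def _norm_region(self, x, region):
        # Per-sample, per-channel statistics computed over one region only.
        count = region.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        mean = (x * region).sum(dim=(2, 3), keepdim=True) / count
        var = ((x - mean) ** 2 * region).sum(dim=(2, 3), keepdim=True) / count
        return (x - mean) / (var + self.eps).sqrt() * region

    def forward(self, x, mask):   # x: (B, C, H, W), mask: (B, 1, H, W)
        out = self._norm_region(x, mask) + self._norm_region(x, 1.0 - mask)
        return out * self.gamma + self.beta
```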

Weakly Supervised Disentanglement by Pairwise Similarities

Title Weakly Supervised Disentanglement by Pairwise Similarities
Authors Junxiang Chen, Kayhan Batmanghelich
Abstract Recently, research on unsupervised disentanglement learning with deep generative models has gained substantial popularity. However, without introducing supervision, there is no guarantee that the factors of interest can be successfully recovered. Motivated by a real-world problem, we propose a setting where the user introduces weak supervision by providing similarities between instances based on a factor to be disentangled. The similarity is provided as either a binary (yes/no) or a real-valued label describing whether a pair of instances is similar or not. We propose a new method for weakly supervised disentanglement of latent variables within the framework of the Variational Autoencoder. Experimental results demonstrate that utilizing weak supervision improves the performance of the disentanglement method substantially.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.01044v2
PDF https://arxiv.org/pdf/1906.01044v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-disentanglement-by-pairwise
Repo https://github.com/batmanlab/VAE_pairwise
Framework pytorch
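
The paper formulates its objective inside a VAE; purely as an illustration, the snippet below shows one standard form such a pairwise similarity term could take, a contrastive penalty on a designated latent dimension. The paper's exact loss is not reproduced, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(z_a, z_b, sim, factor_dim=0, margin=1.0):
    """z_a, z_b: (batch, latent_dim) latent means of the paired instances.
    sim: (batch,) labels in [0, 1]; 1 means the pair shares the factor.
    Pulls similar pairs together and pushes dissimilar pairs at least
    `margin` apart along the designated latent dimension."""
    d = (z_a[:, factor_dim] - z_b[:, factor_dim]).abs()
    loss = sim * d ** 2 + (1 - sim) * F.relu(margin - d) ** 2
    return loss.mean()

# Added to the usual VAE objective, schematically:
# total = recon_loss + kl_loss + lam * pairwise_similarity_loss(z1, z2, sim)
```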

Adversarial Examples Improve Image Recognition

Title Adversarial Examples Improve Image Recognition
Authors Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
Abstract Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions from normal examples. We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger. For instance, by applying AdvProp to the latest EfficientNet-B7 [28] on ImageNet, we achieve significant improvements on ImageNet (+0.7%), ImageNet-C (+6.5%), ImageNet-A (+7.0%), and Stylized-ImageNet (+4.8%). With an enhanced EfficientNet-B8, our method achieves the state-of-the-art 85.5% ImageNet top-1 accuracy without extra data. This result even surpasses the best model in [20], which is trained with 3.5B Instagram images (~3000X more than ImageNet) and ~9.4X more parameters. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
Tasks Image Classification
Published 2019-11-21
URL https://arxiv.org/abs/1911.09665v1
PDF https://arxiv.org/pdf/1911.09665v1.pdf
PWC https://paperswithcode.com/paper/adversarial-examples-improve-image
Repo https://github.com/osmr/imgclsmob
Framework mxnet
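
The key mechanism the abstract names, a separate auxiliary batch norm for adversarial examples, is easy to sketch; the backbone network and the attack that generates the adversarial mini-batches are omitted.

```python
import torch
import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    """Sketch of AdvProp's central component: clean and adversarial
    inputs follow different distributions, so each gets its own
    batch-norm statistics while sharing all other weights."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)

    def forward(self, x, adversarial=False):
        return self.bn_adv(x) if adversarial else self.bn_clean(x)

# Schematic training step, with this module used throughout the network:
#   loss = criterion(model(x_clean, adversarial=False), y) \
#        + criterion(model(x_adv,   adversarial=True),  y)
# At test time only the clean batch norms are used.
```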

Deep Signature Transforms

Title Deep Signature Transforms
Authors Patric Bonnier, Patrick Kidger, Imanol Perez Arribas, Cristopher Salvi, Terry Lyons
Abstract The signature is an infinite graded sequence of statistics known to characterise a stream of data up to a negligible equivalence class. It is a transform which has previously been treated as a fixed feature transformation, on top of which a model may be built. We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks. By learning an augmentation of the stream prior to the signature transform, the terms of the signature may be selected in a data-dependent way. More generally, we describe how the signature transform may be used as a layer anywhere within a neural network. In this context it may be interpreted as a pooling operation. We present the results of empirical experiments to back up the theoretical justification. Code available at https://github.com/patrick-kidger/Deep-Signature-Transforms.
Tasks
Published 2019-05-21
URL https://arxiv.org/abs/1905.08494v2
PDF https://arxiv.org/pdf/1905.08494v2.pdf
PWC https://paperswithcode.com/paper/deep-signatures
Repo https://github.com/patrick-kidger/Deep-Signatures
Framework pytorch
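
A minimal sketch of the signature-as-a-layer idea, assuming the `signatory` package (a later release by an author of this paper); the pointwise augmentation network and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import signatory  # pip install signatory

class DeepSignatureModel(nn.Module):
    """Learn an augmentation of the stream, take the signature of the
    augmented stream, then read out; a sketch of the paper's recipe."""
    def __init__(self, in_channels=3, aug_channels=8, depth=3, n_classes=2):
        super().__init__()
        # Learned augmentation, applied independently at each time step.
        self.augment = nn.Sequential(
            nn.Linear(in_channels, 32), nn.ReLU(),
            nn.Linear(32, aug_channels),
        )
        self.signature = signatory.Signature(depth=depth)
        sig_dim = signatory.signature_channels(aug_channels, depth)
        self.readout = nn.Linear(sig_dim, n_classes)

    def forward(self, stream):            # stream: (batch, length, channels)
        augmented = self.augment(stream)  # (batch, length, aug_channels)
        sig = self.signature(augmented)   # (batch, sig_dim) pooled features
        return self.readout(sig)

model = DeepSignatureModel()
logits = model(torch.randn(16, 100, 3))   # 16 streams of length 100
```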

A Tool for Super-Resolving Multimodal Clinical MRI

Title A Tool for Super-Resolving Multimodal Clinical MRI
Authors Mikael Brudfors, Yael Balbastre, Parashkev Nachev, John Ashburner
Abstract We present a tool for resolution recovery in multimodal clinical magnetic resonance imaging (MRI). Such images exhibit great variability, both biological and instrumental. This variability makes automated processing with neuroimaging analysis software very challenging. This leaves intelligence extractable only from large-scale analyses of clinical data untapped, and impedes the introduction of automated predictive systems in clinical care. The tool presented in this paper enables such processing, via inference in a generative model of thick-sliced, multi-contrast MR scans. All model parameters are estimated from the observed data, without the need for manual tuning. The model-driven nature of the approach means that no type of training is needed for applicability to the diversity of MR contrasts present in a clinical context. We show on simulated data that the proposed approach outperforms conventional model-based techniques, and on a large hospital dataset of multimodal MRIs that the tool can successfully super-resolve very thick-sliced images. The implementation is available from https://github.com/brudfors/spm_superres.
Tasks
Published 2019-09-03
URL https://arxiv.org/abs/1909.01140v1
PDF https://arxiv.org/pdf/1909.01140v1.pdf
PWC https://paperswithcode.com/paper/a-tool-for-super-resolving-multimodal
Repo https://github.com/brudfors/spm_superres
Framework none

Improving Style Transfer with Calibrated Metrics

Title Improving Style Transfer with Calibrated Metrics
Authors Mao-Chuang Yeh, Shuai Tang, Anand Bhattad, Chuhang Zou, David Forsyth
Abstract Style transfer methods produce a transferred image which is a rendering of a content image in the manner of a style image. We seek to understand how to improve style transfer. Doing so requires quantitative evaluation procedures, but current evaluation is qualitative, mostly involving user studies. We describe a novel quantitative evaluation procedure. Our procedure relies on two statistics: the Effectiveness (E) statistic measures the extent to which a given style has been transferred to the target, and the Coherence (C) statistic measures the extent to which the original image’s content is preserved. Our statistics are calibrated to human preference: targets with larger values of E (resp. C) will reliably be preferred by human subjects in comparisons of style (resp. content). We use these statistics to investigate the relative performance of a number of Neural Style Transfer (NST) methods, revealing several intriguing properties. Admissible methods lie on a Pareto frontier (i.e. improving E reduces C or vice versa). Three methods are admissible: universal style transfer produces very good C but weak E; modifying the optimization used for Gatys’ loss produces a method with strong E and strong C; and a modified cross-layer method has slightly better E at a strong cost in C. While the histogram loss improves the E statistics of Gatys’ method, it does not make the method admissible. Surprisingly, style weights have relatively little effect in improving EC scores, and most variability in the transfer is explained by the style itself (meaning experimenters can be misguided by selecting styles).
Tasks Style Transfer
Published 2019-10-21
URL https://arxiv.org/abs/1910.09447v2
PDF https://arxiv.org/pdf/1910.09447v2.pdf
PWC https://paperswithcode.com/paper/improving-style-transfer-with-calibrated
Repo https://github.com/stringtron/quantative_style
Framework pytorch
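
The E and C statistics themselves are calibrated against human preference data, which cannot be reproduced here. As a rough illustration only, the sketch below computes the raw VGG-feature quantities such measures are typically built from: a Gram-matrix style discrepancy and a feature-space content discrepancy. The layer choice and the random inputs are placeholders, not the paper's procedure.

```python
import torch
import torchvision.models as models

vgg = models.vgg19(pretrained=True).features.eval()  # downloads weights

def features(x, layer=21):        # index 21 = conv4_2; illustrative choice
    for i, module in enumerate(vgg):
        x = module(x)
        if i == layer:
            return x

def gram(feat):                   # feat: (1, C, H, W)
    c = feat.shape[1]
    f = feat.reshape(c, -1)
    return f @ f.t() / f.shape[1]

transfer = torch.rand(1, 3, 256, 256)   # stand-ins for real images
style = torch.rand(1, 3, 256, 256)
content = torch.rand(1, 3, 256, 256)

with torch.no_grad():
    # Raw ingredients for an effectiveness-like and a coherence-like score:
    style_dist = (gram(features(transfer)) - gram(features(style))).pow(2).mean()
    content_dist = (features(transfer) - features(content)).pow(2).mean()
```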

Customizing Sequence Generation with Multi-Task Dynamical Systems

Title Customizing Sequence Generation with Multi-Task Dynamical Systems
Authors Alex Bird, Christopher K. I. Williams
Abstract Dynamical system models (including RNNs) often lack the ability to adapt the sequence generation or prediction to a given context, limiting their real-world application. In this paper we show that hierarchical multi-task dynamical systems (MTDSs) provide direct user control over sequence generation, via use of a latent code $\mathbf{z}$ that specifies the customization to the individual data sequence. This enables style transfer, interpolation and morphing within generated sequences. We show the MTDS can improve predictions via latent code interpolation, and avoid the long-term performance degradation of standard RNN approaches.
Tasks Style Transfer
Published 2019-10-11
URL https://arxiv.org/abs/1910.05026v1
PDF https://arxiv.org/pdf/1910.05026v1.pdf
PWC https://paperswithcode.com/paper/customizing-sequence-generation-with-multi
Repo https://github.com/ornithos/mtds-dblpend
Framework none
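
A minimal sketch of the hierarchical idea as the abstract presents it: a latent code z generates the transition dynamics of a per-sequence recurrent system, so interpolating between two codes morphs the generated dynamics. The paper's actual parameterization is not reproduced; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MTDSSketch(nn.Module):
    """A latent code z produces the recurrent weight matrix, giving each
    sequence its own customized dynamics."""
    def __init__(self, z_dim=3, hidden=16, obs_dim=2):
        super().__init__()
        self.make_W = nn.Linear(z_dim, hidden * hidden)  # z -> dynamics
        self.in_proj = nn.Linear(obs_dim, hidden)
        self.out_proj = nn.Linear(hidden, obs_dim)
        self.hidden = hidden

    def forward(self, z, x_seq):                 # x_seq: (T, obs_dim)
        W = self.make_W(z).view(self.hidden, self.hidden)
        h = torch.zeros(self.hidden)
        outputs = []
        for x_t in x_seq:
            h = torch.tanh(h @ W + self.in_proj(x_t))
            outputs.append(self.out_proj(h))
        return torch.stack(outputs)

model = MTDSSketch()
y = model(torch.randn(3), torch.randn(50, 2))  # one sequence, its own code
# Interpolating between two codes z1 and z2 interpolates the dynamics.
```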

Net2Vis – A Visual Grammar for Automatically Generating Publication-Ready CNN Architecture Visualizations

Title Net2Vis – A Visual Grammar for Automatically Generating Publication-Ready CNN Architecture Visualizations
Authors Alex Bäuerle, Christian van Onzenoodt, Timo Ropinski
Abstract To convey neural network architectures in publications, appropriate visualizations are of great importance. While most current deep learning papers contain such visualizations, these are usually handcrafted just before publication, which results in a lack of a common visual grammar, significant time investment, errors, and ambiguities. Current automatic network visualization tools focus on debugging the network itself, and are not ideal for generating publication-ready visualizations. Therefore, we present an approach to automate this process by translating network architectures specified in Keras into visualizations that can directly be embedded into any publication. To do so, we propose a visual grammar for convolutional neural networks (CNNs), which has been derived from an analysis of such figures extracted from all ICCV and CVPR papers published between 2013 and 2019. The proposed grammar incorporates visual encoding, network layout, layer aggregation, and legend generation. We have further realized our approach in an online system available to the community, which we have evaluated through expert feedback, and a quantitative study. It not only reduces the time needed to generate publication-ready network visualizations, but also enables a unified and unambiguous visualization design.
Tasks
Published 2019-02-11
URL https://arxiv.org/abs/1902.04394v4
PDF https://arxiv.org/pdf/1902.04394v4.pdf
PWC https://paperswithcode.com/paper/net2vis-transforming-deep-convolutional
Repo https://github.com/christianversloot/net2vis-docker
Framework none

Guided Super-Resolution as Pixel-to-Pixel Transformation

Title Guided Super-Resolution as Pixel-to-Pixel Transformation
Authors Riccardo de Lutio, Stefano D’Aronco, Jan Dirk Wegner, Konrad Schindler
Abstract Guided super-resolution is a unifying framework for several computer vision tasks where the inputs are a low-resolution source image of some target quantity (e.g., perspective depth acquired with a time-of-flight camera) and a high-resolution guide image from a different domain (e.g., a grey-scale image from a conventional camera); and the target output is a high-resolution version of the source (in our example, a high-res depth map). The standard way of looking at this problem is to formulate it as a super-resolution task, i.e., the source image is upsampled to the target resolution, while transferring the missing high-frequency details from the guide. Here, we propose to turn that interpretation on its head and instead see it as a pixel-to-pixel mapping of the guide image to the domain of the source image. The pixel-wise mapping is parametrised as a multi-layer perceptron, whose weights are learned by minimising the discrepancies between the source image and the downsampled target image. Importantly, our formulation makes it possible to regularise only the mapping function, while avoiding regularisation of the outputs; thus producing crisp, natural-looking images. The proposed method is unsupervised, using only the specific source and guide images to fit the mapping. We evaluate our method on two different tasks, super-resolution of depth maps and of tree height maps. In both cases, we clearly outperform recent baselines in quantitative comparisons, while delivering visually much sharper outputs.
Tasks Super-Resolution
Published 2019-04-02
URL https://arxiv.org/abs/1904.01501v2
PDF https://arxiv.org/pdf/1904.01501v2.pdf
PWC https://paperswithcode.com/paper/guided-super-resolution-as-a-learned-pixel-to
Repo https://github.com/riccardodelutio/PixTransform
Framework pytorch
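
The reinterpretation is compact enough to sketch end to end: fit a small MLP from guide pixel values to the source domain, supervised only by requiring the downsampled prediction to match the low-resolution source. Feeding pixel coordinates as extra inputs and the paper's regularization of the mapping are left out; shapes and the training budget are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 8
source = torch.rand(1, 1, 32, 32)     # low-res target quantity (e.g. depth)
guide = torch.rand(1, 1, 256, 256)    # high-res guide image (grayscale)

# Pixel-wise mapping from guide intensity to the source domain.
mlp = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

pixels = guide.reshape(-1, 1)          # one input row per guide pixel
for step in range(500):
    pred = mlp(pixels).reshape(1, 1, 256, 256)
    # The only supervision: the downsampled prediction must match the source.
    loss = F.mse_loss(F.avg_pool2d(pred, scale), source)
    opt.zero_grad()
    loss.backward()
    opt.step()

high_res = mlp(pixels).reshape(256, 256).detach()   # super-resolved output
```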

Earthmover-based manifold learning for analyzing molecular conformation spaces

Title Earthmover-based manifold learning for analyzing molecular conformation spaces
Authors Nathan Zelesko, Amit Moscovich, Joe Kileel, Amit Singer
Abstract In this paper, we propose a novel approach for manifold learning that combines the Earthmover’s distance (EMD) with the diffusion maps method for dimensionality reduction. We demonstrate the potential benefits of this approach for learning shape spaces of proteins and other flexible macromolecules using a simulated dataset of 3-D density maps that mimic the non-uniform rotary motion of ATP synthase. Our results show that EMD-based diffusion maps require far fewer samples to recover the intrinsic geometry than the standard diffusion maps algorithm that is based on the Euclidean distance. To reduce the computational burden of calculating the EMD for all volume pairs, we employ a wavelet-based approximation to the EMD which reduces the computation of the pairwise EMDs to a computation of pairwise weighted-$\ell_1$ distances between wavelet coefficient vectors.
Tasks Dimensionality Reduction
Published 2019-10-16
URL https://arxiv.org/abs/1911.06107v1
PDF https://arxiv.org/pdf/1911.06107v1.pdf
PWC https://paperswithcode.com/paper/earthmover-based-manifold-learning-for
Repo https://github.com/nathanzelesko/earthmover
Framework none
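
A minimal sketch of the wavelet approximation step, following the standard scale weighting of Shirdhonkar and Jacobs (2008) for 2-D densities; the paper's exact wavelet, weighting, and normalization may differ, and 3-D density maps would use `pywt.wavedecn` instead.

```python
import numpy as np
import pywt

def wavelet_emd(p, q, wavelet="db2", level=3):
    """Approximate the EMD between two 2-D densities by a weighted l1
    distance between wavelet coefficients of their difference. Fine
    scales are down-weighted by 2^{-j(1 + d/2)} with d = 2."""
    coeffs = pywt.wavedec2(p - q, wavelet, level=level)
    dist = np.abs(coeffs[0]).sum()                 # coarsest approximation
    for j, details in enumerate(coeffs[1:]):       # j = 0 is coarsest detail
        weight = 2.0 ** (-j * (1.0 + 2.0 / 2.0))
        dist += weight * sum(np.abs(d).sum() for d in details)
    return dist

p = np.random.rand(64, 64); p /= p.sum()
q = np.random.rand(64, 64); q /= q.sum()
d = wavelet_emd(p, q)
# Pairwise distances then feed a diffusion-maps kernel, e.g.
# K[i, j] = np.exp(-wavelet_emd(v[i], v[j]) ** 2 / eps)
```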

Efficient Object Detection in Large Images using Deep Reinforcement Learning

Title Efficient Object Detection in Large Images using Deep Reinforcement Learning
Authors Burak Uzkent, Christopher Yeh, Stefano Ermon
Abstract Traditionally, an object detector is applied to every part of the scene of interest, and its accuracy and computational cost increase with higher resolution images. However, in some application domains such as remote sensing, purchasing high spatial resolution images is expensive. To reduce the large computational and monetary cost associated with using high spatial resolution images, we propose a reinforcement learning agent that adaptively selects the spatial resolution of each image that is provided to the detector. In particular, we train the agent in a dual reward setting to choose low spatial resolution images to be run through a coarse-level detector when the image is dominated by large objects, and high spatial resolution images to be run through a fine-level detector when it is dominated by small objects. This reduces the dependency on high spatial resolution images for building a robust detector and increases run-time efficiency. We perform experiments on the xView dataset, consisting of large images, where we increase run-time efficiency by 50% and use high resolution images only 30% of the time while maintaining accuracy similar to a detector that uses only high resolution images.
Tasks Object Detection
Published 2019-12-09
URL https://arxiv.org/abs/1912.03966v1
PDF https://arxiv.org/pdf/1912.03966v1.pdf
PWC https://paperswithcode.com/paper/efficient-object-detection-in-large-images
Repo https://github.com/ermongroup/EfficientObjectDetection
Framework pytorch
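
A schematic sketch of the adaptive choice the abstract describes: a cheap policy network inspects a low-resolution view and decides whether to acquire the high-resolution image and run the fine-level detector. The dual-reward REINFORCE training and both detectors are omitted; everything here is illustrative.

```python
import torch
import torch.nn as nn

class ResolutionPolicy(nn.Module):
    """From a cheap low-res glimpse, output the probability of paying
    for the high-resolution image + fine-level detector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, low_res_image):
        return self.net(low_res_image)     # P(use high resolution)

policy = ResolutionPolicy()
p_high = policy(torch.randn(4, 3, 64, 64))
action = torch.bernoulli(p_high)           # sampled per-image decision
# The reward trades detection accuracy against acquisition cost; policy
# gradient (schematic): loss = -(reward * log_prob_of_action).mean()
```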

Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights

Title Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights
Authors Sobhan Moosavi, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, Rajiv Ramnath
Abstract Reducing traffic accidents is an important public safety challenge; therefore, accident analysis and prediction has been a topic of much research over the past few decades. Reliance on small-scale datasets with limited coverage, dependence on extensive sets of data, and inapplicability to real-time settings are important shortcomings of existing studies. To address these challenges, we propose a new solution for real-time traffic accident prediction using easy-to-obtain, but sparse data. Our solution relies on a deep-neural-network model (which we have named DAP, for Deep Accident Prediction), which utilizes a variety of data attributes such as traffic events, weather data, points-of-interest, and time. DAP incorporates multiple components including a recurrent component (for time-sensitive data), a fully connected component (for time-insensitive data), and a trainable embedding component (to capture spatial heterogeneity). To fill the data gap, we have - through a comprehensive process of data collection, integration, and augmentation - created a large-scale publicly available database of accident information named US-Accidents. Using the US-Accidents dataset and an extensive set of experiments across several large cities, we have evaluated our proposal against several baselines. Our analysis and results show significant improvements in predicting rare accident events. Further, we show the impact of traffic information, time, and points-of-interest data on real-time accident prediction.
Tasks
Published 2019-09-19
URL https://arxiv.org/abs/1909.09638v1
PDF https://arxiv.org/pdf/1909.09638v1.pdf
PWC https://paperswithcode.com/paper/190909638
Repo https://github.com/mhsamavatian/DAP
Framework tf
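
A schematic sketch of the three components the abstract names: a recurrent branch for time-sensitive data, a fully connected branch for time-insensitive data, and a trainable embedding for spatial cells. It is written in PyTorch for consistency with the other sketches on this page (the released code uses TensorFlow), and all sizes and names are hypothetical rather than the paper's DAP.

```python
import torch
import torch.nn as nn

class DAPSketch(nn.Module):
    def __init__(self, temporal_dim=10, static_dim=20, n_cells=5000):
        super().__init__()
        self.rnn = nn.LSTM(temporal_dim, 64, batch_first=True)  # time-sensitive
        self.static = nn.Sequential(nn.Linear(static_dim, 64), nn.ReLU())
        self.cell_emb = nn.Embedding(n_cells, 16)   # spatial heterogeneity
        self.head = nn.Sequential(nn.Linear(64 + 64 + 16, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, temporal, static, cell_id):
        _, (h, _) = self.rnn(temporal)      # temporal: (B, T, temporal_dim)
        fused = torch.cat([h[-1], self.static(static),
                           self.cell_emb(cell_id)], dim=1)
        return self.head(fused)             # accident-risk logit

model = DAPSketch()
logit = model(torch.randn(8, 24, 10),          # e.g. 24 hourly time steps
              torch.randn(8, 20),              # static features (POIs etc.)
              torch.randint(0, 5000, (8,)))    # spatial cell ids
```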

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Title Boosting Entity Linking Performance by Leveraging Unlabeled Documents
Authors Phong Le, Ivan Titov
Abstract Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with the other entities in the document. The resulting approach rivals fully-supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.
Tasks Entity Linking
Published 2019-06-04
URL https://arxiv.org/abs/1906.01250v1
PDF https://arxiv.org/pdf/1906.01250v1.pdf
PWC https://paperswithcode.com/paper/boosting-entity-linking-performance-by
Repo https://github.com/lephong/wnel
Framework pytorch
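
A minimal sketch of the training signal implied by the abstract: each mention's entity is a latent variable, and the weak supervision says only that the correct entity lies somewhere in the high-recall candidate list, so training maximizes the marginal probability of that list. The scoring model combining local context with document-level coherence is assumed to exist upstream; all names are illustrative.

```python
import torch

def weak_supervision_loss(scores, in_candidate_list):
    """scores: (n_mentions, n_entities) model scores over an entity pool.
    in_candidate_list: bool (n_mentions, n_entities), True iff the entity
    appears in the mention's high-recall candidate list.
    The true entity is latent; maximize the marginal probability that it
    falls inside the candidate list."""
    log_probs = torch.log_softmax(scores, dim=1)
    masked = log_probs.masked_fill(~in_candidate_list, float("-inf"))
    return -torch.logsumexp(masked, dim=1).mean()

scores = torch.randn(5, 100, requires_grad=True)   # dummy scores
mask = torch.zeros(5, 100, dtype=torch.bool)
mask[:, :10] = True                                # 10 candidates per mention
loss = weak_supervision_loss(scores, mask)
loss.backward()
```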