October 21, 2019

3101 words 15 mins read

Paper Group AWR 66

Homogeneity-Based Transmissive Process to Model True and False News in Social Networks. Learning pronunciation from a foreign language in speech synthesis networks. Deriving Machine Attention from Human Rationales. Bayesian Joint Spike-and-Slab Graphical Lasso. Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks. Ster …

Homogeneity-Based Transmissive Process to Model True and False News in Social Networks

Title Homogeneity-Based Transmissive Process to Model True and False News in Social Networks
Authors Jooyeon Kim, Dongkwan Kim, Alice Oh
Abstract An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors. Diffusion of news stories from one user to another depends not only on the stories' content and genuineness but also on the alignment of topical interests between the users. In this paper, we propose a novel Bayesian nonparametric model that incorporates the homogeneity of news stories as the key component that regulates the similarity between the posting and sharing users' topical interests. Our model extends the hierarchical Dirichlet process to model the topics of the news stories and incorporates a Bayesian Gaussian process latent variable model to discover the homogeneity values. We train our model on a real-world social network dataset and find homogeneity values of news stories that strongly relate to their labels of genuineness and their contents. Finally, we show that the supervised version of our model predicts the labels of news stories better than state-of-the-art neural network and Bayesian models.
Tasks
Published 2018-11-16
URL http://arxiv.org/abs/1811.09702v1
PDF http://arxiv.org/pdf/1811.09702v1.pdf
PWC https://paperswithcode.com/paper/homogeneity-based-transmissive-process-to
Repo https://github.com/todoaskit/HBTP
Framework none
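The core modeling idea can be illustrated with a toy calculation (this is not the authors' HDP/GP-LVM model from the repo; the logistic form, the bias term, and all numbers below are hypothetical): a story's homogeneity value scales how strongly the topical similarity between the posting and the sharing user drives the diffusion probability.

```python
# Minimal, illustrative sketch (not the paper's full Bayesian nonparametric model):
# a story's homogeneity value h scales how strongly the topical similarity
# between poster and sharer drives the diffusion probability.
import numpy as np

def diffusion_prob(theta_poster, theta_sharer, homogeneity, bias=-2.0):
    """Toy logistic model: P(share) = sigmoid(bias + h * sim(poster, sharer))."""
    sim = np.dot(theta_poster, theta_sharer) / (
        np.linalg.norm(theta_poster) * np.linalg.norm(theta_sharer) + 1e-12)
    return 1.0 / (1.0 + np.exp(-(bias + homogeneity * sim)))

# Two users with topic proportions over 4 topics (hypothetical values).
poster = np.array([0.7, 0.2, 0.05, 0.05])
sharer = np.array([0.6, 0.3, 0.05, 0.05])

print(diffusion_prob(poster, sharer, homogeneity=5.0))  # high-homogeneity story
print(diffusion_prob(poster, sharer, homogeneity=0.5))  # low-homogeneity story
```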

Learning pronunciation from a foreign language in speech synthesis networks

Title Learning pronunciation from a foreign language in speech synthesis networks
Authors Younggun Lee, Suwon Shon, Taesu Kim
Abstract Although there are more than 6,500 languages in the world, the pronunciations of many phonemes sound similar across languages. When people learn a foreign language, their pronunciation often reflects their native language's characteristics. This motivates us to investigate how a speech synthesis network learns pronunciation from datasets in different languages. In this study, we are interested in analyzing and taking advantage of a multilingual speech synthesis network. First, we train the speech synthesis network bilingually on English and Korean and analyze how the network learns the relations of phoneme pronunciation between the languages. Our experimental results show that the learned phoneme embedding vectors are located closer together if their pronunciations are similar across the languages. Consequently, the trained network can synthesize Korean speech in English speakers' voices and vice versa. Using this result, we propose a training framework that utilizes information from a different language. Specifically, we pre-train a speech synthesis network using datasets from both a high-resource language and a low-resource language, and then fine-tune the network using the low-resource language dataset. Finally, we conduct further simulations on 10 different languages to show that the approach extends to other languages.
Tasks Speech Synthesis
Published 2018-11-23
URL http://arxiv.org/abs/1811.09364v3
PDF http://arxiv.org/pdf/1811.09364v3.pdf
PWC https://paperswithcode.com/paper/learning-pronunciation-from-a-foreign
Repo https://github.com/Kyubyong/g2p
Framework tf
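A minimal sketch of the kind of embedding analysis the abstract describes, using made-up vectors rather than embeddings from a trained network: phonemes with similar pronunciations across English and Korean should end up close in the shared embedding space.

```python
# Toy cross-lingual phoneme-embedding analysis (hypothetical vectors, not the
# trained network's actual embeddings).
import numpy as np

emb = {
    "EN_k": np.array([0.9, 0.1, 0.0]),
    "KO_k": np.array([0.85, 0.15, 0.05]),
    "EN_s": np.array([0.1, 0.9, 0.2]),
    "KO_s": np.array([0.12, 0.88, 0.25]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nearest cross-lingual neighbour for each English phoneme symbol.
for en in ("EN_k", "EN_s"):
    best = max((p for p in emb if p.startswith("KO_")),
               key=lambda p: cosine(emb[en], emb[p]))
    print(en, "->", best, round(cosine(emb[en], emb[best]), 3))
```

In the proposed framework, this cross-lingual proximity is what lets pre-training on a pooled high-resource plus low-resource dataset transfer useful pronunciation information before fine-tuning on the low-resource language alone.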

Deriving Machine Attention from Human Rationales

Title Deriving Machine Attention from Human Rationales
Authors Yujia Bao, Shiyu Chang, Mo Yu, Regina Barzilay
Abstract Attention-based models are successful when trained on large amounts of data. In this paper, we demonstrate that even in the low-resource scenario, attention can be learned effectively. To this end, we start with discrete human-annotated rationales and map them into continuous attention. Our central hypothesis is that this mapping is general across domains, and thus can be transferred from resource-rich domains to low-resource ones. Our model jointly learns a domain-invariant representation and induces the desired mapping between rationales and attention. Our empirical results validate this hypothesis and show that our approach delivers significant gains over state-of-the-art baselines, yielding over 15% average error reduction on benchmark datasets.
Tasks
Published 2018-08-28
URL http://arxiv.org/abs/1808.09367v1
PDF http://arxiv.org/pdf/1808.09367v1.pdf
PWC https://paperswithcode.com/paper/deriving-machine-attention-from-human
Repo https://github.com/Sein-Jang/Deriving-Machine-Attention-from-Human-Rationales
Framework pytorch
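A hedged PyTorch sketch of the central mapping only, not the authors' R2A architecture: discrete rationale flags plus (assumed domain-invariant) token representations are scored and normalized into continuous attention.

```python
# Toy stand-in for "rationale -> attention" mapping; layer sizes are illustrative.
import torch
import torch.nn as nn

class RationaleToAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        # Scores each token from its representation and its rationale flag.
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))

    def forward(self, token_reprs, rationale_mask):
        # token_reprs: (batch, seq, hidden); rationale_mask: (batch, seq) in {0, 1}
        x = torch.cat([token_reprs, rationale_mask.unsqueeze(-1).float()], dim=-1)
        scores = self.scorer(x).squeeze(-1)      # (batch, seq)
        return torch.softmax(scores, dim=-1)     # continuous attention weights

# Toy usage with random inputs.
mapper = RationaleToAttention(hidden_dim=16)
attn = mapper(torch.randn(2, 5, 16), torch.randint(0, 2, (2, 5)))
print(attn.shape, attn.sum(dim=-1))  # (2, 5), rows sum to 1
```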

Bayesian Joint Spike-and-Slab Graphical Lasso

Title Bayesian Joint Spike-and-Slab Graphical Lasso
Authors Zehang Richard Li, Tyler H. McCormick, Samuel J. Clark
Abstract In this article, we propose a new class of priors for Bayesian inference with multiple Gaussian graphical models. We introduce fully Bayesian treatments of two popular procedures, the group graphical lasso and the fused graphical lasso, and extend them to a continuous spike-and-slab framework to allow self-adaptive shrinkage and model selection simultaneously. We develop an EM algorithm that performs fast and dynamic explorations of posterior modes. Our approach selects sparse models efficiently with substantially smaller bias than would be induced by alternative regularization procedures. The performance of the proposed methods is demonstrated through simulation and two real-data examples.
Tasks Bayesian Inference, Model Selection
Published 2018-05-18
URL https://arxiv.org/abs/1805.07051v2
PDF https://arxiv.org/pdf/1805.07051v2.pdf
PWC https://paperswithcode.com/paper/bayesian-joint-spike-and-slab-graphical-lasso
Repo https://github.com/richardli/SSJGL
Framework none
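The continuous spike-and-slab machinery can be sketched with the standard E-step quantity (generic spike-and-slab lasso form; the penalty values and prior weight below are illustrative, not taken from the SSJGL code): the posterior probability that an entry belongs to the weakly penalized slab determines an adaptive penalty for the next M-step.

```python
# Generic continuous spike-and-slab (lasso) E-step sketch.
import numpy as np

def laplace_pdf(x, lam):
    return 0.5 * lam * np.exp(-lam * np.abs(x))

def slab_probability(theta, lam_slab=1.0, lam_spike=50.0, pi_slab=0.5):
    """P(entry comes from the slab | current estimate theta)."""
    slab = pi_slab * laplace_pdf(theta, lam_slab)
    spike = (1.0 - pi_slab) * laplace_pdf(theta, lam_spike)
    return slab / (slab + spike)

# Larger entries get higher slab probability, hence weaker effective shrinkage.
for theta in (0.01, 0.1, 0.5):
    p = slab_probability(theta)
    effective_penalty = p * 1.0 + (1 - p) * 50.0  # adaptive penalty for the M-step
    print(theta, round(p, 3), round(effective_penalty, 2))
```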

Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks

Title Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks
Authors Yingyezhe Jin, Wenrui Zhang, Peng Li
Abstract Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low-power event-driven neuromorphic hardware. However, SNNs have yet to reach the same performance as conventional deep artificial neural networks (ANNs), a long-standing challenge due to the complex dynamics and non-differentiable spike events encountered in training. Existing SNN error backpropagation (BP) methods are limited by poor scalability, improper handling of spiking discontinuities, and/or a mismatch between the rate-coded loss function and the computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level and are computed and back-propagated across both macroscopic and microscopic levels. Unlike existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t. tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs on the static MNIST [14] and dynamic neuromorphic N-MNIST [26] datasets. HM2-BP achieves accuracies of 99.49% and 98.88% on MNIST and N-MNIST, respectively, outperforming the best reported results obtained with existing SNN BP algorithms. Furthermore, HM2-BP produces the highest SNN-based accuracies on the EMNIST [3] dataset and leads to high recognition accuracy on the 16-speaker spoken English letters of the TI46 Corpus [16], a challenging spatio-temporal speech recognition benchmark for which no prior success based on SNNs had been reported. It also achieves competitive performance, surpassing conventional deep learning models, when dealing with asynchronous spiking streams.
Tasks Speech Recognition
Published 2018-05-21
URL http://arxiv.org/abs/1805.07866v6
PDF http://arxiv.org/pdf/1805.07866v6.pdf
PWC https://paperswithcode.com/paper/hybrid-macromicro-level-backpropagation-for
Repo https://github.com/jinyyy666/mm-bp-snn
Framework none
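A heavily reduced sketch of the macro level only, with toy spike trains: outputs are summarized as firing rates and the loss is defined on those rates. The micro-level S-PSP term that HM2-BP uses to capture exact spike timing is omitted here.

```python
# Rate-coded (macro-level) loss on toy spike trains; the S-PSP micro level is omitted.
import numpy as np

T = 100                                                       # simulation time steps
spike_trains = (np.random.rand(10, T) < 0.05).astype(float)   # 10 output neurons

rates = spike_trains.sum(axis=1) / T     # rate coding: spike count / duration
target = np.zeros(10)
target[3] = 0.2                          # desired high rate on the label neuron

rate_loss = 0.5 * np.sum((rates - target) ** 2)   # rate-coded squared error
print("firing rates:", np.round(rates, 2))
print("macro-level loss:", round(rate_loss, 4))
```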

Stereo Magnification: Learning View Synthesis using Multiplane Images

Title Stereo Magnification: Learning View Synthesis using Multiplane Images
Authors Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, Noah Snavely
Abstract The view synthesis problem (generating novel views of a scene from known imagery) has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
Tasks Novel View Synthesis
Published 2018-05-24
URL http://arxiv.org/abs/1805.09817v1
PDF http://arxiv.org/pdf/1805.09817v1.pdf
PWC https://paperswithcode.com/paper/stereo-magnification-learning-view-synthesis
Repo https://github.com/google/stereo-magnification
Framework tf
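The MPI rendering step can be sketched directly (only the compositing; the per-plane homography warps into the target view are omitted, and the plane count and resolution are arbitrary): planes are blended back to front with the standard over operator.

```python
# Back-to-front over-compositing of an MPI's RGBA planes (toy data).
import numpy as np

def composite_mpi(rgba_planes):
    """rgba_planes: (D, H, W, 4), ordered back (far) to front (near)."""
    out = np.zeros(rgba_planes.shape[1:3] + (3,))
    for plane in rgba_planes:                      # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)    # standard "over" operator
    return out

planes = np.random.rand(32, 8, 8, 4)               # toy 32-plane MPI
print(composite_mpi(planes).shape)                 # (8, 8, 3)
```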

Training Tips for the Transformer Model

Title Training Tips for the Transformer Model
Authors Martin Popel, Ondřej Bojar
Abstract This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra “more data and larger models”, we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.
Tasks Machine Translation
Published 2018-04-01
URL http://arxiv.org/abs/1804.00247v2
PDF http://arxiv.org/pdf/1804.00247v2.pdf
PWC https://paperswithcode.com/paper/training-tips-for-the-transformer-model
Repo https://github.com/eyaler/transformer_tpu
Framework none
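Two of the knobs the paper studies lend themselves to short sketches: the inverse-square-root warmup schedule used by the Transformer (the d_model and warmup values below are illustrative) and plain checkpoint averaging.

```python
# Warmup learning-rate schedule and element-wise checkpoint averaging (toy values).
import numpy as np

def noam_lr(step, d_model=512, warmup_steps=16000):
    """lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

print([round(noam_lr(s), 6) for s in (100, 16000, 100000)])

def average_checkpoints(checkpoints):
    """checkpoints: list of {param_name: np.ndarray}; returns the element-wise mean."""
    return {name: np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
            for name in checkpoints[0]}

ckpts = [{"w": np.full(3, float(i))} for i in range(8)]   # toy checkpoints
print(average_checkpoints(ckpts)["w"])                    # [3.5 3.5 3.5]
```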

DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency

Title DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency
Authors Yuliang Zou, Zelun Luo, Jia-Bin Huang
Abstract We present an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences. Existing unsupervised methods often exploit brightness constancy and spatial smoothness priors to train depth or flow models. In this paper, we propose to leverage geometric consistency as additional supervisory signals. Our core idea is that for rigid regions we can use the predicted scene depth and camera motion to synthesize 2D optical flow by backprojecting the induced 3D scene flow. The discrepancy between the rigid flow (from depth prediction and camera motion) and the estimated flow (from optical flow model) allows us to impose a cross-task consistency loss. While all the networks are jointly optimized during training, they can be applied independently at test time. Extensive experiments demonstrate that our depth and flow models compare favorably with state-of-the-art unsupervised methods.
Tasks Depth And Camera Motion, Depth Estimation, Optical Flow Estimation
Published 2018-09-05
URL http://arxiv.org/abs/1809.01649v1
PDF http://arxiv.org/pdf/1809.01649v1.pdf
PWC https://paperswithcode.com/paper/df-net-unsupervised-joint-learning-of-depth
Repo https://github.com/vt-vl-lab/DF-Net
Framework tf
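The cross-task consistency idea can be sketched with a toy pinhole camera: predicted depth and camera motion are used to synthesize a rigid flow field, whose discrepancy with the estimated optical flow gives the consistency loss. The intrinsics, depth, and pose below are made-up values, not DF-Net outputs.

```python
# Synthesize rigid flow from depth + relative pose, then compare to estimated flow.
import numpy as np

H, W = 4, 5
K = np.array([[100.0, 0, W / 2], [0, 100.0, H / 2], [0, 0, 1]])  # toy intrinsics
depth = np.full((H, W), 5.0)                 # predicted depth (toy constant)
R = np.eye(3)                                # predicted relative rotation
t = np.array([0.1, 0.0, 0.0])                # predicted relative translation

ys, xs = np.mgrid[0:H, 0:W]
pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # (3, H*W)

cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)    # backproject to 3D
cam2 = R @ cam + t[:, None]                            # move into the second camera
proj = K @ cam2
proj = proj[:2] / proj[2]                              # project back to pixels

rigid_flow = (proj - pix[:2]).T.reshape(H, W, 2)       # synthesized 2D flow

estimated_flow = np.zeros((H, W, 2))                   # stand-in flow prediction
consistency_loss = np.abs(rigid_flow - estimated_flow).mean()
print(rigid_flow[0, 0], round(consistency_loss, 4))
```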

Painting Outside the Box: Image Outpainting with GANs

Title Painting Outside the Box: Image Outpainting with GANs
Authors Mark Sabini, Gili Rusak
Abstract The challenging task of image outpainting (extrapolation) has received comparatively little attention in relation to its cousin, image inpainting (completion). Accordingly, we present a deep learning approach based on Iizuka et al. for adversarially training a network to hallucinate past image boundaries. We use a three-phase training schedule to stably train a DCGAN architecture on a subset of the Places365 dataset. In line with Iizuka et al., we also use local discriminators to enhance the quality of our output. Once trained, our model is able to outpaint $128 \times 128$ color images relatively realistically, thus allowing for recursive outpainting. Our results show that deep learning approaches to image outpainting are both feasible and promising.
Tasks Image Inpainting, Image Outpainting
Published 2018-08-25
URL http://arxiv.org/abs/1808.08483v1
PDF http://arxiv.org/pdf/1808.08483v1.pdf
PWC https://paperswithcode.com/paper/painting-outside-the-box-image-outpainting
Repo https://github.com/ShinyCode/image-outpainting
Framework tf
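A sketch of the three-phase schedule described above; the phase lengths are hypothetical, and the networks and losses are only named rather than implemented.

```python
# Which networks and losses are active at each training step (toy phase lengths).
def training_phase(step, p1=40000, p2=10000):
    if step < p1:                       # phase 1: generator on reconstruction only
        return {"train_G": True, "train_D": False, "losses": ["reconstruction"]}
    if step < p1 + p2:                  # phase 2: discriminator alone
        return {"train_G": False, "train_D": True, "losses": ["adversarial_D"]}
    return {"train_G": True, "train_D": True,     # phase 3: joint adversarial training
            "losses": ["reconstruction", "adversarial_G", "adversarial_D"]}

for s in (0, 45000, 60000):
    print(s, training_phase(s))
```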

Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions

Title Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions
Authors Boris Muzellec, Marco Cuturi
Abstract Embedding complex objects as vectors in low dimensional spaces is a longstanding problem in machine learning. We propose in this work an extension of that approach, which consists in embedding objects as elliptical probability distributions, namely distributions whose densities have elliptical level sets. We endow these measures with the 2-Wasserstein metric, with two important benefits: (i) For such measures, the squared 2-Wasserstein metric has a closed form, equal to a weighted sum of the squared Euclidean distance between means and the squared Bures metric between covariance matrices. The latter is a Riemannian metric between positive semi-definite matrices, which turns out to be Euclidean on a suitable factor representation of such matrices, which is valid on the entire geodesic between these matrices. (ii) The 2-Wasserstein distance boils down to the usual Euclidean metric when comparing Diracs, and therefore provides a natural framework to extend point embeddings. We show that for these reasons Wasserstein elliptical embeddings are more intuitive and yield tools that are better behaved numerically than the alternative choice of Gaussian embeddings with the Kullback-Leibler divergence. In particular, and unlike previous work based on the KL geometry, we learn elliptical distributions that are not necessarily diagonal. We demonstrate the advantages of elliptical embeddings by using them for visualization, to compute embeddings of words, and to reflect entailment or hypernymy.
Tasks
Published 2018-05-19
URL http://arxiv.org/abs/1805.07594v5
PDF http://arxiv.org/pdf/1805.07594v5.pdf
PWC https://paperswithcode.com/paper/generalizing-point-embeddings-using-the
Repo https://github.com/BorisMuzellec/EllipticalEmbeddings
Framework none
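The closed form in benefit (i) is easy to sketch: the squared 2-Wasserstein distance between two elliptical embeddings is the squared Euclidean distance between their means plus the squared Bures metric between their covariance matrices.

```python
# Closed-form squared 2-Wasserstein distance between two elliptical embeddings.
import numpy as np
from scipy.linalg import sqrtm

def bures_sq(A, B):
    root_A = sqrtm(A)
    cross = sqrtm(root_A @ B @ root_A)
    return np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross))

def w2_sq(mean_a, cov_a, mean_b, cov_b):
    return float(np.sum((mean_a - mean_b) ** 2) + bures_sq(cov_a, cov_b))

m1, C1 = np.zeros(2), np.eye(2)
m2, C2 = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(round(w2_sq(m1, C1, m2, C2), 4))   # squared mean distance + squared Bures distance
```

When both covariances shrink to zero the Bures term vanishes and the distance reduces to the ordinary Euclidean distance between point embeddings, which is the sense in which this construction extends point embeddings.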

Investigating the Working of Text Classifiers

Title Investigating the Working of Text Classifiers
Authors Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov
Abstract Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively utilize the constituent expressions. Almost all of the reported work trains large networks using discriminative approaches, which come with the caveat of no proper capacity control, as they tend to latch on to any signal that may not generalize. Using various recent state-of-the-art approaches for text classification, we explore whether these models actually learn to compose the meaning of the sentences or still just focus on some keywords or lexicons for classifying the document. To test our hypothesis, we carefully construct datasets where the training and test splits have no direct overlap of such lexicons, but the overall language structure is similar. We study various text classifiers and observe a large performance drop on these datasets. Finally, we show that even simple models with our proposed regularization techniques, which disincentivize focusing on key lexicons, can substantially improve classification accuracy.
Tasks Text Classification
Published 2018-01-19
URL http://arxiv.org/abs/1801.06261v2
PDF http://arxiv.org/pdf/1801.06261v2.pdf
PWC https://paperswithcode.com/paper/investigating-the-working-of-text-classifiers
Repo https://github.com/DevSinghSachan/investigating-text-classifiers
Framework none
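A toy sketch of the dataset construction: class-indicative lexicons for the training and test splits are kept disjoint (the word lists and sentences below are invented), so a classifier that merely latches onto train-split keywords cannot transfer.

```python
# Lexicon-disjoint train/test split construction on toy data.
train_lexicon = {"excellent", "superb"}     # hypothetical positive cues for train
test_lexicon = {"wonderful", "fantastic"}   # disjoint positive cues for test

docs = [
    ("this film was excellent overall", 1),
    ("a superb and moving story", 1),
    ("a wonderful and moving story", 1),
    ("truly fantastic acting throughout", 1),
]

train, test = [], []
for text, label in docs:
    words = set(text.split())
    if words & train_lexicon and not words & test_lexicon:
        train.append((text, label))
    elif words & test_lexicon and not words & train_lexicon:
        test.append((text, label))
    # documents containing cues from both lexicons would be dropped

print(len(train), len(test))   # 2 2
```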

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Title Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Authors Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
Abstract Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: cross-modal grounding, ill-posed feedback, and generalization. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). In particular, a matching critic is used to provide an intrinsic reward that encourages global matching between instructions and trajectories, and a reasoning navigator is employed to perform cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms previous methods by 10% on SPL and achieves new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method that explores unseen environments by imitating the agent's own past good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which substantially reduces the success-rate gap between seen and unseen environments (from 30.7% to 11.7%).
Tasks Imitation Learning, Vision-Language Navigation
Published 2018-11-25
URL http://arxiv.org/abs/1811.10092v2
PDF http://arxiv.org/pdf/1811.10092v2.pdf
PWC https://paperswithcode.com/paper/reinforced-cross-modal-matching-and-self
Repo https://github.com/extreme-assistant/cvpr2019
Framework none
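A sketch of the SIL bookkeeping only (toy rewards, no environment or navigator): for each instruction the agent keeps the self-generated trajectory that the matching critic scores highest and later imitates it.

```python
# Keep the best self-generated trajectory per instruction as an imitation target.
best_trajectories = {}   # instruction_id -> (critic_reward, action_sequence)

def store_if_better(instruction_id, actions, critic_reward):
    prev = best_trajectories.get(instruction_id)
    if prev is None or critic_reward > prev[0]:
        best_trajectories[instruction_id] = (critic_reward, actions)

# The agent explores the same instruction several times with its own policy.
store_if_better("instr_7", ["fwd", "left", "fwd"], critic_reward=0.42)
store_if_better("instr_7", ["fwd", "fwd", "right"], critic_reward=0.77)
store_if_better("instr_7", ["left", "left"], critic_reward=0.30)

# Imitation target for this instruction: the stored best trajectory.
print(best_trajectories["instr_7"])   # (0.77, ['fwd', 'fwd', 'right'])
```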

XNet: A convolutional neural network (CNN) implementation for medical X-Ray image segmentation suitable for small datasets

Title XNet: A convolutional neural network (CNN) implementation for medical X-Ray image segmentation suitable for small datasets
Authors Joseph Bullock, Carolina Cuesta-Lazaro, Arnau Quera-Bofarull
Abstract X-Ray image enhancement, along with many other medical image processing applications, requires the segmentation of images into bone, soft tissue, and open beam regions. We apply a machine learning approach to this problem, presenting an end-to-end solution which results in robust and efficient inference. Since medical institutions frequently do not have the resources to process and label the large quantity of X-Ray images usually needed for neural network training, we design an end-to-end solution for small datasets, while achieving state-of-the-art results. Our implementation produces an overall accuracy of 92%, F1 score of 0.92, and an AUC of 0.98, surpassing classical image processing techniques, such as clustering and entropy based methods, while improving upon the output of existing neural networks used for segmentation in non-medical contexts. The code used for this project is available online.
Tasks Image Enhancement, Medical X-Ray Image Segmentation, Semantic Segmentation
Published 2018-12-03
URL http://arxiv.org/abs/1812.00548v2
PDF http://arxiv.org/pdf/1812.00548v2.pdf
PWC https://paperswithcode.com/paper/xnet-a-convolutional-neural-network-cnn
Repo https://github.com/JosephPB/XNet
Framework none

Video Action Transformer Network

Title Video Action Transformer Network
Authors Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman
Abstract We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others. Additionally, its attention mechanism learns to emphasize hands and faces, which are often crucial to discriminating an action, all without explicit supervision other than boxes and class labels. We train and test our Action Transformer network on the Atomic Visual Actions (AVA) dataset, outperforming the state-of-the-art by a significant margin using only raw RGB frames as input.
Tasks Action Recognition In Videos, Recognizing And Localizing Human Actions
Published 2018-12-06
URL https://arxiv.org/abs/1812.02707v2
PDF https://arxiv.org/pdf/1812.02707v2.pdf
PWC https://paperswithcode.com/paper/video-action-transformer-network
Repo https://github.com/alainray/action_transformer
Framework pytorch
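The core attention pattern can be sketched in a few lines of PyTorch (dimensions and projections are illustrative, not the paper's exact architecture): a query derived from a person's RoI feature attends over spatiotemporal context features to update the person representation.

```python
# Person-query attention over a flattened spatiotemporal feature map (toy sizes).
import math
import torch
import torch.nn as nn

d = 128
q_proj, k_proj, v_proj = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

person_feat = torch.randn(1, d)            # RoI-pooled feature of one person box
context = torch.randn(1, 14 * 14 * 8, d)   # flattened spatio-temporal features

q = q_proj(person_feat).unsqueeze(1)                                  # (1, 1, d)
k, v = k_proj(context), v_proj(context)                               # (1, N, d)
attn = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(d), dim=-1)    # (1, 1, N)
updated_person = attn @ v                                             # (1, 1, d)
print(updated_person.shape)
```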

DroNet: Efficient convolutional neural network detector for real-time UAV applications

Title DroNet: Efficient convolutional neural network detector for real-time UAV applications
Authors Christos Kyrkou, George Plastiras, Stylianos Venieris, Theocharis Theocharides, Christos-Savvas Bouganis
Abstract Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. Many such applications require the use of computer vision algorithms in order to analyse the information captured from an on-board camera. Such applications include detecting vehicles for emergency response and traffic monitoring. This paper therefore explores the trade-offs involved in developing a single-shot object detector, based on deep convolutional neural networks (CNNs), that enables UAVs to perform vehicle detection under the resource constraints of an on-board platform. The paper presents a holistic approach to designing such systems: the data collection and training stages, the CNN architecture, and the optimizations necessary to efficiently map the CNN onto a lightweight embedded processing platform suitable for deployment on UAVs. Through this analysis we propose a CNN architecture that is capable of detecting vehicles in aerial UAV images and can operate at 5-18 frames per second on a variety of platforms with an overall accuracy of ~95%. Overall, the proposed architecture is suitable for UAV applications, utilizing low-power embedded processors that can be deployed on commercial UAVs.
Tasks Object Detection In Aerial Images, One-Shot Object Detection, Real-Time Object Detection
Published 2018-07-18
URL http://arxiv.org/abs/1807.06789v1
PDF http://arxiv.org/pdf/1807.06789v1.pdf
PWC https://paperswithcode.com/paper/dronet-efficient-convolutional-neural-network
Repo https://github.com/gplast/DroNet_PyTorch
Framework pytorch