October 18, 2019

2887 words 14 mins read

Paper Group ANR 457

Multimodal neural pronunciation modeling for spoken languages with logographic origin

Title Multimodal neural pronunciation modeling for spoken languages with logographic origin
Authors Minh Nguyen, Gia H. Ngo, Nancy F. Chen
Abstract Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, and (2) the closest graphemic origins are logographic Han characters, of which only a subset implicitly encodes pronunciation. In this work, we propose a multimodal approach to predicting the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and the pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% relative to unimodal and multimodal baselines, respectively.
Tasks
Published 2018-09-12
URL http://arxiv.org/abs/1809.04203v1
PDF http://arxiv.org/pdf/1809.04203v1.pdf
PWC https://paperswithcode.com/paper/multimodal-neural-pronunciation-modeling-for
Repo
Framework
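
As a rough illustration of the multimodal idea, here is a minimal PyTorch sketch (not the authors' code) that fuses a geometric logograph representation with an embedding of a cognate's pronunciation before classifying the pronunciation; all layer names and sizes are assumptions.

```python
# Hypothetical sketch (not the authors' code): fuse a logograph representation
# with cognate-pronunciation features to predict a Cantonese pronunciation class.
import torch
import torch.nn as nn

class MultimodalPronunciationNet(nn.Module):
    def __init__(self, logograph_dim, cognate_vocab, cognate_dim, hidden, n_phonemes):
        super().__init__()
        # Branch 1: dense geometric/visual features of the logograph.
        self.logograph_branch = nn.Sequential(
            nn.Linear(logograph_dim, hidden), nn.ReLU())
        # Branch 2: embedding of the cognate's pronunciation in a related language.
        self.cognate_embed = nn.Embedding(cognate_vocab, cognate_dim)
        self.cognate_branch = nn.Sequential(
            nn.Linear(cognate_dim, hidden), nn.ReLU())
        # Fusion by concatenation, then classification over pronunciation labels.
        self.classifier = nn.Linear(2 * hidden, n_phonemes)

    def forward(self, logograph_feats, cognate_ids):
        a = self.logograph_branch(logograph_feats)
        b = self.cognate_branch(self.cognate_embed(cognate_ids))
        return self.classifier(torch.cat([a, b], dim=-1))

# Toy usage with random inputs.
model = MultimodalPronunciationNet(64, 500, 32, 128, 90)
logits = model(torch.randn(8, 64), torch.randint(0, 500, (8,)))
print(logits.shape)  # torch.Size([8, 90])
```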

Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs

Title Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs
Authors Weidong Yuan, Alan W Black
Abstract This paper models the fundamental frequency (F0) contours of both Mandarin and Cantonese speech with decision trees and DNNs (deep neural networks). Different kinds of F0 representations and model architectures are tested for decision trees and DNNs. A new model called Additive-BLSTM (additive bidirectional long short-term memory), which predicts a base F0 contour and a residual F0 contour with two BLSTMs, is proposed. With respect to the objective measures of RMSE and correlation, applying tone-dependent trees together with sample normalization and delta feature regularization performs best within the decision tree framework, while the new Additive-BLSTM model with delta feature regularization performs even better. Subjective listening tests on both Mandarin and Cantonese comparing a Random Forest model (multiple decision trees) and the Additive-BLSTM model were also conducted and confirmed the advantage of the new model according to the listeners’ preference.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01682v1
PDF http://arxiv.org/pdf/1807.01682v1.pdf
PWC https://paperswithcode.com/paper/generating-mandarin-and-cantonese-f0-contours
Repo
Framework
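
A minimal PyTorch sketch of the additive idea, assuming two BLSTMs over frame-level linguistic features whose outputs are summed into the final F0 contour; sizes and feature dimensions are illustrative, not the paper's.

```python
# Hypothetical sketch (not the paper's implementation): one BLSTM predicts a base
# F0 contour, a second BLSTM predicts a residual, and the final contour is their sum.
import torch
import torch.nn as nn

class AdditiveBLSTM(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.base_lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.residual_lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.base_out = nn.Linear(2 * hidden, 1)
        self.residual_out = nn.Linear(2 * hidden, 1)

    def forward(self, linguistic_feats):
        base, _ = self.base_lstm(linguistic_feats)
        resid, _ = self.residual_lstm(linguistic_feats)
        base_f0 = self.base_out(base).squeeze(-1)
        resid_f0 = self.residual_out(resid).squeeze(-1)
        return base_f0 + resid_f0, base_f0, resid_f0

model = AdditiveBLSTM(in_dim=40)
f0, base, resid = model(torch.randn(2, 100, 40))  # (batch, frames, features)
print(f0.shape)  # torch.Size([2, 100])
```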

Generating lyrics with variational autoencoder and multi-modal artist embeddings

Title Generating lyrics with variational autoencoder and multi-modal artist embeddings
Authors Olga Vechtomova, Hareesh Bahuleyan, Amirpasha Ghabussi, Vineet John
Abstract We present a system for generating song lyric lines conditioned on the style of a specified artist. The system uses a variational autoencoder with artist embeddings. We propose pre-training the artist embeddings with the representations learned by a CNN classifier, which is trained to predict artists based on MEL spectrograms of their song clips. This work is the first step towards combining the audio and text modalities of songs for generating lyrics conditioned on the artist’s style. Our preliminary results suggest that there is a benefit in initializing artists’ embeddings with the representations learned by a spectrogram classifier.
Tasks
Published 2018-12-20
URL http://arxiv.org/abs/1812.08318v1
PDF http://arxiv.org/pdf/1812.08318v1.pdf
PWC https://paperswithcode.com/paper/generating-lyrics-with-variational
Repo
Framework
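
A hedged sketch of the conditioning step only: an artist embedding table initialized from (stand-in) CNN spectrogram representations is concatenated with the VAE latent at every decoder step. All tensors and sizes are assumptions for illustration, not the authors' model.

```python
# Hypothetical sketch: condition a text VAE's decoder on an artist embedding that
# is initialized from representations learned by a spectrogram CNN classifier.
# The `cnn_artist_reps` tensor and all sizes are placeholders.
import torch
import torch.nn as nn

n_artists, artist_dim, latent_dim, vocab, emb, hidden = 50, 128, 64, 10000, 256, 512

# Pretend these are per-artist representations from a pre-trained MEL-spectrogram CNN.
cnn_artist_reps = torch.randn(n_artists, artist_dim)

artist_embedding = nn.Embedding(n_artists, artist_dim)
artist_embedding.weight.data.copy_(cnn_artist_reps)  # pre-trained initialization

word_embed = nn.Embedding(vocab, emb)
decoder_rnn = nn.GRU(emb + latent_dim + artist_dim, hidden, batch_first=True)
out_proj = nn.Linear(hidden, vocab)

def decode_step(tokens, z, artist_ids):
    """One teacher-forced decoding pass conditioned on latent z and artist style."""
    cond = torch.cat([z, artist_embedding(artist_ids)], dim=-1)   # (B, latent+artist)
    cond = cond.unsqueeze(1).expand(-1, tokens.size(1), -1)       # broadcast over time
    inp = torch.cat([word_embed(tokens), cond], dim=-1)
    h, _ = decoder_rnn(inp)
    return out_proj(h)                                            # (B, T, vocab)

logits = decode_step(torch.randint(0, vocab, (4, 12)),
                     torch.randn(4, latent_dim),
                     torch.randint(0, n_artists, (4,)))
print(logits.shape)  # torch.Size([4, 12, 10000])
```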

Deep Reinforcement Learning for Green Security Games with Real-Time Information

Title Deep Reinforcement Learning for Green Security Games with Real-Time Information
Authors Yufei Wang, Zheyuan Ryan Shi, Lantao Yu, Yi Wu, Rohit Singh, Lucas Joppa, Fei Fang
Abstract Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints and agents’ subsequent actions upon receiving the information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model GSG-I which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploring the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games.
Tasks Q-Learning
Published 2018-11-06
URL http://arxiv.org/abs/1811.02483v1
PDF http://arxiv.org/pdf/1811.02483v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-green
Repo
Framework
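
A skeleton of a generic double-oracle loop in the spirit of DeDOL; `train_dqn_best_response` and `solve_restricted_game` are hypothetical placeholders standing in for the paper's deep Q-network training and restricted-game solver.

```python
# Hypothetical skeleton of a double-oracle loop (not the authors' code).
def double_oracle(initial_defender_policies, initial_attacker_policies,
                  train_dqn_best_response, solve_restricted_game, iterations=10):
    defender_pool = list(initial_defender_policies)
    attacker_pool = list(initial_attacker_policies)
    for _ in range(iterations):
        # 1. Solve the restricted game over the current policy pools
        #    (e.g., an equilibrium of the empirical payoff matrix).
        defender_mix, attacker_mix = solve_restricted_game(defender_pool, attacker_pool)
        # 2. Train best responses against the opponent's current mixture
        #    (in the paper, by training deep Q-networks).
        new_defender = train_dqn_best_response(opponent_mixture=attacker_mix, role="defender")
        new_attacker = train_dqn_best_response(opponent_mixture=defender_mix, role="attacker")
        # 3. Add the new best responses to the restricted game and iterate.
        defender_pool.append(new_defender)
        attacker_pool.append(new_attacker)
    return defender_mix, attacker_mix
```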

Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds

Title Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds
Authors Stefan C. Schonsheck, Bin Dong, Rongjie Lai
Abstract Convolution has played a prominent role in various applications in science and engineering for many years, and it is the most important operation in convolutional neural networks. There has been a recent growth of interest in generalizing convolutions to curved domains such as manifolds and graphs. However, existing approaches cannot preserve all the desirable properties of Euclidean convolutions, namely compactly supported filters, directionality, and transferability across different manifolds. In this paper we develop a new generalization of the convolution operation, referred to as parallel transport convolution (PTC), on Riemannian manifolds and their discrete counterparts. PTC is designed based on parallel transport, which is able to translate information along a manifold and to intrinsically preserve directionality. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. This enables us to perform wavelet-like operations and to define deep convolutional neural networks on curved domains.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.07857v2
PDF http://arxiv.org/pdf/1805.07857v2.pdf
PWC https://paperswithcode.com/paper/parallel-transport-convolution-a-new-tool-for
Repo
Framework

Text classification based on ensemble extreme learning machine

Title Text classification based on ensemble extreme learning machine
Authors Ming Li, Peilun Xiao, Ju Zhang
Abstract In this paper, we propose a novel approach based on a cost-sensitive ensemble weighted extreme learning machine, which we call AE1-WELM, and apply it to text classification. AE1-WELM is an algorithm for both balanced and imbalanced multiclass text classification. Weighted ELM, which assigns different weights to different samples, improves classification accuracy to a certain extent, but it considers only the differences between samples in different categories and ignores the differences between samples within the same category. We measure the importance of documents by their sample information entropy, generate a cost-sensitive matrix and factor based on document importance, and then embed the cost-sensitive weighted ELM into the AdaBoost.M1 framework seamlessly. Vector space model (VSM) text representation produces high-dimensional, sparse features that increase the burden on ELM. To overcome this problem, we develop a text classification framework combining word vectors and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification.
Tasks Text Classification
Published 2018-05-10
URL http://arxiv.org/abs/1805.06525v1
PDF http://arxiv.org/pdf/1805.06525v1.pdf
PWC https://paperswithcode.com/paper/text-classification-based-on-ensemble-extreme
Repo
Framework
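
A minimal NumPy sketch of one weighted ELM base learner (the building block, not AE1-WELM itself), using the standard weighted regularized least-squares solution; the entropy-based cost-sensitive weights are replaced by a uniform placeholder.

```python
# Hypothetical sketch of a weighted ELM base learner (not AE1-WELM): random hidden
# layer, then a weighted regularized least-squares solve where the per-sample
# weights could come from, e.g., document information entropy.
import numpy as np

def weighted_elm_fit(X, Y, sample_weights, n_hidden=200, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                      # random feature map
    S = np.diag(sample_weights)                 # cost-sensitive sample weights
    # beta = (H^T S H + I/C)^{-1} H^T S Y
    beta = np.linalg.solve(H.T @ S @ H + np.eye(n_hidden) / C, H.T @ S @ Y)
    return W, b, beta

def weighted_elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: 100 documents, 50 TF-IDF-like features, 3 classes (one-hot targets).
X = np.random.rand(100, 50)
Y = np.eye(3)[np.random.randint(0, 3, 100)]
w = np.ones(100)                                # uniform weights for illustration
W, b, beta = weighted_elm_fit(X, Y, w)
pred = weighted_elm_predict(X, W, b, beta).argmax(axis=1)
print(pred.shape)  # (100,)
```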

Uncertainty Gated Network for Land Cover Segmentation

Title Uncertainty Gated Network for Land Cover Segmentation
Authors Guillem Pascual, Santi Seguí, Jordi Vitrià
Abstract The production of thematic maps depicting land cover is one of the most common applications of remote sensing. To this end, several semantic segmentation approaches, based on deep learning, have been proposed in the literature, but land cover segmentation is still considered an open problem due to some specific problems related to remote sensing imaging. In this paper we propose a novel approach to deal with the problem of modelling multiscale contexts surrounding pixels of different land cover categories. The approach leverages the computation of a heteroscedastic measure of uncertainty when classifying individual pixels in an image. This classification uncertainty measure is used to define a set of memory gates between layers that allow a principled method to select the optimal decision for each pixel.
Tasks Semantic Segmentation
Published 2018-05-29
URL http://arxiv.org/abs/1805.11348v1
PDF http://arxiv.org/pdf/1805.11348v1.pdf
PWC https://paperswithcode.com/paper/uncertainty-gated-network-for-land-cover
Repo
Framework
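
A hedged PyTorch sketch of the gating idea, assuming an early head that outputs per-pixel class logits plus a heteroscedastic log-variance, and a gate that keeps confident early predictions while deferring uncertain pixels to a deeper head; this is an illustration, not the authors' architecture.

```python
# Hypothetical sketch of an uncertainty-based gate between two segmentation heads.
import torch
import torch.nn as nn

n_classes = 5
early_head = nn.Conv2d(32, n_classes + 1, kernel_size=1)  # logits + log-variance
deep_head = nn.Conv2d(32, n_classes, kernel_size=1)

def gated_prediction(feats, threshold=0.5):
    early = early_head(feats)
    logits_early, log_var = early[:, :n_classes], early[:, n_classes]
    uncertainty = torch.exp(log_var)                       # heteroscedastic variance
    gate = (uncertainty < threshold).unsqueeze(1).float()  # 1 where the early head is trusted
    logits_deep = deep_head(feats)
    return gate * logits_early + (1 - gate) * logits_deep

out = gated_prediction(torch.randn(2, 32, 64, 64))
print(out.shape)  # torch.Size([2, 5, 64, 64])
```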

Random Hinge Forest for Differentiable Learning

Title Random Hinge Forest for Differentiable Learning
Authors Nathan Lay, Adam P. Harrison, Sharon Schreiber, Gitesh Dawer, Adrian Barbu
Abstract We propose random hinge forests, a simple, efficient, and novel variant of decision forests. Importantly, random hinge forests can be readily incorporated as a general component within arbitrary computation graphs that are optimized end-to-end with stochastic gradient descent or variants thereof. We derive random hinge forest and ferns, focusing on their sparse and efficient nature, their min-max margin property, strategies to initialize them for arbitrary network architectures, and the class of optimizers most suitable for optimizing random hinge forest. The performance and versatility of random hinge forests are demonstrated by experiments incorporating a variety of small and large UCI machine learning data sets and also ones involving the MNIST, Letter, and USPS image datasets. We compare random hinge forests with random forests and the more recent backpropagating deep neural decision forests.
Tasks
Published 2018-02-12
URL http://arxiv.org/abs/1802.03882v2
PDF http://arxiv.org/pdf/1802.03882v2.pdf
PWC https://paperswithcode.com/paper/random-hinge-forest-for-differentiable
Repo
Framework
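
For intuition only, here is a standard soft (sigmoid-routed) decision stump embedded in a computation graph; this is a deliberate simplification and not the paper's hinge formulation, but it shows how tree-like components can be trained end-to-end with SGD alongside other layers.

```python
# Illustration of a differentiable, tree-like split inside a computation graph
# (a generic soft decision stump, NOT the paper's hinge forest formulation).
import torch
import torch.nn as nn

class SoftStump(nn.Module):
    """One differentiable split: route the input softly to two learnable leaf values."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.split = nn.Linear(in_dim, 1)            # learned split direction/threshold
        self.leaves = nn.Parameter(torch.zeros(2, out_dim))

    def forward(self, x):
        p_right = torch.sigmoid(self.split(x))       # soft routing probability
        return p_right * self.leaves[1] + (1 - p_right) * self.leaves[0]

# A "forest" is just a sum of stumps; it trains with SGD like any other layer.
forest = nn.ModuleList([SoftStump(16, 3) for _ in range(10)])
x = torch.randn(8, 16)
out = sum(stump(x) for stump in forest)
print(out.shape)  # torch.Size([8, 3])
```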

VectorDefense: Vectorization as a Defense to Adversarial Examples

Title VectorDefense: Vectorization as a Defense to Adversarial Examples
Authors Vishaal Munusamy Kabilan, Brandon Morris, Anh Nguyen
Abstract Training deep neural networks on images represented as grids of pixels has brought to light an interesting phenomenon known as adversarial examples. Inspired by how humans reconstruct abstract concepts, we attempt to codify the input bitmap image into a set of compact, interpretable elements to avoid being fooled by the adversarial structures. We take the first step in this direction by experimenting with image vectorization as an input transformation step to map the adversarial examples back into the natural manifold of MNIST handwritten digits. We compare our method vs. state-of-the-art input transformations and further discuss the trade-offs between a hand-designed and a learned transformation defense.
Tasks
Published 2018-04-23
URL http://arxiv.org/abs/1804.08529v1
PDF http://arxiv.org/pdf/1804.08529v1.pdf
PWC https://paperswithcode.com/paper/vectordefense-vectorization-as-a-defense-to
Repo
Framework

Thermodynamics of Restricted Boltzmann Machines and related learning dynamics

Title Thermodynamics of Restricted Boltzmann Machines and related learning dynamics
Authors Aurélien Decelle, Giancarlo Fissore, Cyril Furtlehner
Abstract We investigate the thermodynamic properties of a Restricted Boltzmann Machine (RBM), a simple energy-based generative model used in the context of unsupervised learning. Assuming the information content of this model to be mainly reflected by the spectral properties of its weight matrix $W$, we try to make a realistic analysis by averaging over an appropriate statistical ensemble of RBMs. First, a phase diagram is derived. Otherwise similar to that of the Sherrington-Kirkpatrick (SK) model with ferromagnetic couplings, the RBM’s phase diagram presents a ferromagnetic phase which may or may not be of compositional type depending on the kurtosis of the distribution of the components of the singular vectors of $W$. Subsequently, the learning dynamics of the RBM is studied in the thermodynamic limit. A “typical” learning trajectory is shown to solve an effective dynamical equation, based on the aforementioned ensemble average and explicitly involving order parameters obtained from the thermodynamic analysis. In particular, this lets us show how the evolution of the dominant singular values of $W$, and thus of the unstable modes, is driven by the input data. At the beginning of the training, in which the RBM is found to operate in the linear regime, the unstable modes reflect the dominant covariance modes of the data. In the non-linear regime, instead, the selected modes interact and eventually impose a matching of the order parameters to their empirical counterparts estimated from the data. Finally, we illustrate our considerations by performing experiments on both artificial and real data, showing in particular how the RBM operates in the ferromagnetic compositional phase.
Tasks
Published 2018-03-05
URL http://arxiv.org/abs/1803.01960v2
PDF http://arxiv.org/pdf/1803.01960v2.pdf
PWC https://paperswithcode.com/paper/thermodynamics-of-restricted-boltzmann
Repo
Framework
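
A toy NumPy sketch, assuming CD-1 training of a small Bernoulli RBM on synthetic binary data, that tracks the dominant singular values of $W$, the quantity whose dynamics the paper analyses; it is not the paper's experimental setup.

```python
# Hypothetical sketch: train a tiny Bernoulli RBM with CD-1 on toy data and watch
# the dominant singular values of the weight matrix W grow during learning.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 20, 10, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

data = (rng.random((500, n_visible)) < 0.3).astype(float)  # toy binary data

for epoch in range(50):
    v0 = data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)            # one Gibbs step (CD-1)
    ph1 = sigmoid(pv1 @ W + b)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    if epoch % 10 == 0:
        top_singular = np.linalg.svd(W, compute_uv=False)[:3]
        print(epoch, np.round(top_singular, 3))  # dominant (unstable) modes
```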

TequilaGAN: How to easily identify GAN samples

Title TequilaGAN: How to easily identify GAN samples
Authors Rafael Valle, Wilson Cai, Anish Doshi
Abstract In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data.
Tasks
Published 2018-07-13
URL http://arxiv.org/abs/1807.04919v1
PDF http://arxiv.org/pdf/1807.04919v1.pdf
PWC https://paperswithcode.com/paper/tequilagan-how-to-easily-identify-gan-samples
Repo
Framework
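
A minimal sketch of the first strategy (statistical comparison of raw values), using a two-sample Kolmogorov-Smirnov test on synthetic stand-ins for real and generated pixel distributions; the data here is made up purely to show the mechanics.

```python
# Hypothetical sketch, not the authors' code: compare the empirical distribution of
# raw pixel values in real vs. generated images with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-ins: real MNIST-like pixels cluster near 0/1, GAN output tends to be smoother.
real = np.clip(rng.beta(0.2, 0.2, size=(1000, 28 * 28)), 0, 1)
fake = np.clip(rng.normal(0.5, 0.25, size=(1000, 28 * 28)), 0, 1)

stat, p_value = ks_2samp(real.ravel(), fake.ravel())
print(f"KS statistic={stat:.3f}, p-value={p_value:.2e}")
# A large KS statistic on raw values is the kind of "signature" that can separate
# generated samples from real ones.
```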

Faster SGD training by minibatch persistency

Title Faster SGD training by minibatch persistency
Authors Matteo Fischetti, Iacopo Mandatelli, Domenico Salvagnin
Abstract It is well known that, for most datasets, the use of large-size minibatches for Stochastic Gradient Descent (SGD) typically leads to slow convergence and poor generalization. On the other hand, large minibatches are of great practical interest as they allow for a better exploitation of modern GPUs. Previous literature on the subject concentrated on how to adjust the main SGD parameters (in particular, the learning rate) when using large minibatches. In this work we introduce an additional feature, which we call minibatch persistency, that consists in reusing the same minibatch for K consecutive SGD iterations. The computational conjecture here is that a large minibatch contains a significant sample of the training set, so one can afford to slightly overfit it without worsening generalization too much. The approach is intended to speed up SGD convergence, and also has the advantage of reducing the overhead related to data loading on the internal GPU memory. We present computational results on CIFAR-10 with an AlexNet architecture, showing that even small persistency values (K=2 or 5) already lead to significantly faster convergence and to comparable (or even better) generalization than the standard “disposable minibatch” approach (K=1), in particular when large minibatches are used. The lesson learned is that minibatch persistency can be a simple yet effective way to deal with large minibatches.
Tasks
Published 2018-06-19
URL http://arxiv.org/abs/1806.07353v1
PDF http://arxiv.org/pdf/1806.07353v1.pdf
PWC https://paperswithcode.com/paper/faster-sgd-training-by-minibatch-persistency
Repo
Framework
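
The idea reduces to a small change in the training loop; here is a hedged PyTorch sketch where each loaded minibatch is reused for K consecutive SGD steps (random tensors stand in for CIFAR-10, and the model is a placeholder).

```python
# Hypothetical sketch of minibatch persistency (not the authors' code): each sampled
# minibatch is reused for K consecutive SGD steps before the next one is loaded.
import torch
import torch.nn as nn

def train_with_persistency(model, loader, epochs=1, K=2, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:          # data loading happens once per K updates
            for _ in range(K):       # K consecutive SGD iterations on the same batch
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()

# Toy usage with random data standing in for CIFAR-10 batches.
X = torch.randn(512, 32); Y = torch.randint(0, 10, (512,))
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X, Y), batch_size=128)
train_with_persistency(nn.Linear(32, 10), loader, K=2)
```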

Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification

Title Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification
Authors Maryam Babaee, Ali Athar, Gerhard Rigoll
Abstract The task of multiple people tracking in monocular videos is challenging because of the numerous difficulties involved: occlusions, varying environments, crowded scenes, camera parameters and motion. In the tracking-by-detection paradigm, most approaches adopt person re-identification techniques based on computing the pairwise similarity between detections. However, these techniques are less effective in handling long-term occlusions. By contrast, tracklet (a sequence of detections) re-identification can improve association accuracy since tracklets offer a richer set of visual appearance and spatio-temporal cues. In this paper, we propose a tracking framework that employs a hierarchical clustering mechanism for merging tracklets. To this end, tracklet re-identification is performed by utilizing a novel multi-stage deep network that can jointly reason about the visual appearance and spatio-temporal properties of a pair of tracklets, thereby providing a robust measure of affinity. Experimental results on the challenging MOT16 and MOT17 benchmarks show that our method significantly outperforms state-of-the-art methods.
Tasks Multiple People Tracking, Person Re-Identification
Published 2018-11-09
URL http://arxiv.org/abs/1811.04091v2
PDF http://arxiv.org/pdf/1811.04091v2.pdf
PWC https://paperswithcode.com/paper/multiple-people-tracking-using-hierarchical
Repo
Framework
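
A hedged sketch of the hierarchical merging step only, with a cosine distance over per-tracklet appearance vectors standing in for the paper's learned multi-stage affinity network; all data here is synthetic.

```python
# Hypothetical sketch of hierarchical tracklet merging (not the paper's network):
# tracklets are clustered agglomeratively from pairwise affinities.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
tracklet_features = rng.standard_normal((30, 128))    # one appearance vector per tracklet

# Distance = 1 - cosine similarity; the paper would use a learned affinity instead.
dist = pdist(tracklet_features, metric="cosine")
Z = linkage(dist, method="average")
track_ids = fcluster(Z, t=0.7, criterion="distance")  # tracklets sharing an id get merged
print(track_ids)
```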

Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics

Title Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics
Authors Taraka Rama
Abstract The age of the root of the Indo-European language family has received much attention since the application of Bayesian phylogenetic methods by Gray and Atkinson (2003). The root age of the Indo-European family has tended to decrease from an age that supported the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis with the application of new models (Chang et al., 2015). However, none of the published work in Indo-European phylogenetics has studied the effect of tree priors on phylogenetic analyses of the Indo-European family. In this paper, I intend to fill this gap by exploring the effect of tree priors on different aspects of the Indo-European family’s phylogenetic inference. I apply three tree priors—Uniform, Fossilized Birth-Death (FBD), and Coalescent—to five publicly available datasets of the Indo-European language family. I evaluate the posterior distribution of the trees from the Bayesian analysis using Bayes factors, and find that there is support for the Steppe origin hypothesis in the case of two tree priors. I report the median and 95% highest posterior density (HPD) interval of the root ages for all three tree priors. A model comparison suggested that either the Uniform prior or the FBD prior is more suitable than the Coalescent prior for the datasets belonging to the Indo-European language family.
Tasks
Published 2018-05-09
URL http://arxiv.org/abs/1805.03645v1
PDF http://arxiv.org/pdf/1805.03645v1.pdf
PWC https://paperswithcode.com/paper/three-tree-priors-and-five-datasets-a-study
Repo
Framework
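
For reference, the Bayes factor comparison between priors reduces to a difference of (estimated) log marginal likelihoods; the numbers below are made up purely to show the arithmetic and are not results from the paper.

```python
# Hypothetical arithmetic sketch: a Bayes factor between two tree priors computed
# from estimated log marginal likelihoods (e.g., from stepping-stone sampling).
import math

log_ml_uniform = -25467.3   # made-up log marginal likelihood under the Uniform prior
log_ml_fbd = -25471.8       # made-up value under the Fossilized Birth-Death prior

log_bf = log_ml_uniform - log_ml_fbd
print(f"log Bayes factor = {log_bf:.1f}, BF ~ {math.exp(log_bf):.1f}")
# By a common convention (Kass & Raftery), 2*log(BF) > 10 counts as very strong support.
```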

Stream Reasoning on Expressive Logics

Title Stream Reasoning on Expressive Logics
Authors Gulay Unel
Abstract Data streams occur widely in various real-world applications. Research on streaming data mainly focuses on data management, query evaluation and optimization over these data; however, work on reasoning procedures for streaming knowledge bases at both the assertional and terminological levels is very limited. Reasoning services on large knowledge bases are typically very expensive and need to be applied continuously as the data arrives as a stream. Hence, new techniques for optimizing this continuous process are needed to develop efficient reasoners on streaming data. In this paper, we survey the related research on reasoning on expressive logics that can be applied to this setting, and point to further research directions in this area.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04738v2
PDF http://arxiv.org/pdf/1808.04738v2.pdf
PWC https://paperswithcode.com/paper/stream-reasoning-on-expressive-logics
Repo
Framework