Paper Group ANR 457
Multimodal neural pronunciation modeling for spoken languages with logographic origin. Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs. Generating lyrics with variational autoencoder and multi-modal artist embeddings. Deep Reinforcement Learning for Green Security Games with Real-Time Information. Parallel Transport Con …
Multimodal neural pronunciation modeling for spoken languages with logographic origin
Title | Multimodal neural pronunciation modeling for spoken languages with logographic origin |
Authors | Minh Nguyen, Gia H. Ngo, Nancy F. Chen |
Abstract | Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, and (2) their closest graphemic origins are logographic Han characters, of which only a subset implicitly encodes pronunciation. In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and the pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% over unimodal and multimodal baselines, respectively. |
Tasks | |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04203v1 |
http://arxiv.org/pdf/1809.04203v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-neural-pronunciation-modeling-for |
Repo | |
Framework | |
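A minimal sketch of the multimodal idea described above, assuming PyTorch; the glyph CNN, the averaged cognate-pronunciation embedding, the vocabulary sizes, and the late-fusion classifier are all illustrative placeholders rather than the authors' architecture:

```python
import torch
import torch.nn as nn

class MultimodalPronunciationModel(nn.Module):
    """Toy fusion model: one branch encodes a geometric rendering of the
    logograph, the other encodes the cognate pronunciation (e.g. Mandarin),
    and a shared classifier predicts the Cantonese pronunciation class."""
    def __init__(self, n_cognate_symbols=60, n_pron_classes=700, d=128):
        super().__init__()
        # Branch 1: small CNN over a rendered glyph image (1 x 32 x 32).
        self.glyph_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 8 * 8, d), nn.ReLU())
        # Branch 2: embedding of the cognate pronunciation symbols, averaged.
        self.cognate_embed = nn.Embedding(n_cognate_symbols, d)
        self.classifier = nn.Linear(2 * d, n_pron_classes)

    def forward(self, glyph_img, cognate_ids):
        g = self.glyph_encoder(glyph_img)            # (B, d)
        c = self.cognate_embed(cognate_ids).mean(1)  # (B, d)
        return self.classifier(torch.cat([g, c], dim=-1))

model = MultimodalPronunciationModel()
logits = model(torch.randn(4, 1, 32, 32), torch.randint(0, 60, (4, 6)))
print(logits.shape)  # torch.Size([4, 700])
```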
Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs
Title | Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs |
Authors | Weidong Yuan, Alan W Black |
Abstract | This paper models the fundamental frequency (f0) contours of both Mandarin and Cantonese speech with decision trees and DNNs (deep neural networks). Different kinds of f0 representations and model architectures are tested for decision trees and DNNs. A new model called Additive-BLSTM (additive bidirectional long short term memory), which predicts a base f0 contour and a residual f0 contour with two BLSTMs, is proposed. With respect to the objective measures of RMSE and correlation, applying tone-dependent trees together with sample normalization and delta feature regularization performs best within the decision tree framework, while the new Additive-BLSTM model with delta feature regularization performs even better. Subjective listening tests on both Mandarin and Cantonese comparing the Random Forest model (multiple decision trees) and the Additive-BLSTM model were also conducted and confirmed the advantage of the new model according to the listeners’ preference. |
Tasks | |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01682v1 |
http://arxiv.org/pdf/1807.01682v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-mandarin-and-cantonese-f0-contours |
Repo | |
Framework | |
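A minimal PyTorch sketch of the additive idea (one BLSTM predicts a base f0 contour, a second predicts a residual contour, and the two are summed); the feature dimensions and layer sizes here are invented for illustration and are not the paper's configuration:

```python
import torch
import torch.nn as nn

class AdditiveBLSTM(nn.Module):
    """Base + residual f0 prediction with two bidirectional LSTMs."""
    def __init__(self, base_dim=16, extra_dim=32, hidden=64):
        super().__init__()
        self.base_lstm = nn.LSTM(base_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.res_lstm = nn.LSTM(base_dim + extra_dim, hidden,
                                batch_first=True, bidirectional=True)
        self.base_out = nn.Linear(2 * hidden, 1)   # base f0 per frame
        self.res_out = nn.Linear(2 * hidden, 1)    # residual f0 per frame

    def forward(self, base_feats, extra_feats):
        b, _ = self.base_lstm(base_feats)
        r, _ = self.res_lstm(torch.cat([base_feats, extra_feats], dim=-1))
        base_f0 = self.base_out(b).squeeze(-1)
        residual_f0 = self.res_out(r).squeeze(-1)
        return base_f0 + residual_f0, base_f0, residual_f0

model = AdditiveBLSTM()
f0, base, res = model(torch.randn(2, 100, 16), torch.randn(2, 100, 32))
print(f0.shape)  # torch.Size([2, 100])
```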
Generating lyrics with variational autoencoder and multi-modal artist embeddings
Title | Generating lyrics with variational autoencoder and multi-modal artist embeddings |
Authors | Olga Vechtomova, Hareesh Bahuleyan, Amirpasha Ghabussi, Vineet John |
Abstract | We present a system for generating song lyrics lines conditioned on the style of a specified artist. The system uses a variational autoencoder with artist embeddings. We propose the pre-training of artist embeddings with the representations learned by a CNN classifier, which is trained to predict artists based on MEL spectrograms of their song clips. This work is the first step towards combining audio and text modalities of songs for generating lyrics conditioned on the artist’s style. Our preliminary results suggest that there is a benefit in initializing artists’ embeddings with the representations learned by a spectrogram classifier. |
Tasks | |
Published | 2018-12-20 |
URL | http://arxiv.org/abs/1812.08318v1 |
http://arxiv.org/pdf/1812.08318v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-lyrics-with-variational |
Repo | |
Framework | |
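A hedged PyTorch sketch of conditioning a text decoder on artist embeddings initialized from a spectrogram classifier; the CNN representations are stubbed with random vectors and the decoder is a plain LSTM, so this only illustrates the wiring, not the paper's VAE:

```python
import torch
import torch.nn as nn

# Hypothetical: per-artist vectors taken from the penultimate layer of a CNN
# spectrogram classifier (shape: n_artists x d_artist), stubbed here.
cnn_artist_reps = torch.randn(50, 64)
artist_embedding = nn.Embedding.from_pretrained(cnn_artist_reps, freeze=False)

class ConditionedDecoder(nn.Module):
    """LSTM decoder whose initial state is built from [z ; artist vector]."""
    def __init__(self, artist_embedding, vocab=5000, d_z=64, d_artist=64,
                 hidden=256):
        super().__init__()
        self.artist_embedding = artist_embedding
        self.embed = nn.Embedding(vocab, 128)
        self.init_h = nn.Linear(d_z + d_artist, hidden)
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, z, artist_ids):
        cond = torch.cat([z, self.artist_embedding(artist_ids)], dim=-1)
        h0 = torch.tanh(self.init_h(cond)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(out)

dec = ConditionedDecoder(artist_embedding)
logits = dec(torch.randint(0, 5000, (2, 12)), torch.randn(2, 64),
             torch.tensor([3, 7]))
print(logits.shape)  # torch.Size([2, 12, 5000])
```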
Deep Reinforcement Learning for Green Security Games with Real-Time Information
Title | Deep Reinforcement Learning for Green Security Games with Real-Time Information |
Authors | Yufei Wang, Zheyuan Ryan Shi, Lantao Yu, Yi Wu, Rohit Singh, Lucas Joppa, Fei Fang |
Abstract | Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints and agents’ subsequent actions upon receiving the information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model GSG-I which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploring the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games. |
Tasks | Q-Learning |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02483v1 |
http://arxiv.org/pdf/1811.02483v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-green |
Repo | |
Framework | |
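A plain-Python skeleton of the double-oracle loop that DeDOL builds on; all helpers are hypothetical placeholders (the paper's DQN training, restricted-game solver, local modes, and heuristic initial strategies are not reproduced here):

```python
def double_oracle_patrolling(init_defender_strats, init_attacker_strats,
                             train_dqn_best_response, solve_restricted_game,
                             max_iters=20, tol=1e-3):
    """Skeleton of a double-oracle loop in the spirit of DeDOL.

    The two callables are hypothetical placeholders:
      * solve_restricted_game(D, A) returns mixed strategies over the current
        strategy sets and the value of that restricted game;
      * train_dqn_best_response(role, opponent_mix) trains a DQN best response
        against the opponent's mixed strategy and returns (strategy, gain).
    """
    D, A = list(init_defender_strats), list(init_attacker_strats)
    for _ in range(max_iters):
        d_mix, a_mix, value = solve_restricted_game(D, A)
        # Best responses to the current restricted-game solution.
        new_d, d_gain = train_dqn_best_response("defender", a_mix)
        new_a, a_gain = train_dqn_best_response("attacker", d_mix)
        if d_gain < tol and a_gain < tol:
            break  # neither player can improve: approximate equilibrium
        D.append(new_d)
        A.append(new_a)
    return d_mix, a_mix, value
```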
Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds
Title | Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds |
Authors | Stefan C. Schonsheck, Bin Dong, Rongjie Lai |
Abstract | Convolution has played a prominent role in various applications in science and engineering for many years. It is the most important operation in convolutional neural networks. There has been recent growth of interest in generalizing convolutions to curved domains such as manifolds and graphs. However, existing approaches cannot preserve all the desirable properties of Euclidean convolutions, namely compactly supported filters, directionality, and transferability across different manifolds. In this paper we develop a new generalization of the convolution operation, referred to as parallel transport convolution (PTC), on Riemannian manifolds and their discrete counterparts. PTC is designed based on parallel transport, which is able to translate information along a manifold while intrinsically preserving directionality. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. This enables us to perform wavelet-like operations and to define deep convolutional neural networks on curved domains. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07857v2 |
http://arxiv.org/pdf/1805.07857v2.pdf | |
PWC | https://paperswithcode.com/paper/parallel-transport-convolution-a-new-tool-for |
Repo | |
Framework | |
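A toy NumPy sketch of how a parallel-transport convolution could be applied once the transport correspondences have been precomputed; the precomputation itself (the geometric heart of PTC) is assumed away, so treat this as a schematic reading rather than the authors' method:

```python
import numpy as np

def parallel_transport_conv(signal, tap_index, tap_weight, filt):
    """Toy discrete convolution on a mesh, assuming for each vertex v and each
    filter tap k that tap_index[v, k] is the vertex the tap lands on after the
    filter is parallel-transported to v, with interpolation weight
    tap_weight[v, k] (both precomputed offline in this sketch).

    signal:     (V,)   scalar signal on V vertices
    tap_index:  (V, K) integer vertex indices
    tap_weight: (V, K) interpolation weights
    filt:       (K,)   learnable filter coefficients
    """
    gathered = signal[tap_index] * tap_weight        # (V, K)
    return gathered @ filt                           # (V,)

V, K = 1000, 9
rng = np.random.default_rng(0)
out = parallel_transport_conv(rng.standard_normal(V),
                              rng.integers(0, V, size=(V, K)),
                              rng.random((V, K)),
                              rng.standard_normal(K))
print(out.shape)  # (1000,)
```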
Text classification based on ensemble extreme learning machine
Title | Text classification based on ensemble extreme learning machine |
Authors | Ming Li, Peilun Xiao, Ju Zhang |
Abstract | In this paper, we propose a novel approach based on a cost-sensitive ensemble of weighted extreme learning machines, which we call AE1-WELM, and apply it to text classification. AE1-WELM handles both balanced and imbalanced multiclass text classification. Weighted ELM, which assigns different weights to different samples, improves classification accuracy to a certain extent, but it considers only the differences between samples in different categories and ignores the differences between samples within the same category. We measure the importance of documents by their sample information entropy, generate a cost-sensitive matrix and factor based on document importance, and then embed the cost-sensitive weighted ELM into the AdaBoost.M1 framework seamlessly. Vector space model (VSM) text representation produces high-dimensional, sparse features, which increase the burden on ELM. To overcome this problem, we develop a text classification framework combining word vectors and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification. |
Tasks | Text Classification |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.06525v1 |
http://arxiv.org/pdf/1805.06525v1.pdf | |
PWC | https://paperswithcode.com/paper/text-classification-based-on-ensemble-extreme |
Repo | |
Framework | |
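A minimal NumPy sketch of a cost-sensitive weighted ELM, the kind of base learner AE1-WELM boosts inside AdaBoost.M1; the entropy-based weights, the boosting loop, and the word-vector features are omitted, and all sizes are illustrative:

```python
import numpy as np

def weighted_elm(X, Y, weights, n_hidden=200, C=1.0, rng=None):
    """Minimal weighted ELM: random hidden layer plus a cost-weighted,
    regularized least-squares solve for the output weights beta.

    X: (n, d) features, Y: (n, c) one-hot targets,
    weights: (n,) per-sample costs (e.g. derived from document entropy).
    """
    rng = rng or np.random.default_rng(0)
    W_in = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W_in + b)                       # random hidden layer
    W = np.diag(weights)                            # cost-sensitive weights
    # beta = (I/C + H^T W H)^{-1} H^T W Y
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ W @ H, H.T @ W @ Y)
    return W_in, b, beta

def elm_predict(X, W_in, b, beta):
    return np.tanh(X @ W_in + b) @ beta

# Toy usage with random data.
rng = np.random.default_rng(1)
X, Y = rng.random((300, 50)), np.eye(3)[rng.integers(0, 3, 300)]
params = weighted_elm(X, Y, weights=np.ones(300), rng=rng)
print(elm_predict(X, *params).shape)  # (300, 3)
```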
Uncertainty Gated Network for Land Cover Segmentation
Title | Uncertainty Gated Network for Land Cover Segmentation |
Authors | Guillem Pascual, Santi Seguí, Jordi Vitrià |
Abstract | The production of thematic maps depicting land cover is one of the most common applications of remote sensing. To this end, several semantic segmentation approaches based on deep learning have been proposed in the literature, but land cover segmentation is still considered an open problem due to specific challenges of remote sensing imagery. In this paper we propose a novel approach to the problem of modelling the multiscale contexts surrounding pixels of different land cover categories. The approach leverages the computation of a heteroscedastic measure of uncertainty when classifying individual pixels in an image. This classification uncertainty measure is used to define a set of memory gates between layers that allow a principled method to select the optimal decision for each pixel. |
Tasks | Semantic Segmentation |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11348v1 |
http://arxiv.org/pdf/1805.11348v1.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-gated-network-for-land-cover |
Repo | |
Framework | |
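A hedged PyTorch sketch of an uncertainty-driven gate: a head predicts per-pixel logits plus an uncertainty map, and uncertain pixels defer to a deeper prediction; this is a schematic interpretation of the gating mechanism, not the authors' exact formulation:

```python
import torch
import torch.nn as nn

class UncertaintyGate(nn.Module):
    """Illustrative gate: trust the local (fine-scale) prediction where its
    estimated uncertainty is low, otherwise fall back on a deeper prediction
    that has seen more context."""
    def __init__(self, in_ch, n_classes, threshold=0.5):
        super().__init__()
        self.logits = nn.Conv2d(in_ch, n_classes, 1)
        self.uncertainty_head = nn.Conv2d(in_ch, 1, 1)
        self.threshold = threshold

    def forward(self, feats, deeper_logits):
        local = self.logits(feats)
        uncertainty = torch.sigmoid(self.uncertainty_head(feats))  # in (0, 1)
        gate = (uncertainty < self.threshold).float()              # 1 = trust local
        return gate * local + (1 - gate) * deeper_logits

gate = UncertaintyGate(in_ch=64, n_classes=6)
out = gate(torch.randn(2, 64, 32, 32), torch.randn(2, 6, 32, 32))
print(out.shape)  # torch.Size([2, 6, 32, 32])
```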
Random Hinge Forest for Differentiable Learning
Title | Random Hinge Forest for Differentiable Learning |
Authors | Nathan Lay, Adam P. Harrison, Sharon Schreiber, Gitesh Dawer, Adrian Barbu |
Abstract | We propose random hinge forests, a simple, efficient, and novel variant of decision forests. Importantly, random hinge forests can be readily incorporated as a general component within arbitrary computation graphs that are optimized end-to-end with stochastic gradient descent or variants thereof. We derive random hinge forests and ferns, focusing on their sparse and efficient nature, their min-max margin property, strategies to initialize them for arbitrary network architectures, and the class of optimizers most suitable for optimizing them. The performance and versatility of random hinge forests are demonstrated by experiments incorporating a variety of small and large UCI machine learning data sets and also ones involving the MNIST, Letter, and USPS image datasets. We compare random hinge forests with random forests and the more recent backpropagating deep neural decision forests. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03882v2 |
http://arxiv.org/pdf/1802.03882v2.pdf | |
PWC | https://paperswithcode.com/paper/random-hinge-forest-for-differentiable |
Repo | |
Framework | |
VectorDefense: Vectorization as a Defense to Adversarial Examples
Title | VectorDefense: Vectorization as a Defense to Adversarial Examples |
Authors | Vishaal Munusamy Kabilan, Brandon Morris, Anh Nguyen |
Abstract | Training deep neural networks on images represented as grids of pixels has brought to light an interesting phenomenon known as adversarial examples. Inspired by how humans reconstruct abstract concepts, we attempt to codify the input bitmap image into a set of compact, interpretable elements to avoid being fooled by adversarial structures. We take the first step in this direction by experimenting with image vectorization as an input transformation step that maps adversarial examples back onto the natural manifold of MNIST handwritten digits. We compare our method against state-of-the-art input transformations and further discuss the trade-offs between a hand-designed and a learned transformation defense. |
Tasks | |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08529v1 |
http://arxiv.org/pdf/1804.08529v1.pdf | |
PWC | https://paperswithcode.com/paper/vectordefense-vectorization-as-a-defense-to |
Repo | |
Framework | |
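A skeletal Python sketch of the defense pipeline (vectorize, then rasterize, then classify); the tracing and rendering steps are left as hypothetical placeholders rather than tied to a specific tracing tool:

```python
import numpy as np

def vectorize(bitmap):
    """Hypothetical placeholder: trace the (binarized) bitmap into a compact
    set of vector paths using an off-the-shelf tracer."""
    raise NotImplementedError

def rasterize(paths, shape):
    """Hypothetical placeholder: render the traced paths back to a bitmap."""
    raise NotImplementedError

def vector_defense(x_adv, classifier, shape=(28, 28), threshold=0.5):
    """Input-transformation defense: binarize, vectorize, and rasterize the
    input before classifying, aiming to discard fine-grained adversarial
    pixel structure while keeping the digit's overall shape."""
    binarized = (np.asarray(x_adv) > threshold).astype(np.uint8)
    paths = vectorize(binarized)
    x_clean = rasterize(paths, shape)
    return classifier(x_clean)
```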
Thermodynamics of Restricted Boltzmann Machines and related learning dynamics
Title | Thermodynamics of Restricted Boltzmann Machines and related learning dynamics |
Authors | Aurélien Decelle, Giancarlo Fissore, Cyril Furtlehner |
Abstract | We investigate the thermodynamic properties of a Restricted Boltzmann Machine (RBM), a simple energy-based generative model used in the context of unsupervised learning. Assuming the information content of this model to be mainly reflected by the spectral properties of its weight matrix $W$, we try to make a realistic analysis by averaging over an appropriate statistical ensemble of RBMs. First, a phase diagram is derived. Although otherwise similar to that of the Sherrington-Kirkpatrick (SK) model with ferromagnetic couplings, the RBM’s phase diagram presents a ferromagnetic phase which may or may not be of compositional type, depending on the kurtosis of the distribution of the components of the singular vectors of $W$. Subsequently, the learning dynamics of the RBM is studied in the thermodynamic limit. A “typical” learning trajectory is shown to solve an effective dynamical equation, based on the aforementioned ensemble average and explicitly involving order parameters obtained from the thermodynamic analysis. In particular, this lets us show how the evolution of the dominant singular values of $W$, and thus of the unstable modes, is driven by the input data. At the beginning of the training, in which the RBM is found to operate in the linear regime, the unstable modes reflect the dominant covariance modes of the data. In the non-linear regime, instead, the selected modes interact and eventually impose a matching of the order parameters to their empirical counterparts estimated from the data. Finally, we illustrate our considerations by performing experiments on both artificial and real data, showing in particular how the RBM operates in the ferromagnetic compositional phase. |
Tasks | |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01960v2 |
http://arxiv.org/pdf/1803.01960v2.pdf | |
PWC | https://paperswithcode.com/paper/thermodynamics-of-restricted-boltzmann |
Repo | |
Framework | |
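A small NumPy sketch of the kind of spectral bookkeeping the analysis suggests: tracking the leading singular values of the weight matrix $W$ across training snapshots to watch the unstable modes emerge; purely illustrative, with random matrices standing in for trained RBM weights:

```python
import numpy as np

def track_unstable_modes(weight_snapshots, k=5):
    """Track the top-k singular values of the RBM weight matrix W across
    training snapshots; in the linear regime these should align with the
    dominant covariance (PCA) modes of the data, as discussed in the paper.
    `weight_snapshots` is any iterable of (n_visible, n_hidden) arrays."""
    history = []
    for W in weight_snapshots:
        s = np.linalg.svd(W, compute_uv=False)
        history.append(s[:k])
    return np.array(history)          # (n_snapshots, k)

# Toy usage: three random "snapshots" of a 100 x 64 weight matrix.
rng = np.random.default_rng(0)
snaps = [0.01 * t * rng.standard_normal((100, 64)) for t in range(1, 4)]
print(track_unstable_modes(snaps).shape)  # (3, 5)
```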
TequilaGAN: How to easily identify GAN samples
Title | TequilaGAN: How to easily identify GAN samples |
Authors | Rafael Valle, Wilson Cai, Anish Doshi |
Abstract | In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data. |
Tasks | |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04919v1 |
http://arxiv.org/pdf/1807.04919v1.pdf | |
PWC | https://paperswithcode.com/paper/tequilagan-how-to-easily-identify-gan-samples |
Repo | |
Framework | |
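A minimal SciPy sketch of the first strategy (statistical comparison of raw pixel values): a two-sample Kolmogorov-Smirnov test between real and generated pixel distributions; this is one plausible instance of such a comparison, not the paper's exact test suite:

```python
import numpy as np
from scipy.stats import ks_2samp

def pixel_distribution_check(real_images, fake_images):
    """Compare raw pixel-value distributions of real vs. generated samples.
    Returns the two-sample Kolmogorov-Smirnov statistic and p-value; a large
    statistic (tiny p-value) flags a distributional mismatch."""
    real = np.asarray(real_images, dtype=np.float64).ravel()
    fake = np.asarray(fake_images, dtype=np.float64).ravel()
    return ks_2samp(real, fake)

rng = np.random.default_rng(0)
real = (rng.random((100, 28, 28)) > 0.5).astype(float)  # quantized "real" data
fake = rng.random((100, 28, 28))                         # smooth "generated" data
print(pixel_distribution_check(real, fake))
```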
Faster SGD training by minibatch persistency
Title | Faster SGD training by minibatch persistency |
Authors | Matteo Fischetti, Iacopo Mandatelli, Domenico Salvagnin |
Abstract | It is well known that, for most datasets, the use of large-size minibatches for Stochastic Gradient Descent (SGD) typically leads to slow convergence and poor generalization. On the other hand, large minibatches are of great practical interest as they allow for a better exploitation of modern GPUs. Previous literature on the subject concentrated on how to adjust the main SGD parameters (in particular, the learning rate) when using large minibatches. In this work we introduce an additional feature, which we call minibatch persistency, that consists of reusing the same minibatch for K consecutive SGD iterations. The computational conjecture here is that a large minibatch contains a significant sample of the training set, so one can afford to slightly overfit it without worsening generalization too much. The approach is intended to speed up SGD convergence, and also has the advantage of reducing the overhead related to data loading on the internal GPU memory. We present computational results on CIFAR-10 with an AlexNet architecture, showing that even small persistency values (K=2 or 5) already lead to significantly faster convergence and to comparable (or even better) generalization than the standard “disposable minibatch” approach (K=1), in particular when large minibatches are used. The lesson learned is that minibatch persistency can be a simple yet effective way to deal with large minibatches. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07353v1 |
http://arxiv.org/pdf/1806.07353v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-sgd-training-by-minibatch-persistency |
Repo | |
Framework | |
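A minimal PyTorch training loop with minibatch persistency: each loaded minibatch is reused for K consecutive SGD steps, and K=1 recovers the usual "disposable minibatch" scheme; the model, optimizer, and loader are whatever the caller supplies (the paper's experiments use AlexNet on CIFAR-10):

```python
import torch

def train_with_persistency(model, loader, optimizer, loss_fn, K=2, epochs=1,
                           device="cpu"):
    """SGD loop with minibatch persistency: each minibatch is reused for K
    consecutive update steps before the next one is loaded, which also
    amortizes the cost of moving data onto the GPU."""
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            for _ in range(K):                 # reuse the same minibatch
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                optimizer.step()
```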
Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification
Title | Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification |
Authors | Maryam Babaee, Ali Athar, Gerhard Rigoll |
Abstract | The task of multiple people tracking in monocular videos is challenging because of the numerous difficulties involved: occlusions, varying environments, crowded scenes, camera parameters and motion. In the tracking-by-detection paradigm, most approaches adopt person re-identification techniques based on computing the pairwise similarity between detections. However, these techniques are less effective in handling long-term occlusions. By contrast, tracklet (a sequence of detections) re-identification can improve association accuracy since tracklets offer a richer set of visual appearance and spatio-temporal cues. In this paper, we propose a tracking framework that employs a hierarchical clustering mechanism for merging tracklets. To this end, tracklet re-identification is performed by utilizing a novel multi-stage deep network that can jointly reason about the visual appearance and spatio-temporal properties of a pair of tracklets, thereby providing a robust measure of affinity. Experimental results on the challenging MOT16 and MOT17 benchmarks show that our method significantly outperforms the state of the art. |
Tasks | Multiple People Tracking, Person Re-Identification |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.04091v2 |
http://arxiv.org/pdf/1811.04091v2.pdf | |
PWC | https://paperswithcode.com/paper/multiple-people-tracking-using-hierarchical |
Repo | |
Framework | |
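A simplified Python sketch of greedy agglomerative tracklet merging; `affinity_fn` stands in for the paper's multi-stage deep network that scores appearance and spatio-temporal compatibility, and tracklets are represented as plain lists of detections:

```python
def merge_tracklets(tracklets, affinity_fn, threshold=0.7):
    """Greedy agglomerative merging of tracklets, a simplified stand-in for a
    hierarchical clustering stage. `affinity_fn(a, b)` is assumed to return a
    learned affinity in [0, 1] for a pair of tracklets; pairs are merged,
    highest affinity first, until no pair exceeds the threshold."""
    tracklets = list(tracklets)
    merged = True
    while merged and len(tracklets) > 1:
        merged = False
        n = len(tracklets)
        scores = [(affinity_fn(tracklets[i], tracklets[j]), i, j)
                  for i in range(n) for j in range(i + 1, n)]
        best, i, j = max(scores)
        if best >= threshold:
            tracklets[i] = tracklets[i] + tracklets[j]  # concatenate detections
            del tracklets[j]
            merged = True
    return tracklets
```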
Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics
Title | Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics |
Authors | Taraka Rama |
Abstract | The age of the root of the Indo-European language family has received much attention since the application of Bayesian phylogenetic methods by Gray and Atkinson (2003). The root age of the Indo-European family has tended to decrease, from an age that supported the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis, with the application of new models (Chang et al., 2015). However, none of the published work in Indo-European phylogenetics has studied the effect of tree priors on phylogenetic analyses of the Indo-European family. In this paper, I intend to fill this gap by exploring the effect of tree priors on different aspects of the Indo-European family’s phylogenetic inference. I apply three tree priors, namely Uniform, Fossilized Birth-Death (FBD), and Coalescent, to five publicly available datasets of the Indo-European language family. I evaluate the posterior distribution of the trees from the Bayesian analysis using Bayes factors, and find that there is support for the Steppe origin hypothesis in the case of two tree priors. I report the median and the 95% highest posterior density (HPD) interval of the root ages for all three tree priors. A model comparison suggests that either the Uniform prior or the FBD prior is more suitable than the Coalescent prior for the datasets of the Indo-European language family. |
Tasks | |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03645v1 |
http://arxiv.org/pdf/1805.03645v1.pdf | |
PWC | https://paperswithcode.com/paper/three-tree-priors-and-five-datasets-a-study |
Repo | |
Framework | |
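A tiny Python example of the Bayes-factor comparison used to weigh competing hypotheses or priors, computed from hypothetical log marginal likelihoods such as those produced by path-sampling or stepping-stone estimators:

```python
import math

def log_bayes_factor(log_marginal_likelihood_a, log_marginal_likelihood_b):
    """Log Bayes factor comparing two models (e.g. two tree priors, or two
    root-age hypotheses) from their estimated log marginal likelihoods."""
    return log_marginal_likelihood_a - log_marginal_likelihood_b

# Toy numbers: a log Bayes factor of 5 corresponds to a Bayes factor of
# roughly 148, i.e. strong support for model A on the usual Kass-Raftery scale.
log_bf = log_bayes_factor(-4521.3, -4526.3)
print(log_bf, math.exp(log_bf))
```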
Stream Reasoning on Expressive Logics
Title | Stream Reasoning on Expressive Logics |
Authors | Gulay Unel |
Abstract | Data streams occur widely in various real-world applications. Research on streaming data mainly focuses on data management, query evaluation and optimization over these data; however, work on reasoning procedures for streaming knowledge bases at both the assertional and terminological levels is very limited. Typically, reasoning services on large knowledge bases are very expensive, and they need to be applied continuously when the data is received as a stream. Hence new techniques for optimizing this continuous process are needed for developing efficient reasoners on streaming data. In this paper, we survey the related research on reasoning over expressive logics that can be applied to this setting, and point to further research directions in this area. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04738v2 |
http://arxiv.org/pdf/1808.04738v2.pdf | |
PWC | https://paperswithcode.com/paper/stream-reasoning-on-expressive-logics |
Repo | |
Framework | |