July 30, 2019


Paper Group AWR 76


Designing and building the mlpack open-source machine learning library


Title	Designing and building the mlpack open-source machine learning library
Authors	Ryan R. Curtin, Marcus Edel
Abstract	mlpack is an open-source C++ machine learning library with an emphasis on speed and flexibility. Since its original inception in 2007, it has grown to be a large project implementing a wide variety of machine learning algorithms, from standard techniques such as decision trees and logistic regression to modern techniques such as deep neural networks as well as other recently-published cutting-edge techniques not found in any other library. mlpack is quite fast, with benchmarks showing mlpack outperforming other libraries’ implementations of the same methods. mlpack has an active community, with contributors from around the world—including some from PUST. This short paper describes the goals and design of mlpack, discusses how the open-source community functions, and shows an example usage of mlpack for a simple data science problem.
Tasks
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05279v2
PDF	http://arxiv.org/pdf/1708.05279v2.pdf
PWC	https://paperswithcode.com/paper/designing-and-building-the-mlpack-open-source
Repo	https://github.com/mlpack/mlpack
Framework	none

Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling


Title	Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Authors	Sam Kriegman, Marcin Szubert, Josh C. Bongard, Christian Skalka
Abstract	Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a novel method of inducing spatial aggregations as a component of the machine learning process, yielding regional model features whose construction is driven by model prediction performance rather than prior assumptions. Our results demonstrate that Genetic Programming is particularly well suited to this type of feature construction because it can automatically synthesize appropriate aggregations, as well as better incorporate them into predictive models compared to other regression methods we tested. In our experiments we consider a specific problem instance and real-world dataset relevant to predicting snow properties in high-mountain Asia.
Tasks
Published	2017-06-24
URL	http://arxiv.org/abs/1706.07888v2
PDF	http://arxiv.org/pdf/1706.07888v2.pdf
PWC	https://paperswithcode.com/paper/evolving-spatially-aggregated-features-from
Repo	https://github.com/skriegman/ppsn_2016
Framework	none

LSTM Pose Machines


Title	LSTM Pose Machines
Authors	Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin
Abstract	We observed that recent state-of-the-art results on single image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNN). Notwithstanding the superior performance on static images, the application of these models on videos is not only computationally intensive, it also suffers from performance degeneration and flickering. Such suboptimal results are mainly attributed to the inability of imposing sequential geometric consistency, handling severe image quality degradation (e.g. motion blur and occlusion) as well as the inability of capturing the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose the weight sharing scheme to the multi-stage CNN, it could be re-written as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed in invoking the network for videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found such a memory-augmented RNN is very effective in imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such a mechanism would benefit the prediction for video-based pose estimations.
Tasks	Pose Estimation
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06316v4
PDF	http://arxiv.org/pdf/1712.06316v4.pdf
PWC	https://paperswithcode.com/paper/lstm-pose-machines
Repo	https://github.com/lawy623/LSTM_Pose_Machines
Framework	none
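
The abstract's claim that a multi-stage CNN with shared weights can be rewritten as an RNN can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `make_stage` returns a scalar function in place of a real convolutional stage, and the "frames" are plain numbers. It is not the authors' architecture; it only shows that tying the weights of every stage makes the unrolled network identical to a recurrence over frames.

```python
def make_stage(weight):
    """Return a toy 'stage' that refines a belief from frame features
    and the previous belief (hypothetical stand-in for a CNN stage)."""
    def stage(features, belief):
        return weight * (features + belief)
    return stage


def multi_stage(frames, stages, belief=0.0):
    """Feed-forward view: one stage object per frame, applied in order."""
    for features, stage in zip(frames, stages):
        belief = stage(features, belief)
    return belief


def recurrent(frames, stage, belief=0.0):
    """Recurrent view: a single stage reused at every time step."""
    for features in frames:
        belief = stage(features, belief)
    return belief


frames = [1.0, 2.0, 3.0]
shared = make_stage(0.5)

# With tied weights, the multi-stage network and the recurrence coincide.
tied_output = multi_stage(frames, [shared, shared, shared])
recurrent_output = recurrent(frames, shared)
print(tied_output == recurrent_output)
```

Once the stages share weights, the number of stages no longer has to be fixed in advance, which is what allows the same module to be applied frame by frame on a video.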

A Batch Learning Framework for Scalable Personalized Ranking


Title	A Batch Learning Framework for Scalable Personalized Ranking
Authors	Kuan Liu, Prem Natarajan
Abstract	In designing personalized ranking algorithms, it is desirable to encourage a high precision at the top of the ranked list. Existing methods either seek a smooth convex surrogate for a non-smooth ranking metric or directly modify updating procedures to encourage top accuracy. In this work we point out that these methods do not scale well to a large-scale setting, and this is partly due to the inaccurate pointwise or pairwise rank estimation. We propose a new framework for personalized ranking. It uses batch-based rank estimators and smooth rank-sensitive loss functions. This new batch learning framework leads to more stable and accurate rank approximations compared to previous work. Moreover, it enables explicit use of parallel computation to speed up training. We conduct empirical evaluation on three item recommendation tasks. Our method shows consistent accuracy improvements over state-of-the-art methods. Additionally, we observe time efficiency advantages when data scale increases.
Tasks
Published	2017-11-10
URL	http://arxiv.org/abs/1711.04019v1
PDF	http://arxiv.org/pdf/1711.04019v1.pdf
PWC	https://paperswithcode.com/paper/a-batch-learning-framework-for-scalable
Repo	https://github.com/skywaLKer518/A-Recsys
Framework	tf
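
To make the idea of a batch-based rank estimator concrete, here is a minimal sketch. It assumes the rank of an item is estimated by comparing its score against every other item in a batch, with a sigmoid replacing the hard comparison to keep the estimate smooth. The function names, the `temperature` parameter, and the sigmoid surrogate are illustrative assumptions, not the estimator or loss defined in the paper.

```python
import math


def exact_rank(scores, pos):
    """Count the items in the batch scored strictly higher than item `pos`."""
    return sum(1 for j, s in enumerate(scores) if j != pos and s > scores[pos])


def smooth_rank(scores, pos, temperature=1.0):
    """Smooth surrogate (an assumption for illustration): each hard
    comparison is replaced by a sigmoid of the score difference."""
    total = 0.0
    for j, s in enumerate(scores):
        if j != pos:
            total += 1.0 / (1.0 + math.exp(-(s - scores[pos]) / temperature))
    return total


scores = [2.0, 0.5, 3.0, 1.0]  # hypothetical model scores for one user
pos = 3                        # index of the observed (positive) item

print(exact_rank(scores, pos))
print(smooth_rank(scores, pos, temperature=0.01))
```

Because the whole batch is scored at once, the comparisons can be computed in parallel, which matches the abstract's point about explicit use of parallel computation. As `temperature` shrinks, the smooth estimate approaches the exact count.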

How consistent is my model with the data? Information-Theoretic Model Check


Title	How consistent is my model with the data? Information-Theoretic Model Check
Authors	Andreas Svensson, Dave Zachariah, Thomas B. Schön
Abstract	The choice of model class is fundamental in statistical learning and system identification, no matter whether the class is derived from physical principles or is a generic black-box. We develop a method to evaluate the specified model class by assessing its capability of reproducing data that is similar to the observed data record. This model check is based on the information-theoretic properties of models viewed as data generators and is applicable to e.g. sequential data and nonlinear dynamical models. The method can be understood as a specific two-sided posterior predictive test. We apply the information-theoretic model check to both synthetic and real data and compare it with a classical whiteness test.
Tasks
Published	2017-12-07
URL	http://arxiv.org/abs/1712.02675v2
PDF	http://arxiv.org/pdf/1712.02675v2.pdf
PWC	https://paperswithcode.com/paper/how-consistent-is-my-model-with-the-data
Repo	https://github.com/saerdna-se/itmc
Framework	none

The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA


Title	The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA
Authors	Yufeng Hao, Steven Quigley
Abstract	Recently, FPGA has been increasingly applied to problems such as speech recognition, machine learning, and cloud computation such as the Bing search engine used by Microsoft. This is due to FPGAs' great parallel computation capacity as well as low power consumption compared to general purpose processors. However, these applications mainly focus on large scale FPGA clusters which have an extreme processing power for executing massive matrix or convolution operations but are unsuitable for portable or mobile applications. This paper describes research on a single-FPGA platform to explore the applications of FPGAs in these fields. In this project, we design a Deep Recurrent Neural Network (DRNN) Language Model (LM) and implement a hardware accelerator with AXI Stream interface on a PYNQ board which is equipped with a XILINX ZYNQ SOC XC7Z020 1CLG400C. The PYNQ has not only abundant programmable logic resources but also a flexible embedded operating system, which makes it suitable to be applied in the natural language processing field. We design the DRNN language model with Python and Theano, train the model on a CPU platform, and deploy the model on a PYNQ board to validate the model with Jupyter notebook. Meanwhile, we design the hardware accelerator with Overlay, which is a kind of hardware library on PYNQ, and verify the acceleration effect on the PYNQ board. Finally, we have found that the DRNN language model can be deployed on the embedded system smoothly and the Overlay accelerator with AXI Stream interface performs at 20 GOPS processing throughput, which constitutes a 70.5X and 2.75X speed up compared to the work in Ref.30 and Ref.31 respectively.
Tasks	Language Modelling, Speech Recognition
Published	2017-10-26
URL	http://arxiv.org/abs/1710.10296v3
PDF	http://arxiv.org/pdf/1710.10296v3.pdf
PWC	https://paperswithcode.com/paper/the-implementation-of-a-deep-recurrent-neural
Repo	https://github.com/hillhao/PYNQ-project
Framework	tf

Mechanics Automatically Recognized via Interactive Observation: Jumping


Title	Mechanics Automatically Recognized via Interactive Observation: Jumping
Authors	Adam Summerville, Joseph C. Osborn, Christoffer Holmgård, Daniel W. Zhang
Abstract	Jumping has been an important mechanic since its introduction in Donkey Kong. It has taken a variety of forms and shown up in numerous games, with each jump having a different feel. In this paper, we use a modified Nintendo Entertainment System (NES) emulator to semi-automatically run experiments on a large subset (30%) of NES platform games. We use these experiments to build models of jumps from different developers, series, and games across the history of the console. We then examine these models to gain insights into different forms of jumping and their associated feel.
Tasks
Published	2017-07-12
URL	http://arxiv.org/abs/1707.03865v1
PDF	http://arxiv.org/pdf/1707.03865v1.pdf
PWC	https://paperswithcode.com/paper/mechanics-automatically-recognized-via
Repo	https://github.com/JoeOsborn/mechlearn
Framework	none

Stronger Baselines for Trustable Results in Neural Machine Translation


Title	Stronger Baselines for Trustable Results in Neural Machine Translation
Authors	Michael Denkowski, Graham Neubig
Abstract	Interest in neural machine translation has grown rapidly as its effectiveness has been demonstrated across language and data scenarios. New research regularly introduces architectural and algorithmic improvements that lead to significant gains over “vanilla” NMT implementations. However, these new techniques are rarely evaluated in the context of previously published techniques, specifically those that are widely used in state-of-the-art production and shared-task systems. As a result, it is often difficult to determine whether improvements from research will carry over to systems deployed for real-world use. In this work, we recommend three specific methods that are relatively easy to implement and result in much stronger experimental systems. Beyond reporting significantly higher BLEU scores, we conduct an in-depth analysis of where improvements originate and what inherent weaknesses of basic NMT models are being addressed. We then compare the relative gains afforded by several other techniques proposed in the literature when starting with vanilla systems versus our stronger baselines, showing that experimental conclusions may change depending on the baseline chosen. This indicates that choosing a strong baseline is crucial for reporting reliable experimental results.
Tasks	Machine Translation
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09733v1
PDF	http://arxiv.org/pdf/1706.09733v1.pdf
PWC	https://paperswithcode.com/paper/stronger-baselines-for-trustable-results-in
Repo	https://github.com/ijauregiCMCRC/ReWE_NMT
Framework	pytorch

Emotional End-to-End Neural Speech Synthesizer


Title	Emotional End-to-End Neural Speech Synthesizer
Authors	Younggun Lee, Azam Rabiee, Soo-Young Lee
Abstract	In this paper, we introduce an emotional speech synthesizer based on the recent end-to-end neural model, named Tacotron. Despite its benefits, we found that the original Tacotron suffers from the exposure bias problem and irregularity of the attention alignment. We address these problems by using a context vector and residual connections in the recurrent neural networks (RNNs). Our experiments showed that the model could successfully train and generate speech for given emotion labels.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05447v2
PDF	http://arxiv.org/pdf/1711.05447v2.pdf
PWC	https://paperswithcode.com/paper/171105447
Repo	https://github.com/AzamRabiee/Emotional-TTS
Framework	none

TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network


Title	TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network
Authors	Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, Muhammad Zeshan Afzal
Abstract	In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Former approaches have tried to condition the generative process on the textual data; but allying it to the usage of class information, known to diversify the generated samples and improve their structural coherence, has not been explored. We trained the presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the discriminability of the generated images with Inception-Score, as well as their diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our approach outperforms the state-of-the-art models, i.e., its inception score is 3.45, corresponding to a relative increase of 7.8% compared to the recently introduced StackGAN. A comparison of the mean MS-SSIM scores of the training and generated samples per class shows that our approach is able to generate highly diverse images with an average MS-SSIM of 0.14 over all generated classes.
Tasks
Published	2017-03-19
URL	http://arxiv.org/abs/1703.06412v2
PDF	http://arxiv.org/pdf/1703.06412v2.pdf
PWC	https://paperswithcode.com/paper/tac-gan-text-conditioned-auxiliary-classifier
Repo	https://github.com/dashayushman/TAC-GAN
Framework	tf
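
The abstract reports an inception score of 3.45 and a relative increase of 7.8% but does not state the baseline score directly. The short calculation below recovers the baseline implied by those two figures, assuming "relative increase" means (new - old) / old. The helper name `implied_baseline` is made up for this sketch, and the result is derived arithmetic, not a number quoted from the paper.

```python
def implied_baseline(score, relative_increase):
    """Invert score = baseline * (1 + relative_increase)."""
    return score / (1.0 + relative_increase)


tac_gan_score = 3.45       # inception score reported in the abstract
relative_increase = 0.078  # 7.8% relative increase reported in the abstract

baseline = implied_baseline(tac_gan_score, relative_increase)
print(round(baseline, 2))
```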

A ROS multi-ontology references services: OWL reasoners and application prototyping issues


Title	A ROS multi-ontology references services: OWL reasoners and application prototyping issues
Authors	Luca Buoncompagni, Alessio Capitanelli, Fulvio Mastrogiovanni
Abstract	This paper introduces a ROS Multi Ontology References (ARMOR) service, a general-purpose and scalable interface between robot architectures and OWL reasoners. ARMOR addresses synchronisation and communication issues among heterogeneous and distributed software components. As a guiding scenario, we consider a prototyping approach for the use of symbolic reasoning in human-robot interaction applications.
Tasks
Published	2017-06-30
URL	https://arxiv.org/abs/1706.10151v2
PDF	https://arxiv.org/pdf/1706.10151v2.pdf
PWC	https://paperswithcode.com/paper/a-ros-multi-ontology-references-services-owl
Repo	https://github.com/EmaroLab/injected_armor_pkgs
Framework	none

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders


Title	Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Authors	Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
Abstract	Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.
Tasks
Published	2017-04-05
URL	http://arxiv.org/abs/1704.01279v1
PDF	http://arxiv.org/pdf/1704.01279v1.pdf
PWC	https://paperswithcode.com/paper/neural-audio-synthesis-of-musical-notes-with
Repo	https://github.com/NoaCahan/WavenetAutoEncoder
Framework	pytorch

Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image


Title	Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
Authors	Denis Tome, Chris Russell, Lourdes Agapito
Abstract	We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations. The entire process is trained end-to-end, is extremely efficient and obtains state-of-the-art results on Human3.6M outperforming previous approaches both on 2D and 3D errors.
Tasks	3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published	2017-01-01
URL	http://arxiv.org/abs/1701.00295v4
PDF	http://arxiv.org/pdf/1701.00295v4.pdf
PWC	https://paperswithcode.com/paper/lifting-from-the-deep-convolutional-3d-pose
Repo	https://github.com/SyBorg91/pose-estimation-detection
Framework	tf

Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks


Title	Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks
Authors	Sayyed M. Zahiri, Jinho D. Choi
Abstract	While there have been significant advances in detecting emotions from speech and image recognition, emotion detection on text is still under-explored and remains an active research field. This paper introduces a corpus for text-based emotion detection on multiparty dialogue as well as deep neural models that outperform the existing approaches for document classification. We first present a new corpus that provides annotation of seven emotions on consecutive utterances in dialogues extracted from the show, Friends. We then suggest four types of sequence-based convolutional neural network models with attention that leverage the sequence information encapsulated in dialogue. Our best model achieves accuracies of 37.9% and 54% for fine- and coarse-grained emotions, respectively. Given the difficulty of this task, this is promising.
Tasks	Document Classification
Published	2017-08-14
URL	http://arxiv.org/abs/1708.04299v1
PDF	http://arxiv.org/pdf/1708.04299v1.pdf
PWC	https://paperswithcode.com/paper/emotion-detection-on-tv-show-transcripts-with
Repo	https://github.com/emorynlp/emotion-detection
Framework	none

The Cramer Distance as a Solution to Biased Wasserstein Gradients


Title	The Cramer Distance as a Solution to Biased Wasserstein Gradients
Authors	Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
Abstract	The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramér distance. We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramér distance in practice we design a new algorithm, the Cramér Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN.
Tasks
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10743v1
PDF	http://arxiv.org/pdf/1705.10743v1.pdf
PWC	https://paperswithcode.com/paper/the-cramer-distance-as-a-solution-to-biased
Repo	https://github.com/Mintas/diving-deep-learning
Framework	pytorch
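
For readers unfamiliar with the two quantities compared in this abstract, the sketch below computes both for univariate discrete distributions, using the standard CDF-based definitions: the 1-Wasserstein distance integrates |F_P - F_Q|, while the Cramér distance is the square root of the integral of (F_P - F_Q)^2. The unit-spaced grid and the function names are assumptions made for illustration; the sketch does not reproduce the paper's gradient analysis or the Cramér GAN.

```python
import math


def cdf(pmf):
    """Cumulative sums of a probability mass function given as a list."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out


def wasserstein_1(p, q):
    """1-Wasserstein distance on a unit-spaced grid: sum of |F_P - F_Q|."""
    return sum(abs(a - b) for a, b in zip(cdf(p), cdf(q)))


def cramer(p, q):
    """Cramér distance on a unit-spaced grid: sqrt of the sum of (F_P - F_Q)^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cdf(p), cdf(q))))


# Two point masses on the grid {0, 1, 2}: one at 0, the other at 2.
p = [1.0, 0.0, 0.0]
q = [0.0, 0.0, 1.0]

print(wasserstein_1(p, q))
print(cramer(p, q))
```

Both distances grow as the point masses move further apart, which is the sensitivity to the geometry of outcomes that the abstract contrasts with the Kullback-Leibler divergence.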