Paper Group AWR 76
![Paper Group AWR 76](/2017/images/pwc/paper-arxiv_hu144ec288a26b3e360d673e256787de3e_28623_900x500_fit_q75_box.jpg)
Designing and building the mlpack open-source machine learning library
Title | Designing and building the mlpack open-source machine learning library |
Authors | Ryan R. Curtin, Marcus Edel |
Abstract | mlpack is an open-source C++ machine learning library with an emphasis on speed and flexibility. Since its original inception in 2007, it has grown to be a large project implementing a wide variety of machine learning algorithms, from standard techniques such as decision trees and logistic regression to modern techniques such as deep neural networks as well as other recently-published cutting-edge techniques not found in any other library. mlpack is quite fast, with benchmarks showing mlpack outperforming other libraries’ implementations of the same methods. mlpack has an active community, with contributors from around the world—including some from PUST. This short paper describes the goals and design of mlpack, discusses how the open-source community functions, and shows an example usage of mlpack for a simple data science problem. |
Tasks | |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05279v2 http://arxiv.org/pdf/1708.05279v2.pdf |
PWC | https://paperswithcode.com/paper/designing-and-building-the-mlpack-open-source |
Repo | https://github.com/mlpack/mlpack |
Framework | none |
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Title | Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling |
Authors | Sam Kriegman, Marcin Szubert, Josh C. Bongard, Christian Skalka |
Abstract | Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a novel method of inducing spatial aggregations as a component of the machine learning process, yielding regional model features whose construction is driven by model prediction performance rather than prior assumptions. Our results demonstrate that Genetic Programming is particularly well suited to this type of feature construction because it can automatically synthesize appropriate aggregations, as well as better incorporate them into predictive models compared to other regression methods we tested. In our experiments we consider a specific problem instance and real-world dataset relevant to predicting snow properties in high-mountain Asia. |
Tasks | |
Published | 2017-06-24 |
URL | http://arxiv.org/abs/1706.07888v2 http://arxiv.org/pdf/1706.07888v2.pdf |
PWC | https://paperswithcode.com/paper/evolving-spatially-aggregated-features-from |
Repo | https://github.com/skriegman/ppsn_2016 |
Framework | none |
LSTM Pose Machines
Title | LSTM Pose Machines |
Authors | Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun, Jinshan Pan, Jianbo Liu, Jiahao Pang, Liang Lin |
Abstract | We observed that recent state-of-the-art results on single image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding the superior performance on static images, the application of these models on videos is not only computationally intensive, it also suffers from performance degeneration and flicking. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and to handle severe image quality degradation (e.g. motion blur and occlusion), as well as the inability to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose a weight-sharing scheme on the multi-stage CNN, it could be re-written as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed in invoking the network for videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found that such a memory-augmented RNN is very effective in imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such a mechanism would benefit the prediction for video-based pose estimations. |
Tasks | Pose Estimation |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06316v4 http://arxiv.org/pdf/1712.06316v4.pdf |
PWC | https://paperswithcode.com/paper/lstm-pose-machines |
Repo | https://github.com/lawy623/LSTM_Pose_Machines |
Framework | none |
A Batch Learning Framework for Scalable Personalized Ranking
Title | A Batch Learning Framework for Scalable Personalized Ranking |
Authors | Kuan Liu, Prem Natarajan |
Abstract | In designing personalized ranking algorithms, it is desirable to encourage a high precision at the top of the ranked list. Existing methods either seek a smooth convex surrogate for a non-smooth ranking metric or directly modify updating procedures to encourage top accuracy. In this work we point out that these methods do not scale well to a large-scale setting, and this is partly due to the inaccurate pointwise or pairwise rank estimation. We propose a new framework for personalized ranking. It uses batch-based rank estimators and smooth rank-sensitive loss functions. This new batch learning framework leads to more stable and accurate rank approximations compared to previous work. Moreover, it enables explicit use of parallel computation to speed up training. We conduct empirical evaluation on three item recommendation tasks. Our method shows consistent accuracy improvements over state-of-the-art methods. Additionally, we observe time efficiency advantages when data scale increases. |
Tasks | |
Published | 2017-11-10 |
URL | http://arxiv.org/abs/1711.04019v1 http://arxiv.org/pdf/1711.04019v1.pdf |
PWC | https://paperswithcode.com/paper/a-batch-learning-framework-for-scalable |
Repo | https://github.com/skywaLKer518/A-Recsys |
Framework | tf |
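The abstract above mentions batch-based rank estimators paired with rank-sensitive losses. A minimal sketch of that general idea follows, assuming a hinge surrogate for the rank and a logarithmic loss; both choices, and the names `exact_rank`, `hinge_rank` and `rank_sensitive_loss`, are illustrative assumptions and are not taken from the paper or its repository.

```python
import math


def exact_rank(scores, pos):
    """Number of items scored strictly higher than the positive item (non-smooth)."""
    return sum(1 for j, s in enumerate(scores) if j != pos and s > scores[pos])


def hinge_rank(scores, pos, margin=1.0):
    """Continuous surrogate: sum of hinge violations over every other item in the batch."""
    return sum(max(0.0, margin + s - scores[pos])
               for j, s in enumerate(scores) if j != pos)


def rank_sensitive_loss(rank):
    """Concave in the rank: a lost position near the top costs more than one further down."""
    return math.log1p(rank)


scores = [2.5, 0.3, 3.1, 1.0]  # hypothetical model scores for four items
pos = 0                        # hypothetical index of the item the user chose
print(exact_rank(scores, pos), hinge_rank(scores, pos),
      rank_sensitive_loss(hinge_rank(scores, pos)))
```

With a margin of at least 1, each hinge term is no smaller than the corresponding 0/1 indicator, so the surrogate upper-bounds the exact rank while remaining usable with gradient-based training.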
How consistent is my model with the data? Information-Theoretic Model Check
Title | How consistent is my model with the data? Information-Theoretic Model Check |
Authors | Andreas Svensson, Dave Zachariah, Thomas B. Schön |
Abstract | The choice of model class is fundamental in statistical learning and system identification, no matter whether the class is derived from physical principles or is a generic black-box. We develop a method to evaluate the specified model class by assessing its capability of reproducing data that is similar to the observed data record. This model check is based on the information-theoretic properties of models viewed as data generators and is applicable to e.g. sequential data and nonlinear dynamical models. The method can be understood as a specific two-sided posterior predictive test. We apply the information-theoretic model check to both synthetic and real data and compare it with a classical whiteness test. |
Tasks | |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02675v2 http://arxiv.org/pdf/1712.02675v2.pdf |
PWC | https://paperswithcode.com/paper/how-consistent-is-my-model-with-the-data |
Repo | https://github.com/saerdna-se/itmc |
Framework | none |
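The abstract above describes the check as a two-sided predictive test on how well the model reproduces data like the observed record. A rough sketch of such a test follows, under strong simplifying assumptions: the model is an i.i.d. Gaussian with fixed plug-in parameters (no posterior over parameters), and the statistic is the negative log-likelihood. The function names and defaults are hypothetical and do not come from the paper's code.

```python
import math
import random


def neg_log_lik(ys, mu, sigma):
    """Negative log-likelihood of i.i.d. Gaussian data, in nats."""
    const = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const + (y - mu) ** 2 / (2.0 * sigma ** 2) for y in ys)


def two_sided_check(ys, mu, sigma, n_rep=2000, seed=0):
    """Two-sided p-value comparing the observed statistic with model-generated replicates."""
    rng = random.Random(seed)
    t_obs = neg_log_lik(ys, mu, sigma)
    # Replicated data sets of the same length, drawn from the assumed model.
    t_rep = [neg_log_lik([rng.gauss(mu, sigma) for _ in ys], mu, sigma)
             for _ in range(n_rep)]
    lower = sum(t <= t_obs for t in t_rep) / n_rep
    upper = sum(t >= t_obs for t in t_rep) / n_rep
    return min(1.0, 2.0 * min(lower, upper))


# Data far from the model mean and data that is "too regular" are both flagged.
print(two_sided_check([10.0] * 20, mu=0.0, sigma=1.0))
print(two_sided_check([0.0] * 20, mu=0.0, sigma=1.0))
```

The two-sided form matters here: a one-sided test on the likelihood would flag only data that fits too poorly, while data that fits suspiciously well is also evidence that the model class does not generate records like the observed one.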
The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA
Title | The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA |
Authors | Yufeng Hao, Steven Quigley |
Abstract | Recently, FPGA has been increasingly applied to problems such as speech recognition, machine learning, and cloud computation such as the Bing search engine used by Microsoft. This is due to FPGAs' great parallel computation capacity as well as low power consumption compared to general purpose processors. However, these applications mainly focus on large scale FPGA clusters which have an extreme processing power for executing massive matrix or convolution operations but are unsuitable for portable or mobile applications. This paper describes research on a single-FPGA platform to explore the applications of FPGAs in these fields. In this project, we design a Deep Recurrent Neural Network (DRNN) Language Model (LM) and implement a hardware accelerator with AXI Stream interface on a PYNQ board which is equipped with a XILINX ZYNQ SOC XC7Z020 1CLG400C. The PYNQ has not only abundant programmable logic resources but also a flexible embedded operating system, which makes it suitable to be applied in the natural language processing field. We design the DRNN language model with Python and Theano, train the model on a CPU platform, and deploy the model on a PYNQ board to validate the model with a Jupyter notebook. Meanwhile, we design the hardware accelerator with Overlay, which is a kind of hardware library on PYNQ, and verify the acceleration effect on the PYNQ board. Finally, we have found that the DRNN language model can be deployed on the embedded system smoothly and the Overlay accelerator with AXI Stream interface performs at 20 GOPS processing throughput, which constitutes a 70.5X and 2.75X speed up compared to the work in Ref.30 and Ref.31 respectively. |
Tasks | Language Modelling, Speech Recognition |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.10296v3 http://arxiv.org/pdf/1710.10296v3.pdf |
PWC | https://paperswithcode.com/paper/the-implementation-of-a-deep-recurrent-neural |
Repo | https://github.com/hillhao/PYNQ-project |
Framework | tf |
Mechanics Automatically Recognized via Interactive Observation: Jumping
Title | Mechanics Automatically Recognized via Interactive Observation: Jumping |
Authors | Adam Summerville, Joseph C. Osborn, Christoffer Holmgård, Daniel W. Zhang |
Abstract | Jumping has been an important mechanic since its introduction in Donkey Kong. It has taken a variety of forms and shown up in numerous games, with each jump having a different feel. In this paper, we use a modified Nintendo Entertainment System (NES) emulator to semi-automatically run experiments on a large subset (30%) of NES platform games. We use these experiments to build models of jumps from different developers, series, and games across the history of the console. We then examine these models to gain insights into different forms of jumping and their associated feel. |
Tasks | |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03865v1 http://arxiv.org/pdf/1707.03865v1.pdf |
PWC | https://paperswithcode.com/paper/mechanics-automatically-recognized-via |
Repo | https://github.com/JoeOsborn/mechlearn |
Framework | none |
Stronger Baselines for Trustable Results in Neural Machine Translation
Title | Stronger Baselines for Trustable Results in Neural Machine Translation |
Authors | Michael Denkowski, Graham Neubig |
Abstract | Interest in neural machine translation has grown rapidly as its effectiveness has been demonstrated across language and data scenarios. New research regularly introduces architectural and algorithmic improvements that lead to significant gains over “vanilla” NMT implementations. However, these new techniques are rarely evaluated in the context of previously published techniques, specifically those that are widely used in state-of-the-art production and shared-task systems. As a result, it is often difficult to determine whether improvements from research will carry over to systems deployed for real-world use. In this work, we recommend three specific methods that are relatively easy to implement and result in much stronger experimental systems. Beyond reporting significantly higher BLEU scores, we conduct an in-depth analysis of where improvements originate and what inherent weaknesses of basic NMT models are being addressed. We then compare the relative gains afforded by several other techniques proposed in the literature when starting with vanilla systems versus our stronger baselines, showing that experimental conclusions may change depending on the baseline chosen. This indicates that choosing a strong baseline is crucial for reporting reliable experimental results. |
Tasks | Machine Translation |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09733v1 http://arxiv.org/pdf/1706.09733v1.pdf |
PWC | https://paperswithcode.com/paper/stronger-baselines-for-trustable-results-in |
Repo | https://github.com/ijauregiCMCRC/ReWE_NMT |
Framework | pytorch |
Emotional End-to-End Neural Speech Synthesizer
Title | Emotional End-to-End Neural Speech Synthesizer |
Authors | Younggun Lee, Azam Rabiee, Soo-Young Lee |
Abstract | In this paper, we introduce an emotional speech synthesizer based on the recent end-to-end neural model, named Tacotron. Despite its benefits, we found that the original Tacotron suffers from the exposure bias problem and irregularity of the attention alignment. Later, we address the problem by utilization of context vector and residual connection at recurrent neural networks (RNNs). Our experiments showed that the model could successfully train and generate speech for given emotion labels. |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05447v2 http://arxiv.org/pdf/1711.05447v2.pdf |
PWC | https://paperswithcode.com/paper/171105447 |
Repo | https://github.com/AzamRabiee/Emotional-TTS |
Framework | none |
TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network
Title | TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network |
Authors | Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, Muhammad Zeshan Afzal |
Abstract | In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network, (TAC-GAN) a text to image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Former approaches have tried to condition the generative process on the textual data; but allying it to the usage of class information, known to diversify the generated samples and improve their structural coherence, has not been explored. We trained the presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the discriminability of the generated images with Inception-Score, as well as their diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our approach outperforms the state-of-the-art models, i.e., its inception score is 3.45, corresponding to a relative increase of 7.8% compared to the recently introduced StackGan. A comparison of the mean MS-SSIM scores of the training and generated samples per class shows that our approach is able to generate highly diverse images with an average MS-SSIM of 0.14 over all generated classes. |
Tasks | |
Published | 2017-03-19 |
URL | http://arxiv.org/abs/1703.06412v2 http://arxiv.org/pdf/1703.06412v2.pdf |
PWC | https://paperswithcode.com/paper/tac-gan-text-conditioned-auxiliary-classifier |
Repo | https://github.com/dashayushman/TAC-GAN |
Framework | tf |
A ROS multi-ontology references services: OWL reasoners and application prototyping issues
Title | A ROS multi-ontology references services: OWL reasoners and application prototyping issues |
Authors | Luca Buoncompagni, Alessio Capitanelli, Fulvio Mastrogiovanni |
Abstract | This paper introduces a ROS Multi Ontology References (ARMOR) service, a general-purpose and scalable interface between robot architectures and OWL reasoners. ARMOR addresses synchronisation and communication issues among heterogeneous and distributed software components. As a guiding scenario, we consider a prototyping approach for the use of symbolic reasoning in human-robot interaction applications. |
Tasks | |
Published | 2017-06-30 |
URL | https://arxiv.org/abs/1706.10151v2 https://arxiv.org/pdf/1706.10151v2.pdf |
PWC | https://paperswithcode.com/paper/a-ros-multi-ontology-references-services-owl |
Repo | https://github.com/EmaroLab/injected_armor_pkgs |
Framework | none |
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Title | Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders |
Authors | Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi |
Abstract | Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive. |
Tasks | |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01279v1 http://arxiv.org/pdf/1704.01279v1.pdf |
PWC | https://paperswithcode.com/paper/neural-audio-synthesis-of-musical-notes-with |
Repo | https://github.com/NoaCahan/WavenetAutoEncoder |
Framework | pytorch |
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
Title | Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image |
Authors | Denis Tome, Chris Russell, Lourdes Agapito |
Abstract | We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations. The entire process is trained end-to-end, is extremely efficient and obtains state-of-the-art results on Human3.6M outperforming previous approaches both on 2D and 3D errors. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2017-01-01 |
URL | http://arxiv.org/abs/1701.00295v4 http://arxiv.org/pdf/1701.00295v4.pdf |
PWC | https://paperswithcode.com/paper/lifting-from-the-deep-convolutional-3d-pose |
Repo | https://github.com/SyBorg91/pose-estimation-detection |
Framework | tf |
Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks
Title | Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks |
Authors | Sayyed M. Zahiri, Jinho D. Choi |
Abstract | While there have been significant advances in detecting emotions from speech and image recognition, emotion detection on text is still under-explored and remains an active research field. This paper introduces a corpus for text-based emotion detection on multiparty dialogue as well as deep neural models that outperform the existing approaches for document classification. We first present a new corpus that provides annotation of seven emotions on consecutive utterances in dialogues extracted from the show, Friends. We then suggest four types of sequence-based convolutional neural network models with attention that leverage the sequence information encapsulated in dialogue. Our best model shows the accuracies of 37.9% and 54% for fine- and coarse-grained emotions, respectively. Given the difficulty of this task, this is promising. |
Tasks | Document Classification |
Published | 2017-08-14 |
URL | http://arxiv.org/abs/1708.04299v1 http://arxiv.org/pdf/1708.04299v1.pdf |
PWC | https://paperswithcode.com/paper/emotion-detection-on-tv-show-transcripts-with |
Repo | https://github.com/emorynlp/emotion-detection |
Framework | none |
The Cramer Distance as a Solution to Biased Wasserstein Gradients
Title | The Cramer Distance as a Solution to Biased Wasserstein Gradients |
Authors | Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos |
Abstract | The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramér distance. We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramér distance in practice we design a new algorithm, the Cramér Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10743v1 http://arxiv.org/pdf/1705.10743v1.pdf |
PWC | https://paperswithcode.com/paper/the-cramer-distance-as-a-solution-to-biased |
Repo | https://github.com/Mintas/diving-deep-learning |
Framework | pytorch |
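For one-dimensional distributions, both metrics compared in the abstract above can be written in terms of the gap between cumulative distribution functions: the Wasserstein-1 distance integrates the absolute gap, and the Cramér distance is the square root of the integrated squared gap. The sketch below computes both for small empirical samples. The helper names `empirical_cdf` and `lp_cdf_distance` are hypothetical, and the code illustrates only the distances themselves, not the Cramér GAN training procedure.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of sample points less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def lp_cdf_distance(xs, ys, p):
    """(Integral of |F_xs(t) - F_ys(t)|^p dt)^(1/p) for two 1-D empirical samples.

    p = 1 gives the Wasserstein-1 distance, p = 2 the Cramér distance.
    """
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    # Both empirical CDFs are constant between adjacent grid points,
    # so the integral is an exact sum over those intervals.
    for left, right in zip(grid, grid[1:]):
        gap = abs(empirical_cdf(xs, left) - empirical_cdf(ys, left))
        total += gap ** p * (right - left)
    return total ** (1.0 / p)


a = [0.0, 0.0]  # hypothetical sample: all mass at 0
b = [0.0, 2.0]  # hypothetical sample: half the mass moved to 2
print(lp_cdf_distance(a, b, p=1), lp_cdf_distance(a, b, p=2))
```

Because the empirical CDFs are step functions, no numerical quadrature is needed; the cost is dominated by sorting the two samples.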