July 30, 2019

2971 words 14 mins read

Paper Group AWR 50

Ray: A Distributed Framework for Emerging AI Applications. Bayesian Recurrent Neural Networks. Machine learning modeling for time series problem: Predicting flight ticket prices. MonoPerfCap: Human Performance Capture from Monocular Video. Mimicking Word Embeddings using Subword RNNs. Overcoming Catastrophic Forgetting by Incremental Moment Matching …

Ray: A Distributed Framework for Emerging AI Applications

Title Ray: A Distributed Framework for Emerging AI Applications
Authors Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica
Abstract The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray—a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system’s control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications.
Tasks
Published 2017-12-16
URL http://arxiv.org/abs/1712.05889v2
PDF http://arxiv.org/pdf/1712.05889v2.pdf
PWC https://paperswithcode.com/paper/ray-a-distributed-framework-for-emerging-ai
Repo https://github.com/ray-project/ray
Framework tf
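
A minimal sketch of the unified task/actor interface the abstract describes, using the public `ray` package; the function and class names are illustrative, not from the paper.

```python
import ray

ray.init()  # start a local Ray instance

@ray.remote
def square(x):
    # a stateless task, scheduled by Ray's distributed scheduler
    return x * x

@ray.remote
class Counter:
    # a stateful actor; method calls execute serially on one worker
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

futures = [square.remote(i) for i in range(4)]  # task-parallel computation
counter = Counter.remote()                      # actor-based computation
counter.add.remote(10)
print(ray.get(futures), ray.get(counter.add.remote(5)))
```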

Bayesian Recurrent Neural Networks

Title Bayesian Recurrent Neural Networks
Authors Meire Fortunato, Charles Blundell, Oriol Vinyals
Abstract In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. Firstly, we show that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, also reducing the number of parameters by 80%. Secondly, we demonstrate how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs. We incorporate local gradient information into the approximate posterior to sharpen it around the current batch statistics. We show how this technique is not exclusive to recurrent neural networks and can be applied more widely to train Bayesian neural networks. We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improves our model over a variety of other schemes for training them. We also introduce a new benchmark for studying uncertainty for language models so future methods can be easily compared.
Tasks Image Captioning, Language Modelling
Published 2017-04-10
URL https://arxiv.org/abs/1704.02798v4
PDF https://arxiv.org/pdf/1704.02798v4.pdf
PWC https://paperswithcode.com/paper/bayesian-recurrent-neural-networks
Repo https://github.com/JP-MRPhys/bayesianLSTM
Framework tf
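
A small numpy sketch of the two ingredients variational-Bayes schemes of this kind rely on: reparameterised weight sampling per truncated-BPTT segment and a KL penalty toward the prior. This is a generic illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Variational posterior q(w) = N(mu, sigma^2) over one weight matrix; prior p(w) = N(0, 1).
mu = rng.normal(size=(64, 64)) * 0.05
rho = np.full((64, 64), -3.0)
sigma = np.log1p(np.exp(rho))            # softplus keeps sigma > 0

# Reparameterisation trick: sample the weights used for one truncated-BPTT segment.
eps = rng.normal(size=mu.shape)
w_sample = mu + sigma * eps

# KL(q || p) for a diagonal Gaussian posterior against a standard-normal prior;
# this term is added to the minibatch negative log-likelihood during training.
kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))
print(w_sample.shape, kl)
```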

Machine learning modeling for time series problem: Predicting flight ticket prices

Title Machine learning modeling for time series problem: Predicting flight ticket prices
Authors Jun Lu
Abstract Machine learning has been applied in many fields. In this article, we show how machine learning can be applied to a time series problem, using airline ticket price prediction as the concrete task. Airline companies use many different variables to determine flight ticket prices: an indicator of whether the travel falls during the holidays, the number of free seats on the plane, and so on. Some of these variables are observed, while others are hidden. Based on data collected over a 103-day period, we trained our models; the best model is AdaBoost-Decision Tree Classification. This algorithm performs best over the 8 observed routes, achieving 61.35% better performance than the random purchase strategy with relatively small variance across routes. We also consider the situation in which little historical data is available for some routes (for example, a new route with no history), or in which we want to decide whether to buy or wait without training on historical data; for this problem, we used HMM-Sequence-Classification-based AdaBoost-Decision Tree Classification to make predictions on 12 new routes, obtaining 31.71% better performance than the random purchase strategy.
Tasks Time Series
Published 2017-05-19
URL http://arxiv.org/abs/1705.07205v2
PDF http://arxiv.org/pdf/1705.07205v2.pdf
PWC https://paperswithcode.com/paper/machine-learning-modeling-for-time-series
Repo https://github.com/junlulocky/AirTicketPredicting
Framework none
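
A hedged scikit-learn sketch of the AdaBoost-Decision-Tree classifier the abstract names, run on synthetic data; the feature columns and the buy/wait labels are stand-ins for illustration, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in features: days to departure, holiday indicator, free seats, current price.
X = np.column_stack([
    rng.integers(1, 104, 2000),
    rng.integers(0, 2, 2000),
    rng.integers(0, 200, 2000),
    rng.normal(300, 80, 2000),
])
y = (X[:, 0] > 30).astype(int)  # toy "buy now (1) vs wait (0)" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost over shallow decision trees, as described in the abstract.
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # `base_estimator=` on older sklearn
    n_estimators=100,
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```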

MonoPerfCap: Human Performance Capture from Monocular Video

Title MonoPerfCap: Human Performance Capture from Monocular Video
Authors Weipeng Xu, Avishek Chatterjee, Michael Zollhöfer, Helge Rhodin, Dushyant Mehta, Hans-Peter Seidel, Christian Theobalt
Abstract We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion makes it possible to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.
Tasks Pose Estimation
Published 2017-08-07
URL http://arxiv.org/abs/1708.02136v2
PDF http://arxiv.org/pdf/1708.02136v2.pdf
PWC https://paperswithcode.com/paper/monoperfcap-human-performance-capture-from
Repo https://github.com/daitomanabe/Human-Pose-and-Motion
Framework tf
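
A small numpy illustration of the low-dimensional trajectory-subspace idea mentioned in the abstract: per-joint trajectories over a batch of frames are constrained to the span of a few smooth, DCT-like basis functions. The basis choice, dimensions, and synthetic data are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

T, J, K = 50, 17, 8          # frames per batch, joints, subspace dimension (assumed)
rng = np.random.default_rng(0)

# Noisy per-frame 3D joint positions, shape (T, 3*J), e.g. lifted from 2D/3D detections.
traj = np.cumsum(rng.normal(scale=0.02, size=(T, 3 * J)), axis=0)

# Smooth cosine basis over the batch; its columns span the trajectory subspace.
t = np.arange(T)
basis = np.stack([np.cos(np.pi * (t + 0.5) * k / T) for k in range(K)], axis=1)  # (T, K)

# Least-squares projection of every joint trajectory onto the subspace.
coeff, *_ = np.linalg.lstsq(basis, traj, rcond=None)   # (K, 3*J)
traj_smooth = basis @ coeff                            # regularised trajectories

print(traj_smooth.shape, np.linalg.norm(traj - traj_smooth) / np.linalg.norm(traj))
```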

Mimicking Word Embeddings using Subword RNNs

Title Mimicking Word Embeddings using Subword RNNs
Authors Yuval Pinter, Robert Guthrie, Jacob Eisenstein
Abstract Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.
Tasks Word Embeddings
Published 2017-07-21
URL http://arxiv.org/abs/1707.06961v1
PDF http://arxiv.org/pdf/1707.06961v1.pdf
PWC https://paperswithcode.com/paper/mimicking-word-embeddings-using-subword-rnns
Repo https://github.com/50kawa/mimick_chainer
Framework none
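
A minimal PyTorch sketch of the type-level idea behind MIMICK: a character BiLSTM reads a word's spelling and is trained to regress that word's pre-trained embedding, so OOV words can be embedded from spelling alone. Sizes and module names are illustrative, not the authors' (Chainer/DyNet) implementation.

```python
import torch
import torch.nn as nn

class CharToEmbedding(nn.Module):
    def __init__(self, n_chars, char_dim=20, hidden=50, emb_dim=100):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, emb_dim)

    def forward(self, char_ids):               # (batch, word_len) character indices
        states, _ = self.lstm(self.chars(char_ids))
        return self.proj(states[:, -1])        # summary of the spelling -> word embedding

model = CharToEmbedding(n_chars=60)
char_ids = torch.randint(0, 60, (32, 12))      # a batch of spelled-out in-vocabulary words
target = torch.randn(32, 100)                  # their pre-trained embeddings (regression targets)
loss = nn.functional.mse_loss(model(char_ids), target)
loss.backward()                                # learning happens at the type level, no corpus re-training
print(float(loss))
```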

Overcoming Catastrophic Forgetting by Incremental Moment Matching

Title Overcoming Catastrophic Forgetting by Incremental Moment Matching
Authors Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, Byoung-Tak Zhang
Abstract Catastrophic forgetting is a problem in which a neural network loses the information learned for a first task after being trained on a second task. Here, we propose a method, incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moments of the posterior distributions of the neural networks trained on the first and the second task, respectively. To make the search space of the posterior parameters smooth, the IMM procedure is complemented by various transfer-learning techniques, including weight transfer, the L2-norm between the old and the new parameters, and a variant of dropout using the old parameters. We analyze our approach on a variety of datasets, including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network.
Tasks Transfer Learning
Published 2017-03-24
URL http://arxiv.org/abs/1703.08475v3
PDF http://arxiv.org/pdf/1703.08475v3.pdf
PWC https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-by
Repo https://github.com/btjhjeon/IMM_tensorflow
Framework tf
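
A numpy sketch of the moment-matching step itself: mean-IMM averages the per-task posterior means, while a mode-IMM-style variant weights them by approximate precisions. This illustrates only the merge described in the abstract, not the full pipeline with weight transfer and the dropout variant.

```python
import numpy as np

rng = np.random.default_rng(0)

theta1 = rng.normal(size=1000)          # parameters after training on task 1
theta2 = rng.normal(size=1000)          # parameters after training on task 2
prec1 = rng.uniform(0.5, 2.0, 1000)     # approximate posterior precisions (e.g. Fisher diagonal)
prec2 = rng.uniform(0.5, 2.0, 1000)

alpha = 0.5                             # mixing ratio between the two tasks

# mean-IMM: weighted average of the two posterior means.
theta_mean_imm = (1 - alpha) * theta1 + alpha * theta2

# mode-IMM-style merge: precision-weighted average, approximating the mode
# of the mixture of the two Gaussian posteriors.
w1, w2 = (1 - alpha) * prec1, alpha * prec2
theta_mode_imm = (w1 * theta1 + w2 * theta2) / (w1 + w2)

print(theta_mean_imm[:3], theta_mode_imm[:3])
```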

Knowledge Graph Embedding with Iterative Guidance from Soft Rules

Title Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Authors Shu Guo, Quan Wang, Lihong Wang, Bin Wang, Li Guo
Abstract Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. And they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
Tasks Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs, Link Prediction
Published 2017-11-30
URL http://arxiv.org/abs/1711.11231v1
PDF http://arxiv.org/pdf/1711.11231v1.pdf
PWC https://paperswithcode.com/paper/knowledge-graph-embedding-with-iterative
Repo https://github.com/iieir-km/RUGE
Framework none
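
A schematic numpy sketch of the iterative loop the abstract describes: the current embedding model scores unlabeled triples, soft rules adjust those scores into soft labels, and the model is then updated on both hard and soft labels. The scoring, rule handling, and update here are simplified stand-ins, not RUGE's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_unlabeled = 5
model_scores = rng.normal(size=n_unlabeled)          # raw scores from a KG embedding model (stand-in)
rule_conf = np.array([0.9, 0.8, 0.0, 0.7, 0.0])      # confidence of a soft rule firing; 0 = no rule applies

for it in range(3):
    truth = sigmoid(model_scores)                    # model's current belief for each unlabeled triple
    # Rule guidance: pull the soft label toward 1 in proportion to the rule's confidence.
    soft_labels = (1 - rule_conf) * truth + rule_conf * 1.0
    # Simplified "update": nudge scores toward the soft labels (stand-in for an SGD
    # step on the embedding parameters with a loss over labeled + soft-labeled triples).
    model_scores += 0.5 * (soft_labels - truth)
    print(f"iteration {it}: soft labels = {np.round(soft_labels, 3)}")
```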

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

Title NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks
Authors Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, Diana Marculescu
Abstract “How much energy is consumed for an inference made by a convolutional neural network (CNN)?” With the increased popularity of CNNs deployed on a wide spectrum of platforms (from mobile devices to workstations), the answer to this question has drawn significant attention. From lengthening battery life of mobile devices to reducing the energy bill of a datacenter, it is important to understand the energy efficiency of CNNs during serving for making an inference, before actually training the model. In this work, we propose NeuralPower: a layer-wise predictive framework based on sparse polynomial regression, for predicting the serving energy consumption of a CNN deployed on any GPU platform. Given the architecture of a CNN, NeuralPower provides an accurate prediction and breakdown for power and runtime across all layers in the whole network, helping machine learners quickly identify the power, runtime, or energy bottlenecks. We also propose the “energy-precision ratio” (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy. The experimental results show that the prediction accuracy of the proposed NeuralPower outperforms the best published model to date, yielding an improvement in accuracy of up to 68.5%. We also assess the accuracy of predictions at the network level, by predicting the runtime, power, and energy of state-of-the-art CNN architectures, achieving an average accuracy of 88.24% in runtime, 88.34% in power, and 97.21% in energy. We comprehensively corroborate the effectiveness of NeuralPower as a powerful framework for machine learners by testing it on different GPU platforms and Deep Learning software tools.
Tasks
Published 2017-10-15
URL http://arxiv.org/abs/1710.05420v1
PDF http://arxiv.org/pdf/1710.05420v1.pdf
PWC https://paperswithcode.com/paper/neuralpower-predict-and-deploy-energy
Repo https://github.com/cmu-enyac/NeuralPower
Framework none
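
A small scikit-learn sketch of the layer-wise sparse polynomial regression idea: polynomial features of a layer's configuration are fed to an L1-penalised regressor so that most terms are driven to zero. The layer descriptors and the synthetic "measured power" values are made up for illustration, not NeuralPower's measurements.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Stand-in per-layer descriptors (normalized): input size, output channels, kernel size, stride.
X = rng.uniform(0, 1, size=(300, 4))
# Synthetic "measured power": one quadratic interaction plus a linear term and noise.
y = 3.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.05, 300)

# Sparse polynomial regression: degree-2 feature expansion, L1 penalty prunes most terms.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), Lasso(alpha=0.01))
model.fit(X, y)

kept = np.sum(model.named_steps["lasso"].coef_ != 0)
print("non-zero polynomial terms kept:", kept)
```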

Stochastic Variance Reduction for Policy Gradient Estimation

Title Stochastic Variance Reduction for Policy Gradient Estimation
Authors Tianbing Xu, Qiang Liu, Jian Peng
Abstract Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) to model-free policy gradient to significantly improve the sample efficiency. The SVRG estimation is incorporated into a trust-region Newton conjugate gradient framework for the policy optimization. On several MuJoCo tasks, our method achieves significantly better performance compared to the state-of-the-art model-free policy gradient methods in robotic continuous control, such as trust region policy optimization (TRPO).
Tasks Continuous Control, Policy Gradient Methods
Published 2017-10-17
URL http://arxiv.org/abs/1710.06034v4
PDF http://arxiv.org/pdf/1710.06034v4.pdf
PWC https://paperswithcode.com/paper/stochastic-variance-reduction-for-policy
Repo https://github.com/tianbingsz/MLResearch
Framework none
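
A numpy sketch of the SVRG-style control variate at the heart of the abstract: the minibatch gradient at the current parameters is corrected by the same minibatch's gradient at a snapshot, plus the full gradient at that snapshot. A simple least-squares objective stands in for the (much noisier) policy-gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = rng.normal(size=200)

def grad(w, idx):
    # Gradient of 0.5 * ||A w - b||^2 averaged over the sample index set.
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

w = np.zeros(10)
for epoch in range(5):
    w_snap = w.copy()
    full_grad = grad(w_snap, np.arange(200))            # full gradient at the snapshot
    for _ in range(50):
        idx = rng.choice(200, size=10, replace=False)   # minibatch (stand-in for trajectories)
        # SVRG estimator: unbiased, with variance reduced by the snapshot correction.
        g = grad(w, idx) - grad(w_snap, idx) + full_grad
        w -= 0.01 * g
print("residual:", np.linalg.norm(A @ w - b))
```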

Dynamic Data Selection for Neural Machine Translation

Title Dynamic Data Selection for Neural Machine Translation
Authors Marlies van der Wees, Arianna Bisazza, Christof Monz
Abstract Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection (Axelrod et al., 2011) consistently performs well for PBMT, we show that gains are substantially lower for NMT. Next, we introduce dynamic data selection for NMT, a method in which we vary the selected subset of training data between different training epochs. Our experiments show that the best results are achieved when applying a technique we call gradual fine-tuning, with improvements up to +2.6 BLEU over the original data selection approach and up to +3.1 BLEU over a general baseline.
Tasks Machine Translation
Published 2017-08-02
URL http://arxiv.org/abs/1708.00712v1
PDF http://arxiv.org/pdf/1708.00712v1.pdf
PWC https://paperswithcode.com/paper/dynamic-data-selection-for-neural-machine
Repo https://github.com/marliesvanderwees/dds-nmt
Framework none
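
A hedged sketch of dynamic data selection: sentence pairs are ranked by a relevance score (in the paper, based on the cross-entropy-difference criterion of Axelrod et al., 2011), and the retained top fraction shrinks each epoch, which corresponds to the "gradual fine-tuning" schedule the abstract mentions. The scoring function below is a random placeholder, not a trained language-model scorer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 100_000

# Placeholder relevance scores; in practice these would come from the cross-entropy
# difference between an in-domain LM and a general-domain LM for each sentence pair.
relevance = rng.normal(size=n_pairs)
ranking = np.argsort(-relevance)                  # most relevant sentence pairs first

# Gradual fine-tuning: train on a shrinking top fraction of the ranked data each epoch.
fractions = [1.0, 0.6, 0.4, 0.25, 0.15]
for epoch, frac in enumerate(fractions):
    subset = ranking[: int(frac * n_pairs)]
    # train_one_epoch(subset) would go here; we only report the subset size.
    print(f"epoch {epoch}: training on {len(subset)} sentence pairs")
```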

Exploiting Cross-Sentence Context for Neural Machine Translation

Title Exploiting Cross-Sentence Context for Neural Machine Translation
Authors Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu
Abstract In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.
Tasks Machine Translation
Published 2017-04-14
URL http://arxiv.org/abs/1704.04347v3
PDF http://arxiv.org/pdf/1704.04347v3.pdf
PWC https://paperswithcode.com/paper/exploiting-cross-sentence-context-for-neural
Repo https://github.com/tuzhaopeng/LC-NMT
Framework none
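
A minimal PyTorch sketch of the hierarchical context summariser and the "warm-start" strategy: a word-level GRU encodes each previous sentence, a sentence-level GRU summarises those encodings, and the resulting context vector initialises the decoder state. Dimensions and module layout are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossSentenceContext(nn.Module):
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_gru = nn.GRU(emb, hidden, batch_first=True)      # within each sentence
        self.sent_gru = nn.GRU(hidden, hidden, batch_first=True)   # across previous sentences

    def forward(self, history):            # (batch, n_prev_sents, sent_len) token ids
        b, n, l = history.shape
        words = self.embed(history.reshape(b * n, l))
        _, last = self.word_gru(words)                      # (1, b*n, hidden)
        sent_vecs = last.squeeze(0).reshape(b, n, -1)       # one vector per previous sentence
        _, context = self.sent_gru(sent_vecs)               # (1, b, hidden)
        return context                                      # warm-start for the decoder state

summariser = CrossSentenceContext()
history = torch.randint(0, 1000, (4, 3, 12))   # 3 previous sentences per example
decoder_init = summariser(history)
print(decoder_init.shape)                      # torch.Size([1, 4, 128])
```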

CharManteau: Character Embedding Models For Portmanteau Creation

Title CharManteau: Character Embedding Models For Portmanteau Creation
Authors Varun Gangal, Harsh Jhamtani, Graham Neubig, Eduard Hovy, Eric Nyberg
Abstract Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation.
Tasks
Published 2017-07-04
URL http://arxiv.org/abs/1707.01176v2
PDF http://arxiv.org/pdf/1707.01176v2.pdf
PWC https://paperswithcode.com/paper/charmanteau-character-embedding-models-for
Repo https://github.com/vgtomahawk/Charmanteau-CamReady
Framework none
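
A toy Python sketch of the two pieces the abstract highlights: exhaustive prefix+suffix candidate generation, and noisy-channel reranking that combines a channel score P(source words | portmanteau) with a prior P(portmanteau). Both scoring functions below are crude placeholders; in the paper they are character-level neural S2S and language models.

```python
def candidates(w1, w2):
    # Exhaustive prefix-of-w1 + suffix-of-w2 combinations: the candidate space.
    return {w1[:i] + w2[j:] for i in range(1, len(w1) + 1) for j in range(len(w2))}

def log_p_channel(w1, w2, cand):
    # Placeholder channel model P(source words | portmanteau); assumed, not the paper's S2S model.
    return -abs(len(cand) - (len(w1) + len(w2)) / 2)

def log_p_prior(cand):
    # Placeholder prior P(portmanteau), e.g. from a character language model.
    return -0.1 * len(cand)

def best_portmanteau(w1, w2):
    # Noisy-channel reranking: argmax of channel + prior log-probabilities over all candidates.
    return max(candidates(w1, w2), key=lambda c: log_p_channel(w1, w2, c) + log_p_prior(c))

print(best_portmanteau("spoon", "fork"))
```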

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

Title Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
Authors Graham Neubig
Abstract This tutorial introduces a new and powerful set of techniques variously called “neural machine translation” or “neural sequence-to-sequence models”. These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice.
Tasks Machine Translation
Published 2017-03-05
URL http://arxiv.org/abs/1703.01619v1
PDF http://arxiv.org/pdf/1703.01619v1.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-and-sequence-to
Repo https://github.com/nier79/Machine-translation
Framework tf

Generative Modeling with Conditional Autoencoders: Building an Integrated Cell

Title Generative Modeling with Conditional Autoencoders: Building an Integrated Cell
Authors Gregory R. Johnson, Rory M. Donovan-Maiye, Mary M. Maleckar
Abstract We present a conditional generative model to learn variation in cell and nuclear morphology and the location of subcellular structures from microscopy images. Our model generalizes to a wide range of subcellular localization and allows for a probabilistic interpretation of cell and nuclear morphology and structure localization from fluorescence images. We demonstrate the effectiveness of our approach by producing photo-realistic cell images using our generative model. The conditional nature of the model provides the ability to predict the localization of unobserved structures given cell and nuclear morphology.
Tasks
Published 2017-04-28
URL http://arxiv.org/abs/1705.00092v1
PDF http://arxiv.org/pdf/1705.00092v1.pdf
PWC https://paperswithcode.com/paper/generative-modeling-with-conditional
Repo https://github.com/AllenCellModeling/pytorch_integrated_cell
Framework pytorch
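
A compact PyTorch sketch of a conditional autoencoder of the kind the abstract describes: the decoder receives the latent code concatenated with a condition vector (here, a stand-in for cell and nuclear morphology), so structure localization can be predicted for a given morphology. Shapes and the toy data are assumptions, not the AllenCellModeling implementation.

```python
import torch
import torch.nn as nn

class ConditionalAutoencoder(nn.Module):
    def __init__(self, x_dim=784, cond_dim=16, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 256), nn.ReLU(), nn.Linear(256, x_dim)
        )

    def forward(self, x, cond):
        z = self.encoder(x)                               # latent code for the image
        return self.decoder(torch.cat([z, cond], dim=1))  # decode conditioned on morphology

model = ConditionalAutoencoder()
x = torch.rand(8, 784)        # toy stand-in for flattened fluorescence images
cond = torch.rand(8, 16)      # toy stand-in for a cell/nuclear morphology encoding
recon = model(x, cond)
loss = nn.functional.mse_loss(recon, x)
loss.backward()
print(recon.shape, float(loss))
```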

Structural Regularities in Text-based Entity Vector Spaces

Title Structural Regularities in Text-based Entity Vector Spaces
Authors Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
Abstract Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. When it comes to encoding entity relations, SERT performs best.
Tasks Topic Models
Published 2017-07-25
URL http://arxiv.org/abs/1707.07930v1
PDF http://arxiv.org/pdf/1707.07930v1.pdf
PWC https://paperswithcode.com/paper/structural-regularities-in-text-based-entity
Repo https://github.com/cvangysel/SERT
Framework none
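
A short scikit-learn sketch of the kind of structural-regularity check the abstract describes: cluster unsupervised entity (expert) vectors and compare the clustering with known group assignments (e.g. committees) via the adjusted Rand index. The vectors and group labels below are random placeholders, not the SERT data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

n_experts, dim, n_committees = 200, 128, 8
expert_vectors = rng.normal(size=(n_experts, dim))     # e.g. SERT/doc2vec representations
committees = rng.integers(0, n_committees, n_experts)  # known organisational groups

clusters = KMeans(n_clusters=n_committees, n_init=10, random_state=0).fit_predict(expert_vectors)
print("adjusted Rand index vs. committees:", adjusted_rand_score(committees, clusters))
```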