July 30, 2019


Paper Group AWR 50

Ray: A Distributed Framework for Emerging AI Applications. Bayesian Recurrent Neural Networks. Machine learning modeling for time series problem: Predicting flight ticket prices. MonoPerfCap: Human Performance Capture from Monocular Video. Mimicking Word Embeddings using Subword RNNs. Overcoming Catastrophic Forgetting by Incremental Moment Matchin …

Ray: A Distributed Framework for Emerging AI Applications


Title	Ray: A Distributed Framework for Emerging AI Applications
Authors	Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica
Abstract	The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray—a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system’s control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications.
Tasks
Published	2017-12-16
URL	http://arxiv.org/abs/1712.05889v2
PDF	http://arxiv.org/pdf/1712.05889v2.pdf
PWC	https://paperswithcode.com/paper/ray-a-distributed-framework-for-emerging-ai
Repo	https://github.com/ray-project/ray
Framework	tf

Bayesian Recurrent Neural Networks


Title	Bayesian Recurrent Neural Networks
Authors	Meire Fortunato, Charles Blundell, Oriol Vinyals
Abstract	In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. Firstly, we show that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, also reducing the amount of parameters by 80%. Secondly, we demonstrate how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs. We incorporate local gradient information into the approximate posterior to sharpen it around the current batch statistics. We show how this technique is not exclusive to recurrent neural networks and can be applied more widely to train Bayesian neural networks. We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improve our model over a variety of other schemes for training them. We also introduce a new benchmark for studying uncertainty for language models so future methods can be easily compared.
Tasks	Image Captioning, Language Modelling
Published	2017-04-10
URL	https://arxiv.org/abs/1704.02798v4
PDF	https://arxiv.org/pdf/1704.02798v4.pdf
PWC	https://paperswithcode.com/paper/bayesian-recurrent-neural-networks
Repo	https://github.com/JP-MRPhys/bayesianLSTM
Framework	tf

Machine learning modeling for time series problem: Predicting flight ticket prices


Title	Machine learning modeling for time series problem: Predicting flight ticket prices
Authors	Jun Lu
Abstract	Machine learning has been used in all kinds of fields. In this article, we introduce how machine learning can be applied to time series problems. In particular, we use airline ticket price prediction as our specific problem. Airline companies use many different variables to determine flight ticket prices: an indicator of whether the travel is during the holidays, the number of free seats in the plane, etc. Some of the variables are observed, but some of them are hidden. Based on data over a 103-day period, we trained our models, with the best model being AdaBoost-Decision Tree Classification. This algorithm has the best performance over the 8 observed routes, with 61.35% better performance than the random purchase strategy and relatively small variance over these routes. We also considered the situation in which we cannot get much historical data for some routes (for example, the route is new and does not have historical data) or we do not want to train on historical data in order to predict quickly whether to buy or wait. For this problem, we used HMM Sequence Classification based AdaBoost-Decision Tree Classification to perform our prediction on 12 new routes. Finally, we got 31.71% better performance than the random purchase strategy.
Tasks	Time Series
Published	2017-05-19
URL	http://arxiv.org/abs/1705.07205v2
PDF	http://arxiv.org/pdf/1705.07205v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-modeling-for-time-series
Repo	https://github.com/junlulocky/AirTicketPredicting
Framework	none

MonoPerfCap: Human Performance Capture from Monocular Video


Title	MonoPerfCap: Human Performance Capture from Monocular Video
Authors	Weipeng Xu, Avishek Chatterjee, Michael Zollhöfer, Helge Rhodin, Dushyant Mehta, Hans-Peter Seidel, Christian Theobalt
Abstract	We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.
Tasks	Pose Estimation
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02136v2
PDF	http://arxiv.org/pdf/1708.02136v2.pdf
PWC	https://paperswithcode.com/paper/monoperfcap-human-performance-capture-from
Repo	https://github.com/daitomanabe/Human-Pose-and-Motion
Framework	tf

Mimicking Word Embeddings using Subword RNNs


Title	Mimicking Word Embeddings using Subword RNNs
Authors	Yuval Pinter, Robert Guthrie, Jacob Eisenstein
Abstract	Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.
Tasks	Word Embeddings
Published	2017-07-21
URL	http://arxiv.org/abs/1707.06961v1
PDF	http://arxiv.org/pdf/1707.06961v1.pdf
PWC	https://paperswithcode.com/paper/mimicking-word-embeddings-using-subword-rnns
Repo	https://github.com/50kawa/mimick_chainer
Framework	none

Overcoming Catastrophic Forgetting by Incremental Moment Matching


Title	Overcoming Catastrophic Forgetting by Incremental Moment Matching
Authors	Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, Byoung-Tak Zhang
Abstract	Catastrophic forgetting is a problem in which a neural network loses the information of the first task after training on the second task. Here, we propose a method, i.e. incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moment of the posterior distribution of the neural network which is trained on the first and the second task, respectively. To make the search space of the posterior parameter smooth, the IMM procedure is complemented by various transfer learning techniques including weight transfer, L2-norm of the old and the new parameter, and a variant of dropout with the old parameter. We analyze our approach on a variety of datasets including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network.
Tasks	Transfer Learning
Published	2017-03-24
URL	http://arxiv.org/abs/1703.08475v3
PDF	http://arxiv.org/pdf/1703.08475v3.pdf
PWC	https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-by
Repo	https://github.com/btjhjeon/IMM_tensorflow
Framework	tf
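
The moment-matching idea in the abstract above can be illustrated with a minimal sketch. It assumes each task's posterior over the weights is approximated by a diagonal Gaussian, and it merges two such Gaussians by matching the mean and variance of their mixture. The function name `merge_moments`, the mixing weight `alpha`, and the toy numbers are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import List, Tuple


def merge_moments(
    mu1: List[float], var1: List[float],
    mu2: List[float], var2: List[float],
    alpha: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Per-weight mean and variance of the mixture
    alpha * N(mu1, var1) + (1 - alpha) * N(mu2, var2)."""
    merged_mu, merged_var = [], []
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        # First moment of the mixture.
        m = alpha * m1 + (1 - alpha) * m2
        # Second moment of the mixture; variance = E[x^2] - E[x]^2.
        second = alpha * (v1 + m1 * m1) + (1 - alpha) * (v2 + m2 * m2)
        merged_mu.append(m)
        merged_var.append(second - m * m)
    return merged_mu, merged_var


# Hypothetical two-weight "networks" after task 1 and task 2.
old_mu, old_var = [1.0, -2.0], [0.1, 0.2]
new_mu, new_var = [3.0, 0.0], [0.1, 0.2]
mu, var = merge_moments(old_mu, old_var, new_mu, new_var, alpha=0.5)
```

With `alpha=0.5` the merged mean is the plain average of the two weight vectors; moving `alpha` toward 1 favors the old task, toward 0 the new one.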

Knowledge Graph Embedding with Iterative Guidance from Soft Rules


Title	Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Authors	Shu Guo, Quan Wang, Lihong Wang, Bin Wang, Li Guo
Abstract	Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. And they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
Tasks	Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs, Link Prediction
Published	2017-11-30
URL	http://arxiv.org/abs/1711.11231v1
PDF	http://arxiv.org/pdf/1711.11231v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-graph-embedding-with-iterative
Repo	https://github.com/iieir-km/RUGE
Framework	none

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks


Title	NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks
Authors	Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, Diana Marculescu
Abstract	“How much energy is consumed for an inference made by a convolutional neural network (CNN)?” With the increased popularity of CNNs deployed on the wide-spectrum of platforms (from mobile devices to workstations), the answer to this question has drawn significant attention. From lengthening battery life of mobile devices to reducing the energy bill of a datacenter, it is important to understand the energy efficiency of CNNs during serving for making an inference, before actually training the model. In this work, we propose NeuralPower: a layer-wise predictive framework based on sparse polynomial regression, for predicting the serving energy consumption of a CNN deployed on any GPU platform. Given the architecture of a CNN, NeuralPower provides an accurate prediction and breakdown for power and runtime across all layers in the whole network, helping machine learners quickly identify the power, runtime, or energy bottlenecks. We also propose the “energy-precision ratio” (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy. The experimental results show that the prediction accuracy of the proposed NeuralPower outperforms the best published model to date, yielding an improvement in accuracy of up to 68.5%. We also assess the accuracy of predictions at the network level, by predicting the runtime, power, and energy of state-of-the-art CNN architectures, achieving an average accuracy of 88.24% in runtime, 88.34% in power, and 97.21% in energy. We comprehensively corroborate the effectiveness of NeuralPower as a powerful framework for machine learners by testing it on different GPU platforms and Deep Learning software tools.
Tasks
Published	2017-10-15
URL	http://arxiv.org/abs/1710.05420v1
PDF	http://arxiv.org/pdf/1710.05420v1.pdf
PWC	https://paperswithcode.com/paper/neuralpower-predict-and-deploy-energy
Repo	https://github.com/cmu-enyac/NeuralPower
Framework	none
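
How layer-wise predictions roll up to network-level numbers can be sketched as follows. This is a minimal illustration that assumes per-layer runtime and power predictions are already available (the per-layer values below are made up, and the regression models from the paper are not reproduced). It relies only on the identity energy = power × time; the helper name `network_totals` is hypothetical.

```python
from typing import List, Tuple


def network_totals(layers: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """layers holds (runtime_seconds, power_watts) per layer.

    Returns (total runtime in s, runtime-weighted average power in W,
    total energy in J)."""
    total_runtime = sum(t for t, _ in layers)
    total_energy = sum(t * p for t, p in layers)  # joules = watts * seconds
    avg_power = total_energy / total_runtime
    return total_runtime, avg_power, total_energy


# Hypothetical predictions for a three-layer network.
layers = [(0.002, 100.0), (0.001, 150.0), (0.003, 80.0)]
runtime, power, energy = network_totals(layers)

# Index of the layer that consumes the most energy (the energy bottleneck).
bottleneck = max(range(len(layers)), key=lambda i: layers[i][0] * layers[i][1])
```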

Stochastic Variance Reduction for Policy Gradient Estimation


Title	Stochastic Variance Reduction for Policy Gradient Estimation
Authors	Tianbing Xu, Qiang Liu, Jian Peng
Abstract	Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) to model-free policy gradient to significantly improve the sample-efficiency. The SVRG estimation is incorporated into a trust-region Newton conjugate gradient framework for the policy optimization. On several Mujoco tasks, our method achieves significantly better performance compared to the state-of-the-art model-free policy gradient methods in robotic continuous control such as trust region policy optimization (TRPO).
Tasks	Continuous Control, Policy Gradient Methods
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06034v4
PDF	http://arxiv.org/pdf/1710.06034v4.pdf
PWC	https://paperswithcode.com/paper/stochastic-variance-reduction-for-policy
Repo	https://github.com/tianbingsz/MLResearch
Framework	none
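
The variance-reduction step named in the abstract above can be sketched on a toy problem. This is generic SVRG applied to a one-parameter least-squares objective, not the policy-gradient estimator or the trust-region framework from the paper; the data, step size, and loop counts are illustrative assumptions.

```python
import random
from typing import List


def grad_i(w: float, x: float, y: float) -> float:
    """Gradient of the per-sample loss 0.5 * (x * w - y) ** 2 with respect to w."""
    return (x * w - y) * x


def svrg(xs: List[float], ys: List[float], w0: float = 0.0, lr: float = 0.02,
         epochs: int = 30, inner_steps: int = 20, seed: int = 0) -> float:
    rng = random.Random(seed)
    n = len(xs)
    w = w0
    for _ in range(epochs):
        # Take a snapshot and compute the full gradient there once per epoch.
        snapshot = w
        full_grad = sum(grad_i(snapshot, x, y) for x, y in zip(xs, ys)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate: the stochastic gradient at w,
            # corrected by the same sample's gradient at the snapshot.
            g = grad_i(w, xs[i], ys[i]) - grad_i(snapshot, xs[i], ys[i]) + full_grad
            w -= lr * g
    return w


# Noise-free toy data generated by y = 2 * x, so the minimizer is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_hat = svrg(xs, ys)
```

The correction term has zero mean over the sampled index, so the estimate stays unbiased while its variance shrinks as `w` approaches the snapshot.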

Dynamic Data Selection for Neural Machine Translation


Title	Dynamic Data Selection for Neural Machine Translation
Authors	Marlies van der Wees, Arianna Bisazza, Christof Monz
Abstract	Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection (Axelrod et al., 2011) consistently performs well for PBMT, we show that gains are substantially lower for NMT. Next, we introduce dynamic data selection for NMT, a method in which we vary the selected subset of training data between different training epochs. Our experiments show that the best results are achieved when applying a technique we call gradual fine-tuning, with improvements up to +2.6 BLEU over the original data selection approach and up to +3.1 BLEU over a general baseline.
Tasks	Machine Translation
Published	2017-08-02
URL	http://arxiv.org/abs/1708.00712v1
PDF	http://arxiv.org/pdf/1708.00712v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-data-selection-for-neural-machine
Repo	https://github.com/marliesvanderwees/dds-nmt
Framework	none
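
The idea of varying the selected subset between epochs can be sketched as below. This is a minimal illustration that assumes every training sentence already carries a relevance score (higher meaning more in-domain). The halving schedule, the function name `gradual_subsets`, and the toy data are hypothetical and not the schedule used in the paper.

```python
from typing import List, Tuple


def gradual_subsets(scored: List[Tuple[str, float]], start_fraction: float = 1.0,
                    decay: float = 0.5, epochs: int = 4,
                    min_size: int = 1) -> List[List[str]]:
    """Return one training subset per epoch: the top-ranked sentences,
    with the kept fraction multiplied by `decay` after every epoch."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    subsets = []
    fraction = start_fraction
    for _ in range(epochs):
        size = max(min_size, int(len(ranked) * fraction))
        subsets.append([sentence for sentence, _ in ranked[:size]])
        fraction *= decay
    return subsets


# Hypothetical (sentence id, relevance score) pairs.
data = [("a", 0.9), ("b", 0.1), ("c", 0.7), ("d", 0.4),
        ("e", 0.8), ("f", 0.2), ("g", 0.6), ("h", 0.3)]
subsets = gradual_subsets(data, epochs=4)
sizes = [len(s) for s in subsets]
```

Training starts on all the data and ends on only the highest-scoring sentences, which is one way to read the "gradual fine-tuning" described in the abstract.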

Exploiting Cross-Sentence Context for Neural Machine Translation


Title	Exploiting Cross-Sentence Context for Neural Machine Translation
Authors	Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu
Abstract	In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.
Tasks	Machine Translation
Published	2017-04-14
URL	http://arxiv.org/abs/1704.04347v3
PDF	http://arxiv.org/pdf/1704.04347v3.pdf
PWC	https://paperswithcode.com/paper/exploiting-cross-sentence-context-for-neural
Repo	https://github.com/tuzhaopeng/LC-NMT
Framework	none

CharManteau: Character Embedding Models For Portmanteau Creation


Title	CharManteau: Character Embedding Models For Portmanteau Creation
Authors	Varun Gangal, Harsh Jhamtani, Graham Neubig, Eduard Hovy, Eric Nyberg
Abstract	Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation.
Tasks
Published	2017-07-04
URL	http://arxiv.org/abs/1707.01176v2
PDF	http://arxiv.org/pdf/1707.01176v2.pdf
PWC	https://paperswithcode.com/paper/charmanteau-character-embedding-models-for
Repo	https://github.com/vgtomahawk/Charmanteau-CamReady
Framework	none
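
The exhaustive candidate generation mentioned in the abstract above can be illustrated with a small sketch. It assumes a candidate is a non-empty prefix of the first word followed by a non-empty suffix of the second word. That reading, and the function name `portmanteau_candidates`, are assumptions for illustration; the scoring of candidates by the noisy-channel model is not shown.

```python
from typing import Set


def portmanteau_candidates(first: str, second: str) -> Set[str]:
    """All strings made of a non-empty prefix of `first`
    followed by a non-empty suffix of `second`."""
    candidates = set()
    for i in range(1, len(first) + 1):      # prefix first[:i]
        for j in range(len(second)):        # suffix second[j:]
            candidates.add(first[:i] + second[j:])
    return candidates


brunch_candidates = portmanteau_candidates("breakfast", "lunch")
smog_candidates = portmanteau_candidates("smoke", "fog")
```

The candidate set grows only with the product of the two word lengths, which is what makes scoring every candidate feasible.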

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial


Title	Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
Authors	Graham Neubig
Abstract	This tutorial introduces a new and powerful set of techniques variously called “neural machine translation” or “neural sequence-to-sequence models”. These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice.
Tasks	Machine Translation
Published	2017-03-05
URL	http://arxiv.org/abs/1703.01619v1
PDF	http://arxiv.org/pdf/1703.01619v1.pdf
PWC	https://paperswithcode.com/paper/neural-machine-translation-and-sequence-to
Repo	https://github.com/nier79/Machine-translation
Framework	tf

Generative Modeling with Conditional Autoencoders: Building an Integrated Cell


Title	Generative Modeling with Conditional Autoencoders: Building an Integrated Cell
Authors	Gregory R. Johnson, Rory M. Donovan-Maiye, Mary M. Maleckar
Abstract	We present a conditional generative model to learn variation in cell and nuclear morphology and the location of subcellular structures from microscopy images. Our model generalizes to a wide range of subcellular localization and allows for a probabilistic interpretation of cell and nuclear morphology and structure localization from fluorescence images. We demonstrate the effectiveness of our approach by producing photo-realistic cell images using our generative model. The conditional nature of the model provides the ability to predict the localization of unobserved structures given cell and nuclear morphology.
Tasks
Published	2017-04-28
URL	http://arxiv.org/abs/1705.00092v1
PDF	http://arxiv.org/pdf/1705.00092v1.pdf
PWC	https://paperswithcode.com/paper/generative-modeling-with-conditional
Repo	https://github.com/AllenCellModeling/pytorch_integrated_cell
Framework	pytorch

Structural Regularities in Text-based Entity Vector Spaces


Title	Structural Regularities in Text-based Entity Vector Spaces
Authors	Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
Abstract	Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. When it comes to encoding entity relations, SERT performs best.
Tasks	Topic Models
Published	2017-07-25
URL	http://arxiv.org/abs/1707.07930v1
PDF	http://arxiv.org/pdf/1707.07930v1.pdf
PWC	https://paperswithcode.com/paper/structural-regularities-in-text-based-entity
Repo	https://github.com/cvangysel/SERT
Framework	none