Paper Group AWR 50
Ray: A Distributed Framework for Emerging AI Applications
Title | Ray: A Distributed Framework for Emerging AI Applications |
Authors | Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica |
Abstract | The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray—a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system’s control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications. |
Tasks | |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05889v2 |
http://arxiv.org/pdf/1712.05889v2.pdf | |
PWC | https://paperswithcode.com/paper/ray-a-distributed-framework-for-emerging-ai |
Repo | https://github.com/ray-project/ray |
Framework | tf |
Bayesian Recurrent Neural Networks
Title | Bayesian Recurrent Neural Networks |
Authors | Meire Fortunato, Charles Blundell, Oriol Vinyals |
Abstract | In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. Firstly, we show that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, also reducing the amount of parameters by 80%. Secondly, we demonstrate how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs. We incorporate local gradient information into the approximate posterior to sharpen it around the current batch statistics. We show how this technique is not exclusive to recurrent neural networks and can be applied more widely to train Bayesian neural networks. We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning task, as well as showing how each of these methods improve our model over a variety of other schemes for training them. We also introduce a new benchmark for studying uncertainty for language models so future methods can be easily compared. |
Tasks | Image Captioning, Language Modelling |
Published | 2017-04-10 |
URL | https://arxiv.org/abs/1704.02798v4 |
https://arxiv.org/pdf/1704.02798v4.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-recurrent-neural-networks |
Repo | https://github.com/JP-MRPhys/bayesianLSTM |
Framework | tf |
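The variational scheme described in the abstract relies on drawing network weights from a learned approximate posterior. The following is a minimal sketch of that sampling step only, assuming the common Gaussian parameterisation with mean `mu` and standard deviation `softplus(rho)`. The function names are hypothetical and are not taken from the paper or the linked repository.

```python
import math
import random

def softplus(x):
    """log(1 + exp(x)); maps an unconstrained rho to a positive scale."""
    return math.log1p(math.exp(x))

def sample_weights(mu, rho, rng):
    """Draw one weight sample w = mu + softplus(rho) * eps with eps ~ N(0, 1).

    Assumption: a diagonal Gaussian posterior, one (mu, rho) pair per weight.
    """
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]

# Usage: with a very negative rho the posterior is almost a point mass at mu.
rng = random.Random(0)
weights = sample_weights([0.5, -1.0], [-50.0, -50.0], rng)
```

Writing the sample as a deterministic function of `mu`, `rho` and independent noise is what lets gradients flow to the posterior parameters during backpropagation through time.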
Machine learning modeling for time series problem: Predicting flight ticket prices
Title | Machine learning modeling for time series problem: Predicting flight ticket prices |
Authors | Jun Lu |
Abstract | Machine learning has been used in all kinds of fields. In this article, we introduce how machine learning can be applied to time series problems. Specifically, we use airline ticket price prediction as our problem. Airline companies use many different variables to determine flight ticket prices: an indicator of whether the travel is during the holidays, the number of free seats on the plane, etc. Some of the variables are observed, but some of them are hidden. Based on data over a 103-day period, we trained our models; the best model is AdaBoost-Decision Tree Classification. This algorithm has the best performance over the 8 observed routes, with 61.35% better performance than the random purchase strategy and relatively small variance over these routes. We also considered the situation in which we cannot get much historical data for some routes (for example, the route is new and does not have historical data), or we do not want to train on historical data in order to predict quickly whether to buy or wait. For this problem, we used HMM Sequence Classification based AdaBoost-Decision Tree Classification to perform our prediction on 12 new routes. Finally, we got 31.71% better performance than the random purchase strategy. |
Tasks | Time Series |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07205v2 |
http://arxiv.org/pdf/1705.07205v2.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-modeling-for-time-series |
Repo | https://github.com/junlulocky/AirTicketPredicting |
Framework | none |
MonoPerfCap: Human Performance Capture from Monocular Video
Title | MonoPerfCap: Human Performance Capture from Monocular Video |
Authors | Weipeng Xu, Avishek Chatterjee, Michael Zollhöfer, Helge Rhodin, Dushyant Mehta, Hans-Peter Seidel, Christian Theobalt |
Abstract | We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled. |
Tasks | Pose Estimation |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02136v2 |
http://arxiv.org/pdf/1708.02136v2.pdf | |
PWC | https://paperswithcode.com/paper/monoperfcap-human-performance-capture-from |
Repo | https://github.com/daitomanabe/Human-Pose-and-Motion |
Framework | tf |
Mimicking Word Embeddings using Subword RNNs
Title | Mimicking Word Embeddings using Subword RNNs |
Authors | Yuval Pinter, Robert Guthrie, Jacob Eisenstein |
Abstract | Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings. |
Tasks | Word Embeddings |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06961v1 |
http://arxiv.org/pdf/1707.06961v1.pdf | |
PWC | https://paperswithcode.com/paper/mimicking-word-embeddings-using-subword-rnns |
Repo | https://github.com/50kawa/mimick_chainer |
Framework | none |
Overcoming Catastrophic Forgetting by Incremental Moment Matching
Title | Overcoming Catastrophic Forgetting by Incremental Moment Matching |
Authors | Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, Byoung-Tak Zhang |
Abstract | Catastrophic forgetting is a problem in which a neural network loses the information of the first task after training on the second task. Here, we propose a method, i.e. incremental moment matching (IMM), to resolve this problem. IMM incrementally matches the moment of the posterior distribution of the neural network which is trained on the first and the second task, respectively. To make the search space of posterior parameter smooth, the IMM procedure is complemented by various transfer learning techniques including weight transfer, L2-norm of the old and the new parameter, and a variant of dropout with the old parameter. We analyze our approach on a variety of datasets including the MNIST, CIFAR-10, Caltech-UCSD-Birds, and Lifelog datasets. The experimental results show that IMM achieves state-of-the-art performance by balancing the information between an old and a new network. |
Tasks | Transfer Learning |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08475v3 |
http://arxiv.org/pdf/1703.08475v3.pdf | |
PWC | https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-by |
Repo | https://github.com/btjhjeon/IMM_tensorflow |
Framework | tf |
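The abstract states only that IMM matches moments of the posteriors of the networks trained on each task. As a rough illustration, the sketch below shows two merge rules that are commonly associated with this idea: a weighted average of the parameters, and an average weighted by a diagonal Fisher information estimate. Both rules, the diagonal assumption, and all names here are assumptions for illustration, not the code from the linked repository.

```python
def mean_merge(params, alphas):
    """Weighted average of parameter vectors: theta = sum_k alpha_k * theta_k."""
    dim = len(params[0])
    return [sum(a * p[i] for a, p in zip(alphas, params)) for i in range(dim)]

def fisher_weighted_merge(params, fishers, alphas, eps=1e-8):
    """Per-coordinate merge weighted by (assumed diagonal) Fisher information.

    theta_i = sum_k alpha_k * F_k[i] * theta_k[i] / sum_k alpha_k * F_k[i]
    eps guards against division by zero when every Fisher entry is zero.
    """
    dim = len(params[0])
    merged = []
    for i in range(dim):
        num = sum(a * f[i] * p[i] for a, f, p in zip(alphas, fishers, params))
        den = sum(a * f[i] for a, f in zip(alphas, fishers)) + eps
        merged.append(num / den)
    return merged

# Usage with two hypothetical task-specific parameter vectors.
theta_task1 = [0.0, 2.0]
theta_task2 = [2.0, 4.0]
merged = mean_merge([theta_task1, theta_task2], [0.5, 0.5])
```

The Fisher-weighted variant keeps each coordinate close to the network that is more certain about it, which is one way to read "balancing the information between an old and a new network".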
Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Title | Knowledge Graph Embedding with Iterative Guidance from Soft Rules |
Authors | Shu Guo, Quan Wang, Lihong Wang, Bin Wang, Li Guo |
Abstract | Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. And they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE. |
Tasks | Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs, Link Prediction |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11231v1 |
http://arxiv.org/pdf/1711.11231v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-graph-embedding-with-iterative |
Repo | https://github.com/iieir-km/RUGE |
Framework | none |
NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks
Title | NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks |
Authors | Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, Diana Marculescu |
Abstract | “How much energy is consumed for an inference made by a convolutional neural network (CNN)?” With the increased popularity of CNNs deployed on the wide-spectrum of platforms (from mobile devices to workstations), the answer to this question has drawn significant attention. From lengthening battery life of mobile devices to reducing the energy bill of a datacenter, it is important to understand the energy efficiency of CNNs during serving for making an inference, before actually training the model. In this work, we propose NeuralPower: a layer-wise predictive framework based on sparse polynomial regression, for predicting the serving energy consumption of a CNN deployed on any GPU platform. Given the architecture of a CNN, NeuralPower provides an accurate prediction and breakdown for power and runtime across all layers in the whole network, helping machine learners quickly identify the power, runtime, or energy bottlenecks. We also propose the “energy-precision ratio” (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy. The experimental results show that the prediction accuracy of the proposed NeuralPower outperforms the best published model to date, yielding an improvement in accuracy of up to 68.5%. We also assess the accuracy of predictions at the network level, by predicting the runtime, power, and energy of state-of-the-art CNN architectures, achieving an average accuracy of 88.24% in runtime, 88.34% in power, and 97.21% in energy. We comprehensively corroborate the effectiveness of NeuralPower as a powerful framework for machine learners by testing it on different GPU platforms and Deep Learning software tools. |
Tasks | |
Published | 2017-10-15 |
URL | http://arxiv.org/abs/1710.05420v1 |
http://arxiv.org/pdf/1710.05420v1.pdf | |
PWC | https://paperswithcode.com/paper/neuralpower-predict-and-deploy-energy |
Repo | https://github.com/cmu-enyac/NeuralPower |
Framework | none |
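The abstract describes per-layer predictions of power and runtime that are combined into network-level runtime, power, and energy. A minimal sketch of such an aggregation follows, assuming only the physical relation energy = power × time. The layer values are made up for illustration, and the function name is hypothetical rather than part of the NeuralPower code.

```python
def network_totals(layers):
    """Aggregate per-layer (power_watts, runtime_seconds) predictions.

    Returns (total_runtime_s, total_energy_j, average_power_w), where the
    average power is the runtime-weighted mean of the layer powers.
    """
    total_runtime = sum(t for _, t in layers)
    total_energy = sum(p * t for p, t in layers)  # joules = watts * seconds
    average_power = total_energy / total_runtime
    return total_runtime, total_energy, average_power

# Usage with two hypothetical layers: (predicted power, predicted runtime).
runtime, energy, power = network_totals([(10.0, 0.002), (20.0, 0.001)])
```

Keeping the breakdown per layer makes it easy to see which layer dominates energy even when it is not the slowest one.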
Stochastic Variance Reduction for Policy Gradient Estimation
Title | Stochastic Variance Reduction for Policy Gradient Estimation |
Authors | Tianbing Xu, Qiang Liu, Jian Peng |
Abstract | Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) to model-free policy gradient to significantly improve the sample-efficiency. The SVRG estimation is incorporated into a trust-region Newton conjugate gradient framework for the policy optimization. On several Mujoco tasks, our method achieves significantly better performance compared to the state-of-the-art model-free policy gradient methods in robotic continuous control such as trust region policy optimization (TRPO). |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06034v4 |
http://arxiv.org/pdf/1710.06034v4.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-variance-reduction-for-policy |
Repo | https://github.com/tianbingsz/MLResearch |
Framework | none |
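To make the SVRG idea concrete, here is a sketch of the generic SVRG estimator on a toy one-dimensional least-squares problem. It is not the policy-gradient or trust-region variant from the paper; the objective, data, and hyperparameters are assumptions chosen only to show how the snapshot gradient corrects each stochastic gradient.

```python
import random

def grad_i(w, x, y):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to the scalar w."""
    return (w * x - y) * x

def svrg(data, w0, step, epochs, inner_steps, rng):
    """Generic SVRG loop: full gradient at a snapshot, corrected stochastic steps."""
    w_snapshot = w0
    for _ in range(epochs):
        # Full-batch gradient at the snapshot, computed once per outer loop.
        mu = sum(grad_i(w_snapshot, x, y) for x, y in data) / len(data)
        w = w_snapshot
        for _ in range(inner_steps):
            x, y = rng.choice(data)
            # Variance-reduced estimate: sample gradient, minus the same
            # sample's gradient at the snapshot, plus the full gradient there.
            g = grad_i(w, x, y) - grad_i(w_snapshot, x, y) + mu
            w -= step * g
        w_snapshot = w
    return w

# Usage: the data satisfy y = 2 * x, so the minimiser is w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_hat = svrg(data, w0=0.0, step=0.02, epochs=40, inner_steps=10, rng=random.Random(0))
```

The estimate `g` has the same expectation as a plain stochastic gradient, but its variance shrinks as `w` approaches the snapshot, which is the property the paper exploits to improve sample efficiency.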
Dynamic Data Selection for Neural Machine Translation
Title | Dynamic Data Selection for Neural Machine Translation |
Authors | Marlies van der Wees, Arianna Bisazza, Christof Monz |
Abstract | Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection (Axelrod et al., 2011) consistently performs well for PBMT, we show that gains are substantially lower for NMT. Next, we introduce dynamic data selection for NMT, a method in which we vary the selected subset of training data between different training epochs. Our experiments show that the best results are achieved when applying a technique we call gradual fine-tuning, with improvements up to +2.6 BLEU over the original data selection approach and up to +3.1 BLEU over a general baseline. |
Tasks | Machine Translation |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00712v1 |
http://arxiv.org/pdf/1708.00712v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-data-selection-for-neural-machine |
Repo | https://github.com/marliesvanderwees/dds-nmt |
Framework | none |
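The abstract describes varying the selected training subset between epochs, with gradual fine-tuning giving the best results. The sketch below illustrates one plausible schedule: sentences are ranked by a relevance score and the selected top fraction shrinks each epoch. The scoring convention (lower is more in-domain), the shrink factor, and the function names are assumptions for illustration, not the settings used in the paper.

```python
def rank_by_relevance(scored_sentences):
    """Sort (sentence, score) pairs so the most in-domain sentences come first.

    Assumption: lower score means more relevant, as with a cross-entropy
    difference score.
    """
    return [s for s, _ in sorted(scored_sentences, key=lambda pair: pair[1])]

def gradual_fine_tuning_schedule(ranked, start_fraction, shrink, epochs):
    """Return one training subset per epoch, keeping a shrinking top fraction."""
    subsets = []
    fraction = start_fraction
    for _ in range(epochs):
        k = max(1, int(len(ranked) * fraction))
        subsets.append(ranked[:k])
        fraction *= shrink
    return subsets

# Usage with hypothetical scores for eight sentence pairs.
scored = [("s%d" % i, float(i)) for i in range(8)]
subsets = gradual_fine_tuning_schedule(rank_by_relevance(scored), 1.0, 0.5, 3)
```

Because the subset narrows toward the top-ranked sentences, later epochs are both cheaper and more focused on in-domain data.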
Exploiting Cross-Sentence Context for Neural Machine Translation
Title | Exploiting Cross-Sentence Context for Neural Machine Translation |
Authors | Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu |
Abstract | In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points. |
Tasks | Machine Translation |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04347v3 |
http://arxiv.org/pdf/1704.04347v3.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-cross-sentence-context-for-neural |
Repo | https://github.com/tuzhaopeng/LC-NMT |
Framework | none |
CharManteau: Character Embedding Models For Portmanteau Creation
Title | CharManteau: Character Embedding Models For Portmanteau Creation |
Authors | Varun Gangal, Harsh Jhamtani, Graham Neubig, Eduard Hovy, Eric Nyberg |
Abstract | Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation. |
Tasks | |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.01176v2 |
http://arxiv.org/pdf/1707.01176v2.pdf | |
PWC | https://paperswithcode.com/paper/charmanteau-character-embedding-models-for |
Repo | https://github.com/vgtomahawk/Charmanteau-CamReady |
Framework | none |
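The abstract mentions an exhaustive candidate generation strategy enabled by the structure of the portmanteau task. One simple way to picture this, assuming a portmanteau is a prefix of the first word followed by a suffix of the second, is to enumerate every such combination. That assumption and the function name below are illustrative and not taken from the paper's code.

```python
def portmanteau_candidates(first, second):
    """All strings made of a non-empty prefix of `first` and a non-empty suffix of `second`."""
    candidates = set()
    for i in range(1, len(first) + 1):
        for j in range(len(second)):
            candidates.add(first[:i] + second[j:])
    return candidates

# Usage: the candidate set is small enough to score every entry with a model.
candidates = portmanteau_candidates("breakfast", "lunch")
```

The candidate set has at most `len(first) * len(second)` entries, so each candidate can be scored directly, which is what makes a noisy-channel-style reranking feasible.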
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
Title | Neural Machine Translation and Sequence-to-sequence Models: A Tutorial |
Authors | Graham Neubig |
Abstract | This tutorial introduces a new and powerful set of techniques variously called “neural machine translation” or “neural sequence-to-sequence models”. These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice. |
Tasks | Machine Translation |
Published | 2017-03-05 |
URL | http://arxiv.org/abs/1703.01619v1 |
http://arxiv.org/pdf/1703.01619v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-and-sequence-to |
Repo | https://github.com/nier79/Machine-translation |
Framework | tf |
Generative Modeling with Conditional Autoencoders: Building an Integrated Cell
Title | Generative Modeling with Conditional Autoencoders: Building an Integrated Cell |
Authors | Gregory R. Johnson, Rory M. Donovan-Maiye, Mary M. Maleckar |
Abstract | We present a conditional generative model to learn variation in cell and nuclear morphology and the location of subcellular structures from microscopy images. Our model generalizes to a wide range of subcellular localization and allows for a probabilistic interpretation of cell and nuclear morphology and structure localization from fluorescence images. We demonstrate the effectiveness of our approach by producing photo-realistic cell images using our generative model. The conditional nature of the model provides the ability to predict the localization of unobserved structures given cell and nuclear morphology. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1705.00092v1 |
http://arxiv.org/pdf/1705.00092v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-modeling-with-conditional |
Repo | https://github.com/AllenCellModeling/pytorch_integrated_cell |
Framework | pytorch |
Structural Regularities in Text-based Entity Vector Spaces
Title | Structural Regularities in Text-based Entity Vector Spaces |
Authors | Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas |
Abstract | Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in finite-dimensional vector spaces, where both are constructed from text sequences. We investigate entity vector spaces and the degree to which they capture structural regularities. Such vector spaces are constructed in an unsupervised manner without explicit information about structural aspects. For concreteness, we address these questions for a specific type of entity: experts in the context of expert finding. We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank. We compare latent, continuous representations created using methods based on distributional semantics (LSI), topic models (LDA) and neural networks (word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as doc2vec and SERT, systematically perform better at clustering than LSI, LDA and word2vec. When it comes to encoding entity relations, SERT performs best. |
Tasks | Topic Models |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07930v1 |
http://arxiv.org/pdf/1707.07930v1.pdf | |
PWC | https://paperswithcode.com/paper/structural-regularities-in-text-based-entity |
Repo | https://github.com/cvangysel/SERT |
Framework | none |