Paper Group AWR 44
Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Automatic Posterior Transformation for Likelihood-Free Inference. Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommenda …
Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm
Title | Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm |
Authors | Andrew Redd, Kaung Khin, Aldo Marini |
Abstract | Due to their prevalence, time series forecasting is crucial in multiple domains. We seek to make state-of-the-art forecasting fast, accessible, and generalizable. ES-RNN is a hybrid between classical state space forecasting models and modern RNNs that achieved a 9.4% sMAPE improvement in the M4 competition. Crucially, the ES-RNN implementation requires per-time-series parameters. By vectorizing the original implementation and porting the algorithm to a GPU, we achieve up to 322x training speedup depending on batch size, with results similar to those reported in the original submission. Our code can be found at: https://github.com/damitkwr/ESRNN-GPU |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03329v1 |
PDF | https://arxiv.org/pdf/1907.03329v1.pdf |
PWC | https://paperswithcode.com/paper/fast-es-rnn-a-gpu-implementation-of-the-es |
Repo | https://github.com/damitkwr/ESRNN-GPU |
Framework | pytorch |
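The speedup described above comes from replacing per-series loops with batched tensor operations. A minimal sketch of that vectorization idea, assuming multiplicative Holt-Winters-style level/seasonality updates with per-series smoothing coefficients held in tensors (function and variable names are illustrative, not the repository's API):

```python
import torch

def batched_es_update(y_t, level, season, alpha, gamma):
    """One exponential-smoothing step applied to a whole batch of series at once.

    y_t, level, season, alpha, gamma: tensors of shape (batch,), so every
    series keeps its own smoothing coefficients, as the ES-RNN hybrid requires.
    """
    new_level = alpha * (y_t / season) + (1 - alpha) * level
    new_season = gamma * (y_t / new_level) + (1 - gamma) * season
    return new_level, new_season

# Example: 1024 series advanced in a single vectorized call (on GPU if available).
B = 1024
y_t = torch.rand(B) + 0.5
level, season = torch.ones(B), torch.ones(B)
alpha, gamma = torch.sigmoid(torch.randn(B)), torch.sigmoid(torch.randn(B))
level, season = batched_es_update(y_t, level, season, alpha, gamma)
```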
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Title | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
Authors | Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le |
Abstract | With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking. |
Tasks | Document Ranking, Language Modelling, Natural Language Inference, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentiment Analysis, Text Classification |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08237v2 |
PDF | https://arxiv.org/pdf/1906.08237v2.pdf |
PWC | https://paperswithcode.com/paper/xlnet-generalized-autoregressive-pretraining |
Repo | https://github.com/graykode/xlnet-Pytorch |
Framework | pytorch |
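The permutation-language-modeling objective can be pictured as an attention mask derived from a sampled factorization order: a token may only attend to tokens that precede it in that order. A minimal illustrative sketch of such a mask (not the two-stream attention used in the actual model):

```python
import numpy as np

def permutation_attention_mask(seq_len, rng=None):
    """mask[i, j] is True when position i may attend to position j,
    i.e. j comes earlier than i in the sampled factorization order."""
    rng = rng or np.random.default_rng(0)
    order = rng.permutation(seq_len)          # a random factorization order
    rank = np.empty(seq_len, dtype=int)
    rank[order] = np.arange(seq_len)          # rank[t] = step at which token t appears
    return rank[None, :] < rank[:, None]

print(permutation_attention_mask(5).astype(int))
```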
Automatic Posterior Transformation for Likelihood-Free Inference
Title | Automatic Posterior Transformation for Likelihood-Free Inference |
Authors | David S. Greenberg, Marcel Nonnenmacher, Jakob H. Macke |
Abstract | How can one perform Bayesian inference on stochastic simulators with intractable likelihoods? A recent approach is to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. However, existing methods are limited to a narrow range of proposal distributions or require importance weighting that can limit performance in practice. Here we present automatic posterior transformation (APT), a new sequential neural posterior estimation method for simulation-based inference. APT can modify the posterior estimate using arbitrary, dynamically updated proposals, and is compatible with powerful flow-based density estimators. It is more flexible, scalable and efficient than previous simulation-based inference techniques. APT can operate directly on high-dimensional time series and image data, opening up new applications for likelihood-free inference. |
Tasks | Bayesian Inference, Time Series |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07488v1 |
PDF | https://arxiv.org/pdf/1905.07488v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-posterior-transformation-for |
Repo | https://github.com/mackelab/sbi |
Framework | pytorch |
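The linked repo is the `sbi` package, where APT corresponds to the SNPE-style interface. A hedged single-round usage sketch on a toy Gaussian simulator (API names reflect recent sbi releases and may differ across versions; multi-round APT would feed the learned posterior back in as the next proposal):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))

def simulator(theta):
    # toy simulator: noisy identity mapping from parameters to data
    return theta + 0.1 * torch.randn_like(theta)

theta = prior.sample((1000,))
x = simulator(theta)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
samples = posterior.sample((500,), x=torch.zeros(2))  # posterior given an observation at 0
```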
Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control
Title | Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control |
Authors | Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty |
Abstract | In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system, given by an ordinary differential equation (ODE), from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way, which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12077v3 |
PDF | https://arxiv.org/pdf/1909.12077v3.pdf |
PWC | https://paperswithcode.com/paper/symplectic-ode-net-learning-hamiltonian |
Repo | https://github.com/d-biswa/Symplectic-ODENet |
Framework | pytorch |
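The core computational idea is to parameterize the Hamiltonian with a network and obtain the vector field from its gradients via automatic differentiation, plus a learned input term for control. A simplified, hedged sketch (the paper's parametrization additionally separates the mass matrix and potential energy; here the control term acts element-wise):

```python
import torch
import torch.nn as nn

class HamiltonianWithControl(nn.Module):
    """dq/dt = dH/dp,  dp/dt = -dH/dq + g(q) * u, with H and g learned."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.dim = dim
        self.H_net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.g_net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, x, u):
        # x = [q, p], shape (batch, 2*dim); u: control input, shape (batch, dim)
        x = x.requires_grad_(True)
        dH = torch.autograd.grad(self.H_net(x).sum(), x, create_graph=True)[0]
        dHdq, dHdp = dH[:, :self.dim], dH[:, self.dim:]
        q = x[:, :self.dim]
        return torch.cat([dHdp, -dHdq + self.g_net(q) * u], dim=-1)

model = HamiltonianWithControl(dim=1)
xdot = model(torch.randn(8, 2), torch.zeros(8, 1))   # time derivative fed to an ODE solver
```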
Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems
Title | Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems |
Authors | Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang |
Abstract | Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category’s representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters. |
Tasks | Recommendation Systems |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02107v1 |
PDF | https://arxiv.org/pdf/1909.02107v1.pdf |
PWC | https://paperswithcode.com/paper/compositional-embeddings-using-complementary |
Repo | https://github.com/facebookresearch/dlrm |
Framework | pytorch |
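The simplest complementary partition pair is the quotient-remainder construction: two small tables indexed by `id // m` and `id % m`, combined so every category still gets a unique vector. A hedged sketch of that construction (element-wise product is one of the combiners studied; the DLRM repo's implementation differs in detail):

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Quotient-remainder compositional embedding for a large categorical feature."""
    def __init__(self, num_categories, dim, num_buckets):
        super().__init__()
        self.num_buckets = num_buckets
        num_quotients = (num_categories + num_buckets - 1) // num_buckets
        self.remainder = nn.Embedding(num_buckets, dim)
        self.quotient = nn.Embedding(num_quotients, dim)

    def forward(self, ids):
        # (quotient, remainder) pairs are unique per id, so every category gets a distinct vector
        return self.quotient(ids // self.num_buckets) * self.remainder(ids % self.num_buckets)

emb = QREmbedding(num_categories=10_000_000, dim=16, num_buckets=4096)
vectors = emb(torch.tensor([0, 1, 9_999_999]))   # two small tables instead of one 10M-row table
```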
Bayesian Temporal Factorization for Multidimensional Time Series Prediction
Title | Bayesian Temporal Factorization for Multidimensional Time Series Prediction |
Authors | Lijun Sun, Xinyu Chen |
Abstract | Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality. Making predictions on these time series has become a critical challenge due to not only the large-scale and high-dimensional nature but also the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series—in particular spatiotemporal data—in the presence of missing values. By integrating low-rank matrix/tensor factorization and vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing those missing values. We develop efficient Gibbs sampling algorithms for model inference and test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and short-term/long-term rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over many state-of-the-art techniques. |
Tasks | Imputation, Time Series, Time Series Prediction |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06366v1 |
PDF | https://arxiv.org/pdf/1910.06366v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-temporal-factorization-for |
Repo | https://github.com/xinychen/awesome-latex-drawing |
Framework | none |
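Prediction in BTF couples a low-rank factorization with a VAR process on the temporal factors: forecast the next temporal factor from its own lags, then map it back through the spatial factors. A minimal deterministic sketch of that rolling step, with toy values (the paper's Gibbs sampler would average this over posterior samples):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, d = 30, 4, 2                      # locations, rank, VAR lag order
W = rng.standard_normal((N, r))         # spatial factor matrix
A = [0.5 * np.eye(r), 0.2 * np.eye(r)]  # VAR coefficient matrices A_1..A_d (toy values)
x_hist = [rng.standard_normal(r) for _ in range(d)]   # latest temporal factors x_{t-1}, x_{t-2}

x_next = sum(A[k] @ x_hist[k] for k in range(d))      # x_t = sum_k A_k x_{t-k}
y_hat = W @ x_next                                    # forecast for all N series at time t
print(y_hat.shape)                                    # (30,)
```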
Scalable Realistic Recommendation Datasets through Fractal Expansions
Title | Scalable Realistic Recommendation Datasets through Fractal Expansions |
Authors | Francois Belletti, Karthik Lakshmanan, Walid Krichene, Yi-Fan Chen, John Anderson |
Abstract | Recommender System research suffers currently from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items) and non zero values (interactions) while preserving key higher order statistical properties. We adapt the Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 864K items and 2 million users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items and 17 million users. |
Tasks | Recommendation Systems |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.08910v3 |
PDF | http://arxiv.org/pdf/1901.08910v3.pdf |
PWC | https://paperswithcode.com/paper/scalable-realistic-recommendation-datasets |
Repo | https://github.com/mlperf/training/tree/master/data_generation |
Framework | pytorch |
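The expansion is a Kronecker-style product of the sparse incidence matrix with a version of itself, which multiplies users, items, and interactions while keeping the sparsity pattern self-similar. A toy sketch of the self-Kronecker step (the paper expands with a carefully reduced/randomized factor rather than the raw matrix):

```python
import numpy as np
from scipy.sparse import coo_matrix, kron

# tiny user/item incidence matrix: rows are users, columns are items
R = coo_matrix(np.array([[1, 0, 1],
                         [0, 1, 0]]))

R_expanded = kron(R, R)                     # fractal expansion: users x users, items x items
print(R.shape, "->", R_expanded.shape)      # (2, 3) -> (4, 9)
print(R.nnz, "->", R_expanded.nnz)          # 3 -> 9 interactions
```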
Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition
Title | Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition |
Authors | Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian |
Abstract | Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction. |
Tasks | Pose Prediction, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.12659v1 |
PDF | http://arxiv.org/pdf/1904.12659v1.pdf |
PWC | https://paperswithcode.com/paper/actional-structural-graph-convolutional |
Repo | https://github.com/limaosen0/AS-GCN |
Framework | pytorch |
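At its core, each AS-GCN block aggregates joint features over a graph whose adjacency mixes structural and inferred actional links. A hedged sketch of the spatial aggregation for a single fixed adjacency (the real block stacks multiple adjacencies plus a temporal convolution):

```python
import torch
import torch.nn as nn

class JointGraphConv(nn.Module):
    """One spatial graph-convolution step over skeleton joints."""
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))        # add self-links
        self.register_buffer("A_norm", A / A.sum(dim=1, keepdim=True))
        self.lin = nn.Linear(in_ch, out_ch)

    def forward(self, x):
        # x: (batch, time, joints, channels); mix features of linked joints, then transform
        return self.lin(torch.einsum("vw,btwc->btvc", self.A_norm, x))

A = torch.zeros(25, 25); A[0, 1] = A[1, 0] = 1.0            # toy 25-joint skeleton with one bone
layer = JointGraphConv(3, 64, A)
out = layer(torch.randn(2, 50, 25, 3))                      # -> (2, 50, 25, 64)
```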
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Title | Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding |
Authors | Mathew Monfort, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva |
Abstract | An event happening in the world is often made of different activities and actions that can unfold simultaneously or sequentially within a few seconds. However, most large-scale datasets built to train models for action recognition provide a single label per video clip. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not learn the full spectrum of information that would be mandatory to more completely comprehend different events and eventually learn causality between them. Towards this goal, we augmented the existing video dataset, Moments in Time (MiT), to include over two million action labels for over one million three second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long tail multi-label learning and provide improved methods for visualizing and interpreting models trained for multi-label action detection. |
Tasks | Action Detection, Multi-Label Learning, Video Understanding |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00232v3 |
PDF | https://arxiv.org/pdf/1911.00232v3.pdf |
PWC | https://paperswithcode.com/paper/multi-moments-in-time-learning-and |
Repo | https://github.com/zhoubolei/moments_models |
Framework | pytorch |
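The baselines mentioned above adapt binary cross-entropy with per-class weighting so rare (long-tail) actions are not swamped by frequent ones. A hedged sketch of that weighting idea (class counts and the weighting scheme here are illustrative, not the paper's exact loss):

```python
import torch
import torch.nn as nn

num_classes = 339                                        # MiT-scale action vocabulary (illustrative)
class_counts = torch.randint(50, 50_000, (num_classes,)).float()
pos_weight = class_counts.max() / class_counts           # up-weight rare classes

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn(8, num_classes)                     # model outputs for a batch of clips
targets = torch.zeros(8, num_classes); targets[:, :2] = 1.0   # multi-label ground truth
loss = criterion(logits, targets)
```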
r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
Title | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection |
Authors | Kai Nakamura, Sharon Levy, William Yang Wang |
Abstract | Fake news has altered society in negative ways in politics and culture. It has adversely affected both online social network systems as well as offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through distant supervision. We construct hybrid text+image models and perform extensive experiments for multiple variations of classification, demonstrating the importance of the novel aspect of multimodality and fine-grained classification unique to Fakeddit. |
Tasks | Fake News Detection |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03854v2 |
PDF | https://arxiv.org/pdf/1911.03854v2.pdf |
PWC | https://paperswithcode.com/paper/rfakeddit-a-new-multimodal-benchmark-dataset |
Repo | https://github.com/entitize/fakeddit |
Framework | none |
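The hybrid text+image baselines boil down to fusing a text encoder's features with an image encoder's features before an n-way classification head (2-, 3-, or 6-way in this dataset). A hedged sketch of such a late-fusion head, with the encoders abstracted away and feature sizes chosen arbitrarily:

```python
import torch
import torch.nn as nn

class LateFusionFakeNewsHead(nn.Module):
    """Classify a post from pre-computed text and image features."""
    def __init__(self, text_dim=768, image_dim=2048, num_classes=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_feat, image_feat):
        return self.head(torch.cat([text_feat, image_feat], dim=-1))

model = LateFusionFakeNewsHead()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))   # 6-way predictions for 4 posts
```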
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Title | Context-Aware Visual Policy Network for Fine-Grained Image Captioning |
Authors | Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu |
Abstract | With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in generating longer, richer and more fine-grained sentences and paragraphs as image descriptions. Image captioning can be translated to the task of sequential language prediction given visual content, where the output sequence forms natural language description with plausible grammar. However, existing image captioning methods focus only on the language policy and not the visual policy, and thus fail to capture the visual context that is crucial for compositional reasoning such as object relationships (e.g., “man riding horse”) and visual comparisons (e.g., “small(er) cat”). This issue is especially severe when generating longer sequences such as a paragraph. To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language generation: image sentence captioning and image paragraph captioning. During captioning, CAVP explicitly considers the previous visual attentions as context, and decides whether the context is used for the current word/sentence generation given the current visual attention. Compared against traditional visual attention mechanism that only fixes a single visual region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model – CAVP and its subsequent language policy network – can be efficiently optimized end-to-end by using an actor-critic policy gradient method. We have demonstrated the effectiveness of CAVP by state-of-the-art performances on MS-COCO and Stanford captioning datasets, using various metrics and sensible visualizations of qualitative visual context. |
Tasks | Image Captioning, Image Paragraph Captioning, Text Generation |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02365v1 |
PDF | https://arxiv.org/pdf/1906.02365v1.pdf |
PWC | https://paperswithcode.com/paper/context-aware-visual-policy-network-for-fine |
Repo | https://github.com/daqingliu/CAVP |
Framework | pytorch |
Towards Accurate One-Stage Object Detection with AP-Loss
Title | Towards Accurate One-Stage Object Detection with AP-Loss |
Authors | Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou |
Abstract | One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously, with the former suffering much from extreme foreground-background class imbalance issue due to the large number of anchors. This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. Due to its non-differentiability and non-convexity, the AP-loss cannot be optimized directly. For this purpose, we develop a novel optimization algorithm, which seamlessly combines the error-driven update scheme in perceptron learning and backpropagation algorithm in deep networks. We verify good convergence property of the proposed algorithm theoretically and empirically. Experimental results demonstrate notable performance improvement in state-of-the-art one-stage detectors based on AP-loss over different kinds of classification-losses on various benchmarks, without changing the network architectures. Code is available at https://github.com/cccorn/AP-loss. |
Tasks | Object Detection |
Published | 2019-04-12 |
URL | https://arxiv.org/abs/1904.06373v3 |
PDF | https://arxiv.org/pdf/1904.06373v3.pdf |
PWC | https://paperswithcode.com/paper/towards-accurate-one-stage-object-detection |
Repo | https://github.com/cccorn/AP-loss |
Framework | pytorch |
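For reference, the quantity the ranking task targets is the average precision over anchor scores; the paper's contribution is the error-driven update scheme that optimizes it despite its non-differentiability, which this sketch does not reproduce. A hedged sketch of the AP value itself:

```python
import torch

def ap_loss_value(scores, labels):
    """Return 1 - AP for a set of anchor scores and binary labels (1 = foreground)."""
    order = scores.argsort(descending=True)
    labels = labels[order].float()
    cum_pos = labels.cumsum(0)                     # positives seen up to each rank
    ranks = torch.arange(1, labels.numel() + 1, dtype=torch.float)
    ap = ((cum_pos / ranks) * labels).sum() / labels.sum().clamp(min=1)
    return 1.0 - ap

scores = torch.tensor([0.9, 0.8, 0.3, 0.1])
labels = torch.tensor([1, 0, 1, 0])
print(ap_loss_value(scores, labels))   # tensor(0.1667): AP = (1/1 + 2/3) / 2
```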
Forgetting to learn logic programs
Title | Forgetting to learn logic programs |
Authors | Andrew Cropper |
Abstract | Most program induction approaches require predefined, often hand-engineered, background knowledge (BK). To overcome this limitation, we explore methods to automatically acquire BK through multi-task learning. In this approach, a learner adds learned programs to its BK so that they can be reused to help learn other programs. To improve learning performance, we explore the idea of forgetting, where a learner can additionally remove programs from its BK. We consider forgetting in an inductive logic programming (ILP) setting. We show that forgetting can significantly reduce both the size of the hypothesis space and the sample complexity of an ILP learner. We introduce Forgetgol, a multi-task ILP learner which supports forgetting. We experimentally compare Forgetgol against approaches that either remember or forget everything. Our experimental results show that Forgetgol outperforms the alternative approaches when learning from over 10,000 tasks. |
Tasks | Multi-Task Learning |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06643v1 |
PDF | https://arxiv.org/pdf/1911.06643v1.pdf |
PWC | https://paperswithcode.com/paper/forgetting-to-learn-logic-programs |
Repo | https://github.com/metagol/metagol |
Framework | none |
Root Mean Square Layer Normalization
Title | Root Mean Square Layer Normalization |
Authors | Biao Zhang, Rico Sennrich |
Abstract | Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models. Source code is available at https://github.com/bzhangGo/rmsnorm. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07467v1 |
PDF | https://arxiv.org/pdf/1910.07467v1.pdf |
PWC | https://paperswithcode.com/paper/root-mean-square-layer-normalization |
Repo | https://github.com/bzhangGo/rmsnorm |
Framework | pytorch |
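The normalization itself is a few lines: rescale by the root mean square of the activations and apply a learned gain, with no mean subtraction. A minimal sketch consistent with the abstract (the official repo additionally offers an optional bias and the pRMSNorm variant):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # root mean square over the feature dimension; no re-centering as in LayerNorm
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain

norm = RMSNorm(512)
y = norm(torch.randn(4, 10, 512))
```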
Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
Title | Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification |
Authors | Hao Ren, Jianlin Su, Hong Lu |
Abstract | Image classification is a challenging problem which aims to identify the category of the object in an image. In recent years, deep Convolutional Neural Networks (CNNs) have been applied to handle this task, and impressive improvement has been achieved. However, some research showed the output of CNNs can be easily altered by adding relatively small perturbations to the input image, such as modifying a few pixels. Recently, Capsule Networks (CapsNets) have been proposed, which can help eliminate this limitation. Experiments on the MNIST dataset revealed that capsules can better characterize the features of an object than CNNs. However, it is hard to find a suitable quantitative method to compare the generalization ability of CNNs and CapsNets. In this paper, we propose a new image classification task called Top-2 classification to evaluate the generalization ability of CNNs and CapsNets. The models are trained on single-label image samples, the same as in the traditional image classification task. In the test stage, we randomly concatenate two test image samples that contain different labels, and then use the trained models to predict the top-2 labels on the unseen, newly created two-label image samples. This task provides precise quantitative results for comparing the generalization ability of CNNs and CapsNets. Returning to CapsNets: because the original CapsNet uses a Full Connectivity (FC) mechanism among all capsules, it requires many parameters. To reduce the number of parameters, we introduce the Parameter-Sharing (PS) mechanism between capsules. Experiments on five widely used benchmark image datasets demonstrate that the method significantly reduces the number of parameters without losing the effectiveness of extracting features. Further, on the Top-2 classification task, the proposed PS CapsNets obtain substantially higher accuracy than traditional CNNs and FC CapsNets. |
Tasks | Image Classification |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10112v2 |
PDF | http://arxiv.org/pdf/1901.10112v2.pdf |
PWC | https://paperswithcode.com/paper/evaluating-generalization-ability-of |
Repo | https://github.com/leftthomas/PSCapsNet |
Framework | pytorch |
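The Top-2 protocol can be summarized in a few lines: stitch together two single-label test images with different labels and score a hit only when the model's top-2 predictions match the two ground-truth labels. A hedged sketch (the concatenation axis and matching rule follow the abstract; details may differ in the repo):

```python
import torch

def top2_accuracy(model, x1, y1, x2, y2):
    """x1, x2: batches of single-label images; y1, y2: their labels (assumed different)."""
    x = torch.cat([x1, x2], dim=-1)                    # concatenate the two images along width
    top2 = model(x).topk(2, dim=-1).indices            # predicted top-2 labels per sample
    truth = torch.stack([y1, y2], dim=-1)
    hit = (top2.sort(dim=-1).values == truth.sort(dim=-1).values).all(dim=-1)
    return hit.float().mean()
```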