February 2, 2020

3294 words 16 mins read

Paper Group AWR 44

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Automatic Posterior Transformation for Likelihood-Free Inference. Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommenda …

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm

Title Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm
Authors Andrew Redd, Kaung Khin, Aldo Marini
Abstract Because time series data are so prevalent, forecasting them is crucial in multiple domains. We seek to make state-of-the-art forecasting fast, accessible, and generalizable. ES-RNN is a hybrid between classical state space forecasting models and modern RNNs that achieved a 9.4% sMAPE improvement in the M4 competition. Crucially, the ES-RNN implementation requires per-time-series parameters. By vectorizing the original implementation and porting the algorithm to a GPU, we achieve up to a 322x training speedup, depending on batch size, with results similar to those reported in the original submission. Our code can be found at: https://github.com/damitkwr/ESRNN-GPU
Tasks Time Series, Time Series Forecasting
Published 2019-07-07
URL https://arxiv.org/abs/1907.03329v1
PDF https://arxiv.org/pdf/1907.03329v1.pdf
PWC https://paperswithcode.com/paper/fast-es-rnn-a-gpu-implementation-of-the-es
Repo https://github.com/damitkwr/ESRNN-GPU
Framework pytorch
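
The speedup reported above comes largely from vectorizing the per-series smoothing parameters so that every series in a batch is updated in parallel on the GPU. Below is a minimal PyTorch sketch of that idea, batched Holt-Winters-style level and seasonality updates; the tensor names, shapes, and fixed coefficients are illustrative assumptions, not the authors' implementation (in ES-RNN the per-series coefficients are learned).

```python
import torch

def batched_exp_smoothing(y, alpha, gamma, season_len):
    """Vectorized level/seasonality updates for a whole batch of series.

    y:     (batch, time) observed values (assumed positive, as in M4)
    alpha: (batch,) per-series level smoothing coefficients in (0, 1)
    gamma: (batch,) per-series seasonality smoothing coefficients in (0, 1)
    Returns per-step levels and seasonal factors, both (batch, time).
    """
    batch, T = y.shape
    levels = torch.empty(batch, T)
    seasons = torch.ones(batch, T + season_len)          # initial seasonal factors = 1
    levels[:, 0] = y[:, 0]
    for t in range(1, T):
        s = seasons[:, t]                                # seasonal factor aligned with step t
        level = alpha * y[:, t] / s + (1 - alpha) * levels[:, t - 1]
        seasons[:, t + season_len] = gamma * y[:, t] / level + (1 - gamma) * s
        levels[:, t] = level
    return levels, seasons[:, :T]

# Toy usage: 512 series of length 48, all updated in parallel at each time step.
y = torch.rand(512, 48) + 1.0
alpha = torch.full((512,), 0.5)
gamma = torch.full((512,), 0.3)
levels, seasons = batched_exp_smoothing(y, alpha, gamma, season_len=12)
```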

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Title XLNet: Generalized Autoregressive Pretraining for Language Understanding
Authors Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Abstract With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
Tasks Document Ranking, Language Modelling, Natural Language Inference, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentiment Analysis, Text Classification
Published 2019-06-19
URL https://arxiv.org/abs/1906.08237v2
PDF https://arxiv.org/pdf/1906.08237v2.pdf
PWC https://paperswithcode.com/paper/xlnet-generalized-autoregressive-pretraining
Repo https://github.com/graykode/xlnet-Pytorch
Framework pytorch
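
The central mechanism, permutation-based factorization, can be sketched in a few lines of PyTorch: sample a factorization order and allow each position to attend only to positions that come earlier in that order, so bidirectional context is covered in expectation without corrupting the input with masks. This is a rough illustration only; the actual XLNet adds two-stream attention, partial prediction, and Transformer-XL recurrence on top of it.

```python
import torch

def permutation_attention_mask(seq_len):
    """Sample a factorization order and build a (seq_len, seq_len) mask where
    mask[i, j] = 1 means token i may attend to token j (j precedes i in the order)."""
    perm = torch.randperm(seq_len)          # sampled factorization order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[perm] = torch.arange(seq_len)      # rank[i] = position of token i in the order
    mask = (rank.unsqueeze(1) > rank.unsqueeze(0)).float()
    return perm, mask

perm, mask = permutation_attention_mask(6)
print(perm)
print(mask)  # row i has ones exactly at the tokens preceding i in the sampled order
```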

Automatic Posterior Transformation for Likelihood-Free Inference

Title Automatic Posterior Transformation for Likelihood-Free Inference
Authors David S. Greenberg, Marcel Nonnenmacher, Jakob H. Macke
Abstract How can one perform Bayesian inference on stochastic simulators with intractable likelihoods? A recent approach is to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. However, existing methods are limited to a narrow range of proposal distributions or require importance weighting that can limit performance in practice. Here we present automatic posterior transformation (APT), a new sequential neural posterior estimation method for simulation-based inference. APT can modify the posterior estimate using arbitrary, dynamically updated proposals, and is compatible with powerful flow-based density estimators. It is more flexible, scalable and efficient than previous simulation-based inference techniques. APT can operate directly on high-dimensional time series and image data, opening up new applications for likelihood-free inference.
Tasks Bayesian Inference, Time Series
Published 2019-05-17
URL https://arxiv.org/abs/1905.07488v1
PDF https://arxiv.org/pdf/1905.07488v1.pdf
PWC https://paperswithcode.com/paper/automatic-posterior-transformation-for
Repo https://github.com/mackelab/sbi
Framework pytorch
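
The released implementation is part of the mackelab/sbi package. As a framework-agnostic illustration of the underlying neural posterior estimation loop (without APT's proposal correction), the sketch below trains a conditional Gaussian density estimator q(theta | x) on simulated parameter/data pairs; the toy simulator and network sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Toy simulator standing in for an intractable-likelihood model: x = theta + noise.
def simulate(theta):
    return theta + 0.1 * torch.randn_like(theta)

class ConditionalGaussian(nn.Module):
    """q(theta | x) as a diagonal Gaussian whose mean/log-std come from a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2 * dim))
    def log_prob(self, theta, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp()).log_prob(theta).sum(-1)

dim = 2
prior = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))
q = ConditionalGaussian(dim)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)

# One round of posterior estimation: draw parameters, simulate, maximize q(theta | x).
for step in range(500):
    theta = prior.sample((256,))
    x = simulate(theta)
    loss = -q.log_prob(theta, x).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The estimated posterior for an observation x_o is then q(theta | x_o).
```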

Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control

Title Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control
Authors Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty
Abstract In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system, given by an ordinary differential equation (ODE), from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way, which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12077v3
PDF https://arxiv.org/pdf/1909.12077v3.pdf
PWC https://paperswithcode.com/paper/symplectic-ode-net-learning-hamiltonian
Repo https://github.com/d-biswa/Symplectic-ODENet
Framework pytorch
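
A minimal sketch of the physics-informed structure the abstract describes: a neural Hamiltonian H(q, p) and an input gain g(q), with dq/dt = dH/dp and dp/dt = -dH/dq + g(q)u obtained through autograd. Names and network sizes are assumptions; the actual SymODEN also handles embedded coordinates and velocity-only observations, and integrates these derivatives with a differentiable ODE solver.

```python
import torch
import torch.nn as nn

class HamiltonianDynamics(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.g = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
        self.dim = dim

    def forward(self, q, p, u):
        """Return (dq/dt, dp/dt) under Hamiltonian dynamics with control input u."""
        qp = torch.cat([q, p], dim=-1).requires_grad_(True)
        H = self.H(qp).sum()
        dH = torch.autograd.grad(H, qp, create_graph=True)[0]
        dHdq, dHdp = dH[..., :self.dim], dH[..., self.dim:]
        dqdt = dHdp
        dpdt = -dHdq + self.g(q) * u      # control enters only the momentum equation
        return dqdt, dpdt

model = HamiltonianDynamics(dim=1)
q, p, u = torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 1)
dqdt, dpdt = model(q, p, u)   # derivatives that an ODE solver (e.g. torchdiffeq) would integrate
```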

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Title Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems
Authors Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang
Abstract Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category’s representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.
Tasks Recommendation Systems
Published 2019-09-04
URL https://arxiv.org/abs/1909.02107v1
PDF https://arxiv.org/pdf/1909.02107v1.pdf
PWC https://paperswithcode.com/paper/compositional-embeddings-using-complementary
Repo https://github.com/facebookresearch/dlrm
Framework pytorch
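
One concrete instance of a complementary partition is the quotient-remainder construction: two small tables indexed by id // m and id % m, whose rows are combined (element-wise multiplied here) into a unique embedding per category. The sketch below illustrates the idea and is not the DLRM implementation.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Compositional embedding via the quotient-remainder complementary partition."""
    def __init__(self, num_categories, embedding_dim, num_buckets):
        super().__init__()
        num_quotients = (num_categories + num_buckets - 1) // num_buckets
        self.remainder = nn.Embedding(num_buckets, embedding_dim)
        self.quotient = nn.Embedding(num_quotients, embedding_dim)
        self.num_buckets = num_buckets

    def forward(self, ids):
        # Each category maps to a unique (quotient, remainder) pair, hence a unique embedding,
        # while storing roughly 2*sqrt(N) rows instead of N when num_buckets ~ sqrt(N).
        return self.remainder(ids % self.num_buckets) * self.quotient(ids // self.num_buckets)

emb = QREmbedding(num_categories=10_000_000, embedding_dim=16, num_buckets=3163)
vecs = emb(torch.tensor([0, 1, 9_999_999]))   # (3, 16)
```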

Bayesian Temporal Factorization for Multidimensional Time Series Prediction

Title Bayesian Temporal Factorization for Multidimensional Time Series Prediction
Authors Lijun Sun, Xinyu Chen
Abstract Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality. Making predictions on these time series has become a critical challenge due not only to their large-scale, high-dimensional nature but also to the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series—in particular spatiotemporal data—in the presence of missing values. By integrating low-rank matrix/tensor factorization and a vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing those missing values. We develop efficient Gibbs sampling algorithms for model inference and test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and short-term/long-term rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over many state-of-the-art techniques.
Tasks Imputation, Time Series, Time Series Prediction
Published 2019-10-14
URL https://arxiv.org/abs/1910.06366v1
PDF https://arxiv.org/pdf/1910.06366v1.pdf
PWC https://paperswithcode.com/paper/bayesian-temporal-factorization-for
Repo https://github.com/xinychen/awesome-latex-drawing
Framework none
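
To make the model structure concrete, the sketch below shows the point-forecast step implied by the abstract, assuming the spatial factors, temporal factors, and VAR coefficients have already been inferred (e.g. as posterior means from the Gibbs sampler); the variable names are assumptions for the example.

```python
import numpy as np

def btf_forecast(W, X, A_list, horizon):
    """Roll the VAR on temporal factors forward and map back through spatial factors.

    W:      (locations, rank) spatial factor matrix
    X:      (rank, time) temporal factor matrix
    A_list: list of (rank, rank) VAR coefficient matrices [A_1, ..., A_d]
    Returns predicted observations of shape (locations, horizon).
    """
    X = X.copy()
    preds = []
    for _ in range(horizon):
        x_next = sum(A @ X[:, -k - 1] for k, A in enumerate(A_list))  # VAR(d) step
        X = np.column_stack([X, x_next])
        preds.append(W @ x_next)
    return np.stack(preds, axis=1)

# Toy usage with random factors (in BTF these come from the Gibbs sampling posterior).
rng = np.random.default_rng(0)
W = rng.normal(size=(30, 5))
X = rng.normal(size=(5, 100))
A_list = [0.5 * np.eye(5), 0.2 * np.eye(5)]
y_hat = btf_forecast(W, X, A_list, horizon=7)   # (30, 7)
```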

Scalable Realistic Recommendation Datasets through Fractal Expansions

Title Scalable Realistic Recommendation Datasets through Fractal Expansions
Authors Francois Belletti, Karthik Lakshmanan, Walid Krichene, Yi-Fan Chen, John Anderson
Abstract Recommender System research suffers currently from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items) and non-zero values (interactions) while preserving key higher order statistical properties. We adapt Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 864K items and 2 million users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items and 17 million users.
Tasks Recommendation Systems
Published 2019-01-23
URL http://arxiv.org/abs/1901.08910v3
PDF http://arxiv.org/pdf/1901.08910v3.pdf
PWC https://paperswithcode.com/paper/scalable-realistic-recommendation-datasets
Repo https://github.com/mlperf/training/tree/master/data_generation
Framework pytorch
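
The core operation can be pictured as a Kronecker product between a reduced version of the user/item incidence matrix and the original matrix, which multiplies the numbers of users, items, and interactions while reusing the original interaction pattern at two scales. A minimal SciPy sketch follows; the coarse block-averaging used to build the reduced matrix here is a stand-in for the paper's more careful reduction, not the MLPerf generator.

```python
import numpy as np
from scipy.sparse import csr_matrix, kron

# Toy user/item incidence matrix (rows = users, cols = items, nonzeros = interactions).
rng = np.random.default_rng(0)
R = csr_matrix((rng.random((100, 50)) > 0.9).astype(np.float32))

# Build a small "reduced" pattern matrix by block-summing and thresholding the original,
# then fractally expand: each nonzero block of the reduced matrix becomes a copy of R.
blocks = np.add.reduceat(np.add.reduceat(R.toarray(), np.arange(0, 100, 20), axis=0),
                         np.arange(0, 50, 10), axis=1)
reduced = csr_matrix((blocks > 5).astype(np.float32))          # (5, 5) reduced pattern
expanded = kron(reduced, R)                                    # (500, 250) synthetic incidence matrix
print(R.shape, reduced.shape, expanded.shape, expanded.nnz)
```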

Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition

Title Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition
Authors Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian
Abstract Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
Tasks Pose Prediction, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-04-26
URL http://arxiv.org/abs/1904.12659v1
PDF http://arxiv.org/pdf/1904.12659v1.pdf
PWC https://paperswithcode.com/paper/actional-structural-graph-convolutional
Repo https://github.com/limaosen0/AS-GCN
Framework pytorch
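
A hedged sketch of the basic building block named in the abstract: a spatial graph convolution over joints followed by a temporal convolution, in the common (batch, channels, time, joints) skeleton layout. The fixed adjacency here is a placeholder; AS-GCN additionally aggregates over learned actional and structural links.

```python
import torch
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """Graph convolution over joints followed by a temporal convolution."""
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)               # (joints, joints), normalized
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.temporal = nn.Conv2d(out_channels, out_channels, kernel_size=(9, 1), padding=(4, 0))

    def forward(self, x):
        # x: (batch, channels, time, joints)
        x = self.spatial(x)
        x = torch.einsum("bctv,vw->bctw", x, self.A)        # aggregate features over linked joints
        return torch.relu(self.temporal(x))

joints = 25
A = torch.eye(joints)                                        # placeholder adjacency matrix
block = SpatialTemporalBlock(3, 64, A)
out = block(torch.randn(2, 3, 100, joints))                  # (2, 64, 100, 25)
```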

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

Title Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Authors Mathew Monfort, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Abstract An event happening in the world is often made of different activities and actions that can unfold simultaneously or sequentially within a few seconds. However, most large-scale datasets built to train models for action recognition provide a single label per video clip. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not learn the full spectrum of information that would be mandatory to more completely comprehend different events and eventually learn causality between them. Towards this goal, we augmented the existing video dataset, Moments in Time (MiT), to include over two million action labels for over one million three second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long tail multi-label learning and provide improved methods for visualizing and interpreting models trained for multi-label action detection.
Tasks Action Detection, Multi-Label Learning, Video Understanding
Published 2019-11-01
URL https://arxiv.org/abs/1911.00232v3
PDF https://arxiv.org/pdf/1911.00232v3.pdf
PWC https://paperswithcode.com/paper/multi-moments-in-time-learning-and
Repo https://github.com/zhoubolei/moments_models
Framework pytorch
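
The abstract mentions loss functions adapted for long-tailed multi-label learning. A simple, commonly used baseline of that kind is per-class weighted binary cross-entropy, where rare classes receive larger positive weights; the weighting below is a generic illustration and not necessarily the paper's exact loss.

```python
import torch
import torch.nn as nn

num_classes = 313                                           # assumed label count; use the dataset's actual value
# Hypothetical per-class label frequencies over the training set (the tail has few positives).
class_counts = torch.randint(50, 50_000, (num_classes,)).float()
pos_weight = class_counts.max() / class_counts              # up-weight positives of rare classes

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_classes)                        # model outputs for a batch of clips
targets = (torch.rand(8, num_classes) < 0.02).float()       # multi-hot action labels
loss = criterion(logits, targets)
```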

r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection

Title r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
Authors Kai Nakamura, Sharon Levy, William Yang Wang
Abstract Fake news has altered society in negative ways in politics and culture. It has adversely affected both online social network systems as well as offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through distant supervision. We construct hybrid text+image models and perform extensive experiments for multiple variations of classification, demonstrating the importance of the novel aspect of multimodality and fine-grained classification unique to Fakeddit.
Tasks Fake News Detection
Published 2019-11-10
URL https://arxiv.org/abs/1911.03854v2
PDF https://arxiv.org/pdf/1911.03854v2.pdf
PWC https://paperswithcode.com/paper/rfakeddit-a-new-multimodal-benchmark-dataset
Repo https://github.com/entitize/fakeddit
Framework none
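
A minimal sketch of a hybrid text+image classifier of the kind the abstract describes: encode the two modalities separately, concatenate the pooled features, and predict the 2-, 3-, or 6-way label. The feature dimensions and fusion head are placeholder assumptions; the paper evaluates several specific text and image backbones.

```python
import torch
import torch.nn as nn

class HybridFakeNewsClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, num_classes=6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_features, image_features):
        # text_features / image_features are pooled outputs of any text / image encoder.
        return self.fuse(torch.cat([text_features, image_features], dim=-1))

model = HybridFakeNewsClassifier(num_classes=6)             # 2-, 3-, or 6-way depending on the label set
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
```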

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

Title Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Authors Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu
Abstract With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in generating longer, richer and more fine-grained sentences and paragraphs as image descriptions. Image captioning can be translated to the task of sequential language prediction given visual content, where the output sequence forms natural language description with plausible grammar. However, existing image captioning methods focus only on the language policy and not the visual policy, and thus fail to capture the visual context that is crucial for compositional reasoning such as object relationships (e.g., “man riding horse”) and visual comparisons (e.g., “small(er) cat”). This issue is especially severe when generating longer sequences such as a paragraph. To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language generation: image sentence captioning and image paragraph captioning. During captioning, CAVP explicitly considers the previous visual attentions as context, and decides whether the context is used for the current word/sentence generation given the current visual attention. Compared with the traditional visual attention mechanism, which fixes only a single visual region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model – CAVP and its subsequent language policy network – can be efficiently optimized end-to-end by using an actor-critic policy gradient method. We have demonstrated the effectiveness of CAVP by state-of-the-art performance on the MS-COCO and Stanford captioning datasets, using various metrics and sensible visualizations of qualitative visual context.
Tasks Image Captioning, Image Paragraph Captioning, Text Generation
Published 2019-06-06
URL https://arxiv.org/abs/1906.02365v1
PDF https://arxiv.org/pdf/1906.02365v1.pdf
PWC https://paperswithcode.com/paper/context-aware-visual-policy-network-for-fine
Repo https://github.com/daqingliu/CAVP
Framework pytorch
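
The distinguishing idea is treating previously attended visual features as context for the current decoding step. The sketch below shows one hedged way to realize that in PyTorch, keeping a running history of past attention outputs and attending over it together with the current attention result; it is an illustration of the idea, not the CAVP architecture.

```python
import torch
import torch.nn as nn

class ContextAwareAttention(nn.Module):
    """Attend over current region features, then over the history of past attention results."""
    def __init__(self, dim):
        super().__init__()
        self.region_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.context_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, query, regions, history):
        # query: (batch, 1, dim) decoder state; regions: (batch, R, dim); history: (batch, t, dim)
        visual, _ = self.region_attn(query, regions, regions)
        context = torch.cat([history, visual], dim=1)         # past attentions plus the current one
        fused, _ = self.context_attn(visual, context, context)
        return fused, context                                 # updated history for the next step

attn = ContextAwareAttention(dim=512)
q, regions, history = torch.randn(2, 1, 512), torch.randn(2, 36, 512), torch.zeros(2, 1, 512)
out, history = attn(q, regions, history)
```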

Towards Accurate One-Stage Object Detection with AP-Loss

Title Towards Accurate One-Stage Object Detection with AP-Loss
Authors Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou
Abstract One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously, with the former suffering greatly from the extreme foreground-background class imbalance caused by the large number of anchors. This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. Due to its non-differentiability and non-convexity, the AP-loss cannot be optimized directly. For this purpose, we develop a novel optimization algorithm, which seamlessly combines the error-driven update scheme in perceptron learning and the backpropagation algorithm in deep networks. We verify the good convergence properties of the proposed algorithm theoretically and empirically. Experimental results demonstrate notable performance improvement in state-of-the-art one-stage detectors based on AP-loss over different kinds of classification-losses on various benchmarks, without changing the network architectures. Code is available at https://github.com/cccorn/AP-loss.
Tasks Object Detection
Published 2019-04-12
URL https://arxiv.org/abs/1904.06373v3
PDF https://arxiv.org/pdf/1904.06373v3.pdf
PWC https://paperswithcode.com/paper/towards-accurate-one-stage-object-detection
Repo https://github.com/cccorn/AP-loss
Framework pytorch
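
The optimization trick described above is to combine a perceptron-style, error-driven update for the non-differentiable ranking loss with ordinary backpropagation through the network. One hedged way to picture this in PyTorch is to compute a hand-crafted error signal on the scores outside autograd and inject it with backward(gradient=...); the pairwise-ranking error below is a simplified surrogate, not the paper's exact AP-loss update.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                        # stand-in for a detector's classification head
features = torch.randn(32, 16)
labels = torch.randint(0, 2, (32,)).float()     # 1 = foreground anchor, 0 = background anchor

scores = model(features).squeeze(-1)

# Error-driven signal computed outside autograd: push each positive score above each negative one.
with torch.no_grad():
    pos, neg = labels.bool(), ~labels.bool()
    diff = scores[neg].unsqueeze(0) - scores[pos].unsqueeze(1)   # (P, N) pairwise margins
    viol = (diff > 0).float()                                    # ranking violations
    denom = max(viol.numel(), 1)
    error = torch.zeros_like(scores)
    error[pos] = -viol.sum(dim=1) / denom                        # positives pushed up
    error[neg] = viol.sum(dim=0) / denom                         # negatives pushed down

# Inject the error as the upstream gradient of the scores and backpropagate it through the network.
scores.backward(gradient=error)
```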

Forgetting to learn logic programs

Title Forgetting to learn logic programs
Authors Andrew Cropper
Abstract Most program induction approaches require predefined, often hand-engineered, background knowledge (BK). To overcome this limitation, we explore methods to automatically acquire BK through multi-task learning. In this approach, a learner adds learned programs to its BK so that they can be reused to help learn other programs. To improve learning performance, we explore the idea of forgetting, where a learner can additionally remove programs from its BK. We consider forgetting in an inductive logic programming (ILP) setting. We show that forgetting can significantly reduce both the size of the hypothesis space and the sample complexity of an ILP learner. We introduce Forgetgol, a multi-task ILP learner which supports forgetting. We experimentally compare Forgetgol against approaches that either remember or forget everything. Our experimental results show that Forgetgol outperforms the alternative approaches when learning from over 10,000 tasks.
Tasks Multi-Task Learning
Published 2019-11-15
URL https://arxiv.org/abs/1911.06643v1
PDF https://arxiv.org/pdf/1911.06643v1.pdf
PWC https://paperswithcode.com/paper/forgetting-to-learn-logic-programs
Repo https://github.com/metagol/metagol
Framework none

Root Mean Square Layer Normalization

Title Root Mean Square Layer Normalization
Authors Biao Zhang, Rico Sennrich
Abstract Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability to handle re-centering and re-scaling of both inputs and weight matrices. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, RNNs in particular. In this paper, we hypothesize that the re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to the root mean square (RMS), giving the model a re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm, where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models. Source code is available at https://github.com/bzhangGo/rmsnorm.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07467v1
PDF https://arxiv.org/pdf/1910.07467v1.pdf
PWC https://paperswithcode.com/paper/root-mean-square-layer-normalization
Repo https://github.com/bzhangGo/rmsnorm
Framework pytorch
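
The formulation is compact enough to restate as code: normalize by the root mean square of the inputs (no mean subtraction or bias, unlike LayerNorm) and rescale with a learned gain. The minimal module below is a sketch for reference; the authors' implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """y = x / RMS(x) * g, with RMS(x) = sqrt(mean(x^2) + eps); no re-centering, no bias."""
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain

norm = RMSNorm(512)
y = norm(torch.randn(4, 10, 512))    # normalized over the last (feature) dimension
```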

Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification

Title Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
Authors Hao Ren, Jianlin Su, Hong Lu
Abstract Image classification is a challenging problem which aims to identify the category of the object in an image. In recent years, deep Convolutional Neural Networks (CNNs) have been applied to handle this task, and impressive improvement has been achieved. However, some research showed the output of CNNs can be easily altered by adding relatively small perturbations to the input image, such as modifying a few pixels. Recently, Capsule Networks (CapsNets) were proposed, which can help eliminate this limitation. Experiments on the MNIST dataset revealed that capsules can better characterize the features of objects than CNNs. But it is hard to find a suitable quantitative method to compare the generalization ability of CNNs and CapsNets. In this paper, we propose a new image classification task called Top-2 classification to evaluate the generalization ability of CNNs and CapsNets. The models are trained on single-label image samples, as in the traditional image classification task. But in the test stage, we randomly concatenate two test image samples which contain different labels, and then use the trained models to predict the top-2 labels on the unseen, newly created two-label image samples. This task provides precise quantitative results to compare the generalization ability of CNNs and CapsNets. Because CapsNet uses a Full Connectivity (FC) mechanism among all capsules, it requires many parameters. To reduce the number of parameters, we introduce the Parameter-Sharing (PS) mechanism between capsules. Experiments on five widely used benchmark image datasets demonstrate that the method significantly reduces the number of parameters, without losing the effectiveness of extracting features. Further, on the Top-2 classification task, the proposed PS CapsNets obtain substantially higher accuracy than the traditional CNNs and FC CapsNets.
Tasks Image Classification
Published 2019-01-29
URL http://arxiv.org/abs/1901.10112v2
PDF http://arxiv.org/pdf/1901.10112v2.pdf
PWC https://paperswithcode.com/paper/evaluating-generalization-ability-of
Repo https://github.com/leftthomas/PSCapsNet
Framework pytorch
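
The evaluation protocol is easy to make concrete: train on single-label images as usual, then at test time concatenate two images with different labels and check whether the model's two highest-scoring classes match the two ground-truth labels in either order. A hedged sketch of that test-time step, assuming a model that accepts the concatenated input:

```python
import torch

def top2_accuracy(model, x1, y1, x2, y2):
    """Concatenate pairs of single-label images along width and score top-2 predictions.

    x1, x2: (batch, C, H, W) images whose labels y1, y2 differ within each pair.
    The model is assumed to handle the wider concatenated input (e.g. via global pooling).
    """
    combined = torch.cat([x1, x2], dim=-1)                  # two-label test sample
    logits = model(combined)
    top2 = logits.topk(2, dim=-1).indices                   # predicted top-2 classes
    truth = torch.stack([y1, y2], dim=-1)
    # Correct iff the predicted pair equals the true pair, in either order.
    hit = (top2.sort(dim=-1).values == truth.sort(dim=-1).values).all(dim=-1)
    return hit.float().mean().item()
```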