February 2, 2020

3294 words 16 mins read

Paper Group AWR 44

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Automatic Posterior Transformation for Likelihood-Free Inference. Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommenda …

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm

Title Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm
Authors Andrew Redd, Kaung Khin, Aldo Marini
Abstract Because time series data are so prevalent, forecasting them is crucial in multiple domains. We seek to make state-of-the-art forecasting fast, accessible, and generalizable. ES-RNN is a hybrid between classical state space forecasting models and modern RNNs that achieved a 9.4% sMAPE improvement in the M4 competition. Crucially, the ES-RNN implementation requires per-time-series parameters. By vectorizing the original implementation and porting the algorithm to a GPU, we achieve up to a 322x training speedup, depending on batch size, with results similar to those reported in the original submission. Our code can be found at: https://github.com/damitkwr/ESRNN-GPU
Tasks Time Series, Time Series Forecasting
Published 2019-07-07
URL https://arxiv.org/abs/1907.03329v1
PDF https://arxiv.org/pdf/1907.03329v1.pdf
PWC https://paperswithcode.com/paper/fast-es-rnn-a-gpu-implementation-of-the-es
Repo https://github.com/damitkwr/ESRNN-GPU
Framework pytorch
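
The speedup reported above comes largely from vectorizing the per-series smoothing parameters so that every series in a batch is updated in parallel on the GPU. Below is a minimal PyTorch sketch of that idea, batched Holt-Winters-style level and seasonality updates; the tensor names, shapes, and fixed coefficients are illustrative assumptions, not the authors' implementation (in ES-RNN the per-series coefficients are learned).

```python
import torch

def batched_exp_smoothing(y, alpha, gamma, season_len):
    """Vectorized level/seasonality updates for a whole batch of series.

    y:     (batch, time) observed values (assumed positive, as in M4)
    alpha: (batch,) per-series level smoothing coefficients in (0, 1)
    gamma: (batch,) per-series seasonality smoothing coefficients in (0, 1)
    Returns per-step levels and seasonal factors, both (batch, time).
    """
    batch, T = y.shape
    levels = torch.empty(batch, T)
    seasons = torch.ones(batch, T + season_len)          # initial seasonal factors = 1
    levels[:, 0] = y[:, 0]
    for t in range(1, T):
        s = seasons[:, t]                                # seasonal factor aligned with step t
        level = alpha * y[:, t] / s + (1 - alpha) * levels[:, t - 1]
        seasons[:, t + season_len] = gamma * y[:, t] / level + (1 - gamma) * s
        levels[:, t] = level
    return levels, seasons[:, :T]

# Toy usage: 512 series of length 48, all updated in parallel at each time step.
y = torch.rand(512, 48) + 1.0
alpha = torch.full((512,), 0.5)
gamma = torch.full((512,), 0.3)
levels, seasons = batched_exp_smoothing(y, alpha, gamma, season_len=12)
```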

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Title XLNet: Generalized Autoregressive Pretraining for Language Understanding
Authors Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Abstract With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
Tasks Document Ranking, Language Modelling, Natural Language Inference, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentiment Analysis, Text Classification
Published 2019-06-19
URL https://arxiv.org/abs/1906.08237v2
PDF https://arxiv.org/pdf/1906.08237v2.pdf
PWC https://paperswithcode.com/paper/xlnet-generalized-autoregressive-pretraining
Repo https://github.com/graykode/xlnet-Pytorch
Framework pytorch
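
The central mechanism, permutation-based factorization, can be sketched in a few lines of PyTorch: sample a factorization order and allow each position to attend only to positions that come earlier in that order, so bidirectional context is covered in expectation without corrupting the input with masks. This is a rough illustration only; the actual XLNet adds two-stream attention, partial prediction, and Transformer-XL recurrence on top of it.

```python
import torch

def permutation_attention_mask(seq_len):
    """Sample a factorization order and build a (seq_len, seq_len) mask where
    mask[i, j] = 1 means token i may attend to token j (j precedes i in the order)."""
    perm = torch.randperm(seq_len)          # sampled factorization order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[perm] = torch.arange(seq_len)      # rank[i] = position of token i in the order
    mask = (rank.unsqueeze(1) > rank.unsqueeze(0)).float()
    return perm, mask

perm, mask = permutation_attention_mask(6)
print(perm)
print(mask)  # row i has ones exactly at the tokens preceding i in the sampled order
```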

Automatic Posterior Transformation for Likelihood-Free Inference

Title Automatic Posterior Transformation for Likelihood-Free Inference
Authors David S. Greenberg, Marcel Nonnenmacher, Jakob H. Macke
Abstract How can one perform Bayesian inference on stochastic simulators with intractable likelihoods? A recent approach is to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. However, existing methods are limited to a narrow range of proposal distributions or require importance weighting that can limit performance in practice. Here we present automatic posterior transformation (APT), a new sequential neural posterior estimation method for simulation-based inference. APT can modify the posterior estimate using arbitrary, dynamically updated proposals, and is compatible with powerful flow-based density estimators. It is more flexible, scalable and efficient than previous simulation-based inference techniques. APT can operate directly on high-dimensional time series and image data, opening up new applications for likelihood-free inference.
Tasks Bayesian Inference, Time Series
Published 2019-05-17
URL https://arxiv.org/abs/1905.07488v1
PDF https://arxiv.org/pdf/1905.07488v1.pdf
PWC https://paperswithcode.com/paper/automatic-posterior-transformation-for
Repo https://github.com/mackelab/sbi
Framework pytorch
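
The released implementation is part of the mackelab/sbi package. As a framework-agnostic illustration of the underlying neural posterior estimation loop (without APT's proposal correction), the sketch below trains a conditional Gaussian density estimator q(theta | x) on simulated parameter/data pairs; the toy simulator and network sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Toy simulator standing in for an intractable-likelihood model: x = theta + noise.
def simulate(theta):
    return theta + 0.1 * torch.randn_like(theta)

class ConditionalGaussian(nn.Module):
    """q(theta | x) as a diagonal Gaussian whose mean/log-std come from a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2 * dim))
    def log_prob(self, theta, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp()).log_prob(theta).sum(-1)

dim = 2
prior = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))
q = ConditionalGaussian(dim)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)

# One round of posterior estimation: draw parameters, simulate, maximize q(theta | x).
for step in range(500):
    theta = prior.sample((256,))
    x = simulate(theta)
    loss = -q.log_prob(theta, x).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The estimated posterior for an observation x_o is then q(theta | x_o).
```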

Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control

Title Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control
Authors Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty
Abstract In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learning framework which can infer the dynamics of a physical system, given by an ordinary differential equation (ODE), from observed state trajectories. To achieve better generalization with fewer training samples, SymODEN incorporates appropriate inductive bias by designing the associated computation graph in a physics-informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the underlying dynamics in a transparent way, which can then be leveraged to draw insight about relevant physical aspects of the system, such as mass and potential energy. In addition, we propose a parametrization which can enforce this Hamiltonian formalism even when the generalized coordinate data is embedded in a high-dimensional space or we can only access velocity data instead of generalized momentum. This framework, by offering interpretable, physically-consistent models for physical systems, opens up new possibilities for synthesizing model-based control strategies.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12077v3
PDF https://arxiv.org/pdf/1909.12077v3.pdf
PWC https://paperswithcode.com/paper/symplectic-ode-net-learning-hamiltonian
Repo https://github.com/d-biswa/Symplectic-ODENet
Framework pytorch
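
A minimal sketch of the physics-informed structure the abstract describes: a neural Hamiltonian H(q, p) and an input gain g(q), with dq/dt = dH/dp and dp/dt = -dH/dq + g(q)u obtained through autograd. Names and network sizes are assumptions; the actual SymODEN also handles embedded coordinates and velocity-only observations, and integrates these derivatives with a differentiable ODE solver.

```python
import torch
import torch.nn as nn

class HamiltonianDynamics(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.g = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
        self.dim = dim

    def forward(self, q, p, u):
        """Return (dq/dt, dp/dt) under Hamiltonian dynamics with control input u."""
        qp = torch.cat([q, p], dim=-1).requires_grad_(True)
        H = self.H(qp).sum()
        dH = torch.autograd.grad(H, qp, create_graph=True)[0]
        dHdq, dHdp = dH[..., :self.dim], dH[..., self.dim:]
        dqdt = dHdp
        dpdt = -dHdq + self.g(q) * u      # control enters only the momentum equation
        return dqdt, dpdt

model = HamiltonianDynamics(dim=1)
q, p, u = torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 1)
dqdt, dpdt = model(q, p, u)   # derivatives that an ODE solver (e.g. torchdiffeq) would integrate
```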

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Title Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems
Authors Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang
Abstract Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category’s representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.
Tasks Recommendation Systems
Published 2019-09-04
URL https://arxiv.org/abs/1909.02107v1
PDF https://arxiv.org/pdf/1909.02107v1.pdf
PWC https://paperswithcode.com/paper/compositional-embeddings-using-complementary
Repo https://github.com/facebookresearch/dlrm
Framework pytorch
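
One concrete instance of a complementary partition is the quotient-remainder construction: two small tables indexed by id // m and id % m, whose rows are combined (element-wise multiplied here) into a unique embedding per category. The sketch below illustrates the idea and is not the DLRM implementation.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Compositional embedding via the quotient-remainder complementary partition."""
    def __init__(self, num_categories, embedding_dim, num_buckets):
        super().__init__()
        num_quotients = (num_categories + num_buckets - 1) // num_buckets
        self.remainder = nn.Embedding(num_buckets, embedding_dim)
        self.quotient = nn.Embedding(num_quotients, embedding_dim)
        self.num_buckets = num_buckets

    def forward(self, ids):
        # Each category maps to a unique (quotient, remainder) pair, hence a unique embedding,
        # while storing roughly 2*sqrt(N) rows instead of N when num_buckets ~ sqrt(N).
        return self.remainder(ids % self.num_buckets) * self.quotient(ids // self.num_buckets)

emb = QREmbedding(num_categories=10_000_000, embedding_dim=16, num_buckets=3163)
vecs = emb(torch.tensor([0, 1, 9_999_999]))   # (3, 16)
```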

Bayesian Temporal Factorization for Multidimensional Time Series Prediction

Title Bayesian Temporal Factorization for Multidimensional Time Series Prediction
Authors Lijun Sun, Xinyu Chen
Abstract Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality. Making predictions on these time series has become a critical challenge due not only to their large-scale, high-dimensional nature but also to the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series—in particular spatiotemporal data—in the presence of missing values. By integrating low-rank matrix/tensor factorization and a vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing those missing values. We develop efficient Gibbs sampling algorithms for model inference and test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and short-term/long-term rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over many state-of-the-art techniques.
Tasks Imputation, Time Series, Time Series Prediction
Published 2019-10-14
URL https://arxiv.org/abs/1910.06366v1
PDF https://arxiv.org/pdf/1910.06366v1.pdf
PWC https://paperswithcode.com/paper/bayesian-temporal-factorization-for
Repo https://github.com/xinychen/awesome-latex-drawing
Framework none
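
To make the model structure concrete, the sketch below shows the point-forecast step implied by the abstract, assuming the spatial factors, temporal factors, and VAR coefficients have already been inferred (e.g. as posterior means from the Gibbs sampler); the variable names are assumptions for the example.

```python
import numpy as np

def btf_forecast(W, X, A_list, horizon):
    """Roll the VAR on temporal factors forward and map back through spatial factors.

    W:      (locations, rank) spatial factor matrix
    X:      (rank, time) temporal factor matrix
    A_list: list of (rank, rank) VAR coefficient matrices [A_1, ..., A_d]
    Returns predicted observations of shape (locations, horizon).
    """
    X = X.copy()
    preds = []
    for _ in range(horizon):
        x_next = sum(A @ X[:, -k - 1] for k, A in enumerate(A_list))  # VAR(d) step
        X = np.column_stack([X, x_next])
        preds.append(W @ x_next)
    return np.stack(preds, axis=1)

# Toy usage with random factors (in BTF these come from the Gibbs sampling posterior).
rng = np.random.default_rng(0)
W = rng.normal(size=(30, 5))
X = rng.normal(size=(5, 100))
A_list = [0.5 * np.eye(5), 0.2 * np.eye(5)]
y_hat = btf_forecast(W, X, A_list, horizon=7)   # (30, 7)
```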

Scalable Realistic Recommendation Datasets through Fractal Expansions

Title Scalable Realistic Recommendation Datasets through Fractal Expansions
Authors Francois Belletti, Karthik Lakshmanan, Walid Krichene, Yi-Fan Chen, John Anderson
Abstract Recommender System research suffers currently from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items) and non-zero values (interactions) while preserving key higher order statistical properties. We adapt Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 864K items and 2 million users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items and 17 million users.
Tasks Recommendation Systems
Published 2019-01-23
URL http://arxiv.org/abs/1901.08910v3
PDF http://arxiv.org/pdf/1901.08910v3.pdf
PWC https://paperswithcode.com/paper/scalable-realistic-recommendation-datasets
Repo https://github.com/mlperf/training/tree/master/data_generation
Framework pytorch
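
The core operation can be pictured as a Kronecker product between a reduced version of the user/item incidence matrix and the original matrix, which multiplies the numbers of users, items, and interactions while reusing the original interaction pattern at two scales. A minimal SciPy sketch follows; the coarse block-averaging used to build the reduced matrix here is a stand-in for the paper's more careful reduction, not the MLPerf generator.

```python
import numpy as np
from scipy.sparse import csr_matrix, kron

# Toy user/item incidence matrix (rows = users, cols = items, nonzeros = interactions).
rng = np.random.default_rng(0)
R = csr_matrix((rng.random((100, 50)) > 0.9).astype(np.float32))

# Build a small "reduced" pattern matrix by block-summing and thresholding the original,
# then fractally expand: each nonzero block of the reduced matrix becomes a copy of R.
blocks = np.add.reduceat(np.add.reduceat(R.toarray(), np.arange(0, 100, 20), axis=0),
                         np.arange(0, 50, 10), axis=1)
reduced = csr_matrix((blocks > 5).astype(np.float32))          # (5, 5) reduced pattern
expanded = kron(reduced, R)                                    # (500, 250) synthetic incidence matrix
print(R.shape, reduced.shape, expanded.shape, expanded.nnz)
```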

Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition

Title Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition
Authors Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian
Abstract Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction.
Tasks Pose Prediction, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-04-26
URL http://arxiv.org/abs/1904.12659v1
PDF http://arxiv.org/pdf/1904.12659v1.pdf
PWC https://paperswithcode.com/paper/actional-structural-graph-convolutional
Repo https://github.com/limaosen0/AS-GCN
Framework pytorch
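
A hedged sketch of the basic building block named in the abstract: a spatial graph convolution over joints followed by a temporal convolution, in the common (batch, channels, time, joints) skeleton layout. The fixed adjacency here is a placeholder; AS-GCN additionally aggregates over learned actional and structural links.

```python
import torch
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """Graph convolution over joints followed by a temporal convolution."""
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)               # (joints, joints), normalized
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.temporal = nn.Conv2d(out_channels, out_channels, kernel_size=(9, 1), padding=(4, 0))

    def forward(self, x):
        # x: (batch, channels, time, joints)
        x = self.spatial(x)
        x = torch.einsum("bctv,vw->bctw", x, self.A)        # aggregate features over linked joints
        return torch.relu(self.temporal(x))

joints = 25
A = torch.eye(joints)                                        # placeholder adjacency matrix
block = SpatialTemporalBlock(3, 64, A)
out = block(torch.randn(2, 3, 100, joints))                  # (2, 64, 100, 25)
```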

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

Title Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Authors Mathew Monfort, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Abstract An event happening in the world is often made of different activities and actions that can unfold simultaneously or sequentially within a few seconds. However, most large-scale datasets built to train models for action recognition provide a single label per video clip. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not learn the full spectrum of information that would be mandatory to more completely comprehend different events and eventually learn causality between them. Towards this goal, we augmented the existing video dataset, Moments in Time (MiT), to include over two million action labels for over one million three second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long tail multi-label learning and provide improved methods for visualizing and interpreting models trained for multi-label action detection.
Tasks Action Detection, Multi-Label Learning, Video Understanding
Published 2019-11-01
URL https://arxiv.org/abs/1911.00232v3
PDF https://arxiv.org/pdf/1911.00232v3.pdf
PWC https://paperswithcode.com/paper/multi-moments-in-time-learning-and
Repo https://github.com/zhoubolei/moments_models
Framework pytorch
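
The abstract mentions loss functions adapted for long-tailed multi-label learning. A simple, commonly used baseline of that kind is per-class weighted binary cross-entropy, where rare classes receive larger positive weights; the weighting below is a generic illustration and not necessarily the paper's exact loss.

```python
import torch
import torch.nn as nn

num_classes = 313                                           # assumed label count; use the dataset's actual value
# Hypothetical per-class label frequencies over the training set (the tail has few positives).
class_counts = torch.randint(50, 50_000, (num_classes,)).float()
pos_weight = class_counts.max() / class_counts              # up-weight positives of rare classes

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_classes)                        # model outputs for a batch of clips
targets = (torch.rand(8, num_classes) < 0.02).float()       # multi-hot action labels
loss = criterion(logits, targets)
```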

r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection

Title r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
Authors Kai Nakamura, Sharon Levy, William Yang Wang
Abstract Fake news has altered society in negative ways in politics and culture. It has adversely affected both online social network systems as well as offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through distant supervision. We construct hybrid text+image models and perform extensive experiments for multiple variations of classification, demonstrating the importance of the novel aspect of multimodality and fine-grained classification unique to Fakeddit.
Tasks Fake News Detection
Published 2019-11-10
URL https://arxiv.org/abs/1911.03854v2
PDF https://arxiv.org/pdf/1911.03854v2.pdf
PWC https://paperswithcode.com/paper/rfakeddit-a-new-multimodal-benchmark-dataset
Repo https://github.com/entitize/fakeddit
Framework none
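
A minimal sketch of a hybrid text+image classifier of the kind the abstract describes: encode the two modalities separately, concatenate the pooled features, and predict the 2-, 3-, or 6-way label. The feature dimensions and fusion head are placeholder assumptions; the paper evaluates several specific text and image backbones.

```python
import torch
import torch.nn as nn

class HybridFakeNewsClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, num_classes=6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_features, image_features):
        # text_features / image_features are pooled outputs of any text / image encoder.
        return self.fuse(torch.cat([text_features, image_features], dim=-1))

model = HybridFakeNewsClassifier(num_classes=6)             # 2-, 3-, or 6-way depending on the label set
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
```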

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

Title Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Authors Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu
Abstract With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in generating longer, richer and more fine-grained sentences and paragraphs as image descriptions. Image captioning can be translated to the task of sequential language prediction given visual content, where the output sequence forms natural language description with plausible grammar. However, existing image captioning methods focus only on the language policy and not the visual policy, and thus fail to capture the visual context that is crucial for compositional reasoning such as object relationships (e.g., “man riding horse”) and visual comparisons (e.g., “small(er) cat”). This issue is especially severe when generating longer sequences such as a paragraph. To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language generation: image sentence captioning and image paragraph captioning. During captioning, CAVP explicitly considers the previous visual attentions as context, and decides whether the context is used for the current word/sentence generation given the current visual attention. Compared with the traditional visual attention mechanism, which fixes only a single visual region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model – CAVP and its subsequent language policy network – can be efficiently optimized end-to-end by using an actor-critic policy gradient method. We have demonstrated the effectiveness of CAVP by state-of-the-art performance on the MS-COCO and Stanford captioning datasets, using various metrics and sensible visualizations of qualitative visual context.
Tasks Image Captioning, Image Paragraph Captioning, Text Generation
Published 2019-06-06
URL https://arxiv.org/abs/1906.02365v1
PDF https://arxiv.org/pdf/1906.02365v1.pdf
PWC https://paperswithcode.com/paper/context-aware-visual-policy-network-for-fine
Repo https://github.com/daqingliu/CAVP
Framework pytorch
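
The distinguishing idea is treating previously attended visual features as context for the current decoding step. The sketch below shows one hedged way to realize that in PyTorch, keeping a running history of past attention outputs and attending over it together with the current attention result; it is an illustration of the idea, not the CAVP architecture.

```python
import torch
import torch.nn as nn

class ContextAwareAttention(nn.Module):
    """Attend over current region features, then over the history of past attention results."""
    def __init__(self, dim):
        super().__init__()
        self.region_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.context_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, query, regions, history):
        # query: (batch, 1, dim) decoder state; regions: (batch, R, dim); history: (batch, t, dim)
        visual, _ = self.region_attn(query, regions, regions)
        context = torch.cat([history, visual], dim=1)         # past attentions plus the current one
        fused, _ = self.context_attn(visual, context, context)
        return fused, context                                 # updated history for the next step

attn = ContextAwareAttention(dim=512)
q, regions, history = torch.randn(2, 1, 512), torch.randn(2, 36, 512), torch.zeros(2, 1, 512)
out, history = attn(q, regions, history)
```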

Towards Accurate One-Stage Object Detection with AP-Loss

Title Towards Accurate One-Stage Object Detection with AP-Loss
Authors Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou
Abstract One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously, with the former suffering greatly from the extreme foreground-background class imbalance caused by the large number of anchors. This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. Due to its non-differentiability and non-convexity, the AP-loss cannot be optimized directly. For this purpose, we develop a novel optimization algorithm, which seamlessly combines the error-driven update scheme in perceptron learning and the backpropagation algorithm in deep networks. We verify the good convergence properties of the proposed algorithm theoretically and empirically. Experimental results demonstrate notable performance improvement in state-of-the-art one-stage detectors based on AP-loss over different kinds of classification-losses on various benchmarks, without changing the network architectures. Code is available at https://github.com/cccorn/AP-loss.
Tasks Object Detection
Published 2019-04-12
URL https://arxiv.org/abs/1904.06373v3
PDF https://arxiv.org/pdf/1904.06373v3.pdf
PWC https://paperswithcode.com/paper/towards-accurate-one-stage-object-detection
Repo https://github.com/cccorn/AP-loss
Framework pytorch
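
The optimization trick described above is to combine a perceptron-style, error-driven update for the non-differentiable ranking loss with ordinary backpropagation through the network. One hedged way to picture this in PyTorch is to compute a hand-crafted error signal on the scores outside autograd and inject it with backward(gradient=...); the pairwise-ranking error below is a simplified surrogate, not the paper's exact AP-loss update.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                        # stand-in for a detector's classification head
features = torch.randn(32, 16)
labels = torch.randint(0, 2, (32,)).float()     # 1 = foreground anchor, 0 = background anchor

scores = model(features).squeeze(-1)

# Error-driven signal computed outside autograd: push each positive score above each negative one.
with torch.no_grad():
    pos, neg = labels.bool(), ~labels.bool()
    diff = scores[neg].unsqueeze(0) - scores[pos].unsqueeze(1)   # (P, N) pairwise margins
    viol = (diff > 0).float()                                    # ranking violations
    denom = max(viol.numel(), 1)
    error = torch.zeros_like(scores)
    error[pos] = -viol.sum(dim=1) / denom                        # positives pushed up
    error[neg] = viol.sum(dim=0) / denom                         # negatives pushed down

# Inject the error as the upstream gradient of the scores and backpropagate it through the network.
scores.backward(gradient=error)
```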

Forgetting to learn logic programs

Title Forgetting to learn logic programs
Authors Andrew Cropper
Abstract Most program induction approaches require predefined, often hand-engineered, background knowledge (BK). To overcome this limitation, we explore methods to automatically acquire BK through multi-task learning. In this approach, a learner adds learned programs to its BK so that they can be reused to help learn other programs. To improve learning performance, we explore the idea of forgetting, where a learner can additionally remove programs from its BK. We consider forgetting in an inductive logic programming (ILP) setting. We show that forgetting can significantly reduce both the size of the hypothesis space and the sample complexity of an ILP learner. We introduce Forgetgol, a multi-task ILP learner which supports forgetting. We experimentally compare Forgetgol against approaches that either remember or forget everything. Our experimental results show that Forgetgol outperforms the alternative approaches when learning from over 10,000 tasks.
Tasks Multi-Task Learning
Published 2019-11-15
URL https://arxiv.org/abs/1911.06643v1
PDF https://arxiv.org/pdf/1911.06643v1.pdf
PWC https://paperswithcode.com/paper/forgetting-to-learn-logic-programs
Repo https://github.com/metagol/metagol
Framework none

Root Mean Square Layer Normalization

Title Root Mean Square Layer Normalization
Authors Biao Zhang, Rico Sennrich
Abstract Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability to handle re-centering and re-scaling of both inputs and weight matrices. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, RNNs in particular. In this paper, we hypothesize that the re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to the root mean square (RMS), giving the model a re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm, where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models. Source code is available at https://github.com/bzhangGo/rmsnorm.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07467v1
PDF https://arxiv.org/pdf/1910.07467v1.pdf
PWC https://paperswithcode.com/paper/root-mean-square-layer-normalization
Repo https://github.com/bzhangGo/rmsnorm
Framework pytorch
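
The formulation is compact enough to restate as code: normalize by the root mean square of the inputs (no mean subtraction or bias, unlike LayerNorm) and rescale with a learned gain. The minimal module below is a sketch for reference; the authors' implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """y = x / RMS(x) * g, with RMS(x) = sqrt(mean(x^2) + eps); no re-centering, no bias."""
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain

norm = RMSNorm(512)
y = norm(torch.randn(4, 10, 512))    # normalized over the last (feature) dimension
```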

Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification

Title Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
Authors Hao Ren, Jianlin Su, Hong Lu
Abstract Image classification is a challenging problem which aims to identify the category of the object in an image. In recent years, deep Convolutional Neural Networks (CNNs) have been applied to handle this task, and impressive improvement has been achieved. However, some research showed the output of CNNs can be easily altered by adding relatively small perturbations to the input image, such as modifying a few pixels. Recently, Capsule Networks (CapsNets) were proposed, which can help eliminate this limitation. Experiments on the MNIST dataset revealed that capsules can better characterize the features of objects than CNNs. But it is hard to find a suitable quantitative method to compare the generalization ability of CNNs and CapsNets. In this paper, we propose a new image classification task called Top-2 classification to evaluate the generalization ability of CNNs and CapsNets. The models are trained on single-label image samples, as in the traditional image classification task. But in the test stage, we randomly concatenate two test image samples which contain different labels, and then use the trained models to predict the top-2 labels on the unseen, newly created two-label image samples. This task provides precise quantitative results to compare the generalization ability of CNNs and CapsNets. Because CapsNet uses a Full Connectivity (FC) mechanism among all capsules, it requires many parameters. To reduce the number of parameters, we introduce the Parameter-Sharing (PS) mechanism between capsules. Experiments on five widely used benchmark image datasets demonstrate that the method significantly reduces the number of parameters, without losing the effectiveness of extracting features. Further, on the Top-2 classification task, the proposed PS CapsNets obtain substantially higher accuracy than the traditional CNNs and FC CapsNets.
Tasks Image Classification
Published 2019-01-29
URL http://arxiv.org/abs/1901.10112v2
PDF http://arxiv.org/pdf/1901.10112v2.pdf
PWC https://paperswithcode.com/paper/evaluating-generalization-ability-of
Repo https://github.com/leftthomas/PSCapsNet
Framework pytorch
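
The evaluation protocol is easy to make concrete: train on single-label images as usual, then at test time concatenate two images with different labels and check whether the model's two highest-scoring classes match the two ground-truth labels in either order. A hedged sketch of that test-time step, assuming a model that accepts the concatenated input:

```python
import torch

def top2_accuracy(model, x1, y1, x2, y2):
    """Concatenate pairs of single-label images along width and score top-2 predictions.

    x1, x2: (batch, C, H, W) images whose labels y1, y2 differ within each pair.
    The model is assumed to handle the wider concatenated input (e.g. via global pooling).
    """
    combined = torch.cat([x1, x2], dim=-1)                  # two-label test sample
    logits = model(combined)
    top2 = logits.topk(2, dim=-1).indices                   # predicted top-2 classes
    truth = torch.stack([y1, y2], dim=-1)
    # Correct iff the predicted pair equals the true pair, in either order.
    hit = (top2.sort(dim=-1).values == truth.sort(dim=-1).values).all(dim=-1)
    return hit.float().mean().item()
```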