February 1, 2020

3198 words 16 mins read

Paper Group AWR 112


On the adequacy of untuned warmup for adaptive optimization

Title On the adequacy of untuned warmup for adaptive optimization
Authors Jerry Ma, Denis Yarats
Abstract Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, Liu et al. (2019) propose automatic variance rectification of Adam’s adaptive learning rate, claiming that this rectified approach (“RAdam”) surpasses the vanilla Adam algorithm and reduces the need for expensive tuning of Adam with warmup. In this work, we point out various shortcomings of this analysis. We then provide an alternative explanation for the necessity of warmup based on the magnitude of the update term, which is of greater relevance to training stability. Finally, we provide some “rule-of-thumb” warmup schedules, and we demonstrate that simple untuned warmup of Adam performs more-or-less identically to RAdam in typical practical settings. We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over $2 / (1 - \beta_2)$ training iterations.
Tasks Image Classification, Language Modelling, Machine Translation
Published 2019-10-09
URL https://arxiv.org/abs/1910.04209v1
PDF https://arxiv.org/pdf/1910.04209v1.pdf
PWC https://paperswithcode.com/paper/on-the-adequacy-of-untuned-warmup-for
Repo https://github.com/Tony-Y/pytorch_warmup
Framework pytorch
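
As a concrete illustration of the suggested default, here is a minimal PyTorch sketch of linear warmup over $2 / (1 - \beta_2)$ iterations using a stock LambdaLR schedule; the model and loss are placeholders, and the sketch deliberately does not rely on the linked pytorch_warmup package’s API.

```python
import torch

# Minimal sketch of the paper's rule of thumb: linear warmup of Adam's
# learning rate over roughly 2 / (1 - beta2) iterations.
model = torch.nn.Linear(10, 1)          # placeholder model
beta2 = 0.999
warmup_steps = int(2 / (1 - beta2))     # = 2000 for the default beta2

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, beta2))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

for step in range(5000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()                    # scales lr linearly up to its nominal value
```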

Thompson Sampling for a Fatigue-aware Online Recommendation System

Title Thompson Sampling for a Fatigue-aware Online Recommendation System
Authors Yunjuan Wang, Theja Tulabandhula
Abstract In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the recommended items or abandoning the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures positional effects of these items as well as the abandoning behavior of users, the platform’s goal is to recommend sequences of items that are competitive with the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting leads to an exponential dependence on the number of items. We propose a new Thompson sampling based algorithm with expected regret that is polynomial in the number of items in this combinatorial setting, and which performs extremely well in practice.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07734v2
PDF http://arxiv.org/pdf/1901.07734v2.pdf
PWC https://paperswithcode.com/paper/thompson-sampling-for-a-fatigue-aware-online
Repo https://github.com/bettyttytty/Thompson-Sampling-for-a-Fatigue-aware-Online-Recommendation-System
Framework none
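
For orientation, the sketch below shows plain Beta-Bernoulli Thompson Sampling for independent items; it illustrates only the posterior-sampling principle the paper builds on, not the paper’s combinatorial, fatigue-aware algorithm, and the click probabilities are invented for the toy loop.

```python
import numpy as np

# Generic Beta-Bernoulli Thompson Sampling, for illustration only: the paper
# extends this posterior-sampling idea to sequences of items with positional
# effects and user abandonment, which this toy loop does not model.
rng = np.random.default_rng(0)
n_items, horizon = 10, 5000
true_click_prob = rng.uniform(0.05, 0.35, size=n_items)  # hypothetical ground truth
alpha = np.ones(n_items)   # Beta posterior successes + 1
beta = np.ones(n_items)    # Beta posterior failures + 1

for t in range(horizon):
    theta = rng.beta(alpha, beta)          # sample a plausible model
    item = int(np.argmax(theta))           # act greedily w.r.t. the sample
    reward = rng.random() < true_click_prob[item]
    alpha[item] += reward                  # update the posterior
    beta[item] += 1 - reward
```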

GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model

Title GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model
Authors Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu
Abstract Recently, researchers have utilized Knowledge Graphs (KGs) as side information in recommendation systems to address the cold-start and sparsity issues and improve recommendation performance. Existing KG-aware recommendation models use the features of neighboring entities and structural information to update the embedding of the currently located entity. Although this rich information benefits the downstream task, the cost of exploring the entire graph is massive and impractical. To reduce the computational cost while maintaining the pattern of feature extraction, KG-aware recommendation models usually use a fixed-size, randomly sampled set of neighbors rather than the complete information in the KG. Nonetheless, these approaches have two critical issues: first, fixed-size, randomly selected neighbors restrict the model’s view of the graph; in addition, as the order of the graph features increases, the growth of the model’s parameter dimensionality may make the training process hard to converge. To address these limitations, we propose GraphSW, a strategy based on a stage-wise training framework that accesses only a subset of the entities in the KG at every stage. In subsequent stages, the embeddings learned in previous stages are provided to the network, and the model gradually learns the information in the KG. We apply stage-wise training to two state-of-the-art recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN). Moreover, we evaluate performance on six real-world datasets: Last.FM 2011, Book-Crossing, movie, LFM-1b 2015, Amazon-book, and Yelp 2018. Our experimental results show that the proposed strategy helps both models collect more information from the KG and improves their performance. Furthermore, we observe that GraphSW helps KGCN converge effectively on high-order graph features.
Tasks
Published 2019-08-13
URL https://arxiv.org/abs/1908.05611v2
PDF https://arxiv.org/pdf/1908.05611v2.pdf
PWC https://paperswithcode.com/paper/graphsw-a-training-protocol-based-on-stage
Repo https://github.com/mengruwu/graphsw
Framework tf
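
A schematic sketch of the stage-wise idea (not the released TensorFlow code): each stage trains on freshly sampled, fixed-size neighbor sets and warm-starts from the embeddings learned in the previous stage. The toy update rule and helper names below are placeholders.

```python
import numpy as np

# Stage-wise training sketch: later stages reuse the embeddings from earlier
# stages, while each stage sees only a freshly sampled subset of KG neighbors.
def sample_neighbors(kg, entity, k, rng):
    """Fixed-size random neighbor sample, as in KG-aware recommenders."""
    neighbors = kg.get(entity, [])
    return list(rng.choice(neighbors, size=k, replace=len(neighbors) < k))

def train_one_stage(embeddings, kg, k, rng):
    """Placeholder for one stage of RippleNet/KGCN training on the sampled sub-graph."""
    for entity in kg:
        neigh = sample_neighbors(kg, entity, k, rng)
        # toy update: nudge each embedding toward the mean of its sampled neighbors
        embeddings[entity] += 0.1 * (np.mean([embeddings[n] for n in neigh], axis=0)
                                     - embeddings[entity])
    return embeddings

rng = np.random.default_rng(0)
kg = {e: [n for n in range(20) if n != e] for e in range(20)}   # toy knowledge graph
emb = {e: rng.normal(size=16) for e in kg}

for stage in range(5):      # GraphSW: each stage warm-starts from the previous one
    emb = train_one_stage(emb, kg, k=4, rng=rng)
```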

Neural Canonical Transformation with Symplectic Flows

Title Neural Canonical Transformation with Symplectic Flows
Authors Shuo-Hui Li, Chen-Xiao Dong, Linfeng Zhang, Lei Wang
Abstract Canonical transformation plays a fundamental role in simplifying and solving classical Hamiltonian systems. We construct flexible and powerful canonical transformations as generative models using symplectic neural networks. The model transforms physical variables towards a latent representation with an independent harmonic oscillator Hamiltonian. Correspondingly, the phase space density of the physical system flows towards a factorized Gaussian distribution in the latent space. Since the canonical transformation preserves the Hamiltonian evolution, the model captures nonlinear collective modes in the learned latent representation. We present an efficient implementation of symplectic neural coordinate transformations and two ways to train the model: the variational free energy calculation is based on the analytical form of the physical Hamiltonian, while the phase space density estimation only requires samples in the coordinate space for separable Hamiltonians. We demonstrate appealing features of neural canonical transformation on toy problems, including a two-dimensional ring potential and a harmonic chain. Finally, we apply the approach to real-world problems such as identifying slow collective modes in alanine dipeptide and conceptual compression of the MNIST dataset.
Tasks Density Estimation
Published 2019-09-30
URL https://arxiv.org/abs/1910.00024v2
PDF https://arxiv.org/pdf/1910.00024v2.pdf
PWC https://paperswithcode.com/paper/neural-canonical-transformation-with
Repo https://github.com/li012589/neuralCT
Framework pytorch
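
The key bookkeeping behind a symplectic flow, stated here in standard notation rather than the paper’s: because a symplectic map preserves phase-space volume, the change-of-variables Jacobian term vanishes and the physical density follows from the latent (factorized Gaussian) density by composition alone.

$$
\begin{aligned}
(Q, P) &= T(q, p), \qquad J_T^{\top} \Omega\, J_T = \Omega, \quad
\Omega = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}
\;\Rightarrow\; \lvert\det J_T\rvert = 1, \\
\log p_{\mathrm{phys}}(q, p) &= \log p_{\mathrm{latent}}\bigl(T(q, p)\bigr) + \log \lvert\det J_T\rvert
= \log p_{\mathrm{latent}}\bigl(T(q, p)\bigr).
\end{aligned}
$$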

Warped Input Gaussian Processes for Time Series Forecasting

Title Warped Input Gaussian Processes for Time Series Forecasting
Authors David Tolpin
Abstract We introduce a Gaussian process-based model for handling non-stationarity. The warping is achieved non-parametrically, by imposing a prior on the relative change of distance between subsequent observation inputs. The model allows the use of general gradient optimization algorithms for training and incurs only a small computational overhead on training and prediction. The model finds its applications in forecasting non-stationary time series with gradually varying volatility, change points, or a combination thereof. We evaluate the model on synthetic and real-world time series data, comparing against both baseline and state-of-the-art approaches, and show that the model exhibits state-of-the-art forecasting performance at a lower implementation and computation cost.
Tasks Gaussian Processes, Time Series, Time Series Forecasting
Published 2019-12-05
URL https://arxiv.org/abs/1912.02527v1
PDF https://arxiv.org/pdf/1912.02527v1.pdf
PWC https://paperswithcode.com/paper/warped-input-gaussian-processes-for-time
Repo https://github.com/dtolpin/wigp
Framework none
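
A numpy sketch of the warped-input mechanism, assuming a monotone warp built from positive spacings and a generic RBF kernel; the paper’s specific prior on the relative change of distances between subsequent inputs is not reproduced here, and the data are synthetic.

```python
import numpy as np

# Warped-input idea: map raw time stamps through a monotone warp (here,
# cumulative positive increments), then use an ordinary RBF-kernel GP on the
# warped inputs. The paper's prior on relative distance changes is omitted.
def warp(t, log_increments):
    """Monotone warp: warped spacing between observations = exp(log_increments)."""
    steps = np.exp(log_increments)              # positive spacings
    return np.concatenate([[0.0], np.cumsum(steps)])[: len(t)]

def rbf_kernel(x, y, lengthscale=1.0, variance=1.0):
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

t = np.linspace(0, 10, 50)
y = np.sin(t * (1 + 0.2 * t)) + 0.1 * np.random.default_rng(0).normal(size=50)

log_inc = np.zeros(len(t) - 1)                  # learnable in the real model
x_warped = warp(t, log_inc)
K = rbf_kernel(x_warped, x_warped) + 1e-2 * np.eye(len(t))
alpha = np.linalg.solve(K, y)                   # GP posterior mean weights
```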

GSLAM: A General SLAM Framework and Benchmark

Title GSLAM: A General SLAM Framework and Benchmark
Authors Yong Zhao, Shibiao Xu, Shuhui Bu, Hongkai Jiang, Pengcheng Han
Abstract SLAM technology has recently seen many successes and attracted the attention of high-tech companies. However, unifying the interfaces of existing or emerging algorithms and effectively benchmarking their speed, robustness, and portability remain open problems. In this paper, we propose a novel SLAM platform named GSLAM, which not only provides evaluation functionality but also supplies a useful toolkit for researchers to quickly develop their own SLAM systems. The core contribution of GSLAM is a universal, cross-platform, and fully open-source SLAM interface for both research and commercial use, which aims to handle interactions with input datasets, SLAM implementations, visualization, and applications in a unified framework. Through this platform, users can implement their own functions as plugins for better performance and further push SLAM toward practical applications.
Tasks Simultaneous Localization and Mapping
Published 2019-02-21
URL http://arxiv.org/abs/1902.07995v1
PDF http://arxiv.org/pdf/1902.07995v1.pdf
PWC https://paperswithcode.com/paper/gslam-a-general-slam-framework-and-benchmark
Repo https://github.com/zdzhaoyong/GSLAM
Framework none
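
GSLAM itself is a C++ framework; the Python sketch below only illustrates the plugin pattern the abstract describes (one shared interface, interchangeable SLAM implementations, a benchmark driver that sees only the interface). None of the class or method names are taken from GSLAM’s actual API.

```python
from abc import ABC, abstractmethod

# Conceptual plugin pattern only; names here are hypothetical, not GSLAM's API.
class SLAMPlugin(ABC):
    @abstractmethod
    def track(self, frame):
        """Consume one input frame and update the internal map/pose estimate."""

    @abstractmethod
    def current_pose(self):
        """Return the latest estimated camera pose."""

class DummySLAM(SLAMPlugin):
    def __init__(self):
        self.pose = [0.0, 0.0, 0.0]

    def track(self, frame):
        self.pose = [p + 0.01 for p in self.pose]   # placeholder "tracking"

    def current_pose(self):
        return self.pose

def run_benchmark(plugin: SLAMPlugin, frames):
    """A benchmark driver only needs the shared interface, not the implementation."""
    for frame in frames:
        plugin.track(frame)
    return plugin.current_pose()

print(run_benchmark(DummySLAM(), frames=range(100)))
```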

DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANs

Title DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANs
Authors Ksenia Bittner, Marco Körner, Peter Reinartz
Abstract We describe the workflow of a digital surface model (DSM) refinement algorithm using a hybrid conditional generative adversarial network (cGAN) whose generative part consists of two parallel networks merged at the last stage, forming a WNet architecture. The inputs to this so-called WNet-cGAN are stereo DSMs and panchromatic (PAN) half-meter resolution satellite images. Fusing these inputs helps propagate fine details from the spectral image and complete the missing 3D knowledge about building shapes in the stereo DSM. In addition, the network refines building outlines and edges, making them more rectangular and sharp.
Tasks
Published 2019-03-08
URL http://arxiv.org/abs/1903.03519v1
PDF http://arxiv.org/pdf/1903.03519v1.pdf
PWC https://paperswithcode.com/paper/dsm-building-shape-refinement-from-combined
Repo https://github.com/0xzayd/Wnet-cGAN
Framework none
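
A hedged PyTorch sketch of the two-branch generator idea: one branch for the stereo DSM, one for the PAN image, merged at the last stage to predict a refined DSM. Layer counts and channel widths are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn

# Two parallel branches fused at the last stage, as the abstract describes;
# the architecture details below are placeholders.
class WNetGenerator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            )
        self.dsm_branch = branch()   # processes the stereo DSM
        self.pan_branch = branch()   # processes the panchromatic image
        self.fuse = nn.Sequential(   # merge at the last stage
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, dsm, pan):
        merged = torch.cat([self.dsm_branch(dsm), self.pan_branch(pan)], dim=1)
        return self.fuse(merged)     # refined DSM

refined = WNetGenerator()(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
```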

Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

Title Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Authors Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon
Abstract Our goal in this work is to train an image captioning model that generates denser and more informative captions. We introduce “relational captioning,” a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS categories and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods.
Tasks Image Captioning
Published 2019-03-14
URL https://arxiv.org/abs/1903.05942v4
PDF https://arxiv.org/pdf/1903.05942v4.pdf
PWC https://paperswithcode.com/paper/dense-relational-captioning-triple-stream
Repo https://github.com/Dong-JinKim/DenseRelationalCaptioning
Framework pytorch
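
An illustrative PyTorch sketch of the triple-stream design: one recurrent unit per POS role (subject / predicate / object) with joint word and POS prediction heads. Dimensions and the fusion scheme are placeholders rather than the exact MTTSNet.

```python
import torch
import torch.nn as nn

# Three recurrent streams, one per POS role, with joint word and POS heads;
# sizes and fusion are illustrative, not the paper's exact architecture.
class TripleStreamCaptioner(nn.Module):
    def __init__(self, vocab=1000, pos_tags=3, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.streams = nn.ModuleList([nn.LSTM(dim, dim, batch_first=True)
                                      for _ in range(3)])    # subj / pred / obj
        self.word_head = nn.Linear(3 * dim, vocab)            # captioning task
        self.pos_head = nn.Linear(3 * dim, pos_tags)           # POS prediction task

    def forward(self, tokens):
        x = self.embed(tokens)                                 # (B, T, dim)
        h = torch.cat([lstm(x)[0] for lstm in self.streams], dim=-1)
        return self.word_head(h), self.pos_head(h)

words, pos = TripleStreamCaptioner()(torch.randint(0, 1000, (2, 12)))
```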

Learning to Prove Theorems via Interacting with Proof Assistants

Title Learning to Prove Theorems via Interacting with Proof Assistants
Authors Kaiyu Yang, Jia Deng
Abstract Humans prove theorems by relying on substantial high-level reasoning and problem-specific insights. Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics. However, human experts have to construct proofs manually by entering tactics into the proof assistant. In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. We construct CoqGym, a large-scale dataset and learning environment containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. We develop ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs). Experiments show that ASTactic trained on CoqGym can generate effective tactics and can be used to prove new theorems not previously provable by automated methods. Code is available at https://github.com/princeton-vl/CoqGym.
Tasks Automated Theorem Proving, Mathematical Proofs
Published 2019-05-21
URL https://arxiv.org/abs/1905.09381v1
PDF https://arxiv.org/pdf/1905.09381v1.pdf
PWC https://paperswithcode.com/paper/learning-to-prove-theorems-via-interacting
Repo https://github.com/princeton-vl/CoqGym
Framework pytorch
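
A toy illustration of “tactics as abstract syntax trees”: a tactic is a head symbol with argument subtrees that can be linearized back into a tactic string. The node and field names are hypothetical and do not follow CoqGym’s actual data format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical AST node for a tactic; CoqGym's real schema is richer.
@dataclass
class TacticNode:
    head: str                               # e.g. "apply", "intro", "rewrite"
    args: List["TacticNode"] = field(default_factory=list)

    def render(self) -> str:
        """Linearize the AST back into a tactic string."""
        if not self.args:
            return self.head
        return f"{self.head} {' '.join(a.render() for a in self.args)}"

tactic = TacticNode("apply", [TacticNode("H")])
print(tactic.render())                      # -> "apply H"
```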

A Neural Influence Diffusion Model for Social Recommendation

Title A Neural Influence Diffusion Model for Social Recommendation
Authors Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, Meng Wang
Abstract Precise user and item embedding learning is the key to building a successful recommender system. Traditionally, Collaborative Filtering (CF) provides a way to learn user and item embeddings from the user-item interaction history. However, the performance is limited due to the sparseness of user behavior data. With the emergence of online social networks, social recommender systems have been proposed to utilize each user’s local neighbors’ preferences to alleviate the data sparsity for better user embedding modeling. We argue that, for each user of a social platform, her potential embedding is influenced by her trusted users. As social influence recursively propagates and diffuses in the social network, each user’s interests change in the recursive process. Nevertheless, current social recommendation models simply develop static models by leveraging the local neighbors of each user, without simulating the recursive diffusion in the global social network, leading to suboptimal recommendation performance. In this paper, we propose a deep influence propagation model to simulate how users are influenced by the recursive social diffusion process for social recommendation. For each user, the diffusion process starts with an initial embedding that fuses the related features and a free user latent vector that captures the latent behavior preference. The key idea of our proposed model is that we design a layer-wise influence propagation structure to model how users’ latent embeddings evolve as the social diffusion process continues. We further show that our proposed model is general and could be applied when the user (item) attributes or the social network structure is not available. Finally, extensive experimental results on two real-world datasets clearly show the effectiveness of our proposed model, with more than 13% performance improvements over the best baselines.
Tasks Recommendation Systems
Published 2019-04-20
URL http://arxiv.org/abs/1904.10322v1
PDF http://arxiv.org/pdf/1904.10322v1.pdf
PWC https://paperswithcode.com/paper/a-neural-influence-diffusion-model-for-social
Repo https://github.com/Kanika91/diffnet
Framework tf
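
A schematic numpy sketch of layer-wise social diffusion (not the paper’s exact update rule): at each layer a user’s embedding aggregates the embeddings of the users she trusts, so influence propagates one hop per layer; the trust graph and feature dimensions are synthetic.

```python
import numpy as np

# Layer-wise diffusion sketch: each layer mixes in trusted neighbors' embeddings,
# so after k layers a user is influenced by her k-hop social neighborhood.
rng = np.random.default_rng(0)
n_users, n_items, dim, n_layers = 100, 50, 16, 2
trust = (rng.random((n_users, n_users)) < 0.05).astype(float)   # social adjacency
trust /= np.maximum(trust.sum(1, keepdims=True), 1.0)           # row-normalize

h = rng.normal(size=(n_users, dim))    # layer 0: fused features + free latent vector
for _ in range(n_layers):
    h = h + trust @ h                  # aggregate trusted neighbors, keep own signal

items = rng.normal(size=(n_items, dim))
scores = h @ items.T                   # preference scores from the diffused embeddings
```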

Are Clusterings of Multiple Data Views Independent?

Title Are Clusterings of Multiple Data Views Independent?
Authors Lucy L. Gao, Jacob Bien, Daniela Witten
Abstract In the Pioneer 100 (P100) Wellness Project (Price and others, 2017), multiple types of data are collected on a single set of healthy participants at multiple timepoints in order to characterize and optimize wellness. One way to do this is to identify clusters, or subgroups, among the participants, and then to tailor personalized health recommendations to each subgroup. It is tempting to cluster the participants using all of the data types and timepoints, in order to fully exploit the available information. However, clustering the participants based on multiple data views implicitly assumes that a single underlying clustering of the participants is shared across all data views. If this assumption does not hold, then clustering the participants using multiple data views may lead to spurious results. In this paper, we seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop a new test for answering this question, which we then apply to clinical, proteomic, and metabolomic data, across two distinct timepoints, from the P100 study. We find that while the subgroups of the participants defined with respect to any single data type seem to be dependent across time, the clustering among the participants based on one data type (e.g. proteomic data) appears not to be associated with the clustering based on another data type (e.g. clinical data).
Tasks
Published 2019-01-12
URL http://arxiv.org/abs/1901.03905v1
PDF http://arxiv.org/pdf/1901.03905v1.pdf
PWC https://paperswithcode.com/paper/are-clusterings-of-multiple-data-views
Repo https://github.com/lucylgao/independent-clusterings-code
Framework none
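
For contrast with the paper’s contribution, the sketch below shows only the naive baseline: cross-tabulate the two views’ estimated cluster labels and run a chi-squared test of independence. The paper proposes a dedicated test for this question; this snippet is not it, and the two data views are synthetic.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

# Naive baseline only: test independence of the two views' cluster labels with
# a chi-squared test on their contingency table.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(200, 5))                       # e.g. clinical features
view2 = rng.normal(size=(200, 8))                       # e.g. proteomic features

labels1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(view1)
labels2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(view2)

table = np.zeros((3, 3))
for a, b in zip(labels1, labels2):
    table[a, b] += 1                                    # contingency table of labels
chi2, pval, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={pval:.3f}")
```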

Explainable AI for Trees: From Local Explanations to Global Understanding

Title Explainable AI for Trees: From Local Explanations to Global Understanding
Authors Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee
Abstract Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
Tasks
Published 2019-05-11
URL https://arxiv.org/abs/1905.04610v1
PDF https://arxiv.org/pdf/1905.04610v1.pdf
PWC https://paperswithcode.com/paper/explainable-ai-for-trees-from-local
Repo https://github.com/suinleelab/treeexplainer-study
Framework none
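
Typical usage of these tools through the shap package, which implements the paper’s TreeExplainer algorithms; the dataset and XGBoost model below are synthetic placeholders.

```python
import shap
import xgboost
from sklearn.datasets import make_regression

# Local explanations, interaction effects, and a global summary built from
# many local explanations, as described in the paper.
X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                 # one local explanation per sample
interactions = explainer.shap_interaction_values(X)    # pairwise interaction effects
shap.summary_plot(shap_values, X)                      # global view from local explanations
```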

Wasserstein Barycenter Model Ensembling

Title Wasserstein Barycenter Model Ensembling
Authors Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu
Abstract In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning, and image caption generation. These results show that W. ensembling is a viable alternative to basic geometric or arithmetic mean ensembling.
Tasks Image Captioning, Word Embeddings
Published 2019-02-13
URL http://arxiv.org/abs/1902.04999v1
PDF http://arxiv.org/pdf/1902.04999v1.pdf
PWC https://paperswithcode.com/paper/wasserstein-barycenter-model-ensembling-1
Repo https://github.com/IBM/wasserstein-barycenters
Framework pytorch
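
A sketch of barycentric ensembling with the POT (Python Optimal Transport) library: average the class distributions predicted by several models under a ground cost between classes. Here the cost matrix comes from random vectors purely for illustration, whereas the paper builds it from semantic side information such as word embeddings.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Entropic Wasserstein barycenter of several models' predicted class distributions.
n_classes, n_models = 10, 3
rng = np.random.default_rng(0)

preds = rng.dirichlet(np.ones(n_classes), size=n_models).T   # columns = model predictions
M = ot.dist(rng.normal(size=(n_classes, 4)))                 # hypothetical class ground cost
M /= M.max()

bary = ot.bregman.barycenter(preds, M, reg=1e-2)             # barycentric ensemble prediction
print(bary.round(3), bary.sum())
```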

Few-Shot Unsupervised Image-to-Image Translation

Title Few-Shot Unsupervised Image-to-Image Translation
Authors Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz
Abstract Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Our implementation and datasets are available at https://github.com/NVlabs/FUNIT .
Tasks Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published 2019-05-05
URL https://arxiv.org/abs/1905.01723v2
PDF https://arxiv.org/pdf/1905.01723v2.pdf
PWC https://paperswithcode.com/paper/few-shot-unsupervised-image-to-image
Repo https://github.com/bocharm/FUNIT
Framework pytorch

Wasserstein Adversarial Regularization (WAR) on label noise

Title Wasserstein Adversarial Regularization (WAR) on label noise
Authors Bharath Bhushan Damodaran, Kilian Fatras, Sylvain Lobry, Rémi Flamary, Devis Tuia, Nicolas Courty
Abstract Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method which enables learning robust classifiers in the presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the label space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors.
Tasks Semantic Segmentation
Published 2019-04-08
URL https://arxiv.org/abs/1904.03936v2
PDF https://arxiv.org/pdf/1904.03936v2.pdf
PWC https://paperswithcode.com/paper/pushing-the-right-boundaries-matters
Repo https://github.com/kilianFatras/kilianFatras.github.io
Framework none
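
A simplified sketch of one ingredient of WAR, not the full method: measuring the discrepancy between two class-probability vectors with an entropic Wasserstein cost (via POT) whose ground cost encodes how costly it is to confuse one class with another. In the paper this kind of discrepancy is applied between predictions on clean and adversarially perturbed inputs as a regularizer; the cost matrix and logits below are invented.

```python
import numpy as np
import ot

# Entropic OT cost between two class-probability vectors under a ground cost
# over classes; in WAR, the ground cost is derived from the label-space geometry.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_classes = 5
C = 1.0 - np.eye(n_classes)          # hypothetical ground cost between classes
C[0, 1] = C[1, 0] = 0.2              # e.g. moving mass between similar classes is cheap

p_clean = softmax(np.array([2.0, 0.5, 0.1, -1.0, -1.0]))     # prediction on x
p_adv   = softmax(np.array([0.6, 1.8, 0.1, -1.0, -1.0]))     # prediction on x + delta

war_penalty = ot.sinkhorn2(p_clean, p_adv, C, reg=0.1)       # entropic OT cost
print(float(war_penalty))
```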