Paper Group AWR 112
On the adequacy of untuned warmup for adaptive optimization. Thompson Sampling for a Fatigue-aware Online Recommendation System. GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model. Neural Canonical Transformation with Symplectic Flows. Warped Input Gaussian Processes for Time Series Forecasting. GSLAM: A Gener …
On the adequacy of untuned warmup for adaptive optimization
Title | On the adequacy of untuned warmup for adaptive optimization |
Authors | Jerry Ma, Denis Yarats |
Abstract | Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, Liu et al. (2019) propose automatic variance rectification of Adam’s adaptive learning rate, claiming that this rectified approach (“RAdam”) surpasses the vanilla Adam algorithm and reduces the need for expensive tuning of Adam with warmup. In this work, we point out various shortcomings of this analysis. We then provide an alternative explanation for the necessity of warmup based on the magnitude of the update term, which is of greater relevance to training stability. Finally, we provide some “rule-of-thumb” warmup schedules, and we demonstrate that simple untuned warmup of Adam performs more-or-less identically to RAdam in typical practical settings. We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over $2 / (1 - \beta_2)$ training iterations. |
Tasks | Image Classification, Language Modelling, Machine Translation |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04209v1 |
https://arxiv.org/pdf/1910.04209v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-adequacy-of-untuned-warmup-for |
Repo | https://github.com/Tony-Y/pytorch_warmup |
Framework | pytorch |
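
The rule of thumb in the abstract is concrete enough to sketch directly. Below is a minimal illustration, assuming PyTorch and its built-in `LambdaLR` scheduler (the linked `pytorch_warmup` repo packages ready-made warmup schedules; this sketch is not taken from it): linearly ramp Adam's learning rate over roughly $2 / (1 - \beta_2)$ steps, then keep it flat.

```python
import torch

model = torch.nn.Linear(10, 1)              # placeholder model
beta2 = 0.999
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, beta2))

# Rule of thumb from the abstract: linear warmup over 2 / (1 - beta2) steps.
warmup_steps = 2.0 / (1.0 - beta2)          # = 2000 for beta2 = 0.999
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(5000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()    # lr multiplier ramps from ~0 to 1, then stays at 1
```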
Thompson Sampling for a Fatigue-aware Online Recommendation System
Title | Thompson Sampling for a Fatigue-aware Online Recommendation System |
Authors | Yunjuan Wang, Theja Tulabandhula |
Abstract | In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the recommended items or by abandoning the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures the positional effects of these items as well as the abandoning behavior of users, the platform’s goal is to recommend sequences of items that are competitive with the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting leads to an exponential dependence on the number of items. We propose a new Thompson-sampling-based algorithm whose expected regret is polynomial in the number of items in this combinatorial setting and which performs extremely well in practice. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07734v2 |
http://arxiv.org/pdf/1901.07734v2.pdf | |
PWC | https://paperswithcode.com/paper/thompson-sampling-for-a-fatigue-aware-online |
Repo | https://github.com/bettyttytty/Thompson-Sampling-for-a-Fatigue-aware-Online-Recommendation-System |
Framework | none |
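
For readers unfamiliar with the underlying machinery, here is a minimal Beta-Bernoulli Thompson sampling loop for recommending a single item. It is deliberately not the paper's fatigue-aware, positional, combinatorial algorithm (which recommends whole sequences and models abandonment); it only illustrates the posterior-sampling idea the paper builds on, and all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 5
true_click_prob = rng.uniform(0.05, 0.4, size=n_items)   # unknown to the platform

# Beta(1, 1) priors on each item's click probability.
alpha = np.ones(n_items)
beta = np.ones(n_items)

for t in range(10_000):
    theta = rng.beta(alpha, beta)      # sample a model from the posterior
    item = int(np.argmax(theta))       # recommend the item that looks best under the sample
    click = rng.random() < true_click_prob[item]
    alpha[item] += click               # posterior update from the observed feedback
    beta[item] += 1 - click

print("estimated click probs:", alpha / (alpha + beta))
```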
GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model
Title | GraphSW: a training protocol based on stage-wise training for GNN-based Recommender Model |
Authors | Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu |
Abstract | Recently, researchers have utilized Knowledge Graphs (KGs) as side information in recommendation systems to address the cold-start and sparsity issues and improve recommendation performance. Existing KG-aware recommendation models use the features of neighboring entities and structural information to update the embedding of the currently visited entity. Although this rich information benefits the downstream task, the cost of exploring the entire graph is massive and impractical. To reduce the computational cost while preserving the feature-extraction pattern, KG-aware recommendation models usually use a fixed-size, randomly sampled set of neighbors rather than the complete information in the KG. Nonetheless, these approaches have two critical issues: first, fixed-size, randomly selected neighbors restrict the model’s view of the graph; in addition, as the order of the graph features increases, the growing parameter dimensionality of the model may make the training process hard to converge. To address these limitations, we propose GraphSW, a strategy based on a stage-wise training framework in which each stage accesses only a subset of the entities in the KG. The embedding learned in one stage is passed to the network in the next stage, so the model can learn the information in the KG gradually. We apply stage-wise training to two state-of-the-art recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN), and evaluate performance on six real-world datasets: Last.FM 2011, Book-Crossing, movie, LFM-1b 2015, Amazon-book and Yelp 2018. The results of our experiments show that the proposed strategy helps both models collect more information from the KG and improves their performance. Furthermore, we observe that GraphSW helps KGCN converge effectively with high-order graph features. |
Tasks | |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.05611v2 |
https://arxiv.org/pdf/1908.05611v2.pdf | |
PWC | https://paperswithcode.com/paper/graphsw-a-training-protocol-based-on-stage |
Repo | https://github.com/mengruwu/graphsw |
Framework | tf |
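
A rough, model-agnostic sketch of the stage-wise idea described above: split the KG entities into stages, let each stage access only the entities seen so far, and carry the learned embeddings forward as the initialization of the next stage. `train_one_stage` is a hypothetical stand-in for actually training RippleNet or KGCN.

```python
import numpy as np

# Hypothetical stand-in for training a KG-aware recommender such as RippleNet or KGCN.
def train_one_stage(entity_embeddings, entity_subset):
    """Train while only sampling neighbors from `entity_subset`; here we just
    perturb those embeddings to keep the sketch self-contained."""
    emb = entity_embeddings.copy()
    emb[entity_subset] += 0.01 * np.random.randn(len(entity_subset), emb.shape[1])
    return emb

n_entities, dim, n_stages = 1000, 16, 4
embeddings = np.random.randn(n_entities, dim) * 0.01
stages = np.array_split(np.random.permutation(n_entities), n_stages)

seen = np.array([], dtype=int)
for s, stage_entities in enumerate(stages):
    # Each stage only accesses a subset of KG entities; the embedding learned
    # so far is passed on as the initialization of the next stage.
    seen = np.concatenate([seen, stage_entities])
    embeddings = train_one_stage(embeddings, seen)
    print(f"stage {s}: trained with {len(seen)} / {n_entities} entities")
```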
Neural Canonical Transformation with Symplectic Flows
Title | Neural Canonical Transformation with Symplectic Flows |
Authors | Shuo-Hui Li, Chen-Xiao Dong, Linfeng Zhang, Lei Wang |
Abstract | Canonical transformation plays a fundamental role in simplifying and solving classical Hamiltonian systems. We construct flexible and powerful canonical transformations as generative models using symplectic neural networks. The model transforms physical variables towards a latent representation with an independent harmonic oscillator Hamiltonian. Correspondingly, the phase space density of the physical system flows towards a factorized Gaussian distribution in the latent space. Since the canonical transformation preserves the Hamiltonian evolution, the model captures nonlinear collective modes in the learned latent representation. We present an efficient implementation of symplectic neural coordinate transformations and two ways to train the model: the variational free energy calculation is based on the analytical form of the physical Hamiltonian, while phase space density estimation only requires samples in the coordinate space for separable Hamiltonians. We demonstrate appealing features of neural canonical transformation using toy problems including a two-dimensional ring potential and a harmonic chain. Finally, we apply the approach to real-world problems such as identifying slow collective modes in alanine dipeptide and conceptual compression of the MNIST dataset. |
Tasks | Density Estimation |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1910.00024v2 |
https://arxiv.org/pdf/1910.00024v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-canonical-transformation-with |
Repo | https://github.com/li012589/neuralCT |
Framework | pytorch |
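
The building block behind symplectic neural networks can be illustrated with additive "shear" updates: $p \leftarrow p + \nabla S(q)$ (with $q$ fixed) and $q \leftarrow q + \nabla T(p)$ (with $p$ fixed) are each symplectic, so their composition is a symplectic flow. The PyTorch sketch below, with hypothetical layer sizes, shows such a layer; it is a generic illustration, not the exact architecture of the `neuralCT` repo.

```python
import torch

class SymplecticShear(torch.nn.Module):
    """One additive symplectic update: p <- p + grad S(q) with q fixed (or the
    mirrored update of q with p fixed). Each shear preserves the symplectic
    two-form, so a stack of them is a symplectic (canonical) transformation."""
    def __init__(self, dim, update_p=True):
        super().__init__()
        self.update_p = update_p
        self.potential = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

    def forward(self, q, p):
        x = q if self.update_p else p
        if not x.requires_grad:                 # make the input differentiable
            x = x.clone().requires_grad_(True)
        grad = torch.autograd.grad(self.potential(x).sum(), x, create_graph=True)[0]
        return (q, p + grad) if self.update_p else (q + grad, p)

# A toy symplectic flow on 2D phase-space variables: alternate shears in p and q.
dim = 2
flow = [SymplecticShear(dim, update_p=(i % 2 == 0)) for i in range(4)]
q, p = torch.randn(8, dim), torch.randn(8, dim)
for layer in flow:
    q, p = layer(q, p)
print(q.shape, p.shape)   # latent coordinates; in the paper these are trained towards a factorized Gaussian
```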
Warped Input Gaussian Processes for Time Series Forecasting
Title | Warped Input Gaussian Processes for Time Series Forecasting |
Authors | David Tolpin |
Abstract | We introduce a Gaussian process-based model for handling non-stationarity. The warping is achieved non-parametrically, through imposing a prior on the relative change of distance between subsequent observation inputs. The model allows the use of general gradient optimization algorithms for training and incurs only a small computational overhead on training and prediction. The model finds its application in forecasting non-stationary time series with either gradually varying volatility, the presence of change points, or a combination thereof. We evaluate the model on synthetic and real-world time series data, comparing against both baseline and known state-of-the-art approaches, and show that the model exhibits state-of-the-art forecasting performance at a lower implementation and computational cost. |
Tasks | Gaussian Processes, Time Series, Time Series Forecasting |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02527v1 |
https://arxiv.org/pdf/1912.02527v1.pdf | |
PWC | https://paperswithcode.com/paper/warped-input-gaussian-processes-for-time |
Repo | https://github.com/dtolpin/wigp |
Framework | none |
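
A toy numpy sketch of the input-warping idea: replace the uniform time grid by cumulative positive steps and run ordinary GP regression on the warped inputs. In the paper the steps are latent quantities with a prior on the relative change of successive distances, optimized jointly with the kernel hyperparameters; here they are fixed by hand purely for illustration.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Observed (non-stationary) time series on a regular grid.
t = np.linspace(0, 10, 100)
y = np.sin(t ** 1.5) + 0.1 * np.random.randn(t.size)

# Warp the inputs: replace the uniform spacing with positive steps. Here the
# steps are hand-picked stand-ins; the paper treats them as latent variables
# with a prior on the relative change of successive distances.
log_steps = np.linspace(-0.5, 0.5, t.size - 1)
warped_t = np.concatenate([[0.0], np.cumsum(np.exp(log_steps))])

# Standard GP regression on the warped inputs.
K = rbf(warped_t, warped_t) + 1e-2 * np.eye(t.size)
alpha = np.linalg.solve(K, y)
t_star = warped_t[-10:]                       # posterior mean at the last few warped inputs
y_pred = rbf(t_star, warped_t) @ alpha
print(y_pred.shape)
```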
GSLAM: A General SLAM Framework and Benchmark
Title | GSLAM: A General SLAM Framework and Benchmark |
Authors | Yong Zhao, Shibiao Xu, Shuhui Bu, Hongkai Jiang, Pengcheng Han |
Abstract | SLAM technology has recently seen many successes and attracted the attention of high-tech companies. However, how to unify the interfaces of existing or emerging algorithms, and how to effectively benchmark their speed, robustness and portability, remain open problems. In this paper, we propose a novel SLAM platform named GSLAM, which not only provides evaluation functionality but also supplies a useful toolkit for researchers to quickly develop their own SLAM systems. The core contribution of GSLAM is a universal, cross-platform and fully open-source SLAM interface for both research and commercial usage, which aims to handle interactions with input datasets, SLAM implementations, visualization and applications in a unified framework. Through this platform, users can implement their own functions as plugins for better performance and further push SLAM toward practical applications. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-02-21 |
URL | http://arxiv.org/abs/1902.07995v1 |
http://arxiv.org/pdf/1902.07995v1.pdf | |
PWC | https://paperswithcode.com/paper/gslam-a-general-slam-framework-and-benchmark |
Repo | https://github.com/zdzhaoyong/GSLAM |
Framework | none |
DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANs
Title | DSM Building Shape Refinement from Combined Remote Sensing Images based on Wnet-cGANs |
Authors | Ksenia Bittner, Marco Körner, Peter Reinartz |
Abstract | We describe the workflow of a digital surface model (DSM) refinement algorithm using a hybrid conditional generative adversarial network (cGAN) whose generative part consists of two parallel networks merged at the last stage, forming a WNet architecture. The inputs to the so-called WNet-cGAN are stereo DSMs and panchromatic (PAN) half-meter-resolution satellite images. Fusing these helps to propagate fine detailed information from the spectral image and to complete the missing 3D knowledge about building shapes from the stereo DSM. In addition, it refines the building outlines and edges, making them more rectangular and sharp. |
Tasks | |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03519v1 |
http://arxiv.org/pdf/1903.03519v1.pdf | |
PWC | https://paperswithcode.com/paper/dsm-building-shape-refinement-from-combined |
Repo | https://github.com/0xzayd/Wnet-cGAN |
Framework | none |
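
A minimal PyTorch sketch of the "two parallel branches merged at the last stage" idea: one branch consumes the stereo DSM, the other the PAN image, and a shared head produces the refined DSM. Layer sizes and depths are hypothetical and far smaller than in the paper, and the adversarial (cGAN) part is omitted.

```python
import torch
import torch.nn as nn

class WNetGenerator(nn.Module):
    """Two parallel branches (one per input modality) merged at the last stage --
    a rough sketch of the 'W' shape, not the paper's exact architecture."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.dsm_branch = branch()                     # stereo DSM input
        self.pan_branch = branch()                     # panchromatic image input
        self.head = nn.Conv2d(64, 1, 3, padding=1)     # merged stage -> refined DSM

    def forward(self, dsm, pan):
        feats = torch.cat([self.dsm_branch(dsm), self.pan_branch(pan)], dim=1)
        return self.head(feats)

gen = WNetGenerator()
refined = gen(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
print(refined.shape)   # torch.Size([2, 1, 64, 64])
```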
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Title | Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning |
Authors | Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon |
Abstract | Our goal in this work is to train an image captioning model that generates denser and more informative captions. We introduce “relational captioning,” a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS categories and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods. |
Tasks | Image Captioning |
Published | 2019-03-14 |
URL | https://arxiv.org/abs/1903.05942v4 |
https://arxiv.org/pdf/1903.05942v4.pdf | |
PWC | https://paperswithcode.com/paper/dense-relational-captioning-triple-stream |
Repo | https://github.com/Dong-JinKim/DenseRelationalCaptioning |
Framework | pytorch |
Learning to Prove Theorems via Interacting with Proof Assistants
Title | Learning to Prove Theorems via Interacting with Proof Assistants |
Authors | Kaiyu Yang, Jia Deng |
Abstract | Humans prove theorems by relying on substantial high-level reasoning and problem-specific insights. Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics. However, human experts have to construct proofs manually by entering tactics into the proof assistant. In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. We construct CoqGym, a large-scale dataset and learning environment containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. We develop ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs). Experiments show that ASTactic trained on CoqGym can generate effective tactics and can be used to prove new theorems not previously provable by automated methods. Code is available at https://github.com/princeton-vl/CoqGym. |
Tasks | Automated Theorem Proving, Mathematical Proofs |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.09381v1 |
https://arxiv.org/pdf/1905.09381v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-prove-theorems-via-interacting |
Repo | https://github.com/princeton-vl/CoqGym |
Framework | pytorch |
A Neural Influence Diffusion Model for Social Recommendation
Title | A Neural Influence Diffusion Model for Social Recommendation |
Authors | Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, Meng Wang |
Abstract | Precise user and item embedding learning is the key to building a successful recommender system. Traditionally, Collaborative Filtering (CF) provides a way to learn user and item embeddings from the user-item interaction history. However, the performance is limited due to the sparseness of user behavior data. With the emergence of online social networks, social recommender systems have been proposed to utilize each user’s local neighbors’ preferences to alleviate the data sparsity for better user embedding modeling. We argue that, for each user of a social platform, her potential embedding is influenced by her trusted users. As social influence recursively propagates and diffuses in the social network, each user’s interests change in the recursive process. Nevertheless, current social recommendation models are simply static models built by leveraging the local neighbors of each user, without simulating the recursive diffusion in the global social network, leading to suboptimal recommendation performance. In this paper, we propose a deep influence propagation model to simulate how users are influenced by the recursive social diffusion process for social recommendation. For each user, the diffusion process starts with an initial embedding that fuses the related features and a free user latent vector that captures the latent behavior preference. The key idea of our proposed model is a layer-wise influence propagation structure that models how users’ latent embeddings evolve as the social diffusion process continues. We further show that our proposed model is general and can be applied when the user (item) attributes or the social network structure are not available. Finally, extensive experimental results on two real-world datasets clearly show the effectiveness of our proposed model, with more than 13% performance improvements over the best baselines. |
Tasks | Recommendation Systems |
Published | 2019-04-20 |
URL | http://arxiv.org/abs/1904.10322v1 |
http://arxiv.org/pdf/1904.10322v1.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-influence-diffusion-model-for-social |
Repo | https://github.com/Kanika91/diffnet |
Framework | tf |
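
The layer-wise influence diffusion can be sketched in a few lines: starting from initial user embeddings, each layer mixes a user's current embedding with an aggregate of the embeddings of the users she trusts. The mixing weights below are fixed constants; in the actual model they are learned, and the item embeddings and prediction layers are omitted.

```python
import numpy as np

n_users, dim, n_layers = 6, 8, 2
rng = np.random.default_rng(0)

# Row-normalized trust/social adjacency matrix S (who trusts whom).
S = rng.random((n_users, n_users)) < 0.3
np.fill_diagonal(S, False)
S = S / np.maximum(S.sum(axis=1, keepdims=True), 1)

# Initial user embeddings: fused features plus a free latent vector (random here).
h = rng.standard_normal((n_users, dim))

# Layer-wise influence diffusion: each layer mixes a user's current embedding
# with the aggregated embeddings of the users she trusts.
for _ in range(n_layers):
    h = 0.5 * h + 0.5 * S @ h      # the paper learns this combination; fixed weights here

print(h.shape)
```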
Are Clusterings of Multiple Data Views Independent?
Title | Are Clusterings of Multiple Data Views Independent? |
Authors | Lucy L. Gao, Jacob Bien, Daniela Witten |
Abstract | In the Pioneer 100 (P100) Wellness Project (Price and others, 2017), multiple types of data are collected on a single set of healthy participants at multiple timepoints in order to characterize and optimize wellness. One way to do this is to identify clusters, or subgroups, among the participants, and then to tailor personalized health recommendations to each subgroup. It is tempting to cluster the participants using all of the data types and timepoints, in order to fully exploit the available information. However, clustering the participants based on multiple data views implicitly assumes that a single underlying clustering of the participants is shared across all data views. If this assumption does not hold, then clustering the participants using multiple data views may lead to spurious results. In this paper, we seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop a new test for answering this question, which we then apply to clinical, proteomic, and metabolomic data, across two distinct timepoints, from the P100 study. We find that while the subgroups of the participants defined with respect to any single data type seem to be dependent across time, the clustering among the participants based on one data type (e.g. proteomic data) appears not to be associated with the clustering based on another data type (e.g. clinical data). |
Tasks | |
Published | 2019-01-12 |
URL | http://arxiv.org/abs/1901.03905v1 |
http://arxiv.org/pdf/1901.03905v1.pdf | |
PWC | https://paperswithcode.com/paper/are-clusterings-of-multiple-data-views |
Repo | https://github.com/lucylgao/independent-clusterings-code |
Framework | none |
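
As a point of reference, the naive way to ask the paper's question is to cluster each data view separately, cross-tabulate the two label vectors, and run a classical chi-squared test of independence, as in the sketch below (numpy/scipy/scikit-learn on synthetic data). This naive test ignores the fact that the cluster labels were themselves estimated from the data, which is precisely the issue the paper's test is designed to handle.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n = 200
view1 = rng.standard_normal((n, 5))     # e.g. clinical features
view2 = rng.standard_normal((n, 10))    # e.g. proteomic features

labels1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(view1)
labels2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(view2)

# Cross-tabulate the two clusterings and test independence of the label pairs.
table = np.zeros((3, 3))
for a, b in zip(labels1, labels2):
    table[a, b] += 1
stat, pvalue, dof, _ = chi2_contingency(table)
print(f"chi2 = {stat:.2f}, p = {pvalue:.3f}")
```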
Explainable AI for Trees: From Local Explanations to Global Understanding
Title | Explainable AI for Trees: From Local Explanations to Global Understanding |
Authors | Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee |
Abstract | Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains. |
Tasks | |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.04610v1 |
https://arxiv.org/pdf/1905.04610v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-ai-for-trees-from-local |
Repo | https://github.com/suinleelab/treeexplainer-study |
Framework | none |
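
The three contributions map directly onto the `TreeExplainer` API of the `shap` package, which implements the method (the study repo above builds on it). A minimal usage sketch, assuming `shap` and `xgboost` are installed and using random stand-in data:

```python
import numpy as np
import shap
import xgboost

# Train any tree ensemble (XGBoost on synthetic data as a stand-in).
X = np.random.randn(500, 8)
y = (X[:, 0] + 0.5 * X[:, 1] * X[:, 2] > 0).astype(int)
model = xgboost.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)

# 1) Local explanations: one SHAP value per feature per prediction.
shap_values = explainer.shap_values(X)

# 2) Local feature interaction effects.
interaction_values = explainer.shap_interaction_values(X)

# 3) Global understanding from many local explanations, e.g. a summary plot.
shap.summary_plot(shap_values, X)
```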
Wasserstein Barycenter Model Ensembling
Title | Wasserstein Barycenter Model Ensembling |
Authors | Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu |
Abstract | In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling. |
Tasks | Image Captioning, Word Embeddings |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.04999v1 |
http://arxiv.org/pdf/1902.04999v1.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-barycenter-model-ensembling-1 |
Repo | https://github.com/IBM/wasserstein-barycenters |
Framework | pytorch |
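
A small sketch of Wasserstein-barycenter ensembling with the POT library (`pip install pot`): given per-model class distributions and a ground cost between classes (e.g. distances between label word embeddings), compute the entropic-regularized barycenter and compare it with the arithmetic mean. The numbers and embeddings are made up; this only shows the mechanics, not the paper's full pipeline.

```python
import numpy as np
import ot   # POT: Python Optimal Transport

# Predicted class distributions from three models for one example (one per column).
P = np.array([[0.70, 0.60, 0.10],
              [0.20, 0.30, 0.80],
              [0.10, 0.10, 0.10]])

# Ground cost between classes, e.g. squared distances between word embeddings
# of the class labels (random stand-ins here).
emb = np.random.randn(3, 50)
M = ot.dist(emb, emb)                 # pairwise squared Euclidean costs
M /= M.max()

# Entropic-regularized Wasserstein barycenter of the model predictions,
# compared with the plain arithmetic mean.
bary = ot.bregman.barycenter(P, M, reg=0.05, weights=np.ones(3) / 3)
print("W. barycenter  :", np.round(bary, 3))
print("arithmetic mean:", np.round(P.mean(axis=1), 3))
```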
Few-Shot Unsupervised Image-to-Image Translation
Title | Few-Shot Unsupervised Image-to-Image Translation |
Authors | Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz |
Abstract | Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Our implementation and datasets are available at https://github.com/NVlabs/FUNIT . |
Tasks | Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01723v2 |
https://arxiv.org/pdf/1905.01723v2.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-unsupervised-image-to-image |
Repo | https://github.com/bocharm/FUNIT |
Framework | pytorch |
Wasserstein Adversarial Regularization (WAR) on label noise
Title | Wasserstein Adversarial Regularization (WAR) on label noise |
Authors | Bharath Bhushan Damodaran, Kilian Fatras, Sylvain Lobry, Rémi Flamary, Devis Tuia, Nicolas Courty |
Abstract | Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method which enables learning robust classifiers in the presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows us to take into account specific relations between classes by leveraging the geometric properties of the label space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors. |
Tasks | Semantic Segmentation |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.03936v2 |
https://arxiv.org/pdf/1904.03936v2.pdf | |
PWC | https://paperswithcode.com/paper/pushing-the-right-boundaries-matters |
Repo | https://github.com/kilianFatras/kilianFatras.github.io |
Framework | none |
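
A rough sketch of the kind of regularizer the abstract describes, using PyTorch for the classifier and POT for the transport cost: perturb the input adversarially (FGSM-style here), then penalize the Wasserstein discrepancy, under a class-to-class ground cost, between the predictions on the clean and perturbed inputs. This is not the exact WAR objective; in particular, a real implementation would backpropagate through the transport loss, whereas here it is only evaluated for illustration.

```python
import numpy as np
import ot
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes = 4
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, n_classes))

x = torch.randn(1, 20)
y_noisy = torch.tensor([2])                  # possibly mislabeled target

# Ground cost between classes (e.g. from label embeddings); random stand-in here.
emb = np.random.randn(n_classes, 8)
C = ot.dist(emb, emb)
C /= C.max()

# Adversarial direction: sign of the loss gradient w.r.t. the input (FGSM-style).
x_adv = x.clone().requires_grad_(True)
F.cross_entropy(model(x_adv), y_noisy).backward()
x_adv = (x + 0.1 * x_adv.grad.sign()).detach()

# Wasserstein discrepancy between predictions on clean and perturbed inputs
# (evaluated with Sinkhorn; not differentiated through in this sketch).
p_clean = F.softmax(model(x), dim=1)[0].detach().numpy().astype(np.float64)
p_adv = F.softmax(model(x_adv), dim=1)[0].detach().numpy().astype(np.float64)
war_penalty = ot.sinkhorn2(p_clean, p_adv, C, reg=0.05)

total = F.cross_entropy(model(x), y_noisy) + 1.0 * float(war_penalty)
print(float(total))
```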