Paper Group AWR 398
Kernel-based Translations of Convolutional Networks. Newton vs the machine: solving the chaotic three-body problem using deep neural networks. Note on the bias and variance of variational inference. Learning to learn via Self-Critique. Validated Variational Inference via Practical Posterior Error Bounds. Signed Graph Attention Networks. Learning 2D …
Kernel-based Translations of Convolutional Networks
Title | Kernel-based Translations of Convolutional Networks |
Authors | Corinne Jones, Vincent Roulet, Zaid Harchaoui |
Abstract | Convolutional Neural Networks, like most artificial neural networks, are commonly viewed as methods different in essence from kernel-based methods. We provide a systematic translation of Convolutional Neural Networks (ConvNets) into their kernel-based counterparts, Convolutional Kernel Networks (CKNs), and demonstrate that this perception is unfounded both formally and empirically. We show that, given a Convolutional Neural Network, we can design a corresponding Convolutional Kernel Network that is easily trainable with a new stochastic gradient algorithm based on an accurate gradient computation and that performs on par with its ConvNet counterpart. We present experimental results supporting these claims on landmark ConvNet architectures, comparing each ConvNet to its CKN counterpart over several parameter settings. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.08131v1 |
http://arxiv.org/pdf/1903.08131v1.pdf | |
PWC | https://paperswithcode.com/paper/kernel-based-translations-of-convolutional |
Repo | https://github.com/cjones6/yesweckn |
Framework | pytorch |
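The translation replaces each ConvNet layer with a kernel feature map computed on image patches. Below is a minimal numpy sketch of one such layer in the spirit of a CKN: patches are normalized, compared to a set of anchor patches with a Gaussian kernel, and the resulting feature vectors are average-pooled. The anchors, patch size, and bandwidth are illustrative assumptions; the paper's construction uses a Nyström-type approximation and trains the layers end-to-end with its stochastic gradient algorithm.

```python
import numpy as np

def ckn_like_layer(image, anchors, patch_size=3, bandwidth=0.5):
    """One kernel-feature layer: compare normalized patches to anchor patches
    with a Gaussian kernel and average-pool the resulting feature vectors."""
    H, W = image.shape
    feats = []
    for i in range(H - patch_size + 1):
        for j in range(W - patch_size + 1):
            p = image[i:i + patch_size, j:j + patch_size].ravel()
            p = p / (np.linalg.norm(p) + 1e-8)                 # normalize the patch
            d2 = np.sum((anchors - p) ** 2, axis=1)            # squared distances to anchors
            feats.append(np.exp(-d2 / (2 * bandwidth ** 2)))   # Gaussian kernel features
    return np.mean(feats, axis=0)                              # average pooling over positions

# toy usage: a random image and random unit-norm anchor patches
rng = np.random.default_rng(0)
img = rng.random((16, 16))
anchors = rng.normal(size=(32, 9))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
print(ckn_like_layer(img, anchors).shape)   # (32,)
```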
Newton vs the machine: solving the chaotic three-body problem using deep neural networks
Title | Newton vs the machine: solving the chaotic three-body problem using deep neural networks |
Authors | Philip G. Breen, Christopher N. Foley, Tjarda Boekholt, Simon Portegies Zwart |
Abstract | Since its formulation by Sir Isaac Newton, the problem of solving the equations of motion for three bodies under their own gravitational force has remained practically unsolved. Currently, the solution for a given initialization can only be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system’s chaotic nature. We show that an ensemble of solutions obtained using an arbitrarily precise numerical integrator can be used to train a deep artificial neural network (ANN) that, over a bounded time interval, provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver. Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems to shed light on outstanding phenomena such as the formation of black-hole binary systems or the origin of the core collapse in dense star clusters. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07291v1 |
https://arxiv.org/pdf/1910.07291v1.pdf | |
PWC | https://paperswithcode.com/paper/newton-vs-the-machine-solving-the-chaotic |
Repo | https://github.com/pgbreen/NVM |
Framework | tf |
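The approach itself is plain supervised regression: a numerical integrator produces trajectories, and a feed-forward network learns to map an initial condition and a query time to the bodies' positions. A minimal PyTorch sketch of one training step follows; the input/output dimensions, network width, and loss are illustrative assumptions rather than the paper's exact configuration (the released code uses TensorFlow).

```python
import torch
import torch.nn as nn

# Inputs: a query time t plus the parameters of the initial condition
# (assumed 5-dimensional here). Outputs: planar positions of two bodies at
# time t (4 numbers); the third body follows from the centre of mass.
model = nn.Sequential(
    nn.Linear(5, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 4),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()   # illustrative choice of regression loss

def train_step(inputs, targets):
    """inputs: (B, 5) [t, initial condition]; targets: (B, 4) integrator positions."""
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()
    return loss.item()

# toy usage with random stand-in data (real training pairs come from an
# arbitrarily precise numerical integrator)
print(train_step(torch.randn(64, 5), torch.randn(64, 4)))
```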
Note on the bias and variance of variational inference
Title | Note on the bias and variance of variational inference |
Authors | Chin-Wei Huang, Aaron Courville |
Abstract | In this note, we study the relationship between the variational gap and the variance of the (log) likelihood ratio. We show that the gap can be upper bounded by some form of dispersion measure of the likelihood ratio, which suggests the bias of variational inference can be reduced by making the distribution of the likelihood ratio more concentrated, such as via averaging and variance reduction. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03708v1 |
https://arxiv.org/pdf/1906.03708v1.pdf | |
PWC | https://paperswithcode.com/paper/note-on-the-bias-and-variance-of-variational |
Repo | https://github.com/CW-Huang/HIWAE |
Framework | pytorch |
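The note's message can be checked numerically on a toy conjugate model: the gap between log p(x) and the variational bound shrinks as the likelihood ratio is made more concentrated, for example by averaging several importance weights. The sketch below does exactly that; the model, the mismatched q, and the sample sizes are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.0                                    # single observation
# model: z ~ N(0,1), x|z ~ N(z,1)  =>  log p(x) = log N(x; 0, 2)
log_px = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

# a deliberately mismatched variational approximation q(z) = N(0.3, 1)
def log_weight(z):
    return (norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)) - norm.logpdf(z, 0.3, 1)

def iwae_bound(K, n_outer=20000):
    z = rng.normal(0.3, 1.0, size=(n_outer, K))       # samples from q
    lw = log_weight(z)                                 # log likelihood ratios
    return np.mean(np.logaddexp.reduce(lw, axis=1) - np.log(K))

for K in (1, 4, 16, 64):
    gap = log_px - iwae_bound(K)
    print(f"K={K:3d}  variational gap ~ {gap:.4f}")
# K=1 is the standard ELBO; averaging K weights concentrates the likelihood
# ratio and the gap shrinks, matching the note's observation.
```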
Learning to learn via Self-Critique
Title | Learning to learn via Self-Critique |
Authors | Antreas Antoniou, Amos Storkey |
Abstract | In few-shot learning, a machine learning system learns from a small set of labelled examples relating to a specific task, such that it can generalize to new examples of the same task. Given the limited availability of labelled examples in such tasks, we wish to make use of all the information we can. Usually a model learns task-specific information from a small training-set (support-set) to predict on an unlabelled validation set (target-set). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods. Making use of the target-set examples via transductive learning requires approaches beyond the current methods; at inference time, the target-set contains only unlabelled input data-points, and so discriminative learning cannot be used. In this paper, we propose a framework called Self-Critique and Adapt or SCA, which learns to learn a label-free loss function, parameterized as a neural network. A base-model learns on a support-set using existing methods (e.g. stochastic gradient descent combined with the cross-entropy loss), and then is updated for the incoming target-task using the learnt loss function. This label-free loss function is itself optimized such that the learnt model achieves higher generalization performance. Experiments demonstrate that SCA offers substantially reduced error-rates compared to baselines which only adapt on the support-set, and results in state of the art benchmark performance on Mini-ImageNet and Caltech-UCSD Birds 200. |
Tasks | Few-Shot Image Classification, Few-Shot Learning |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10295v6 |
https://arxiv.org/pdf/1905.10295v6.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-learn-by-self-critique |
Repo | https://github.com/AntreasAntoniou/Learning_to_Learn_via_Self-Critique |
Framework | pytorch |
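At a high level, the inner loop has two stages: adapt on the labelled support set with an ordinary loss, then adapt again on the unlabelled target set with a learned, label-free loss. The sketch below shows only that inner loop with a toy linear model; the critic architecture, step sizes, and dimensions are assumptions, and the outer loop that actually trains the critic is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

base = nn.Linear(64, 5)                      # base classifier for a 5-way task
critic = nn.Sequential(                      # label-free loss: maps predictions to a scalar
    nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

support_x, support_y = torch.randn(25, 64), torch.randint(0, 5, (25,))
target_x = torch.randn(15, 64)               # unlabelled target set

# 1) adapt on the support set with an ordinary supervised loss
inner_opt = torch.optim.SGD(base.parameters(), lr=0.1)
for _ in range(5):
    inner_opt.zero_grad()
    F.cross_entropy(base(support_x), support_y).backward()
    inner_opt.step()

# 2) self-critique: further adapt on the target set using the learned,
#    label-free loss (here the critic scores the softmax predictions)
inner_opt.zero_grad()
label_free_loss = critic(F.softmax(base(target_x), dim=-1)).mean()
label_free_loss.backward()
inner_opt.step()
# In the full method this procedure sits inside an outer loop that trains the
# critic so that the adapted model generalizes better.
```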
Validated Variational Inference via Practical Posterior Error Bounds
Title | Validated Variational Inference via Practical Posterior Error Bounds |
Authors | Jonathan H. Huggins, Mikołaj Kasprzak, Trevor Campbell, Tamara Broderick |
Abstract | Variational inference has become an increasingly attractive fast alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, a major obstacle to the widespread use of variational methods is the lack of post-hoc accuracy measures that are both theoretically justified and computationally efficient. In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference. Our bounds are widely applicable, as they require only that the approximating and exact posteriors have polynomial moments. Our bounds are also computationally efficient for variational inference because they require only standard values from variational objectives, straightforward analytic calculations, and simple Monte Carlo estimates. We show that our analysis naturally leads to a new and improved workflow for validated variational inference. Finally, we demonstrate the utility of our proposed workflow and error bounds on a robust regression problem and on a real-data example with a widely used multilevel hierarchical model. |
Tasks | Bayesian Inference |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04102v4 |
https://arxiv.org/pdf/1910.04102v4.pdf | |
PWC | https://paperswithcode.com/paper/practical-posterior-error-bounds-from |
Repo | https://github.com/jhuggins/viabel |
Framework | none |
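The bounds are built from quantities that standard variational workflows already compute, such as a Monte Carlo ELBO and a chi-squared-type upper bound on log p(x) obtained from the same importance weights. The snippet below estimates both on a toy Gaussian model; it only illustrates the kind of inputs such bounds consume, not the paper's bounds themselves (those are implemented in the authors' viabel package).

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x = 1.0
# model: z ~ N(0,1), x|z ~ N(z,1); approximation q(z) = N(0.4, 0.8^2)
def log_joint(z): return norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)
def log_q(z):     return norm.logpdf(z, 0.4, 0.8)

z = rng.normal(0.4, 0.8, size=100_000)
log_w = log_joint(z) - log_q(z)                            # log importance weights from q

elbo = log_w.mean()                                        # lower bound on log p(x)
cubo2 = 0.5 * (logsumexp(2 * log_w) - np.log(z.size))      # chi^2-type upper bound

print(f"ELBO  ~ {elbo:.4f}")
print(f"CUBO2 ~ {cubo2:.4f}")
print(f"exact log p(x) = {norm.logpdf(x, 0, np.sqrt(2)):.4f}")
```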
Signed Graph Attention Networks
Title | Signed Graph Attention Networks |
Authors | Junjie Huang, Huawei Shen, Liang Hou, Xueqi Cheng |
Abstract | Graph or network data is ubiquitous in the real world, including social networks, information networks, traffic networks, biological networks and various technical networks. The non-Euclidean nature of graph data poses a challenge for modeling and analyzing it. Recently, Graph Neural Networks (GNNs) have been proposed as a general and powerful framework to handle tasks on graph data, e.g., node embedding, link prediction and node classification. As a representative implementation of GNNs, Graph Attention Networks (GATs) have been successfully applied to a variety of tasks on real datasets. However, GAT is designed for networks with only positive links and fails to handle signed networks, which contain both positive and negative links. In this paper, we propose Signed Graph Attention Networks (SiGAT), generalizing GAT to signed networks. SiGAT incorporates graph motifs into GAT to capture two well-known theories in signed network research, i.e., balance theory and status theory. In SiGAT, motifs offer a flexible structural pattern for aggregating and propagating messages on the signed network to generate node embeddings. We evaluate the proposed SiGAT method by applying it to the signed link prediction task. Experimental results on three real datasets demonstrate that SiGAT outperforms feature-based methods, network embedding methods, and state-of-the-art GNN-based methods such as the signed graph convolutional network (SGCN). |
Tasks | Link Prediction, Network Embedding, Node Classification |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10958v3 |
https://arxiv.org/pdf/1906.10958v3.pdf | |
PWC | https://paperswithcode.com/paper/signed-graph-attention-networks |
Repo | https://github.com/huangjunjie95/SiGAT |
Framework | pytorch |
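A drastically simplified PyTorch sketch of the core idea follows: run a separate attention-based aggregation over positive-edge neighbours and negative-edge neighbours and concatenate the results. SiGAT itself uses one GAT aggregator per balance/status motif rather than just per sign, so treat this as the two-motif special case with assumed dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSignedAttention(nn.Module):
    """One attention head per edge sign; outputs are concatenated."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.ModuleDict({s: nn.Linear(in_dim, out_dim) for s in ("pos", "neg")})
        self.att = nn.ModuleDict({s: nn.Linear(2 * out_dim, 1) for s in ("pos", "neg")})

    def aggregate(self, h, adj, sign):
        z = self.proj[sign](h)                                    # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.att[sign](pairs).squeeze(-1), 0.2)  # (N, N) attention logits
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=1)
        alpha = torch.nan_to_num(alpha)                           # nodes with no neighbours
        return alpha @ z

    def forward(self, h, pos_adj, neg_adj):
        return torch.cat([self.aggregate(h, pos_adj, "pos"),
                          self.aggregate(h, neg_adj, "neg")], dim=-1)

# toy usage: 6 nodes with random features and random signed adjacency masks
h = torch.randn(6, 16)
pos = (torch.rand(6, 6) > 0.5).float()
neg = (torch.rand(6, 6) > 0.5).float()
print(SimpleSignedAttention(16, 8)(h, pos, neg).shape)   # torch.Size([6, 16])
```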
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Title | Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language |
Authors | Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo |
Abstract | We address the problem of retrieving a specific moment from an untrimmed video by a query sentence. This is a challenging problem because a target moment may take place in relation to other temporal moments in the untrimmed video. Existing methods cannot tackle this challenge well, since they consider temporal moments individually and neglect the temporal dependencies. In this paper, we model the temporal relations between video moments by a two-dimensional map, where one dimension indicates the starting time of a moment and the other indicates the end time. This 2D temporal map can cover diverse video moments with different lengths, while representing their adjacent relations. Based on the 2D map, we propose a Temporal Adjacent Network (2D-TAN), a single-shot framework for moment localization. It is capable of encoding adjacent temporal relations, while learning discriminative features for matching video moments with referring expressions. We evaluate the proposed 2D-TAN on three challenging benchmarks, i.e., Charades-STA, ActivityNet Captions, and TACoS, where our 2D-TAN outperforms the state of the art. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03590v1 |
https://arxiv.org/pdf/1912.03590v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-2d-temporal-adjacent-networks-for |
Repo | https://github.com/researchmm/2D-TAN |
Framework | pytorch |
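The central data structure is easy to write down: a T×T map whose (i, j) entry represents the candidate moment spanning clips i to j. A small PyTorch sketch of building that map by max-pooling clip features is below; the pooling choice and dense enumeration are simplifications (2D-TAN also samples the map sparsely for long videos and adds the language-matching network on top).

```python
import torch

def build_2d_temporal_map(clip_feats):
    """clip_feats: (T, D) features of T consecutive video clips.
    Returns a (T, T, D) map where entry (i, j), i <= j, is the max-pooled
    feature of the moment spanning clips i..j (entries with i > j stay zero)."""
    T, D = clip_feats.shape
    fmap = clip_feats.new_zeros(T, T, D)
    for i in range(T):
        running = clip_feats[i]
        for j in range(i, T):
            running = torch.maximum(running, clip_feats[j])   # pool clips i..j
            fmap[i, j] = running
    return fmap

feats = torch.randn(8, 32)                   # toy: 8 clips with 32-d features
print(build_2d_temporal_map(feats).shape)    # torch.Size([8, 8, 32])
```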
Few-shot Text Classification with Distributional Signatures
Title | Few-shot Text Classification with Distributional Signatures |
Authors | Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay |
Abstract | In this paper, we explore meta-learning for few-shot text classification. Meta-learning has shown strong performance in computer vision, where low-level patterns are transferable across learning tasks. However, directly applying this approach to text is challenging: lexical features highly informative for one task may be insignificant for another. Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns. Our model is trained within a meta-learning framework to map these signatures into attention scores, which are then used to weight the lexical representations of words. We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant margin across six benchmark datasets (20.0% on average in 1-shot classification). |
Tasks | Meta-Learning, Relation Classification, Text Classification |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06039v3 |
https://arxiv.org/pdf/1908.06039v3.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-text-classification-with |
Repo | https://github.com/YujiaBao/Distributional-Signatures |
Framework | pytorch |
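The sketch below illustrates the general mechanism with a single hand-crafted statistic: a frequency-based word score is turned into attention weights that reweight lexical embeddings. In the actual model several such statistics are fed through a learned biLSTM attention generator inside the meta-learning loop; the eps/(eps + P(w)) form and all dimensions here are illustrative assumptions.

```python
import numpy as np

def signature_weighted_representation(token_ids, embeddings, unigram_p, eps=1e-3):
    """Derive word-level attention from a frequency statistic (in the spirit of
    the paper's distributional signatures) and weight lexical embeddings."""
    p = unigram_p[token_ids]                    # P(w) for each token in the text
    scores = eps / (eps + p)                    # rare words get higher scores
    att = np.exp(scores) / np.exp(scores).sum() # softmax over tokens
    return att @ embeddings[token_ids]          # attention-weighted average embedding

# toy usage: vocabulary of 100 words with random embeddings and frequencies
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 50))
freq = rng.dirichlet(np.ones(100))
tokens = rng.integers(0, 100, size=12)
print(signature_weighted_representation(tokens, emb, freq).shape)   # (50,)
```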
A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research
Title | A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research |
Authors | Maurizio Ferrari Dacrema, Simone Boglio, Paolo Cremonesi, Dietmar Jannach |
Abstract | The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today’s research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. In order to obtain a better understanding of the actual progress, we have tried to reproduce recent results in the area of neural recommendation approaches based on collaborative filtering. The worrying outcome of the analysis of these recent works, all of which were published at prestigious scientific conferences between 2015 and 2018, is that 11 out of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on nearest-neighbor heuristics. None of the computationally complex neural methods was consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today’s research practice which, despite the many papers published on the topic, have apparently led the field to a certain level of stagnation. |
Tasks | Recommendation Systems |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07698v1 |
https://arxiv.org/pdf/1911.07698v1.pdf | |
PWC | https://paperswithcode.com/paper/a-troubling-analysis-of-reproducibility-and |
Repo | https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation |
Framework | none |
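For context, this is roughly the kind of conceptually simple baseline the study pits against neural recommenders: an item-based nearest-neighbour model on the user-item matrix. The sketch below is a bare-bones version; the authors' evaluation framework adds shrinkage, normalization, and systematic hyperparameter tuning.

```python
import numpy as np

def itemknn_scores(urm, k=50):
    """Item-based KNN on a binary user-item matrix `urm` (n_users, n_items):
    cosine item-item similarity, keep the top-k neighbours per item, then score
    items for every user by similarity-weighted sums."""
    norms = np.linalg.norm(urm, axis=0, keepdims=True) + 1e-8
    sim = (urm / norms).T @ (urm / norms)                # (n_items, n_items) cosine
    np.fill_diagonal(sim, 0.0)
    idx = np.argpartition(sim, -k, axis=1)[:, :-k]       # indices outside the top-k
    np.put_along_axis(sim, idx, 0.0, axis=1)             # keep only k neighbours per item
    return urm @ sim                                      # user-item score matrix

rng = np.random.default_rng(0)
urm = (rng.random((200, 300)) > 0.95).astype(float)       # toy interaction matrix
print(itemknn_scores(urm, k=50).shape)                    # (200, 300)
```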
Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking
Title | Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking |
Authors | Xiaolong Jiang, Peizhao Li, Yanjing Li, Xiantong Zhen |
Abstract | In this work, we present an end-to-end framework for data association in online Multiple-Object Tracking (MOT). Given detection responses, we formulate frame-by-frame data association as a Maximum Weighted Bipartite Matching problem, whose solution is learned using a neural network. The network incorporates an affinity learning module, wherein both appearance and motion cues are used to encode object feature representations and compute pairwise affinities. Employing the computed affinities as edge weights, the resulting matching problem on a bipartite graph is resolved by the optimization module, which leverages a graph neural network to adapt to the varying cardinalities of the association problem and to handle its combinatorial hardness with favorable scalability and compatibility. To facilitate effective training of the proposed tracking network, we design a multi-level matrix loss in conjunction with an assembled supervision methodology. Being trained end-to-end, all modules in the tracker can co-adapt and cooperate, resulting in improved model adaptiveness and less parameter-tuning effort. Experimental results on the MOT benchmarks demonstrate the efficacy of the proposed approach. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05315v1 |
https://arxiv.org/pdf/1907.05315v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-neural-based-end-to-end-data |
Repo | https://github.com/peizhaoli05/EDA_GNN |
Framework | pytorch |
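The underlying formulation is standard: build a track-by-detection affinity matrix and solve a maximum-weight bipartite matching. The sketch below uses a hand-crafted affinity (appearance cosine similarity mixed with box IoU) and the Hungarian algorithm from SciPy; in the paper both the affinities and the matching are produced by learned networks trained end-to-end, so this is only the classical counterpart.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(b1, b2):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-8)

def associate(track_embs, det_embs, track_boxes, det_boxes, iou_weight=0.5):
    """Frame-by-frame data association as maximum-weight bipartite matching."""
    a = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    b = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    appearance = a @ b.T                                    # appearance affinities
    motion = np.array([[iou(t, d) for d in det_boxes] for t in track_boxes])
    affinity = (1 - iou_weight) * appearance + iou_weight * motion
    rows, cols = linear_sum_assignment(-affinity)           # maximize total affinity
    return list(zip(rows, cols))

# toy usage: 3 existing tracks, 4 new detections
rng = np.random.default_rng(0)
def rand_boxes(n):                                          # well-formed [x1, y1, x2, y2]
    xy = rng.uniform(0, 80, (n, 2)); wh = rng.uniform(10, 40, (n, 2))
    return np.hstack([xy, xy + wh])
print(associate(rng.normal(size=(3, 16)), rng.normal(size=(4, 16)),
                rand_boxes(3), rand_boxes(4)))
```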
Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation
Title | Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation |
Authors | Elena V. Epure, Anis Khlif, Romain Hennequin |
Abstract | Considerable effort has been put into automatically inferring the genres of musical items. Yet the proposed solutions often rely on simplifications and fail to address the diversity and subjectivity of music genres. Accounting for these aspects, however, has many benefits for aligning knowledge sources, integrating data, and enriching musical items with tags. Here, we take a new angle on the genre study by seeking to predict the genres of musical items in a target tag system, knowing the genres assigned to them within source tag systems. We call this a translation task and identify three cases: 1) no common annotated corpus between source and target tag systems exists; 2) such a large corpus exists; 3) only a few common annotations exist. We propose corresponding solutions: a knowledge-based translation modeled as taxonomy mapping; a statistical translation modeled with maximum-likelihood logistic regression; and a hybrid translation modeled with maximum a posteriori logistic regression whose priors are given by the knowledge-based translation. In the evaluation, the solutions fit the identified cases well, and the hybrid translation is systematically the most effective with respect to multilabel classification metrics. This is a first attempt to unify genre tag systems by leveraging both representation and interpretation diversity. |
Tasks | |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08698v2 |
https://arxiv.org/pdf/1907.08698v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-knowledge-bases-and-parallel |
Repo | https://github.com/deezer/MusicGenreTranslation |
Framework | none |
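The statistical variant amounts to multilabel logistic regression from source-tag indicator vectors to target tags; the hybrid variant additionally places a prior derived from the knowledge-based taxonomy mapping on those weights. Below is a scikit-learn sketch of the statistical case on synthetic data; the tag vocabularies and the generating map are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
n_items, n_src, n_tgt = 500, 20, 8
X = (rng.random((n_items, n_src)) > 0.8).astype(int)       # source-tag indicators

# synthetic ground truth: target tag j fires if source tag 2j or 2j+1 is present
true_map = np.zeros((n_src, n_tgt), dtype=int)
for j in range(n_tgt):
    true_map[2 * j, j] = true_map[2 * j + 1, j] = 1
Y = ((X @ true_map) > 0).astype(int)                        # target-tag indicators

# maximum-likelihood "translation": one logistic regression per target tag
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)
print(clf.predict(X[:3]))                                   # predicted target tags
```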
Distributed Learning with Random Features
Title | Distributed Learning with Random Features |
Authors | Jian Li, Yong Liu, Weiping Wang |
Abstract | Distributed learning and random projections are the most common techniques in large scale nonparametric statistical learning. In this paper, we study the generalization properties of kernel ridge regression using both distributed methods and random features. Theoretical analysis shows the combination remarkably reduces computational cost while preserving the optimal generalization accuracy under standard assumptions. In a benign case, $\mathcal{O}(\sqrt{N})$ partitions and $\mathcal{O}(\sqrt{N})$ random features are sufficient to achieve $\mathcal{O}(1/N)$ learning rate, where $N$ is the labeled sample size. Further, we derive more refined results by using additional unlabeled data to enlarge the number of partitions and by generating features in a data-dependent way to reduce the number of random features. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03155v2 |
https://arxiv.org/pdf/1906.03155v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-learning-with-random-features |
Repo | https://github.com/superlj666/Distributed-Learning-with-Random-Features |
Framework | none |
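The recipe is concrete enough to sketch directly: split the data into roughly √N partitions, fit ridge regression on roughly √N random Fourier features within each partition, and average the resulting predictors. The numpy sketch below follows that recipe on synthetic data; the bandwidth and ridge parameter are illustrative choices rather than the theoretically tuned values.

```python
import numpy as np

def rff(X, W, b):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
N, d = 4000, 5
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

m = int(np.sqrt(N))          # number of partitions, O(sqrt(N)) as in the benign case
D = int(np.sqrt(N))          # number of random features, also O(sqrt(N))
lam = 1.0 / N                # ridge parameter (illustrative choice)
W = rng.normal(size=(d, D))  # Gaussian kernel with unit bandwidth
b = rng.uniform(0, 2 * np.pi, size=D)

# fit one ridge regression in feature space per partition, then average predictions
parts = np.array_split(rng.permutation(N), m)
coefs = []
for idx in parts:
    Z = rff(X[idx], W, b)
    A = Z.T @ Z + len(idx) * lam * np.eye(D)
    coefs.append(np.linalg.solve(A, Z.T @ y[idx]))

X_test = rng.normal(size=(500, d))
pred = np.mean([rff(X_test, W, b) @ c for c in coefs], axis=0)
print(pred.shape)   # (500,)
```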
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Title | Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks |
Authors | Yuanzhi Li, Colin Wei, Tengyu Ma |
Abstract | Stochastic gradient descent with a large initial learning rate is a widely adopted method for training modern neural net architectures. Although a small initial learning rate allows for faster training and better test performance initially, the large learning rate achieves better generalization soon after the learning rate is annealed. Towards explaining this phenomenon, we devise a setting in which we can prove that a two layer network trained with large initial learning rate and annealing provably generalizes better than the same network trained with a small learning rate from the start. The key insight in our analysis is that the order of learning different types of patterns is crucial: because the small learning rate model first memorizes low noise, hard-to-fit patterns, it generalizes worse on higher noise, easier-to-fit patterns than its large learning rate counterpart. This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing. Our experiments show that this causes the small learning rate model’s accuracy on unmodified images to suffer, as it relies too much on the patch early on. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04595v1 |
https://arxiv.org/pdf/1907.04595v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-explaining-the-regularization-effect |
Repo | https://github.com/cwein3/large-lr-experiments |
Framework | pytorch |
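In code, the two regimes being compared are simply two SGD schedules: a small constant learning rate versus a large initial learning rate that is annealed partway through training. A minimal PyTorch sketch of the schedules follows; the model, learning-rate values, and milestone are placeholders, and the actual training and evaluation loop is omitted.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# (a) small constant learning rate from the start
opt_small = torch.optim.SGD(model.parameters(), lr=0.01)

# (b) large initial learning rate, annealed by 10x after 60 epochs
opt_large = torch.optim.SGD(model.parameters(), lr=0.4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt_large, milestones=[60], gamma=0.1)

for epoch in range(100):
    # ... one epoch of training with opt_large.step() on each minibatch ...
    opt_large.step()       # placeholder step so optimizer/scheduler ordering is correct
    sched.step()
print(opt_large.param_groups[0]["lr"])   # 0.04 after annealing
```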
Adjusting Decision Boundary for Class Imbalanced Learning
Title | Adjusting Decision Boundary for Class Imbalanced Learning |
Authors | Byungju Kim, Junmo Kim |
Abstract | Training of deep neural networks heavily depends on the data distribution. In particular, networks easily suffer from class imbalance: a trained network recognizes the frequent classes better than the infrequent ones. To resolve this problem, existing approaches typically propose novel loss functions to obtain better feature embeddings. In this paper, we argue that drawing a better decision boundary is as important as learning better features. Motivated by this observation, we investigate how class imbalance affects the decision boundary and deteriorates performance. We also investigate the feature distributional discrepancy between training and test time. As a result, we propose a novel yet simple method for class-imbalanced learning. Despite its simplicity, our method shows outstanding performance. In particular, the experimental results show that we can significantly improve the network by scaling the weight vectors, even without an additional training process. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01857v2 |
https://arxiv.org/pdf/1912.01857v2.pdf | |
PWC | https://paperswithcode.com/paper/adjusting-decision-boundary-for-class |
Repo | https://github.com/feidfoe/AdjustBnd4Imbalance |
Framework | pytorch |
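A hedged sketch of the post-hoc idea: after training, rescale each class's weight vector in the final linear layer as a function of its training frequency, which shifts the decision boundary away from frequent classes and toward rare ones without any further training. The exponent and exact rescaling rule below are illustrative assumptions; the paper derives its own normalization and rescaling scheme from its analysis.

```python
import torch
import torch.nn as nn

def rescale_classifier(fc, class_counts, gamma=0.5):
    """Rescale each class's weight vector in the final linear layer by an
    inverse power of its training frequency (the bias is left untouched here)."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    scale = (counts.max() / counts) ** gamma       # > 1 for infrequent classes
    with torch.no_grad():
        fc.weight.mul_(scale.unsqueeze(1))         # one scale factor per class row
    return fc

# toy usage: a 10-class head "trained" on a long-tailed label distribution
head = nn.Linear(512, 10)
counts = [5000, 3000, 2000, 1000, 500, 200, 100, 50, 20, 10]
rescale_classifier(head, counts)
print(head.weight.norm(dim=1))   # rare classes now have larger weight norms
```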
RawNet: Fast End-to-End Neural Vocoder
Title | RawNet: Fast End-to-End Neural Vocoder |
Authors | Yunchao He, Haitong Zhang, Yujun Wang |
Abstract | Neural network based vocoders have recently demonstrated a powerful ability to synthesize high quality speech. These models usually generate samples by conditioning on spectral features, such as the Mel-spectrum. However, such features are extracted by a speech analysis module that includes processing based on human knowledge. In this work, we propose RawNet, a truly end-to-end neural vocoder, which uses a coder network to learn a higher-level representation of the signal and an autoregressive voder network to generate speech sample by sample. The coder and voder together act like an auto-encoder and can be jointly trained directly on the raw waveform, without any human-designed features. Experiments on Copy-Synthesis tasks show that RawNet achieves synthesized speech quality comparable to LPCNet, with a smaller model architecture and faster speech generation at inference time. |
Tasks | |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05351v1 |
http://arxiv.org/pdf/1904.05351v1.pdf | |
PWC | https://paperswithcode.com/paper/rawnet-fast-end-to-end-neural-vocoder |
Repo | https://github.com/candlewill/RawNet |
Framework | none |
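A very small PyTorch sketch of the coder/voder split: a strided convolutional coder compresses the raw waveform into frame-level codes, and an autoregressive voder predicts each sample from the previous sample plus the upsampled code, trained jointly like an auto-encoder. Channel sizes, the hop length, the regression output, and the MSE loss are all illustrative assumptions; the actual RawNet uses a more elaborate autoregressive sampler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCoder(nn.Module):
    """Strided 1-D conv coder: raw waveform -> frame-level latent code."""
    def __init__(self, latent=64, hop=256):
        super().__init__()
        self.net = nn.Conv1d(1, latent, kernel_size=2 * hop, stride=hop, padding=hop // 2)

    def forward(self, wav):                        # wav: (B, T)
        return self.net(wav.unsqueeze(1))          # (B, latent, T // hop)

class TinyVoder(nn.Module):
    """Autoregressive voder: previous sample + upsampled code -> next sample."""
    def __init__(self, latent=64, hidden=128, hop=256):
        super().__init__()
        self.hop = hop
        self.rnn = nn.GRU(1 + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, wav, code):                  # teacher forcing during training
        cond = code.repeat_interleave(self.hop, dim=2).transpose(1, 2)   # (B, T, latent)
        prev = F.pad(wav[:, :-1], (1, 0)).unsqueeze(-1)                  # x_{t-1}
        h, _ = self.rnn(torch.cat([prev, cond[:, : prev.size(1)]], dim=-1))
        return self.out(h).squeeze(-1)             # predicted x_t, shape (B, T)

wav = torch.randn(2, 4096)                         # toy batch of raw audio
coder, voder = TinyCoder(), TinyVoder()
code = coder(wav)
recon = voder(wav, code)
loss = F.mse_loss(recon, wav)                      # joint auto-encoder-style training
print(code.shape, recon.shape, loss.item())
```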