Paper Group AWR 398
Kernel-based Translations of Convolutional Networks. Newton vs the machine: solving the chaotic three-body problem using deep neural networks. Note on the bias and variance of variational inference. Learning to learn via Self-Critique. Validated Variational Inference via Practical Posterior Error Bounds. Signed Graph Attention Networks. Learning 2D …
Kernel-based Translations of Convolutional Networks
Title | Kernel-based Translations of Convolutional Networks |
Authors | Corinne Jones, Vincent Roulet, Zaid Harchaoui |
Abstract | Convolutional Neural Networks, like most artificial neural networks, are commonly viewed as methods different in essence from kernel-based methods. We provide a systematic translation of Convolutional Neural Networks (ConvNets) into their kernel-based counterparts, Convolutional Kernel Networks (CKNs), and demonstrate that this perception is unfounded both formally and empirically. We show that, given a Convolutional Neural Network, we can design a corresponding Convolutional Kernel Network that is easily trainable with a new stochastic gradient algorithm based on an accurate gradient computation and that performs on par with its ConvNet counterpart. We present experimental results supporting these claims on landmark ConvNet architectures, comparing each ConvNet to its CKN counterpart over several parameter settings. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.08131v1 |
http://arxiv.org/pdf/1903.08131v1.pdf | |
PWC | https://paperswithcode.com/paper/kernel-based-translations-of-convolutional |
Repo | https://github.com/cjones6/yesweckn |
Framework | pytorch |
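The translation replaces each ConvNet layer with a kernel feature map computed on image patches. Below is a minimal numpy sketch of one such layer in the spirit of a CKN: patches are normalized, compared to a set of anchor patches with a Gaussian kernel, and the resulting feature vectors are average-pooled. The anchors, patch size, and bandwidth are illustrative assumptions; the paper's construction uses a Nyström-type approximation and trains the layers end-to-end with its stochastic gradient algorithm.

```python
import numpy as np

def ckn_like_layer(image, anchors, patch_size=3, bandwidth=0.5):
    """One kernel-feature layer: compare normalized patches to anchor patches
    with a Gaussian kernel and average-pool the resulting feature vectors."""
    H, W = image.shape
    feats = []
    for i in range(H - patch_size + 1):
        for j in range(W - patch_size + 1):
            p = image[i:i + patch_size, j:j + patch_size].ravel()
            p = p / (np.linalg.norm(p) + 1e-8)                 # normalize the patch
            d2 = np.sum((anchors - p) ** 2, axis=1)            # squared distances to anchors
            feats.append(np.exp(-d2 / (2 * bandwidth ** 2)))   # Gaussian kernel features
    return np.mean(feats, axis=0)                              # average pooling over positions

# toy usage: a random image and random unit-norm anchor patches
rng = np.random.default_rng(0)
img = rng.random((16, 16))
anchors = rng.normal(size=(32, 9))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
print(ckn_like_layer(img, anchors).shape)   # (32,)
```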
Newton vs the machine: solving the chaotic three-body problem using deep neural networks
Title | Newton vs the machine: solving the chaotic three-body problem using deep neural networks |
Authors | Philip G. Breen, Christopher N. Foley, Tjarda Boekholt, Simon Portegies Zwart |
Abstract | Since its formulation by Sir Isaac Newton, the problem of solving the equations of motion for three bodies under their own gravitational force has remained practically unsolved. Currently, the solution for a given initialization can only be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system’s chaotic nature. We show that an ensemble of solutions obtained using an arbitrarily precise numerical integrator can be used to train a deep artificial neural network (ANN) that, over a bounded time interval, provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver. Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems to shed light on outstanding phenomena such as the formation of black-hole binary systems or the origin of the core collapse in dense star clusters. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07291v1 |
https://arxiv.org/pdf/1910.07291v1.pdf | |
PWC | https://paperswithcode.com/paper/newton-vs-the-machine-solving-the-chaotic |
Repo | https://github.com/pgbreen/NVM |
Framework | tf |
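The approach itself is plain supervised regression: a numerical integrator produces trajectories, and a feed-forward network learns to map an initial condition and a query time to the bodies' positions. A minimal PyTorch sketch of one training step follows; the input/output dimensions, network width, and loss are illustrative assumptions rather than the paper's exact configuration (the released code uses TensorFlow).

```python
import torch
import torch.nn as nn

# Inputs: a query time t plus the parameters of the initial condition
# (assumed 5-dimensional here). Outputs: planar positions of two bodies at
# time t (4 numbers); the third body follows from the centre of mass.
model = nn.Sequential(
    nn.Linear(5, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 4),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()   # illustrative choice of regression loss

def train_step(inputs, targets):
    """inputs: (B, 5) [t, initial condition]; targets: (B, 4) integrator positions."""
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()
    return loss.item()

# toy usage with random stand-in data (real training pairs come from an
# arbitrarily precise numerical integrator)
print(train_step(torch.randn(64, 5), torch.randn(64, 4)))
```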
Note on the bias and variance of variational inference
Title | Note on the bias and variance of variational inference |
Authors | Chin-Wei Huang, Aaron Courville |
Abstract | In this note, we study the relationship between the variational gap and the variance of the (log) likelihood ratio. We show that the gap can be upper bounded by some form of dispersion measure of the likelihood ratio, which suggests the bias of variational inference can be reduced by making the distribution of the likelihood ratio more concentrated, such as via averaging and variance reduction. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03708v1 |
https://arxiv.org/pdf/1906.03708v1.pdf | |
PWC | https://paperswithcode.com/paper/note-on-the-bias-and-variance-of-variational |
Repo | https://github.com/CW-Huang/HIWAE |
Framework | pytorch |
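The note's message can be checked numerically on a toy conjugate model: the gap between log p(x) and the variational bound shrinks as the likelihood ratio is made more concentrated, for example by averaging several importance weights. The sketch below does exactly that; the model, the mismatched q, and the sample sizes are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.0                                    # single observation
# model: z ~ N(0,1), x|z ~ N(z,1)  =>  log p(x) = log N(x; 0, 2)
log_px = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

# a deliberately mismatched variational approximation q(z) = N(0.3, 1)
def log_weight(z):
    return (norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)) - norm.logpdf(z, 0.3, 1)

def iwae_bound(K, n_outer=20000):
    z = rng.normal(0.3, 1.0, size=(n_outer, K))       # samples from q
    lw = log_weight(z)                                 # log likelihood ratios
    return np.mean(np.logaddexp.reduce(lw, axis=1) - np.log(K))

for K in (1, 4, 16, 64):
    gap = log_px - iwae_bound(K)
    print(f"K={K:3d}  variational gap ~ {gap:.4f}")
# K=1 is the standard ELBO; averaging K weights concentrates the likelihood
# ratio and the gap shrinks, matching the note's observation.
```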
Learning to learn via Self-Critique
Title | Learning to learn via Self-Critique |
Authors | Antreas Antoniou, Amos Storkey |
Abstract | In few-shot learning, a machine learning system learns from a small set of labelled examples relating to a specific task, such that it can generalize to new examples of the same task. Given the limited availability of labelled examples in such tasks, we wish to make use of all the information we can. Usually a model learns task-specific information from a small training-set (support-set) to predict on an unlabelled validation set (target-set). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods. Making use of the target-set examples via transductive learning requires approaches beyond the current methods; at inference time, the target-set contains only unlabelled input data-points, and so discriminative learning cannot be used. In this paper, we propose a framework called Self-Critique and Adapt or SCA, which learns to learn a label-free loss function, parameterized as a neural network. A base-model learns on a support-set using existing methods (e.g. stochastic gradient descent combined with the cross-entropy loss), and then is updated for the incoming target-task using the learnt loss function. This label-free loss function is itself optimized such that the learnt model achieves higher generalization performance. Experiments demonstrate that SCA offers substantially reduced error-rates compared to baselines which only adapt on the support-set, and results in state of the art benchmark performance on Mini-ImageNet and Caltech-UCSD Birds 200. |
Tasks | Few-Shot Image Classification, Few-Shot Learning |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10295v6 |
https://arxiv.org/pdf/1905.10295v6.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-learn-by-self-critique |
Repo | https://github.com/AntreasAntoniou/Learning_to_Learn_via_Self-Critique |
Framework | pytorch |
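At a high level, the inner loop has two stages: adapt on the labelled support set with an ordinary loss, then adapt again on the unlabelled target set with a learned, label-free loss. The sketch below shows only that inner loop with a toy linear model; the critic architecture, step sizes, and dimensions are assumptions, and the outer loop that actually trains the critic is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

base = nn.Linear(64, 5)                      # base classifier for a 5-way task
critic = nn.Sequential(                      # label-free loss: maps predictions to a scalar
    nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

support_x, support_y = torch.randn(25, 64), torch.randint(0, 5, (25,))
target_x = torch.randn(15, 64)               # unlabelled target set

# 1) adapt on the support set with an ordinary supervised loss
inner_opt = torch.optim.SGD(base.parameters(), lr=0.1)
for _ in range(5):
    inner_opt.zero_grad()
    F.cross_entropy(base(support_x), support_y).backward()
    inner_opt.step()

# 2) self-critique: further adapt on the target set using the learned,
#    label-free loss (here the critic scores the softmax predictions)
inner_opt.zero_grad()
label_free_loss = critic(F.softmax(base(target_x), dim=-1)).mean()
label_free_loss.backward()
inner_opt.step()
# In the full method this procedure sits inside an outer loop that trains the
# critic so that the adapted model generalizes better.
```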
Validated Variational Inference via Practical Posterior Error Bounds
Title | Validated Variational Inference via Practical Posterior Error Bounds |
Authors | Jonathan H. Huggins, Mikołaj Kasprzak, Trevor Campbell, Tamara Broderick |
Abstract | Variational inference has become an increasingly attractive fast alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, a major obstacle to the widespread use of variational methods is the lack of post-hoc accuracy measures that are both theoretically justified and computationally efficient. In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference. Our bounds are widely applicable, as they require only that the approximating and exact posteriors have polynomial moments. Our bounds are also computationally efficient for variational inference because they require only standard values from variational objectives, straightforward analytic calculations, and simple Monte Carlo estimates. We show that our analysis naturally leads to a new and improved workflow for validated variational inference. Finally, we demonstrate the utility of our proposed workflow and error bounds on a robust regression problem and on a real-data example with a widely used multilevel hierarchical model. |
Tasks | Bayesian Inference |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04102v4 |
https://arxiv.org/pdf/1910.04102v4.pdf | |
PWC | https://paperswithcode.com/paper/practical-posterior-error-bounds-from |
Repo | https://github.com/jhuggins/viabel |
Framework | none |
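The bounds are built from quantities that standard variational workflows already compute, such as a Monte Carlo ELBO and a chi-squared-type upper bound on log p(x) obtained from the same importance weights. The snippet below estimates both on a toy Gaussian model; it only illustrates the kind of inputs such bounds consume, not the paper's bounds themselves (those are implemented in the authors' viabel package).

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x = 1.0
# model: z ~ N(0,1), x|z ~ N(z,1); approximation q(z) = N(0.4, 0.8^2)
def log_joint(z): return norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)
def log_q(z):     return norm.logpdf(z, 0.4, 0.8)

z = rng.normal(0.4, 0.8, size=100_000)
log_w = log_joint(z) - log_q(z)                            # log importance weights from q

elbo = log_w.mean()                                        # lower bound on log p(x)
cubo2 = 0.5 * (logsumexp(2 * log_w) - np.log(z.size))      # chi^2-type upper bound

print(f"ELBO  ~ {elbo:.4f}")
print(f"CUBO2 ~ {cubo2:.4f}")
print(f"exact log p(x) = {norm.logpdf(x, 0, np.sqrt(2)):.4f}")
```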
Signed Graph Attention Networks
Title | Signed Graph Attention Networks |
Authors | Junjie Huang, Huawei Shen, Liang Hou, Xueqi Cheng |
Abstract | Graph or network data is ubiquitous in the real world, including social networks, information networks, traffic networks, biological networks and various technical networks. The non-Euclidean nature of graph data poses a challenge for modeling and analyzing it. Recently, Graph Neural Networks (GNNs) have been proposed as a general and powerful framework to handle tasks on graph data, e.g., node embedding, link prediction and node classification. As a representative implementation of GNNs, Graph Attention Networks (GATs) have been successfully applied to a variety of tasks on real datasets. However, GAT is designed for networks with only positive links and fails to handle signed networks, which contain both positive and negative links. In this paper, we propose Signed Graph Attention Networks (SiGAT), generalizing GAT to signed networks. SiGAT incorporates graph motifs into GAT to capture two well-known theories in signed network research, i.e., balance theory and status theory. In SiGAT, motifs offer a flexible structural pattern for aggregating and propagating messages on the signed network to generate node embeddings. We evaluate the proposed SiGAT method by applying it to the signed link prediction task. Experimental results on three real datasets demonstrate that SiGAT outperforms feature-based methods, network embedding methods, and state-of-the-art GNN-based methods such as the signed graph convolutional network (SGCN). |
Tasks | Link Prediction, Network Embedding, Node Classification |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10958v3 |
https://arxiv.org/pdf/1906.10958v3.pdf | |
PWC | https://paperswithcode.com/paper/signed-graph-attention-networks |
Repo | https://github.com/huangjunjie95/SiGAT |
Framework | pytorch |
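A drastically simplified PyTorch sketch of the core idea follows: run a separate attention-based aggregation over positive-edge neighbours and negative-edge neighbours and concatenate the results. SiGAT itself uses one GAT aggregator per balance/status motif rather than just per sign, so treat this as the two-motif special case with assumed dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSignedAttention(nn.Module):
    """One attention head per edge sign; outputs are concatenated."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.ModuleDict({s: nn.Linear(in_dim, out_dim) for s in ("pos", "neg")})
        self.att = nn.ModuleDict({s: nn.Linear(2 * out_dim, 1) for s in ("pos", "neg")})

    def aggregate(self, h, adj, sign):
        z = self.proj[sign](h)                                    # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.att[sign](pairs).squeeze(-1), 0.2)  # (N, N) attention logits
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=1)
        alpha = torch.nan_to_num(alpha)                           # nodes with no neighbours
        return alpha @ z

    def forward(self, h, pos_adj, neg_adj):
        return torch.cat([self.aggregate(h, pos_adj, "pos"),
                          self.aggregate(h, neg_adj, "neg")], dim=-1)

# toy usage: 6 nodes with random features and random signed adjacency masks
h = torch.randn(6, 16)
pos = (torch.rand(6, 6) > 0.5).float()
neg = (torch.rand(6, 6) > 0.5).float()
print(SimpleSignedAttention(16, 8)(h, pos, neg).shape)   # torch.Size([6, 16])
```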
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Title | Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language |
Authors | Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo |
Abstract | We address the problem of retrieving a specific moment from an untrimmed video by a query sentence. This is a challenging problem because a target moment may take place in relation to other temporal moments in the untrimmed video. Existing methods cannot tackle this challenge well, since they consider temporal moments individually and neglect the temporal dependencies. In this paper, we model the temporal relations between video moments by a two-dimensional map, where one dimension indicates the starting time of a moment and the other indicates the end time. This 2D temporal map can cover diverse video moments with different lengths, while representing their adjacent relations. Based on the 2D map, we propose a Temporal Adjacent Network (2D-TAN), a single-shot framework for moment localization. It is capable of encoding adjacent temporal relations, while learning discriminative features for matching video moments with referring expressions. We evaluate the proposed 2D-TAN on three challenging benchmarks, i.e., Charades-STA, ActivityNet Captions, and TACoS, where our 2D-TAN outperforms the state of the art. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03590v1 |
https://arxiv.org/pdf/1912.03590v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-2d-temporal-adjacent-networks-for |
Repo | https://github.com/researchmm/2D-TAN |
Framework | pytorch |
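The central data structure is easy to write down: a T×T map whose (i, j) entry represents the candidate moment spanning clips i to j. A small PyTorch sketch of building that map by max-pooling clip features is below; the pooling choice and dense enumeration are simplifications (2D-TAN also samples the map sparsely for long videos and adds the language-matching network on top).

```python
import torch

def build_2d_temporal_map(clip_feats):
    """clip_feats: (T, D) features of T consecutive video clips.
    Returns a (T, T, D) map where entry (i, j), i <= j, is the max-pooled
    feature of the moment spanning clips i..j (entries with i > j stay zero)."""
    T, D = clip_feats.shape
    fmap = clip_feats.new_zeros(T, T, D)
    for i in range(T):
        running = clip_feats[i]
        for j in range(i, T):
            running = torch.maximum(running, clip_feats[j])   # pool clips i..j
            fmap[i, j] = running
    return fmap

feats = torch.randn(8, 32)                   # toy: 8 clips with 32-d features
print(build_2d_temporal_map(feats).shape)    # torch.Size([8, 8, 32])
```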
Few-shot Text Classification with Distributional Signatures
Title | Few-shot Text Classification with Distributional Signatures |
Authors | Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay |
Abstract | In this paper, we explore meta-learning for few-shot text classification. Meta-learning has shown strong performance in computer vision, where low-level patterns are transferable across learning tasks. However, directly applying this approach to text is challenging: lexical features highly informative for one task may be insignificant for another. Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns. Our model is trained within a meta-learning framework to map these signatures into attention scores, which are then used to weight the lexical representations of words. We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant margin across six benchmark datasets (20.0% on average in 1-shot classification). |
Tasks | Meta-Learning, Relation Classification, Text Classification |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06039v3 |
https://arxiv.org/pdf/1908.06039v3.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-text-classification-with |
Repo | https://github.com/YujiaBao/Distributional-Signatures |
Framework | pytorch |
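The sketch below illustrates the general mechanism with a single hand-crafted statistic: a frequency-based word score is turned into attention weights that reweight lexical embeddings. In the actual model several such statistics are fed through a learned biLSTM attention generator inside the meta-learning loop; the eps/(eps + P(w)) form and all dimensions here are illustrative assumptions.

```python
import numpy as np

def signature_weighted_representation(token_ids, embeddings, unigram_p, eps=1e-3):
    """Derive word-level attention from a frequency statistic (in the spirit of
    the paper's distributional signatures) and weight lexical embeddings."""
    p = unigram_p[token_ids]                    # P(w) for each token in the text
    scores = eps / (eps + p)                    # rare words get higher scores
    att = np.exp(scores) / np.exp(scores).sum() # softmax over tokens
    return att @ embeddings[token_ids]          # attention-weighted average embedding

# toy usage: vocabulary of 100 words with random embeddings and frequencies
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 50))
freq = rng.dirichlet(np.ones(100))
tokens = rng.integers(0, 100, size=12)
print(signature_weighted_representation(tokens, emb, freq).shape)   # (50,)
```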
A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research
Title | A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research |
Authors | Maurizio Ferrari Dacrema, Simone Boglio, Paolo Cremonesi, Dietmar Jannach |
Abstract | The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today’s research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. In order to obtain a better understanding of the actual progress, we have tried to reproduce recent results in the area of neural recommendation approaches based on collaborative filtering. The worrying outcome of the analysis of these recent works, all of which were published at prestigious scientific conferences between 2015 and 2018, is that 11 out of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on nearest-neighbor heuristics. None of the computationally complex neural methods was consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today’s research practice which, despite the many papers published on the topic, have apparently led the field to a certain level of stagnation. |
Tasks | Recommendation Systems |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07698v1 |
https://arxiv.org/pdf/1911.07698v1.pdf | |
PWC | https://paperswithcode.com/paper/a-troubling-analysis-of-reproducibility-and |
Repo | https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation |
Framework | none |
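For context, this is roughly the kind of conceptually simple baseline the study pits against neural recommenders: an item-based nearest-neighbour model on the user-item matrix. The sketch below is a bare-bones version; the authors' evaluation framework adds shrinkage, normalization, and systematic hyperparameter tuning.

```python
import numpy as np

def itemknn_scores(urm, k=50):
    """Item-based KNN on a binary user-item matrix `urm` (n_users, n_items):
    cosine item-item similarity, keep the top-k neighbours per item, then score
    items for every user by similarity-weighted sums."""
    norms = np.linalg.norm(urm, axis=0, keepdims=True) + 1e-8
    sim = (urm / norms).T @ (urm / norms)                # (n_items, n_items) cosine
    np.fill_diagonal(sim, 0.0)
    idx = np.argpartition(sim, -k, axis=1)[:, :-k]       # indices outside the top-k
    np.put_along_axis(sim, idx, 0.0, axis=1)             # keep only k neighbours per item
    return urm @ sim                                      # user-item score matrix

rng = np.random.default_rng(0)
urm = (rng.random((200, 300)) > 0.95).astype(float)       # toy interaction matrix
print(itemknn_scores(urm, k=50).shape)                    # (200, 300)
```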
Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking
Title | Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking |
Authors | Xiaolong Jiang, Peizhao Li, Yanjing Li, Xiantong Zhen |
Abstract | In this work, we present an end-to-end framework for data association in online Multiple-Object Tracking (MOT). Given detection responses, we formulate frame-by-frame data association as a Maximum Weighted Bipartite Matching problem, whose solution is learned using a neural network. The network incorporates an affinity learning module, wherein both appearance and motion cues are used to encode object feature representations and compute pairwise affinities. Employing the computed affinities as edge weights, the resulting matching problem on a bipartite graph is resolved by the optimization module, which leverages a graph neural network to adapt to the varying cardinalities of the association problem and to handle its combinatorial hardness with favorable scalability and compatibility. To facilitate effective training of the proposed tracking network, we design a multi-level matrix loss in conjunction with an assembled supervision methodology. Being trained end-to-end, all modules in the tracker can co-adapt and cooperate, resulting in improved model adaptiveness and less parameter-tuning effort. Experimental results on the MOT benchmarks demonstrate the efficacy of the proposed approach. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05315v1 |
https://arxiv.org/pdf/1907.05315v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-neural-based-end-to-end-data |
Repo | https://github.com/peizhaoli05/EDA_GNN |
Framework | pytorch |
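The underlying formulation is standard: build a track-by-detection affinity matrix and solve a maximum-weight bipartite matching. The sketch below uses a hand-crafted affinity (appearance cosine similarity mixed with box IoU) and the Hungarian algorithm from SciPy; in the paper both the affinities and the matching are produced by learned networks trained end-to-end, so this is only the classical counterpart.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(b1, b2):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-8)

def associate(track_embs, det_embs, track_boxes, det_boxes, iou_weight=0.5):
    """Frame-by-frame data association as maximum-weight bipartite matching."""
    a = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    b = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    appearance = a @ b.T                                    # appearance affinities
    motion = np.array([[iou(t, d) for d in det_boxes] for t in track_boxes])
    affinity = (1 - iou_weight) * appearance + iou_weight * motion
    rows, cols = linear_sum_assignment(-affinity)           # maximize total affinity
    return list(zip(rows, cols))

# toy usage: 3 existing tracks, 4 new detections
rng = np.random.default_rng(0)
def rand_boxes(n):                                          # well-formed [x1, y1, x2, y2]
    xy = rng.uniform(0, 80, (n, 2)); wh = rng.uniform(10, 40, (n, 2))
    return np.hstack([xy, xy + wh])
print(associate(rng.normal(size=(3, 16)), rng.normal(size=(4, 16)),
                rand_boxes(3), rand_boxes(4)))
```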
Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation
Title | Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation |
Authors | Elena V. Epure, Anis Khlif, Romain Hennequin |
Abstract | Considerable effort has been put into automatically inferring the genres of musical items. Yet the proposed solutions often rely on simplifications and fail to address the diversity and subjectivity of music genres. Accounting for these aspects, however, has many benefits for aligning knowledge sources, integrating data, and enriching musical items with tags. Here, we take a new angle on the genre study by seeking to predict the genres of musical items in a target tag system, knowing the genres assigned to them within source tag systems. We call this a translation task and identify three cases: 1) no common annotated corpus between source and target tag systems exists; 2) such a large corpus exists; 3) only a few common annotations exist. We propose corresponding solutions: a knowledge-based translation modeled as taxonomy mapping; a statistical translation modeled with maximum-likelihood logistic regression; and a hybrid translation modeled with maximum a posteriori logistic regression whose priors are given by the knowledge-based translation. In the evaluation, the solutions fit the identified cases well, and the hybrid translation is systematically the most effective with respect to multilabel classification metrics. This is a first attempt to unify genre tag systems by leveraging both representation and interpretation diversity. |
Tasks | |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08698v2 |
https://arxiv.org/pdf/1907.08698v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-knowledge-bases-and-parallel |
Repo | https://github.com/deezer/MusicGenreTranslation |
Framework | none |
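The statistical variant amounts to multilabel logistic regression from source-tag indicator vectors to target tags; the hybrid variant additionally places a prior derived from the knowledge-based taxonomy mapping on those weights. Below is a scikit-learn sketch of the statistical case on synthetic data; the tag vocabularies and the generating map are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
n_items, n_src, n_tgt = 500, 20, 8
X = (rng.random((n_items, n_src)) > 0.8).astype(int)       # source-tag indicators

# synthetic ground truth: target tag j fires if source tag 2j or 2j+1 is present
true_map = np.zeros((n_src, n_tgt), dtype=int)
for j in range(n_tgt):
    true_map[2 * j, j] = true_map[2 * j + 1, j] = 1
Y = ((X @ true_map) > 0).astype(int)                        # target-tag indicators

# maximum-likelihood "translation": one logistic regression per target tag
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)
print(clf.predict(X[:3]))                                   # predicted target tags
```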
Distributed Learning with Random Features
Title | Distributed Learning with Random Features |
Authors | Jian Li, Yong Liu, Weiping Wang |
Abstract | Distributed learning and random projections are the most common techniques in large scale nonparametric statistical learning. In this paper, we study the generalization properties of kernel ridge regression using both distributed methods and random features. Theoretical analysis shows the combination remarkably reduces computational cost while preserving the optimal generalization accuracy under standard assumptions. In a benign case, $\mathcal{O}(\sqrt{N})$ partitions and $\mathcal{O}(\sqrt{N})$ random features are sufficient to achieve $\mathcal{O}(1/N)$ learning rate, where $N$ is the labeled sample size. Further, we derive more refined results by using additional unlabeled data to enlarge the number of partitions and by generating features in a data-dependent way to reduce the number of random features. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03155v2 |
https://arxiv.org/pdf/1906.03155v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-learning-with-random-features |
Repo | https://github.com/superlj666/Distributed-Learning-with-Random-Features |
Framework | none |
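The recipe is concrete enough to sketch directly: split the data into roughly √N partitions, fit ridge regression on roughly √N random Fourier features within each partition, and average the resulting predictors. The numpy sketch below follows that recipe on synthetic data; the bandwidth and ridge parameter are illustrative choices rather than the theoretically tuned values.

```python
import numpy as np

def rff(X, W, b):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
N, d = 4000, 5
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

m = int(np.sqrt(N))          # number of partitions, O(sqrt(N)) as in the benign case
D = int(np.sqrt(N))          # number of random features, also O(sqrt(N))
lam = 1.0 / N                # ridge parameter (illustrative choice)
W = rng.normal(size=(d, D))  # Gaussian kernel with unit bandwidth
b = rng.uniform(0, 2 * np.pi, size=D)

# fit one ridge regression in feature space per partition, then average predictions
parts = np.array_split(rng.permutation(N), m)
coefs = []
for idx in parts:
    Z = rff(X[idx], W, b)
    A = Z.T @ Z + len(idx) * lam * np.eye(D)
    coefs.append(np.linalg.solve(A, Z.T @ y[idx]))

X_test = rng.normal(size=(500, d))
pred = np.mean([rff(X_test, W, b) @ c for c in coefs], axis=0)
print(pred.shape)   # (500,)
```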
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Title | Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks |
Authors | Yuanzhi Li, Colin Wei, Tengyu Ma |
Abstract | Stochastic gradient descent with a large initial learning rate is a widely adopted method for training modern neural net architectures. Although a small initial learning rate allows for faster training and better test performance initially, the large learning rate achieves better generalization soon after the learning rate is annealed. Towards explaining this phenomenon, we devise a setting in which we can prove that a two layer network trained with large initial learning rate and annealing provably generalizes better than the same network trained with a small learning rate from the start. The key insight in our analysis is that the order of learning different types of patterns is crucial: because the small learning rate model first memorizes low noise, hard-to-fit patterns, it generalizes worse on higher noise, easier-to-fit patterns than its large learning rate counterpart. This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing. Our experiments show that this causes the small learning rate model’s accuracy on unmodified images to suffer, as it relies too much on the patch early on. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04595v1 |
https://arxiv.org/pdf/1907.04595v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-explaining-the-regularization-effect |
Repo | https://github.com/cwein3/large-lr-experiments |
Framework | pytorch |
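In code, the two regimes being compared are simply two SGD schedules: a small constant learning rate versus a large initial learning rate that is annealed partway through training. A minimal PyTorch sketch of the schedules follows; the model, learning-rate values, and milestone are placeholders, and the actual training and evaluation loop is omitted.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# (a) small constant learning rate from the start
opt_small = torch.optim.SGD(model.parameters(), lr=0.01)

# (b) large initial learning rate, annealed by 10x after 60 epochs
opt_large = torch.optim.SGD(model.parameters(), lr=0.4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt_large, milestones=[60], gamma=0.1)

for epoch in range(100):
    # ... one epoch of training with opt_large.step() on each minibatch ...
    opt_large.step()       # placeholder step so optimizer/scheduler ordering is correct
    sched.step()
print(opt_large.param_groups[0]["lr"])   # 0.04 after annealing
```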
Adjusting Decision Boundary for Class Imbalanced Learning
Title | Adjusting Decision Boundary for Class Imbalanced Learning |
Authors | Byungju Kim, Junmo Kim |
Abstract | Training of deep neural networks heavily depends on the data distribution. In particular, networks easily suffer from class imbalance: a trained network recognizes the frequent classes better than the infrequent ones. To resolve this problem, existing approaches typically propose novel loss functions to obtain better feature embeddings. In this paper, we argue that drawing a better decision boundary is as important as learning better features. Motivated by this observation, we investigate how class imbalance affects the decision boundary and deteriorates performance. We also investigate the feature distributional discrepancy between training and test time. As a result, we propose a novel yet simple method for class-imbalanced learning. Despite its simplicity, our method shows outstanding performance. In particular, the experimental results show that we can significantly improve the network by scaling the weight vectors, even without an additional training process. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01857v2 |
https://arxiv.org/pdf/1912.01857v2.pdf | |
PWC | https://paperswithcode.com/paper/adjusting-decision-boundary-for-class |
Repo | https://github.com/feidfoe/AdjustBnd4Imbalance |
Framework | pytorch |
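A hedged sketch of the post-hoc idea: after training, rescale each class's weight vector in the final linear layer as a function of its training frequency, which shifts the decision boundary away from frequent classes and toward rare ones without any further training. The exponent and exact rescaling rule below are illustrative assumptions; the paper derives its own normalization and rescaling scheme from its analysis.

```python
import torch
import torch.nn as nn

def rescale_classifier(fc, class_counts, gamma=0.5):
    """Rescale each class's weight vector in the final linear layer by an
    inverse power of its training frequency (the bias is left untouched here)."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    scale = (counts.max() / counts) ** gamma       # > 1 for infrequent classes
    with torch.no_grad():
        fc.weight.mul_(scale.unsqueeze(1))         # one scale factor per class row
    return fc

# toy usage: a 10-class head "trained" on a long-tailed label distribution
head = nn.Linear(512, 10)
counts = [5000, 3000, 2000, 1000, 500, 200, 100, 50, 20, 10]
rescale_classifier(head, counts)
print(head.weight.norm(dim=1))   # rare classes now have larger weight norms
```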
RawNet: Fast End-to-End Neural Vocoder
Title | RawNet: Fast End-to-End Neural Vocoder |
Authors | Yunchao He, Haitong Zhang, Yujun Wang |
Abstract | Neural network based vocoders have recently demonstrated a powerful ability to synthesize high quality speech. These models usually generate samples by conditioning on spectral features, such as the Mel-spectrum. However, such features are extracted by a speech analysis module that includes processing based on human knowledge. In this work, we propose RawNet, a truly end-to-end neural vocoder, which uses a coder network to learn a higher-level representation of the signal and an autoregressive voder network to generate speech sample by sample. The coder and voder together act like an auto-encoder and can be jointly trained directly on the raw waveform, without any human-designed features. Experiments on Copy-Synthesis tasks show that RawNet achieves synthesized speech quality comparable to LPCNet, with a smaller model architecture and faster speech generation at inference time. |
Tasks | |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05351v1 |
http://arxiv.org/pdf/1904.05351v1.pdf | |
PWC | https://paperswithcode.com/paper/rawnet-fast-end-to-end-neural-vocoder |
Repo | https://github.com/candlewill/RawNet |
Framework | none |
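A very small PyTorch sketch of the coder/voder split: a strided convolutional coder compresses the raw waveform into frame-level codes, and an autoregressive voder predicts each sample from the previous sample plus the upsampled code, trained jointly like an auto-encoder. Channel sizes, the hop length, the regression output, and the MSE loss are all illustrative assumptions; the actual RawNet uses a more elaborate autoregressive sampler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCoder(nn.Module):
    """Strided 1-D conv coder: raw waveform -> frame-level latent code."""
    def __init__(self, latent=64, hop=256):
        super().__init__()
        self.net = nn.Conv1d(1, latent, kernel_size=2 * hop, stride=hop, padding=hop // 2)

    def forward(self, wav):                        # wav: (B, T)
        return self.net(wav.unsqueeze(1))          # (B, latent, T // hop)

class TinyVoder(nn.Module):
    """Autoregressive voder: previous sample + upsampled code -> next sample."""
    def __init__(self, latent=64, hidden=128, hop=256):
        super().__init__()
        self.hop = hop
        self.rnn = nn.GRU(1 + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, wav, code):                  # teacher forcing during training
        cond = code.repeat_interleave(self.hop, dim=2).transpose(1, 2)   # (B, T, latent)
        prev = F.pad(wav[:, :-1], (1, 0)).unsqueeze(-1)                  # x_{t-1}
        h, _ = self.rnn(torch.cat([prev, cond[:, : prev.size(1)]], dim=-1))
        return self.out(h).squeeze(-1)             # predicted x_t, shape (B, T)

wav = torch.randn(2, 4096)                         # toy batch of raw audio
coder, voder = TinyCoder(), TinyVoder()
code = coder(wav)
recon = voder(wav, code)
loss = F.mse_loss(recon, wav)                      # joint auto-encoder-style training
print(code.shape, recon.shape, loss.item())
```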