Paper Group NANR 129
At Your Fingertips: Automatic Piano Fingering Detection
Title | At Your Fingertips: Automatic Piano Fingering Detection |
Authors | Anonymous |
Abstract | Automatic piano fingering is a hard task that computers can learn from data. As data collection is hard and expensive, we propose to automate this process by automatically extracting fingerings from public videos and MIDI files using computer-vision techniques. Running this process on 90 videos results in the largest dataset for piano fingering to date, with more than 150K notes. We show that running a previously proposed model for automatic piano fingering on our dataset and then fine-tuning it on manually labeled piano fingering data achieves state-of-the-art results. In addition to the fingering extraction method, we also introduce a novel method for transferring deep-learning computer-vision models to out-of-domain data, by fine-tuning them on out-of-domain augmentations proposed by a Generative Adversarial Network (GAN). For demonstration, we anonymously release a visualization of the output of our process for a single video at https://youtu.be/Gfs1UWQhr5Q |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1MOqeHYvB |
PDF | https://openreview.net/pdf?id=H1MOqeHYvB |
PWC | https://paperswithcode.com/paper/at-your-fingertips-automatic-piano-fingering |
Repo | |
Framework | |
Rigging the Lottery: Making All Tickets Winners
Title | Rigging the Lottery: Making All Tickets Winners |
Authors | Anonymous |
Abstract | Sparse neural networks have been shown to yield computationally efficient networks with improved inference times. There is a large body of work on training dense networks to yield sparse networks for inference (Molchanov et al., 2017; Zhu & Gupta, 2018; Louizos et al., 2017; Li et al., 2016; Guo et al., 2016). This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the network during training using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset. Finally, we provide some insights into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg7vA4tPB |
PDF | https://openreview.net/pdf?id=ryg7vA4tPB |
PWC | https://paperswithcode.com/paper/rigging-the-lottery-making-all-tickets |
Repo | |
Framework | |
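No code is linked for this entry, but the drop-and-grow topology update the abstract describes (drop the lowest-magnitude active weights, grow inactive connections with the largest gradient magnitude, keeping the parameter count fixed) can be sketched as follows. This is a minimal NumPy illustration under assumed details, not the authors' implementation; the drop fraction, the exclusion of just-dropped connections from regrowth, and the zero initialization of grown weights are assumptions.

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, drop_fraction=0.3):
    """One drop-and-grow topology update at a fixed parameter count (sketch).

    weights:    dense weight array; only entries where mask == 1 are active
    mask:       binary array holding the current sparse topology
    dense_grad: gradient w.r.t. the dense weights (computed only occasionally)
    """
    flat_w, flat_m, flat_g = weights.ravel().copy(), mask.ravel().copy(), dense_grad.ravel()
    n_swap = int(drop_fraction * flat_m.sum())

    # Drop: deactivate the active connections with the smallest weight magnitude.
    active = np.flatnonzero(flat_m)
    dropped = active[np.argsort(np.abs(flat_w[active]))[:n_swap]]
    flat_m[dropped] = 0

    # Grow: activate the inactive connections with the largest gradient magnitude
    # (excluding the ones just dropped), initializing them at zero.
    candidates = np.setdiff1d(np.flatnonzero(flat_m == 0), dropped)
    grown = candidates[np.argsort(-np.abs(flat_g[candidates]))[:n_swap]]
    flat_m[grown] = 1
    flat_w[grown] = 0.0

    return flat_w.reshape(weights.shape), flat_m.reshape(mask.shape)

# Toy usage: a ~10%-dense layer whose topology is updated from a random "gradient".
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
m = (rng.random((64, 64)) < 0.1).astype(float)
g = rng.normal(size=(64, 64))
w2, m2 = rigl_update(w * m, m, g)
print(m.sum(), m2.sum())   # parameter count stays fixed
```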
A Uniform Generalization Error Bound for Generative Adversarial Networks
Title | A Uniform Generalization Error Bound for Generative Adversarial Networks |
Authors | Anonymous |
Abstract | This paper focuses on a theoretical investigation of the unsupervised generalization theory of generative adversarial networks (GANs). We first formulate a more reasonable definition of generalization error and generalization bounds for GANs. On top of that, we establish a bound on the generalization error with a fixed generator in a general weight-normalization context. Then, we obtain a width-independent bound by applying $\ell_{p,q}$ and spectral norm weight normalization. To better understand GANs as unsupervised models, we establish a generalization bound that holds uniformly with respect to the choice of generators. Hence, we can explain how the complexity of discriminators and generators contributes to the generalization error. For $\ell_{p,q}$ and spectral weight normalization, we provide explicit guidance on how to design parameters to train robust generators. Our numerical simulations also verify that our generalization bound is reasonable. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skek-TVYvr |
PDF | https://openreview.net/pdf?id=Skek-TVYvr |
PWC | https://paperswithcode.com/paper/a-uniform-generalization-error-bound-for |
Repo | |
Framework | |
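For readers unfamiliar with what "generalization error" means for a GAN, a commonly used formulation in this literature (which bounds of the kind described above refine) compares the discrepancy achieved by the trained generator against the best discrepancy achievable by the generator class, with discrepancy measured through the discriminator class. The notation below is a generic sketch of that setup, not the paper's exact definitions.

```latex
% Generic form of a GAN generalization error (sketch, not the paper's exact definition).
% \mu: data distribution, \hat{\mu}_n: its n-sample empirical version,
% \nu_g: distribution induced by generator g, \mathcal{F}: discriminator class.
\[
  d_{\mathcal{F}}(\mu,\nu) \;=\; \sup_{f\in\mathcal{F}}
      \Big(\mathbb{E}_{x\sim\mu}[f(x)] - \mathbb{E}_{x\sim\nu}[f(x)]\Big),
  \qquad
  \varepsilon_{\mathrm{gen}}(\hat{g}) \;=\;
      d_{\mathcal{F}}\big(\mu,\nu_{\hat{g}}\big)
      \;-\; \inf_{g\in\mathcal{G}} d_{\mathcal{F}}\big(\mu,\nu_{g}\big),
\]
% where \hat{g} is trained against \hat{\mu}_n; uniform bounds control
% \sup_{g} |d_{\mathcal{F}}(\mu,\nu_g) - d_{\mathcal{F}}(\hat{\mu}_n,\nu_g)|
% in terms of the discriminator-class complexity and n.
```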
Set Functions for Time Series
Title | Set Functions for Time Series |
Authors | Anonymous |
Abstract | Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to the irregularly sampled and asynchronous time series that occur in many real-world datasets, such as healthcare applications. This paper proposes a novel framework for classifying irregularly sampled time series with unaligned measurements, focusing on high scalability and data efficiency. Our method, SeFT (Set Functions for Time Series), is based on recent advances in differentiable set function learning, is extremely parallelizable, and scales well to very large datasets and online monitoring scenarios. We extensively compare our method to competitors on multiple healthcare time series datasets and show that it performs competitively whilst significantly reducing runtime. |
Tasks | Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxCrerKvS |
PDF | https://openreview.net/pdf?id=ByxCrerKvS |
PWC | https://paperswithcode.com/paper/set-functions-for-time-series |
Repo | |
Framework | |
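As a rough illustration of the set-function view above, each observation of an irregularly sampled, unaligned series can be treated as a (time, value, modality) tuple, embedded independently, and aggregated with a permutation-invariant operation before classification. The sketch below is a minimal NumPy version with random weights and a plain mean aggregation; the embedding sizes are arbitrary, and the paper describes a richer time encoding and a weighted aggregation that this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_observation(t, value, modality, W, n_modalities):
    """Embed a single (time, value, modality) observation (sketch)."""
    one_hot = np.zeros(n_modalities)
    one_hot[modality] = 1.0
    x = np.concatenate(([t, value], one_hot))   # raw observation features
    return np.maximum(W @ x, 0.0)               # one ReLU layer

def seft_like_encoding(observations, W, n_modalities):
    """Permutation-invariant encoding of an irregular, unaligned series."""
    embedded = [embed_observation(t, v, m, W, n_modalities) for t, v, m in observations]
    return np.mean(embedded, axis=0)            # mean aggregation over the set

# Toy series: (time, value, modality-id) tuples, e.g. sparse vitals from two sensors.
series = [(0.0, 36.6, 0), (0.7, 80.0, 1), (2.3, 37.1, 0)]
n_modalities = 2
d_in, d_hidden = 2 + n_modalities, 16
W = rng.normal(size=(d_hidden, d_in))
print(seft_like_encoding(series, W, n_modalities).shape)   # (16,)
```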
The fairness-accuracy landscape of neural classifiers
Title | The fairness-accuracy landscape of neural classifiers |
Authors | Anonymous |
Abstract | That machine learning algorithms can demonstrate bias is well documented by now. This work confronts the challenge of bias mitigation in feedforward fully-connected neural nets through the lens of causal inference and multiobjective optimisation. Regarding the former, a new causal notion of fairness is introduced that is particularly suited to giving a nuanced treatment of datasets collected under unfair practices. In particular, special attention is paid to subjects whose covariates could appear with substantial probability under either value of the sensitive attribute. Next, recognising that fairness and accuracy are competing objectives, the proposed methodology uses techniques from multiobjective optimisation to ascertain the fairness-accuracy landscape of a neural net classifier. Experimental results suggest that the proposed method produces neural net classifiers that distribute evenly across the Pareto front of the fairness-accuracy space and is more efficient at finding non-dominated points than an adversarial approach. |
Tasks | Causal Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1e3g1rtwB |
PDF | https://openreview.net/pdf?id=S1e3g1rtwB |
PWC | https://paperswithcode.com/paper/the-fairness-accuracy-landscape-of-neural |
Repo | |
Framework | |
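The multiobjective framing above can be made concrete with a simple scalarization: sweep a trade-off weight between an accuracy loss and a fairness penalty, fit one classifier per weight, and read off the resulting fairness-accuracy front. The sketch below uses a demographic-parity gap as a stand-in fairness term (the paper's notion is causal and more nuanced) and plain logistic regression instead of a neural net; it is illustrative only, not the paper's method for finding non-dominated points.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: features X, labels y, binary sensitive attribute a.
n, d = 400, 5
X = rng.normal(size=(n, d))
a = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.8 * a + 0.3 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, lam):
    """Scalarized trade-off: cross-entropy + lam * squared demographic-parity gap."""
    p = sigmoid(X @ w)
    ce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    gap = p[a == 1].mean() - p[a == 0].mean()
    return ce + lam * gap ** 2

# Sweep the trade-off weight to trace an (accuracy, fairness) front.
for lam in [0.0, 1.0, 10.0]:
    w = minimize(objective, np.zeros(d), args=(lam,), method="L-BFGS-B").x
    p = sigmoid(X @ w)
    acc = ((p > 0.5) == y).mean()
    gap = abs(p[a == 1].mean() - p[a == 0].mean())
    print(f"lam={lam:5.1f}  accuracy={acc:.3f}  parity gap={gap:.3f}")
```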
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Title | A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation |
Authors | Anonymous |
Abstract | Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy at an $O(1/T)$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption. |
Tasks | Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xxAJHFwS |
PDF | https://openreview.net/pdf?id=B1xxAJHFwS |
PWC | https://paperswithcode.com/paper/a-finite-time-analysis-of-q-learning-with |
Repo | |
Framework | |
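To make the object of the analysis concrete, a bare-bones version of Q-learning with a two-layer ReLU network as the action-value approximator looks like the sketch below: semi-gradient TD updates on transitions from a toy MDP. The environment, network width, exploration rate, and step size are arbitrary illustrations; the paper's analysis concerns overparameterized networks and Markovian (non-i.i.d.) data, which this toy loop does not attempt to reproduce faithfully.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-state, 2-action MDP used only for illustration.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]

def features(s, a):
    """One-hot (state, action) features; real uses would be richer."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

# Two-layer ReLU network: Q(s, a) = w2 . relu(W1 phi(s, a)).
width = 32
W1 = rng.normal(scale=1.0 / np.sqrt(width), size=(width, n_states * n_actions))
w2 = rng.normal(scale=1.0 / np.sqrt(width), size=width)

def q_value(s, a):
    return w2 @ np.maximum(W1 @ features(s, a), 0.0)

lr, s = 0.05, 0
for t in range(5000):
    # Epsilon-greedy behaviour policy over the current Q estimate.
    a = rng.integers(n_actions) if rng.random() < 0.1 else \
        int(np.argmax([q_value(s, b) for b in range(n_actions)]))
    s_next = int(rng.choice(n_states, p=P[s, a]))
    target = R[s, a] + gamma * max(q_value(s_next, b) for b in range(n_actions))

    # Semi-gradient TD update: only Q(s, a) is differentiated, not the target.
    phi = features(s, a)
    h = W1 @ phi
    relu_h = np.maximum(h, 0.0)
    delta = (w2 @ relu_h) - target
    grad_w2 = delta * relu_h
    grad_W1 = delta * np.outer(w2 * (h > 0), phi)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    s = s_next

print([[round(q_value(s, a), 2) for a in range(n_actions)] for s in range(n_states)])
```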
Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics
Title | Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics |
Authors | Sungyong Seo*, Chuizheng Meng*, Yan Liu |
Abstract | Sparsely available data points cause numerical errors in finite differences, which hinder modeling the dynamics of physical systems. The discretization error becomes even larger when the sparse data are irregularly distributed, so that the data are defined on an unstructured grid, making it hard to build deep learning models that handle physics-governed observations on such grids. In this paper, we propose a novel architecture, Physics-aware Difference Graph Networks (PA-DGN), that exploits neighboring information to learn finite differences inspired by physics equations. PA-DGN further leverages data-driven end-to-end learning to discover underlying dynamical relations between the spatial and temporal differences in given observations. We demonstrate the superiority of PA-DGN in the approximation of directional derivatives and the prediction of graph signals on synthetic data and on real-world climate observations from weather stations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gelyrtwH |
PDF | https://openreview.net/pdf?id=r1gelyrtwH |
PWC | https://paperswithcode.com/paper/physics-aware-difference-graph-networks-for |
Repo | |
Framework | |
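The core building block described above, learnable spatial differences on a graph of sparsely placed sensors, can be illustrated with a simple layer that computes signal differences along edges, lets learned parameters modulate them, and aggregates them per node. This is a schematic NumPy version, not the authors' PA-DGN architecture; the modulation form and the mean aggregation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_spatial_difference(x, edges, w_edge, w_node):
    """Learnable graph difference layer (sketch).

    x:       node signal values, shape (n_nodes,)
    edges:   list of directed (i, j) pairs
    w_edge:  learned per-edge modulation of the forward difference x[j] - x[i]
    w_node:  learned per-edge modulation of the source-node value itself
    Returns per-node aggregated "gradient-like" features, shape (n_nodes,).
    """
    out = np.zeros_like(x)
    deg = np.zeros_like(x)
    for k, (i, j) in enumerate(edges):
        diff = w_edge[k] * (x[j] - x[i]) + w_node[k] * x[i]   # modulated finite difference
        out[i] += diff
        deg[i] += 1.0
    return out / np.maximum(deg, 1.0)                          # mean over outgoing edges

# Toy sparse sensor graph: 4 irregularly placed nodes, a few directed edges.
x = np.array([0.0, 1.0, 4.0, 9.0])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w_edge = rng.normal(size=len(edges))
w_node = rng.normal(size=len(edges))
print(graph_spatial_difference(x, edges, w_edge, w_node))
```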
Learning to Prove Theorems by Learning to Generate Theorems
Title | Learning to Prove Theorems by Learning to Generate Theorems |
Authors | Anonymous |
Abstract | We consider the task of automated theorem proving, a key AI task. Deep learning has shown promise for training theorem provers, but there are limited human-written theorems and proofs available for supervised learning. To address this limitation, we propose to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. Experiments on real-world tasks demonstrate that synthetic data from our approach significantly improves the theorem prover and advances the state of the art of automated theorem proving in Metamath. |
Tasks | Automated Theorem Proving |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxiqxSYPB |
PDF | https://openreview.net/pdf?id=BJxiqxSYPB |
PWC | https://paperswithcode.com/paper/learning-to-prove-theorems-by-learning-to |
Repo | |
Framework | |
Improving Multi-Manifold GANs with a Learned Noise Prior
Title | Improving Multi-Manifold GANs with a Learned Noise Prior |
Authors | Anonymous |
Abstract | Generative adversarial networks (GANs) learn to map samples from a noise distribution to a chosen data distribution. Recent work has demonstrated that GANs are consequently sensitive to, and limited by, the shape of the noise distribution. For example, a single generator struggles to map continuous noise (e.g. a uniform distribution) to discontinuous output (e.g. separate Gaussians) or complex output (e.g. intersecting parabolas). We address this problem by learning to generate from multiple models, such that the generator’s output is the combination of several distinct networks. We contribute a novel formulation of multi-generator models in which we learn a prior over the generators conditioned on the noise, parameterized by a neural network. Thus, this network not only learns the optimal rate at which to sample from each generator but also optimally shapes the noise received by each generator. The resulting Noise Prior GAN (NPGAN) achieves expressivity and flexibility that surpass both single-generator models and previous multi-generator models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlISCEKvB |
PDF | https://openreview.net/pdf?id=HJlISCEKvB |
PWC | https://paperswithcode.com/paper/improving-multi-manifold-gans-with-a-learned |
Repo | |
Framework | |
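The noise-conditioned prior over generators described above can be sketched as a small gating network that maps each noise sample to a distribution over generators, with the sample routed to (or softly mixed across) them. The sketch below uses random linear maps as stand-in generators and a soft mixture for simplicity; the actual NPGAN discriminator, training objective, and parameterization are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_generators, z_dim, x_dim = 3, 8, 2

# Each "generator" is a random linear map here, standing in for a neural generator.
gen_weights = [rng.normal(size=(x_dim, z_dim)) for _ in range(n_generators)]
# The learned prior: a linear gating network over the noise (a stand-in for an MLP).
gate_W = rng.normal(size=(n_generators, z_dim))

def npgan_like_sample(z):
    """Route a noise sample through a noise-conditioned mixture of generators."""
    probs = softmax(gate_W @ z)                       # learned prior p(generator | z)
    outputs = np.stack([W @ z for W in gen_weights])  # each generator's proposal
    return probs @ outputs                            # soft mixture (hard routing also possible)

z = rng.normal(size=z_dim)
print(npgan_like_sample(z))
```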
Four Things Everyone Should Know to Improve Batch Normalization
Title | Four Things Everyone Should Know to Improve Batch Normalization |
Authors | Anonymous |
Abstract | A key component of most neural network architectures is the use of normalization layers, such as Batch Normalization. Despite its common use and large utility in optimizing deep architectures that are otherwise intractable, it has been challenging both to generically improve upon Batch Normalization and to understand the circumstances that lend themselves to other enhancements. In this paper, we identify four improvements to the generic form of Batch Normalization and the circumstances under which they work, yielding performance gains across all batch sizes while requiring no additional computation during training. These contributions include proposing a method for reasoning about the current example in inference normalization statistics, fixing a training vs. inference discrepancy; recognizing and validating the powerful regularization effect of Ghost Batch Normalization for small and medium batch sizes; examining the effect of weight decay regularization on the scaling and shifting parameters gamma and beta; and identifying a new normalization algorithm for very small batch sizes by combining the strengths of Batch and Group Normalization. We validate our results empirically on five datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers102, and ImageNet. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJx8HANFDH |
PDF | https://openreview.net/pdf?id=HJx8HANFDH |
PWC | https://paperswithcode.com/paper/four-things-everyone-should-know-to-improve-1 |
Repo | |
Framework | |
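Of the four contributions, Ghost Batch Normalization is the easiest to show in isolation: the batch is split into smaller "ghost" batches and each is normalized with its own statistics, which acts as a regularizer for small and medium batch sizes. The sketch below is a training-mode forward pass only; running-statistics tracking, the inference-time variants, and the gamma/beta weight-decay considerations discussed in the abstract are omitted.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, gamma, beta, eps=1e-5):
    """Training-mode Ghost Batch Normalization over a (batch, features) array (sketch)."""
    out = np.empty_like(x)
    for start in range(0, x.shape[0], ghost_size):
        chunk = x[start:start + ghost_size]
        mu = chunk.mean(axis=0)                 # statistics from the ghost batch only
        var = chunk.var(axis=0)
        out[start:start + ghost_size] = (chunk - mu) / np.sqrt(var + eps)
    return gamma * out + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 16))
y = ghost_batch_norm(x, ghost_size=32, gamma=np.ones(16), beta=np.zeros(16))
print(y[:32].mean(axis=0).round(3))             # ~0 within each ghost batch
```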
One-way prototypical networks
Title | One-way prototypical networks |
Authors | Anonymous |
Abstract | Few-shot models have become a popular topic of research in recent years. They offer the possibility to determine class membership for unseen examples using just a handful of examples per class. Such models are trained on a wide range of classes and their respective examples, learning a decision metric in the process. Types of few-shot models include matching networks and prototypical networks. We show a new way of training prototypical few-shot models for just a single class. These models have the ability to predict the likelihood of an unseen query belonging to a group of examples without any given counterexamples. The difficulty here lies in the fact that no relative distance to other classes can be calculated via softmax. We solve this problem by introducing a “null class” centered around zero and enforcing centering with batch normalization. Trained on the commonly used Omniglot dataset, we obtain a classification accuracy of .98 on the matched test set and of .8 on unmatched MNIST data. On the more complex MiniImageNet dataset, test accuracy is .8. In addition, we propose a novel Gaussian layer for distance calculation in a prototypical network, which takes the support examples’ distribution rather than just their centroid into account. This extension shows promising results when a higher number of support examples is available. |
Tasks | Omniglot |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgWbpEtPr |
PDF | https://openreview.net/pdf?id=BJgWbpEtPr |
PWC | https://paperswithcode.com/paper/one-way-prototypical-networks |
Repo | |
Framework | |
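The "null class centered around zero" idea can be sketched directly: compute the prototype of the single support class, keep a second prototype fixed at the origin, and apply a softmax over negative squared distances to the two prototypes to score a query. The embeddings below are random placeholders for a trained, batch-norm-centered encoder, and the distance-based scoring is a generic prototypical-network form rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_way_proto_prob(query_emb, support_embs):
    """P(query belongs to the support class) with a null prototype at zero (sketch)."""
    prototype = support_embs.mean(axis=0)          # centroid of the single support class
    null_proto = np.zeros_like(prototype)          # "null class" centered at the origin
    d_class = np.sum((query_emb - prototype) ** 2)
    d_null = np.sum((query_emb - null_proto) ** 2)
    logits = np.array([-d_class, -d_null])         # softmax over negative distances
    e = np.exp(logits - logits.max())
    return (e / e.sum())[0]

# Placeholder embeddings; a real model would produce these with a BN-centered encoder.
support = rng.normal(loc=1.0, size=(5, 64))
query_in = rng.normal(loc=1.0, size=64)    # drawn like the support class
query_out = rng.normal(loc=0.0, size=64)   # drawn away from the support class
print(one_way_proto_prob(query_in, support), one_way_proto_prob(query_out, support))
```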
Supervised learning with incomplete data via sparse representations
Title | Supervised learning with incomplete data via sparse representations |
Authors | Anonymous |
Abstract | This paper addresses the problem of training a classifier on incomplete data and applying it to a complete or incomplete test dataset. A supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a limited number of observed entries, assuming sparse representations of data vectors on an unknown dictionary. The proposed method simultaneously learns the classifier, the dictionary and the corresponding sparse representations of each input data sample. A theoretical analysis is also provided comparing this method with the standard imputation approach, which consists of performing data completion followed by training the classifier on the reconstructions. The limitations of this “sequential” approach are identified, and a description of how the proposed “simultaneous” method can overcome the problem of indiscernible observations is provided. Additionally, it is shown that, if it is possible to train a classifier on incomplete observations so that its reconstructions are well separated by a hyperplane, then the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results are presented on synthetic and well-known reference datasets that demonstrate the effectiveness of the proposed method compared to traditional data imputation methods. |
Tasks | Imputation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syx_f6EFPr |
PDF | https://openreview.net/pdf?id=Syx_f6EFPr |
PWC | https://paperswithcode.com/paper/supervised-learning-with-incomplete-data-via |
Repo | |
Framework | |
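The "simultaneous" objective described above, learning a dictionary, per-sample sparse codes, and a classifier jointly from only the observed entries, can be written as a single loss. The sketch below only evaluates such a loss for given parameters (logistic classifier on the sparse codes, masked reconstruction, l1 sparsity); the weighting, the optimization procedure, and the theoretical comparison with sequential imputation are assumptions or omissions, not the paper's exact formulation.

```python
import numpy as np

def simultaneous_loss(D, codes, w, X, mask, y, lam_sparse=0.1, lam_rec=1.0):
    """Joint objective over dictionary D, sparse codes, and classifier w (sketch).

    X:    data matrix with arbitrary values at unobserved entries, shape (n, d)
    mask: 1 where an entry of X was observed, 0 otherwise
    y:    binary labels in {0, 1}
    """
    # Reconstruction error penalized only on the observed entries.
    rec = np.sum(mask * (X - codes @ D.T) ** 2) / mask.sum()
    # Logistic classification loss on the sparse codes.
    p = 1.0 / (1.0 + np.exp(-(codes @ w)))
    ce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # l1 sparsity of the codes.
    sparsity = np.mean(np.abs(codes))
    return ce + lam_rec * rec + lam_sparse * sparsity

rng = np.random.default_rng(0)
n, d, k = 50, 20, 8
X = rng.normal(size=(n, d))
mask = (rng.random((n, d)) < 0.6).astype(float)     # ~60% of entries observed
y = rng.integers(0, 2, size=n)
D, codes, w = rng.normal(size=(d, k)), rng.normal(size=(n, k)), rng.normal(size=k)
print(simultaneous_loss(D, codes, w, X, mask, y))
```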
Robust Natural Language Representation Learning for Natural Language Inference by Projecting Superficial Words out
Title | Robust Natural Language Representation Learning for Natural Language Inference by Projecting Superficial Words out |
Authors | Anonymous |
Abstract | In natural language inference, the semantics of some words do not affect the inference. Such information is considered superficial and leads to overfitting. How can we represent and discard such superficial information? In this paper, we use first-order logic (FOL), a classic meaning-representation formalism, to explain what information is superficial for a given sentence pair. This explanation also suggests two inductive biases based on its properties. We propose a neural network-based approach that exploits these two inductive biases and obtain substantial improvements in extensive experiments. |
Tasks | Natural Language Inference, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxQzlHFPr |
PDF | https://openreview.net/pdf?id=HkxQzlHFPr |
PWC | https://paperswithcode.com/paper/robust-natural-language-representation |
Repo | |
Framework | |
Learning Calibratable Policies using Programmatic Style-Consistency
Title | Learning Calibratable Policies using Programmatic Style-Consistency |
Authors | Anonymous |
Abstract | We study the important and challenging problem of controllable generation of long-term sequential behaviors. Solutions to this problem would impact many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are significant challenges that are unique to or exacerbated by generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be accurately calibrated to generate interesting behavior styles in both domains. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byx5R0NKPr |
PDF | https://openreview.net/pdf?id=Byx5R0NKPr |
PWC | https://paperswithcode.com/paper/learning-calibratable-policies-using-1 |
Repo | |
Framework | |
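The style-consistency idea above can be summarized as: apply a programmatic labeling function to a generated trajectory and penalize disagreement with the style label the policy was conditioned on. A toy version is sketched below; the speed-based labeling function, the trajectory format, and the 0/1 disagreement loss are illustrative stand-ins, not the paper's objective or its integration with imitation learning.

```python
import numpy as np

def speed_label(trajectory, threshold=1.0):
    """Programmatic labeling function: 1 = 'aggressive' (fast), 0 = 'passive' (slow)."""
    speeds = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
    return int(speeds.mean() > threshold)

def style_consistency_loss(trajectories, target_styles):
    """Fraction of generated trajectories whose programmatic label disagrees
    with the style label the policy was asked to produce (sketch)."""
    labels = np.array([speed_label(traj) for traj in trajectories])
    return np.mean(labels != np.asarray(target_styles))

rng = np.random.default_rng(0)
fast = np.cumsum(rng.normal(scale=2.0, size=(20, 2)), axis=0)   # stand-in "aggressive" rollout
slow = np.cumsum(rng.normal(scale=0.2, size=(20, 2)), axis=0)   # stand-in "passive" rollout
print(style_consistency_loss([fast, slow], target_styles=[1, 0]))
```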
Neural Video Encoding
Title | Neural Video Encoding |
Authors | Anonymous |
Abstract | Deep neural networks have had unprecedented success in computer vision, natural language processing, and speech, largely due to the ability to search for suitable task algorithms via differentiable programming. In this paper, we borrow ideas from Kolmogorov complexity theory and normalizing flows to explore the possibility of finding arbitrary algorithms that represent data, in particular algorithms that encode sequences of video frames. Ultimately, we demonstrate neural video encoding using convolutional neural networks that transform autoregressive noise processes, and show that this method has surprising cryptographic analogues for information security. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byeq_xHtwS |
PDF | https://openreview.net/pdf?id=Byeq_xHtwS |
PWC | https://paperswithcode.com/paper/neural-video-encoding |
Repo | |
Framework | |