Paper Group NANR 129
At Your Fingertips: Automatic Piano Fingering Detection
Title | At Your Fingertips: Automatic Piano Fingering Detection |
Authors | Anonymous |
Abstract | Automatic piano fingering is a hard task that computers can learn from data. As data collection is hard and expensive, we propose to automate this process by automatically extracting fingerings from public videos and MIDI files using computer-vision techniques. Running this process on 90 videos results in the largest dataset for piano fingering to date, with more than 150K notes. We show that running a previously proposed model for automatic piano fingering on our dataset and then fine-tuning it on manually labeled piano fingering data achieves state-of-the-art results. In addition to the fingering extraction method, we also introduce a novel method for transferring deep-learning computer-vision models to out-of-domain data, by fine-tuning them on out-of-domain augmentations proposed by a Generative Adversarial Network (GAN). For demonstration, we anonymously release a visualization of the output of our process for a single video at https://youtu.be/Gfs1UWQhr5Q |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1MOqeHYvB |
PDF | https://openreview.net/pdf?id=H1MOqeHYvB |
PWC | https://paperswithcode.com/paper/at-your-fingertips-automatic-piano-fingering |
Repo | |
Framework | |
Rigging the Lottery: Making All Tickets Winners
Title | Rigging the Lottery: Making All Tickets Winners |
Authors | Anonymous |
Abstract | Sparse neural networks have been shown to yield computationally efficient networks with improved inference times. There is a large body of work on training dense networks to yield sparse networks for inference (Molchanov et al., 2017; Zhu & Gupta, 2018; Louizos et al., 2017; Li et al., 2016; Guo et al., 2016). This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the network during training using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset. Finally, we provide some insights into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryg7vA4tPB |
PDF | https://openreview.net/pdf?id=ryg7vA4tPB |
PWC | https://paperswithcode.com/paper/rigging-the-lottery-making-all-tickets |
Repo | |
Framework | |
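No code is linked for this entry, but the drop-and-grow topology update the abstract describes (drop the lowest-magnitude active weights, grow inactive connections with the largest gradient magnitude, keeping the parameter count fixed) can be sketched as follows. This is a minimal NumPy illustration under assumed details, not the authors' implementation; the drop fraction, the exclusion of just-dropped connections from regrowth, and the zero initialization of grown weights are assumptions.

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, drop_fraction=0.3):
    """One drop-and-grow topology update at a fixed parameter count (sketch).

    weights:    dense weight array; only entries where mask == 1 are active
    mask:       binary array holding the current sparse topology
    dense_grad: gradient w.r.t. the dense weights (computed only occasionally)
    """
    flat_w, flat_m, flat_g = weights.ravel().copy(), mask.ravel().copy(), dense_grad.ravel()
    n_swap = int(drop_fraction * flat_m.sum())

    # Drop: deactivate the active connections with the smallest weight magnitude.
    active = np.flatnonzero(flat_m)
    dropped = active[np.argsort(np.abs(flat_w[active]))[:n_swap]]
    flat_m[dropped] = 0

    # Grow: activate the inactive connections with the largest gradient magnitude
    # (excluding the ones just dropped), initializing them at zero.
    candidates = np.setdiff1d(np.flatnonzero(flat_m == 0), dropped)
    grown = candidates[np.argsort(-np.abs(flat_g[candidates]))[:n_swap]]
    flat_m[grown] = 1
    flat_w[grown] = 0.0

    return flat_w.reshape(weights.shape), flat_m.reshape(mask.shape)

# Toy usage: a ~10%-dense layer whose topology is updated from a random "gradient".
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
m = (rng.random((64, 64)) < 0.1).astype(float)
g = rng.normal(size=(64, 64))
w2, m2 = rigl_update(w * m, m, g)
print(m.sum(), m2.sum())   # parameter count stays fixed
```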
A Uniform Generalization Error Bound for Generative Adversarial Networks
Title | A Uniform Generalization Error Bound for Generative Adversarial Networks |
Authors | Anonymous |
Abstract | This paper focuses on a theoretical investigation of the unsupervised generalization theory of generative adversarial networks (GANs). We first formulate a more reasonable definition of generalization error and generalization bounds for GANs. On top of that, we establish a bound on the generalization error with a fixed generator in a general weight-normalization context. Then, we obtain a width-independent bound by applying $\ell_{p,q}$ and spectral norm weight normalization. To better understand GANs as unsupervised models, we establish a generalization bound that holds uniformly with respect to the choice of generators. Hence, we can explain how the complexity of discriminators and generators contributes to the generalization error. For $\ell_{p,q}$ and spectral weight normalization, we provide explicit guidance on how to design parameters to train robust generators. Our numerical simulations also verify that our generalization bound is reasonable. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skek-TVYvr |
PDF | https://openreview.net/pdf?id=Skek-TVYvr |
PWC | https://paperswithcode.com/paper/a-uniform-generalization-error-bound-for |
Repo | |
Framework | |
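For readers unfamiliar with what "generalization error" means for a GAN, a commonly used formulation in this literature (which bounds of the kind described above refine) compares the discrepancy achieved by the trained generator against the best discrepancy achievable by the generator class, with discrepancy measured through the discriminator class. The notation below is a generic sketch of that setup, not the paper's exact definitions.

```latex
% Generic form of a GAN generalization error (sketch, not the paper's exact definition).
% \mu: data distribution, \hat{\mu}_n: its n-sample empirical version,
% \nu_g: distribution induced by generator g, \mathcal{F}: discriminator class.
\[
  d_{\mathcal{F}}(\mu,\nu) \;=\; \sup_{f\in\mathcal{F}}
      \Big(\mathbb{E}_{x\sim\mu}[f(x)] - \mathbb{E}_{x\sim\nu}[f(x)]\Big),
  \qquad
  \varepsilon_{\mathrm{gen}}(\hat{g}) \;=\;
      d_{\mathcal{F}}\big(\mu,\nu_{\hat{g}}\big)
      \;-\; \inf_{g\in\mathcal{G}} d_{\mathcal{F}}\big(\mu,\nu_{g}\big),
\]
% where \hat{g} is trained against \hat{\mu}_n; uniform bounds control
% \sup_{g} |d_{\mathcal{F}}(\mu,\nu_g) - d_{\mathcal{F}}(\hat{\mu}_n,\nu_g)|
% in terms of the discriminator-class complexity and n.
```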
Set Functions for Time Series
Title | Set Functions for Time Series |
Authors | Anonymous |
Abstract | Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to the irregularly sampled and asynchronous time series that occur in many real-world datasets, such as healthcare applications. This paper proposes a novel framework for classifying irregularly sampled time series with unaligned measurements, focusing on high scalability and data efficiency. Our method, SeFT (Set Functions for Time Series), is based on recent advances in differentiable set function learning, is extremely parallelizable, and scales well to very large datasets and online monitoring scenarios. We extensively compare our method to competitors on multiple healthcare time series datasets and show that it performs competitively whilst significantly reducing runtime. |
Tasks | Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxCrerKvS |
PDF | https://openreview.net/pdf?id=ByxCrerKvS |
PWC | https://paperswithcode.com/paper/set-functions-for-time-series |
Repo | |
Framework | |
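As a rough illustration of the set-function view above, each observation of an irregularly sampled, unaligned series can be treated as a (time, value, modality) tuple, embedded independently, and aggregated with a permutation-invariant operation before classification. The sketch below is a minimal NumPy version with random weights and a plain mean aggregation; the embedding sizes are arbitrary, and the paper describes a richer time encoding and a weighted aggregation that this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_observation(t, value, modality, W, n_modalities):
    """Embed a single (time, value, modality) observation (sketch)."""
    one_hot = np.zeros(n_modalities)
    one_hot[modality] = 1.0
    x = np.concatenate(([t, value], one_hot))   # raw observation features
    return np.maximum(W @ x, 0.0)               # one ReLU layer

def seft_like_encoding(observations, W, n_modalities):
    """Permutation-invariant encoding of an irregular, unaligned series."""
    embedded = [embed_observation(t, v, m, W, n_modalities) for t, v, m in observations]
    return np.mean(embedded, axis=0)            # mean aggregation over the set

# Toy series: (time, value, modality-id) tuples, e.g. sparse vitals from two sensors.
series = [(0.0, 36.6, 0), (0.7, 80.0, 1), (2.3, 37.1, 0)]
n_modalities = 2
d_in, d_hidden = 2 + n_modalities, 16
W = rng.normal(size=(d_hidden, d_in))
print(seft_like_encoding(series, W, n_modalities).shape)   # (16,)
```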
The fairness-accuracy landscape of neural classifiers
Title | The fairness-accuracy landscape of neural classifiers |
Authors | Anonymous |
Abstract | That machine learning algorithms can demonstrate bias is well documented by now. This work confronts the challenge of bias mitigation in feedforward fully-connected neural nets through the lens of causal inference and multiobjective optimisation. Regarding the former, a new causal notion of fairness is introduced that is particularly suited to giving a nuanced treatment of datasets collected under unfair practices. In particular, special attention is paid to subjects whose covariates could appear with substantial probability under either value of the sensitive attribute. Next, recognising that fairness and accuracy are competing objectives, the proposed methodology uses techniques from multiobjective optimisation to ascertain the fairness-accuracy landscape of a neural net classifier. Experimental results suggest that the proposed method produces neural net classifiers that distribute evenly across the Pareto front of the fairness-accuracy space and is more efficient at finding non-dominated points than an adversarial approach. |
Tasks | Causal Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1e3g1rtwB |
PDF | https://openreview.net/pdf?id=S1e3g1rtwB |
PWC | https://paperswithcode.com/paper/the-fairness-accuracy-landscape-of-neural |
Repo | |
Framework | |
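The multiobjective framing above can be made concrete with a simple scalarization: sweep a trade-off weight between an accuracy loss and a fairness penalty, fit one classifier per weight, and read off the resulting fairness-accuracy front. The sketch below uses a demographic-parity gap as a stand-in fairness term (the paper's notion is causal and more nuanced) and plain logistic regression instead of a neural net; it is illustrative only, not the paper's method for finding non-dominated points.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: features X, labels y, binary sensitive attribute a.
n, d = 400, 5
X = rng.normal(size=(n, d))
a = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.8 * a + 0.3 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, lam):
    """Scalarized trade-off: cross-entropy + lam * squared demographic-parity gap."""
    p = sigmoid(X @ w)
    ce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    gap = p[a == 1].mean() - p[a == 0].mean()
    return ce + lam * gap ** 2

# Sweep the trade-off weight to trace an (accuracy, fairness) front.
for lam in [0.0, 1.0, 10.0]:
    w = minimize(objective, np.zeros(d), args=(lam,), method="L-BFGS-B").x
    p = sigmoid(X @ w)
    acc = ((p > 0.5) == y).mean()
    gap = abs(p[a == 1].mean() - p[a == 0].mean())
    print(f"lam={lam:5.1f}  accuracy={acc:.3f}  parity gap={gap:.3f}")
```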
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Title | A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation |
Authors | Anonymous |
Abstract | Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy at an $O(1/T)$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption. |
Tasks | Q-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xxAJHFwS |
PDF | https://openreview.net/pdf?id=B1xxAJHFwS |
PWC | https://paperswithcode.com/paper/a-finite-time-analysis-of-q-learning-with |
Repo | |
Framework | |
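To make the object of the analysis concrete, a bare-bones version of Q-learning with a two-layer ReLU network as the action-value approximator looks like the sketch below: semi-gradient TD updates on transitions from a toy MDP. The environment, network width, exploration rate, and step size are arbitrary illustrations; the paper's analysis concerns overparameterized networks and Markovian (non-i.i.d.) data, which this toy loop does not attempt to reproduce faithfully.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-state, 2-action MDP used only for illustration.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]

def features(s, a):
    """One-hot (state, action) features; real uses would be richer."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

# Two-layer ReLU network: Q(s, a) = w2 . relu(W1 phi(s, a)).
width = 32
W1 = rng.normal(scale=1.0 / np.sqrt(width), size=(width, n_states * n_actions))
w2 = rng.normal(scale=1.0 / np.sqrt(width), size=width)

def q_value(s, a):
    return w2 @ np.maximum(W1 @ features(s, a), 0.0)

lr, s = 0.05, 0
for t in range(5000):
    # Epsilon-greedy behaviour policy over the current Q estimate.
    a = rng.integers(n_actions) if rng.random() < 0.1 else \
        int(np.argmax([q_value(s, b) for b in range(n_actions)]))
    s_next = int(rng.choice(n_states, p=P[s, a]))
    target = R[s, a] + gamma * max(q_value(s_next, b) for b in range(n_actions))

    # Semi-gradient TD update: only Q(s, a) is differentiated, not the target.
    phi = features(s, a)
    h = W1 @ phi
    relu_h = np.maximum(h, 0.0)
    delta = (w2 @ relu_h) - target
    grad_w2 = delta * relu_h
    grad_W1 = delta * np.outer(w2 * (h > 0), phi)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    s = s_next

print([[round(q_value(s, a), 2) for a in range(n_actions)] for s in range(n_states)])
```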
Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics
Title | Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics |
Authors | Sungyong Seo*, Chuizheng Meng*, Yan Liu |
Abstract | Sparsely available data points cause numerical errors in finite differences, which hinder modeling the dynamics of physical systems. The discretization error becomes even larger when the sparse data are irregularly distributed, so that the data are defined on an unstructured grid, making it hard to build deep learning models that handle physics-governed observations on such grids. In this paper, we propose a novel architecture, Physics-aware Difference Graph Networks (PA-DGN), that exploits neighboring information to learn finite differences inspired by physics equations. PA-DGN further leverages data-driven end-to-end learning to discover underlying dynamical relations between the spatial and temporal differences in given observations. We demonstrate the superiority of PA-DGN in the approximation of directional derivatives and the prediction of graph signals on synthetic data and on real-world climate observations from weather stations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gelyrtwH |
PDF | https://openreview.net/pdf?id=r1gelyrtwH |
PWC | https://paperswithcode.com/paper/physics-aware-difference-graph-networks-for |
Repo | |
Framework | |
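The core building block described above, learnable spatial differences on a graph of sparsely placed sensors, can be illustrated with a simple layer that computes signal differences along edges, lets learned parameters modulate them, and aggregates them per node. This is a schematic NumPy version, not the authors' PA-DGN architecture; the modulation form and the mean aggregation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_spatial_difference(x, edges, w_edge, w_node):
    """Learnable graph difference layer (sketch).

    x:       node signal values, shape (n_nodes,)
    edges:   list of directed (i, j) pairs
    w_edge:  learned per-edge modulation of the forward difference x[j] - x[i]
    w_node:  learned per-edge modulation of the source-node value itself
    Returns per-node aggregated "gradient-like" features, shape (n_nodes,).
    """
    out = np.zeros_like(x)
    deg = np.zeros_like(x)
    for k, (i, j) in enumerate(edges):
        diff = w_edge[k] * (x[j] - x[i]) + w_node[k] * x[i]   # modulated finite difference
        out[i] += diff
        deg[i] += 1.0
    return out / np.maximum(deg, 1.0)                          # mean over outgoing edges

# Toy sparse sensor graph: 4 irregularly placed nodes, a few directed edges.
x = np.array([0.0, 1.0, 4.0, 9.0])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w_edge = rng.normal(size=len(edges))
w_node = rng.normal(size=len(edges))
print(graph_spatial_difference(x, edges, w_edge, w_node))
```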
Learning to Prove Theorems by Learning to Generate Theorems
Title | Learning to Prove Theorems by Learning to Generate Theorems |
Authors | Anonymous |
Abstract | We consider the task of automated theorem proving, a key AI task. Deep learning has shown promise for training theorem provers, but there are limited human-written theorems and proofs available for supervised learning. To address this limitation, we propose to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. Experiments on real-world tasks demonstrate that synthetic data from our approach significantly improves the theorem prover and advances the state of the art of automated theorem proving in Metamath. |
Tasks | Automated Theorem Proving |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxiqxSYPB |
PDF | https://openreview.net/pdf?id=BJxiqxSYPB |
PWC | https://paperswithcode.com/paper/learning-to-prove-theorems-by-learning-to |
Repo | |
Framework | |
Improving Multi-Manifold GANs with a Learned Noise Prior
Title | Improving Multi-Manifold GANs with a Learned Noise Prior |
Authors | Anonymous |
Abstract | Generative adversarial networks (GANs) learn to map samples from a noise distribution to a chosen data distribution. Recent work has demonstrated that GANs are consequently sensitive to, and limited by, the shape of the noise distribution. For example, a single generator struggles to map continuous noise (e.g. a uniform distribution) to discontinuous output (e.g. separate Gaussians) or complex output (e.g. intersecting parabolas). We address this problem by learning to generate from multiple models, such that the generator’s output is the combination of several distinct networks. We contribute a novel formulation of multi-generator models in which we learn a prior over the generators conditioned on the noise, parameterized by a neural network. Thus, this network not only learns the optimal rate at which to sample from each generator but also optimally shapes the noise received by each generator. The resulting Noise Prior GAN (NPGAN) achieves expressivity and flexibility that surpass both single-generator models and previous multi-generator models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlISCEKvB |
PDF | https://openreview.net/pdf?id=HJlISCEKvB |
PWC | https://paperswithcode.com/paper/improving-multi-manifold-gans-with-a-learned |
Repo | |
Framework | |
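The noise-conditioned prior over generators described above can be sketched as a small gating network that maps each noise sample to a distribution over generators, with the sample routed to (or softly mixed across) them. The sketch below uses random linear maps as stand-in generators and a soft mixture for simplicity; the actual NPGAN discriminator, training objective, and parameterization are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_generators, z_dim, x_dim = 3, 8, 2

# Each "generator" is a random linear map here, standing in for a neural generator.
gen_weights = [rng.normal(size=(x_dim, z_dim)) for _ in range(n_generators)]
# The learned prior: a linear gating network over the noise (a stand-in for an MLP).
gate_W = rng.normal(size=(n_generators, z_dim))

def npgan_like_sample(z):
    """Route a noise sample through a noise-conditioned mixture of generators."""
    probs = softmax(gate_W @ z)                       # learned prior p(generator | z)
    outputs = np.stack([W @ z for W in gen_weights])  # each generator's proposal
    return probs @ outputs                            # soft mixture (hard routing also possible)

z = rng.normal(size=z_dim)
print(npgan_like_sample(z))
```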
Four Things Everyone Should Know to Improve Batch Normalization
Title | Four Things Everyone Should Know to Improve Batch Normalization |
Authors | Anonymous |
Abstract | A key component of most neural network architectures is the use of normalization layers, such as Batch Normalization. Despite its common use and large utility in optimizing deep architectures that are otherwise intractable, it has been challenging both to generically improve upon Batch Normalization and to understand the circumstances that lend themselves to other enhancements. In this paper, we identify four improvements to the generic form of Batch Normalization and the circumstances under which they work, yielding performance gains across all batch sizes while requiring no additional computation during training. These contributions include proposing a method for reasoning about the current example in inference normalization statistics, fixing a training vs. inference discrepancy; recognizing and validating the powerful regularization effect of Ghost Batch Normalization for small and medium batch sizes; examining the effect of weight decay regularization on the scaling and shifting parameters gamma and beta; and identifying a new normalization algorithm for very small batch sizes by combining the strengths of Batch and Group Normalization. We validate our results empirically on five datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers102, and ImageNet. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJx8HANFDH |
PDF | https://openreview.net/pdf?id=HJx8HANFDH |
PWC | https://paperswithcode.com/paper/four-things-everyone-should-know-to-improve-1 |
Repo | |
Framework | |
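Of the four contributions, Ghost Batch Normalization is the easiest to show in isolation: the batch is split into smaller "ghost" batches and each is normalized with its own statistics, which acts as a regularizer for small and medium batch sizes. The sketch below is a training-mode forward pass only; running-statistics tracking, the inference-time variants, and the gamma/beta weight-decay considerations discussed in the abstract are omitted.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, gamma, beta, eps=1e-5):
    """Training-mode Ghost Batch Normalization over a (batch, features) array (sketch)."""
    out = np.empty_like(x)
    for start in range(0, x.shape[0], ghost_size):
        chunk = x[start:start + ghost_size]
        mu = chunk.mean(axis=0)                 # statistics from the ghost batch only
        var = chunk.var(axis=0)
        out[start:start + ghost_size] = (chunk - mu) / np.sqrt(var + eps)
    return gamma * out + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 16))
y = ghost_batch_norm(x, ghost_size=32, gamma=np.ones(16), beta=np.zeros(16))
print(y[:32].mean(axis=0).round(3))             # ~0 within each ghost batch
```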
One-way prototypical networks
Title | One-way prototypical networks |
Authors | Anonymous |
Abstract | Few-shot models have become a popular topic of research in recent years. They offer the possibility to determine class membership for unseen examples using just a handful of examples per class. Such models are trained on a wide range of classes and their respective examples, learning a decision metric in the process. Types of few-shot models include matching networks and prototypical networks. We show a new way of training prototypical few-shot models for just a single class. These models have the ability to predict the likelihood of an unseen query belonging to a group of examples without any given counterexamples. The difficulty here lies in the fact that no relative distance to other classes can be calculated via softmax. We solve this problem by introducing a “null class” centered around zero and enforcing centering with batch normalization. Trained on the commonly used Omniglot dataset, we obtain a classification accuracy of .98 on the matched test set and of .8 on unmatched MNIST data. On the more complex MiniImageNet dataset, test accuracy is .8. In addition, we propose a novel Gaussian layer for distance calculation in a prototypical network, which takes the support examples’ distribution rather than just their centroid into account. This extension shows promising results when a higher number of support examples is available. |
Tasks | Omniglot |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgWbpEtPr |
PDF | https://openreview.net/pdf?id=BJgWbpEtPr |
PWC | https://paperswithcode.com/paper/one-way-prototypical-networks |
Repo | |
Framework | |
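The "null class centered around zero" idea can be sketched directly: compute the prototype of the single support class, keep a second prototype fixed at the origin, and apply a softmax over negative squared distances to the two prototypes to score a query. The embeddings below are random placeholders for a trained, batch-norm-centered encoder, and the distance-based scoring is a generic prototypical-network form rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_way_proto_prob(query_emb, support_embs):
    """P(query belongs to the support class) with a null prototype at zero (sketch)."""
    prototype = support_embs.mean(axis=0)          # centroid of the single support class
    null_proto = np.zeros_like(prototype)          # "null class" centered at the origin
    d_class = np.sum((query_emb - prototype) ** 2)
    d_null = np.sum((query_emb - null_proto) ** 2)
    logits = np.array([-d_class, -d_null])         # softmax over negative distances
    e = np.exp(logits - logits.max())
    return (e / e.sum())[0]

# Placeholder embeddings; a real model would produce these with a BN-centered encoder.
support = rng.normal(loc=1.0, size=(5, 64))
query_in = rng.normal(loc=1.0, size=64)    # drawn like the support class
query_out = rng.normal(loc=0.0, size=64)   # drawn away from the support class
print(one_way_proto_prob(query_in, support), one_way_proto_prob(query_out, support))
```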
Supervised learning with incomplete data via sparse representations
Title | Supervised learning with incomplete data via sparse representations |
Authors | Anonymous |
Abstract | This paper addresses the problem of training a classifier on incomplete data and applying it to a complete or incomplete test dataset. A supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a limited number of observed entries, assuming sparse representations of data vectors on an unknown dictionary. The proposed method simultaneously learns the classifier, the dictionary and the corresponding sparse representations of each input data sample. A theoretical analysis is also provided comparing this method with the standard imputation approach, which consists of performing data completion followed by training the classifier on the reconstructions. The limitations of this “sequential” approach are identified, and a description of how the proposed “simultaneous” method can overcome the problem of indiscernible observations is provided. Additionally, it is shown that, if it is possible to train a classifier on incomplete observations so that its reconstructions are well separated by a hyperplane, then the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results are presented on synthetic and well-known reference datasets that demonstrate the effectiveness of the proposed method compared to traditional data imputation methods. |
Tasks | Imputation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syx_f6EFPr |
PDF | https://openreview.net/pdf?id=Syx_f6EFPr |
PWC | https://paperswithcode.com/paper/supervised-learning-with-incomplete-data-via |
Repo | |
Framework | |
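The "simultaneous" objective described above, learning a dictionary, per-sample sparse codes, and a classifier jointly from only the observed entries, can be written as a single loss. The sketch below only evaluates such a loss for given parameters (logistic classifier on the sparse codes, masked reconstruction, l1 sparsity); the weighting, the optimization procedure, and the theoretical comparison with sequential imputation are assumptions or omissions, not the paper's exact formulation.

```python
import numpy as np

def simultaneous_loss(D, codes, w, X, mask, y, lam_sparse=0.1, lam_rec=1.0):
    """Joint objective over dictionary D, sparse codes, and classifier w (sketch).

    X:    data matrix with arbitrary values at unobserved entries, shape (n, d)
    mask: 1 where an entry of X was observed, 0 otherwise
    y:    binary labels in {0, 1}
    """
    # Reconstruction error penalized only on the observed entries.
    rec = np.sum(mask * (X - codes @ D.T) ** 2) / mask.sum()
    # Logistic classification loss on the sparse codes.
    p = 1.0 / (1.0 + np.exp(-(codes @ w)))
    ce = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # l1 sparsity of the codes.
    sparsity = np.mean(np.abs(codes))
    return ce + lam_rec * rec + lam_sparse * sparsity

rng = np.random.default_rng(0)
n, d, k = 50, 20, 8
X = rng.normal(size=(n, d))
mask = (rng.random((n, d)) < 0.6).astype(float)     # ~60% of entries observed
y = rng.integers(0, 2, size=n)
D, codes, w = rng.normal(size=(d, k)), rng.normal(size=(n, k)), rng.normal(size=k)
print(simultaneous_loss(D, codes, w, X, mask, y))
```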
Robust Natural Language Representation Learning for Natural Language Inference by Projecting Superficial Words out
Title | Robust Natural Language Representation Learning for Natural Language Inference by Projecting Superficial Words out |
Authors | Anonymous |
Abstract | In natural language inference, the semantics of some words do not affect the inference. Such information is considered superficial and leads to overfitting. How can we represent and discard such superficial information? In this paper, we use first-order logic (FOL), a classic meaning-representation formalism, to explain what information is superficial for a given sentence pair. This explanation also suggests two inductive biases based on its properties. We propose a neural network-based approach that exploits these two inductive biases and obtain substantial improvements in extensive experiments. |
Tasks | Natural Language Inference, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxQzlHFPr |
PDF | https://openreview.net/pdf?id=HkxQzlHFPr |
PWC | https://paperswithcode.com/paper/robust-natural-language-representation |
Repo | |
Framework | |
Learning Calibratable Policies using Programmatic Style-Consistency
Title | Learning Calibratable Policies using Programmatic Style-Consistency |
Authors | Anonymous |
Abstract | We study the important and challenging problem of controllable generation of long-term sequential behaviors. Solutions to this problem would impact many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are significant challenges that are unique to or exacerbated by generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be accurately calibrated to generate interesting behavior styles in both domains. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byx5R0NKPr |
PDF | https://openreview.net/pdf?id=Byx5R0NKPr |
PWC | https://paperswithcode.com/paper/learning-calibratable-policies-using-1 |
Repo | |
Framework | |
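The style-consistency idea above can be summarized as: apply a programmatic labeling function to a generated trajectory and penalize disagreement with the style label the policy was conditioned on. A toy version is sketched below; the speed-based labeling function, the trajectory format, and the 0/1 disagreement loss are illustrative stand-ins, not the paper's objective or its integration with imitation learning.

```python
import numpy as np

def speed_label(trajectory, threshold=1.0):
    """Programmatic labeling function: 1 = 'aggressive' (fast), 0 = 'passive' (slow)."""
    speeds = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
    return int(speeds.mean() > threshold)

def style_consistency_loss(trajectories, target_styles):
    """Fraction of generated trajectories whose programmatic label disagrees
    with the style label the policy was asked to produce (sketch)."""
    labels = np.array([speed_label(traj) for traj in trajectories])
    return np.mean(labels != np.asarray(target_styles))

rng = np.random.default_rng(0)
fast = np.cumsum(rng.normal(scale=2.0, size=(20, 2)), axis=0)   # stand-in "aggressive" rollout
slow = np.cumsum(rng.normal(scale=0.2, size=(20, 2)), axis=0)   # stand-in "passive" rollout
print(style_consistency_loss([fast, slow], target_styles=[1, 0]))
```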
Neural Video Encoding
Title | Neural Video Encoding |
Authors | Anonymous |
Abstract | Deep neural networks have had unprecedented success in computer vision, natural language processing, and speech, largely due to the ability to search for suitable task algorithms via differentiable programming. In this paper, we borrow ideas from Kolmogorov complexity theory and normalizing flows to explore the possibility of finding arbitrary algorithms that represent data, in particular algorithms that encode sequences of video frames. Ultimately, we demonstrate neural video encoding using convolutional neural networks that transform autoregressive noise processes, and show that this method has surprising cryptographic analogues for information security. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byeq_xHtwS |
PDF | https://openreview.net/pdf?id=Byeq_xHtwS |
PWC | https://paperswithcode.com/paper/neural-video-encoding |
Repo | |
Framework | |