April 1, 2020

Paper Group NANR 31

Encoding word order in complex embeddings


Title	Encoding word order in complex embeddings
Authors	Anonymous
Abstract	Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. Our solution generalizes word embeddings, previously defined as independent vectors, to continuous word functions over a variable (position). The benefit of continuous functions over variable positions is that word representations shift smoothly with increasing positions. Hence, word representations in different positions can correlate with each other in a continuous function. The general solution of these functions can be extended to complex-valued variants. We extend CNN, RNN and Transformer NNs to complex-valued versions to incorporate our complex embedding (we make all code available). Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. To our knowledge, this is the first work in NLP to link imaginary numbers in complex-valued representations to concrete meanings (i.e., word order).
Tasks	Language Modelling, Machine Translation, Text Classification, Word Embeddings
Published	2020-01-01
URL	https://openreview.net/forum?id=Hke-WTVtwr
PDF	https://openreview.net/pdf?id=Hke-WTVtwr
PWC	https://paperswithcode.com/paper/encoding-word-order-in-complex-embeddings
Repo
Framework
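
The idea of a word representation that is a continuous function of position can be sketched as below. The specific form used here (a per-dimension amplitude multiplied by a complex exponential of frequency times position plus phase) is an assumption for illustration, since the abstract does not state the function; the parameter names and random values are hypothetical.

```python
import numpy as np


def complex_word_embedding(amplitude, frequency, phase, position):
    """Embed one word at one position as a complex vector.

    Assumed form: amplitude * exp(i * (frequency * position + phase)),
    so every dimension moves smoothly as `position` increases.
    """
    return amplitude * np.exp(1j * (frequency * position + phase))


rng = np.random.default_rng(0)
dim = 4
# Hypothetical per-word parameters (these would be learned in practice).
amplitude = rng.uniform(0.5, 1.5, dim)
frequency = rng.uniform(0.1, 1.0, dim)
phase = rng.uniform(0.0, 2 * np.pi, dim)

e3 = complex_word_embedding(amplitude, frequency, phase, 3)
e4 = complex_word_embedding(amplitude, frequency, phase, 4)

# Under this form, moving one position forward is a fixed rotation per dimension.
step = np.exp(1j * frequency)
```

Under this assumed form the same word at adjacent positions differs only by a position-independent rotation, which is one way the ordered relationship between positions could be made explicit.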

Model Inversion Networks for Model-Based Optimization


Title	Model Inversion Networks for Model-Based Optimization
Authors	Anonymous
Abstract	In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of (input, score) pairs. Inputs may lie on extremely thin manifolds in high-dimensional spaces, making the optimization prone to falling off the manifold. Further, evaluating the unknown function may be expensive, so the algorithm should be able to exploit static, offline data. We propose model inversion networks (MINs) as an approach to solve such problems. Unlike prior work, MINs scale to extremely high-dimensional input spaces and can efficiently leverage offline logged datasets for optimization in both contextual and non-contextual settings. We show that MINs can also be extended to the active setting, commonly studied in prior work, via a simple, novel and effective scheme for active data collection. Our experiments show that MINs act as powerful optimizers on a range of contextual/non-contextual, static/active problems including optimization over images and protein designs and learning from logged bandit feedback.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SklsBJHKDS
PDF	https://openreview.net/pdf?id=SklsBJHKDS
PWC	https://paperswithcode.com/paper/model-inversion-networks-for-model-based
Repo
Framework

Efficient generation of structured objects with Constrained Adversarial Networks


Title	Efficient generation of structured objects with Constrained Adversarial Networks
Authors	Anonymous
Abstract	Despite their success, generative adversarial networks (GANs) cannot easily generate structured objects like molecules or game maps. The issue is that such objects must satisfy structural requirements (e.g., molecules must be chemically valid, game maps must guarantee reachability of the end goal) that are difficult to capture with examples alone. As a remedy, we propose constrained adversarial networks (CANs), which embed the constraints into the model during training by penalizing the generator whenever it outputs invalid structures. As in unconstrained GANs, new objects can be sampled straightforwardly from the generator, but in addition they satisfy the constraints with high probability. Our approach handles arbitrary logical constraints and leverages knowledge compilation techniques to efficiently evaluate the expected disagreement between the model and the constraints. This setup is further extended to hybrid logical-neural constraints for capturing complex requirements like graph reachability. An extensive empirical analysis on constrained images, molecules, and video game levels shows that CANs efficiently generate valid structures that are both high-quality and novel.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HyeCnkHtwH
PDF	https://openreview.net/pdf?id=HyeCnkHtwH
PWC	https://paperswithcode.com/paper/efficient-generation-of-structured-objects
Repo
Framework

A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs


Title	A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Authors	Anonymous
Abstract	Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. Especially for adversarial training or for training a newly synthesized model, one would not know the best learning rate regime beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models for both natural and adversarial training, without requiring any dataset/model specific tuning. It is a stand-alone, parameterless, adaptive approach with no computational overhead. We theoretically discuss the algorithm’s convergence behavior. We empirically validate our algorithm extensively. Our results show that our proposed approach consistently achieves top-level accuracy compared to SOTA baselines in the literature in natural training, as well as in adversarial training.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rJxyqkSYDH
PDF	https://openreview.net/pdf?id=rJxyqkSYDH
PWC	https://paperswithcode.com/paper/a-simple-dynamic-learning-rate-tuning
Repo
Framework

DDSP: Differentiable Digital Signal Processing


Title	DDSP: Differentiable Digital Signal Processing
Authors	Anonymous
Abstract	Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library will be made available upon paper acceptance and we encourage further contributions from the community and domain experts.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=B1x1ma4tDr
PDF	https://openreview.net/pdf?id=B1x1ma4tDr
PWC	https://paperswithcode.com/paper/ddsp-differentiable-digital-signal-processing
Repo
Framework

FRICATIVE PHONEME DETECTION WITH ZERO DELAY


Title	FRICATIVE PHONEME DETECTION WITH ZERO DELAY
Authors	Anonymous
Abstract	People with high-frequency hearing loss rely on hearing aids that employ frequency lowering algorithms. These algorithms shift some of the sounds from the high frequency band to the lower frequency band where the sounds become more perceptible for the people with the condition. Fricative phonemes have an important part of their content concentrated in high frequency bands. It is important that the frequency lowering algorithm is activated exactly for the duration of a fricative phoneme, and kept off at all other times. Therefore, timely (with zero delay) and accurate fricative phoneme detection is a key problem for high quality hearing aids. In this paper we present a deep learning based fricative phoneme detection algorithm that has zero detection delay and achieves state-of-the-art fricative phoneme detection accuracy on the TIMIT Speech Corpus. All reported results are reproducible and come with easy to use code that could serve as a baseline for future research.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BJxlmeBKwS
PDF	https://openreview.net/pdf?id=BJxlmeBKwS
PWC	https://paperswithcode.com/paper/fricative-phoneme-detection-with-zero-delay
Repo
Framework

Small-GAN: Speeding up GAN Training using Core-Sets


Title	Small-GAN: Speeding up GAN Training using Core-Sets
Authors	Anonymous
Abstract	BigGAN suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large minibatch sizes. This finding is interesting but also discouraging – large batch sizes are slow and expensive to emulate on conventional hardware. Thus, it would be nice if there were some trick by which we could generate batches that were effectively big though small in practice. In this work, we propose such a trick, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of real images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected embeddings at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it helps us use GANs to reach a new state of the art in anomaly detection.
Tasks	Active Learning, Anomaly Detection
Published	2020-01-01
URL	https://openreview.net/forum?id=rkeNr6EKwB
PDF	https://openreview.net/pdf?id=rkeNr6EKwB
PWC	https://paperswithcode.com/paper/small-gan-speeding-up-gan-training-using-core-1
Repo
Framework
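
The batch-compression step can be sketched as follows. The abstract only says "Coreset-selection"; the greedy k-center rule below is one common way to pick a core-set and is an assumption here, as are the function name, the batch sizes, and the use of Euclidean distance.

```python
import numpy as np


def greedy_k_center(points, k, first=0):
    """Pick k row indices so the chosen points cover the batch well.

    Greedy k-center: repeatedly add the point farthest from the
    points chosen so far.
    """
    chosen = [first]
    # Distance from every point to its nearest chosen point.
    dist = np.linalg.norm(points - points[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen


rng = np.random.default_rng(0)
large_batch = rng.normal(size=(512, 16))   # e.g. a large draw from the prior
small_idx = greedy_k_center(large_batch, k=32)
small_batch = large_batch[small_idx]       # compressed batch used for training
```

For real images, the abstract describes running the selection on randomly projected Inception activations rather than on the raw inputs; the same function would apply to those projected embeddings.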

A Theoretical Analysis of the Number of Shots in Few-Shot Learning


Title	A Theoretical Analysis of the Number of Shots in Few-Shot Learning
Authors	Anonymous
Abstract	Few-shot classification is the task of predicting the category of an example from a set of few labeled examples. The number of labeled examples per category is called the number of shots (or shot number). Recent works tackle this task through meta-learning, where a meta-learner extracts information from observed tasks during meta-training to quickly adapt to new tasks during meta-testing. In this formulation, the number of shots exploited during meta-training has an impact on the recognition performance at meta-test time. Generally, the shot number used in meta-training should match the one used in meta-testing to obtain the best performance. We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method. From our analysis, we propose a simple method that is robust to the choice of shot number used during meta-training, which is a crucial hyperparameter. Our model, trained with an arbitrary meta-training shot number, performs well across different meta-testing shot numbers. We experimentally demonstrate our approach on different few-shot classification benchmarks.
Tasks	Few-Shot Learning, Meta-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HkgB2TNYPS
PDF	https://openreview.net/pdf?id=HkgB2TNYPS
PWC	https://paperswithcode.com/paper/a-theoretical-analysis-of-the-number-of-shots-1
Repo
Framework
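
To make the role of the shot number concrete, here is a minimal sketch of Prototypical Network classification: each class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype. The embedding network is omitted (the toy vectors stand in for embeddings), and the function names and data are hypothetical; the robust method proposed in the paper is not shown.

```python
import numpy as np


def prototypes(support_embeddings, support_labels, num_classes):
    """Mean support embedding per class; the shot number is the rows per class."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])


def classify(query_embeddings, protos):
    """Assign each query to the prototype with the smallest squared distance."""
    diffs = query_embeddings[:, None, :] - protos[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


# Toy 2-way, 3-shot episode with 2-dimensional "embeddings".
support = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                    [1.0, 1.0], [1.2, 1.0], [1.0, 1.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
protos = prototypes(support, labels, num_classes=2)
pred = classify(np.array([[0.1, 0.1], [0.9, 1.1]]), protos)
```

With more shots each prototype averages over more points, which is where the dependence on shot number enters.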

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems


Title	Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Authors	Anonymous
Abstract	First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost in calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression of neural tangent kernel (NTK). Different from typical second-order methods that have heavy computational cost in each iteration, GGN only has minor overhead compared to first-order methods such as SGD. We also give theoretical results to show that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide a convergence guarantee for the mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1gCeyHFDS
PDF	https://openreview.net/pdf?id=H1gCeyHFDS
PWC	https://paperswithcode.com/paper/gram-gauss-newton-method-learning
Repo
Framework
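
A rough sketch of why a Gram-matrix formulation can be cheap in the overparameterized regime: with n samples and p parameters (p much larger than n), a Gauss-Newton-style step can be solved through the n x n Gram matrix J J^T instead of the p x p matrix J^T J. The update form, the damping term, and the linear toy model below are assumptions for illustration; the abstract does not give the exact update rule.

```python
import numpy as np


def gram_gauss_newton_step(theta, jacobian, residual, damping=1e-8):
    """One damped Gauss-Newton step solved via the n x n Gram matrix J J^T.

    jacobian: (n, p) derivative of the n predictions w.r.t. the p parameters.
    residual: (n,) predictions minus targets.
    """
    gram = jacobian @ jacobian.T
    coeffs = np.linalg.solve(gram + damping * np.eye(gram.shape[0]), residual)
    return theta - jacobian.T @ coeffs


rng = np.random.default_rng(0)
n, p = 5, 50                      # few samples, many parameters
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Linear model f(theta) = X @ theta, whose Jacobian is simply X.
theta = np.zeros(p)
residual = X @ theta - y
theta = gram_gauss_newton_step(theta, X, residual)
final_loss = 0.5 * np.sum((X @ theta - y) ** 2)
```

For this linear toy problem a single step fits the targets almost exactly; for a neural network the Jacobian changes with the parameters and the step would be repeated.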

Selective sampling for accelerating training of deep neural networks


Title	Selective sampling for accelerating training of deep neural networks
Authors	Anonymous
Abstract	We present a selective sampling method designed to accelerate the training of deep neural networks. To this end, we introduce a novel measurement, the minimal margin score (MMS), which measures the minimal amount of displacement an input should take until its predicted classification is switched. For multi-class linear classification, the MMS measure is a natural generalization of the margin-based selection criterion, which was thoroughly studied in the binary classification setting. In addition, the MMS measure provides an interesting insight into the progress of the training process and can be useful for designing and monitoring new training regimes. Empirically we demonstrate a substantial acceleration when training commonly used deep neural network architectures for popular image classification tasks. The efficiency of our method is compared against the standard training procedures, and against commonly used selective sampling alternatives: Hard negative mining selection, and Entropy-based selection. Finally, we demonstrate an additional speedup when we adopt a more aggressive learning-drop regime while using the MMS selective sampling method.
Tasks	Image Classification
Published	2020-01-01
URL	https://openreview.net/forum?id=SJxNzgSKvH
PDF	https://openreview.net/pdf?id=SJxNzgSKvH
PWC	https://paperswithcode.com/paper/selective-sampling-for-accelerating-training
Repo
Framework
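
For the multi-class linear case mentioned in the abstract, the minimal displacement that switches the prediction can be computed directly: it is the distance from the input to the nearest decision boundary between the predicted class and any other class. The sketch below covers only that linear case; how the score is evaluated inside a deep network is not stated in the abstract, and the function name and toy weights are hypothetical.

```python
import numpy as np


def minimal_margin_score(W, b, x):
    """Distance from x to the nearest boundary of its predicted class.

    W: (num_classes, dim) weights, b: (num_classes,) biases, scores = W @ x + b.
    """
    scores = W @ x + b
    top = int(np.argmax(scores))
    best = np.inf
    for j in range(len(scores)):
        if j == top:
            continue
        gap = scores[top] - scores[j]              # how far ahead the top class is
        norm = np.linalg.norm(W[top] - W[j])       # slope of that gap in input space
        if norm > 0:
            best = min(best, gap / norm)
    return best


W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
b = np.zeros(3)
x = np.array([2.0, 1.0])
mms = minimal_margin_score(W, b, x)
```

Inputs with a small score sit close to a decision boundary, which is the intuition behind selecting them preferentially during training.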

Structural Language Models for Any-Code Generation


Title	Structural Language Models for Any-Code Generation
Authors	Anonymous
Abstract	We address the problem of Any-Code Generation (AnyGen): generating code without any restriction on the vocabulary or structure. The state-of-the-art in this problem is the sequence-to-sequence (seq2seq) approach, which treats code as a sequence and does not leverage any structural information. We introduce a new approach to AnyGen that leverages the strict syntax of programming languages to model a code snippet as a tree: structural language modeling (SLM). SLM estimates the probability of the program’s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous structural techniques that have severely restricted the kinds of expressions that can be generated, our approach can generate arbitrary expressions in any programming language. Our model significantly outperforms both seq2seq and a variety of existing structured approaches in generating Java and C# code. We make our code, datasets, and models available online.
Tasks	Code Generation, Language Modelling
Published	2020-01-01
URL	https://openreview.net/forum?id=HylZIT4Yvr
PDF	https://openreview.net/pdf?id=HylZIT4Yvr
PWC	https://paperswithcode.com/paper/structural-language-models-for-any-code-1
Repo
Framework
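
The decomposition of a tree's probability into a product of per-node conditionals can be sketched as below. The node ordering, the node labels, and the uniform `toy_conditional` are hypothetical stand-ins: in the paper the conditionals come from a neural model over AST paths, whereas here the context is simply the list of previously generated nodes.

```python
import math


def ast_log_probability(nodes, conditional_prob):
    """Sum of log P(node_i | nodes generated before it), i.e. log of the product."""
    log_p = 0.0
    for i, node in enumerate(nodes):
        log_p += math.log(conditional_prob(node, nodes[:i]))
    return log_p


def toy_conditional(node, context):
    # Hypothetical stand-in for the learned model: uniform over 4 symbols.
    return 0.25


# AST nodes listed in an assumed generation order.
nodes = ["If", "Greater", "Name:x", "Int:1"]
log_p = ast_log_probability(nodes, toy_conditional)
```

Working in log space avoids underflow when the tree has many nodes; exponentiating `log_p` recovers the product of the conditionals.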

Metagross: Meta Gated Recursive Controller Units for Sequence Modeling


Title	Metagross: Meta Gated Recursive Controller Units for Sequence Modeling
Authors	Yi Tay, Yikang Shen, Alvin Chan, Yew Soon Ong
Abstract	This paper proposes Metagross (Meta Gated Recursive Controller), a new neural sequence modeling unit. Our proposed unit is characterized by recursive parameterization of its gating functions, i.e., gating mechanisms of Metagross are controlled by instances of itself, which are repeatedly called in a recursive fashion. This can be interpreted as a form of meta-gating and recursively parameterizing a recurrent model. We postulate that our proposed inductive bias provides modeling benefits pertaining to learning with inherently hierarchically-structured sequence data (e.g., language, logical or music tasks). To this end, we conduct extensive experiments on recursive logic tasks (sorting, tree traversal, logical inference), sequential pixel-by-pixel classification, semantic parsing, code generation, machine translation and polyphonic music modeling, demonstrating the widespread utility of the proposed approach, i.e., achieving state-of-the-art (or close) performance on all tasks.
Tasks	Code Generation, Machine Translation, Music Modeling, Semantic Parsing
Published	2020-01-01
URL	https://openreview.net/forum?id=Sygn20VtwH
PDF	https://openreview.net/pdf?id=Sygn20VtwH
PWC	https://paperswithcode.com/paper/metagross-meta-gated-recursive-controller
Repo
Framework

LIA: Latently Invertible Autoencoder with Adversarial Learning


Title	LIA: Latently Invertible Autoencoder with Adversarial Learning
Authors	Anonymous
Abstract	Deep generative models such as Variational AutoEncoder (VAE) and Generative Adversarial Network (GAN) play an increasingly important role in machine learning and computer vision. However, there are two fundamental issues hindering their real-world applications: the difficulty of conducting variational inference in VAE and the functional absence of encoding real-world samples in GAN. In this paper, we propose a novel algorithm named Latently Invertible Autoencoder (LIA) to address the above two issues in one framework. An invertible network and its inverse mapping are symmetrically embedded in the latent space of VAE. Thus the partial encoder first transforms the input into feature vectors and then the distribution of these feature vectors is reshaped to fit a prior by the invertible network. The decoder proceeds in the reverse order of the encoder’s composite mappings. A two-stage stochasticity-free training scheme is designed to train LIA via adversarial learning, in the sense that the decoder of LIA is first trained as a standard GAN with the invertible network and then the partial encoder is learned from an autoencoder by detaching the invertible network from LIA. Experiments conducted on the FFHQ face dataset and three LSUN datasets validate the effectiveness of LIA for inference and generation.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=ryefE1SYDr
PDF	https://openreview.net/pdf?id=ryefE1SYDr
PWC	https://paperswithcode.com/paper/lia-latently-invertible-autoencoder-with-1
Repo
Framework

EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs


Title	EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs
Authors	Anonymous
Abstract	Neural networks for structured data like graphs have been studied extensively in recent years. To date, the bulk of research activity has focused mainly on static graphs. However, most real-world networks are dynamic since their topology tends to change over time. Predicting the evolution of dynamic graphs is a task of high significance in the area of graph mining. Despite its practical importance, the task has not been explored in depth so far, mainly due to its challenging nature. In this paper, we propose a model that predicts the evolution of dynamic graphs. Specifically, we use a graph neural network along with a recurrent architecture to capture the temporal evolution patterns of dynamic graphs. Then, we employ a generative model which predicts the topology of the graph at the next time step and constructs a graph instance that corresponds to that topology. We evaluate the proposed model on several artificial datasets following common network evolving dynamics, as well as on real-world datasets. Results demonstrate the effectiveness of the proposed model.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Byg5flHFDr
PDF	https://openreview.net/pdf?id=Byg5flHFDr
PWC	https://paperswithcode.com/paper/evonet-a-neural-network-for-predicting-the
Repo
Framework

Picking Winning Tickets Before Training by Preserving Gradient Flow


Title	Picking Winning Tickets Before Training by Preserving Gradient Flow
Authors	Anonymous
Abstract	Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
Tasks	Network Pruning
Published	2020-01-01
URL	https://openreview.net/forum?id=SkgsACVKPH
PDF	https://openreview.net/pdf?id=SkgsACVKPH
PWC	https://paperswithcode.com/paper/picking-winning-tickets-before-training-by
Repo
Framework