Paper Group NANR 31
Encoding word order in complex embeddings. Model Inversion Networks for Model-Based Optimization. Efficient generation of structured objects with Constrained Adversarial Networks. A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs. DDSP: Differentiable Digital Signal Processing. FRICATIVE PHONEME DETECTION WITH ZERO DELA …
Encoding word order in complex embeddings
Title | Encoding word order in complex embeddings |
Authors | Anonymous |
Abstract | Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. Our solution generalizes word embeddings, previously defined as independent vectors, to continuous word functions over a variable (position). The benefit of continuous functions over variable positions is that word representations shift smoothly with increasing positions. Hence, word representations in different positions can correlate with each other in a continuous function. The general solution of these functions can be extended to complex-valued variants. We extend CNN, RNN and Transformer NNs to complex-valued versions to incorporate our complex embedding (we make all code available). Experiments on text classification, machine translation and language modeling show gains over both classical word embeddings and position-enriched word embeddings. To our knowledge, this is the first work in NLP to link imaginary numbers in complex-valued representations to concrete meanings (i.e., word order). |
Tasks | Language Modelling, Machine Translation, Text Classification, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hke-WTVtwr |
https://openreview.net/pdf?id=Hke-WTVtwr | |
PWC | https://paperswithcode.com/paper/encoding-word-order-in-complex-embeddings |
Repo | |
Framework | |
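
The abstract says each word becomes a continuous function of position, extended to complex values. Below is a minimal sketch, assuming each embedding dimension has the form r·exp(i(ω·pos + θ)), with a per-word amplitude r, frequency ω and initial phase θ. The parameter values are random placeholders rather than trained values. The sketch checks one property this form implies: moving a word by d positions rotates it by a factor that depends only on d, so order relationships are encoded alongside absolute position.

```python
import numpy as np

def complex_word_embedding(amplitude, frequency, phase, pos):
    """Complex embedding of one word at integer position `pos`.

    Assumed form: dimension k equals r_k * exp(i * (w_k * pos + theta_k)),
    where amplitude r, frequency w and initial phase theta are per-word parameters.
    """
    return amplitude * np.exp(1j * (frequency * pos + phase))

rng = np.random.default_rng(0)
dim = 4
r = rng.uniform(0.5, 1.5, dim)              # hypothetical amplitudes
w = rng.uniform(0.0, np.pi, dim)            # hypothetical frequencies
theta = rng.uniform(0.0, 2 * np.pi, dim)    # hypothetical initial phases

e3 = complex_word_embedding(r, w, theta, pos=3)
e5 = complex_word_embedding(r, w, theta, pos=5)

# Shifting the word by d = 2 positions rotates each dimension by exp(i * w * d);
# the rotation depends only on the offset d, not on the absolute position.
shift = np.exp(1j * w * 2)
print(np.allclose(e5, e3 * shift))          # True
print(np.allclose(np.abs(e5), r))           # True: position never changes the magnitude
```

Because the magnitude is position-independent, the amplitude can carry word identity while the phase carries position.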
Model Inversion Networks for Model-Based Optimization
Title | Model Inversion Networks for Model-Based Optimization |
Authors | Anonymous |
Abstract | In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of (input, score) pairs. Inputs may lie on extremely thin manifolds in high-dimensional spaces, making the optimization prone to falling off the manifold. Further, evaluating the unknown function may be expensive, so the algorithm should be able to exploit static, offline data. We propose model inversion networks (MINs) as an approach to solve such problems. Unlike prior work, MINs scale to extremely high-dimensional input spaces and can efficiently leverage offline logged datasets for optimization in both contextual and non-contextual settings. We show that MINs can also be extended to the active setting, commonly studied in prior work, via a simple, novel and effective scheme for active data collection. Our experiments show that MINs act as powerful optimizers on a range of contextual/non-contextual, static/active problems, including optimization over images and protein designs and learning from logged bandit feedback. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklsBJHKDS |
https://openreview.net/pdf?id=SklsBJHKDS | |
PWC | https://paperswithcode.com/paper/model-inversion-networks-for-model-based |
Repo | |
Framework | |
Efficient generation of structured objects with Constrained Adversarial Networks
Title | Efficient generation of structured objects with Constrained Adversarial Networks |
Authors | Anonymous |
Abstract | Despite their success, generative adversarial networks (GANs) cannot easily generate structured objects like molecules or game maps. The issue is that such objects must satisfy structural requirements (e.g., molecules must be chemically valid, game maps must guarantee reachability of the end goal) that are difficult to capture with examples alone. As a remedy, we propose constrained adversarial networks (CANs), which embed the constraints into the model during training by penalizing the generator whenever it outputs invalid structures. As in unconstrained GANs, new objects can be sampled straightforwardly from the generator, but in addition they satisfy the constraints with high probability. Our approach handles arbitrary logical constraints and leverages knowledge compilation techniques to efficiently evaluate the expected disagreement between the model and the constraints. This setup is further extended to hybrid logical-neural constraints for capturing complex requirements like graph reachability. An extensive empirical analysis on constrained images, molecules, and video game levels shows that CANs efficiently generate valid structures that are both high-quality and novel. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeCnkHtwH |
https://openreview.net/pdf?id=HyeCnkHtwH | |
PWC | https://paperswithcode.com/paper/efficient-generation-of-structured-objects |
Repo | |
Framework | |
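
The abstract describes penalizing the generator by the expected disagreement between its output distribution and a logical constraint. One standard way to score this is the semantic loss, −log P(constraint satisfied), with each output bit treated as an independent Bernoulli variable. We assume that form here. The sketch below computes the loss by brute-force enumeration instead of the knowledge compilation the abstract mentions, so it only works for a handful of bits. The constraint `exactly_one` is a hypothetical stand-in for a structural rule such as chemical validity.

```python
import itertools
import math

def semantic_loss(probs, constraint):
    """-log P(constraint holds) when each output bit is an independent Bernoulli(p).

    Brute-force enumeration of all 2**n assignments: only viable for tiny n.
    """
    p_sat = 0.0
    for bits in itertools.product([0, 1], repeat=len(probs)):
        if constraint(bits):
            weight = 1.0
            for bit, p in zip(bits, probs):
                weight *= p if bit else 1.0 - p
            p_sat += weight
    return -math.log(p_sat) if p_sat > 0 else math.inf

def exactly_one(bits):
    # Hypothetical structural requirement: exactly one output unit is active.
    return sum(bits) == 1

probs = [0.7, 0.2, 0.1]                  # generator's per-bit probabilities
penalty = semantic_loss(probs, exactly_one)
# Closed form for "exactly one": sum_i p_i * prod_{j != i} (1 - p_j) = 0.582
expected = -math.log(0.7 * 0.8 * 0.9 + 0.3 * 0.2 * 0.9 + 0.3 * 0.8 * 0.1)
print(math.isclose(penalty, expected))   # True
```

A generator whose output distribution always satisfies the constraint incurs zero penalty. Under this scheme, valid samples cost nothing extra and invalid probability mass is pushed down during training.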
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Title | A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs |
Authors | Anonymous |
Abstract | Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. This is especially true for adversarial training or for training a newly synthesized model, where the best learning rate regime is not known beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models for both natural and adversarial training, without requiring any dataset- or model-specific tuning. It is a stand-alone, parameterless, adaptive approach with no computational overhead. We theoretically discuss the algorithm’s convergence behavior and validate it extensively in experiments. Our results show that the proposed approach consistently achieves top-level accuracy compared to state-of-the-art (SOTA) baselines from the literature, in both natural and adversarial training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxyqkSYDH |
https://openreview.net/pdf?id=rJxyqkSYDH | |
PWC | https://paperswithcode.com/paper/a-simple-dynamic-learning-rate-tuning |
Repo | |
Framework | |
DDSP: Differentiable Digital Signal Processing
Title | DDSP: Differentiable Digital Signal Processing |
Authors | Anonymous |
Abstract | Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library will be made available upon paper acceptance and we encourage further contributions from the community and domain experts. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1x1ma4tDr |
https://openreview.net/pdf?id=B1x1ma4tDr | |
PWC | https://paperswithcode.com/paper/ddsp-differentiable-digital-signal-processing |
Repo | |
Framework | |
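
The abstract mentions classic synthesizer elements that become trainable once they are written with differentiable operations. The sketch below shows one such element, a harmonic (additive) oscillator: a sum of sinusoids at integer multiples of a fundamental frequency f0, with phase obtained by accumulating instantaneous frequency. It uses NumPy for brevity and is not the DDSP library's API. The function name and parameters are our own. Every step is a smooth function of f0 and the amplitudes, except the fixed Nyquist mask, so the same arithmetic written in an autodiff framework could be trained end to end.

```python
import numpy as np

def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Additive synthesis: a sum of sinusoids at integer multiples of f0.

    f0_hz: per-sample fundamental frequency, shape (n_samples,).
    harmonic_amps: per-sample amplitude of each harmonic, shape (n_samples, n_harmonics).
    """
    n_harmonics = harmonic_amps.shape[1]
    k = np.arange(1, n_harmonics + 1)              # harmonic numbers 1..K
    freqs = f0_hz[:, None] * k[None, :]            # (n_samples, K), in Hz
    # Silence harmonics at or above Nyquist to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2, harmonic_amps, 0.0)
    # Instantaneous phase = running sum of angular frequency, sample by sample.
    phases = 2 * np.pi * np.cumsum(freqs / sample_rate, axis=0)
    return np.sum(amps * np.sin(phases), axis=1)

sr = 16000
n = sr                                  # one second of audio
f0 = np.full(n, 440.0)                  # constant A4 pitch
amps = np.zeros((n, 3))
amps[:, 0] = 1.0                        # energy only in the fundamental
audio = harmonic_synth(f0, amps, sr)

# With one second of audio, rfft bins are 1 Hz apart, so the peak bin is the pitch.
spectrum = np.abs(np.fft.rfft(audio))
peak_hz = np.argmax(spectrum) * sr / n
print(peak_hz)
```

Pitch and loudness enter as separate inputs (`f0_hz` and `harmonic_amps`). This separation is what allows the independent control of pitch and loudness that the abstract describes.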
FRICATIVE PHONEME DETECTION WITH ZERO DELAY
Title | FRICATIVE PHONEME DETECTION WITH ZERO DELAY |
Authors | Anonymous |
Abstract | People with high-frequency hearing loss rely on hearing aids that employ frequency lowering algorithms. These algorithms shift some of the sounds from the high frequency band to the lower frequency band, where the sounds become more perceptible to people with the condition. Fricative phonemes have an important part of their content concentrated in high frequency bands. It is important that the frequency lowering algorithm is activated exactly for the duration of a fricative phoneme, and kept off at all other times. Therefore, timely (zero-delay) and accurate fricative phoneme detection is a key problem for high quality hearing aids. In this paper we present a deep learning based fricative phoneme detection algorithm that has zero detection delay and achieves state-of-the-art fricative phoneme detection accuracy on the TIMIT Speech Corpus. All reported results are reproducible and come with easy-to-use code that could serve as a baseline for future research. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxlmeBKwS |
https://openreview.net/pdf?id=BJxlmeBKwS | |
PWC | https://paperswithcode.com/paper/fricative-phoneme-detection-with-zero-delay |
Repo | |
Framework | |
Small-GAN: Speeding up GAN Training using Core-Sets
Title | Small-GAN: Speeding up GAN Training using Core-Sets |
Authors | Anonymous |
Abstract | BigGAN suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large minibatch sizes. This finding is interesting but also discouraging – large batch sizes are slow and expensive to emulate on conventional hardware. Thus, it would be nice if there were some trick by which we could generate batches that were effectively big though small in practice. In this work, we propose such a trick, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of real images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected embeddings at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it helps us use GANs to reach a new state of the art in anomaly detection. |
Tasks | Active Learning, Anomaly Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeNr6EKwB |
https://openreview.net/pdf?id=rkeNr6EKwB | |
PWC | https://paperswithcode.com/paper/small-gan-speeding-up-gan-training-using-core-1 |
Repo | |
Framework | |
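
The abstract describes drawing a large batch, randomly projecting it to a lower dimension, and compressing it with core-set selection. It does not name the selection algorithm. The sketch below assumes greedy k-center, a common core-set heuristic: it repeatedly adds the point farthest from everything already chosen. Random Gaussian vectors stand in for prior samples or cached Inception activations, and all sizes are illustrative.

```python
import numpy as np

def random_project(x, out_dim, seed=0):
    # Gaussian random projection to a lower dimension (scaled to roughly preserve distances).
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(out_dim)
    return x @ proj

def greedy_coreset(points, k):
    """Greedy k-center: repeatedly pick the point farthest from the chosen set."""
    chosen = [0]                                   # start from an arbitrary point
    min_dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))             # farthest from all chosen points
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(1)
big_batch = rng.normal(size=(512, 128))   # e.g. a large draw from the prior
emb = random_project(big_batch, 32)       # cheap low-dimensional proxy for distances
idx = greedy_coreset(emb, 64)
small_batch = big_batch[idx]              # the small batch actually fed to the GAN
```

Selection runs on the projected embeddings because they are cheaper to compare. The chosen indices are then used to pick rows from the original batch, so the GAN still sees full-dimensional samples.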
A Theoretical Analysis of the Number of Shots in Few-Shot Learning
Title | A Theoretical Analysis of the Number of Shots in Few-Shot Learning |
Authors | Anonymous |
Abstract | Few-shot classification is the task of predicting the category of an example from a small set of labeled examples. The number of labeled examples per category is called the number of shots (or shot number). Recent works tackle this task through meta-learning, where a meta-learner extracts information from observed tasks during meta-training to quickly adapt to new tasks during meta-testing. In this formulation, the number of shots exploited during meta-training has an impact on the recognition performance at meta-test time. Generally, the shot number used in meta-training should match the one used in meta-testing to obtain the best performance. We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method. From our analysis, we propose a simple method that is robust to the choice of shot number used during meta-training, a crucial hyperparameter. Our model, trained with an arbitrary meta-training shot number, performs well across different meta-testing shot numbers. We experimentally demonstrate our approach on different few-shot classification benchmarks. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgB2TNYPS |
https://openreview.net/pdf?id=HkgB2TNYPS | |
PWC | https://paperswithcode.com/paper/a-theoretical-analysis-of-the-number-of-shots-1 |
Repo | |
Framework | |
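
For reference, the Prototypical Networks classifier that the analysis studies works as follows. Each class prototype is the mean embedding of that class's support examples, and a query is assigned by a softmax over negative squared Euclidean distances to the prototypes. The sketch below skips the learned encoder and uses hand-made 2-D embeddings for a 2-way, 3-shot episode. It shows the baseline classifier only, not the paper's proposed modification.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    # Each class prototype is the mean embedding of its support (labeled) examples.
    return np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)])

def classify(query_emb, protos):
    # Softmax over negative squared Euclidean distances to each prototype.
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# A 2-way, 3-shot episode with hand-made 2-D embeddings (no encoder network).
support = np.array([[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0],    # class 0
                    [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]])   # class 1
labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.2, 0.2], [1.8, 1.7]])
protos = prototypes(support, labels, n_classes=2)
pred = classify(queries, protos).argmax(axis=1)
print(pred)  # [0 1]
```

The shot number is the count of support rows per class (three here). It controls how noisy each prototype's mean estimate is, which is why it can matter at meta-test time.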
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Title | Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems |
Authors | Anonymous |
Abstract | First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost of calculating second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression with the neural tangent kernel (NTK). Unlike typical second-order methods, which have a heavy computational cost in each iteration, GGN has only minor overhead compared to first-order methods such as SGD. We also give theoretical results showing that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide a convergence guarantee for the mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gCeyHFDS |
https://openreview.net/pdf?id=H1gCeyHFDS | |
PWC | https://paperswithcode.com/paper/gram-gauss-newton-method-learning |
Repo | |
Framework | |
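
The overhead stays small because, for square loss, a Gauss-Newton step can be computed with the n×n Gram matrix G = J Jᵀ over the batch instead of the p×p matrix Jᵀ J over parameters. The update is θ ← θ − Jᵀ G⁻¹ (f − y). The sketch below applies this step to a toy single-layer model f(x) = tanh(x·θ) with fewer samples than parameters. The model, the step size of 1 and the optional ridge term `lam` are our assumptions; this is not the paper's deep-network setup.

```python
import numpy as np

def ggn_step(theta, X, y, lam=0.0):
    """One Gauss-Newton step in Gram form for the toy model f(x) = tanh(x . theta).

    Square loss 0.5 * ||f - y||^2. The n x n Gram matrix G = J J^T replaces the
    p x p Gauss-Newton matrix J^T J, which is cheap when n (batch) << p (params).
    """
    u = X @ theta
    f = np.tanh(u)
    J = (1.0 - f ** 2)[:, None] * X            # Jacobian df/dtheta, shape (n, p)
    G = J @ J.T + lam * np.eye(len(y))         # Gram matrix, shape (n, n)
    return theta - J.T @ np.linalg.solve(G, f - y)

rng = np.random.default_rng(0)
n, p = 8, 32                                   # fewer samples than parameters
X = rng.normal(size=(n, p)) / np.sqrt(p)
y = rng.uniform(-0.5, 0.5, size=n)             # targets inside tanh's range
theta = np.zeros(p)
losses = []
for _ in range(6):
    theta = ggn_step(theta, X, y)
    losses.append(0.5 * np.sum((np.tanh(X @ theta) - y) ** 2))
print(losses)
```

In this toy case each step acts like Newton's method applied independently to every sample's equation tanh(u) = y. The printed losses therefore fall roughly quadratically, which mirrors the convergence behavior the abstract claims for wide networks.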
Selective sampling for accelerating training of deep neural networks
Title | Selective sampling for accelerating training of deep neural networks |
Authors | Anonymous |
Abstract | We present a selective sampling method designed to accelerate the training of deep neural networks. To this end, we introduce a novel measurement, the minimal margin score (MMS), which measures the minimal amount of displacement an input should take until its predicted classification is switched. For multi-class linear classification, the MMS measure is a natural generalization of the margin-based selection criterion, which has been thoroughly studied in the binary classification setting. In addition, the MMS measure provides an interesting insight into the progress of the training process and can be useful for designing and monitoring new training regimes. Empirically, we demonstrate a substantial acceleration when training commonly used deep neural network architectures for popular image classification tasks. The efficiency of our method is compared against standard training procedures and against commonly used selective sampling alternatives: hard negative mining selection and entropy-based selection. Finally, we demonstrate an additional speedup when we adopt a more aggressive learning-drop regime while using the MMS selective sampling method. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxNzgSKvH |
https://openreview.net/pdf?id=SJxNzgSKvH | |
PWC | https://paperswithcode.com/paper/selective-sampling-for-accelerating-training |
Repo | |
Framework | |
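
For the multi-class linear case the abstract mentions, the displacement needed to change the prediction can be computed in closed form. With scores s = Wx + b and predicted class c, the distance from x to the boundary between c and another class j is (s_c − s_j) / ‖w_c − w_j‖. The MMS is the minimum of this distance over all j ≠ c. The sketch below assumes distinct weight rows and selects the samples with the smallest scores. Applying it to a deep network, for example on last-layer features, is our assumption and is not shown.

```python
import numpy as np

def minimal_margin_score(x, W, b):
    """Smallest distance x must move before the argmax class of W x + b changes.

    x: (n, d) inputs; W: (k, d) weights with distinct rows; b: (k,) biases.
    """
    scores = x @ W.T + b                                   # (n, k)
    n = scores.shape[0]
    rows = np.arange(n)
    pred = scores.argmax(axis=1)
    gaps = scores[rows, pred][:, None] - scores            # s_pred - s_j, all >= 0
    w_diff = np.linalg.norm(W[pred][:, None, :] - W[None, :, :], axis=-1)
    w_diff[rows, pred] = 1.0                               # avoid 0/0 for the predicted class
    dist = gaps / w_diff                                   # distance to each pairwise boundary
    dist[rows, pred] = np.inf                              # ignore the predicted class itself
    return dist.min(axis=1)

# Sanity check: the boundary between classes 0 and 1 is the line x1 = x2.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
print(minimal_margin_score(np.array([[2.0, 0.0]]), W, b))  # ~ sqrt(2)

# Selection: keep the samples closest to a decision boundary.
rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 2))
W3, b3 = rng.normal(size=(3, 2)), rng.normal(size=3)
mms = minimal_margin_score(pool, W3, b3)
selected = np.argsort(mms)[:64]
```

Small-margin samples are the ones the current model is least sure about. The `selected` indices would form the next training batch under this selection rule.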
Structural Language Models for Any-Code Generation
Title | Structural Language Models for Any-Code Generation |
Authors | Anonymous |
Abstract | We address the problem of Any-Code Generation (AnyGen): generating code without any restriction on the vocabulary or structure. The state of the art for this problem is the sequence-to-sequence (seq2seq) approach, which treats code as a sequence and does not leverage any structural information. We introduce a new approach to AnyGen that leverages the strict syntax of programming languages to model a code snippet as a tree, an approach we call structural language modeling (SLM). SLM estimates the probability of the program’s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous structural techniques that have severely restricted the kinds of expressions that can be generated, our approach can generate arbitrary expressions in any programming language. Our model significantly outperforms both seq2seq and a variety of existing structured approaches in generating Java and C# code. We make our code, datasets, and models available online. |
Tasks | Code Generation, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylZIT4Yvr |
https://openreview.net/pdf?id=HylZIT4Yvr | |
PWC | https://paperswithcode.com/paper/structural-language-models-for-any-code-1 |
Repo | |
Framework | |
Metagross: Meta Gated Recursive Controller Units for Sequence Modeling
Title | Metagross: Meta Gated Recursive Controller Units for Sequence Modeling |
Authors | Yi Tay, Yikang Shen, Alvin Chan, Yew Soon Ong |
Abstract | This paper proposes Metagross (Meta Gated Recursive Controller), a new neural sequence modeling unit. Our proposed unit is characterized by recursive parameterization of its gating functions, i.e., gating mechanisms of Metagross are controlled by instances of itself, which are repeatedly called in a recursive fashion. This can be interpreted as a form of meta-gating and recursively parameterizing a recurrent model. We postulate that our proposed inductive bias provides modeling benefits pertaining to learning with inherently hierarchically-structured sequence data (e.g., language, logical or music tasks). To this end, we conduct extensive experiments on recursive logic tasks (sorting, tree traversal, logical inference), sequential pixel-by-pixel classification, semantic parsing, code generation, machine translation and polyphonic music modeling, demonstrating the widespread utility of the proposed approach, i.e., achieving state-of-the-art (or close) performance on all tasks. |
Tasks | Code Generation, Machine Translation, Music Modeling, Semantic Parsing |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sygn20VtwH |
https://openreview.net/pdf?id=Sygn20VtwH | |
PWC | https://paperswithcode.com/paper/metagross-meta-gated-recursive-controller |
Repo | |
Framework | |
LIA: Latently Invertible Autoencoder with Adversarial Learning
Title | LIA: Latently Invertible Autoencoder with Adversarial Learning |
Authors | Anonymous |
Abstract | Deep generative models such as Variational AutoEncoder (VAE) and Generative Adversarial Network (GAN) play an increasingly important role in machine learning and computer vision. However, there are two fundamental issues hindering their real-world applications: the difficulty of conducting variational inference in VAE and the lack of a mechanism for encoding real-world samples in GAN. In this paper, we propose a novel algorithm named Latently Invertible Autoencoder (LIA) to address the above two issues in one framework. An invertible network and its inverse mapping are symmetrically embedded in the latent space of VAE. Thus the partial encoder first transforms the input into feature vectors and then the distribution of these feature vectors is reshaped to fit a prior by the invertible network. The decoder proceeds in the reverse order of the encoder’s composite mappings. A two-stage stochasticity-free training scheme is designed to train LIA via adversarial learning, in the sense that the decoder of LIA is first trained as a standard GAN with the invertible network and then the partial encoder is learned from an autoencoder by detaching the invertible network from LIA. Experiments conducted on the FFHQ face dataset and three LSUN datasets validate the effectiveness of LIA for inference and generation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryefE1SYDr |
https://openreview.net/pdf?id=ryefE1SYDr | |
PWC | https://paperswithcode.com/paper/lia-latently-invertible-autoencoder-with-1 |
Repo | |
Framework | |
EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs
Title | EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs |
Authors | Anonymous |
Abstract | Neural networks for structured data like graphs have been studied extensively in recent years. To date, the bulk of research activity has focused mainly on static graphs. However, most real-world networks are dynamic since their topology tends to change over time. Predicting the evolution of dynamic graphs is a task of high significance in the area of graph mining. Despite its practical importance, the task has not been explored in depth so far, mainly due to its challenging nature. In this paper, we propose a model that predicts the evolution of dynamic graphs. Specifically, we use a graph neural network along with a recurrent architecture to capture the temporal evolution patterns of dynamic graphs. Then, we employ a generative model which predicts the topology of the graph at the next time step and constructs a graph instance that corresponds to that topology. We evaluate the proposed model on several artificial datasets following common network evolving dynamics, as well as on real-world datasets. Results demonstrate the effectiveness of the proposed model. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byg5flHFDr |
https://openreview.net/pdf?id=Byg5flHFDr | |
PWC | https://paperswithcode.com/paper/evonet-a-neural-network-for-predicting-the |
Repo | |
Framework | |
Picking Winning Tickets Before Training by Preserving Gradient Flow
Title | Picking Winning Tickets Before Training by Preserving Gradient Flow |
Authors | Anonymous |
Abstract | Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels. |
Tasks | Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgsACVKPH |
https://openreview.net/pdf?id=SkgsACVKPH | |
PWC | https://paperswithcode.com/paper/picking-winning-tickets-before-training-by |
Repo | |
Framework | |
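
GraSP's criterion, as we understand it, scores each weight by S(−θ) = −θ ⊙ Hg, where g is the gradient and H the Hessian at initialization. This is a first-order estimate of how the gradient norm changes when that weight is zeroed. The weights with the largest scores are removed, since removing them hurts gradient flow least. The sketch below uses a toy quadratic loss so that g and H are known exactly. In a real network, `hess_vec` would be an autodiff Hessian-vector product. The quadratic loss and all sizes are illustrative assumptions.

```python
import numpy as np

def grasp_scores(theta, hess_vec, grad):
    # S(-theta) = -theta * (H g): estimated change in gradient flow if each weight is zeroed.
    return -theta * hess_vec(grad)

def grasp_mask(scores, prune_ratio):
    # Remove the prune_ratio fraction of weights with the largest scores.
    n_prune = int(round(prune_ratio * scores.size))
    order = np.argsort(scores)[::-1]               # descending by score
    mask = np.ones(scores.size, dtype=bool)
    mask[order[:n_prune]] = False
    return mask

# Toy quadratic loss L(theta) = 0.5 theta^T A theta - c^T theta, so g = A theta - c and H = A.
rng = np.random.default_rng(0)
p = 10
M = rng.normal(size=(p, p))
A = M @ M.T + np.eye(p)            # symmetric positive definite Hessian
c = rng.normal(size=p)
theta0 = rng.normal(size=p)        # weights at "initialization"
g = A @ theta0 - c
scores = grasp_scores(theta0, lambda v: A @ v, g)
mask = grasp_mask(scores, prune_ratio=0.8)
pruned_theta = theta0 * mask       # 80% of weights removed before any training
```

The mask is fixed before training begins. This is what lets pruning at initialization save compute during training as well as at test time.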