Paper Group NANR 37
Fully Quantized Transformer for Improved Translation
Title | Fully Quantized Transformer for Improved Translation |
Authors | Anonymous |
Abstract | State-of-the-art neural machine translation methods employ massive amounts of parameters. Drastically reducing computational costs of such methods without affecting performance has been up to this point unsolved. In this work, we propose a quantization strategy tailored to the Transformer architecture. We evaluate our method on the WMT14 EN-FR and WMT14 EN-DE translation tasks and achieve state-of-the-art quantization results for the Transformer, obtaining no loss in BLEU scores compared to the non-quantized baseline. We further compress the Transformer by showing that, once the model is trained, a good portion of the nodes in the encoder can be removed without causing any loss in BLEU. |
Tasks | Machine Translation, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1eYGkBKDB |
https://openreview.net/pdf?id=B1eYGkBKDB | |
PWC | https://paperswithcode.com/paper/fully-quantized-transformer-for-improved-1 |
Repo | |
Framework | |
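To make the idea of quantization concrete, the sketch below fake-quantizes a list of weights to signed integers of a given bit width and maps them back to floats. It is a generic symmetric uniform quantizer shown for illustration only; the abstract does not describe the paper's actual scheme, and the function name `quantize_uniform` and its parameters are assumptions.

```python
def quantize_uniform(values, num_bits=8):
    """Fake-quantize floats: map to signed integer codes, then back to floats.

    Generic symmetric uniform quantization (an illustrative assumption,
    not the scheme used in the paper).
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1  # largest code, e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # All zeros (or empty input): nothing to scale.
        return [0.0 for _ in values], 1.0
    scale = max_abs / qmax  # float value represented by one integer step
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [c * scale for c in codes], scale


weights = [0.5, -1.0, 0.25, 0.0]
dequantized, scale = quantize_uniform(weights, num_bits=8)
```

Each dequantized value differs from its original by at most half a quantization step (`scale / 2`), which is the error a quantization-aware method has to absorb.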
Scalable Neural Learning for Verifiable Consistency with Temporal Specifications
Title | Scalable Neural Learning for Verifiable Consistency with Temporal Specifications |
Authors | Anonymous |
Abstract | Formal verification of machine learning models has attracted attention recently, and significant progress has been made on proving simple properties like robustness to small perturbations of the input features. In this context, it has also been observed that folding the verification procedure into training makes it easier to train verifiably robust models. In this paper, we broaden the applicability of verified training by extending it to (1) recurrent neural network architectures and (2) complex specifications that go beyond simple adversarial robustness, particularly specifications that capture temporal properties like requiring that a robot periodically visits a charging station or that a language model always produces sentences of bounded length. Experiments show that while models trained using standard training often violate desired specifications, our verified training method produces models that both perform well (in terms of test error or reward) and can be shown to be provably consistent with specifications. |
Tasks | Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklC2RNKDS |
https://openreview.net/pdf?id=BklC2RNKDS | |
PWC | https://paperswithcode.com/paper/scalable-neural-learning-for-verifiable |
Repo | |
Framework | |
Cascade Style Transfer
Title | Cascade Style Transfer |
Authors | Zhizhong Wang, Lei Zhao, Qihang Mo, Sihuan Lin, Zhiwen Zuo, Wei Xing, Dongming Lu |
Abstract | Recent studies have made tremendous progress in style transfer for specific domains, e.g., artistic, semantic and photo-realistic. However, existing approaches have limited flexibility in extending to other domains, as different style representations are often specific to particular domains. This also limits the stylistic quality. To address these limitations, we propose Cascade Style Transfer, a simple yet effective framework that can improve the quality and flexibility of style transfer by combining multiple existing approaches directly. Our cascade framework contains two architectures, i.e., Serial Style Transfer (SST) and Parallel Style Transfer (PST). The SST takes the stylized output of one method as the input content of the others. This could help improve the stylistic quality. The PST uses a shared backbone and a loss module to optimize the loss functions of different methods in parallel. This could help improve the quality and flexibility, and guide us to find domain-independent approaches. Our experiments are conducted on three major style transfer domains: artistic, semantic and photo-realistic. In all these domains, our methods have shown superiority over the state-of-the-art methods. |
Tasks | Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJeuKnEtDH |
https://openreview.net/pdf?id=BJeuKnEtDH | |
PWC | https://paperswithcode.com/paper/cascade-style-transfer |
Repo | |
Framework | |
Randomness in Deconvolutional Networks for Visual Representation
Title | Randomness in Deconvolutional Networks for Visual Representation |
Authors | Anonymous |
Abstract | To understand the inner workings of deep neural networks and provide possible theoretical explanations, we study the deep representations through the untrained, random weight CNN-DCN architecture. As a convolutional AutoEncoder, CNN indicates the portion of a convolutional neural network from the input to an intermediate convolutional layer, and DCN indicates the corresponding deconvolutional portion. Compared with training the DCN for a pre-trained CNN, training the DCN for a random-weight CNN converges more quickly and yields higher quality image reconstruction. Then, what happens for the overall random CNN-DCN? We obtain the intriguing result that the image can be reconstructed with good quality. To gain more insight into the intermediate random representation, we investigate the impact of network width versus depth, number of random channels, and size of random kernels on the reconstruction quality, and provide theoretical justifications for the empirical observations. We further provide a fast style transfer application using the random weight CNN-DCN architecture to show the potential of our observation. |
Tasks | Image Reconstruction, Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xTMyHYwB |
https://openreview.net/pdf?id=S1xTMyHYwB | |
PWC | https://paperswithcode.com/paper/randomness-in-deconvolutional-networks-for-1 |
Repo | |
Framework | |
Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning
Title | Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning |
Authors | Anonymous |
Abstract | We demonstrate how to learn efficient heuristics for automated reasoning algorithms for quantified Boolean formulas through deep reinforcement learning. We focus on a backtracking search algorithm, which can already solve formulas of impressive size - up to hundreds of thousands of variables. The main challenge is to find a representation of these formulas that lends itself to making predictions in a scalable way. For a family of challenging problems, we learned a heuristic that solves significantly more formulas compared to the existing handwritten heuristics. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJluxREKDB |
https://openreview.net/pdf?id=BJluxREKDB | |
PWC | https://paperswithcode.com/paper/learning-heuristics-for-quantified-boolean |
Repo | |
Framework | |
SINGLE PATH ONE-SHOT NEURAL ARCHITECTURE SEARCH WITH UNIFORM SAMPLING
Title | SINGLE PATH ONE-SHOT NEURAL ARCHITECTURE SEARCH WITH UNIFORM SAMPLING |
Authors | Anonymous |
Abstract | We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. The existing one-shot method (Bender et al., 2018), however, is hard to train and not yet effective on large-scale datasets like ImageNet. This work proposes a Single Path One-Shot model to address the challenge in training. Our central idea is to construct a simplified supernet, where all architectures are single paths so that the weight co-adaptation problem is alleviated. Training is performed by uniform path sampling. All architectures (and their weights) are trained fully and equally. Comprehensive experiments verify that our approach is flexible and effective. It is easy to train and fast to search. It effortlessly supports complex search spaces (e.g., building blocks, channel, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency). It is thus convenient to use for various needs. It achieves state-of-the-art performance on the large dataset ImageNet. |
Tasks | Neural Architecture Search, Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gPoCEKvH |
https://openreview.net/pdf?id=r1gPoCEKvH | |
PWC | https://paperswithcode.com/paper/single-path-one-shot-neural-architecture-1 |
Repo | |
Framework | |
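Uniform path sampling, as described in the abstract above, can be sketched as drawing one candidate operation per supernet layer with equal probability at every training step. The layer sizes, the helper name `sample_single_path`, and the sampling loop are hypothetical; a real training loop would update only the weights on the sampled path.

```python
import random


def sample_single_path(choices_per_layer, rng):
    """Pick one candidate-operation index per layer, uniformly at random."""
    return [rng.randrange(n) for n in choices_per_layer]


rng = random.Random(0)           # seeded for reproducibility
choices_per_layer = [4, 4, 3]    # hypothetical supernet: candidates per layer

# Count how often each candidate is selected over many training steps.
counts = [[0] * n for n in choices_per_layer]
num_steps = 3000
for _ in range(num_steps):
    path = sample_single_path(choices_per_layer, rng)
    # A real loop would run forward/backward through this path only.
    for layer, op in enumerate(path):
        counts[layer][op] += 1
```

Because each candidate in a layer is drawn with the same probability, every operation receives roughly the same number of updates, which is the sense in which all architectures are "trained fully and equally".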
Mirror Descent View For Neural Network Quantization
Title | Mirror Descent View For Neural Network Quantization |
Authors | Anonymous |
Abstract | Quantizing large Neural Networks (NN) while maintaining the performance is highly desirable for resource-limited devices due to reduced memory and time complexity. NN quantization is usually formulated as a constrained optimization problem and optimized via a modified version of gradient descent. In this work, by interpreting the continuous parameters (unconstrained) as the dual of the quantized ones, we introduce a Mirror Descent (MD) framework (Bubeck (2015)) for NN quantization. Specifically, we provide conditions on the projections (i.e., mapping from continuous to quantized ones) which would enable us to derive valid mirror maps and in turn the respective MD updates. Furthermore, we discuss a numerically stable implementation of MD by storing an additional set of auxiliary dual variables (continuous). This update is strikingly analogous to the popular Straight Through Estimator (STE) based method, which is typically viewed as a “trick” to avoid the vanishing gradients issue, but here we show that it is an implementation method for MD for certain projections. Our experiments on standard classification datasets (CIFAR-10/100, TinyImageNet) with convolutional and residual architectures show that our MD variants obtain fully-quantized networks with accuracies very close to the floating-point networks. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lUl6NFDH |
https://openreview.net/pdf?id=r1lUl6NFDH | |
PWC | https://paperswithcode.com/paper/mirror-descent-view-for-neural-network-1 |
Repo | |
Framework | |
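The STE-style update that the abstract compares against can be illustrated with a toy example: continuous auxiliary variables are kept alongside the quantized weights, the loss gradient is evaluated at the quantized weights, and the step is applied to the continuous variables. This is a minimal sketch of a generic straight-through update with a sign projection, not the paper's mirror descent derivation; the toy loss and the names `train_binary`, `latent`, and `target` are assumptions.

```python
def sign(x):
    """Projection onto {-1, +1}; zero is mapped to +1 by convention."""
    return 1.0 if x >= 0 else -1.0


def train_binary(latent, target, lr=0.1, steps=20):
    """Straight-through-style training of binary weights on a toy loss.

    Loss: L(q) = 0.5 * sum((q_i - t_i)^2), with q = sign(latent).
    The gradient is taken with respect to q but applied to latent.
    """
    latent = list(latent)
    for _ in range(steps):
        quantized = [sign(w) for w in latent]                 # forward: project
        grads = [q - t for q, t in zip(quantized, target)]    # dL/dq
        latent = [w - lr * g for w, g in zip(latent, grads)]  # update continuous copy
    return latent, [sign(w) for w in latent]


target = [-1.0, -1.0, 1.0]
latent, quantized = train_binary([0.3, -0.2, 0.5], target)
```

The continuous variables accumulate small gradient steps until they cross zero, at which point the quantized weight flips; once the quantized weights match the target, the gradient vanishes and the update stops.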
Symmetric-APL Activations: Training Insights and Robustness to Adversarial Attacks
Title | Symmetric-APL Activations: Training Insights and Robustness to Adversarial Attacks |
Authors | Anonymous |
Abstract | Deep neural networks with learnable activation functions have shown superior performance over deep neural networks with fixed activation functions for many different problems. The adaptability of learnable activation functions adds expressive power to the model which results in better performance. Here, we propose a new learnable activation function based on Adaptive Piecewise Linear units (APL), which 1) gives equal expressive power to both the positive and negative halves of the input space and 2) is able to approximate any zero-centered continuous non-linearity in a closed interval. We investigate how the shape of the Symmetric-APL function changes during training and perform ablation studies to gain insight into the reason behind these changes. We hypothesize that these activation functions go through two distinct stages: 1) adding gradient information and 2) adding expressive power. Finally, we show that the use of Symmetric-APL activations can significantly increase the robustness of deep neural networks to adversarial attacks. Our experiments on both black-box and open-box adversarial attacks show that commonly-used architectures, namely LeNet, Network-in-Network, and ResNet-18, can be up to 51% more resistant to adversarial fooling by only using the proposed activation functions instead of ReLUs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1ecVlrtDr |
https://openreview.net/pdf?id=B1ecVlrtDr | |
PWC | https://paperswithcode.com/paper/symmetric-apl-activations-training-insights |
Repo | |
Framework | |
Sparsity Learning in Deep Neural Networks
Title | Sparsity Learning in Deep Neural Networks |
Authors | Anonymous |
Abstract | The main goal of network pruning is to impose sparsity on the neural network by increasing the number of parameters with zero value, in order to reduce the architecture size and speed up computation. |
Tasks | Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1ljpTEFPB |
https://openreview.net/pdf?id=S1ljpTEFPB | |
PWC | https://paperswithcode.com/paper/sparsity-learning-in-deep-neural-networks |
Repo | |
Framework | |
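One common way to increase the number of zero-valued parameters is magnitude pruning, sketched below. It is shown only as a familiar baseline for imposing sparsity; the abstract does not say which pruning criterion the paper uses, and the function name `magnitude_prune` and its `sparsity` parameter are assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Magnitude pruning is a generic baseline, used here as an assumption;
    it is not claimed to be the method proposed in the paper.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    if num_to_prune == 0:
        return list(weights)
    # Indices sorted by ascending magnitude; the first ones get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
```

With half of the six weights pruned, the three smallest magnitudes (0.01, 0.05, 0.2) are set to zero and the remaining weights are left unchanged.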
DeepXML: Scalable & Accurate Deep Extreme Classification for Matching User Queries to Advertiser Bid Phrases
Title | DeepXML: Scalable & Accurate Deep Extreme Classification for Matching User Queries to Advertiser Bid Phrases |
Authors | Anonymous |
Abstract | The objective in deep extreme multi-label learning is to jointly learn feature representations and classifiers to automatically tag data points with the most relevant subset of labels from an extremely large label set. Unfortunately, state-of-the-art deep extreme classifiers are either not scalable or inaccurate for short text documents. This paper develops the DeepXML algorithm which addresses both limitations by introducing a novel architecture that splits training of head and tail labels. DeepXML increases accuracy by (a) learning word embeddings on head labels and transferring them through a novel residual connection to data impoverished tail labels; (b) increasing the amount of negative training data available by extending state-of-the-art negative sub-sampling techniques; and (c) re-ranking the set of predicted labels to eliminate the hardest negatives for the original classifier. All of these contributions are implemented efficiently by extending the highly scalable Slice algorithm for pretrained embeddings to learn the proposed DeepXML architecture. As a result, DeepXML could efficiently scale to problems involving millions of labels that were beyond the pale of state-of-the-art deep extreme classifiers as it could be more than 10x faster at training than XML-CNN and AttentionXML. At the same time, DeepXML was also empirically determined to be up to 19% more accurate than leading techniques for matching search engine queries to advertiser bid phrases. |
Tasks | Learning Word Embeddings, Multi-Label Learning, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlWyerFPS |
https://openreview.net/pdf?id=SJlWyerFPS | |
PWC | https://paperswithcode.com/paper/deepxml-scalable-accurate-deep-extreme |
Repo | |
Framework | |
AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion
Title | AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion |
Authors | Anonymous |
Abstract | Voice Conversion (VC) is the task of converting perceived speaker identity from a source speaker to a particular target speaker. Earlier approaches in the literature primarily find a mapping between the given source-target speaker pairs. Developing mapping techniques for many-to-many VC using non-parallel data, including zero-shot learning, remains a less explored area in VC. Most of the many-to-many VC architectures require training data from all the target speakers for whom we want to convert the voices. In this paper, we propose a novel style transfer architecture, which can also be extended to generate voices even for target speakers whose data were not used in the training (i.e., the case of zero-shot learning). In particular, we propose the Adaptive Generative Adversarial Network (AdaGAN), a new architecture and training procedure that help in learning a normalized speaker-independent latent representation, which is used to generate speech with different speaking styles in the context of VC. We compare our results with the state-of-the-art StarGAN-VC architecture. In particular, AdaGAN achieves 31.73% and 10.37% relative improvement compared to StarGAN in MOS tests for speech quality and speaker similarity, respectively. The key strength of the proposed architecture is that it yields these results with less computational complexity. AdaGAN is 88.6% less complex than StarGAN-VC in terms of floating-point operations per second (FLOPS), and 85.46% less complex in terms of trainable parameters. |
Tasks | Style Transfer, Voice Conversion, Zero-Shot Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJlk-eHFwH |
https://openreview.net/pdf?id=HJlk-eHFwH | |
PWC | https://paperswithcode.com/paper/adagan-adaptive-gan-for-many-to-many-non |
Repo | |
Framework | |
Model Imitation for Model-Based Reinforcement Learning
Title | Model Imitation for Model-Based Reinforcement Learning |
Authors | Anonymous |
Abstract | Model-based reinforcement learning (MBRL) aims to learn a dynamic model to reduce the number of interactions with real-world environments. However, due to estimation error, rollouts in the learned model, especially those of long horizon, fail to match the ones in real-world environments. This mismatch has seriously impacted the sample complexity of MBRL. The phenomenon can be attributed to the fact that previous works employ supervised learning to learn the one-step transition models, which has inherent difficulty ensuring the matching of distributions from multi-step rollouts. Based on this claim, we propose to learn the synthesized model by matching the distributions of multi-step rollouts sampled from the synthesized model and the real ones via WGAN. We theoretically show that matching the two can minimize the difference of cumulative rewards between the real transition and the learned one. Our experiments also show that the proposed model imitation method outperforms the state-of-the-art in terms of sample complexity and average return. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lJv0VYDr |
https://openreview.net/pdf?id=S1lJv0VYDr | |
PWC | https://paperswithcode.com/paper/model-imitation-for-model-based-reinforcement-1 |
Repo | |
Framework | |
EDUCE: Explaining model Decision through Unsupervised Concepts Extraction
Title | EDUCE: Explaining model Decision through Unsupervised Concepts Extraction |
Authors | Anonymous |
Abstract | Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model’s prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that, for each concept, the corresponding excerpts share similar semantics and are distinguishable from those of other concepts. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gnxaVFDB |
https://openreview.net/pdf?id=S1gnxaVFDB | |
PWC | https://paperswithcode.com/paper/educe-explaining-model-decision-through |
Repo | |
Framework | |
Task-Based Top-Down Modulation Network for Multi-Task-Learning Applications
Title | Task-Based Top-Down Modulation Network for Multi-Task-Learning Applications |
Authors | Anonymous |
Abstract | A general problem that has received considerable recent attention is how to perform multiple tasks in the same network, maximizing both efficiency and prediction accuracy. A popular approach consists of a multi-branch architecture on top of a shared backbone, jointly trained on a weighted sum of losses. However, in many cases, the shared representation results in non-optimal performance, mainly due to interference between conflicting gradients of uncorrelated tasks. Recent approaches address this problem by a channel-wise modulation of the feature maps along the shared backbone, with task-specific vectors, manually or dynamically tuned. Taking this approach a step further, we propose a novel architecture which modulates the recognition network channel-wise, as well as spatial-wise, with an efficient top-down image-dependent computation scheme. Our architecture uses neither task-specific branches nor task-specific modules. Instead, it uses a top-down modulation network that is shared between all of the tasks. We show the effectiveness of our scheme by achieving on par or better results than alternative approaches on both correlated and uncorrelated sets of tasks. We also demonstrate our advantages in terms of model size, the addition of novel tasks and interpretability. Code will be released. |
Tasks | Multi-Task Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklBp6EYvB |
https://openreview.net/pdf?id=BklBp6EYvB | |
PWC | https://paperswithcode.com/paper/task-based-top-down-modulation-network-for |
Repo | |
Framework | |
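The channel-wise modulation that the abstract attributes to recent prior approaches can be sketched as scaling every channel of a shared feature map by one entry of a task-specific vector. This illustrates only that baseline idea, not the proposed top-down, image-dependent scheme; the nested-list layout and the names `modulate_channels` and `task_gains` are assumptions.

```python
def modulate_channels(feature_map, task_gains):
    """Scale each channel of a feature map by its task-specific gain.

    feature_map: list of channels, each a 2D list (rows of floats).
    task_gains:  one float per channel, specific to the current task.
    """
    if len(feature_map) != len(task_gains):
        raise ValueError("need exactly one gain per channel")
    return [
        [[gain * value for value in row] for row in channel]
        for channel, gain in zip(feature_map, task_gains)
    ]


# Two channels of size 2x2 from a hypothetical shared backbone.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# A hypothetical task that keeps channel 0 and suppresses channel 1.
task_a = modulate_channels(features, [1.0, 0.0])
```

Different tasks reuse the same backbone features but supply different gain vectors, so each task can emphasize or suppress channels without needing a branch of its own.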
Quantifying uncertainty with GAN-based priors
Title | Quantifying uncertainty with GAN-based priors |
Authors | Anonymous |
Abstract | Bayesian inference is used extensively to quantify the uncertainty in an inferred field given the measurement of a related field when the two are linked by a mathematical model. Despite its many applications, Bayesian inference faces challenges when inferring fields that have discrete representations of large dimension, and/or have prior distributions that are difficult to characterize mathematically. In this work we demonstrate how the approximate distribution learned by a generative adversarial network (GAN) may be used as a prior in a Bayesian update to address both these challenges. We demonstrate the efficacy of this approach by inferring and quantifying uncertainty in inference problems arising in computer vision and physics-based applications. In both instances we highlight the role of computing uncertainty in providing a measure of confidence in the solution, and in designing successive measurements to improve this confidence. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeAPeBFwS |
https://openreview.net/pdf?id=HyeAPeBFwS | |
PWC | https://paperswithcode.com/paper/quantifying-uncertainty-with-gan-based-priors |
Repo | |
Framework | |