April 1, 2020


Paper Group NANR 132

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary. ISBNet: Instance-aware Selective Branching Networks. Compositional Continual Language Learning. Testing Robustness Against Unforeseen Adversaries. Transfer Active Learning For Graph Neural Networks. Adaptive Online Planning for Continual Lifelong Learning. Symmetry and Syst …

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

Title FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
Authors Anonymous
Abstract We present a novel method for compressing deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. The proposed method reduces the number of parameters of each convolutional layer by learning a 1D vector termed the Filter Summary (FS). The convolutional filters are located in the FS as overlapping 1D segments, and nearby filters in the FS naturally share weights in their overlapping regions. The resulting neural network based on this weight-sharing scheme, termed Filter Summary CNN or FSNet, has an FS in each convolution layer instead of the set of independent filters of a conventional convolution layer. FSNet has the same architecture as the baseline CNN to be compressed, and in the forward pass each convolution layer of FSNet extracts the same number of filters from its FS as the baseline CNN. With a compelling computational acceleration ratio, the parameter space of FSNet is much smaller than that of the baseline CNN. In addition, FSNet is quantization-friendly: FSNet with weight quantization yields an even higher compression ratio without noticeable performance loss. We further propose Differentiable FSNet (DFSNet), where the way filters share weights is learned in a differentiable, end-to-end manner. Experiments demonstrate the effectiveness of FSNet in compressing CNNs for computer vision tasks including image classification and object detection, and the effectiveness of DFSNet is evidenced on the task of Neural Architecture Search. (An illustrative sketch of the filter-summary idea follows this entry.)
Tasks Image Classification, Neural Architecture Search, Object Detection, Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=S1xtORNFwH
PDF https://openreview.net/pdf?id=S1xtORNFwH
PWC https://paperswithcode.com/paper/fsnet-compression-of-deep-convolutional-1
Repo
Framework
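The filter-summary idea above can be sketched in a few lines: filters are read out as overlapping windows of one shared 1D parameter vector, so the parameter count is governed by the summary length rather than by `out_ch * in_ch * k * k`. This is a minimal PyTorch sketch, not the authors' implementation; the segment stride (`fs_stride`) and the initialization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterSummaryConv2d(nn.Module):
    """Convolution whose filters are overlapping windows of one shared 1D vector."""

    def __init__(self, in_ch, out_ch, k, fs_stride=16):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.fs_stride = fs_stride                        # assumed overlap control
        filter_len = in_ch * k * k
        fs_len = (out_ch - 1) * fs_stride + filter_len    # room for out_ch overlapping segments
        self.fs = nn.Parameter(torch.randn(fs_len) * 0.01)

    def forward(self, x):
        # Slice out_ch overlapping segments from the filter summary ...
        segments = self.fs.unfold(0, self.in_ch * self.k * self.k, self.fs_stride)
        # ... and reshape them into ordinary 4D convolution filters.
        weight = segments.reshape(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, weight, padding=self.k // 2)

layer = FilterSummaryConv2d(in_ch=16, out_ch=32, k=3)
print(sum(p.numel() for p in layer.parameters()))   # 640 parameters vs. 4608 for a plain conv
print(layer(torch.randn(1, 16, 8, 8)).shape)        # torch.Size([1, 32, 8, 8])
```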

ISBNet: Instance-aware Selective Branching Networks

Title ISBNet: Instance-aware Selective Branching Networks
Authors Shaofeng Cai, Yao Shu, Wei Wang, Gang Chen, Beng Chin Ooi
Abstract Recent years have witnessed growing interest in designing efficient neural networks and in neural architecture search (NAS). Although remarkable efficiency and accuracy have been achieved, existing expert-designed and NAS models neglect the fact that input instances are of varying complexity and thus require different amounts of computation. Inference with a fixed model that processes all instances through the same transformations wastes computational resources. Customizing the model capacity in an instance-aware manner is required to alleviate this problem. In this paper, we propose a novel Instance-aware Selective Branching Network (ISBNet) to support efficient instance-level inference by selectively bypassing transformation branches of insignificant importance weight. These weights are dynamically determined by a lightweight hypernetwork, SelectionNet, and recalibrated by Gumbel-Softmax for sparse branch selection. Extensive experiments show that ISBNet achieves extremely efficient inference in terms of parameter size and FLOPs compared to existing networks. For example, ISBNet uses only 8.70% of the parameters and 31.01% of the FLOPs of the efficient network MobileNetV2 with comparable accuracy on CIFAR-10. (See the selection sketch after this entry.)
Tasks Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=rklz16Vtvr
PDF https://openreview.net/pdf?id=rklz16Vtvr
PWC https://paperswithcode.com/paper/isbnet-instance-aware-selective-branching-1
Repo
Framework
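A minimal sketch of the selection mechanism described above: a lightweight SelectionNet maps pooled features to branch logits, and Gumbel-Softmax turns them into a sparse, differentiable per-instance branch choice. This illustrates the mechanism only; the branch designs, the hypernetwork, and the gating details in the paper differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveBlock(nn.Module):
    def __init__(self, channels, num_branches=4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_branches)]
        )
        # Hypothetical SelectionNet: globally pooled features -> branch logits.
        self.selection_net = nn.Linear(channels, num_branches)

    def forward(self, x, tau=1.0):
        logits = self.selection_net(x.mean(dim=(2, 3)))           # (B, num_branches)
        gates = F.gumbel_softmax(logits, tau=tau, hard=True)      # one-hot per instance
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        return (gates[:, :, None, None, None] * outs).sum(dim=1)

block = SelectiveBlock(channels=8)
print(block(torch.randn(2, 8, 16, 16)).shape)   # torch.Size([2, 8, 16, 16])
```

At inference time a real implementation would run only the selected branch, which is where the FLOP savings come from; the dense `torch.stack` here is just for clarity.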

Compositional Continual Language Learning

Title Compositional Continual Language Learning
Authors Anonymous
Abstract Motivated by the human ability to continually learn and gain knowledge over time, several research efforts have been pushing the limits of machines to constantly learn while alleviating catastrophic forgetting, i.e., a significant drop in a skill acquired far earlier in time. Most existing methods have focused on label-prediction tasks to study continual learning. Humans, however, naturally interact with and learn from natural language statements and instructions, which is far less studied from a continual learning angle. One of the key skills that enables humans to excel at learning language efficiently is the ability to produce novel compositions. To learn and complete new tasks, robots need to continually learn novel objects and concepts in linguistic form, which requires compositionality for efficient learning. Inspired by this, we propose a method for compositional continual learning of sequence-to-sequence models. Experimental results show that the proposed method significantly improves over state-of-the-art methods; it enables knowledge transfer and prevents catastrophic forgetting, reaching more than 85% accuracy up to 100 stages, compared with less than 50% accuracy for the baselines. It also shows a significant improvement on a machine translation task. This is the first work to combine continual learning and compositionality for natural language instruction learning, and we hope this work will make robots more helpful in various tasks.
Tasks Continual Learning, Machine Translation, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rklnDgHtDS
PDF https://openreview.net/pdf?id=rklnDgHtDS
PWC https://paperswithcode.com/paper/compositional-continual-language-learning
Repo
Framework

Testing Robustness Against Unforeseen Adversaries

Title Testing Robustness Against Unforeseen Adversaries
Authors Anonymous
Abstract Most existing defenses against adversarial attacks only consider robustness to L_p-bounded distortions. In reality, the specific attack is rarely known in advance, and adversaries are free to modify images in ways which lie outside any fixed distortion model; for example, adversarial rotations lie outside the set of L_p-bounded distortions. In this work, we advocate measuring robustness against a much broader range of unforeseen attacks, i.e., attacks whose precise form is unknown during defense design. We propose several new attacks and a methodology for evaluating a defense against a diverse range of unforeseen distortions. First, we construct novel adversarial JPEG, Fog, Gabor, and Snow distortions to simulate more diverse adversaries. We then introduce UAR, a summary metric that measures the robustness of a defense against a given distortion. Using UAR to assess robustness against existing and novel attacks, we perform an extensive study of adversarial robustness. We find that evaluation against existing L_p attacks yields redundant information which does not generalize to other attacks; we instead recommend evaluating against our significantly more diverse set of attacks. We further find that adversarial training against either one or multiple distortions fails to confer robustness to attacks with other distortion types. These results underscore the need to evaluate and study robustness against unforeseen distortions. (An evaluation sketch follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hyl5V0EYvB
PDF https://openreview.net/pdf?id=Hyl5V0EYvB
PWC https://paperswithcode.com/paper/testing-robustness-against-unforeseen
Repo
Framework
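The abstract does not spell out how UAR is computed, so the sketch below only illustrates the kind of evaluation it argues for: measuring accuracy under a distortion that lies outside every L_p ball (here, random rotations, assuming a torchvision version whose functional `rotate` accepts tensor images). A real unforeseen-attack evaluation would optimize the distortion adversarially rather than sampling it.

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def accuracy_under_rotation(model, loader, max_deg=30.0, device="cpu"):
    """Clean-model accuracy when every image is randomly rotated by up to max_deg."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        angles = (torch.rand(images.size(0)) * 2 - 1) * max_deg
        distorted = torch.stack(
            [TF.rotate(img, float(a)) for img, a in zip(images, angles)]
        )
        preds = model(distorted.to(device)).argmax(dim=1)
        correct += (preds.cpu() == labels).sum().item()
        total += labels.numel()
    return correct / total
```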

Transfer Active Learning For Graph Neural Networks

Title Transfer Active Learning For Graph Neural Networks
Authors Anonymous
Abstract Graph neural networks have proved very effective for a variety of prediction tasks on graphs, such as node classification. Generally, a large amount of labeled data is required to train these networks; in reality, however, obtaining many labels on large-scale graphs can be very expensive. In this paper, we study active learning for graph neural networks, i.e., how to effectively label the nodes of a graph for training graph neural networks. We formulate the problem as a sequential decision process that labels informative nodes one at a time, and we train a policy network to maximize the performance of the graph neural network on a specific task. Moreover, we also study how to learn a universal policy for labeling nodes from multiple training graphs and then transfer the learned policy to unseen graphs. Experimental results in both settings, a single graph and multiple training graphs (the transfer learning setting), demonstrate the effectiveness of our proposed approaches over many competitive baselines. (A labeling-loop sketch follows this entry.)
Tasks Active Learning, Node Classification, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BklOXeBFDS
PDF https://openreview.net/pdf?id=BklOXeBFDS
PWC https://paperswithcode.com/paper/transfer-active-learning-for-graph-neural
Repo
Framework
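The paper trains a policy network with reinforcement learning to decide which node to label next; the sketch below substitutes a predictive-entropy heuristic for that policy so the sequential labeling loop is visible. `gnn`, `features`, `edges`, `oracle_labels`, and `train_step` are hypothetical stand-ins for the surrounding code.

```python
import torch

def sequential_labeling(gnn, features, edges, oracle_labels, budget, train_step):
    labeled = set()
    for _ in range(budget):
        with torch.no_grad():
            probs = gnn(features, edges).softmax(dim=-1)                  # (N, C)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)      # (N,)
        if labeled:
            entropy[list(labeled)] = float("-inf")                        # never re-pick a labeled node
        node = int(entropy.argmax())                                      # stand-in for the policy's choice
        labeled.add(node)
        train_step(gnn, sorted(labeled), oracle_labels)                   # retrain on the labeled set
    return labeled
```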

Adaptive Online Planning for Continual Lifelong Learning

Title Adaptive Online Planning for Continual Lifelong Learning
Authors Anonymous
Abstract We study learning control in an online lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change. Traditional model-free policy learning methods have achieved success on difficult tasks thanks to their broad flexibility, and they capably condense broad experience into compact networks, but they struggle in this setting, as they can activate failure modes early in their lifetimes that are difficult to recover from, and they face performance degradation as the dynamics change. On the other hand, model-based planning methods learn and adapt quickly but require prohibitive levels of computational resources. Under a constrained computation budget, the agent must allocate its resources wisely, which requires it to understand both its own performance and the current state of the environment: knowing that its mastery of control under the current dynamics is poor, the agent should dedicate more time to planning. We present a new algorithm, Adaptive Online Planning (AOP), that achieves strong performance in this setting by combining model-based planning with model-free learning. By measuring the performance of the planner and the uncertainty of the model-free components, AOP calls upon more extensive planning only when necessary, leading to reduced computation times. We show that AOP gracefully deals with novel situations, adapting behaviors and policies effectively in the face of unpredictable changes in the world – challenges that a continual learning agent naturally faces over an extended lifetime – even when traditional reinforcement learning methods fail. (A sketch of the plan-or-act decision follows this entry.)
Tasks Continual Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HkgFDgSYPH
PDF https://openreview.net/pdf?id=HkgFDgSYPH
PWC https://paperswithcode.com/paper/adaptive-online-planning-for-continual
Repo
Framework
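The decision at the heart of the abstract, plan expensively only when the model-free components look unreliable, can be sketched as below. The ensemble-variance threshold is an assumed stand-in for the paper's uncertainty measure, and `planner`, `policy`, and `value_ensemble` are hypothetical objects.

```python
import torch

def act(state, policy, value_ensemble, planner, uncertainty_threshold=0.1):
    values = torch.stack([v(state) for v in value_ensemble])  # one value estimate per ensemble member
    uncertainty = values.std().item()                          # disagreement across the ensemble
    if uncertainty > uncertainty_threshold:
        # Mastery looks poor: spend compute on model-based planning.
        return planner.plan(state, horizon=30)
    # Otherwise trust the cheap model-free policy.
    return policy(state)
```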

Symmetry and Systematicity

Title Symmetry and Systematicity
Authors Anonymous
Abstract We argue that symmetry is an important consideration in addressing the problem of systematicity and investigate two forms of symmetry relevant to symbolic processes. We implement this approach in terms of convolution and show that it can be used to achieve effective generalisation in three toy problems: rule learning, composition and grammar learning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylWglrYPH
PDF https://openreview.net/pdf?id=BylWglrYPH
PWC https://paperswithcode.com/paper/symmetry-and-systematicity
Repo
Framework

Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models

Title Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models
Authors Anonymous
Abstract Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors into downstream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: the Alternating Recurrent Dialog Model (ARDM). ARDM models each speaker separately and takes advantage of large pre-trained language models. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and MultiWOZ. Moreover, we can generalize ARDM to more challenging, non-collaborative tasks such as persuasion. In persuasion tasks, ARDM is capable of generating human-like responses to persuade people to donate to a charity. (A per-speaker modeling sketch follows this entry.)
Tasks Language Modelling
Published 2020-01-01
URL https://openreview.net/forum?id=HkeSdCEtDS
PDF https://openreview.net/pdf?id=HkeSdCEtDS
PWC https://paperswithcode.com/paper/alternating-recurrent-dialog-model-with-large-1
Repo
Framework
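A rough sketch of "modeling each speaker separately": one pre-trained LM per speaker, each receiving a language-modeling loss on the dialog as seen at its own turns. This uses Hugging Face GPT-2 purely as an example backbone and is not the released ARDM code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
speaker_models = {
    "user": GPT2LMHeadModel.from_pretrained("gpt2"),
    "system": GPT2LMHeadModel.from_pretrained("gpt2"),
}

def dialog_loss(turns):
    """turns: list of (speaker, utterance) pairs for one dialog."""
    losses = []
    history = ""
    for speaker, utterance in turns:
        ids = tokenizer(history + utterance, return_tensors="pt").input_ids
        # Each speaker's model is trained to generate that speaker's turns.
        out = speaker_models[speaker](ids, labels=ids)
        losses.append(out.loss)
        history += utterance + tokenizer.eos_token
    return torch.stack(losses).mean()
```

A faithful version would mask the history tokens in `labels` with -100 so each model is scored only on the turn it generates.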

Depth-Adaptive Transformer

Title Depth-Adaptive Transformer
Authors Anonymous
Abstract State-of-the-art sequence-to-sequence models perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make output predictions at different stages of the network, and we investigate different ways to predict how much computation is required for a particular sequence. Unlike dynamic computation in Universal Transformers, which applies the same set of layers iteratively, we apply different layers at every step to adjust both the amount of computation and the model capacity. Experiments on machine translation benchmarks show that this approach can match the accuracy of a baseline Transformer while using only half the number of decoder layers. (An early-exit sketch follows this entry.)
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=SJg7KhVKPH
PDF https://openreview.net/pdf?id=SJg7KhVKPH
PWC https://paperswithcode.com/paper/depth-adaptive-transformer-1
Repo
Framework
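A minimal sketch of predicting at different depths: attach an output classifier after each layer and exit once the prediction is confident enough. The max-probability halting rule and the use of `nn.TransformerEncoderLayer` as a stand-in for a decoder layer (no cross-attention, no incremental decoding) are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class DepthAdaptiveDecoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=6, vocab=10000, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True) for _ in range(num_layers)]
        )
        # One output classifier per depth (the paper also studies shared classifiers).
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(num_layers)])
        self.threshold = threshold

    def forward(self, h):
        for depth, (layer, head) in enumerate(zip(self.layers, self.heads), start=1):
            h = layer(h)
            logits = head(h)
            confidence = logits.softmax(dim=-1).amax(dim=-1).min()  # least confident position
            if confidence > self.threshold:
                return logits, depth          # exit early: enough computation spent
        return logits, depth                  # fell through: used all layers

decoder = DepthAdaptiveDecoder()
logits, depth = decoder(torch.randn(2, 7, 256))
print(logits.shape, depth)
```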

Informed Temporal Modeling via Logical Specification of Factorial LSTMs

Title Informed Temporal Modeling via Logical Specification of Factorial LSTMs
Authors Anonymous
Abstract Consider a world in which events occur that involve various entities. Learning how to predict future events from patterns of past events becomes more difficult as we consider more types of events. Many of the patterns detected in the dataset by an ordinary LSTM will be spurious since the number of potential pairwise correlations, for example, grows quadratically with the number of events. We propose a type of factorial LSTM architecture where different blocks of LSTM cells are responsible for capturing different aspects of the world state. We use Datalog rules to specify how to derive the LSTM structure from a database of facts about the entities in the world. This is analogous to how a probabilistic relational model (Getoor & Taskar, 2007) specifies a recipe for deriving a graphical model structure from a database. In both cases, the goal is to obtain useful inductive biases by encoding informed independence assumptions into the model. We specifically consider the neural Hawkes process, which uses an LSTM to modulate the rate of instantaneous events in continuous time. In both synthetic and real-world domains, we show that we obtain better generalization by using appropriate factorial designs specified by simple Datalog programs. (A routing sketch follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1ghzlHFPS
PDF https://openreview.net/pdf?id=S1ghzlHFPS
PWC https://paperswithcode.com/paper/informed-temporal-modeling-via-logical
Repo
Framework
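A toy sketch of the structural idea: facts about which entities an event involves determine which LSTM blocks get updated, while all other blocks keep their state. The Python dict below plays the role of the Datalog-derived specification; the neural Hawkes intensity model itself is omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical "database of facts": which state blocks each event type touches.
EVENT_TO_BLOCKS = {
    "email(alice,bob)": ["alice", "bob"],
    "meeting(bob,carol)": ["bob", "carol"],
}

class FactorialLSTM(nn.Module):
    def __init__(self, entities, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(len(EVENT_TO_BLOCKS), embed_dim)
        self.cells = nn.ModuleDict({e: nn.LSTMCell(embed_dim, hidden_dim) for e in entities})
        self.state = {e: None for e in entities}          # per-entity (h, c) pairs

    def observe(self, event_idx, event_name):
        x = self.embed(torch.tensor([event_idx]))
        # Only the blocks named by the facts are updated; the rest keep their state.
        for entity in EVENT_TO_BLOCKS[event_name]:
            self.state[entity] = self.cells[entity](x, self.state[entity])

model = FactorialLSTM(entities=["alice", "bob", "carol"])
model.observe(0, "email(alice,bob)")
```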

CP-GAN: Towards a Better Global Landscape of GANs

Title CP-GAN: Towards a Better Global Landscape of GANs
Authors Anonymous
Abstract GANs have been very popular in data generation and unsupervised learning, but our understanding of GAN training is still very limited. One major reason is that GANs are often formulated as non-convex-concave min-max optimization. As a result, most recent studies focused on the analysis in the local region around the equilibrium. In this work, we perform a global analysis of GANs from two perspectives: the global landscape of the outer-optimization problem and the global behavior of the gradient descent dynamics. We find that the original GAN has exponentially many bad strict local minima which are perceived as mode-collapse, and the training dynamics (with linear discriminators) cannot escape mode collapse. To address these issues, we propose a simple modification to the original GAN, by coupling the generated samples and the true samples. We prove that the new formulation has no bad basins, and its training dynamics (with linear discriminators) has a Lyapunov function that leads to global convergence. Our experiments on standard datasets show that this simple loss outperforms the original GAN and WGAN-GP.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HylA41Btwr
PDF https://openreview.net/pdf?id=HylA41Btwr
PWC https://paperswithcode.com/paper/cp-gan-towards-a-better-global-landscape-of
Repo
Framework

Removing input features via a generative model to explain their attributions to classifier’s decisions

Title Removing input features via a generative model to explain their attributions to classifier’s decisions
Authors Anonymous
Abstract Interpretability methods often measure the contribution of an input feature to an image classifier’s decisions by heuristically removing it via, e.g., blurring, adding noise, or graying out, which often produces unrealistic, out-of-distribution samples. Instead, we propose to integrate a generative inpainter into three representative attribution-map methods as the mechanism for removing input features. Compared to the original counterparts, our methods (1) generate more plausible counterfactual samples under the true data-generating process; (2) are more robust to hyperparameter settings; and (3) localize objects more accurately. Our findings were consistent across both the ImageNet and Places365 datasets and two different pairs of classifiers and inpainters. (An attribution sketch follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1eLVxrKwS
PDF https://openreview.net/pdf?id=H1eLVxrKwS
PWC https://paperswithcode.com/paper/removing-input-features-via-a-generative
Repo
Framework
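Since the proposal is only to change *how* a feature is removed, a generic perturbation-style attribution makes the point: score the drop in the target-class probability when a region is replaced by an inpainter's plausible fill-in rather than by gray or noise. `inpainter(image, mask)` is a hypothetical stand-in for a pre-trained inpainting model.

```python
import torch

@torch.no_grad()
def region_attribution(classifier, inpainter, image, mask, target_class):
    """Attribution of the masked region = drop in the target-class probability
    when that region is replaced by the inpainter's plausible fill-in."""
    p_original = classifier(image.unsqueeze(0)).softmax(-1)[0, target_class]
    infilled = inpainter(image, mask)            # realistic sample with the region removed
    p_removed = classifier(infilled.unsqueeze(0)).softmax(-1)[0, target_class]
    return (p_original - p_removed).item()
```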

Lossless Data Compression with Transformer

Title Lossless Data Compression with Transformer
Authors Anonymous
Abstract Transformers have replaced long short-term memory and other recurrent neural network variants in sequence modeling. They achieve state-of-the-art performance on a wide range of natural language processing tasks, including language modeling, machine translation, and sentence representation. Lossless compression is another problem that can benefit from better sequence models; it is closely related to the problem of online learning of language models. Despite this resemblance, it is an area where purely neural-network-based methods have not yet reached the compression ratios of state-of-the-art algorithms. In this paper, we propose a Transformer-based lossless compression method that matches the best compression ratios for text. Our approach is based purely on neural networks and does not rely on the hand-crafted features used by other lossless compression algorithms. We also provide a thorough study of the impact of the different components of the Transformer, and of its training, on the compression ratio. (A worked code-length sketch follows this entry.)
Tasks Language Modelling, Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=Hygi7xStvS
PDF https://openreview.net/pdf?id=Hygi7xStvS
PWC https://paperswithcode.com/paper/lossless-data-compression-with-transformer
Repo
Framework
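The link between language modeling and lossless compression that the abstract relies on is the standard one: with an entropy coder, a token sequence costs about `-log2 p(x_t | x_<t)` bits per token, so a better model directly means a smaller file. The sketch below estimates that ideal code length for any autoregressive model that returns next-token logits; the entropy coder itself is omitted.

```python
import math
import torch

@torch.no_grad()
def ideal_code_length_bits(model, token_ids):
    """token_ids: LongTensor of shape (1, T); `model` is assumed to map it to
    next-token logits of shape (1, T, vocab_size)."""
    logits = model(token_ids)
    log_probs = logits[:, :-1].log_softmax(dim=-1)                        # prefix x_<t predicts x_t
    nll_nats = -log_probs.gather(-1, token_ids[:, 1:].unsqueeze(-1)).sum()
    return nll_nats.item() / math.log(2)                                  # nats -> bits
```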

Mixed Setting Training Methods for Incremental Slot-Filling Tasks

Title Mixed Setting Training Methods for Incremental Slot-Filling Tasks
Authors Anonymous
Abstract Model training remains a dominant financial cost and time investment in machine learning applications. Developing and debugging models often involve iterative training, further exacerbating this issue. With growing interest in increasingly complex models, there is a need for techniques that help reduce overall training effort. While incremental training can save substantial time and cost by training an existing model on a small subset of data, little work has explored policies for determining when incremental training provides adequate model performance versus full retraining. We provide a method-agnostic algorithm for deciding when to train incrementally versus fully. We call this setting of non-deterministic full or incremental training “Mixed Setting Training”. Upon evaluation on slot-filling tasks, we find that this algorithm provides a bounded error, avoids catastrophic forgetting, and results in a significant speedup over a policy of always fully training. (A decision-rule sketch follows this entry.)
Tasks Slot Filling
Published 2020-01-01
URL https://openreview.net/forum?id=rJg4GgHKPB
PDF https://openreview.net/pdf?id=rJg4GgHKPB
PWC https://paperswithcode.com/paper/mixed-setting-training-methods-for
Repo
Framework
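The abstract describes a method-agnostic policy for choosing between incremental and full training but not its exact form, so the rule below (fall back to a full retrain whenever incremental training cannot stay within an error bound of the last fully trained model) is an assumption that only illustrates the shape of such a policy; every callable is hypothetical.

```python
def mixed_setting_train(model, reference_score, new_data, full_data,
                        train_incremental, train_full, evaluate, error_bound=0.02):
    """All callables are hypothetical; `reference_score` is the validation score
    the last fully trained model achieved."""
    candidate = train_incremental(model, new_data)
    if evaluate(candidate) >= reference_score - error_bound:
        return candidate              # cheap path: incremental training stays within the bound
    return train_full(full_data)      # otherwise fall back to a full retrain
```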

FleXOR: Trainable Fractional Quantization

Title FleXOR: Trainable Fractional Quantization
Authors Anonymous
Abstract Parameter quantization is a popular model compression technique due to its regular form and high compression ratio. In particular, quantization based on binary codes is gaining attention because each quantized bit can be directly utilized for computations without dequantization using look-up tables. Previous attempts, however, only allow integer numbers of quantization bits, which restricts the search space over compression ratio and accuracy. Moreover, quantization bits are usually obtained by minimizing quantization loss in a local manner that does not directly correspond to minimizing the loss function. In this paper, we propose an encryption algorithm/architecture to compress quantized weights in order to achieve fractional numbers of bits per weight, and the new compression configurations further optimize the accuracy/compression trade-off. Decryption is implemented using XOR gates added to the neural network model and described as $\tanh(x)$, which enables gradient calculations superior to the straight-through gradient method. We perform experiments on MNIST, CIFAR-10, and ImageNet to show that inserting XOR gates learns quantization/encryption bit decisions through training and obtains high accuracy even for fractional sub-1-bit weights. (A soft-XOR sketch follows this entry.)
Tasks Model Compression, Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=HJlQ96EtPr
PDF https://openreview.net/pdf?id=HJlQ96EtPr
PWC https://paperswithcode.com/paper/flexor-trainable-fractional-quantization
Repo
Framework
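The one concrete mechanism the abstract names is the differentiable XOR: with bits encoded as ±1, XOR is a (negated) product, and tanh gives a smooth relaxation through which gradients can flow. The scale `s` and the bit encoding are assumptions, and how encrypted bits are wired to weights is not shown here.

```python
import torch

def soft_xor(a, b, s=4.0):
    """Differentiable XOR for inputs near +/-1 (encoding: +1 = bit 1, -1 = bit 0).
    Approaches the hard gate as s -> infinity."""
    return -torch.tanh(s * a) * torch.tanh(s * b)

a = torch.tensor([1.0, 1.0, -1.0, -1.0])
b = torch.tensor([1.0, -1.0, 1.0, -1.0])
print(soft_xor(a, b))   # approx [-1, +1, +1, -1]: equal bits -> -1 (0), different bits -> +1 (1)
```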