April 1, 2020

2934 words 14 mins read

Paper Group NANR 30

Parallel Scheduled Sampling. OBJECT-ORIENTED REPRESENTATION OF 3D SCENES. HaarPooling: Graph Pooling with Compressive Haar Basis. Meta-Q-Learning. Towards Understanding the Transferability of Deep Representations. Switched linear projections and inactive state sensitivity for deep neural network interpretability. Stochastically Controlled Compositi …

Parallel Scheduled Sampling

Title Parallel Scheduled Sampling
Authors Anonymous
Abstract Auto-regressive models are widely used in sequence generation problems. The output sequence is typically generated in a predetermined order, one discrete unit (pixel, word, or character) at a time. The models are trained by teacher-forcing, where ground-truth history is fed to the model as input, which at test time is replaced by the model’s prediction. Scheduled Sampling (Bengio et al., 2015) aims to mitigate this discrepancy between train and test time by randomly replacing some discrete units in the history with the model’s prediction. While teacher-forced training works well with ML accelerators because the computation can be parallelized across time, Scheduled Sampling involves undesirable sequential processing. In this paper, we introduce a simple technique to parallelize Scheduled Sampling across time. Experimentally, we find the proposed technique leads to equivalent or better performance on image generation, summarization, dialog generation, and translation compared to teacher-forced training. On the dialog response generation task, Parallel Scheduled Sampling achieves a 1.6 BLEU (11.5%) improvement over teacher-forcing, while in image generation it achieves 20% and 13.8% improvements in Fréchet Inception Distance (FID) and Inception Score (IS), respectively. Further, we discuss the effects of different hyper-parameters associated with Scheduled Sampling on model performance.
Tasks Image Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HkedQp4tPr
PDF https://openreview.net/pdf?id=HkedQp4tPr
PWC https://paperswithcode.com/paper/parallel-scheduled-sampling-1
Repo
Framework
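
The core idea described in the abstract above, replacing random positions of the gold history with model predictions that are computed in a single parallel (teacher-forced) pass rather than step by step, can be illustrated with a small sketch. This is not the authors' code; the `predict_fn`, the mixing probability, and the toy model below are hypothetical placeholders, and only the input-mixing step of one pass is shown.

```python
import numpy as np

def parallel_scheduled_sampling_inputs(gold_tokens, predict_fn, mix_prob, rng=None):
    """Sketch of one parallel scheduled-sampling mixing step (assumed simplification).

    gold_tokens: int array of shape (batch, seq_len) with the ground-truth sequence.
    predict_fn:  any function mapping a (batch, seq_len) input to per-position
                 predicted tokens of the same shape, run once in parallel
                 (e.g. a teacher-forced decoder pass).
    mix_prob:    probability of replacing a gold token with the model's prediction.
    """
    rng = rng or np.random.default_rng(0)
    # Pass 1: predictions for every position, conditioned on gold history (parallel).
    predicted = predict_fn(gold_tokens)
    # Randomly mix predictions into the gold history.
    mask = rng.random(gold_tokens.shape) < mix_prob
    mixed = np.where(mask, predicted, gold_tokens)
    # The training pass would then be teacher-forced on `mixed` instead of gold.
    return mixed

# Toy usage: a fake "model" that always predicts token 7.
toy_predict = lambda x: np.full_like(x, 7)
gold = np.arange(12).reshape(2, 6)
print(parallel_scheduled_sampling_inputs(gold, toy_predict, mix_prob=0.25))
```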

OBJECT-ORIENTED REPRESENTATION OF 3D SCENES

Title OBJECT-ORIENTED REPRESENTATION OF 3D SCENES
Authors Anonymous
Abstract In this paper, we propose a generative model, called ROOTS (Representation of Object-Oriented Three-dimension Scenes), for unsupervised object-wise 3D-scene decomposition and rendering. For 3D scene modeling, ROOTS builds on the Generative Query Network (GQN) framework, but unlike GQN, it provides object-oriented representation decomposition. The inferred object representation of ROOTS is 3D in the sense that it is viewpoint invariant, just as the full scene representation of GQN is. ROOTS also provides a hierarchical object-oriented representation: at the 3D global-scene level and at the 2D local-image level. We achieve this without performance degradation. In experiments on datasets of 3D rooms with multiple objects, we demonstrate the above properties by focusing on its abilities for disentanglement, compositionality, and generalization in comparison to GQN.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJg8_xHtPr
PDF https://openreview.net/pdf?id=BJg8_xHtPr
PWC https://paperswithcode.com/paper/object-oriented-representation-of-3d-scenes
Repo
Framework

HaarPooling: Graph Pooling with Compressive Haar Basis

Title HaarPooling: Graph Pooling with Compressive Haar Basis
Authors Anonymous
Abstract Deep Graph Neural Networks (GNNs) are instrumental in graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms, called HaarPooling. HaarPooling is computed following a chain of sequential clusterings of the input graph. The input of each pooling layer is transformed by the compressive Haar basis of the corresponding clustering. HaarPooling operates in the frequency domain by the synthesis of nodes in the same cluster and filters out fine detail information by compressive Haar transforms. Such transforms provide an effective characterization of the data and preserve the structure information of the input graph. By the sparsity of the Haar basis, the computation of HaarPooling is of linear complexity. The GNN with HaarPooling and existing graph convolution layers achieves state-of-the-art performance on diverse graph classification problems.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJleph4KvS
PDF https://openreview.net/pdf?id=BJleph4KvS
PWC https://paperswithcode.com/paper/haarpooling-graph-pooling-with-compressive-1
Repo
Framework
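
A very reduced sketch of the compressive (low-pass) part of a Haar-style pooling step follows: nodes are grouped by a precomputed clustering, and each cluster is synthesized into one coarse node by orthonormal averaging (the scaling coefficients of a Haar basis), while the detail coefficients are discarded, which is the compressive filtering the abstract describes. The clustering procedure and the full Haar basis construction from the paper are not reproduced here, so treat this as an assumed simplification.

```python
import numpy as np

def haar_like_pool(node_features, cluster_ids):
    """Compressive Haar-style pooling sketch (assumed simplification).

    node_features: (num_nodes, num_channels) array of node features.
    cluster_ids:   (num_nodes,) int array assigning each node to a cluster
                   of the next-coarser graph.
    Returns one pooled row per cluster: the orthonormal scaling coefficient
    sum(features in cluster c) / sqrt(|c|); detail coefficients are dropped.
    """
    clusters = np.unique(cluster_ids)
    pooled = np.zeros((len(clusters), node_features.shape[1]))
    for k, c in enumerate(clusters):
        members = node_features[cluster_ids == c]
        pooled[k] = members.sum(axis=0) / np.sqrt(len(members))
    return pooled

# Toy usage: 5 nodes with 3 channels, clustered into 2 coarse nodes.
x = np.random.default_rng(0).normal(size=(5, 3))
print(haar_like_pool(x, np.array([0, 0, 1, 1, 1])))
```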

Meta-Q-Learning

Title Meta-Q-Learning
Authors Anonymous
Abstract This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state of the art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, using a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state of the art meta-RL algorithms.
Tasks Continuous Control, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJeD3CEFPH
PDF https://openreview.net/pdf?id=SJeD3CEFPH
PWC https://paperswithcode.com/paper/meta-q-learning-1
Repo
Framework
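
The third idea in the abstract above, recycling meta-training replay data for a new task via propensity estimation, can be sketched as training a classifier to distinguish new-task transitions from meta-training transitions and weighting each old transition by the odds that it resembles the new task. This is a simplified stand-in for illustration, not the authors' implementation; the feature arrays and the use of scikit-learn's logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_weights(meta_feats, new_feats):
    """Estimate importance weights for meta-training transitions (sketch).

    meta_feats: (n_meta, d) features (e.g. state-action) from the replay buffer.
    new_feats:  (n_new, d) features collected on the new task.
    Returns one weight per meta-training transition: p(new | x) / p(meta | x).
    """
    X = np.vstack([meta_feats, new_feats])
    y = np.concatenate([np.zeros(len(meta_feats)), np.ones(len(new_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_new = clf.predict_proba(meta_feats)[:, 1]
    return p_new / np.clip(1.0 - p_new, 1e-6, None)

# Toy usage: old data centered at 0, new-task data shifted slightly.
rng = np.random.default_rng(0)
w = propensity_weights(rng.normal(0, 1, (200, 4)), rng.normal(0.5, 1, (50, 4)))
print(w[:5])  # transitions resembling the new task receive larger weights
```

These weights would then scale the off-policy updates when adapting the policy, which is how the extra replay data becomes usable on the new task.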

Towards Understanding the Transferability of Deep Representations

Title Towards Understanding the Transferability of Deep Representations
Authors Anonymous
Abstract Deep neural networks trained on a wide range of datasets demonstrate impressive transferability. Deep features appear general in that they are applicable to many datasets and tasks. This property is widely exploited in real-world applications: a neural network pretrained on large datasets, such as ImageNet, can significantly boost generalization and accelerate training when fine-tuned on a smaller target dataset. Despite its pervasiveness, little effort has been devoted to uncovering the reasons for transferability in deep feature representations. This paper tries to understand transferability from the perspectives of improved generalization, improved optimization, and the feasibility of transfer. We demonstrate that: 1) Transferred models tend to find flatter minima, since their weight matrices stay close to the original flat region of the pretrained parameters when transferred to a similar target dataset; 2) Transferred representations make the loss landscape more favorable, with improved Lipschitzness, which accelerates and stabilizes training substantially. The improvement is largely attributable to the fact that the principal component of the gradient is suppressed in the pretrained parameters, thus stabilizing the magnitude of the gradient in back-propagation; 3) The feasibility of transfer is related to the similarity of both inputs and labels. A surprising discovery is that feasibility is also affected by the training stage: transferability first increases during training and then declines. We further provide a theoretical analysis to verify our observations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylKL1SKvr
PDF https://openreview.net/pdf?id=BylKL1SKvr
PWC https://paperswithcode.com/paper/towards-understanding-the-transferability-of
Repo
Framework
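
One of the measurable claims above is that transferred weights stay close to their pretrained values. A tiny, hypothetical diagnostic for that claim is sketched below: compare the per-layer relative parameter shift of a fine-tuned model against its pretrained initialization. The model, the training loop, and the layer names are assumed to exist elsewhere; this is not the paper's analysis code.

```python
import numpy as np

def relative_weight_shift(init_params, final_params):
    """||W_final - W_init|| / ||W_init|| per layer (diagnostic sketch).

    init_params, final_params: dicts mapping layer name -> weight array,
    captured before and after training or fine-tuning.
    """
    return {
        name: float(np.linalg.norm(final_params[name] - init_params[name])
                    / (np.linalg.norm(init_params[name]) + 1e-12))
        for name in init_params
    }

# Toy usage with made-up weights: fine-tuning typically produces small shifts.
rng = np.random.default_rng(0)
w0 = {"conv1": rng.normal(size=(16, 8))}
w_finetuned = {"conv1": w0["conv1"] + 0.01 * rng.normal(size=(16, 8))}
print(relative_weight_shift(w0, w_finetuned))
```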

Switched linear projections and inactive state sensitivity for deep neural network interpretability

Title Switched linear projections and inactive state sensitivity for deep neural network interpretability
Authors Anonymous
Abstract We introduce switched linear projections for expressing the activity of a neuron in a ReLU-based deep neural network in terms of a single linear projection in the input space. The method works by isolating the active subnetwork, a series of linear transformations that completely determines the entire computation of the deep network for a given input instance. We also propose that, for interpretability, it is more instructive and meaningful to focus on the patterns that deactivate the neurons in the network, which are ignored by existing methods that implicitly track only the active aspect of the network’s computation. We introduce a novel interpretability method based on inactive state sensitivity (Insens). Comparison against existing methods shows that Insens is more robust (in the presence of noise), more complete (in terms of patterns that affect the computation), and overall a very effective interpretability method for deep neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyxjVRVKDB
PDF https://openreview.net/pdf?id=SyxjVRVKDB
PWC https://paperswithcode.com/paper/switched-linear-projections-and-inactive-1
Repo
Framework
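
The switched-linear-projection idea, that for a given input a ReLU network collapses to a single linear map determined by which units are active, can be sketched for a small fully connected network. The weights below are random placeholders and the paper's Insens method itself is not reproduced; only the per-input linear map is computed and checked.

```python
import numpy as np

def switched_linear_projection(x, weights, biases):
    """Effective input-space linear map of a ReLU MLP at input x (sketch).

    weights: list of (out, in) weight matrices; biases: list of (out,) vectors.
    Returns (A, b) such that network(x) == A @ x + b for this input's activation pattern.
    """
    A, b, h = np.eye(len(x)), np.zeros(len(x)), x
    for i, (W, c) in enumerate(zip(weights, biases)):
        pre = W @ h + c
        last = i == len(weights) - 1
        mask = np.ones_like(pre) if last else (pre > 0).astype(float)
        h = pre if last else np.maximum(pre, 0)
        A = (mask[:, None] * W) @ A          # compose the "switched" linear layer
        b = mask * (W @ b + c)
    return A, b

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5)), rng.normal(size=(2, 3))]
bs = [rng.normal(size=5), rng.normal(size=3), rng.normal(size=2)]
x = rng.normal(size=4)
A, b = switched_linear_projection(x, Ws, bs)

# Check: the switched linear map reproduces the network's output at x.
h = x
for i, (W, c) in enumerate(zip(Ws, bs)):
    h = W @ h + c
    if i < len(Ws) - 1:
        h = np.maximum(h, 0)
print(np.allclose(A @ x + b, h))  # True
```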

Stochastically Controlled Compositional Gradient for the Composition problem

Title Stochastically Controlled Compositional Gradient for the Composition problem
Authors Anonymous
Abstract We consider composition problems of the form $\frac{1}{n}\sum\nolimits_{i=1}^n F_i(\frac{1}{n}\sum\nolimits_{j=1}^n G_j(x))$. Composition optimization arises in many important machine learning applications: reinforcement learning, variance-aware learning, nonlinear embedding, and many others. Both gradient descent and stochastic gradient descent are straightforward solutions, but both require computing $\frac{1}{n}\sum\nolimits_{j=1}^n G_j(x)$ in every single iteration, which is inefficient, especially when $n$ is large. Therefore, with the aim of significantly reducing the query complexity of such problems, we design a stochastically controlled compositional gradient algorithm that incorporates two kinds of variance reduction techniques and works in both strongly convex and non-convex settings. The strategy is also accompanied by a mini-batch version of the proposed method that improves query complexity with respect to the size of the mini-batch. Comprehensive experiments demonstrate the superiority of the proposed method over existing methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rygRP2VYwB
PDF https://openreview.net/pdf?id=rygRP2VYwB
PWC https://paperswithcode.com/paper/stochastically-controlled-compositional
Repo
Framework
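
As a point of reference for the composition problem above, here is a sketch of a basic stochastic compositional gradient step in the spirit of earlier SCGD-style methods, not the paper's variance-controlled algorithm: a running estimate of the inner average $\frac{1}{n}\sum_j G_j(x)$ is maintained so that only one inner and one outer component are queried per iteration. The quadratic toy problem, step sizes, and tracking rate below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 4                      # components, x-dimension, inner output dimension
A = rng.normal(size=(n, m, d))          # G_j(x) = A_j @ x
b = rng.normal(size=(n, m))             # F_i(y) = 0.5 * ||y - b_i||^2

def scgd(x, steps=2000, alpha=0.02, beta=0.2):
    """Basic stochastic compositional gradient sketch (not the paper's algorithm)."""
    y = np.zeros(m)                     # running estimate of (1/n) sum_j G_j(x)
    for _ in range(steps):
        j, i = rng.integers(n), rng.integers(n)
        y = (1 - beta) * y + beta * (A[j] @ x)   # track the inner average cheaply
        grad = A[j].T @ (y - b[i])               # J_{G_j}^T grad F_i(y)
        x = x - alpha * grad
    return x

x_hat = scgd(np.zeros(d))
# Diagnostic: gradient of the full deterministic objective at x_hat.
y_full = A.mean(axis=0) @ x_hat
full_grad = A.mean(axis=0).T @ (y_full - b.mean(axis=0))
print(np.linalg.norm(full_grad))        # should be small if the sketch has converged
```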

Layerwise Learning Rates for Object Features in Unsupervised and Supervised Neural Networks And Consequent Predictions for the Infant Visual System

Title Layerwise Learning Rates for Object Features in Unsupervised and Supervised Neural Networks And Consequent Predictions for the Infant Visual System
Authors Anonymous
Abstract To understand how object vision develops in infancy and childhood, it will be necessary to develop testable computational models. Deep neural networks (DNNs) have proven valuable as models of adult vision, but it is not yet clear if they have any value as models of development. As a first model, we measured learning in a DNN designed to mimic the architecture and representational geometry of the visual system (CORnet). We quantified the development of explicit object representations at each level of this network through training by freezing the convolutional layers and training an additional linear decoding layer. We evaluated decoding accuracy on the whole ImageNet validation set, and also for individual visual classes. CORnet, however, uses supervised training, and because infants have only extremely impoverished access to labels, they must instead learn in an unsupervised manner. We therefore also measured learning in a state-of-the-art unsupervised network (DeepCluster). CORnet and DeepCluster differ both in supervision and in the convolutional networks at their heart; thus, to isolate the effect of supervision, we ran a control experiment in which we trained the convolutional network from DeepCluster (an AlexNet variant) in a supervised manner. We make predictions on how learning should develop across brain regions in infants. In all three networks, we also tested for a relationship in the order in which infants and machines acquire visual classes, and found only evidence for a counter-intuitive relationship. We discuss the potential reasons for this.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkxoglrtvH
PDF https://openreview.net/pdf?id=BkxoglrtvH
PWC https://paperswithcode.com/paper/layerwise-learning-rates-for-object-features
Repo
Framework
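
The layer-wise measurement described above, freezing the convolutional layers and fitting a linear decoder on top of a layer's activations, is essentially a linear probe. A generic sketch follows; the frozen feature extractor, the dataset, and the random "activations" are placeholders rather than the CORnet/DeepCluster setup from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(feats_train, y_train, feats_test, y_test):
    """Fit a linear decoder on frozen features and report test accuracy (sketch)."""
    # Flatten any spatial dimensions: each example becomes one feature vector.
    Xtr = feats_train.reshape(len(feats_train), -1)
    Xte = feats_test.reshape(len(feats_test), -1)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, y_train)
    return clf.score(Xte, y_test)

# Toy usage with random "activations" from a hypothetical frozen layer.
rng = np.random.default_rng(0)
f_tr, f_te = rng.normal(size=(200, 8, 4, 4)), rng.normal(size=(50, 8, 4, 4))
y_tr, y_te = rng.integers(0, 3, 200), rng.integers(0, 3, 50)
print(linear_probe_accuracy(f_tr, y_tr, f_te, y_te))
```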

A NEW POINTWISE CONVOLUTION IN DEEP NEURAL NETWORKS THROUGH EXTREMELY FAST AND NON PARAMETRIC TRANSFORMS

Title A NEW POINTWISE CONVOLUTION IN DEEP NEURAL NETWORKS THROUGH EXTREMELY FAST AND NON PARAMETRIC TRANSFORMS
Authors Anonymous
Abstract Some conventional transforms, such as the Discrete Walsh-Hadamard Transform (DWHT) and the Discrete Cosine Transform (DCT), have been widely used as feature extractors in image processing but rarely applied in neural networks. However, we found that these conventional transforms can capture cross-channel correlations without any learnable parameters in DNNs. This paper first proposes applying conventional transforms to pointwise convolution, showing that such transforms significantly reduce the computational complexity of neural networks without degrading accuracy. DWHT in particular requires no floating-point multiplications but only additions and subtractions, which considerably reduces computation overhead. In addition, its fast algorithm further reduces the complexity of floating-point additions from O(n^2) to O(n log n). These non-parametric, low-computation properties yield extremely efficient networks in terms of the number of parameters and operations, while enjoying accuracy gains. Our proposed DWHT-based model gained a 1.49% accuracy increase with 79.4% fewer parameters and 48.4% fewer FLOPs compared with its baseline model (MobileNet-V1) on the CIFAR-100 dataset.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1l0O6EYDH
PDF https://openreview.net/pdf?id=H1l0O6EYDH
PWC https://paperswithcode.com/paper/a-new-pointwise-convolution-in-deep-neural
Repo
Framework
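
The fast Walsh-Hadamard transform mentioned above can replace a learned 1x1 (pointwise) convolution by mixing channels with only additions and subtractions. Below is a minimal, framework-free sketch of the fast transform applied along the channel axis of a feature map; it assumes the channel count is a power of two and omits the normalization and channel-selection details of the paper.

```python
import numpy as np

def fast_walsh_hadamard_channels(x):
    """Fast Walsh-Hadamard transform over the channel axis (sketch).

    x: array of shape (C, H, W) with C a power of two.
    Uses only additions/subtractions; O(C log C) work per spatial location.
    """
    y = x.copy()
    c = y.shape[0]
    assert c & (c - 1) == 0, "channel count must be a power of two"
    h = 1
    while h < c:
        for i in range(0, c, 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b                 # butterfly: sum half
            y[i + h:i + 2 * h] = a - b         # butterfly: difference half
        h *= 2
    return y

# Toy usage: parameter-free "pointwise convolution" of an 8-channel feature map.
fmap = np.random.default_rng(0).normal(size=(8, 5, 5))
mixed = fast_walsh_hadamard_channels(fmap)
print(mixed.shape)  # (8, 5, 5): channels mixed, spatial dimensions untouched
```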

Efficient and Information-Preserving Future Frame Prediction and Beyond

Title Efficient and Information-Preserving Future Frame Prediction and Beyond
Authors Anonymous
Abstract Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios. We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. Our model enjoys the theoretically guaranteed property of no information loss during the feature extraction, much lower memory consumption and computational efficiency. The lightweight nature of our model enables us to incorporate 3D convolutions without concern of memory bottleneck, enhancing the model’s ability to capture both short-term and long-term temporal dependencies. Our proposed approach achieves state-of-the-art results on Moving MNIST, Traffic4cast and KITTI datasets. We further demonstrate the transferability of our self-supervised learning method by exploiting its learnt features for object detection on KITTI. Our competitive results indicate the potential of using CrevNet as a generative pre-training strategy to guide downstream tasks.
Tasks Object Detection, Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=B1eY_pVYvB
PDF https://openreview.net/pdf?id=B1eY_pVYvB
PWC https://paperswithcode.com/paper/efficient-and-information-preserving-future
Repo
Framework
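
The "bijective two-way autoencoder" above relies on reversible blocks, in which the input can be exactly reconstructed from the output, so no information is lost during feature extraction. A minimal additive-coupling sketch of that property is shown below; the coupling function is an arbitrary placeholder rather than the paper's architecture.

```python
import numpy as np

def coupling_forward(x1, x2, f):
    """One additive reversible coupling step (sketch): (x1, x2) -> (y1, y2)."""
    return x1 + f(x2), x2

def coupling_inverse(y1, y2, f):
    """Exact inverse of the step above: no information is lost."""
    return y1 - f(y2), y2

# Toy usage: any function f works; here a fixed random nonlinear map.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
f = lambda z: np.tanh(z @ W)
x1, x2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
y1, y2 = coupling_forward(x1, x2, f)
r1, r2 = coupling_inverse(y1, y2, f)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True: perfectly invertible
```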

Deep exploration by novelty-pursuit with maximum state entropy

Title Deep exploration by novelty-pursuit with maximum state entropy
Authors Anonymous
Abstract Efficient exploration is essential to reinforcement learning in huge state spaces. Recent approaches to address this issue include the intrinsically motivated goal exploration process (IMGEP) and maximum state entropy exploration (MSEE). In this paper, we show that goal-conditioned exploration behaviors in IMGEP can also maximize the state entropy, which bridges IMGEP and MSEE. From this connection, we propose a maximum-entropy criterion for goal selection in goal-conditioned exploration, which results in the new exploration method novelty-pursuit. Novelty-pursuit performs exploration in two stages: first, it selects a goal for the goal-conditioned exploration policy to reach the boundary of the explored region; then, it takes random actions to explore the unexplored region. We demonstrate the effectiveness of the proposed method in environments ranging from simple mazes and MuJoCo tasks to the long-horizon video game SuperMarioBros. Experimental results show that the proposed method outperforms state-of-the-art approaches that use curiosity-driven exploration.
Tasks Efficient Exploration
Published 2020-01-01
URL https://openreview.net/forum?id=rygUoeHKvB
PDF https://openreview.net/pdf?id=rygUoeHKvB
PWC https://paperswithcode.com/paper/deep-exploration-by-novelty-pursuit-with
Repo
Framework
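
The two-stage scheme above can be caricatured with visit counts: pick as the goal a visited state with the fewest visits (a crude proxy for the boundary of the explored region and for raising state entropy), move toward it, then act randomly. The chain environment, the greedy stand-in for the goal-conditioned policy, and the count table below are all hypothetical; none of this is the paper's implementation.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy 1-D chain world standing in for a real environment (hypothetical)."""
    actions = (-1, +1)
    def __init__(self, n=20):
        self.n, self.state = n, 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state = max(0, min(self.n - 1, self.state + action))
        return self.state

def select_goal(visit_counts):
    """Pick the least-visited known state: a max-entropy-flavored boundary proxy."""
    return min(visit_counts, key=visit_counts.get)

def novelty_pursuit_episode(env, visit_counts, horizon=60, random_steps=20):
    goal = select_goal(visit_counts) if visit_counts else 0
    state = env.reset()
    for _ in range(horizon):               # stage 1: head toward the chosen goal
        if state == goal:
            break
        state = env.step(+1 if goal > state else -1)
        visit_counts[state] += 1
    for _ in range(random_steps):          # stage 2: random exploration from there
        state = env.step(random.choice(env.actions))
        visit_counts[state] += 1
    return visit_counts

random.seed(0)
counts = defaultdict(int, {0: 1})
for _ in range(5):
    counts = novelty_pursuit_episode(ChainEnv(), counts)
print(dict(counts))
```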

Attack-Resistant Federated Learning with Residual-based Reweighting

Title Attack-Resistant Federated Learning with Residual-based Reweighting
Authors Anonymous
Abstract Federated learning has a variety of applications in multiple domains by utilizing private training data stored on different devices. However, the aggregation process in federated learning is highly vulnerable to adversarial attacks, so the global model may behave abnormally under attack. To tackle this challenge, we present a novel aggregation algorithm with residual-based reweighting to defend federated learning. Our aggregation algorithm combines repeated median regression with the reweighting scheme in iteratively reweighted least squares. Our experiments show that our aggregation algorithm outperforms other alternative algorithms in the presence of label-flipping, backdoor, and Gaussian noise attacks. We also provide theoretical guarantees for our aggregation algorithm.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgAJxrYwr
PDF https://openreview.net/pdf?id=HkgAJxrYwr
PWC https://paperswithcode.com/paper/attack-resistant-federated-learning-with
Repo
Framework
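
A heavily simplified, coordinate-wise caricature of the residual-based reweighting idea follows: take a robust center of the client updates per parameter, measure each client's residual from that center, convert standardized residuals into weights in the style of iteratively reweighted least squares, and average with those weights. The repeated-median regression used in the paper is replaced here by a plain median and MAD scale, so this is only a sketch.

```python
import numpy as np

def reweighted_aggregate(client_updates, clip=2.0):
    """Residual-based reweighting sketch for federated aggregation.

    client_updates: (num_clients, num_params) array of model updates.
    Returns one aggregated update of shape (num_params,).
    """
    center = np.median(client_updates, axis=0)            # robust center per parameter
    resid = client_updates - center
    # Robust per-parameter scale (median absolute deviation, with a small floor).
    scale = np.median(np.abs(resid), axis=0) * 1.4826 + 1e-8
    std_resid = np.abs(resid) / scale
    # IRLS-style weights: full weight for small residuals, decaying beyond `clip`.
    weights = np.minimum(1.0, clip / np.maximum(std_resid, 1e-8))
    return (weights * client_updates).sum(axis=0) / weights.sum(axis=0)

# Toy usage: 9 honest clients near 1.0, one attacker pushing large values.
rng = np.random.default_rng(0)
updates = np.vstack([rng.normal(1.0, 0.1, (9, 5)), np.full((1, 5), 10.0)])
print(reweighted_aggregate(updates))   # stays close to 1.0 despite the outlier
```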

Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks

Title Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks
Authors Anonymous
Abstract Many state-of-the-art models rely on two architectural innovations: skip connections and batch normalization. However, batch normalization has a number of limitations. It breaks the independence between training examples within a batch, performs poorly when the batch size is too small, and significantly increases the cost of computing a parameter update in some models. This work identifies two practical benefits of batch normalization. First, it improves the final test accuracy. Second, it enables efficient training with larger batches and larger learning rates. However, we demonstrate that the increase in the largest stable learning rate does not explain why the final test accuracy is increased under a finite epoch budget. Furthermore, we show that the gap in test accuracy between residual networks with and without batch normalization can be dramatically reduced by improving the initialization scheme. We introduce “ZeroInit”, which trains a 1000 layer deep Wide-ResNet without normalization to 94.3% test accuracy on CIFAR-10 in 200 epochs at batch size 64. This initialization scheme outperforms batch normalization when the batch size is very small, and is competitive with batch normalization for batch sizes that are not too large. We also show that ZeroInit matches the validation accuracy of batch normalization when training ResNet-50-V2 on ImageNet at batch size 1024.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJeVklHtPr
PDF https://openreview.net/pdf?id=BJeVklHtPr
PWC https://paperswithcode.com/paper/batch-normalization-has-multiple-benefits-an
Repo
Framework
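
The "ZeroInit" scheme above can be summarized by one idea: initialize each residual branch so that it contributes nothing at the start of training, making a very deep residual stack behave like the identity at initialization. A minimal sketch with a zero-initialized scalar on the residual branch is shown below; the exact placement and additional details of the paper's scheme are not reproduced.

```python
import numpy as np

class ZeroInitResidualBlock:
    """Residual block whose branch is scaled by a scalar initialized to zero (sketch)."""
    def __init__(self, dim, rng):
        self.W1 = rng.normal(0, np.sqrt(2.0 / dim), (dim, dim))
        self.W2 = rng.normal(0, np.sqrt(2.0 / dim), (dim, dim))
        self.alpha = 0.0   # ZeroInit-style scalar: the residual branch is silenced at init
    def __call__(self, x):
        branch = np.maximum(x @ self.W1, 0) @ self.W2
        return x + self.alpha * branch

rng = np.random.default_rng(0)
blocks = [ZeroInitResidualBlock(16, rng) for _ in range(1000)]
x = rng.normal(size=(4, 16))
h = x
for blk in blocks:        # even a 1000-block stack is well behaved at initialization
    h = blk(h)
print(np.allclose(h, x))  # True: the network computes the identity at init
```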

ADAPTIVE GENERATION OF PROGRAMMING PUZZLES

Title ADAPTIVE GENERATION OF PROGRAMMING PUZZLES
Authors Anonymous
Abstract AI today is far from being able to write complex programs. What type of problems would be best for computers to learn to program, and how should such problems be generated? To answer the first question, we suggest programming puzzles as a domain for teaching computers programming. A programming puzzle consists of a short program for a Boolean function f(x) and the goal is, given the source code, to find an input that makes f return True. Puzzles are objective in that one can easily test the correctness of a given solution x by seeing whether it satisfies f, unlike the most common representations for program synthesis: given input-output pairs or an English problem description, the correctness of a given solution is not determined and is debatable. To address the second question of automatic puzzle generation, we suggest a GAN-like generation algorithm called “Troublemaker” which can generate puzzles targeted at any given puzzle-solver. The main innovation is that it adapts to one or more given puzzle-solvers: rather than generating a single dataset of puzzles, Tro
Tasks Program Synthesis
Published 2020-01-01
URL https://openreview.net/forum?id=HJeRveHKDH
PDF https://openreview.net/pdf?id=HJeRveHKDH
PWC https://paperswithcode.com/paper/adaptive-generation-of-programming-puzzles
Repo
Framework
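
The puzzle format described above, a short Boolean function f where a solution is any input x with f(x) == True, is easy to make concrete. The two example puzzles and the brute-force checker below are illustrative inventions, not samples from the paper's Troublemaker generator.

```python
def verify(f, x):
    """A solution to puzzle f is simply any x that makes f return True."""
    try:
        return bool(f(x))
    except Exception:
        return False

# Two hypothetical puzzles in the format described above.
puzzle_a = lambda x: isinstance(x, int) and x > 0 and x * x == 1764
puzzle_b = lambda s: isinstance(s, str) and len(s) == 5 and s == s[::-1] and s.count("a") == 3

# A trivial brute-force "solver" for the integer puzzle.
solution_a = next(x for x in range(10_000) if verify(puzzle_a, x))
print(solution_a, verify(puzzle_a, solution_a))   # 42 True

# Correctness of a candidate is objective: just run the checker.
print(verify(puzzle_b, "axaxa"))                  # True
```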

Neural Program Synthesis By Self-Learning

Title Neural Program Synthesis By Self-Learning
Authors Anonymous
Abstract Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs. In this paper, we focus on generating a chunk of assembly code that can be executed to match a state change inside the CPU. We develop a neural program synthesis algorithm, AutoAssemblet, learned via self-learning reinforcement learning, which explores the large code space efficiently. Policy networks and value networks are learned to reduce the breadth and depth of the Monte Carlo Tree Search, resulting in better synthesis performance. We also propose an effective multi-entropy policy sampling technique to alleviate online update correlations. We apply AutoAssemblet to basic programming tasks and show significantly higher success rates compared to several competing baselines.
Tasks Program Synthesis
Published 2020-01-01
URL https://openreview.net/forum?id=Hkls_yBKDB
PDF https://openreview.net/pdf?id=Hkls_yBKDB
PWC https://paperswithcode.com/paper/neural-program-synthesis-by-self-learning
Repo
Framework
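
The synthesis target above, finding a chunk of assembly-like code whose execution reproduces a given CPU state change, can be made concrete with a toy register machine and an exhaustive search standing in for the paper's learned policy/value-guided MCTS. Everything here (the instruction set, the machine, the brute-force search) is a deliberately tiny illustration, not AutoAssemblet.

```python
from itertools import product

# A toy two-register machine: the state is a tuple (r0, r1).
INSTRUCTIONS = {
    "inc r0": lambda s: (s[0] + 1, s[1]),
    "inc r1": lambda s: (s[0], s[1] + 1),
    "add r0 r1": lambda s: (s[0] + s[1], s[1]),
    "mov r1 r0": lambda s: (s[0], s[0]),
}

def run(program, state):
    """Execute a list of instruction names on the toy machine."""
    for instr in program:
        state = INSTRUCTIONS[instr](state)
    return state

def synthesize(start, target, max_len=4):
    """Brute-force stand-in for learned search: find a program mapping start -> target."""
    for length in range(1, max_len + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            if run(program, start) == target:
                return list(program)
    return None

# Toy usage: which short program turns state (2, 3) into (5, 5)?
print(synthesize((2, 3), (5, 5)))
```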