April 1, 2020

2934 words 14 mins read

Paper Group NANR 30

Parallel Scheduled Sampling. OBJECT-ORIENTED REPRESENTATION OF 3D SCENES. HaarPooling: Graph Pooling with Compressive Haar Basis. Meta-Q-Learning. Towards Understanding the Transferability of Deep Representations. Switched linear projections and inactive state sensitivity for deep neural network interpretability. Stochastically Controlled Compositi …

Parallel Scheduled Sampling

Title Parallel Scheduled Sampling
Authors Anonymous
Abstract Auto-regressive models are widely used in sequence generation problems. The output sequence is typically generated in a predetermined order, one discrete unit (pixel, word, or character) at a time. The models are trained by teacher-forcing, where ground-truth history is fed to the model as input, which at test time is replaced by the model’s prediction. Scheduled Sampling (Bengio et al., 2015) aims to mitigate this discrepancy between train and test time by randomly replacing some discrete units in the history with the model’s prediction. While teacher-forced training works well with ML accelerators because the computation can be parallelized across time, Scheduled Sampling involves undesirable sequential processing. In this paper, we introduce a simple technique to parallelize Scheduled Sampling across time. Experimentally, we find the proposed technique leads to equivalent or better performance on image generation, summarization, dialog generation, and translation compared to teacher-forced training. On the dialog response generation task, Parallel Scheduled Sampling achieves a 1.6 BLEU (11.5%) improvement over teacher-forcing, while in image generation it achieves 20% and 13.8% improvements in Fréchet Inception Distance (FID) and Inception Score (IS), respectively. Further, we discuss the effects of different hyper-parameters associated with Scheduled Sampling on model performance.
Tasks Image Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HkedQp4tPr
PDF https://openreview.net/pdf?id=HkedQp4tPr
PWC https://paperswithcode.com/paper/parallel-scheduled-sampling-1
Repo
Framework
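
The core idea described in the abstract above, replacing random positions of the gold history with model predictions that are computed in a single parallel (teacher-forced) pass rather than step by step, can be illustrated with a small sketch. This is not the authors' code; the `predict_fn`, the mixing probability, and the toy model below are hypothetical placeholders, and only the input-mixing step of one pass is shown.

```python
import numpy as np

def parallel_scheduled_sampling_inputs(gold_tokens, predict_fn, mix_prob, rng=None):
    """Sketch of one parallel scheduled-sampling mixing step (assumed simplification).

    gold_tokens: int array of shape (batch, seq_len) with the ground-truth sequence.
    predict_fn:  any function mapping a (batch, seq_len) input to per-position
                 predicted tokens of the same shape, run once in parallel
                 (e.g. a teacher-forced decoder pass).
    mix_prob:    probability of replacing a gold token with the model's prediction.
    """
    rng = rng or np.random.default_rng(0)
    # Pass 1: predictions for every position, conditioned on gold history (parallel).
    predicted = predict_fn(gold_tokens)
    # Randomly mix predictions into the gold history.
    mask = rng.random(gold_tokens.shape) < mix_prob
    mixed = np.where(mask, predicted, gold_tokens)
    # The training pass would then be teacher-forced on `mixed` instead of gold.
    return mixed

# Toy usage: a fake "model" that always predicts token 7.
toy_predict = lambda x: np.full_like(x, 7)
gold = np.arange(12).reshape(2, 6)
print(parallel_scheduled_sampling_inputs(gold, toy_predict, mix_prob=0.25))
```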

OBJECT-ORIENTED REPRESENTATION OF 3D SCENES

Title OBJECT-ORIENTED REPRESENTATION OF 3D SCENES
Authors Anonymous
Abstract In this paper, we propose a generative model, called ROOTS (Representation of Object-Oriented Three-dimension Scenes), for unsupervised object-wise 3D-scene decomposition and rendering. For 3D scene modeling, ROOTS builds on the Generative Query Network (GQN) framework, but unlike GQN, it provides object-oriented representation decomposition. The inferred object representation of ROOTS is 3D in the sense that it is viewpoint invariant, just as the full scene representation of GQN is. ROOTS also provides a hierarchical object-oriented representation: at the 3D global-scene level and at the 2D local-image level. We achieve this without performance degradation. In experiments on datasets of 3D rooms with multiple objects, we demonstrate the above properties by focusing on its abilities for disentanglement, compositionality, and generalization in comparison to GQN.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJg8_xHtPr
PDF https://openreview.net/pdf?id=BJg8_xHtPr
PWC https://paperswithcode.com/paper/object-oriented-representation-of-3d-scenes
Repo
Framework

HaarPooling: Graph Pooling with Compressive Haar Basis

Title HaarPooling: Graph Pooling with Compressive Haar Basis
Authors Anonymous
Abstract Deep Graph Neural Networks (GNNs) are instrumental in graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms, called HaarPooling. HaarPooling is computed following a chain of sequential clusterings of the input graph. The input of each pooling layer is transformed by the compressive Haar basis of the corresponding clustering. HaarPooling operates in the frequency domain by the synthesis of nodes in the same cluster and filters out fine detail information by compressive Haar transforms. Such transforms provide an effective characterization of the data and preserve the structure information of the input graph. By the sparsity of the Haar basis, the computation of HaarPooling is of linear complexity. The GNN with HaarPooling and existing graph convolution layers achieves state-of-the-art performance on diverse graph classification problems.
Tasks Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJleph4KvS
PDF https://openreview.net/pdf?id=BJleph4KvS
PWC https://paperswithcode.com/paper/haarpooling-graph-pooling-with-compressive-1
Repo
Framework
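
A very reduced sketch of the compressive (low-pass) part of a Haar-style pooling step follows: nodes are grouped by a precomputed clustering, and each cluster is synthesized into one coarse node by orthonormal averaging (the scaling coefficients of a Haar basis), while the detail coefficients are discarded, which is the compressive filtering the abstract describes. The clustering procedure and the full Haar basis construction from the paper are not reproduced here, so treat this as an assumed simplification.

```python
import numpy as np

def haar_like_pool(node_features, cluster_ids):
    """Compressive Haar-style pooling sketch (assumed simplification).

    node_features: (num_nodes, num_channels) array of node features.
    cluster_ids:   (num_nodes,) int array assigning each node to a cluster
                   of the next-coarser graph.
    Returns one pooled row per cluster: the orthonormal scaling coefficient
    sum(features in cluster c) / sqrt(|c|); detail coefficients are dropped.
    """
    clusters = np.unique(cluster_ids)
    pooled = np.zeros((len(clusters), node_features.shape[1]))
    for k, c in enumerate(clusters):
        members = node_features[cluster_ids == c]
        pooled[k] = members.sum(axis=0) / np.sqrt(len(members))
    return pooled

# Toy usage: 5 nodes with 3 channels, clustered into 2 coarse nodes.
x = np.random.default_rng(0).normal(size=(5, 3))
print(haar_like_pool(x, np.array([0, 0, 1, 1, 1])))
```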

Meta-Q-Learning

Title Meta-Q-Learning
Authors Anonymous
Abstract This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state of the art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, using a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state of the art meta-RL algorithms.
Tasks Continuous Control, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJeD3CEFPH
PDF https://openreview.net/pdf?id=SJeD3CEFPH
PWC https://paperswithcode.com/paper/meta-q-learning-1
Repo
Framework
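
The third idea in the abstract above, recycling meta-training replay data for a new task via propensity estimation, can be sketched as training a classifier to distinguish new-task transitions from meta-training transitions and weighting each old transition by the odds that it resembles the new task. This is a simplified stand-in for illustration, not the authors' implementation; the feature arrays and the use of scikit-learn's logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_weights(meta_feats, new_feats):
    """Estimate importance weights for meta-training transitions (sketch).

    meta_feats: (n_meta, d) features (e.g. state-action) from the replay buffer.
    new_feats:  (n_new, d) features collected on the new task.
    Returns one weight per meta-training transition: p(new | x) / p(meta | x).
    """
    X = np.vstack([meta_feats, new_feats])
    y = np.concatenate([np.zeros(len(meta_feats)), np.ones(len(new_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_new = clf.predict_proba(meta_feats)[:, 1]
    return p_new / np.clip(1.0 - p_new, 1e-6, None)

# Toy usage: old data centered at 0, new-task data shifted slightly.
rng = np.random.default_rng(0)
w = propensity_weights(rng.normal(0, 1, (200, 4)), rng.normal(0.5, 1, (50, 4)))
print(w[:5])  # transitions resembling the new task receive larger weights
```

These weights would then scale the off-policy updates when adapting the policy, which is how the extra replay data becomes usable on the new task.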

Towards Understanding the Transferability of Deep Representations

Title Towards Understanding the Transferability of Deep Representations
Authors Anonymous
Abstract Deep neural networks trained on a wide range of datasets demonstrate impressive transferability. Deep features appear general in that they are applicable to many datasets and tasks. This property is widely exploited in real-world applications: a neural network pretrained on large datasets, such as ImageNet, can significantly boost generalization and accelerate training when fine-tuned on a smaller target dataset. Despite its pervasiveness, little effort has been devoted to uncovering the reasons for transferability in deep feature representations. This paper tries to understand transferability from the perspectives of improved generalization, improved optimization, and the feasibility of transfer. We demonstrate that: 1) Transferred models tend to find flatter minima, since their weight matrices stay close to the original flat region of the pretrained parameters when transferred to a similar target dataset; 2) Transferred representations make the loss landscape more favorable, with improved Lipschitzness, which accelerates and stabilizes training substantially. The improvement is largely attributable to the fact that the principal component of the gradient is suppressed in the pretrained parameters, thus stabilizing the magnitude of the gradient in back-propagation; 3) The feasibility of transfer is related to the similarity of both inputs and labels. A surprising discovery is that feasibility is also affected by the training stage: transferability first increases during training and then declines. We further provide a theoretical analysis to verify our observations.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylKL1SKvr
PDF https://openreview.net/pdf?id=BylKL1SKvr
PWC https://paperswithcode.com/paper/towards-understanding-the-transferability-of
Repo
Framework
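
One of the measurable claims above is that transferred weights stay close to their pretrained values. A tiny, hypothetical diagnostic for that claim is sketched below: compare the per-layer relative parameter shift of a fine-tuned model against its pretrained initialization. The model, the training loop, and the layer names are assumed to exist elsewhere; this is not the paper's analysis code.

```python
import numpy as np

def relative_weight_shift(init_params, final_params):
    """||W_final - W_init|| / ||W_init|| per layer (diagnostic sketch).

    init_params, final_params: dicts mapping layer name -> weight array,
    captured before and after training or fine-tuning.
    """
    return {
        name: float(np.linalg.norm(final_params[name] - init_params[name])
                    / (np.linalg.norm(init_params[name]) + 1e-12))
        for name in init_params
    }

# Toy usage with made-up weights: fine-tuning typically produces small shifts.
rng = np.random.default_rng(0)
w0 = {"conv1": rng.normal(size=(16, 8))}
w_finetuned = {"conv1": w0["conv1"] + 0.01 * rng.normal(size=(16, 8))}
print(relative_weight_shift(w0, w_finetuned))
```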

Switched linear projections and inactive state sensitivity for deep neural network interpretability

Title Switched linear projections and inactive state sensitivity for deep neural network interpretability
Authors Anonymous
Abstract We introduce switched linear projections for expressing the activity of a neuron in a ReLU-based deep neural network in terms of a single linear projection in the input space. The method works by isolating the active subnetwork, a series of linear transformations that completely determines the entire computation of the deep network for a given input instance. We also propose that, for interpretability, it is more instructive and meaningful to focus on the patterns that deactivate the neurons in the network, which are ignored by existing methods that implicitly track only the active aspect of the network’s computation. We introduce a novel interpretability method based on inactive state sensitivity (Insens). Comparison against existing methods shows that Insens is more robust (in the presence of noise), more complete (in terms of patterns that affect the computation), and overall a very effective interpretability method for deep neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SyxjVRVKDB
PDF https://openreview.net/pdf?id=SyxjVRVKDB
PWC https://paperswithcode.com/paper/switched-linear-projections-and-inactive-1
Repo
Framework
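
The switched-linear-projection idea, that for a given input a ReLU network collapses to a single linear map determined by which units are active, can be sketched for a small fully connected network. The weights below are random placeholders and the paper's Insens method itself is not reproduced; only the per-input linear map is computed and checked.

```python
import numpy as np

def switched_linear_projection(x, weights, biases):
    """Effective input-space linear map of a ReLU MLP at input x (sketch).

    weights: list of (out, in) weight matrices; biases: list of (out,) vectors.
    Returns (A, b) such that network(x) == A @ x + b for this input's activation pattern.
    """
    A, b, h = np.eye(len(x)), np.zeros(len(x)), x
    for i, (W, c) in enumerate(zip(weights, biases)):
        pre = W @ h + c
        last = i == len(weights) - 1
        mask = np.ones_like(pre) if last else (pre > 0).astype(float)
        h = pre if last else np.maximum(pre, 0)
        A = (mask[:, None] * W) @ A          # compose the "switched" linear layer
        b = mask * (W @ b + c)
    return A, b

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5)), rng.normal(size=(2, 3))]
bs = [rng.normal(size=5), rng.normal(size=3), rng.normal(size=2)]
x = rng.normal(size=4)
A, b = switched_linear_projection(x, Ws, bs)

# Check: the switched linear map reproduces the network's output at x.
h = x
for i, (W, c) in enumerate(zip(Ws, bs)):
    h = W @ h + c
    if i < len(Ws) - 1:
        h = np.maximum(h, 0)
print(np.allclose(A @ x + b, h))  # True
```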

Stochastically Controlled Compositional Gradient for the Composition problem

Title Stochastically Controlled Compositional Gradient for the Composition problem
Authors Anonymous
Abstract We consider composition problems of the form $\frac{1}{n}\sum\nolimits_{i=1}^n F_i(\frac{1}{n}\sum\nolimits_{j=1}^n G_j(x))$. Composition optimization arises in many important machine learning applications: reinforcement learning, variance-aware learning, nonlinear embedding, and many others. Both gradient descent and stochastic gradient descent are straightforward solutions, but both require computing $\frac{1}{n}\sum\nolimits_{j=1}^n G_j(x)$ in every single iteration, which is inefficient, especially when $n$ is large. Therefore, with the aim of significantly reducing the query complexity of such problems, we design a stochastically controlled compositional gradient algorithm that incorporates two kinds of variance reduction techniques and works in both strongly convex and non-convex settings. The strategy is also accompanied by a mini-batch version of the proposed method that improves query complexity with respect to the size of the mini-batch. Comprehensive experiments demonstrate the superiority of the proposed method over existing methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rygRP2VYwB
PDF https://openreview.net/pdf?id=rygRP2VYwB
PWC https://paperswithcode.com/paper/stochastically-controlled-compositional
Repo
Framework
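
As a point of reference for the composition problem above, here is a sketch of a basic stochastic compositional gradient step in the spirit of earlier SCGD-style methods, not the paper's variance-controlled algorithm: a running estimate of the inner average $\frac{1}{n}\sum_j G_j(x)$ is maintained so that only one inner and one outer component are queried per iteration. The quadratic toy problem, step sizes, and tracking rate below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 4                      # components, x-dimension, inner output dimension
A = rng.normal(size=(n, m, d))          # G_j(x) = A_j @ x
b = rng.normal(size=(n, m))             # F_i(y) = 0.5 * ||y - b_i||^2

def scgd(x, steps=2000, alpha=0.02, beta=0.2):
    """Basic stochastic compositional gradient sketch (not the paper's algorithm)."""
    y = np.zeros(m)                     # running estimate of (1/n) sum_j G_j(x)
    for _ in range(steps):
        j, i = rng.integers(n), rng.integers(n)
        y = (1 - beta) * y + beta * (A[j] @ x)   # track the inner average cheaply
        grad = A[j].T @ (y - b[i])               # J_{G_j}^T grad F_i(y)
        x = x - alpha * grad
    return x

x_hat = scgd(np.zeros(d))
# Diagnostic: gradient of the full deterministic objective at x_hat.
y_full = A.mean(axis=0) @ x_hat
full_grad = A.mean(axis=0).T @ (y_full - b.mean(axis=0))
print(np.linalg.norm(full_grad))        # should be small if the sketch has converged
```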

Layerwise Learning Rates for Object Features in Unsupervised and Supervised Neural Networks And Consequent Predictions for the Infant Visual System

Title Layerwise Learning Rates for Object Features in Unsupervised and Supervised Neural Networks And Consequent Predictions for the Infant Visual System
Authors Anonymous
Abstract To understand how object vision develops in infancy and childhood, it will be necessary to develop testable computational models. Deep neural networks (DNNs) have proven valuable as models of adult vision, but it is not yet clear if they have any value as models of development. As a first model, we measured learning in a DNN designed to mimic the architecture and representational geometry of the visual system (CORnet). We quantified the development of explicit object representations at each level of this network through training by freezing the convolutional layers and training an additional linear decoding layer. We evaluated decoding accuracy on the whole ImageNet validation set, and also for individual visual classes. CORnet, however, uses supervised training, and because infants have only extremely impoverished access to labels, they must instead learn in an unsupervised manner. We therefore also measured learning in a state-of-the-art unsupervised network (DeepCluster). CORnet and DeepCluster differ both in supervision and in the convolutional networks at their heart; thus, to isolate the effect of supervision, we ran a control experiment in which we trained the convolutional network from DeepCluster (an AlexNet variant) in a supervised manner. We make predictions on how learning should develop across brain regions in infants. In all three networks, we also tested for a relationship in the order in which infants and machines acquire visual classes, and found only evidence for a counter-intuitive relationship. We discuss the potential reasons for this.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BkxoglrtvH
PDF https://openreview.net/pdf?id=BkxoglrtvH
PWC https://paperswithcode.com/paper/layerwise-learning-rates-for-object-features
Repo
Framework
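
The layer-wise measurement described above, freezing the convolutional layers and fitting a linear decoder on top of a layer's activations, is essentially a linear probe. A generic sketch follows; the frozen feature extractor, the dataset, and the random "activations" are placeholders rather than the CORnet/DeepCluster setup from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(feats_train, y_train, feats_test, y_test):
    """Fit a linear decoder on frozen features and report test accuracy (sketch)."""
    # Flatten any spatial dimensions: each example becomes one feature vector.
    Xtr = feats_train.reshape(len(feats_train), -1)
    Xte = feats_test.reshape(len(feats_test), -1)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, y_train)
    return clf.score(Xte, y_test)

# Toy usage with random "activations" from a hypothetical frozen layer.
rng = np.random.default_rng(0)
f_tr, f_te = rng.normal(size=(200, 8, 4, 4)), rng.normal(size=(50, 8, 4, 4))
y_tr, y_te = rng.integers(0, 3, 200), rng.integers(0, 3, 50)
print(linear_probe_accuracy(f_tr, y_tr, f_te, y_te))
```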

A NEW POINTWISE CONVOLUTION IN DEEP NEURAL NETWORKS THROUGH EXTREMELY FAST AND NON PARAMETRIC TRANSFORMS

Title A NEW POINTWISE CONVOLUTION IN DEEP NEURAL NETWORKS THROUGH EXTREMELY FAST AND NON PARAMETRIC TRANSFORMS
Authors Anonymous
Abstract Some conventional transforms, such as the Discrete Walsh-Hadamard Transform (DWHT) and the Discrete Cosine Transform (DCT), have been widely used as feature extractors in image processing but rarely applied in neural networks. However, we found that these conventional transforms can capture cross-channel correlations without any learnable parameters in DNNs. This paper first proposes applying conventional transforms to pointwise convolution, showing that such transforms significantly reduce the computational complexity of neural networks without degrading accuracy. DWHT in particular requires no floating-point multiplications but only additions and subtractions, which considerably reduces computation overhead. In addition, its fast algorithm further reduces the complexity of floating-point additions from O(n^2) to O(n log n). These non-parametric, low-computation properties yield extremely efficient networks in terms of the number of parameters and operations, while enjoying accuracy gains. Our proposed DWHT-based model gained a 1.49% accuracy increase with 79.4% fewer parameters and 48.4% fewer FLOPs compared with its baseline model (MobileNet-V1) on the CIFAR-100 dataset.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1l0O6EYDH
PDF https://openreview.net/pdf?id=H1l0O6EYDH
PWC https://paperswithcode.com/paper/a-new-pointwise-convolution-in-deep-neural
Repo
Framework
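
The fast Walsh-Hadamard transform mentioned above can replace a learned 1x1 (pointwise) convolution by mixing channels with only additions and subtractions. Below is a minimal, framework-free sketch of the fast transform applied along the channel axis of a feature map; it assumes the channel count is a power of two and omits the normalization and channel-selection details of the paper.

```python
import numpy as np

def fast_walsh_hadamard_channels(x):
    """Fast Walsh-Hadamard transform over the channel axis (sketch).

    x: array of shape (C, H, W) with C a power of two.
    Uses only additions/subtractions; O(C log C) work per spatial location.
    """
    y = x.copy()
    c = y.shape[0]
    assert c & (c - 1) == 0, "channel count must be a power of two"
    h = 1
    while h < c:
        for i in range(0, c, 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b                 # butterfly: sum half
            y[i + h:i + 2 * h] = a - b         # butterfly: difference half
        h *= 2
    return y

# Toy usage: parameter-free "pointwise convolution" of an 8-channel feature map.
fmap = np.random.default_rng(0).normal(size=(8, 5, 5))
mixed = fast_walsh_hadamard_channels(fmap)
print(mixed.shape)  # (8, 5, 5): channels mixed, spatial dimensions untouched
```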

Efficient and Information-Preserving Future Frame Prediction and Beyond

Title Efficient and Information-Preserving Future Frame Prediction and Beyond
Authors Anonymous
Abstract Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios. We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. Our model enjoys the theoretically guaranteed property of no information loss during the feature extraction, much lower memory consumption and computational efficiency. The lightweight nature of our model enables us to incorporate 3D convolutions without concern of memory bottleneck, enhancing the model’s ability to capture both short-term and long-term temporal dependencies. Our proposed approach achieves state-of-the-art results on Moving MNIST, Traffic4cast and KITTI datasets. We further demonstrate the transferability of our self-supervised learning method by exploiting its learnt features for object detection on KITTI. Our competitive results indicate the potential of using CrevNet as a generative pre-training strategy to guide downstream tasks.
Tasks Object Detection, Video Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=B1eY_pVYvB
PDF https://openreview.net/pdf?id=B1eY_pVYvB
PWC https://paperswithcode.com/paper/efficient-and-information-preserving-future
Repo
Framework
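
The "bijective two-way autoencoder" above relies on reversible blocks, in which the input can be exactly reconstructed from the output, so no information is lost during feature extraction. A minimal additive-coupling sketch of that property is shown below; the coupling function is an arbitrary placeholder rather than the paper's architecture.

```python
import numpy as np

def coupling_forward(x1, x2, f):
    """One additive reversible coupling step (sketch): (x1, x2) -> (y1, y2)."""
    return x1 + f(x2), x2

def coupling_inverse(y1, y2, f):
    """Exact inverse of the step above: no information is lost."""
    return y1 - f(y2), y2

# Toy usage: any function f works; here a fixed random nonlinear map.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
f = lambda z: np.tanh(z @ W)
x1, x2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
y1, y2 = coupling_forward(x1, x2, f)
r1, r2 = coupling_inverse(y1, y2, f)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True: perfectly invertible
```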

Deep exploration by novelty-pursuit with maximum state entropy

Title Deep exploration by novelty-pursuit with maximum state entropy
Authors Anonymous
Abstract Efficient exploration is essential to reinforcement learning in huge state spaces. Recent approaches to address this issue include the intrinsically motivated goal exploration process (IMGEP) and maximum state entropy exploration (MSEE). In this paper, we show that goal-conditioned exploration behaviors in IMGEP can also maximize the state entropy, which bridges IMGEP and MSEE. From this connection, we propose a maximum-entropy criterion for goal selection in goal-conditioned exploration, which results in the new exploration method novelty-pursuit. Novelty-pursuit performs exploration in two stages: first, it selects a goal for the goal-conditioned exploration policy to reach the boundary of the explored region; then, it takes random actions to explore the unexplored region. We demonstrate the effectiveness of the proposed method in environments ranging from simple mazes and MuJoCo tasks to the long-horizon video game SuperMarioBros. Experimental results show that the proposed method outperforms state-of-the-art approaches that use curiosity-driven exploration.
Tasks Efficient Exploration
Published 2020-01-01
URL https://openreview.net/forum?id=rygUoeHKvB
PDF https://openreview.net/pdf?id=rygUoeHKvB
PWC https://paperswithcode.com/paper/deep-exploration-by-novelty-pursuit-with
Repo
Framework
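
The two-stage scheme above can be caricatured with visit counts: pick as the goal a visited state with the fewest visits (a crude proxy for the boundary of the explored region and for raising state entropy), move toward it, then act randomly. The chain environment, the greedy stand-in for the goal-conditioned policy, and the count table below are all hypothetical; none of this is the paper's implementation.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy 1-D chain world standing in for a real environment (hypothetical)."""
    actions = (-1, +1)
    def __init__(self, n=20):
        self.n, self.state = n, 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state = max(0, min(self.n - 1, self.state + action))
        return self.state

def select_goal(visit_counts):
    """Pick the least-visited known state: a max-entropy-flavored boundary proxy."""
    return min(visit_counts, key=visit_counts.get)

def novelty_pursuit_episode(env, visit_counts, horizon=60, random_steps=20):
    goal = select_goal(visit_counts) if visit_counts else 0
    state = env.reset()
    for _ in range(horizon):               # stage 1: head toward the chosen goal
        if state == goal:
            break
        state = env.step(+1 if goal > state else -1)
        visit_counts[state] += 1
    for _ in range(random_steps):          # stage 2: random exploration from there
        state = env.step(random.choice(env.actions))
        visit_counts[state] += 1
    return visit_counts

random.seed(0)
counts = defaultdict(int, {0: 1})
for _ in range(5):
    counts = novelty_pursuit_episode(ChainEnv(), counts)
print(dict(counts))
```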

Attack-Resistant Federated Learning with Residual-based Reweighting

Title Attack-Resistant Federated Learning with Residual-based Reweighting
Authors Anonymous
Abstract Federated learning has a variety of applications in multiple domains by utilizing private training data stored on different devices. However, the aggregation process in federated learning is highly vulnerable to adversarial attacks, so the global model may behave abnormally under attack. To tackle this challenge, we present a novel aggregation algorithm with residual-based reweighting to defend federated learning. Our aggregation algorithm combines repeated median regression with the reweighting scheme in iteratively reweighted least squares. Our experiments show that our aggregation algorithm outperforms other alternative algorithms in the presence of label-flipping, backdoor, and Gaussian noise attacks. We also provide theoretical guarantees for our aggregation algorithm.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HkgAJxrYwr
PDF https://openreview.net/pdf?id=HkgAJxrYwr
PWC https://paperswithcode.com/paper/attack-resistant-federated-learning-with
Repo
Framework
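
A heavily simplified, coordinate-wise caricature of the residual-based reweighting idea follows: take a robust center of the client updates per parameter, measure each client's residual from that center, convert standardized residuals into weights in the style of iteratively reweighted least squares, and average with those weights. The repeated-median regression used in the paper is replaced here by a plain median and MAD scale, so this is only a sketch.

```python
import numpy as np

def reweighted_aggregate(client_updates, clip=2.0):
    """Residual-based reweighting sketch for federated aggregation.

    client_updates: (num_clients, num_params) array of model updates.
    Returns one aggregated update of shape (num_params,).
    """
    center = np.median(client_updates, axis=0)            # robust center per parameter
    resid = client_updates - center
    # Robust per-parameter scale (median absolute deviation, with a small floor).
    scale = np.median(np.abs(resid), axis=0) * 1.4826 + 1e-8
    std_resid = np.abs(resid) / scale
    # IRLS-style weights: full weight for small residuals, decaying beyond `clip`.
    weights = np.minimum(1.0, clip / np.maximum(std_resid, 1e-8))
    return (weights * client_updates).sum(axis=0) / weights.sum(axis=0)

# Toy usage: 9 honest clients near 1.0, one attacker pushing large values.
rng = np.random.default_rng(0)
updates = np.vstack([rng.normal(1.0, 0.1, (9, 5)), np.full((1, 5), 10.0)])
print(reweighted_aggregate(updates))   # stays close to 1.0 despite the outlier
```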

Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks

Title Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks
Authors Anonymous
Abstract Many state-of-the-art models rely on two architectural innovations: skip connections and batch normalization. However, batch normalization has a number of limitations. It breaks the independence between training examples within a batch, performs poorly when the batch size is too small, and significantly increases the cost of computing a parameter update in some models. This work identifies two practical benefits of batch normalization. First, it improves the final test accuracy. Second, it enables efficient training with larger batches and larger learning rates. However, we demonstrate that the increase in the largest stable learning rate does not explain why the final test accuracy is increased under a finite epoch budget. Furthermore, we show that the gap in test accuracy between residual networks with and without batch normalization can be dramatically reduced by improving the initialization scheme. We introduce “ZeroInit”, which trains a 1000 layer deep Wide-ResNet without normalization to 94.3% test accuracy on CIFAR-10 in 200 epochs at batch size 64. This initialization scheme outperforms batch normalization when the batch size is very small, and is competitive with batch normalization for batch sizes that are not too large. We also show that ZeroInit matches the validation accuracy of batch normalization when training ResNet-50-V2 on ImageNet at batch size 1024.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJeVklHtPr
PDF https://openreview.net/pdf?id=BJeVklHtPr
PWC https://paperswithcode.com/paper/batch-normalization-has-multiple-benefits-an
Repo
Framework
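
The "ZeroInit" scheme above can be summarized by one idea: initialize each residual branch so that it contributes nothing at the start of training, making a very deep residual stack behave like the identity at initialization. A minimal sketch with a zero-initialized scalar on the residual branch is shown below; the exact placement and additional details of the paper's scheme are not reproduced.

```python
import numpy as np

class ZeroInitResidualBlock:
    """Residual block whose branch is scaled by a scalar initialized to zero (sketch)."""
    def __init__(self, dim, rng):
        self.W1 = rng.normal(0, np.sqrt(2.0 / dim), (dim, dim))
        self.W2 = rng.normal(0, np.sqrt(2.0 / dim), (dim, dim))
        self.alpha = 0.0   # ZeroInit-style scalar: the residual branch is silenced at init
    def __call__(self, x):
        branch = np.maximum(x @ self.W1, 0) @ self.W2
        return x + self.alpha * branch

rng = np.random.default_rng(0)
blocks = [ZeroInitResidualBlock(16, rng) for _ in range(1000)]
x = rng.normal(size=(4, 16))
h = x
for blk in blocks:        # even a 1000-block stack is well behaved at initialization
    h = blk(h)
print(np.allclose(h, x))  # True: the network computes the identity at init
```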

ADAPTIVE GENERATION OF PROGRAMMING PUZZLES

Title ADAPTIVE GENERATION OF PROGRAMMING PUZZLES
Authors Anonymous
Abstract AI today is far from being able to write complex programs. What type of problems would be best for computers to learn to program, and how should such problems be generated? To answer the first question, we suggest programming puzzles as a domain for teaching computers programming. A programming puzzle consists of a short program for a Boolean function f(x) and the goal is, given the source code, to find an input that makes f return True. Puzzles are objective in that one can easily test the correctness of a given solution x by seeing whether it satisfies f, unlike the most common representations for program synthesis: given input-output pairs or an English problem description, the correctness of a given solution is not determined and is debatable. To address the second question of automatic puzzle generation, we suggest a GAN-like generation algorithm called “Troublemaker” which can generate puzzles targeted at any given puzzle-solver. The main innovation is that it adapts to one or more given puzzle-solvers: rather than generating a single dataset of puzzles, Tro
Tasks Program Synthesis
Published 2020-01-01
URL https://openreview.net/forum?id=HJeRveHKDH
PDF https://openreview.net/pdf?id=HJeRveHKDH
PWC https://paperswithcode.com/paper/adaptive-generation-of-programming-puzzles
Repo
Framework
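
The puzzle format described above, a short Boolean function f where a solution is any input x with f(x) == True, is easy to make concrete. The two example puzzles and the brute-force checker below are illustrative inventions, not samples from the paper's Troublemaker generator.

```python
def verify(f, x):
    """A solution to puzzle f is simply any x that makes f return True."""
    try:
        return bool(f(x))
    except Exception:
        return False

# Two hypothetical puzzles in the format described above.
puzzle_a = lambda x: isinstance(x, int) and x > 0 and x * x == 1764
puzzle_b = lambda s: isinstance(s, str) and len(s) == 5 and s == s[::-1] and s.count("a") == 3

# A trivial brute-force "solver" for the integer puzzle.
solution_a = next(x for x in range(10_000) if verify(puzzle_a, x))
print(solution_a, verify(puzzle_a, solution_a))   # 42 True

# Correctness of a candidate is objective: just run the checker.
print(verify(puzzle_b, "axaxa"))                  # True
```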

Neural Program Synthesis By Self-Learning

Title Neural Program Synthesis By Self-Learning
Authors Anonymous
Abstract Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs. In this paper, we focus on generating a chunk of assembly code that can be executed to match a state change inside the CPU. We develop a neural program synthesis algorithm, AutoAssemblet, learned via self-learning reinforcement learning, which explores the large code space efficiently. Policy networks and value networks are learned to reduce the breadth and depth of the Monte Carlo Tree Search, resulting in better synthesis performance. We also propose an effective multi-entropy policy sampling technique to alleviate online update correlations. We apply AutoAssemblet to basic programming tasks and show significantly higher success rates compared to several competing baselines.
Tasks Program Synthesis
Published 2020-01-01
URL https://openreview.net/forum?id=Hkls_yBKDB
PDF https://openreview.net/pdf?id=Hkls_yBKDB
PWC https://paperswithcode.com/paper/neural-program-synthesis-by-self-learning
Repo
Framework
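
The synthesis target above, finding a chunk of assembly-like code whose execution reproduces a given CPU state change, can be made concrete with a toy register machine and an exhaustive search standing in for the paper's learned policy/value-guided MCTS. Everything here (the instruction set, the machine, the brute-force search) is a deliberately tiny illustration, not AutoAssemblet.

```python
from itertools import product

# A toy two-register machine: the state is a tuple (r0, r1).
INSTRUCTIONS = {
    "inc r0": lambda s: (s[0] + 1, s[1]),
    "inc r1": lambda s: (s[0], s[1] + 1),
    "add r0 r1": lambda s: (s[0] + s[1], s[1]),
    "mov r1 r0": lambda s: (s[0], s[0]),
}

def run(program, state):
    """Execute a list of instruction names on the toy machine."""
    for instr in program:
        state = INSTRUCTIONS[instr](state)
    return state

def synthesize(start, target, max_len=4):
    """Brute-force stand-in for learned search: find a program mapping start -> target."""
    for length in range(1, max_len + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            if run(program, start) == target:
                return list(program)
    return None

# Toy usage: which short program turns state (2, 3) into (5, 5)?
print(synthesize((2, 3), (5, 5)))
```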