April 1, 2020

3210 words 16 mins read

Paper Group NANR 123

Redundancy-Free Computation Graphs for Graph Neural Networks. Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression. Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach. EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization. Out-of-distribution Detec …

Redundancy-Free Computation Graphs for Graph Neural Networks


Title	Redundancy-Free Computation Graphs for Graph Neural Networks
Authors	Anonymous
Abstract	Graph Neural Networks (GNNs) are based on repeated aggregations of information across nodes’ neighbors in a graph. However, because common neighbors are shared between different nodes, this leads to repeated and inefficient computations.We propose Hierarchically Aggregated computation Graphs (HAGs), a new GNN graph representation that explicitly avoids redundancy by managing intermediate aggregation results hierarchically, and eliminating repeated computations and unnecessary data transfers in GNN training and inference. We introduce an accurate cost function to quantitatively evaluate the runtime performance of different HAGsand use a novel search algorithm to find optimized HAGs. Experiments show that the HAG representation significantly outperforms the standard GNN graph representation by increasing the end-to-end training throughput by up to 2.8x and reducing the aggregations and data transfers in GNN training by up to 6.3x and 5.6x. Meanwhile, HAGs improve runtime performance by preserving GNNcomputation, and maintain the original model accuracy for arbitrary GNNs.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=H1eF3kStPS
PDF	https://openreview.net/pdf?id=H1eF3kStPS
PWC	https://paperswithcode.com/paper/redundancy-free-computation-graphs-for-graph-1
Repo
Framework

Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression


Title	Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression
Authors	Anonymous
Abstract	State-of-the-art quantization methods can compress deep neural networks down to 4 bits without losing accuracy. However, when it comes to 2 bits, the performance drop is still noticeable. One problem in these methods is that they assign equal bit rate to quantize weights and activations in all layers, which is not reasonable in the case of high rate compression (such as 2-bit quantization), as some of layers in deep neural networks are sensitive to quantization and performing coarse quantization on these layers can hurt the accuracy. In this paper, we address an important problem of how to optimize the bit allocation of weights and activations for deep CNNs compression. We first explore the additivity of output error caused by quantization and find that additivity property holds for deep neural networks which are continuously differentiable in the layers. Based on this observation, we formulate the optimal bit allocation problem of weights and activations in a joint framework and propose a very efficient method to solve the optimization problem via Lagrangian Formulation. Our method obtains excellent results on deep neural networks. It can compress deep CNN ResNet-50 down to 2 bits with only 0.7% accuracy loss. To the best our knowledge, this is the first paper that reports 2-bit results on deep CNNs without hurting the accuracy.
Tasks	Quantization
Published	2020-01-01
URL	https://openreview.net/forum?id=H1eKT1SFvH
PDF	https://openreview.net/pdf?id=H1eKT1SFvH
PWC	https://paperswithcode.com/paper/towards-effective-2-bit-quantization-pareto
Repo
Framework

Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach


Title	Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach
Authors	Anonymous
Abstract	Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann–Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem’s loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science.
Tasks	Stochastic Optimization
Published	2020-01-01
URL	https://openreview.net/forum?id=rkxZyaNtwB
PDF	https://openreview.net/pdf?id=rkxZyaNtwB
PWC	https://paperswithcode.com/paper/online-and-stochastic-optimization-beyond
Repo
Framework

EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization


Title	EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization
Authors	Anonymous
Abstract	We present an end-to-end design methodology for efficient deep learning deployment. Unlike previous methods that separately optimize the neural network architecture, pruning policy, and quantization policy, we jointly optimize them in an end-to-end manner. To deal with the larger design space it brings, we train a quantization-aware accuracy predictor that fed to the evolutionary search to select the best fit. We first generate a large dataset of <NN architecture, ImageNet accuracy> pairs without training each architecture, but by sampling a unified supernet. Then we use these data to train an accuracy predictor without quantization, further using predictor-transfer technique to get the quantization-aware predictor, which reduces the amount of post-quantization fine-tuning time. Extensive experiments on ImageNet show the benefits of the end-to-end methodology: it maintains the same accuracy (75.1%) as ResNet34 float model while saving 2.2× BitOps comparing with the 8-bit model; we obtain the same level accuracy as MobileNetV2+HAQ while achieving 2×/1.3× latency/energy saving; the end-to-end optimization outperforms separate optimizations using ProxylessNAS+AMC+HAQ by 2.3% accuracy while reducing orders of magnitude GPU hours and CO2 emission.
Tasks	Quantization
Published	2020-01-01
URL	https://openreview.net/forum?id=Bkgr2kHKDH
PDF	https://openreview.net/pdf?id=Bkgr2kHKDH
PWC	https://paperswithcode.com/paper/ems-end-to-end-model-search-for-network
Repo
Framework

Out-of-distribution Detection in Few-shot Classification


Title	Out-of-distribution Detection in Few-shot Classification
Authors	Anonymous
Abstract	In many real-world settings, a learning model must perform few-shot classification: learn to classify examples from unseen classes using only a few labeled examples per class. Additionally, to be safely deployed, it should have the ability to detect out-of-distribution inputs: examples that do not belong to any of the classes. While both few-shot classification and out-of-distribution detection are popular topics, their combination has not been studied. In this work, we propose tasks for out-of-distribution detection in the few-shot setting and establish benchmark datasets, based on four popular few-shot classification datasets. Then, we propose two new methods for this task and investigate their performance. In sum, we establish baseline out-of-distribution detection results using standard metrics on new benchmark datasets and show improved results with our proposed methods.
Tasks	Out-of-Distribution Detection
Published	2020-01-01
URL	https://openreview.net/forum?id=BygNAa4YPH
PDF	https://openreview.net/pdf?id=BygNAa4YPH
PWC	https://paperswithcode.com/paper/out-of-distribution-detection-in-few-shot
Repo
Framework

ON SOLVING COOPERATIVE DECENTRALIZED MARL PROBLEMS WITH SPARSE REINFORCEMENTS


Title	ON SOLVING COOPERATIVE DECENTRALIZED MARL PROBLEMS WITH SPARSE REINFORCEMENTS
Authors	Anonymous
Abstract	Decentralized decision makers learn to cooperate and make decisions in many domains including (but not limited to) search and rescue, drone delivery, box pushing and fire fighting problems. In these cooperative domains, a key challenge is one of sparse rewards, i.e., rewards/reinforcements are obtained only in a few situations (e.g., on extinguishing a fire, on moving a box) and in most other situations there is no reward/reinforcement. The problem of learning with sparse reinforcements is extremely challenging in cooperative Multi-Agent Reinforcement Learning (MARL) problems due to two reasons: (a) Compared to the single agent case, exploration is harder as multiple agents have to be coordinated to receive the reinforcements; and (b) Environment is not stationary as all the agents are learning at the same time (and therefore change policies) and therefore the limited (due to sparse rewards) good experiences can be quickly forgotten. One approach that is scalable, decentralized and has shown great performance in general MARL problems is Neural Fictitious Self Play (NFSP). However, since NFSP averages best response policies, a good policy can be drowned in a deluge of bad best-response policies that come about due to sparse rewards. In this paper, we provide a mechanism for imitation of good experiences within NFSP that ensures good policies do not get overwhelmed by bad policies. We then provide an intuitive justification for why self imitation within NFSP can improve performance and how imitation does not impact the fictitious play aspect of NFSP. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach.
Tasks	Multi-agent Reinforcement Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=SJe2YxBtPB
PDF	https://openreview.net/pdf?id=SJe2YxBtPB
PWC	https://paperswithcode.com/paper/on-solving-cooperative-decentralized-marl
Repo
Framework

Emergent Communication in Networked Multi-Agent Reinforcement Learning


Title	Emergent Communication in Networked Multi-Agent Reinforcement Learning
Authors	Anonymous
Abstract	With the ever increasing demand and the resultant reduced quality of services, the focus has shifted towards easing network congestion to enable more efficient flow in systems like traffic, supply chains and electrical grids. A step in this direction is to re-imagine the traditional heuristics based training of systems as this approach is incapable of modelling the involved dynamics. While one can apply Multi-Agent Reinforcement Learning (MARL) to such problems by considering each vertex in the network as an agent, most MARL-based models assume the agents to be independent. In many real-world tasks, agents need to behave as a group, rather than as a collection of individuals. In this paper, we propose a framework that induces cooperation and coordination amongst agents, connected via an underlying network, using emergent communication in a MARL-based setup. We formulate the problem in a general network setting and demonstrate the utility of communication in networks with the help of a case study on traffic systems. Furthermore, we study the emergent communication protocol and show the formation of distinct communities with grounded vocabulary. To the best of our knowledge, this is the only work that studies emergent language in a networked MARL setting.
Tasks	Multi-agent Reinforcement Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=H1lUp1BYDH
PDF	https://openreview.net/pdf?id=H1lUp1BYDH
PWC	https://paperswithcode.com/paper/emergent-communication-in-networked-multi
Repo
Framework

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning


Title	Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Authors	Anonymous
Abstract	In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially.
Tasks	Multi-agent Reinforcement Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=SJxbHkrKDH
PDF	https://openreview.net/pdf?id=SJxbHkrKDH
PWC	https://paperswithcode.com/paper/evolutionary-population-curriculum-for
Repo
Framework

Uncertainty - sensitive learning and planning with ensembles


Title	Uncertainty - sensitive learning and planning with ensembles
Authors	Anonymous
Abstract	We propose a reinforcement learning framework for discrete environments in which an agent optimizes its behavior on two timescales. For the short one, it uses tree search methods to perform tactical decisions. The long strategic level is handled with an ensemble of value functions learned using $TD$-like backups. Combining these two techniques brings synergies. The planning module performs \textit{what-if} analysis allowing to avoid short-term pitfalls and boost backups of the value function. Notably, our method performs well in environments with sparse rewards where standard $TD(1)$ backups fail. On the other hand, the value functions compensate for inherent short-sightedness of planning. Importantly, we use ensembles to measure the epistemic uncertainty of value functions. This serves two purposes: a) it stabilizes planning, b) it guides exploration. We evaluate our methods on discrete environments with sparse rewards: the Deep sea chain environment, toy Montezuma’s Revenge, and Sokoban. In all the cases, we obtain speed-up of learning and boost to the final performance.
Tasks	Montezuma’s Revenge
Published	2020-01-01
URL	https://openreview.net/forum?id=SkglVlSFPS
PDF	https://openreview.net/pdf?id=SkglVlSFPS
PWC	https://paperswithcode.com/paper/uncertainty-sensitive-learning-and-planning
Repo
Framework

On Bonus Based Exploration Methods In The Arcade Learning Environment


Title	On Bonus Based Exploration Methods In The Arcade Learning Environment
Authors	Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
Abstract	Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma’s Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma’s Revenge, Bellemare et al.‘s set of hard of exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher score on Montezuma’s Revenge they do not provide meaningful gains over the simpler epsilon-greedy scheme. In fact, we find that methods that perform best on that game often underperform epsilon-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow’s 200 million) on Bellemare et al.‘s hard exploration games. Our results suggest that recent gains in Montezuma’s Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
Tasks	Atari Games, Montezuma’s Revenge
Published	2020-01-01
URL	https://openreview.net/forum?id=BJewlyStDr
PDF	https://openreview.net/pdf?id=BJewlyStDr
PWC	https://paperswithcode.com/paper/on-bonus-based-exploration-methods-in-the
Repo
Framework

Pruning Depthwise Separable Convolutions for Extra Efficiency Gain of Lightweight Models


Title	Pruning Depthwise Separable Convolutions for Extra Efficiency Gain of Lightweight Models
Authors	Anonymous
Abstract	Deep convolutional neural networks are good at accuracy while bad at efficiency. To improve the inference speed, two kinds of directions are developed, lightweight model designing and network weight pruning. Lightweight models have been proposed to improve the speed with good enough accuracy. It is, however, not trivial if we can further speed up these “compact” models by weight pruning. In this paper, we present a technique to gradually prune the depthwise separable convolution networks, such as MobileNet, for improving the speed of this kind of “dense” network. When pruning depthwise separable convolutions, we need to consider more structural constraints to ensure the speedup of inference. Instead of pruning the model with the desired ratio in one stage, the proposed multi-stage gradual pruning approach can stably prune the filters with a finer pruning ratio. Our method achieves 1.68 times speedup with neglectable accuracy drop for MobileNetV2.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=B1g5qyHYPS
PDF	https://openreview.net/pdf?id=B1g5qyHYPS
PWC	https://paperswithcode.com/paper/pruning-depthwise-separable-convolutions-for
Repo
Framework

ROBUST GENERATIVE ADVERSARIAL NETWORK


Title	ROBUST GENERATIVE ADVERSARIAL NETWORK
Authors	Shufei Zhang, Zhuang Qian, Kaizhu Huang, Rui Zhang, Jimin Xiao
Abstract	Generative adversarial networks (GANs) are powerful generative models, but usually suffer from instability which may lead to poor generations. Most existing works try to alleviate this problem by focusing on stabilizing the training of the discriminator, which unfortunately ignores the robustness of generator and discriminator. In this work, we consider the robustness of GANs and propose a novel robust method called robust generative adversarial network (RGAN). Particularly, we design a robust optimization framework where the generator and discriminator compete with each other in a worst-case setting within a small Wasserstein ball. The generator tries to map the worst input distribution (rather than a specific input distribution, typically a Gaussian distribution used in most GANs) to the real data distribution, while the discriminator attempts to distinguish the real and fake distribution with the worst perturbation. We have provided theories showing that the generalization of the new robust framework can be guaranteed. A series of experiments on CIFAR-10, STL-10 and CelebA datasets indicate that our proposed robust framework can improve consistently on four baseline GAN models. We also provide ablation analysis and visualization showing the efficacy of our method on both generator and discriminator quantitatively and qualitatively.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HyxCRCEKwB
PDF	https://openreview.net/pdf?id=HyxCRCEKwB
PWC	https://paperswithcode.com/paper/robust-generative-adversarial-network
Repo
Framework

Reject Illegal Inputs: Scaling Generative Classifiers with Supervised Deep Infomax


Title	Reject Illegal Inputs: Scaling Generative Classifiers with Supervised Deep Infomax
Authors	Anonymous
Abstract	Deep Infomax~(DIM) is an unsupervised representation learning framework by maximizing the mutual information between the inputs and the outputs of an encoder, while probabilistic constraints are imposed on the outputs. In this paper, we propose Supervised Deep InfoMax~(SDIM), which introduces supervised probabilistic constraints to the encoder outputs. The supervised probabilistic constraints are equivalent to a generative classifier on high-level data representations, where class conditional log-likelihoods of samples can be evaluated. Unlike other works building generative classifiers with conditional generative models, SDIMs scale on complex datasets, and can achieve comparable performance with discriminative counterparts. With SDIM, we could perform \emph{classification with rejection}. Instead of always reporting a class label, SDIM only makes predictions when test samples’ largest logits surpass some pre-chosen thresholds, otherwise they will be deemed as out of the data distributions, and be rejected. Our experiments show that SDIM with rejection policy can effectively reject illegal inputs including out-of-distribution samples and adversarial examples.
Tasks	Representation Learning, Unsupervised Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rkg98yBFDr
PDF	https://openreview.net/pdf?id=rkg98yBFDr
PWC	https://paperswithcode.com/paper/reject-illegal-inputs-scaling-generative
Repo
Framework

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs


Title	Hyper-SAGNN: a self-attention based graph neural network for hypergraphs
Authors	Anonymous
Abstract	Graph representation learning for hypergraphs can be utilized to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. We believe that Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
Tasks	Graph Representation Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=ryeHuJBtPH
PDF	https://openreview.net/pdf?id=ryeHuJBtPH
PWC	https://paperswithcode.com/paper/hyper-sagnn-a-self-attention-based-graph
Repo
Framework

Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient


Title	Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient
Authors	Anonymous
Abstract	Determinantal point processes (DPPs) is an effective tool to deliver diversity on multiple machine learning and computer vision tasks. Under deep learning framework, DPP is typically optimized via approximation, which is not straightforward and has some conflict with diversity requirement. We note, however, there has been no deep learning paradigms to optimize DPP directly since it involves matrix inversion which may result in highly computational instability. This fact greatly hinders the wide use of DPP on some specific objectives where DPP serves as a term to measure the feature diversity. In this paper, we devise a simple but effective algorithm to address this issue to optimize DPP term directly expressed with L-ensemble in spectral domain over gram matrix, which is more flexible than learning on parametric kernels. By further taking into account some geometric constraints, our algorithm seeks to generate valid sub-gradients of DPP term in case when the DPP gram matrix is not invertible (no gradients exist in this case). In this sense, our algorithm can be easily incorporated with multiple deep learning tasks. Experiments show the effectiveness of our algorithm, indicating promising performance for practical learning problems.
Tasks	Point Processes
Published	2020-01-01
URL	https://openreview.net/forum?id=rkeIq2VYPr
PDF	https://openreview.net/pdf?id=rkeIq2VYPr
PWC	https://paperswithcode.com/paper/deep-learning-of-determinantal-point
Repo
Framework