Paper Group NANR 123
Redundancy-Free Computation Graphs for Graph Neural Networks. Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression. Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach. EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization. Out-of-distribution Detec …
Redundancy-Free Computation Graphs for Graph Neural Networks
Title | Redundancy-Free Computation Graphs for Graph Neural Networks |
Authors | Anonymous |
Abstract | Graph Neural Networks (GNNs) are based on repeated aggregations of information across nodes’ neighbors in a graph. However, because common neighbors are shared between different nodes, this leads to repeated and inefficient computations.We propose Hierarchically Aggregated computation Graphs (HAGs), a new GNN graph representation that explicitly avoids redundancy by managing intermediate aggregation results hierarchically, and eliminating repeated computations and unnecessary data transfers in GNN training and inference. We introduce an accurate cost function to quantitatively evaluate the runtime performance of different HAGsand use a novel search algorithm to find optimized HAGs. Experiments show that the HAG representation significantly outperforms the standard GNN graph representation by increasing the end-to-end training throughput by up to 2.8x and reducing the aggregations and data transfers in GNN training by up to 6.3x and 5.6x. Meanwhile, HAGs improve runtime performance by preserving GNNcomputation, and maintain the original model accuracy for arbitrary GNNs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1eF3kStPS |
https://openreview.net/pdf?id=H1eF3kStPS | |
PWC | https://paperswithcode.com/paper/redundancy-free-computation-graphs-for-graph-1 |
Repo | |
Framework | |
Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression
Title | Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression |
Authors | Anonymous |
Abstract | State-of-the-art quantization methods can compress deep neural networks down to 4 bits without losing accuracy. However, when it comes to 2 bits, the performance drop is still noticeable. One problem in these methods is that they assign equal bit rate to quantize weights and activations in all layers, which is not reasonable in the case of high rate compression (such as 2-bit quantization), as some of layers in deep neural networks are sensitive to quantization and performing coarse quantization on these layers can hurt the accuracy. In this paper, we address an important problem of how to optimize the bit allocation of weights and activations for deep CNNs compression. We first explore the additivity of output error caused by quantization and find that additivity property holds for deep neural networks which are continuously differentiable in the layers. Based on this observation, we formulate the optimal bit allocation problem of weights and activations in a joint framework and propose a very efficient method to solve the optimization problem via Lagrangian Formulation. Our method obtains excellent results on deep neural networks. It can compress deep CNN ResNet-50 down to 2 bits with only 0.7% accuracy loss. To the best our knowledge, this is the first paper that reports 2-bit results on deep CNNs without hurting the accuracy. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1eKT1SFvH |
https://openreview.net/pdf?id=H1eKT1SFvH | |
PWC | https://paperswithcode.com/paper/towards-effective-2-bit-quantization-pareto |
Repo | |
Framework | |
Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach
Title | Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach |
Authors | Anonymous |
Abstract | Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann–Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem’s loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxZyaNtwB |
https://openreview.net/pdf?id=rkxZyaNtwB | |
PWC | https://paperswithcode.com/paper/online-and-stochastic-optimization-beyond |
Repo | |
Framework | |
EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization
Title | EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization |
Authors | Anonymous |
Abstract | We present an end-to-end design methodology for efficient deep learning deployment. Unlike previous methods that separately optimize the neural network architecture, pruning policy, and quantization policy, we jointly optimize them in an end-to-end manner. To deal with the larger design space it brings, we train a quantization-aware accuracy predictor that fed to the evolutionary search to select the best fit. We first generate a large dataset of <NN architecture, ImageNet accuracy> pairs without training each architecture, but by sampling a unified supernet. Then we use these data to train an accuracy predictor without quantization, further using predictor-transfer technique to get the quantization-aware predictor, which reduces the amount of post-quantization fine-tuning time. Extensive experiments on ImageNet show the benefits of the end-to-end methodology: it maintains the same accuracy (75.1%) as ResNet34 float model while saving 2.2× BitOps comparing with the 8-bit model; we obtain the same level accuracy as MobileNetV2+HAQ while achieving 2×/1.3× latency/energy saving; the end-to-end optimization outperforms separate optimizations using ProxylessNAS+AMC+HAQ by 2.3% accuracy while reducing orders of magnitude GPU hours and CO2 emission. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkgr2kHKDH |
https://openreview.net/pdf?id=Bkgr2kHKDH | |
PWC | https://paperswithcode.com/paper/ems-end-to-end-model-search-for-network |
Repo | |
Framework | |
Out-of-distribution Detection in Few-shot Classification
Title | Out-of-distribution Detection in Few-shot Classification |
Authors | Anonymous |
Abstract | In many real-world settings, a learning model must perform few-shot classification: learn to classify examples from unseen classes using only a few labeled examples per class. Additionally, to be safely deployed, it should have the ability to detect out-of-distribution inputs: examples that do not belong to any of the classes. While both few-shot classification and out-of-distribution detection are popular topics, their combination has not been studied. In this work, we propose tasks for out-of-distribution detection in the few-shot setting and establish benchmark datasets, based on four popular few-shot classification datasets. Then, we propose two new methods for this task and investigate their performance. In sum, we establish baseline out-of-distribution detection results using standard metrics on new benchmark datasets and show improved results with our proposed methods. |
Tasks | Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygNAa4YPH |
https://openreview.net/pdf?id=BygNAa4YPH | |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-in-few-shot |
Repo | |
Framework | |
ON SOLVING COOPERATIVE DECENTRALIZED MARL PROBLEMS WITH SPARSE REINFORCEMENTS
Title | ON SOLVING COOPERATIVE DECENTRALIZED MARL PROBLEMS WITH SPARSE REINFORCEMENTS |
Authors | Anonymous |
Abstract | Decentralized decision makers learn to cooperate and make decisions in many domains including (but not limited to) search and rescue, drone delivery, box pushing and fire fighting problems. In these cooperative domains, a key challenge is one of sparse rewards, i.e., rewards/reinforcements are obtained only in a few situations (e.g., on extinguishing a fire, on moving a box) and in most other situations there is no reward/reinforcement. The problem of learning with sparse reinforcements is extremely challenging in cooperative Multi-Agent Reinforcement Learning (MARL) problems due to two reasons: (a) Compared to the single agent case, exploration is harder as multiple agents have to be coordinated to receive the reinforcements; and (b) Environment is not stationary as all the agents are learning at the same time (and therefore change policies) and therefore the limited (due to sparse rewards) good experiences can be quickly forgotten. One approach that is scalable, decentralized and has shown great performance in general MARL problems is Neural Fictitious Self Play (NFSP). However, since NFSP averages best response policies, a good policy can be drowned in a deluge of bad best-response policies that come about due to sparse rewards. In this paper, we provide a mechanism for imitation of good experiences within NFSP that ensures good policies do not get overwhelmed by bad policies. We then provide an intuitive justification for why self imitation within NFSP can improve performance and how imitation does not impact the fictitious play aspect of NFSP. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJe2YxBtPB |
https://openreview.net/pdf?id=SJe2YxBtPB | |
PWC | https://paperswithcode.com/paper/on-solving-cooperative-decentralized-marl |
Repo | |
Framework | |
Emergent Communication in Networked Multi-Agent Reinforcement Learning
Title | Emergent Communication in Networked Multi-Agent Reinforcement Learning |
Authors | Anonymous |
Abstract | With the ever increasing demand and the resultant reduced quality of services, the focus has shifted towards easing network congestion to enable more efficient flow in systems like traffic, supply chains and electrical grids. A step in this direction is to re-imagine the traditional heuristics based training of systems as this approach is incapable of modelling the involved dynamics. While one can apply Multi-Agent Reinforcement Learning (MARL) to such problems by considering each vertex in the network as an agent, most MARL-based models assume the agents to be independent. In many real-world tasks, agents need to behave as a group, rather than as a collection of individuals. In this paper, we propose a framework that induces cooperation and coordination amongst agents, connected via an underlying network, using emergent communication in a MARL-based setup. We formulate the problem in a general network setting and demonstrate the utility of communication in networks with the help of a case study on traffic systems. Furthermore, we study the emergent communication protocol and show the formation of distinct communities with grounded vocabulary. To the best of our knowledge, this is the only work that studies emergent language in a networked MARL setting. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1lUp1BYDH |
https://openreview.net/pdf?id=H1lUp1BYDH | |
PWC | https://paperswithcode.com/paper/emergent-communication-in-networked-multi |
Repo | |
Framework | |
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Title | Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning |
Authors | Anonymous |
Abstract | In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxbHkrKDH |
https://openreview.net/pdf?id=SJxbHkrKDH | |
PWC | https://paperswithcode.com/paper/evolutionary-population-curriculum-for |
Repo | |
Framework | |
Uncertainty - sensitive learning and planning with ensembles
Title | Uncertainty - sensitive learning and planning with ensembles |
Authors | Anonymous |
Abstract | We propose a reinforcement learning framework for discrete environments in which an agent optimizes its behavior on two timescales. For the short one, it uses tree search methods to perform tactical decisions. The long strategic level is handled with an ensemble of value functions learned using $TD$-like backups. Combining these two techniques brings synergies. The planning module performs \textit{what-if} analysis allowing to avoid short-term pitfalls and boost backups of the value function. Notably, our method performs well in environments with sparse rewards where standard $TD(1)$ backups fail. On the other hand, the value functions compensate for inherent short-sightedness of planning. Importantly, we use ensembles to measure the epistemic uncertainty of value functions. This serves two purposes: a) it stabilizes planning, b) it guides exploration. We evaluate our methods on discrete environments with sparse rewards: the Deep sea chain environment, toy Montezuma’s Revenge, and Sokoban. In all the cases, we obtain speed-up of learning and boost to the final performance. |
Tasks | Montezuma’s Revenge |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkglVlSFPS |
https://openreview.net/pdf?id=SkglVlSFPS | |
PWC | https://paperswithcode.com/paper/uncertainty-sensitive-learning-and-planning |
Repo | |
Framework | |
On Bonus Based Exploration Methods In The Arcade Learning Environment
Title | On Bonus Based Exploration Methods In The Arcade Learning Environment |
Authors | Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare |
Abstract | Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma’s Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma’s Revenge, Bellemare et al.‘s set of hard of exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher score on Montezuma’s Revenge they do not provide meaningful gains over the simpler epsilon-greedy scheme. In fact, we find that methods that perform best on that game often underperform epsilon-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow’s 200 million) on Bellemare et al.‘s hard exploration games. Our results suggest that recent gains in Montezuma’s Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain. |
Tasks | Atari Games, Montezuma’s Revenge |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJewlyStDr |
https://openreview.net/pdf?id=BJewlyStDr | |
PWC | https://paperswithcode.com/paper/on-bonus-based-exploration-methods-in-the |
Repo | |
Framework | |
Pruning Depthwise Separable Convolutions for Extra Efficiency Gain of Lightweight Models
Title | Pruning Depthwise Separable Convolutions for Extra Efficiency Gain of Lightweight Models |
Authors | Anonymous |
Abstract | Deep convolutional neural networks are good at accuracy while bad at efficiency. To improve the inference speed, two kinds of directions are developed, lightweight model designing and network weight pruning. Lightweight models have been proposed to improve the speed with good enough accuracy. It is, however, not trivial if we can further speed up these “compact” models by weight pruning. In this paper, we present a technique to gradually prune the depthwise separable convolution networks, such as MobileNet, for improving the speed of this kind of “dense” network. When pruning depthwise separable convolutions, we need to consider more structural constraints to ensure the speedup of inference. Instead of pruning the model with the desired ratio in one stage, the proposed multi-stage gradual pruning approach can stably prune the filters with a finer pruning ratio. Our method achieves 1.68 times speedup with neglectable accuracy drop for MobileNetV2. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1g5qyHYPS |
https://openreview.net/pdf?id=B1g5qyHYPS | |
PWC | https://paperswithcode.com/paper/pruning-depthwise-separable-convolutions-for |
Repo | |
Framework | |
ROBUST GENERATIVE ADVERSARIAL NETWORK
Title | ROBUST GENERATIVE ADVERSARIAL NETWORK |
Authors | Shufei Zhang, Zhuang Qian, Kaizhu Huang, Rui Zhang, Jimin Xiao |
Abstract | Generative adversarial networks (GANs) are powerful generative models, but usually suffer from instability which may lead to poor generations. Most existing works try to alleviate this problem by focusing on stabilizing the training of the discriminator, which unfortunately ignores the robustness of generator and discriminator. In this work, we consider the robustness of GANs and propose a novel robust method called robust generative adversarial network (RGAN). Particularly, we design a robust optimization framework where the generator and discriminator compete with each other in a worst-case setting within a small Wasserstein ball. The generator tries to map the worst input distribution (rather than a specific input distribution, typically a Gaussian distribution used in most GANs) to the real data distribution, while the discriminator attempts to distinguish the real and fake distribution with the worst perturbation. We have provided theories showing that the generalization of the new robust framework can be guaranteed. A series of experiments on CIFAR-10, STL-10 and CelebA datasets indicate that our proposed robust framework can improve consistently on four baseline GAN models. We also provide ablation analysis and visualization showing the efficacy of our method on both generator and discriminator quantitatively and qualitatively. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxCRCEKwB |
https://openreview.net/pdf?id=HyxCRCEKwB | |
PWC | https://paperswithcode.com/paper/robust-generative-adversarial-network |
Repo | |
Framework | |
Reject Illegal Inputs: Scaling Generative Classifiers with Supervised Deep Infomax
Title | Reject Illegal Inputs: Scaling Generative Classifiers with Supervised Deep Infomax |
Authors | Anonymous |
Abstract | Deep Infomax~(DIM) is an unsupervised representation learning framework by maximizing the mutual information between the inputs and the outputs of an encoder, while probabilistic constraints are imposed on the outputs. In this paper, we propose Supervised Deep InfoMax~(SDIM), which introduces supervised probabilistic constraints to the encoder outputs. The supervised probabilistic constraints are equivalent to a generative classifier on high-level data representations, where class conditional log-likelihoods of samples can be evaluated. Unlike other works building generative classifiers with conditional generative models, SDIMs scale on complex datasets, and can achieve comparable performance with discriminative counterparts. With SDIM, we could perform \emph{classification with rejection}. Instead of always reporting a class label, SDIM only makes predictions when test samples’ largest logits surpass some pre-chosen thresholds, otherwise they will be deemed as out of the data distributions, and be rejected. Our experiments show that SDIM with rejection policy can effectively reject illegal inputs including out-of-distribution samples and adversarial examples. |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg98yBFDr |
https://openreview.net/pdf?id=rkg98yBFDr | |
PWC | https://paperswithcode.com/paper/reject-illegal-inputs-scaling-generative |
Repo | |
Framework | |
Hyper-SAGNN: a self-attention based graph neural network for hypergraphs
Title | Hyper-SAGNN: a self-attention based graph neural network for hypergraphs |
Authors | Anonymous |
Abstract | Graph representation learning for hypergraphs can be utilized to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. We believe that Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications. |
Tasks | Graph Representation Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryeHuJBtPH |
https://openreview.net/pdf?id=ryeHuJBtPH | |
PWC | https://paperswithcode.com/paper/hyper-sagnn-a-self-attention-based-graph |
Repo | |
Framework | |
Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient
Title | Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient |
Authors | Anonymous |
Abstract | Determinantal point processes (DPPs) is an effective tool to deliver diversity on multiple machine learning and computer vision tasks. Under deep learning framework, DPP is typically optimized via approximation, which is not straightforward and has some conflict with diversity requirement. We note, however, there has been no deep learning paradigms to optimize DPP directly since it involves matrix inversion which may result in highly computational instability. This fact greatly hinders the wide use of DPP on some specific objectives where DPP serves as a term to measure the feature diversity. In this paper, we devise a simple but effective algorithm to address this issue to optimize DPP term directly expressed with L-ensemble in spectral domain over gram matrix, which is more flexible than learning on parametric kernels. By further taking into account some geometric constraints, our algorithm seeks to generate valid sub-gradients of DPP term in case when the DPP gram matrix is not invertible (no gradients exist in this case). In this sense, our algorithm can be easily incorporated with multiple deep learning tasks. Experiments show the effectiveness of our algorithm, indicating promising performance for practical learning problems. |
Tasks | Point Processes |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeIq2VYPr |
https://openreview.net/pdf?id=rkeIq2VYPr | |
PWC | https://paperswithcode.com/paper/deep-learning-of-determinantal-point |
Repo | |
Framework | |