April 1, 2020


Paper Group NANR 59


A⋆MCTS: SEARCH WITH THEORETICAL GUARANTEE USING POLICY AND VALUE FUNCTIONS

Title A⋆MCTS: SEARCH WITH THEORETICAL GUARANTEE USING POLICY AND VALUE FUNCTIONS
Authors Anonymous
Abstract Combined with policy and value neural networks, Monte Carlo Tree Search (MCTS) is a critical component of the recent success of AI agents in learning to play board games like Chess and Go (Silver et al., 2017). However, the theoretical foundations of MCTS with policy and value networks remain open. Inspired by MCTS, we propose A⋆MCTS, a novel search algorithm that uses both the policy and value predictors to guide search and enjoys theoretical guarantees. Specifically, assuming that the value and policy networks give reasonably accurate signals of the values of each state and action, the sample complexity (number of calls to the value network) to estimate the value of the current state, as well as the optimal one-step action to take from the current state, can be bounded. We apply our theoretical framework to different models for the noise distribution of the policy and value network as well as the distribution of rewards, and show that for these general models, the sample complexity is polynomial in D, where D is the depth of the search tree. Empirically, our method outperforms MCTS in these models.
Tasks Board Games
Published 2020-01-01
URL https://openreview.net/forum?id=SJloA0EYDr
PDF https://openreview.net/pdf?id=SJloA0EYDr
PWC https://paperswithcode.com/paper/amcts-search-with-theoretical-guarantee-using
Repo
Framework
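
The paper's algorithm is not reproduced in the abstract, but the core idea it describes — best-first search over the game tree, ordered by value-network estimates and policy-network priors — can be sketched in a few lines. Everything below (the interfaces, the exploration weight) is a hypothetical illustration of that idea, not the authors' A⋆MCTS:

```python
import heapq

def best_first_search(root, value_fn, policy_fn, expand, is_terminal, budget=1000):
    """Best-first search guided by a value network and a policy network.
    Hypothetical interfaces:
      value_fn(state)  -> float estimate of the state's value
      policy_fn(state) -> dict {action: prior probability}
      expand(state, action) -> child state
    `budget` caps node expansions (a stand-in for value-network calls)."""
    # heapq is a min-heap, so scores are pushed negated; the integer
    # tiebreak prevents heapq from ever comparing states directly.
    frontier = [(-value_fn(root), 0, root)]
    best_value, tiebreak = float("-inf"), 1
    while frontier and budget > 0:
        neg_score, _, state = heapq.heappop(frontier)
        budget -= 1
        if is_terminal(state):
            best_value = max(best_value, -neg_score)
            continue
        for action, prior in policy_fn(state).items():
            child = expand(state, action)
            # Favor branches that are both high-value and high-prior;
            # the 0.1 exploration weight is a guess, not from the paper.
            score = value_fn(child) + 0.1 * prior
            heapq.heappush(frontier, (-score, tiebreak, child))
            tiebreak += 1
    return best_value
```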

FEW-SHOT LEARNING ON GRAPHS VIA SUPER-CLASSES BASED ON GRAPH SPECTRAL MEASURES

Title FEW-SHOT LEARNING ON GRAPHS VIA SUPER-CLASSES BASED ON GRAPH SPECTRAL MEASURES
Authors Anonymous
Abstract We propose to study the problem of few-shot graph classification in graph neural networks (GNNs) to recognize unseen classes, given limited labeled graph examples. Despite several interesting GNN variants being proposed recently for node and graph classification tasks, when faced with scarce labeled examples in the few-shot setting, these GNNs exhibit significant loss in classification performance. Here, we present an approach where a probability measure is assigned to each graph based on the spectrum of the graph’s normalized Laplacian. This enables us to accordingly cluster the graph base-labels associated with each graph into super-classes, where the L^p Wasserstein distance serves as our underlying distance metric. Subsequently, a super-graph constructed based on the super-classes is then fed to our proposed GNN framework which exploits the latent inter-class relationships made explicit by the super-graph to achieve better class label separation among the graphs. We conduct exhaustive empirical evaluations of our proposed method and show that it outperforms both the adaptation of state-of-the-art graph classification methods to the few-shot scenario and our naive baseline GNNs. Additionally, we extend our method to semi-supervised and active learning scenarios and study its behavior there.
Tasks Active Learning, Few-Shot Learning, Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=Bkeeca4Kvr
PDF https://openreview.net/pdf?id=Bkeeca4Kvr
PWC https://paperswithcode.com/paper/few-shot-learning-on-graphs-via-super-classes
Repo
Framework
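
The spectral side of the method is concrete enough to sketch: assign each graph the spectrum of its normalized Laplacian as an empirical measure, and compare graphs with the 1-D L^p Wasserstein distance (the super-class clustering then operates on these distances). A minimal sketch under assumed conventions, not the authors' code:

```python
import numpy as np

def spectral_measure(adj):
    """Eigenvalues of the symmetric normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}, read as an empirical measure on [0, 2]."""
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

def wasserstein_p(eigs_a, eigs_b, p=2, n_quantiles=100):
    """L^p Wasserstein distance between two 1-D empirical measures via their
    quantile functions (in 1-D the optimal coupling is the sorted matching)."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    qa, qb = np.quantile(eigs_a, qs), np.quantile(eigs_b, qs)
    return float(np.mean(np.abs(qa - qb) ** p) ** (1.0 / p))
```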

DEEP GRAPH SPECTRAL EVOLUTION NETWORKS FOR GRAPH TOPOLOGICAL TRANSFORMATION

Title DEEP GRAPH SPECTRAL EVOLUTION NETWORKS FOR GRAPH TOPOLOGICAL TRANSFORMATION
Authors Anonymous
Abstract Characterizing the underlying mechanism of graph topological evolution from a source graph to a target graph has attracted fast-increasing attention in the deep graph learning domain. However, expressive and efficient methods that can handle global and local evolution patterns between source and target graphs are lacking. On the other hand, graph topological evolution has historically been investigated in the graph signal processing domain, but it involves intensive labor to manually determine suitable prescribed spectral models, and fitting their potential combinations and compositions is prohibitively difficult. To address these challenges, this paper proposes the deep Graph Spectral Evolution Network (GSEN) for modeling the graph topology evolution problem by the composition of newly-developed generalized graph kernels. GSEN can effectively fit a wide range of existing graph kernels and their combinations and compositions, with theoretical guarantees and experimental verification. GSEN has outstanding efficiency in terms of time complexity ($O(n)$) and parameter complexity ($O(1)$), where $n$ is the number of nodes of the graph. Extensive experiments on multiple synthetic and real-world datasets have demonstrated outstanding performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkxgHerKvH
PDF https://openreview.net/pdf?id=rkxgHerKvH
PWC https://paperswithcode.com/paper/deep-graph-spectral-evolution-networks-for
Repo
Framework
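
The spectral-evolution idea — reshape the source graph's spectrum with a learned filter while keeping its eigenvectors — admits a compact sketch. The polynomial filter below is an assumption standing in for the paper's generalized graph kernels, and the dense eigendecomposition (used here for clarity) does not match the paper's $O(n)$ time claim:

```python
import numpy as np

def spectral_evolve(src_adj, theta):
    """Map a source graph toward a target graph by reshaping its spectrum:
    A_tgt ≈ U f(Λ) U^T, where U, Λ come from the source graph and f is a
    learned spectral filter. A polynomial f keeps the parameter count at
    len(theta), independent of graph size."""
    lam, u = np.linalg.eigh(src_adj)                        # source spectrum
    f_lam = sum(t * lam ** k for k, t in enumerate(theta))  # f(Λ), elementwise
    return (u * f_lam) @ u.T                                # U f(Λ) U^T
```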

Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning

Title Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning
Authors Anonymous
Abstract We develop a practical and flexible computational model of fake news on social networks in which agents act according to learned best response functions. We achieve this by extending an information aggregation game to allow for fake news and by representing agents as recurrent deep Q-networks (DQN) trained by independent Q-learning. In the game, agents repeatedly guess whether a claim is true or false, taking into account an informative private signal and observations of their neighbors’ actions on the social network in the previous period. We incorporate fake news into the model by adding an adversarial agent, the attacker, that either provides biased private signals to, or takes over, a subset of agents. The attacker can follow either a hand-tuned or trained policy. Our model allows us to tackle questions that are analytically intractable in fully rational models, while ensuring that agents follow reasonable best response functions. Our results highlight the importance of awareness, privacy and social connectivity in curbing the adverse effects of fake news.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=B1xhpa4FvS
PDF https://openreview.net/pdf?id=B1xhpa4FvS
PWC https://paperswithcode.com/paper/modeling-fake-news-in-social-networks-with
Repo
Framework
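
To make the game concrete, here is a toy version of one round of the information aggregation game with an attacker. The hand-coded majority rule is only a placeholder for the learned recurrent DQN policies, and all parameters are illustrative assumptions:

```python
import numpy as np

def play_round(truth, prev_guesses, adj, signal_acc=0.7, attacked=None, rng=None):
    """One period of a toy information-aggregation game: each agent sees a
    noisy private signal about `truth` (0 or 1) plus last round's guesses of
    its neighbors (rows of adjacency matrix `adj`), then guesses by majority.
    `attacked` agents receive inverted signals, modeling the attacker."""
    rng = rng or np.random.default_rng()
    n = len(adj)
    signals = np.where(rng.random(n) < signal_acc, truth, 1 - truth)
    if attacked is not None:
        signals[attacked] = 1 - truth        # biased signals for attacked agents
    guesses = np.empty(n, dtype=int)
    for i in range(n):
        votes = [signals[i]] + [prev_guesses[j] for j in np.flatnonzero(adj[i])]
        guesses[i] = int(np.mean(votes) >= 0.5)
    return guesses
```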

Long-term planning, short-term adjustments

Title Long-term planning, short-term adjustments
Authors Anonymous
Abstract Deep Reinforcement Learning (RL) algorithms can learn complex policies to optimize agent operation over time, and they have shown promising results on complicated problems in recent years. However, their application to real-world physical systems remains limited. Despite the advancements in RL algorithms, industry often prefers traditional control strategies, which are simple, computationally efficient and easy to adjust. In this paper, we propose a new Q-learning algorithm for continuous action spaces, which can bridge control and RL algorithms and bring us the best of both worlds. Our method can learn complex policies to achieve long-term goals, and at the same time it can be easily adjusted to address short-term requirements without retraining. We achieve this by modeling both a short-term and a long-term prediction model: the short-term model estimates the system dynamics, while the long-term model represents the Q-value. Case studies demonstrate that our proposed method can achieve both short-term and long-term goals without complex reward functions.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJlbyCNtPr
PDF https://openreview.net/pdf?id=SJlbyCNtPr
PWC https://paperswithcode.com/paper/long-term-planning-short-term-adjustments
Repo
Framework
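
One plausible reading of "long-term model for goals, short-term model for adjustments" is constrained action selection: maximize the learned Q-value over candidate actions that the learned dynamics model predicts will satisfy an operator-set requirement, which can change at deployment time without retraining. A minimal sketch with hypothetical interfaces:

```python
import numpy as np

def select_action(state, q_fn, dynamics_fn, constraint_fn, n_candidates=256, rng=None):
    """Pick a continuous action maximizing the long-term model (Q-value)
    among candidates that the short-term model (dynamics) deems feasible.
    Hypothetical interfaces:
      q_fn(state, action)        -> long-term return estimate
      dynamics_fn(state, action) -> predicted next state
      constraint_fn(next_state)  -> bool, the short-term requirement"""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, 1))  # toy 1-D actions
    feasible = [a for a in candidates if constraint_fn(dynamics_fn(state, a))]
    pool = feasible if feasible else list(candidates)  # fall back if none feasible
    return max(pool, key=lambda a: q_fn(state, a))
```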

Learning De-biased Representations with Biased Representations

Title Learning De-biased Representations with Biased Representations
Authors Anonymous
Abstract Many machine learning algorithms are trained and evaluated by splitting data from a single source into training and test sets. While this focus on in-distribution learning scenarios has led to interesting advances, it cannot tell whether models are relying on dataset biases as shortcuts for successful prediction (e.g., using snow cues for recognising snowmobiles). Such biased models fail to generalise when the bias shifts to a different class. The cross-bias generalisation problem has been addressed by de-biasing training data through augmentation or re-sampling, which are often prohibitive due to data collection costs (e.g., collecting images of a snowmobile in a desert) and the difficulty of quantifying or expressing biases in the first place. In this work, we propose a novel framework to train a de-biased representation by encouraging it to be different from a set of representations that are biased by design. This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias. Our experiments and analyses show that our method discourages models from taking bias shortcuts, resulting in improved performance on de-biased test data.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeHwJSYvH
PDF https://openreview.net/pdf?id=SJeHwJSYvH
PWC https://paperswithcode.com/paper/learning-de-biased-representations-with
Repo
Framework
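
A minimal sketch of the kind of training signal such a framework could use: a task loss plus a penalty pushing the learned features away from features from a bias-capturing network (e.g. one with a tiny receptive field). The cross-correlation penalty below is an assumption for illustration; the paper's exact dissimilarity measure may differ:

```python
import torch

def debias_loss(feat_f, feat_g, logits, labels, lam=1.0):
    """Task loss plus a decorrelation penalty between the learned features
    feat_f and the biased-by-design features feat_g (both of shape
    [batch, dim]). Minimizing the squared cross-correlation discourages
    feat_f from encoding whatever feat_g already captures."""
    task = torch.nn.functional.cross_entropy(logits, labels)
    f = (feat_f - feat_f.mean(0)) / (feat_f.std(0) + 1e-6)   # batch-normalize
    g = (feat_g - feat_g.mean(0)) / (feat_g.std(0) + 1e-6)
    cross_corr = (f.T @ g) / f.shape[0]       # [dim_f, dim_g] correlations
    return task + lam * (cross_corr ** 2).mean()
```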

Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions

Title Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions
Authors Anonymous
Abstract From a young age, humans learn to use grammatical principles to hierarchically combine words into sentences. Action grammars are the parallel idea: that there is an underlying set of rules (a “grammar”) that governs how we hierarchically combine actions to form new, more complex actions. We introduce the Action Grammar Reinforcement Learning (AG-RL) framework which leverages the concept of action grammars to consistently improve the sample efficiency of Reinforcement Learning agents. AG-RL works by using a grammar inference algorithm to infer the “action grammar” of an agent midway through training, leading to a higher-level action representation. The agent’s action space is then augmented with macro-actions identified by the grammar. We apply this framework to Double Deep Q-Learning (AG-DDQN) and a discrete action version of Soft Actor-Critic (AG-SAC) and find that it improves performance in 8 out of 8 tested Atari games (median +31%, max +668%) and 19 out of 20 tested Atari games (median +96%, maximum +3,756%) respectively without substantive hyperparameter tuning. We also show that AG-SAC beats the model-free state-of-the-art for sample efficiency in 17 out of the 20 tested Atari games (median +62%, maximum +13,140%), again without substantive hyperparameter tuning.
Tasks Atari Games, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJxDKerKDS
PDF https://openreview.net/pdf?id=SJxDKerKDS
PWC https://paperswithcode.com/paper/reinforcement-learning-with-structured
Repo
Framework
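
Grammar inference over action traces can be approximated with a byte-pair-encoding-style loop that repeatedly promotes the most frequent adjacent action pair to a macro-action. This is a stand-in for the paper's actual grammar inference algorithm, which the abstract does not specify:

```python
from collections import Counter

def infer_macros(traces, n_macros=4):
    """Toy grammar inference: given recorded action sequences (lists of int
    action ids), repeatedly promote the most frequent adjacent pair to a new
    macro-action id. Returns {macro_id: (a, b)}, the table used to augment
    the agent's action space."""
    traces = [list(t) for t in traces]
    macros = {}
    next_id = max(a for t in traces for a in t) + 1
    for _ in range(n_macros):
        pairs = Counter(p for t in traces for p in zip(t, t[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        macros[next_id] = (a, b)
        # Rewrite traces so the new macro can itself join future pairs,
        # yielding hierarchical (nested) macros.
        for i, t in enumerate(traces):
            out, j = [], 0
            while j < len(t):
                if j + 1 < len(t) and (t[j], t[j + 1]) == (a, b):
                    out.append(next_id); j += 2
                else:
                    out.append(t[j]); j += 1
            traces[i] = out
        next_id += 1
    return macros
```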

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

Title Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
Authors Anonymous
Abstract In state of the art model-free off-policy deep reinforcement learning (RL), a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a finite number of transitions. We represent these transitions in a data graph and link its structure to soft divergence. By selecting a subgraph with a favorable structure, we construct a simple Markov Decision Process (MDP) for which exact Q-values can be computed efficiently as more data comes in - resulting in a Qgraph. We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem. By using these lower bounds in TD learning, our method is less prone to soft divergence and exhibits increased sample efficiency while being more robust to hyperparameters. Qgraphs also retain information from transitions that have already been overwritten in the replay memory, which can decrease the algorithm’s sensitivity to the replay memory capacity.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HygTUxHKwH
PDF https://openreview.net/pdf?id=HygTUxHKwH
PWC https://paperswithcode.com/paper/qgraph-bounded-q-learning-stabilizing-model
Repo
Framework
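
A sketch of the two pieces the abstract describes: exact value iteration on the finite MDP induced by the replay memory, and using its result as a lower bound on TD targets. The data-graph construction and subgraph selection details are omitted, and all interfaces are assumptions:

```python
import torch

def qgraph_values(transitions, gamma=0.99, iters=100):
    """Exact Q-values of the finite MDP induced by replay transitions.
    `transitions` is a list of (state_id, action_id, reward, next_state_id);
    a next_state_id with no outgoing transitions acts as terminal."""
    q = {(s, a): 0.0 for (s, a, _, _) in transitions}
    for _ in range(iters):
        for (s, a, r, s2) in transitions:
            nxt = max((q[(ss, aa)] for (ss, aa) in q if ss == s2), default=0.0)
            q[(s, a)] = r + gamma * nxt
    return q

def bounded_td_target(reward, next_q, qgraph_lower, gamma=0.99):
    """Clamp the bootstrapped TD target from below by the Qgraph value of the
    same transition. Since that value provably lower-bounds the true Q-value,
    the clamp can only pull a soft-diverging target back toward the truth."""
    return torch.maximum(reward + gamma * next_q, qgraph_lower)
```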

Effect of top-down connections in Hierarchical Sparse Coding

Title Effect of top-down connections in Hierarchical Sparse Coding
Authors Anonymous
Abstract Hierarchical Sparse Coding (HSC) is a powerful model to efficiently represent multi-dimensional, structured data such as images. The simplest way to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest inter-connecting these subproblems, as in the Predictive Coding (PC) theory, which adds top-down connections between consecutive layers. In this study, a new model called Sparse Deep Predictive Coding (SDPC) is introduced to assess the impact of this inter-layer feedback connection. In particular, the SDPC is compared with a Hierarchical Lasso (Hi-La) network made of a sequence of Lasso layers. A 2-layered SDPC network and a Hi-La network are trained on 3 different databases with different sparsity parameters on each layer. First, we show that the overall prediction error generated by SDPC is lower thanks to the feedback mechanism, as it transfers prediction error between layers. Second, we demonstrate that the inference stage of the SDPC converges faster than that of the Hi-La model. Third, we show that the SDPC also accelerates the learning process. Finally, a qualitative analysis of both models’ dictionaries, supported by their activation probabilities, shows that the SDPC features are more generic and informative.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJlcLaVFvB
PDF https://openreview.net/pdf?id=rJlcLaVFvB
PWC https://paperswithcode.com/paper/effect-of-top-down-connections-in
Repo
Framework
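
A minimal two-layer sketch of the inference dynamics, assuming an ISTA-style solver: layer 1's code is driven both bottom-up (reconstructing the input) and top-down (matching layer 2's prediction). Setting the feedback strength k to 0 recovers a purely layer-wise Hi-La baseline; the hyperparameters are illustrative:

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sdpc_inference(x, d1, d2, lam1=0.1, lam2=0.1, k=1.0, steps=50, lr=0.1):
    """Two-layer sparse coding by ISTA with top-down feedback.
    Shapes: x (m,), dictionaries d1 (m, k1) and d2 (k1, k2),
    codes a1 (k1,) and a2 (k2,)."""
    a1 = np.zeros(d1.shape[1])
    a2 = np.zeros(d2.shape[1])
    for _ in range(steps):
        pred1 = d2 @ a2                                   # top-down prediction of a1
        grad1 = d1.T @ (d1 @ a1 - x) + k * (a1 - pred1)   # bottom-up + feedback
        a1 = soft_threshold(a1 - lr * grad1, lr * lam1)
        grad2 = d2.T @ (d2 @ a2 - a1)                     # layer 2 explains a1
        a2 = soft_threshold(a2 - lr * grad2, lr * lam2)
    return a1, a2
```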

Neural Arithmetic Unit by reusing many small pre-trained networks

Title Neural Arithmetic Unit by reusing many small pre-trained networks
Authors Anonymous
Abstract We propose a solution for the evaluation of mathematical expressions. However, instead of designing a single end-to-end model, we propose a Lego-bricks-style architecture: rather than training one complex end-to-end neural network, many small networks can be trained independently, each accomplishing one specific operation and acting as a single Lego brick. More difficult or complex tasks can then be solved using a combination of these smaller networks. In this work we first identify 8 fundamental operations that are commonly used to solve arithmetic problems (such as 1-digit multiplication, addition, subtraction, sign calculation, etc.). These fundamental operations are then learned using simple feed-forward neural networks. We then show that different operations can be designed simply by reusing these smaller networks. As an example, we reuse these smaller networks to develop a larger and more complex network that solves n-digit multiplication, n-digit division, and the cross product. This bottom-up strategy not only introduces reusability; we also show that it generalizes to computations involving n digits, and we report results for numbers of up to 7 digits. Unlike existing methods, our solution also generalizes to both positive and negative numbers.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyerxgHYvH
PDF https://openreview.net/pdf?id=HyerxgHYvH
PWC https://paperswithcode.com/paper/neural-arithmetic-unit-by-reusing-many-small
Repo
Framework
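
The Lego-brick composition pattern is easy to illustrate with addition (the abstract's own examples are multiplication, division, and the cross product). In the sketch below, `add1` is an exact stub standing in for a small pre-trained 1-digit network, so the composition itself can be tested:

```python
def add_n_digit(xs, ys, add1):
    """Compose a 1-digit adder brick into an n-digit adder.
    `add1(a, b, carry) -> (digit, carry)` stands in for a small pre-trained
    network. Digit lists are least-significant first."""
    out, carry = [], 0
    for a, b in zip(xs, ys):
        d, carry = add1(a, b, carry)
        out.append(d)
    if carry:
        out.append(carry)
    return out

# Exact stand-in for the trained 1-digit module:
add1 = lambda a, b, c: ((a + b + c) % 10, (a + b + c) // 10)
assert add_n_digit([9, 9, 1], [2, 0, 0], add1) == [1, 0, 2]  # 199 + 2 = 201
```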

Function Feature Learning of Neural Networks

Title Function Feature Learning of Neural Networks
Authors Anonymous
Abstract We present a Function Feature Learning (FFL) method that can measure the similarity of non-convex neural networks. The function feature representation provides crucial insights into the understanding of the relations between different local solutions of identical neural networks. Unlike existing methods that use neuron activation vectors over a given dataset as neural network representation, FFL aligns weights of neural networks and projects them into a common function feature space by introducing a chain alignment rule. We investigate the function feature representation on Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), finding that identical neural networks trained with different random initializations on different learning tasks by the Stochastic Gradient Descent (SGD) algorithm can be projected into different fixed points. This finding demonstrates the strong connection between different local solutions of identical neural networks and the equivalence of projected local solutions. With FFL, we also find that the semantics are often presented in a bottom-up way. Besides, FFL provides more insights into the structure of local solutions. Experiments on CIFAR-100, NameData, and tiny ImageNet datasets validate the effectiveness of the proposed method.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJgCOySYwH
PDF https://openreview.net/pdf?id=rJgCOySYwH
PWC https://paperswithcode.com/paper/function-feature-learning-of-neural-networks
Repo
Framework

DeepSFM: Structure From Motion Via Deep Bundle Adjustment

Title DeepSFM: Structure From Motion Via Deep Bundle Adjustment
Authors Anonymous
Abstract Structure from motion (SfM) is an essential computer vision problem which has not been well handled by deep learning. One promising trend is to apply explicit structural constraints, e.g. 3D cost volumes, in the network. In this work, we design a physics-driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost-volume-based architectures for depth and pose estimation respectively, run iteratively to improve both. In each cost volume, we encode not only photometric consistency across multiple input images, but also geometric consistency to ensure that depths from multiple views agree with each other. The explicit constraints on both depth (structure) and pose (motion), when combined with the learning components, bring the merits of both traditional BA and emerging deep learning technology. Extensive experiments on various datasets show that our model achieves state-of-the-art performance on both depth and pose estimation, with superior robustness to fewer inputs and noise in the initialization.
Tasks Pose Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=SyeD0RVtvS
PDF https://openreview.net/pdf?id=SyeD0RVtvS
PWC https://paperswithcode.com/paper/deepsfm-structure-from-motion-via-deep-bundle
Repo
Framework

Adapting Behaviour for Learning Progress

Title Adapting Behaviour for Learning Progress
Authors Anonymous
Abstract Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scale and greater flexibility, but has not removed the need to tune or tailor exploration to the task, because the ideal data for the learning algorithm necessarily depends on its process of learning. We propose to dynamically adapt the data generation by using a non-stationary multi-armed bandit to optimize a proxy of the learning progress. The data distribution is controlled via modulating multiple parameters of the policy (such as stochasticity, consistency or optimism) without significant overhead. The adaptation speed of the bandit can be increased by exploiting the factored modulation structure. We demonstrate on a suite of Atari 2600 games how this unified approach produces results comparable to per-task tuning at a fraction of the cost.
Tasks Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=B1gXWCVtvr
PDF https://openreview.net/pdf?id=B1gXWCVtvr
PWC https://paperswithcode.com/paper/adapting-behaviour-for-learning-progress
Repo
Framework
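
The bandit layer can be sketched independently of the RL agent: arms are modulation settings (e.g. combinations of stochasticity, consistency, optimism), the reward fed in is a learning-progress proxy rather than task return, and a constant step size keeps value estimates tracking non-stationary progress. A minimal sketch with assumed hyperparameters:

```python
import numpy as np

def make_bandit(n_arms, step=0.1, eps=0.1, rng=None):
    """Non-stationary multi-armed bandit over behaviour modulations.
    Returns (choose, update): choose() picks an arm eps-greedily; update()
    tracks each arm's recent learning progress with exponential averaging."""
    rng = rng or np.random.default_rng()
    values = np.zeros(n_arms)

    def choose():
        if rng.random() < eps:         # keep exploring: progress drifts over training
            return int(rng.integers(n_arms))
        return int(np.argmax(values))

    def update(arm, progress):
        values[arm] += step * (progress - values[arm])  # constant-step tracking

    return choose, update
```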

SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Title SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Authors Anonymous
Abstract Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllReduce-based methods, which use blocking communication before every update, the resulting models may be less accurate after the same number of updates. Inspired by the BMUF method of Chen & Huo (2016), we propose a slow momentum (SloMo) framework, where workers periodically synchronize and perform a momentum update, after multiple iterations of a base optimization algorithm. Experiments on image classification and machine translation tasks demonstrate that SloMo consistently yields improvements in optimization and generalization performance relative to the base optimizer, even when the additional overhead is amortized over many updates so that the SloMo runtime is on par with that of the base optimizer. We provide theoretical convergence guarantees showing that SloMo converges to a stationary point of smooth non-convex losses. Since BMUF is a particular instance of the SloMo framework, our results also correspond to the first theoretical convergence guarantees for BMUF.
Tasks Distributed Optimization, Image Classification, Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=SkxJ8REYPH
PDF https://openreview.net/pdf?id=SkxJ8REYPH
PWC https://paperswithcode.com/paper/slomo-improving-communication-efficient
Repo
Framework
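
The outer loop is simple enough to sketch. This is a reading of the abstract, not the authors' reference implementation; the buffer and step-size conventions (`alpha`, `beta`) are assumptions:

```python
import torch

@torch.no_grad()
def slomo_outer_step(workers, anchor, slow_momentum, alpha=1.0, beta=0.95):
    """Slow (outer) update: after each worker has run several base-optimizer
    steps (e.g. local SGD) from the shared anchor, average their parameters,
    treat the anchor-to-average displacement as a gradient, and apply a
    momentum step to the anchor. Workers then restart from the new anchor.
    `workers` is a list of per-worker tensor lists; `anchor` and
    `slow_momentum` are matching tensor lists."""
    for i, x0 in enumerate(anchor):
        avg = torch.stack([w[i] for w in workers]).mean(0)  # AllReduce average
        slow_momentum[i].mul_(beta).add_(x0 - avg)          # slow momentum buffer
        x0.sub_(alpha * slow_momentum[i])                   # outer momentum step
        for w in workers:                                   # re-synchronize
            w[i].copy_(x0)
```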

Dynamic Scale Inference by Entropy Minimization

Title Dynamic Scale Inference by Entropy Minimization
Authors Anonymous
Abstract Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field. Rather than enumerate variations across filter channels or pyramid levels, dynamic models locally predict scale and adapt receptive fields accordingly. The degree of variation and diversity of inputs makes this a difficult task. Existing methods either learn a feedforward predictor, which is not itself totally immune to the scale variation it is meant to counter, or select scales by a fixed algorithm, which cannot learn from the given task and data. We extend dynamic scale inference from feedforward prediction to iterative optimization for further adaptivity. We propose a novel entropy minimization objective for inference and optimize over task and structure parameters to tune the model to each input. Optimization during inference improves semantic segmentation accuracy and generalizes better to extreme scale variations that cause feedforward dynamic inference to falter.
Tasks Semantic Segmentation
Published 2020-01-01
URL https://openreview.net/forum?id=SyxiRJStwr
PDF https://openreview.net/pdf?id=SyxiRJStwr
PWC https://paperswithcode.com/paper/dynamic-scale-inference-by-entropy-1
Repo
Framework
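
The inference-time optimization can be sketched for a generic classifier: take a few gradient steps on the entropy of the output distribution before reading off the prediction. The paper restricts updates to scale and structure parameters; for brevity, the sketch below updates all parameters of a copied model, and the step count and learning rate are assumptions:

```python
import copy
import torch

def entropy_min_inference(model, x, steps=10, lr=1e-3):
    """Iterative inference by entropy minimization: starting from the
    feed-forward prediction, descend the entropy of the softmax output
    (dim=1 is the class dimension), then predict with the adapted copy."""
    m = copy.deepcopy(model)                  # leave the trained model intact
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for _ in range(steps):
        probs = torch.softmax(m(x), dim=1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    with torch.no_grad():
        return m(x).argmax(dim=1)
```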