April 1, 2020


Paper Group NANR 59


A⋆MCTS: SEARCH WITH THEORETICAL GUARANTEE USING POLICY AND VALUE FUNCTIONS

Title A⋆MCTS: SEARCH WITH THEORETICAL GUARANTEE USING POLICY AND VALUE FUNCTIONS
Authors Anonymous
Abstract Combined with policy and value neural networks, Monte Carlo Tree Search (MCTS) is a critical component of the recent success of AI agents in learning to play board games like Chess and Go (Silver et al., 2017). However, the theoretical foundations of MCTS with policy and value networks remain open. Inspired by MCTS, we propose A⋆MCTS, a novel search algorithm that uses both the policy and value predictors to guide search and enjoys theoretical guarantees. Specifically, assuming that the value and policy networks give reasonably accurate signals of the values of each state and action, the sample complexity (number of calls to the value network) to estimate the value of the current state, as well as the optimal one-step action to take from the current state, can be bounded. We apply our theoretical framework to different models for the noise distribution of the policy and value network as well as the distribution of rewards, and show that for these general models, the sample complexity is polynomial in D, where D is the depth of the search tree. Empirically, our method outperforms MCTS in these models.
Tasks Board Games
Published 2020-01-01
URL https://openreview.net/forum?id=SJloA0EYDr
PDF https://openreview.net/pdf?id=SJloA0EYDr
PWC https://paperswithcode.com/paper/amcts-search-with-theoretical-guarantee-using
Repo
Framework
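
The paper's algorithm is not reproduced in the abstract, but the core idea it describes — best-first search over the game tree, ordered by value-network estimates and policy-network priors — can be sketched in a few lines. Everything below (the interfaces, the exploration weight) is a hypothetical illustration of that idea, not the authors' A⋆MCTS:

```python
import heapq

def best_first_search(root, value_fn, policy_fn, expand, is_terminal, budget=1000):
    """Best-first search guided by a value network and a policy network.
    Hypothetical interfaces:
      value_fn(state)  -> float estimate of the state's value
      policy_fn(state) -> dict {action: prior probability}
      expand(state, action) -> child state
    `budget` caps node expansions (a stand-in for value-network calls)."""
    # heapq is a min-heap, so scores are pushed negated; the integer
    # tiebreak prevents heapq from ever comparing states directly.
    frontier = [(-value_fn(root), 0, root)]
    best_value, tiebreak = float("-inf"), 1
    while frontier and budget > 0:
        neg_score, _, state = heapq.heappop(frontier)
        budget -= 1
        if is_terminal(state):
            best_value = max(best_value, -neg_score)
            continue
        for action, prior in policy_fn(state).items():
            child = expand(state, action)
            # Favor branches that are both high-value and high-prior;
            # the 0.1 exploration weight is a guess, not from the paper.
            score = value_fn(child) + 0.1 * prior
            heapq.heappush(frontier, (-score, tiebreak, child))
            tiebreak += 1
    return best_value
```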

FEW-SHOT LEARNING ON GRAPHS VIA SUPER-CLASSES BASED ON GRAPH SPECTRAL MEASURES

Title FEW-SHOT LEARNING ON GRAPHS VIA SUPER-CLASSES BASED ON GRAPH SPECTRAL MEASURES
Authors Anonymous
Abstract We propose to study the problem of few-shot graph classification in graph neural networks (GNNs) to recognize unseen classes, given limited labeled graph examples. Despite several interesting GNN variants being proposed recently for node and graph classification tasks, when faced with scarce labeled examples in the few-shot setting, these GNNs exhibit significant loss in classification performance. Here, we present an approach where a probability measure is assigned to each graph based on the spectrum of the graph’s normalized Laplacian. This enables us to accordingly cluster the graph base-labels associated with each graph into super-classes, where the L^p Wasserstein distance serves as our underlying distance metric. Subsequently, a super-graph constructed based on the super-classes is then fed to our proposed GNN framework which exploits the latent inter-class relationships made explicit by the super-graph to achieve better class label separation among the graphs. We conduct exhaustive empirical evaluations of our proposed method and show that it outperforms both the adaptation of state-of-the-art graph classification methods to the few-shot scenario and our naive baseline GNNs. Additionally, we extend our method to semi-supervised and active learning scenarios and study its behavior there.
Tasks Active Learning, Few-Shot Learning, Graph Classification
Published 2020-01-01
URL https://openreview.net/forum?id=Bkeeca4Kvr
PDF https://openreview.net/pdf?id=Bkeeca4Kvr
PWC https://paperswithcode.com/paper/few-shot-learning-on-graphs-via-super-classes
Repo
Framework
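
The spectral side of the method is concrete enough to sketch: assign each graph the spectrum of its normalized Laplacian as an empirical measure, and compare graphs with the 1-D L^p Wasserstein distance (the super-class clustering then operates on these distances). A minimal sketch under assumed conventions, not the authors' code:

```python
import numpy as np

def spectral_measure(adj):
    """Eigenvalues of the symmetric normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}, read as an empirical measure on [0, 2]."""
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

def wasserstein_p(eigs_a, eigs_b, p=2, n_quantiles=100):
    """L^p Wasserstein distance between two 1-D empirical measures via their
    quantile functions (in 1-D the optimal coupling is the sorted matching)."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    qa, qb = np.quantile(eigs_a, qs), np.quantile(eigs_b, qs)
    return float(np.mean(np.abs(qa - qb) ** p) ** (1.0 / p))
```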

DEEP GRAPH SPECTRAL EVOLUTION NETWORKS FOR GRAPH TOPOLOGICAL TRANSFORMATION

Title DEEP GRAPH SPECTRAL EVOLUTION NETWORKS FOR GRAPH TOPOLOGICAL TRANSFORMATION
Authors Anonymous
Abstract Characterizing the underlying mechanism of graph topological evolution from a source graph to a target graph has attracted fast-increasing attention in the deep graph learning domain. However, expressive and efficient methods that can handle global and local evolution patterns between source and target graphs are lacking. On the other hand, graph topological evolution has historically been investigated in the graph signal processing domain, but it involves intensive labor to manually determine suitable prescribed spectral models, and fitting their potential combinations and compositions is prohibitively difficult. To address these challenges, this paper proposes the deep Graph Spectral Evolution Network (GSEN) for modeling the graph topology evolution problem by the composition of newly-developed generalized graph kernels. GSEN can effectively fit a wide range of existing graph kernels and their combinations and compositions, with theoretical guarantees and experimental verification. GSEN has outstanding efficiency in terms of time complexity ($O(n)$) and parameter complexity ($O(1)$), where $n$ is the number of nodes of the graph. Extensive experiments on multiple synthetic and real-world datasets have demonstrated outstanding performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkxgHerKvH
PDF https://openreview.net/pdf?id=rkxgHerKvH
PWC https://paperswithcode.com/paper/deep-graph-spectral-evolution-networks-for
Repo
Framework
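
The spectral-evolution idea — reshape the source graph's spectrum with a learned filter while keeping its eigenvectors — admits a compact sketch. The polynomial filter below is an assumption standing in for the paper's generalized graph kernels, and the dense eigendecomposition (used here for clarity) does not match the paper's $O(n)$ time claim:

```python
import numpy as np

def spectral_evolve(src_adj, theta):
    """Map a source graph toward a target graph by reshaping its spectrum:
    A_tgt ≈ U f(Λ) U^T, where U, Λ come from the source graph and f is a
    learned spectral filter. A polynomial f keeps the parameter count at
    len(theta), independent of graph size."""
    lam, u = np.linalg.eigh(src_adj)                        # source spectrum
    f_lam = sum(t * lam ** k for k, t in enumerate(theta))  # f(Λ), elementwise
    return (u * f_lam) @ u.T                                # U f(Λ) U^T
```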

Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning

Title Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning
Authors Anonymous
Abstract We develop a practical and flexible computational model of fake news on social networks in which agents act according to learned best response functions. We achieve this by extending an information aggregation game to allow for fake news and by representing agents as recurrent deep Q-networks (DQN) trained by independent Q-learning. In the game, agents repeatedly guess whether a claim is true or false, taking into account an informative private signal and observations of their neighbors’ actions on the social network in the previous period. We incorporate fake news into the model by adding an adversarial agent, the attacker, that either provides biased private signals to, or takes over, a subset of agents. The attacker can follow either a hand-tuned or trained policy. Our model allows us to tackle questions that are analytically intractable in fully rational models, while ensuring that agents follow reasonable best response functions. Our results highlight the importance of awareness, privacy and social connectivity in curbing the adverse effects of fake news.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=B1xhpa4FvS
PDF https://openreview.net/pdf?id=B1xhpa4FvS
PWC https://paperswithcode.com/paper/modeling-fake-news-in-social-networks-with
Repo
Framework
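
To make the game concrete, here is a toy version of one round of the information aggregation game with an attacker. The hand-coded majority rule is only a placeholder for the learned recurrent DQN policies, and all parameters are illustrative assumptions:

```python
import numpy as np

def play_round(truth, prev_guesses, adj, signal_acc=0.7, attacked=None, rng=None):
    """One period of a toy information-aggregation game: each agent sees a
    noisy private signal about `truth` (0 or 1) plus last round's guesses of
    its neighbors (rows of adjacency matrix `adj`), then guesses by majority.
    `attacked` agents receive inverted signals, modeling the attacker."""
    rng = rng or np.random.default_rng()
    n = len(adj)
    signals = np.where(rng.random(n) < signal_acc, truth, 1 - truth)
    if attacked is not None:
        signals[attacked] = 1 - truth        # biased signals for attacked agents
    guesses = np.empty(n, dtype=int)
    for i in range(n):
        votes = [signals[i]] + [prev_guesses[j] for j in np.flatnonzero(adj[i])]
        guesses[i] = int(np.mean(votes) >= 0.5)
    return guesses
```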

Long-term planning, short-term adjustments

Title Long-term planning, short-term adjustments
Authors Anonymous
Abstract Deep Reinforcement Learning (RL) algorithms can learn complex policies to optimize agent operation over time, and they have shown promising results on complicated problems in recent years. However, their application to real-world physical systems remains limited. Despite the advancements in RL algorithms, industry often prefers traditional control strategies, which are simple, computationally efficient and easy to adjust. In this paper, we propose a new Q-learning algorithm for continuous action spaces, which can bridge control and RL algorithms and bring us the best of both worlds. Our method can learn complex policies to achieve long-term goals, and at the same time it can be easily adjusted to address short-term requirements without retraining. We achieve this by modeling both a short-term and a long-term prediction model: the short-term model estimates the system dynamics, while the long-term model represents the Q-value. Case studies demonstrate that our proposed method can achieve both short-term and long-term goals without complex reward functions.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJlbyCNtPr
PDF https://openreview.net/pdf?id=SJlbyCNtPr
PWC https://paperswithcode.com/paper/long-term-planning-short-term-adjustments
Repo
Framework
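
One plausible reading of "long-term model for goals, short-term model for adjustments" is constrained action selection: maximize the learned Q-value over candidate actions that the learned dynamics model predicts will satisfy an operator-set requirement, which can change at deployment time without retraining. A minimal sketch with hypothetical interfaces:

```python
import numpy as np

def select_action(state, q_fn, dynamics_fn, constraint_fn, n_candidates=256, rng=None):
    """Pick a continuous action maximizing the long-term model (Q-value)
    among candidates that the short-term model (dynamics) deems feasible.
    Hypothetical interfaces:
      q_fn(state, action)        -> long-term return estimate
      dynamics_fn(state, action) -> predicted next state
      constraint_fn(next_state)  -> bool, the short-term requirement"""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, 1))  # toy 1-D actions
    feasible = [a for a in candidates if constraint_fn(dynamics_fn(state, a))]
    pool = feasible if feasible else list(candidates)  # fall back if none feasible
    return max(pool, key=lambda a: q_fn(state, a))
```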

Learning De-biased Representations with Biased Representations

Title Learning De-biased Representations with Biased Representations
Authors Anonymous
Abstract Many machine learning algorithms are trained and evaluated by splitting data from a single source into training and test sets. While this focus on in-distribution learning scenarios has led to interesting advances, it cannot tell whether models are relying on dataset biases as shortcuts for successful prediction (e.g., using snow cues for recognising snowmobiles). Such biased models fail to generalise when the bias shifts to a different class. The cross-bias generalisation problem has been addressed by de-biasing training data through augmentation or re-sampling, which are often prohibitive due to data collection costs (e.g., collecting images of a snowmobile in a desert) and the difficulty of quantifying or expressing biases in the first place. In this work, we propose a novel framework to train a de-biased representation by encouraging it to be different from a set of representations that are biased by design. This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias. Our experiments and analyses show that our method discourages models from taking bias shortcuts, resulting in improved performance on de-biased test data.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeHwJSYvH
PDF https://openreview.net/pdf?id=SJeHwJSYvH
PWC https://paperswithcode.com/paper/learning-de-biased-representations-with
Repo
Framework
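
A minimal sketch of the kind of training signal such a framework could use: a task loss plus a penalty pushing the learned features away from features from a bias-capturing network (e.g. one with a tiny receptive field). The cross-correlation penalty below is an assumption for illustration; the paper's exact dissimilarity measure may differ:

```python
import torch

def debias_loss(feat_f, feat_g, logits, labels, lam=1.0):
    """Task loss plus a decorrelation penalty between the learned features
    feat_f and the biased-by-design features feat_g (both of shape
    [batch, dim]). Minimizing the squared cross-correlation discourages
    feat_f from encoding whatever feat_g already captures."""
    task = torch.nn.functional.cross_entropy(logits, labels)
    f = (feat_f - feat_f.mean(0)) / (feat_f.std(0) + 1e-6)   # batch-normalize
    g = (feat_g - feat_g.mean(0)) / (feat_g.std(0) + 1e-6)
    cross_corr = (f.T @ g) / f.shape[0]       # [dim_f, dim_g] correlations
    return task + lam * (cross_corr ** 2).mean()
```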

Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions

Title Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions
Authors Anonymous
Abstract From a young age, humans learn to use grammatical principles to hierarchically combine words into sentences. Action grammars are the parallel idea: that there is an underlying set of rules (a “grammar”) that governs how we hierarchically combine actions to form new, more complex actions. We introduce the Action Grammar Reinforcement Learning (AG-RL) framework which leverages the concept of action grammars to consistently improve the sample efficiency of Reinforcement Learning agents. AG-RL works by using a grammar inference algorithm to infer the “action grammar” of an agent midway through training, leading to a higher-level action representation. The agent’s action space is then augmented with macro-actions identified by the grammar. We apply this framework to Double Deep Q-Learning (AG-DDQN) and a discrete action version of Soft Actor-Critic (AG-SAC) and find that it improves performance in 8 out of 8 tested Atari games (median +31%, max +668%) and 19 out of 20 tested Atari games (median +96%, maximum +3,756%) respectively without substantive hyperparameter tuning. We also show that AG-SAC beats the model-free state-of-the-art for sample efficiency in 17 out of the 20 tested Atari games (median +62%, maximum +13,140%), again without substantive hyperparameter tuning.
Tasks Atari Games, Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJxDKerKDS
PDF https://openreview.net/pdf?id=SJxDKerKDS
PWC https://paperswithcode.com/paper/reinforcement-learning-with-structured
Repo
Framework
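
Grammar inference over action traces can be approximated with a byte-pair-encoding-style loop that repeatedly promotes the most frequent adjacent action pair to a macro-action. This is a stand-in for the paper's actual grammar inference algorithm, which the abstract does not specify:

```python
from collections import Counter

def infer_macros(traces, n_macros=4):
    """Toy grammar inference: given recorded action sequences (lists of int
    action ids), repeatedly promote the most frequent adjacent pair to a new
    macro-action id. Returns {macro_id: (a, b)}, the table used to augment
    the agent's action space."""
    traces = [list(t) for t in traces]
    macros = {}
    next_id = max(a for t in traces for a in t) + 1
    for _ in range(n_macros):
        pairs = Counter(p for t in traces for p in zip(t, t[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        macros[next_id] = (a, b)
        # Rewrite traces so the new macro can itself join future pairs,
        # yielding hierarchical (nested) macros.
        for i, t in enumerate(traces):
            out, j = [], 0
            while j < len(t):
                if j + 1 < len(t) and (t[j], t[j + 1]) == (a, b):
                    out.append(next_id); j += 2
                else:
                    out.append(t[j]); j += 1
            traces[i] = out
        next_id += 1
    return macros
```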

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

Title Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
Authors Anonymous
Abstract In state of the art model-free off-policy deep reinforcement learning (RL), a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a finite number of transitions. We represent these transitions in a data graph and link its structure to soft divergence. By selecting a subgraph with a favorable structure, we construct a simple Markov Decision Process (MDP) for which exact Q-values can be computed efficiently as more data comes in - resulting in a Qgraph. We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem. By using these lower bounds in TD learning, our method is less prone to soft divergence and exhibits increased sample efficiency while being more robust to hyperparameters. Qgraphs also retain information from transitions that have already been overwritten in the replay memory, which can decrease the algorithm’s sensitivity to the replay memory capacity.
Tasks Q-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=HygTUxHKwH
PDF https://openreview.net/pdf?id=HygTUxHKwH
PWC https://paperswithcode.com/paper/qgraph-bounded-q-learning-stabilizing-model
Repo
Framework
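
A sketch of the two pieces the abstract describes: exact value iteration on the finite MDP induced by the replay memory, and using its result as a lower bound on TD targets. The data-graph construction and subgraph selection details are omitted, and all interfaces are assumptions:

```python
import torch

def qgraph_values(transitions, gamma=0.99, iters=100):
    """Exact Q-values of the finite MDP induced by replay transitions.
    `transitions` is a list of (state_id, action_id, reward, next_state_id);
    a next_state_id with no outgoing transitions acts as terminal."""
    q = {(s, a): 0.0 for (s, a, _, _) in transitions}
    for _ in range(iters):
        for (s, a, r, s2) in transitions:
            nxt = max((q[(ss, aa)] for (ss, aa) in q if ss == s2), default=0.0)
            q[(s, a)] = r + gamma * nxt
    return q

def bounded_td_target(reward, next_q, qgraph_lower, gamma=0.99):
    """Clamp the bootstrapped TD target from below by the Qgraph value of the
    same transition. Since that value provably lower-bounds the true Q-value,
    the clamp can only pull a soft-diverging target back toward the truth."""
    return torch.maximum(reward + gamma * next_q, qgraph_lower)
```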

Effect of top-down connections in Hierarchical Sparse Coding

Title Effect of top-down connections in Hierarchical Sparse Coding
Authors Anonymous
Abstract Hierarchical Sparse Coding (HSC) is a powerful model to efficiently represent multi-dimensional, structured data such as images. The simplest way to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest inter-connecting these subproblems, as in the Predictive Coding (PC) theory, which adds top-down connections between consecutive layers. In this study, a new model called Sparse Deep Predictive Coding (SDPC) is introduced to assess the impact of this inter-layer feedback connection. In particular, the SDPC is compared with a Hierarchical Lasso (Hi-La) network made of a sequence of Lasso layers. A 2-layered SDPC network and a Hi-La network are trained on 3 different databases with different sparsity parameters on each layer. First, we show that the overall prediction error generated by SDPC is lower thanks to the feedback mechanism, as it transfers prediction error between layers. Second, we demonstrate that the inference stage of the SDPC converges faster than that of the Hi-La model. Third, we show that the SDPC also accelerates the learning process. Finally, a qualitative analysis of both models’ dictionaries, supported by their activation probabilities, shows that the SDPC features are more generic and informative.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJlcLaVFvB
PDF https://openreview.net/pdf?id=rJlcLaVFvB
PWC https://paperswithcode.com/paper/effect-of-top-down-connections-in
Repo
Framework
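
A minimal two-layer sketch of the inference dynamics, assuming an ISTA-style solver: layer 1's code is driven both bottom-up (reconstructing the input) and top-down (matching layer 2's prediction). Setting the feedback strength k to 0 recovers a purely layer-wise Hi-La baseline; the hyperparameters are illustrative:

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sdpc_inference(x, d1, d2, lam1=0.1, lam2=0.1, k=1.0, steps=50, lr=0.1):
    """Two-layer sparse coding by ISTA with top-down feedback.
    Shapes: x (m,), dictionaries d1 (m, k1) and d2 (k1, k2),
    codes a1 (k1,) and a2 (k2,)."""
    a1 = np.zeros(d1.shape[1])
    a2 = np.zeros(d2.shape[1])
    for _ in range(steps):
        pred1 = d2 @ a2                                   # top-down prediction of a1
        grad1 = d1.T @ (d1 @ a1 - x) + k * (a1 - pred1)   # bottom-up + feedback
        a1 = soft_threshold(a1 - lr * grad1, lr * lam1)
        grad2 = d2.T @ (d2 @ a2 - a1)                     # layer 2 explains a1
        a2 = soft_threshold(a2 - lr * grad2, lr * lam2)
    return a1, a2
```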

Neural Arithmetic Unit by reusing many small pre-trained networks

Title Neural Arithmetic Unit by reusing many small pre-trained networks
Authors Anonymous
Abstract We propose a solution for the evaluation of mathematical expressions. However, instead of designing a single end-to-end model, we propose a Lego-bricks-style architecture: rather than training one complex end-to-end neural network, many small networks can be trained independently, each accomplishing one specific operation and acting as a single Lego brick. More difficult or complex tasks can then be solved using a combination of these smaller networks. In this work we first identify 8 fundamental operations that are commonly used to solve arithmetic problems (such as 1-digit multiplication, addition, subtraction, sign calculation, etc.). These fundamental operations are then learned using simple feed-forward neural networks. We then show that different operations can be designed simply by reusing these smaller networks. As an example, we reuse these smaller networks to develop a larger and more complex network that solves n-digit multiplication, n-digit division, and the cross product. This bottom-up strategy not only introduces reusability; we also show that it generalizes to computations involving n digits, and we report results for numbers of up to 7 digits. Unlike existing methods, our solution also generalizes to both positive and negative numbers.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyerxgHYvH
PDF https://openreview.net/pdf?id=HyerxgHYvH
PWC https://paperswithcode.com/paper/neural-arithmetic-unit-by-reusing-many-small
Repo
Framework
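
The Lego-brick composition pattern is easy to illustrate with addition (the abstract's own examples are multiplication, division, and the cross product). In the sketch below, `add1` is an exact stub standing in for a small pre-trained 1-digit network, so the composition itself can be tested:

```python
def add_n_digit(xs, ys, add1):
    """Compose a 1-digit adder brick into an n-digit adder.
    `add1(a, b, carry) -> (digit, carry)` stands in for a small pre-trained
    network. Digit lists are least-significant first."""
    out, carry = [], 0
    for a, b in zip(xs, ys):
        d, carry = add1(a, b, carry)
        out.append(d)
    if carry:
        out.append(carry)
    return out

# Exact stand-in for the trained 1-digit module:
add1 = lambda a, b, c: ((a + b + c) % 10, (a + b + c) // 10)
assert add_n_digit([9, 9, 1], [2, 0, 0], add1) == [1, 0, 2]  # 199 + 2 = 201
```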

Function Feature Learning of Neural Networks

Title Function Feature Learning of Neural Networks
Authors Anonymous
Abstract We present a Function Feature Learning (FFL) method that can measure the similarity of non-convex neural networks. The function feature representation provides crucial insights into the understanding of the relations between different local solutions of identical neural networks. Unlike existing methods that use neuron activation vectors over a given dataset as neural network representation, FFL aligns weights of neural networks and projects them into a common function feature space by introducing a chain alignment rule. We investigate the function feature representation on Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), finding that identical neural networks trained with different random initializations on different learning tasks by the Stochastic Gradient Descent (SGD) algorithm can be projected into different fixed points. This finding demonstrates the strong connection between different local solutions of identical neural networks and the equivalence of projected local solutions. With FFL, we also find that the semantics are often presented in a bottom-up way. Besides, FFL provides more insights into the structure of local solutions. Experiments on CIFAR-100, NameData, and tiny ImageNet datasets validate the effectiveness of the proposed method.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJgCOySYwH
PDF https://openreview.net/pdf?id=rJgCOySYwH
PWC https://paperswithcode.com/paper/function-feature-learning-of-neural-networks
Repo
Framework

DeepSFM: Structure From Motion Via Deep Bundle Adjustment

Title DeepSFM: Structure From Motion Via Deep Bundle Adjustment
Authors Anonymous
Abstract Structure from motion (SfM) is an essential computer vision problem which has not been well handled by deep learning. One promising trend is to apply explicit structural constraints, e.g. 3D cost volumes, in the network. In this work, we design a physics-driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost-volume-based architectures for depth and pose estimation respectively, run iteratively to improve both. In each cost volume, we encode not only photometric consistency across multiple input images, but also geometric consistency to ensure that depths from multiple views agree with each other. The explicit constraints on both depth (structure) and pose (motion), when combined with the learning components, bring the merits of both traditional BA and emerging deep learning technology. Extensive experiments on various datasets show that our model achieves state-of-the-art performance on both depth and pose estimation, with superior robustness to fewer inputs and noise in the initialization.
Tasks Pose Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=SyeD0RVtvS
PDF https://openreview.net/pdf?id=SyeD0RVtvS
PWC https://paperswithcode.com/paper/deepsfm-structure-from-motion-via-deep-bundle
Repo
Framework

Adapting Behaviour for Learning Progress

Title Adapting Behaviour for Learning Progress
Authors Anonymous
Abstract Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scale and greater flexibility, but has not removed the need to tune or tailor exploration to the task, because the ideal data for the learning algorithm necessarily depends on its process of learning. We propose to dynamically adapt the data generation by using a non-stationary multi-armed bandit to optimize a proxy of the learning progress. The data distribution is controlled via modulating multiple parameters of the policy (such as stochasticity, consistency or optimism) without significant overhead. The adaptation speed of the bandit can be increased by exploiting the factored modulation structure. We demonstrate on a suite of Atari 2600 games how this unified approach produces results comparable to per-task tuning at a fraction of the cost.
Tasks Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=B1gXWCVtvr
PDF https://openreview.net/pdf?id=B1gXWCVtvr
PWC https://paperswithcode.com/paper/adapting-behaviour-for-learning-progress
Repo
Framework
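
The bandit layer can be sketched independently of the RL agent: arms are modulation settings (e.g. combinations of stochasticity, consistency, optimism), the reward fed in is a learning-progress proxy rather than task return, and a constant step size keeps value estimates tracking non-stationary progress. A minimal sketch with assumed hyperparameters:

```python
import numpy as np

def make_bandit(n_arms, step=0.1, eps=0.1, rng=None):
    """Non-stationary multi-armed bandit over behaviour modulations.
    Returns (choose, update): choose() picks an arm eps-greedily; update()
    tracks each arm's recent learning progress with exponential averaging."""
    rng = rng or np.random.default_rng()
    values = np.zeros(n_arms)

    def choose():
        if rng.random() < eps:         # keep exploring: progress drifts over training
            return int(rng.integers(n_arms))
        return int(np.argmax(values))

    def update(arm, progress):
        values[arm] += step * (progress - values[arm])  # constant-step tracking

    return choose, update
```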

SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

Title SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Authors Anonymous
Abstract Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllReduce-based methods, which use blocking communication before every update, the resulting models may be less accurate after the same number of updates. Inspired by the BMUF method of Chen & Huo (2016), we propose a slow momentum (SloMo) framework, where workers periodically synchronize and perform a momentum update, after multiple iterations of a base optimization algorithm. Experiments on image classification and machine translation tasks demonstrate that SloMo consistently yields improvements in optimization and generalization performance relative to the base optimizer, even when the additional overhead is amortized over many updates so that the SloMo runtime is on par with that of the base optimizer. We provide theoretical convergence guarantees showing that SloMo converges to a stationary point of smooth non-convex losses. Since BMUF is a particular instance of the SloMo framework, our results also correspond to the first theoretical convergence guarantees for BMUF.
Tasks Distributed Optimization, Image Classification, Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=SkxJ8REYPH
PDF https://openreview.net/pdf?id=SkxJ8REYPH
PWC https://paperswithcode.com/paper/slomo-improving-communication-efficient
Repo
Framework
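
The outer loop is simple enough to sketch. This is a reading of the abstract, not the authors' reference implementation; the buffer and step-size conventions (`alpha`, `beta`) are assumptions:

```python
import torch

@torch.no_grad()
def slomo_outer_step(workers, anchor, slow_momentum, alpha=1.0, beta=0.95):
    """Slow (outer) update: after each worker has run several base-optimizer
    steps (e.g. local SGD) from the shared anchor, average their parameters,
    treat the anchor-to-average displacement as a gradient, and apply a
    momentum step to the anchor. Workers then restart from the new anchor.
    `workers` is a list of per-worker tensor lists; `anchor` and
    `slow_momentum` are matching tensor lists."""
    for i, x0 in enumerate(anchor):
        avg = torch.stack([w[i] for w in workers]).mean(0)  # AllReduce average
        slow_momentum[i].mul_(beta).add_(x0 - avg)          # slow momentum buffer
        x0.sub_(alpha * slow_momentum[i])                   # outer momentum step
        for w in workers:                                   # re-synchronize
            w[i].copy_(x0)
```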

Dynamic Scale Inference by Entropy Minimization

Title Dynamic Scale Inference by Entropy Minimization
Authors Anonymous
Abstract Given the variety of the visual world there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field. Rather than enumerate variations across filter channels or pyramid levels, dynamic models locally predict scale and adapt receptive fields accordingly. The degree of variation and diversity of inputs makes this a difficult task. Existing methods either learn a feedforward predictor, which is not itself totally immune to the scale variation it is meant to counter, or select scales by a fixed algorithm, which cannot learn from the given task and data. We extend dynamic scale inference from feedforward prediction to iterative optimization for further adaptivity. We propose a novel entropy minimization objective for inference and optimize over task and structure parameters to tune the model to each input. Optimization during inference improves semantic segmentation accuracy and generalizes better to extreme scale variations that cause feedforward dynamic inference to falter.
Tasks Semantic Segmentation
Published 2020-01-01
URL https://openreview.net/forum?id=SyxiRJStwr
PDF https://openreview.net/pdf?id=SyxiRJStwr
PWC https://paperswithcode.com/paper/dynamic-scale-inference-by-entropy-1
Repo
Framework
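
The inference-time optimization can be sketched for a generic classifier: take a few gradient steps on the entropy of the output distribution before reading off the prediction. The paper restricts updates to scale and structure parameters; for brevity, the sketch below updates all parameters of a copied model, and the step count and learning rate are assumptions:

```python
import copy
import torch

def entropy_min_inference(model, x, steps=10, lr=1e-3):
    """Iterative inference by entropy minimization: starting from the
    feed-forward prediction, descend the entropy of the softmax output
    (dim=1 is the class dimension), then predict with the adapted copy."""
    m = copy.deepcopy(model)                  # leave the trained model intact
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for _ in range(steps):
        probs = torch.softmax(m(x), dim=1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    with torch.no_grad():
        return m(x).argmax(dim=1)
```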